Klas Leino
Identifying, Analyzing, and Addressing Weaknesses in Deep Networks: Foundations for Conceptually Sound Neural Networks

Degree Type: Ph.D. in Computer Science
Advisor(s): Matt Fredrikson
Graduated: May 2022

Abstract:
Deep neural networks have seen great success in many domains, with the ability to perform complex human tasks such as image recognition, text translation, and medical diagnosis; however, despite their remarkable abilities, neural networks have several peculiar shortcomings and vulnerabilities. Many of these weaknesses relate to a lack of conceptual soundness in the features encoded and used by the network; that is, the features the network learns to use may represent concepts that are not appropriate for the task at hand, even when they apparently allow the network to perform well on previously unseen validation data. This thesis examines the problems that arise in deep networks when they are not sufficiently conceptually sound, and provides steps towards improving the conceptual soundness of modern networks.

The first contribution of this thesis is a general, axiomatically justified framework for explaining neural network behavior, which serves as a powerful tool for assessing conceptual soundness. This work takes the unique perspective that, to accurately assess the conceptual soundness of a model, an explanation must provide a faithful account of the model's behavior. By contrast, the literature has often attempted to justify explanations based on their appeal to human intuition; however, this begs the question, as it assumes the model captured conceptually sound human intuition in the first place.

To the contrary, a large body of prior work provides conclusive evidence that conceptual soundness is not the norm in standard deep networks: adversarial examples (small, semantically meaningless input perturbations that cause erroneous behavior), found ubiquitously in such networks, violate the tenets of conceptual soundness. The second part of this thesis addresses this issue by contributing a state-of-the-art method for training neural networks with provable guarantees against a common class of adversarial examples. Finally, we demonstrate that robustness to malicious input perturbations is only the first step, with contributions uncovering several orthogonal weaknesses and vulnerabilities relating to the conceptual soundness of deep networks.

Thesis Committee:
Matt Fredrikson (Chair)
Anupam Datta
J. Zico Kolter
Corina Păsăreanu (CMU/NASA Ames)
Kamalika Chaudhuri (University of California, San Diego / Meta AI)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords: Machine Learning, Artificial Intelligence, Security, Privacy, Robustness, Transparency, Neural Networks

CMU-CS-22-104.pdf (15.02 MB) (189 pages)
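
The abstract does not reproduce the explanation framework itself; as a loose illustration of what a behavior-faithful explanation computes, the sketch below measures how sensitive a model's output is to each input feature via finite differences. This is a minimal NumPy stand-in, not the thesis's axiomatic method, and the toy network is hypothetical.

```python
import numpy as np

# A tiny ReLU network with a scalar output; weights are arbitrary.
def f(x, W1, W2):
    return W2 @ np.maximum(W1 @ x, 0.0)

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(4, 6)), rng.normal(size=4)
x = rng.normal(size=6)

# Finite-difference sensitivity of the output to each input feature.
h = 1e-5
attributions = np.array([
    (f(x + h * np.eye(6)[i], W1, W2) - f(x, W1, W2)) / h
    for i in range(6)
])
# Larger |attribution| ~ features the output depends on most strongly at x.
print(np.round(attributions, 3))
```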
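
To make the abstract's definition of an adversarial example concrete, here is a minimal NumPy sketch on a toy linear classifier. The perturbation step mirrors the FGSM attack of Goodfellow et al.; all numbers are contrived for illustration and nothing here is taken from the thesis.

```python
import numpy as np

# Toy linear "network": predict class 1 when w @ x + b > 0.
w = np.ones(10)
b = 0.0
x = np.full(10, 0.02)                 # clean input; score = 0.2 -> class 1
print(int(w @ x + b > 0))             # prints 1

# FGSM-style perturbation: step each coordinate a small distance eps in the
# direction that most decreases the score (here, -sign(w)).
eps = 0.05
x_adv = x - eps * np.sign(w)          # ||x_adv - x||_inf = 0.05
print(int(w @ x_adv + b > 0))         # prints 0: a tiny, semantically
                                      # meaningless shift flips the prediction
```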
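
The abstract does not name the certification technique, but one common route to such provable guarantees is a Lipschitz-margin check: if the logit map is L-Lipschitz, an l2 perturbation of norm at most eps moves every logit by at most L*eps, so the prediction cannot change while the top-two logit margin exceeds 2*L*eps. A minimal, conservative sketch assuming a known global Lipschitz constant (the constant and logits below are hypothetical):

```python
import numpy as np

def certified_robust(logits: np.ndarray, lip: float, eps: float) -> bool:
    # Sufficient condition: if each logit can move by at most lip*eps, the
    # gap between the top two logits must shrink by at most 2*lip*eps.
    top2 = np.sort(logits)[-2:]
    margin = top2[1] - top2[0]
    return margin > 2.0 * lip * eps

# Hypothetical numbers for illustration only:
logits = np.array([3.2, 0.4, -1.1])                  # model output at some x
print(certified_robust(logits, lip=5.0, eps=0.25))   # margin 2.8 > 2.5 -> True
```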