Securing Deep Learning against Adversarial Attacks for Connected and Autonomous Vehicles

Intelligent mobile robots, including autonomous agents, rely heavily on correct perception of the surrounding environment. Recently, Deep Learning-based perception models have been shown to be vulnerable to adversarial attacks via carefully crafted inputs known as adversarial examples. Existing defenses, mainly adversarial training and adversarial detection, fail to address the intrinsic weakness of current deep learning models: poor adversarial robustness, which stems in part from the opaque nature of black-box models. This project developed a deep ensemble network for image classification based on the fusion of discriminative features and generative models. Specifically, a causal adversarial graph is built into a generative model to capture the distribution of adversarial perturbations. To improve the accuracy of the generative classifiers, pre-trained object features are fused with the original images. We show that the ensemble network is robust against adversarial examples even without adversarial training (i.e., trained only on clean data), while requiring shorter training time and lower computational cost. In addition, we leverage counterfactual explanations to evaluate the causal behavior of the ensemble network.
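The abstract's core idea, fusing a discriminative classifier with a generative model, can be illustrated with a minimal sketch. The snippet below is an assumption-laden toy, not the paper's method: it fuses a discriminative head's log-probabilities log p(y|x) with a generative branch's Bayes scores log p(x|y) + log p(y) in log space, so a generative branch that models the input (or perturbation) distribution can down-weight a confident but wrong discriminative prediction on an adversarial input. All class counts, scores, and the weight `alpha` are illustrative.

```python
import math

# Hypothetical scores for a 3-class problem (values are illustrative, not
# from the paper): logits from a discriminative head and per-class
# log-likelihoods log p(x | y) from a generative model.
disc_logits = [2.0, 0.5, -1.0]
gen_loglik = [-5.0, -2.0, -6.0]
log_prior = [math.log(1 / 3)] * 3  # uniform class prior p(y)

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the log-sum-exp.
    m = max(logits)
    lse = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - lse for l in logits]

def ensemble_predict(disc_logits, gen_loglik, log_prior, alpha=0.5):
    """Fuse discriminative log p(y|x) with generative log p(x|y) + log p(y).

    alpha weights the two branches; here the discriminative head is most
    confident in class 0, but the generative branch assigns class 1 a much
    higher likelihood, and the fused score flips the decision.
    """
    disc_logp = log_softmax(disc_logits)
    gen_logp = log_softmax([ll + lp for ll, lp in zip(gen_loglik, log_prior)])
    fused = [alpha * d + (1 - alpha) * g for d, g in zip(disc_logp, gen_logp)]
    return max(range(len(fused)), key=fused.__getitem__)

print(ensemble_predict(disc_logits, gen_loglik, log_prior))  # → 1
```

In this toy setting, the discriminative branch alone would pick class 0, but the generative evidence shifts the fused decision to class 1, which is the kind of correction a perturbation-aware generative model is meant to provide.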