# Neural Networks with Recurrent Generative Feedback

Yujia Huang^1, James Gornet^1, Sihui Dai^1, Zhiding Yu^2, Tan Nguyen^3, Doris Y. Tsao^1, Anima Anandkumar^{1,2}
^1 California Institute of Technology, ^2 NVIDIA, ^3 Rice University

Abstract

Neural networks are vulnerable to input perturbations such as additive noise and adversarial attacks. In contrast, human perception is much more robust to such perturbations. The Bayesian brain hypothesis states that human brains use an internal generative model to update the posterior beliefs of the sensory input. This mechanism can be interpreted as a form of self-consistency between the maximum a posteriori (MAP) estimation of an internal generative model and the external environment. Inspired by this hypothesis, we enforce self-consistency in neural networks by incorporating generative recurrent feedback. We instantiate this design on convolutional neural networks (CNNs). The proposed framework, termed Convolutional Neural Networks with Feedback (CNN-F), introduces generative feedback with latent variables to existing CNN architectures, where consistent predictions are made through alternating MAP inference under a Bayesian framework. In our experiments, CNN-F shows considerably improved adversarial robustness over conventional feedforward CNNs on standard benchmarks.

1 Introduction

Figure 1: An intuitive illustration of recurrent generative feedback in the human visual perception system.

**Vulnerability in feedforward neural networks** Conventional deep neural networks (DNNs) often contain many layers of feedforward connections. With ever-growing network capacities and representation abilities, they have achieved great success. For example, recent convolutional neural networks (CNNs) have impressive accuracy on large-scale image classification benchmarks [33]. However, current CNN models also have significant limitations. For instance, they can suffer a significant performance drop from corruptions that barely influence human recognition [3]. Studies also show that CNNs can be misled by imperceptible noise known as adversarial attacks [32].

**Feedback in the human brain** To address the weaknesses of CNNs, we can take inspiration from how human visual recognition works and incorporate certain mechanisms into the CNN design. While the human visual cortex has hierarchical feedforward connections, it also has backward connections from higher-level to lower-level cortical areas, which current artificial networks lack [6]. Studies suggest these backward connections carry out top-down processing that improves the representation of sensory input [15]. In addition, evidence suggests recurrent feedback in the human visual cortex is crucial for robust object recognition. For example, humans require recurrent feedback to recognize challenging images [11], and obfuscated images can fool humans without recurrent feedback [5]. Figure 1 shows an intuitive example of recovering a sharpened cat from a blurry cat and achieving consistent predictions after several iterations.

**Predictive coding and generative feedback** Computational neuroscientists speculate that Bayesian inference models human perception [14]. One specific formulation of predictive coding assumes Gaussian distributions on all variables and performs hierarchical Bayesian inference using recurrent, generative feedback pathways [28].
The feedback pathways encode predictions of lower-level inputs, and the residual errors are used recurrently to update the predictions. In this paper, we extend the principle of predictive coding to explicitly incorporate Bayesian inference in neural networks via generative feedback connections. Specifically, we adopt a recently proposed model, the Deconvolutional Generative Model (DGM) [25], as the generative feedback. The DGM introduces hierarchical latent variables to capture variation in images and generates images from coarse to fine detail using deconvolutional operations. Our contributions are as follows:

**Self-consistency** We introduce generative feedback to neural networks and propose the self-consistency formulation for robust perception. Our internal model of the world reaches a self-consistent representation of an external stimulus. Intuitively, self-consistency says that given any two of the label, the image, and the auxiliary information, we should be able to infer the third. Mathematically, we use a generative model to describe the joint distribution of labels, latent variables, and input image features. If the MAP estimate of each is consistent with the other two, we call the label, latent variables, and image features self-consistent (Figure 4).

**CNN with Feedback (CNN-F)** We incorporate generative recurrent feedback modeled by the DGM into CNNs and term this model CNN-F. We show that Bayesian inference in the DGM is achieved by a CNN with adaptive nonlinear operators (Figure 2). We impose self-consistency in the CNN-F by iterative inference and online updates. Computationally, this is done by propagating along the feedforward and feedback pathways of the CNN-F iteratively (Figure 3).

**Adversarial robustness** We show that the recurrent generative feedback in CNN-F promotes robustness and visualize the behavior of CNN-F over iterations. We find that more iterations are needed to reach a self-consistent prediction for images with larger perturbations, indicating that recurrent feedback is crucial for recognizing challenging images. When combined with adversarial training, CNN-F further improves the adversarial robustness of CNNs on both the Fashion-MNIST and CIFAR-10 datasets.

In this section, we first formally define self-consistency. Then we give a specific form of generative feedback in CNNs and impose self-consistency on it; we term this model CNN-F. Finally, we show the training and testing procedures of CNN-F. Throughout, we use the following notation. Let $x \in \mathbb{R}^n$ be the input of a network and $y \in \mathbb{R}^K$ be the output. In image classification, $x$ is an image and $y = (y^{(1)}, \dots, y^{(K)})$ is a one-hot encoded label, where $K$ is the total number of classes ($K$ is usually much smaller than $n$). We use $L$ to denote the total number of network layers and index the input layer of the feedforward network as layer $0$. Let $h \in \mathbb{R}^m$ be the encoded feature of $x$ at layer $k$ of the feedforward pathway.

Figure 2: Left: CNN, the graphical model for the DGM, and the inference network for the DGM. We use the DGM as the generative model for the joint distribution of image features $h$, labels $y$, and latent variables $z$. MAP inference for $h$, $y$, and $z$ is denoted in red, green, and blue respectively; $f$ and $g$ denote feedforward and feedback features respectively. Right: CNN with feedback (CNN-F). CNN-F performs alternating MAP inference via recurrent feedforward and feedback pathways to enforce self-consistency.
Figure 3: Feedforward and feedback pathways in CNN-F. a) $\hat{y}$ and $\hat{z}$ are computed by the feedforward pathway and $\hat{h}$ is computed by the feedback pathway. b) Illustration of the AdaReLU operator. c) Illustration of the AdaPool operator.

The feedforward pathway computes feature maps $f(\ell)$ from layer $0$ to layer $L$, and the feedback pathway generates $g(\ell)$ from layer $L$ down to layer $k$; $g(\ell)$ and $f(\ell)$ have the same dimensions. To generate $h$ from $y$, we introduce latent variables for each layer of the CNN. Let $z(\ell) \in \mathbb{R}^{C \times H \times W}$ be the latent variables at layer $\ell$, where $C$, $H$, $W$ are the number of channels, height, and width of the corresponding feature map. Finally, $p(h, y, z; \theta)$ denotes the joint distribution parameterized by $\theta$, where $\theta$ includes the weights $W$ and bias terms $b$ of the convolutional and fully connected layers. We use $\hat{h}$, $\hat{y}$, and $\hat{z}$ to denote the MAP estimates of $h$, $y$, $z$ conditioned on the other two variables.

2.1 Generative feedback and Self-consistency

Figure 4: Self-consistency among $\hat{h}$, $\hat{z}$, $\hat{y}$, and consistency between $\hat{h}$ and $h$.

The human brain and neural networks are similar in having a hierarchical structure. In human visual perception, external stimuli are first preprocessed by the lateral geniculate nucleus (LGN) and then processed by V1, V2, V4, and the inferior temporal (IT) cortex in the ventral cortical visual system. Conventional NNs use feedforward layers to model this process and learn a one-directional mapping from input to output. However, numerous studies suggest that in addition to the feedforward connections from V1 to IT, there are feedback connections among these cortical areas [6].

Inspired by the Bayesian brain hypothesis and predictive coding theory, we propose to add generative feedback connections to NNs. Since $h$ is usually of much higher dimension than $y$, we introduce latent variables $z$ to account for the information loss in the feedforward process. We then model the feedback connections as MAP estimation from an internal generative model that describes the joint distribution of $h$, $z$, and $y$. Furthermore, we realize recurrent feedback by imposing self-consistency (Definition 2.1).

Definition 2.1 (Self-consistency). Given a joint distribution $p(h, y, z; \theta)$ parameterized by $\theta$, $(\hat{h}, \hat{y}, \hat{z})$ are self-consistent if they satisfy the following constraints:
$$\hat{y} = \arg\max_{y} p(y \mid \hat{h}, \hat{z}), \qquad \hat{h} = \arg\max_{h} p(h \mid \hat{y}, \hat{z}), \qquad \hat{z} = \arg\max_{z} p(z \mid \hat{h}, \hat{y}) \qquad (1)$$

In words, self-consistency means that the MAP estimates from an internal generative model are consistent with each other. In addition to self-consistency, we also impose a consistency constraint between $\hat{h}$ and the external input features (Figure 4). We hypothesize that for easy images (familiar images to humans, clean images in the training dataset for NNs), the $\hat{y}$ from the first feedforward pass should automatically satisfy the self-consistency constraints, so feedback need not be triggered. For challenging images (unfamiliar images to humans, unseen perturbed images for NNs), recurrent feedback is needed to obtain self-consistent $(\hat{h}, \hat{y}, \hat{z})$ and to match $\hat{h}$ with $h$. Such recurrence resembles the dynamics in neural circuits [12] and the extra effort humans need to process challenging images [11].

2.2 Generative Feedback in CNN-F

CNNs have been used to model the hierarchical structure of human retinotopic fields [4, 10] and have achieved state-of-the-art performance in image classification. Therefore, we introduce generative feedback to a CNN and impose self-consistency on it.
We term the resulting model CNN-F. We choose the DGM [25] as the generative feedback in the CNN-F. The DGM introduces hierarchical binary latent variables and generates images from coarse to fine details. The generation process in the DGM is shown in Figure 3 (a). First, $y$ is sampled from the label distribution. Then each entry of $z(\ell)$ is sampled from a Bernoulli distribution parameterized by $g(\ell)$ and a bias term $b(\ell)$. $g(\ell)$ and $z(\ell)$ are then used to generate the layer below:
$$g(\ell - 1) = (W^{(\ell)})^{\top}\bigl(z(\ell) \odot g(\ell)\bigr) \qquad (2)$$
In this paper, we assume $p(y)$ to be uniform, which is realistic under the balanced-label scenario. We assume that $h$ follows a Gaussian distribution centered at $g(k)$ with standard deviation $\sigma$.

2.3 Recurrence in CNN-F

In this section, we show that self-consistent $(\hat{h}, \hat{y}, \hat{z})$ in the DGM can be obtained by alternately propagating along the feedforward and feedback pathways of CNN-F.

**Feedforward and feedback pathways in CNN-F** The feedback pathway in CNN-F takes the same form as the generation process in the DGM (Equation (2)). The feedforward pathway in CNN-F takes the same form as a CNN except for the nonlinear operators. In a conventional CNN, the nonlinear operators are $\sigma_{\mathrm{ReLU}}(f) = \max(f, 0)$ and $\sigma_{\mathrm{MaxPool}}(f) = \max_{r \times r} f$, where $r$ is the dimension of the pooling region in the feature map (typically 2 or 3). In contrast, we use $\sigma_{\mathrm{AdaReLU}}$ and $\sigma_{\mathrm{AdaPool}}$, given in Equation (3), in the feedforward pathway of CNN-F. These operators adaptively choose how to activate the feedforward feature map based on the sign of the feedback feature map. The feedforward pathway computes $f(\ell)$ using the recursion $f(\ell) = W^{(\ell)} \sigma(f(\ell - 1)) + b^{(\ell)}$, where $\sigma$ takes the form of $\sigma_{\mathrm{AdaPool}}$ or $\sigma_{\mathrm{AdaReLU}}$.
$$\sigma_{\mathrm{AdaReLU}}(f) = \begin{cases} \sigma_{\mathrm{ReLU}}(f), & \text{if } g \ge 0 \\ \sigma_{\mathrm{ReLU}}(-f), & \text{if } g < 0 \end{cases} \qquad
\sigma_{\mathrm{AdaPool}}(f) = \begin{cases} \sigma_{\mathrm{MaxPool}}(f), & \text{if } g \ge 0 \\ -\sigma_{\mathrm{MaxPool}}(-f), & \text{if } g < 0 \end{cases} \qquad (3)$$

**MAP inference in the DGM** Given a joint distribution of $h$, $y$, $z$ modeled by the DGM, we aim to show that we can make predictions using a CNN architecture following Bayes' rule (Theorem 2.1). To see this, first recall that generative classifiers learn a joint distribution $p(x, y)$ of input data $x$ and labels $y$, and make predictions by computing $p(y \mid x)$ using Bayes' rule. A well-known example is the Gaussian Naive Bayes (GNB) model. GNB models $p(x, y)$ by $p(y)p(x \mid y)$, where $y$ is a Boolean variable following a Bernoulli distribution and $p(x \mid y)$ follows a Gaussian distribution. It can be shown that $p(y \mid x)$ computed from GNB has the same parametric form as logistic regression.

Assumption 2.1 (Constancy assumption in the DGM).
A. The generated image $g(k)$ at layer $k$ of the DGM satisfies $\|g(k)\|_2^2 = \mathrm{const}$.
B. The prior distribution on the label is uniform: $p(y) = \mathrm{const}$.
C. The normalization factor in $p(z \mid y)$ for each category is constant: $\sum_z e^{\eta(y, z)} = \mathrm{const}$.

Remark. To meet Assumption 2.1.A, we can normalize $g(k)$ for all $k$. This results in a form similar to the instance normalization that is widely used in image stylization [35]; see Appendix A.4 for a more detailed discussion. Assumption 2.1.B assumes that the label distribution is balanced. $\eta$ in Assumption 2.1.C is used to parameterize $p(z \mid y)$; see Appendix A for its detailed form.

Theorem 2.1. Under Assumption 2.1 and given a joint distribution $p(h, y, z)$ modeled by the DGM, $p(y \mid h, z)$ has the same parametric form as a CNN with $\sigma_{\mathrm{AdaReLU}}$ and $\sigma_{\mathrm{AdaPool}}$.

Proof. Please refer to Appendix A.

Remark. Theorem 2.1 says that the DGM and the CNN form a generative-discriminative pair, in analogy to GNB and logistic regression.
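To make the adaptive operators in Equation (3) concrete, here is a minimal PyTorch sketch. It assumes elementwise feedback for AdaReLU and feedback at the pooled resolution for AdaPool; the function names `ada_relu` and `ada_pool` are our own and are not taken from the paper's released code.

```python
import torch
import torch.nn.functional as F

def ada_relu(f, g):
    # sigma_AdaReLU (Eq. 3): behave like ReLU(f) where the feedback g is
    # non-negative, and like ReLU(-f) where g is negative.
    return torch.where(g >= 0, F.relu(f), F.relu(-f))

def ada_pool(f, g, kernel_size=2):
    # sigma_AdaPool (Eq. 3): per pooling region, keep the maximum of f where
    # the (pooled-resolution) feedback g is non-negative, and the minimum of f
    # (computed as -MaxPool(-f)) where g is negative.
    pooled_max = F.max_pool2d(f, kernel_size)
    pooled_min = -F.max_pool2d(-f, kernel_size)
    return torch.where(g >= 0, pooled_max, pooled_min)

# Example: f is a feedforward feature map, g the matching feedback feature map.
f = torch.randn(1, 8, 16, 16)
g_relu = torch.randn(1, 8, 16, 16)   # same resolution as f for AdaReLU
g_pool = torch.randn(1, 8, 8, 8)     # pooled resolution for 2x2 AdaPool
out_relu = ada_relu(f, g_relu)       # -> (1, 8, 16, 16)
out_pool = ada_pool(f, g_pool)       # -> (1, 8, 8, 8)
```

Note that with an all-non-negative feedback $g$, these operators reduce to the standard ReLU and MaxPool, which matches the initialization step described in Section 2.3.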
We also derive the form of MAP inference for the image feature $\hat{h}$ and the latent variables $\hat{z}$ in the DGM. Specifically, we use $z_R$ and $z_P$ to denote latent variables at a layer followed by AdaReLU and AdaPool, respectively, and $\mathbb{1}(\cdot)$ denotes the indicator function.

Proposition 2.1 (MAP inference in the DGM). Under Assumption 2.1, the following hold:
A. Let $h$ be the feature at layer $k$; then $\hat{h} = g(k)$.
B. The MAP estimate of $z(\ell)$ conditioned on $h$, $y$, and $\{z(j)\}_{j \ne \ell}$ in the DGM is:
$$\hat{z}_R(\ell) = \mathbb{1}\bigl(\sigma_{\mathrm{AdaReLU}}(f(\ell)) > 0\bigr) \qquad (4)$$
$$\hat{z}_P(\ell) = \mathbb{1}\bigl(g(\ell) \ge 0\bigr) \odot \operatorname*{arg\,max}_{r \times r}\,(f(\ell)) + \mathbb{1}\bigl(g(\ell) < 0\bigr) \odot \operatorname*{arg\,min}_{r \times r}\,(f(\ell)) \qquad (5)$$

Proof. For part A, we have $\hat{h} = \arg\max_h p(h \mid \hat{y}, \hat{z}) = \arg\max_h p(h \mid g(k)) = g(k)$. The second equality holds because $g(k)$ is a deterministic function of $\hat{y}$ and $\hat{z}$; the third holds because $h \sim \mathcal{N}(g(k), \mathrm{diag}(\sigma^2))$. For part B, please refer to Appendix A.

Remark. Proposition 2.1.A shows that $\hat{h}$ is the output of the generative feedback in the CNN-F. Proposition 2.1.B says that $\hat{z}_R = 1$ where the sign of the feedforward feature map matches that of the feedback feature map, and $\hat{z}_P = 1$ at locations that satisfy one of two requirements: 1) the feedback feature map is non-negative at that location and the feedforward value there is the maximum within the local pooling region, or 2) the feedback feature map is negative and the feedforward value is the minimum within the local pooling region. Using Proposition 2.1.B, we approximate $\{\hat{z}(\ell)\}_{\ell=1:L}$ by greedily finding the MAP estimate of $\hat{z}(\ell)$ conditioned on all other layers.

**Iterative inference and online update in CNN-F** We find self-consistent $(\hat{h}, \hat{y}, \hat{z})$ by iterative inference and online updates (Algorithm 1). In the initialization step, the image $x$ is first encoded to $h$ by $k$ convolutional layers. Then $h$ passes through a standard CNN, and the latent variables are initialized with the conventional $\sigma_{\mathrm{ReLU}}$ and $\sigma_{\mathrm{MaxPool}}$. The feedback generative network then uses $\hat{y}_0$ and $\{\hat{z}_0(\ell)\}_{\ell=k:L}$ to generate intermediate features $\{g_0(\ell)\}_{\ell=k:L}$, where the subscript denotes the iteration number. In practice, we use logits instead of the one-hot encoded label in the generative feedback to maintain uncertainty over the categories. We use $g_0(k)$ as the input features for the first iteration. Starting from this iteration, we use $\sigma_{\mathrm{AdaReLU}}$ and $\sigma_{\mathrm{AdaPool}}$ instead of $\sigma_{\mathrm{ReLU}}$ and $\sigma_{\mathrm{MaxPool}}$ in the feedforward pathway to infer $\hat{z}$ (Equations (20) and (21)). In practice, we find that instead of greedily replacing the input with the generated features and starting a new inference iteration, an online update eases training and gives better robustness. The online update rule of CNN-F is:
$$\hat{h}_{t+1} \leftarrow \hat{h}_t + \eta\,\bigl(g_{t+1}(k) - \hat{h}_t\bigr) \qquad (6)$$
$$f_{t+1}(\ell) \leftarrow f_{t+1}(\ell) + \eta\,\bigl(g_t(\ell) - f_{t+1}(\ell)\bigr), \quad \ell = k, \dots, L \qquad (7)$$
where $\eta$ is the step size. Greedy replacement is a special case of the online update rule with $\eta = 1$.

Algorithm 1: Iterative inference and online update in CNN-F
- Input: input image $x$, number of encoding layers $k$, maximum number of iterations $N$.
- Encode image $x$ to $h_0$ with $k$ convolutional layers.
- Initialize $\{\hat{z}(\ell)\}_{\ell=k:L}$ with $\sigma_{\mathrm{ReLU}}$ and $\sigma_{\mathrm{MaxPool}}$ in the standard CNN.
- While $t < N$:
  - Feedback pathway: generate $g_t(k)$ using $\hat{y}_t$ and $\hat{z}_t(\ell)$, $\ell = k, \dots, L$.
  - Feedforward pathway: use $\hat{h}_{t+1}$ as the input (Equation (6)); update each feedforward layer using Equation (7); predict $\hat{y}_{t+1}$ with the updated feedforward layers.
- Return $\hat{h}_N$, $\hat{y}_N$, $\hat{z}_N$.
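For concreteness, below is a minimal Python sketch of Algorithm 1 with the online update of Equation (6). The `encoder`, `feedforward`, and `feedback` callables are assumed interfaces (the first $k$ convolutional layers, the feedforward classifier with AdaReLU/AdaPool, and the DGM feedback pathway); the layer-wise update of Equation (7) is assumed to happen inside `feedforward` and is not spelled out here.

```python
def cnn_f_inference(x, encoder, feedforward, feedback, n_iters, eta=0.1):
    """Sketch of CNN-F iterative inference (Algorithm 1) with online updates."""
    h_hat = encoder(x)                              # encode image x to feature h at layer k
    logits, latents = feedforward(h_hat, g=None)    # initialization: plain ReLU / MaxPool
    for _ in range(n_iters):
        g = feedback(logits, latents)               # feedback pathway: list g(k), ..., g(L)
        h_hat = h_hat + eta * (g[0] - h_hat)        # online update of h with g(k), Eq. (6)
        # Feedforward pass with AdaReLU / AdaPool conditioned on the feedback maps;
        # the layer-wise online update of Eq. (7) is assumed to happen inside.
        logits, latents = feedforward(h_hat, g=g)
    return h_hat, logits, latents
```

At test time, the prediction can be read out either from the last iteration's logits or from the average of the logits across iterations, as discussed in Section 3.2.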
2.4 Training the CNN-F

During training, we have three goals: 1) train a generative model to model the data distribution, 2) train a generative classifier, and 3) enforce self-consistency in the model. We first approximate self-consistent $(\hat{h}, \hat{y}, \hat{z})$ and then update the model parameters based on the losses listed in Table 1; all losses are computed at every iteration. Minimizing the reconstruction loss increases the data likelihood given the current estimates of the label and latent variables, $\log p(h \mid \hat{y}_t, \hat{z}_t)$, and enforces consistency between $\hat{h}_t$ and $h$. Minimizing the cross-entropy loss serves the classification goal. In addition to the reconstruction loss at the input layer, we also add reconstruction losses between intermediate feedback and feedforward feature maps. These intermediate losses help stabilize the gradients when training an iterative model such as the CNN-F.

Table 1: Training losses in the CNN-F.

| Loss | Form | Purpose |
| --- | --- | --- |
| Cross-entropy loss | $-\log p(y \mid \hat{h}_t, \hat{z}_t; \theta)$ | classification |
| Reconstruction loss | $-\log p(h \mid \hat{y}_t, \hat{z}_t; \theta) = \lVert h - \hat{h} \rVert_2^2$ | generation, self-consistency |
| Intermediate reconstruction loss | $\lVert f_0(\ell) - g_t(\ell) \rVert_2^2$ | stabilizing training |
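A minimal PyTorch-style sketch of how the Table 1 losses could be combined over iterations is shown below. The function name, the loss weights, and the use of mean-squared error as a stand-in for the squared $L_2$ norm are illustrative assumptions, not values from the paper.

```python
import torch.nn.functional as F

def cnn_f_training_loss(logits_per_iter, h, h_hat_per_iter,
                        f0_maps, g_maps_per_iter, y,
                        ce_weight=1.0, recon_weight=1.0):
    """Combine the Table 1 losses over all feedback iterations (sketch)."""
    loss = 0.0
    for t, logits in enumerate(logits_per_iter):
        loss += ce_weight * F.cross_entropy(logits, y)            # cross-entropy loss
        loss += recon_weight * F.mse_loss(h_hat_per_iter[t], h)   # reconstruction loss at layer k
        for f0, g in zip(f0_maps, g_maps_per_iter[t]):            # intermediate reconstruction
            loss += recon_weight * F.mse_loss(g, f0)              # losses between g_t(l) and f_0(l)
    return loss
```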
3 Experiment

3.1 Generative feedback promotes robustness

As a sanity check, we train a CNN-F model with two convolutional layers and one fully connected layer on clean Fashion-MNIST images. We expect CNN-F to reconstruct perturbed inputs to their clean versions and to make self-consistent predictions. We verify this hypothesis by evaluating the adversarial robustness of CNN-F and visualizing the restored images over iterations.

**Adversarial robustness** Since CNN-F is an iterative model, we consider two attack methods: attacking the first or the last output from the feedforward stream. We use *first* and *e2e* (short for end-to-end) to refer to these two attack approaches, respectively. Due to the approximation of non-differentiable activation operators and the depth of the unrolled CNN-F, the end-to-end attack is weaker than the first attack (Appendix B.1). We report the adversarial accuracy against the stronger attack in Figure 5. We use the Fast Gradient Sign Method (FGSM) [8] and the Projected Gradient Descent (PGD) method to attack. For the PGD attack, we generate adversarial samples within an $L_\infty$-norm constraint and denote the maximum $L_\infty$ distance between adversarial and clean images as $\epsilon$. Figure 5 (a, b) shows that CNN-F improves the adversarial robustness of a CNN on Fashion-MNIST without access to adversarial images during training; error bars show the standard deviation over 5 runs. Figure 5 (c) shows that training a CNN-F with more iterations improves robustness. Figure 5 (d) shows that predictions are corrected over iterations at test time for a CNN-F trained with 5 iterations. Furthermore, we see larger improvements for higher $\epsilon$. This indicates that recurrent feedback is crucial for recognizing challenging images.

Figure 5: Adversarial robustness of CNN-F with standard training on Fashion-MNIST. CNN-F-k stands for a CNN-F trained with k iterations. a) Attack with FGSM. b) Attack with PGD using 40 steps. c) Training with different numbers of iterations; attack with PGD-40. d) Evaluating a trained CNN-F-5 model with various numbers of iterations against the PGD-40 attack.

**Image restoration** Given that CNN-F models are robust to adversarial attacks, we examine the model's mechanism for robustness by visualizing how the generative feedback moves a perturbed image over iterations. We select a validation image from Fashion-MNIST. Using the image's two largest principal components, we construct a two-dimensional hyperplane in $\mathbb{R}^{28 \times 28}$ that intersects the image, with the image at the center. Vector arrows visualize the movement induced by the generative feedback at each position on the hyperplane.

Figure 6: The generative feedback in CNN-F models restores perturbed images. a) Decision-cell cross-sections for a CNN-F trained on Fashion-MNIST; arrows visualize the feedback direction on the cross-section. b) Fashion-MNIST classification accuracy on PGD adversarial examples; Grad-CAM activations visualize the CNN-F model's attention shifting from incorrect (iter. 1) to correct predictions (iter. 2). c) Grad-CAM activations across different feedback iterations in the CNN-F. d) From left to right: clean images, corrupted images, and images restored by the CNN-F's feedback.

In Figure 6 (a), we find that the generative feedback perturbs samples across decision boundaries toward the validation image. This demonstrates that the CNN-F's generative feedback can restore perturbed images to their uncorrupted objects. We further explore this principle with regard to adversarial examples. The CNN-F model can correct initially wrong predictions: Figure 6 (b) uses Grad-CAM activations [30] to visualize the network's attention shifting from an incorrect prediction to a correct prediction on PGD-40 adversarial samples. To correct predictions, the CNN-F model does not initially focus on specific features; rather, it either identifies the entire object or the entire image. With generative feedback, the CNN-F begins to focus on specific features. This is reproduced on clean images as well as on images corrupted by blurring and additive noise (Figure 6 (c)). Furthermore, under these perceptible corruptions, the CNN-F model can reconstruct the clean image with generative feedback (Figure 6 (d)). This demonstrates that the generative feedback is one mechanism that restores perturbed images.

3.2 Adversarial Training

Figure 7: Loss design for CNN-F adversarial training, where $v$ stands for the logits; $x$, $h$, and $g$ are the input image, the encoded feature, and the generated feature, respectively.

Adversarial training is a well-established method for improving the adversarial robustness of a neural network [20]. It typically solves a minimax optimization problem in which the attacker aims to maximize the loss while the model parameters are optimized to minimize it. In this section, we show that CNN-F can be combined with adversarial training to further improve adversarial robustness.

**Training methods** Figure 7 illustrates the loss design we use for CNN-F adversarial training. Different from standard adversarial training on CNNs, we use the cross-entropy loss on both clean images and adversarial images. In addition, we add a reconstruction loss between the features of adversarial samples generated by the iterative feedback and the features of the clean images from the first forward pass; a minimal sketch of this loss design is given below, after the experimental setup.

**Experimental setup** We train CNN-F on the Fashion-MNIST and CIFAR-10 datasets. For Fashion-MNIST, we train a network with 4 convolutional layers and 3 fully connected layers, using 2 convolutional layers to encode the image into feature space and reconstructing to that feature space. For CIFAR-10, we use the Wide ResNet architecture [39] with depth 40 and width 2, and reconstruct to the feature space after 5 basic blocks in the first network block. For more detailed hyper-parameter settings, please refer to Appendix B.2. During training, we use PGD-7 to attack the first forward pass of CNN-F to obtain adversarial samples.
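The sketch below illustrates one adversarial-training step under the loss design of Figure 7. The attack call, module interfaces, and loss weights are illustrative assumptions; the paper's actual implementation may differ.

```python
import torch.nn.functional as F

def cnn_f_adv_training_step(x, y, encoder, cnn_f, pgd_attack, recon_weight=1.0):
    """One CNN-F adversarial-training step (sketch of the Figure 7 loss design).
    `cnn_f` runs iterative inference and returns (logits, generated feature g);
    `pgd_attack` attacks the first forward pass (PGD-7 during training)."""
    h_clean = encoder(x)                       # clean encoded feature from the first pass
    x_adv = pgd_attack(x, y)                   # adversarial samples for the first forward pass
    logits_clean, _ = cnn_f(x)                 # forward pass on clean images
    logits_adv, g_adv = cnn_f(x_adv)           # forward pass on adversarial images with feedback
    loss = F.cross_entropy(logits_clean, y)    # cross-entropy on clean images
    loss += F.cross_entropy(logits_adv, y)     # cross-entropy on adversarial images
    loss += recon_weight * F.mse_loss(g_adv, h_clean)   # reconstruction: g_adv vs. clean features
    return loss
```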
During testing, we also perform SPSA [34] and transfer attacks in addition to PGD attacks to guard against the gradient obfuscation issue [1] when evaluating the adversarial robustness of a model. In the transfer attack, we use the adversarial samples of the CNN to attack CNN-F.

**Main results** CNN-F further improves the robustness of a CNN when combined with adversarial training. Tables 2 and 3 list the adversarial accuracy of CNN-F against several attack methods on Fashion-MNIST and CIFAR-10. On Fashion-MNIST, we train the CNN-F with 1 iteration; on CIFAR-10, we train the CNN-F with 2 iterations. We report two evaluation methods for CNN-F: taking the logits from the last iteration (last), or taking the average of the logits from all iterations (avg). We also report the lowest accuracy among all attack methods (the Min column) to highlight the weak spot of each model. In general, we find that CNN-F tends to be more robust to the end-to-end attack than to attacks on the first forward pass. This corresponds to the scenario where the attacker does not have access to the internal iterations of the CNN-F. Depending on the attack scenario, we can tune the hyper-parameters and choose between averaging the logits and outputting the logits from the last iteration to get the best robustness performance (Appendix B.2).

Table 2: Adversarial accuracy (%) on Fashion-MNIST over 3 runs, $\epsilon = 0.1$.

| Model | Clean | PGD (first) | PGD (e2e) | SPSA (first) | SPSA (e2e) | Transfer | Min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | 89.97 ± 0.10 | 77.09 ± 0.19 | 77.09 ± 0.19 | 87.33 ± 1.14 | 87.33 ± 1.14 | – | 77.09 ± 0.19 |
| CNN-F (last) | 89.87 ± 0.14 | 79.19 ± 0.49 | 78.34 ± 0.29 | 87.10 ± 0.10 | 87.33 ± 0.89 | 82.76 ± 0.26 | 78.34 ± 0.29 |
| CNN-F (avg) | 89.77 ± 0.08 | 79.55 ± 0.15 | 79.89 ± 0.16 | 88.27 ± 0.91 | 88.23 ± 0.81 | 83.15 ± 0.17 | 79.55 ± 0.15 |

Table 3: Adversarial accuracy (%) on CIFAR-10 over 3 runs, $\epsilon = 8/255$.

| Model | Clean | PGD (first) | PGD (e2e) | SPSA (first) | SPSA (e2e) | Transfer | Min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CNN | 79.09 ± 0.11 | 42.31 ± 0.51 | 42.31 ± 0.51 | 66.61 ± 0.09 | 66.61 ± 0.09 | – | 42.31 ± 0.51 |
| CNN-F (last) | 78.68 ± 1.33 | 48.90 ± 1.30 | 49.35 ± 2.55 | 68.75 ± 1.90 | 51.46 ± 3.22 | 66.19 ± 1.37 | 48.90 ± 1.30 |
| CNN-F (avg) | 80.27 ± 0.69 | 48.72 ± 0.64 | 55.02 ± 1.91 | 71.56 ± 2.03 | 58.83 ± 3.72 | 67.09 ± 0.68 | 48.72 ± 0.64 |
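As a small illustration of the two readouts compared in Tables 2 and 3, the sketch below collects the logits produced at every feedback iteration and reports either the last iteration's logits or their average; the function name and interface are our assumptions.

```python
import torch

def cnn_f_predict(logits_per_iter, mode="avg"):
    """Readout for CNN-F: `last` uses the final iteration's logits,
    `avg` averages the logits over all iterations (Tables 2 and 3)."""
    if mode == "last":
        logits = logits_per_iter[-1]
    else:
        logits = torch.stack(logits_per_iter).mean(dim=0)
    return logits.argmax(dim=-1)
```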
4 Related work

**Robust neural networks with latent variables** Latent variable models are a unifying theme in robust neural networks. The consciousness prior [2] postulates that natural representations, such as language, operate in a low-dimensional space, which may restrict expressivity but may also facilitate rapid learning. If adversarial attacks introduce examples outside this low-dimensional manifold, latent variable models can map these samples back to the manifold. A related mechanism for robustness is state reification [17]. Similar to self-consistency, state reification models the distribution of hidden states over the training data and maps less likely states to more likely states. MagNet and Denoising Feature Matching introduce similar mechanisms, using autoencoders on the input space to detect adversarial examples and restore them in the input space [21, 37]. Lastly, Defense-GAN proposes a generative adversarial network to approximate the data manifold [29]. CNN-F generalizes these themes into a Bayesian framework. Intuitively, CNN-F can be viewed as an autoencoder; in contrast to standard autoencoders, CNN-F imposes stronger constraints through Bayes' rule. Through self-consistency, CNN-F constrains the generated image to satisfy the maximum a posteriori estimate of the predicted output.

**Computational models of human vision** Recurrent models and Bayesian inference have been two prevalent concepts in computational visual neuroscience. Recently, Kubilius et al. [16] proposed CORnet as a more accurate model of human vision that models recurrent cortical pathways. Like CNN-F, CORnet shows larger V4 and IT neural similarity compared to a CNN with similar weights. Linsley et al. [19] suggest hGRU as another recurrent model of vision; distinct from other models, hGRU models lateral pathways in the visual cortex to capture global contextual information. While Bayesian inference is a candidate model of visual perception, a Bayesian framework is absent in these models. The recursive cortical network (RCN) proposes a hierarchical conditional random field as a model for visual perception [7]. In contrast to neural networks, RCN uses belief propagation for both training and inference. With the representational ability of neural networks, we propose CNN-F to approximate Bayesian inference with recurrent circuits in neural networks.

**Feedback networks** The Feedback Network [40] uses convLSTM modules as building blocks and adds skip connections between different time steps. This architecture enables early prediction and enforces a hierarchical structure in the label space. Nayebi et al. [24] use architecture search to design local recurrent cells and long-range feedback to boost classification accuracy. Wen et al. [38] design a bidirectional recurrent neural network by recursively performing bottom-up and top-down computations, achieving more accurate and definitive image classification. Beyond standard image classification, neural networks with feedback have been applied to other settings: Wang et al. [36] propose a feedback-based propagation approach that improves CNN inference under partial evidence in the multi-label setting, and Piekniewski et al. [27] apply multi-layer perceptrons with lateral and feedback connections to visual object tracking.

**Combining top-down and bottom-up signals in RNNs** Mittal et al. [23] propose combining attention and modularity mechanisms to route bottom-up (feedforward) and top-down (feedback) signals. They extend the Recurrent Independent Mechanisms (RIMs) [9] framework to a bidirectional structure such that each layer of the hierarchy can send information in both the bottom-up and the top-down direction. Our approach uses approximate Bayesian inference to provide top-down communication, which is more consistent with the Bayesian brain framework and predictive coding.

**Inference in generative classifiers** Sulam et al. [31] derive a generative classifier using a sparse prior on the layer-wise representations. Inference is solved by a multi-layer basis pursuit algorithm, which can be implemented via recurrent convolutional neural networks. Nimmagadda and Anandkumar [26] propose learning a latent tree model in the last layer for multi-object classification; a tree model allows one-shot inference, in contrast to iterative inference.

**Target propagation** The generative feedback in CNN-F shares a similar form with target propagation, where targets at each layer are propagated backwards. Difference target propagation additionally uses autoencoder-like losses at intermediate layers to promote network invertibility [22, 18]. In CNN-F, the intermediate reconstruction losses between adversarial and clean feature maps during adversarial training encourage the feedback to project a perturbed image back to its clean version at all resolution scales.

5 Conclusion

Inspired by recent studies of the Bayesian brain hypothesis, we propose to introduce recurrent generative feedback to neural networks.
We instantiate the framework on CNNs and term the model CNN-F. In the experiments, we demonstrate that the proposed feedback mechanism can considerably improve adversarial robustness compared to conventional feedforward CNNs. We visualize the dynamical behavior of CNN-F and show its capability of restoring corrupted images. Our study shows that the generative feedback in CNN-F presents a biologically inspired architectural design that encodes inductive biases to benefit network robustness.

Broader Impacts

Convolutional neural networks (CNNs) can achieve superhuman performance on image classification tasks. This advantage allows their deployment in computer vision applications such as medical imaging, security, and autonomous driving. However, CNNs trained on natural images tend to overfit to image textures. Such a flaw can cause a CNN to fail against adversarial attacks and on distorted images, which may in turn lead to unreliable predictions, potentially causing false medical diagnoses, traffic accidents, and false identification of criminal suspects. To address the robustness issues of CNNs, CNN-F adopts an architectural design that resembles human vision mechanisms in certain aspects. Deploying CNN-F can therefore make AI systems more robust. Despite the improved robustness, the current method does not tackle other social and ethical issues intrinsic to a CNN. A CNN can imitate human biases present in image datasets. In automated surveillance, biased training datasets can improperly calibrate CNN-F systems to make incorrect decisions based on race, gender, and age. Furthermore, while robust, human-like computer vision systems can provide a net positive societal impact, there exist potential use cases with nefarious, unethical purposes. More human-like computer vision algorithms, for example, could circumvent human verification software. Motivated by these limitations, we encourage research into human bias in machine learning and security in computer vision algorithms. We also recommend that researchers and policymakers examine how people may abuse CNN models and mitigate their exploitation.

Acknowledgements

We thank Chaowei Xiao, Haotao Wang, Jean Kossaifi, and Francisco Luongo for valuable feedback. Y. Huang is supported by DARPA LwLL grants. J. Gornet is supported by the NIH Predoctoral Training in Quantitative Neuroscience 1T32NS105595-01A1. D. Y. Tsao is supported by the Howard Hughes Medical Institute and the Tianqiao and Chrissy Chen Institute for Neuroscience. A. Anandkumar is supported in part by the Bren endowed chair, DARPA LwLL grants, the Tianqiao and Chrissy Chen Institute for Neuroscience, and Microsoft, Google, and Adobe faculty fellowships.

References

[1] A. Athalye, N. Carlini, and D. Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICLR, 2018.
[2] Y. Bengio. The consciousness prior. arXiv:1709.08568, 2019.
[3] S. Dodge and L. Karam. A study and comparison of human and deep learning recognition performance under visual distortions. In ICCCN, 2017.
[4] M. Eickenberg, A. Gramfort, G. Varoquaux, and B. Thirion. Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage, 2017.
[5] G. Elsayed, S. Shankar, B. Cheung, N. Papernot, A. Kurakin, I. Goodfellow, and J. Sohl-Dickstein. Adversarial examples that fool both computer vision and time-limited humans. In NeurIPS, 2018.
[6] D. J. Felleman and D. C. Van Essen. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex, 1991.
[7] D. George, W. Lehrach, K. Kansky, M. Lázaro-Gredilla, C. Laan, B. Marthi, X. Lou, Z. Meng, Y. Liu, H. Wang, A. Lavin, and D. S. Phoenix. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science, 2017.
[8] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In ICLR, 2015.
[9] A. Goyal, A. Lamb, J. Hoffmann, S. Sodhani, S. Levine, Y. Bengio, and B. Schölkopf. Recurrent independent mechanisms. arXiv:1909.10893, 2019.
[10] T. Horikawa and Y. Kamitani. Hierarchical neural representation of dreamed objects revealed by brain decoding with deep neural network features. Front Comput Neurosci, 2017.
[11] K. Kar, J. Kubilius, K. Schmidt, E. B. Issa, and J. J. DiCarlo. Evidence that recurrent circuits are critical to the ventral stream's execution of core object recognition behavior. Nature Neuroscience, 2019.
[12] T. C. Kietzmann, C. J. Spoerer, L. K. Sörensen, R. M. Cichy, O. Hauk, and N. Kriegeskorte. Recurrence is required to capture the representational dynamics of the human visual system. PNAS, 2019.
[13] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
[14] D. C. Knill and W. Richards. Perception as Bayesian inference. Cambridge University Press, 1996.
[15] P. Kok, J. F. Jehee, and F. P. De Lange. Less is more: expectation sharpens representations in the primary visual cortex. Neuron, 2012.
[16] J. Kubilius, M. Schrimpf, A. Nayebi, D. Bear, D. L. Yamins, and J. J. DiCarlo. CORnet: modeling the neural mechanisms of core object recognition. bioRxiv preprint, 2018.
[17] A. Lamb, J. Binas, A. Goyal, S. Subramanian, I. Mitliagkas, D. Kazakov, Y. Bengio, and M. C. Mozer. State-reification networks: Improving generalization by modeling the distribution of hidden representations. In ICML, 2019.
[18] D.-H. Lee, S. Zhang, A. Fischer, and Y. Bengio. Difference target propagation. In ECML-PKDD, 2015.
[19] D. Linsley, J. Kim, V. Veerabadran, C. Windolf, and T. Serre. Learning long-range spatial dependencies with horizontal gated recurrent units. In NeurIPS, 2018.
[20] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu. Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083, 2017.
[21] D. Meng and H. Chen. MagNet: a two-pronged defense against adversarial examples. In CCS, 2017.
[22] A. Meulemans, F. S. Carzaniga, J. A. Suykens, J. Sacramento, and B. F. Grewe. A theoretical framework for target propagation. arXiv:2006.14331, 2020.
[23] S. Mittal, A. Lamb, A. Goyal, V. Voleti, M. Shanahan, G. Lajoie, M. Mozer, and Y. Bengio. Learning to combine top-down and bottom-up signals in recurrent neural networks with attention over modules. In ICML, 2020.
[24] A. Nayebi, D. Bear, J. Kubilius, K. Kar, S. Ganguli, D. Sussillo, J. J. DiCarlo, and D. L. Yamins. Task-driven convolutional recurrent models of the visual system. In NeurIPS, 2018.
[25] T. Nguyen, N. Ho, A. Patel, A. Anandkumar, M. I. Jordan, and R. G. Baraniuk. A Bayesian perspective of convolutional neural networks through a deconvolutional generative model. arXiv:1811.02657, 2018.
[26] T. Nimmagadda and A. Anandkumar. Multi-object classification and unsupervised scene understanding using deep learning features and latent tree probabilistic models. arXiv:1505.00308, 2015.
[27] F. Piekniewski, P. Laurent, C. Petre, M. Richert, D. Fisher, and T. Hylton. Unsupervised learning from continuous video in a scalable predictive recurrent network. arXiv:1607.06854, 2016.
[28] R. P. N. Rao and D. H. Ballard. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 1999.
[29] P. Samangouei, M. Kabkab, and R. Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. In ICLR, 2018.
[30] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In ICCV, 2017.
[31] J. Sulam, A. Aberdam, A. Beck, and M. Elad. On multi-layer basis pursuit, efficient algorithms and convolutional neural networks. IEEE Trans. PAMI, 2019.
[32] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In ICLR, 2014.
[33] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, 2016.
[34] J. Uesato, B. O'Donoghue, A. v. d. Oord, and P. Kohli. Adversarial risk and the dangers of evaluating against weak attacks. In ICML, 2018.
[35] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv:1607.08022, 2016.
[36] T. Wang, K. Yamaguchi, and V. Ordonez. Feedback-prop: Convolutional neural network inference under partial evidence. In CVPR, 2018.
[37] D. Warde-Farley and Y. Bengio. Improving generative adversarial networks with denoising feature matching. In ICLR, 2017.
[38] H. Wen, K. Han, J. Shi, Y. Zhang, E. Culurciello, and Z. Liu. Deep predictive coding network for object recognition. In ICML, 2018.
[39] S. Zagoruyko and N. Komodakis. Wide residual networks. arXiv:1605.07146, 2016.
[40] A. R. Zamir, T.-L. Wu, L. Sun, W. B. Shen, B. E. Shi, J. Malik, and S. Savarese. Feedback networks. In CVPR, 2017.