GAN STEERABILITY WITHOUT OPTIMIZATION

Nurit Spingarn Eliezer, Ron Banner, Tomer Michaeli
Technion - Israel Institute of Technology, Habana Labs - Intel
{nurits@campus,tomer.m@ee}.technion.ac.il, ron.banner@intel.com

ABSTRACT

Recent research has shown remarkable success in revealing steering directions in the latent spaces of pre-trained GANs. These directions correspond to semantically meaningful image transformations (e.g., shift, zoom, color manipulations), and have similar interpretable effects across all categories that the GAN can generate. Some methods focus on user-specified transformations, while others discover transformations in an unsupervised manner. However, all existing techniques rely on an optimization procedure to expose those directions, and offer no control over the degree of allowed interaction between different transformations. In this paper, we show that steering trajectories can be computed in closed form directly from the generator's weights, without any form of training or optimization. This applies to user-prescribed geometric transformations, as well as to unsupervised discovery of more complex effects. Our approach allows determining both linear and nonlinear trajectories, and has many advantages over previous methods. In particular, we can control whether one transformation is allowed to come at the expense of another (e.g., zoom-in with or without allowing translation to keep the object centered). Moreover, we can determine the natural end-point of the trajectory, which corresponds to the largest extent to which a transformation can be applied without incurring degradation. Finally, we show how transferring attributes between images can be achieved without optimization, even across different categories.

1 INTRODUCTION

Since their introduction by Goodfellow et al. (2014), generative adversarial networks (GANs) have seen remarkable progress, with current models capable of generating samples of very high quality (Brock et al., 2018; Karras et al., 2019a; 2018; 2019b). In recent years, particular effort has been invested in constructing controllable models, which allow manipulating attributes of the generated images. These range from disentangled models for controlling, e.g., the hair color or gender of facial images (Karras et al., 2019a;b; Choi et al., 2018), to models that even allow specifying object relations (Ashual & Wolf, 2019). Most recently, it has been demonstrated that GANs trained without explicitly enforcing disentanglement can also be easily steered (Jahanian et al., 2020; Plumerault et al., 2020). These methods can determine semantically meaningful linear directions in the latent space of a pre-trained GAN, which correspond to various image transformations, such as zoom, horizontal/vertical shift, in-plane rotation, brightness, redness, blueness, etc. Interestingly, a walk in the revealed directions typically has a similar effect across all object categories that the GAN can generate, from animals to man-made objects. To detect such latent-space directions, the methods of Jahanian et al. (2020) and Plumerault et al. (2020) require a training procedure that limits them to transformations for which synthetic images can be produced for supervision (e.g., shift or zoom). Other works have recently presented unsupervised techniques for exposing meaningful directions (Voynov & Babenko, 2020; Härkönen et al., 2020; Peebles et al., 2020).
These methods can go beyond simple user-specified transformations, but also require optimization or training of some sort (e.g., drawing random samples in latent space). In this paper, we show that for most popular generator architectures, it is possible to determine meaningful latent-space trajectories directly from the generator's weights, without performing any kind of training or optimization. As illustrated in Fig. 1, our approach supports both simple user-defined geometric transformations, such as shift and zoom, and unsupervised exploration of directions that typically reveals more complex controls, like the 3D pose of the camera or the blur of the background. We also discuss how to achieve attribute transfer between images, even across object categories (see Fig. 1), again without any training. We illustrate results mainly on BigGAN, which is class-conditional, but our trajectories are class-agnostic.

Figure 1: Steerability without optimization. We determine meaningful trajectories in the latent space of a pre-trained GAN without using optimization. We accommodate both user-prescribed geometric transformations, and automatic detection of semantic directions. We also achieve attribute transfer without any training. All images were generated with BigGAN (Brock et al., 2018). (Panels shown: zoom, shift X, shift Y; rotate, depth of field, temperature, light, background; color and pose transfer.)

Our approach is advantageous over existing methods in several respects. First, it is 10^4-10^5 times faster. Second, it seems to detect more semantic directions than other methods. And third, it allows explicitly accounting for dataset biases.

First-order dataset biases   As pointed out by Jahanian et al. (2020), dataset biases affect the extent to which a pre-trained generator can accommodate different transformations. For example, if all objects in the training set are centered, then no walk in latent space typically allows shifting an object too much without incurring degradation. This implies that a steering latent-space trajectory should have an end-point. Our nonlinear trajectories indeed possess such convergence points, which correspond to the maximally-transformed versions of the images at the beginning of the trajectories. Conveniently, the end-point can be computed in closed form, so that we can directly jump to the maximally-transformed image without performing a gradual walk.

Second-order dataset biases   Dataset biases can also lead to coupling between transformations. For example, in many datasets zoomed-out objects can appear anywhere within the image, while zoomed-in objects are always centered. In this case, trying to apply a zoom transformation may also result in an undesired shift so as to center the enlarged object. Our unsupervised method allows controlling the extent to which transformation A comes at the expense of transformation B.

1.1 RELATED WORK

Walks in latent space   Many works use walks in a GAN's latent space to achieve various effects (e.g., Shen et al., 2020; Radford et al., 2015; Karras et al., 2018; 2019b; Denton et al., 2019; Xiao et al., 2018; Goetschalckx et al., 2019). The recent works of Jahanian et al. (2020) and Plumerault et al. (2020) specifically focus on determining trajectories which lead to simple user-specified transformations, by employing optimization through the (pre-trained) generator.
Figure 2: User-prescribed spatial manipulations. We calculate directions in latent space whose effect on the tensor at the output of the first layer is similar to applying transformation P on that tensor. This results in the generated image experiencing the same transformation.

Voynov & Babenko (2020) proposed an unsupervised approach for revealing dominant directions in latent space. This technique reveals more complex transformations, such as background blur and background removal, yet it also relies on optimization. Most recently, the work of Härkönen et al. (2020) studied unsupervised discovery of meaningful directions by using PCA on deep features of the generator. The method seeks linear directions in latent space that best map to those deep PCA vectors, and results in a set of non-orthogonal directions. Similarly to the other methods, it also requires a very demanding training procedure (drawing random latent codes and regressing the latent directions), which can take a day for models like BigGAN.

Nonlinear walks in latent space   Linear latent-space trajectories may arrive at regions where the probability density is low. To avoid this, some methods proposed to replace the popular Gaussian latent-space distribution by other priors (Kilcher et al., 2018), or to optimize the generator together with the latent space (Bojanowski et al., 2018). Others suggested using nonlinear walks in latent space that avoid low-probability regions. For example, Jahanian et al. (2020) explored nonlinear trajectories parametrized by two-layer neural networks, while White (2016) proposed spherical paths for interpolating between two latent codes.

Hierarchical GAN architectures   Recently, there has been a tendency towards hierarchical GAN architectures (Karras et al., 2018; 2019a; Brock et al., 2018; Choi et al., 2018), which are capable of producing high-resolution images at very high quality. It is known that the earlier scales in such models are responsible for generating the global composition of the image, while the deeper scales are responsible for more local attributes (Karras et al., 2019a; Yang et al., 2019; Härkönen et al., 2020). Here, we distil this common knowledge and show how meaningful directions can be detected in each level, and how these architectures allow transferring attributes between images.

2 USER-SPECIFIED GEOMETRIC TRANSFORMATIONS

Most modern generator architectures map a latent code vector z ∈ R^d, having no notion of spatial coordinates, into a two-dimensional output image. In some cases (e.g., BigGAN), different parts of z are processed differently. In others (e.g., BigGAN-deep), z is processed as a whole. However, in all cases, the first layer maps z (or part of it) into a tensor with low spatial resolution (e.g., 4×4×1536 in BigGAN-128). This tensor is then processed by a sequence of convolutional layers that gradually increase its spatial resolution (using fractional strides), until reaching the final image dimensions. Our key observation is that since the output of the first layer already has spatial coordinates, this layer has an important role in determining the coarse structure of the generated image. This suggests that if we were to apply a geometric transformation, like zoom or shift, on the output of the first layer, then we would obtain a similar effect to applying it directly on the generated image (Fig. 2).
In fact, it may even allow slight semantic changes to take place due to the deeper layers that follow, which can compensate for the inability of the generator to generate the precise desired transformed image. As we now show, this observation can be used to find latent-space directions corresponding to simple geometric transformations.

2.1 LINEAR TRAJECTORIES

Let us start with linear trajectories. Given a pre-trained generator G and some transformation T, our goal is to find a direction q in latent space such that G(z + q) ≈ T{G(z)} for every z. To this end, we define P to be the matrix corresponding to T in the resolution of the first layer's output. Denoting the weights and biases of the first layer by W and b, respectively, our goal is therefore to bring^1 W(z + q) + b as close as possible to P(Wz + b). To guarantee that this holds on average over random draws of z, we formulate our problem as

$$\min_q \; \mathbb{E}_{z\sim p_z}\left\| D\left[ W(z+q)+b - P(Wz+b) \right] \right\|^2, \qquad (1)$$

where p_z is the probability density function of z, and D is a diagonal matrix that can be used to assign different weights to different elements of the tensors. For example, if P corresponds to a horizontal shift of one element to the right, then we would not like to penalize for differences in the leftmost column of the shifted feature maps (see Fig. 2). In this case, we set the corresponding diagonal elements of D to 0 and the rest to 1. Assuming E[z] = 0, as is the case in most frameworks, the objective in (1) simplifies to

$$\mathbb{E}\left\| D(I-P)Wz \right\|^2 + \left\| D\left[ Wq + (I-P)b \right] \right\|^2, \qquad (2)$$

where I is the identity matrix. The first term in (2) is independent of q, and the second term is quadratic in q and is minimized by

$$q = \left( W^T D^2 W \right)^{-1} W^T D^2 (P - I)\, b. \qquad (3)$$

We have thus obtained a closed-form expression for the optimal linear direction corresponding to transformation P in terms of only the weights W and biases b of the first layer. Figure 2 illustrates this framework in the context of the BigGAN model, in which the feature maps at the output of the first layer are 4×4. For translation, we use a matrix P that shifts the tensor by one element (aiming at translating the output image by one fourth of its size). For zoom-in, we use a matrix P that performs nearest-neighbor 2× up-sampling, and for zoom-out we use sub-sampling by 2×. For each such transformation, we can control the extent of the effect by multiplying the steering vector q by some α > 0.

Figure 1 (top-left) and Fig. 3(a) show example results for zoom and shift with the BigGAN generator. As can be seen, this simple approach manages to produce pronounced effects, although not using optimization through the generator, as in Jahanian et al. (2020). Following Jahanian et al. (2020), we use an object detector to quantify our zoom and shift transformations. Figure 4 shows the distributions of areas and centers of object bounding boxes in the transformed images. As can be seen, our trajectories lead to similar effects to those of Jahanian et al. (2020), despite being 10^4 times faster to compute (see Tab. 1). Please refer to App. A.1 for details about the evaluation, and see additional results with BigGAN and with the DCGAN architecture of Miyato et al. (2018) in App. A.3.

^1 For architectures like BigGAN, in which the first FC layer operates on a subset of the entries of the latent vector, we use z to refer to this subset rather than to the whole vector.
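To make Eq. (3) concrete, the following minimal NumPy sketch (not the authors' code) builds a one-element shift matrix P for the first layer's feature maps, the masking matrix D, and the closed-form direction q. The tensor shapes, the (C, H, W) flattening order and the toy dimensions are illustrative assumptions.

```python
# Minimal sketch of Eqs. (1)-(3): a linear steering direction q in closed form from the
# first layer's weights. Shapes and (C, H, W) flattening order are illustrative assumptions.
import numpy as np

def shift_right_matrix(h, w):
    """(h*w) x (h*w) matrix that shifts a flattened h-by-w map one column to the right."""
    S = np.zeros((h * w, h * w))
    for r in range(h):
        for c in range(1, w):
            S[r * w + c, r * w + (c - 1)] = 1.0  # new column c copies old column c-1
    return S

def shift_mask(h, w):
    """Diagonal of D: zero out the leftmost column, which has no valid source after the shift."""
    d = np.ones((h, w))
    d[:, 0] = 0.0
    return d.reshape(-1)

def linear_direction(W, b, P, d_diag):
    """Eq. (3): q = (W^T D^2 W)^{-1} W^T D^2 (P - I) b."""
    D2 = d_diag ** 2                                   # diagonal of D^2
    A = W.T @ (D2[:, None] * W)                        # W^T D^2 W
    rhs = W.T @ (D2 * ((P - np.eye(P.shape[0])) @ b))  # W^T D^2 (P - I) b
    return np.linalg.solve(A, rhs)

# Toy dimensions standing in for BigGAN's first layer (really 4x4x1536).
C, H, Wd, d = 8, 4, 4, 20
rng = np.random.default_rng(0)
W = rng.standard_normal((C * H * Wd, d))  # first-layer weights: (flattened output) x (latent dim)
b = rng.standard_normal(C * H * Wd)       # first-layer bias

# The same spatial shift is applied to every channel: P = I_C kron S.
S = shift_right_matrix(H, Wd)
P = np.kron(np.eye(C), S)
d_diag = np.tile(shift_mask(H, Wd), C)    # replicate the per-map mask over channels

q = linear_direction(W, b, P, d_diag)
print(q.shape)  # (20,) -- add alpha * q to the latent chunk fed to the first layer
```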
2.2 ACCOUNTING FOR FIRST-ORDER DATASET BIASES VIA NEUMANN TRAJECTORIES

With linear trajectories, the generated image inevitably becomes improbable after many steps, as p_z(z + αq) is necessarily small for large α. This causes the generated image to distort until eventually becoming meaningless after many steps. One way to remedy this is by using nonlinear trajectories that have endpoints. Here, we focus on walks in latent space having the form

$$z_{n+1} = M z_n + q, \qquad (4)$$

for some matrix M and vector q. We coin these Neumann trajectories, since unfolding the iterations leads to a Neumann series. An important feature of such walks is that if the spectral norm of M is strictly smaller than 1 (a condition we find to be satisfied in practice for the optimal M), then they have a convergence point. We use a diagonal M, which we find gives the best results. To determine the optimal M and q for a transformation P, we modify Problem (1) into

$$\min_{M,q} \; \mathbb{E}_{z\sim p_z}\left\| D\left[ W(Mz+q)+b - P(Wz+b) \right] \right\|^2. \qquad (5)$$

Figure 3: Walks corresponding to geometric transformations (zoom, shift X, shift Y). We compare our zoom and shift trajectories to those of the GAN steerability work (Jahanian et al., 2020). For linear paths, the methods are qualitatively similar, whereas for nonlinear walks, our methods are advantageous.

Figure 4: Quantitative comparison with Jahanian et al. (2020). We show the probability densities of object areas and locations (zoom-in, shift X, shift Y) after 2 (top) and 5 (bottom) steps of walks for BigGAN-128. The step size is the same for the linear walks, and matches the size of the first step of the nonlinear walk. Our walks have similar effects to those of Jahanian et al. (2020), with the nonlinear variants achieving lower FID scores after 5 steps, at the cost of only slightly weaker transformation effects.

We assume again that E[z] = 0, and make the additional assumption that E[zz^T] = σ_z^2 I, which is the case in all current GAN frameworks. In this setting, the objective in (5) reduces to

$$\sigma_z^2 \left\| D(WM - PW) \right\|_F^2 + \left\| D\left[ Wq + (I-P)b \right] \right\|^2, \qquad (6)$$

where ‖·‖_F denotes the Frobenius norm. Here, q appears only in the second term, which is identical to the second term of (2). Therefore, the optimal q is as in (3). The matrix M appears only in the first term, which is easily shown to be minimized by setting the diagonal entries of M to

$$M_{i,i} = \frac{w_i^T D^2 P w_i}{w_i^T D^2 w_i}, \qquad (7)$$

where w_i is the ith column of W.

Figure 5: Endpoints. (a) Linear walks eventually lead to deteriorated images (shown here for zoom). (b) Our nonlinear walks converge to meaningful images. The nonlinear trajectories of the GAN steerability method (Jahanian et al., 2020) also converge, but always to the same (unnatural) image for a given class.
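A minimal sketch of the Neumann walk of Eqs. (4)-(7), reusing W, b, P, d_diag and linear_direction() from the previous snippet. The diagonal M below implements Eq. (7); the fixed point of the affine map, (I - M)^{-1} q, is the walk's convergence point.

```python
# Neumann trajectory: z_{n+1} = M z_n + q with diagonal M (Eq. (7)) and q from Eq. (3).
import numpy as np

def neumann_parameters(W, b, P, d_diag):
    D2 = d_diag ** 2
    PW = P @ W
    # Eq. (7): M_ii = (w_i^T D^2 P w_i) / (w_i^T D^2 w_i), column by column.
    num = np.einsum('mi,m,mi->i', W, D2, PW)   # w_i^T D^2 (P w_i) for every column i
    den = np.einsum('mi,m,mi->i', W, D2, W)    # w_i^T D^2 w_i
    M_diag = num / den
    q = linear_direction(W, b, P, d_diag)      # optimal q is the same as in Eq. (3)
    return M_diag, q

def neumann_walk(z0, M_diag, q, n_steps):
    z, path = z0.copy(), [z0.copy()]
    for _ in range(n_steps):
        z = M_diag * z + q                     # Eq. (4) with diagonal M
        path.append(z.copy())
    return path

M_diag, q = neumann_parameters(W, b, P, d_diag)
print(float(np.max(np.abs(M_diag))))           # < 1 in practice, guaranteeing convergence
z0 = np.random.default_rng(1).standard_normal(W.shape[1])
path = neumann_walk(z0, M_diag, q, n_steps=10)
endpoint = q / (1.0 - M_diag)                  # (I - M)^{-1} q for diagonal M
```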
Controlling the step size   As opposed to linear trajectories, refining the step size along our curved trajectories necessitates modifying both M and q. To do so, we can search for a matrix M̃ and vector q̃ with which N steps of the form z_{n+1} = M̃ z_n + q̃ are equivalent to a single step of the walk (4). Noting that the Nth step of the refined walk can be explicitly written as z_N = M̃^N z_0 + (Σ_{k=0}^{N-1} M̃^k) q̃, we conclude that the parameters of this N-times finer walk are

$$\tilde{M} = M^{1/N}, \qquad \tilde{q} = \left( \sum_{k=0}^{N-1} M^{k/N} \right)^{-1} q. \qquad (8)$$

Convergence point   If the spectral norm of M is smaller than 1, then we have that

$$\lim_{n\to\infty} z_n = \lim_{n\to\infty} \left( M^n z_0 + \sum_{k=0}^{n-1} M^k q \right) = (I - M)^{-1} q, \qquad (9)$$

where we used the fact that the first term tends to zero and the second term is a Neumann series. Superficially, this may seem to imply that the endpoint of the trajectory is not a function of the initial point z_0. However, recall that in hierarchical architectures, like BigGAN, z refers to the part of the latent vector that enters the first layer. The rest of the latent vector is not modified throughout the walk. Therefore, the latent vector at the endpoint equals the latent vector of the initial point, except for its subset of entries corresponding to the first hierarchy level, which are replaced by (I - M)^{-1} q.

2.3 ACCOUNTING FOR FIRST-ORDER DATASET BIASES VIA GREAT CIRCLE TRAJECTORIES

In the Neumann walk, the step size decreases along the path (as z_{n+1} - z_n → 0). We now discuss an alternative nonlinear trajectory that has a natural endpoint yet permits a constant step size. Here we avoid low-density regions by explicitly requiring that the likelihood of all images along the path is constant. For z ∼ N(0, I), this translates to the requirement that the whole trajectory lie on the sphere whose radius equals the norm of the original latent code z_0. We stress that the method we discuss here can be applied to any direction q, whether determined in a supervised manner or not. Specifically, suppose we want to steer our latent code towards a normalized direction v = q/‖q‖. Then we can walk along the great circle on the sphere that passes through our initial point z_0 and the point ‖z_0‖v (blue circle in Fig. 6). Mathematically, let V denote the (one-dimensional) subspace spanned by v, and let P_V = vv^T and P_{V⊥} = I - P_V denote the orthogonal projections onto V and V⊥, respectively. Then the great circle trajectory can be expressed as

$$z_n = \|z_0\| \left( u \cos(n\Delta + \theta) + v \sin(n\Delta + \theta) \right), \qquad (10)$$

where Δ is the angular step size, u = P_{V⊥} z_0 / ‖P_{V⊥} z_0‖, and θ = arccos(‖P_{V⊥} z_0‖ / ‖z_0‖) · sign(⟨z_0, v⟩). The effect of this trajectory for a zoom-in direction is shown in Fig. 6 (third row). The natural endpoint of the great-circle path is ‖z_0‖v (blue point), beyond which the contribution of v starts to decrease. As seen in Fig. 6, this endpoint indeed corresponds to a plausible zoomed-in version of the original image.

2.4 COMPARISON

Figure 3(b) compares our nonlinear walks (Neumann and great-circle) with those of the GAN steerability work of Jahanian et al. (2020). As can be seen, the latter tend to involve undesired brightness changes. The advantage of our nonlinear trajectories over the linear ones becomes apparent when performing long walks, as exemplified in Fig. 5. In such settings, the linear trajectories deteriorate, whereas our nonlinear paths have meaningful endpoints. This can also be seen in Fig. 4, which reports the Fréchet Inception Distance (FID) scores achieved by the two approaches. Interestingly, the nonlinear trajectories of the GAN steerability method also have endpoints, but these endpoints are the same for all images of a certain class (and distorted).
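The great-circle walk of Eq. (10) reduces to a few lines of NumPy. The sketch below, written under our own notation and not taken from the authors' code, also shows one way to match the arc length to a desired Euclidean step, in the spirit of Eq. (12) in App. A.1 (that matching rule is an assumption).

```python
# Great-circle walk (Eq. (10)): steer z0 towards a unit direction v on the sphere of radius ||z0||.
import numpy as np

def great_circle_walk(z0, v, n_steps, step_len):
    v = v / np.linalg.norm(v)
    r = np.linalg.norm(z0)
    z_perp = z0 - (z0 @ v) * v                  # component of z0 orthogonal to v
    u = z_perp / np.linalg.norm(z_perp)
    theta = np.arccos(np.linalg.norm(z_perp) / r) * np.sign(z0 @ v)  # starting angle
    delta = step_len / r                        # arc length step_len  <=>  angle step_len / r
    path = [r * (u * np.cos(n * delta + theta) + v * np.sin(n * delta + theta))
            for n in range(n_steps + 1)]
    endpoint = r * v                            # natural endpoint: ||z0|| * v
    return path, endpoint

rng = np.random.default_rng(2)
z_start = rng.standard_normal(20)
v_dir = rng.standard_normal(20)
path, endpoint = great_circle_walk(z_start, v_dir, n_steps=5, step_len=1.0)
print(np.allclose([np.linalg.norm(z) for z in path], np.linalg.norm(z_start)))  # stays on the sphere
```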
3 UNSUPERVISED EXPLORATION OF TRANSFORMATIONS

To go beyond simple user-prescribed geometric transformations, we now discuss exploration of additional manipulations in an unsupervised manner. The key feature of our approach is that by revealing a large set of directions, we can now also account for second-order dataset biases.

3.1 PRINCIPAL LATENT SPACE DIRECTIONS

We start by seeking a set of orthonormal directions (possibly a different set for each generator hierarchy) that lead to the maximal change at the output of the layer to which z is injected. These directions are precisely the right singular vectors of the corresponding weight matrix W, i.e., the kth most significant direction is the kth column of the matrix V in the singular value decomposition W = USV^T (assuming the diagonal entries of S are arranged in decreasing order). This reveals directions corresponding to many geometric, texture, color, and background effects (see Fig. 1).

Table 1: Complexity for BigGAN-deep-512.
Method                    | Memory | Time
Jahanian et al. (2020)    | 0      | 40 min (per direction)
Härkönen et al. (2020)    | 1 GB   | 14 hrs (all)
Voynov & Babenko (2020)   | 0      | 10 hrs (all)
Our principal directions  | 0      | 327 ms (all)

Our approach is seemingly similar to GANSpace (Härkönen et al., 2020), which computes PCA of activations within the network. However, they optimize over latent-space directions that best map to this deep PCA basis. Concretely, they feed-forward random latent codes {z^(j)} to obtain deep-feature representations {y^(j)}, compute the PCA basis A and mean vector µ of these features, and then solve for a steering basis V = arg min_V Σ_j ‖V A^T (y^(j) - µ) - z^(j)‖. Thus, besides computational inefficiency (see Tab. 1), they obtain a set of non-orthogonal latent-space directions (see App. Fig. 48) that correspond to repeated effects (see App. Figs. 42-46). In contrast, our directions are orthogonal by construction, and therefore capture a more diverse set of effects (see App. Figs. 42-46). For example, the semantic dissimilarity between G(z) and G(z + 3v) is 64% larger with our method, as measured by the average LPIPS distance (Zhang et al., 2018) over the first 50 directions (33×10^-3 for GANSpace, 54×10^-3 for ours).
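A minimal sketch of Sec. 3.1: the principal latent-space directions of a given hierarchy are simply the right singular vectors of that layer's weight matrix. The way W would be fetched from a real BigGAN port is an assumption; here we reuse the toy W and z0 from the earlier snippets.

```python
# Principal latent-space directions via a single SVD of the first-layer weight matrix.
import numpy as np

def principal_directions(W):
    """Columns of V in W = U S V^T, ordered by decreasing singular value."""
    _, S, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt.T, S            # V: (latent_dim, n_directions), orthonormal columns

V, S = principal_directions(W)   # W from the earlier snippet: (flattened output) x (latent dim)
v1 = V[:, 0]                     # most significant direction (e.g., zoom in the first hierarchy)
v_ref = V[:, -1]                 # least dominant direction, used as v_ref in Sec. 3.2

# Linear exploration: add the unit-norm direction with a scalar step, as in App. A.2.1.
alpha = 3.0
z_edit = z0 + alpha * v1
print(np.allclose(V.T @ V, np.eye(V.shape[1])))  # directions are orthonormal by construction
```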
Having determined a set of semantic directions, we now want to construct trajectories that exhibit the corresponding effects, but also account for dataset biases. As discussed in Sec. 2 and illustrated in the first two rows of Fig. 6, performing linear walks along these directions eventually leads to distorted images. A more appropriate choice is thus to use the great-circle walk described in Sec. 2.

Figure 6: Orbits in latent space. A linear trajectory (magenta) in the principal direction v corresponding to zoom eventually draws apart from the sphere and results in distorted images. The great circle (blue) that connects z_0 with ‖z_0‖v keeps the image natural all the way, but also allows other transformations (shift in this case). The small circle (green) that only modifies v_ref in addition to v does not induce any other transformation besides zoom (v_ref is the least dominant direction). In particular, it keeps the nose's vertical coordinate fixed (right plots). See also App. Figs. 40-41.

This is illustrated in the third row of Fig. 6. While leading to meaningful endpoints, a limitation of the great circle trajectory is that when walking on the sphere towards v, we actually also modify the projections onto other principal directions. This causes other properties to change besides the desired attribute. For example, in Fig. 6, the great circle causes a shift, centering the dog in addition to the principal zoom effect (see the nose position graphs on the right). This stems from a second-order dataset bias. Indeed, as shown in Fig. 7, BigGAN generates small (zoomed-out) dogs at almost any location within the image, but its generated large (zoomed-in) dogs tend to be centered.

3.2 ACCOUNTING FOR SECOND-ORDER DATASET BIASES VIA SMALL CIRCLE TRAJECTORIES

Using our set of directions to battle second-order biases is non-trivial, as walking on the sphere towards v while keeping the projections onto all other principal directions fixed is impossible (it induces too many constraints). However, we note that if we allow the projection onto only one of the other directions, say v_ref, to change, then it becomes possible to keep the projections onto all other axes fixed. Such a trajectory is in fact a small circle on the sphere, which lies in the affine subspace that contains z_0 and is parallel to V = span{v, v_ref}. Specifically, the small circle walk is given by

$$z_n = P_{V^\perp} z_0 + \|P_{V} z_0\| \left( v_{\mathrm{ref}} \cos(n\Delta + \theta) + v \sin(n\Delta + \theta) \right), \qquad (11)$$

where Δ is the angular step size and θ = arccos(‖P_{V_ref} z_0‖ / ‖P_V z_0‖) · sign(⟨P_V z_0, v⟩), with P_{V_ref} = v_ref v_ref^T. One natural choice for v_ref is the principal direction having the smallest singular value, which corresponds to the weakest effect. As can be seen in the bottom row of Fig. 6, the small circle trajectory with this choice leads to a zoom effect without shift or any other dominant transformation. This is also illustrated in Fig. 7, which shows the distribution of the horizontal translation between the initial point and the endpoint of the trajectory. As can be seen, the small circle walk incurs the smallest shift and yields the best FID, albeit leading to a slightly smaller zoom effect. In App. A.2 we show additional examples, including with different choices of v_ref.

4 ATTRIBUTE TRANSFER

In the previous sections we explicitly computed directions in latent space. An alternative way of achieving a desired effect is to transfer attributes from a different image. As we now show, this can also be achieved without optimization. Specifically, in App. A.2 we show that for BigGAN, principal directions corresponding to different hierarchies control distinctively different attributes. Now, our key observation is that this allows transferring attributes between images, simply by copying from a target image the part of z corresponding to a particular hierarchy (see Fig. 8). For example, to transfer pose, we replace the part corresponding to the first level. As seen in Figs. 1 and 8, this allows transferring pose even across classes.

Figure 7: Accounting for second-order dataset bias. In red is the joint distribution of area and horizontal center of BigGAN-generated Labrador dogs. This plot shows that zoomed-out dogs can appear anywhere, whereas zoomed-in dogs are mostly centered. In blue are the joint distributions of area and horizontal translation (namely, delta shift) achieved by walks in a zoom-in direction, for (b) Steerability, (c) GANSpace, (d) our great circle, and (e) our small circle (the FID values shown in the panels are 12.8, 26.1, 25.2 and 15.5). All walks indeed increase the area, but also undesirably shift the dog. Our methods incur smaller shifts, with the small circle walk incurring negligible shift. From left to right, the mean shifts of the methods are 0.08, 0.10, 0.06 and 0.01. This allows us to achieve lower FIDs, but at the cost of achieving slightly smaller zoom effects (the mean areas are 0.85, 0.83, 0.80 and 0.76).
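A sketch of the small-circle walk of Eq. (11), reusing z0 and the singular-vector matrix V from the snippets above. The starting angle is computed with atan2, which agrees with the arccos/sign expression in Eq. (11) when the v_ref coefficient of z_0 is nonnegative.

```python
# Small-circle walk (Eq. (11)): move towards v while changing only the v_ref coordinate,
# keeping the projections of z0 onto all other principal directions fixed.
import numpy as np

def small_circle_walk(z0, v, v_ref, n_steps, delta):
    # Component of z0 inside the plane V = span{v, v_ref} and its orthogonal complement.
    a, c = z0 @ v, z0 @ v_ref
    z_perp = z0 - a * v - c * v_ref            # P_{V-perp} z0, kept fixed along the walk
    r = np.hypot(a, c)                         # ||P_V z0||, the small-circle radius
    theta = np.arctan2(a, c)                   # starting angle; reproduces z0 at n = 0
    return [z_perp + r * (v_ref * np.cos(n * delta + theta) + v * np.sin(n * delta + theta))
            for n in range(n_steps + 1)]

# Example: zoom direction v = V[:, 0], least-dominant reference v_ref = V[:, -1] (Sec. 3.2).
path = small_circle_walk(z0, V[:, 0], V[:, -1], n_steps=5, delta=0.2)
# Projections onto all principal directions other than v and v_ref are unchanged:
other = V[:, 1:-1]
print(np.allclose(other.T @ path[-1], other.T @ path[0]))
```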
Figure 8: Attribute transfer. In BigGAN-128 the latent code is divided into 6 chunks that are injected into different hierarchy levels. Transferring pose, color or texture can be done by copying specific parts (chunks) of the latent code from the target image.

Within the same class, we can transfer color by copying the elements of hierarchies 4, 5 and 6, and texture by copying hierarchies 3, 4 and 5 (see Appendix for more examples). Note that unlike other works discussing semantic style hierarchies (e.g., Karras et al., 2019a; Yang et al., 2019), our pre-trained BigGAN was not trained to disentangle attributes.

5 CONCLUSION

We presented methods for determining paths in the latent spaces of pre-trained GANs, which correspond to semantically meaningful transformations. Our approach extracts those trajectories directly from the generator's weights, without requiring optimization or training of any sort. Our methods are significantly more efficient than existing techniques, they determine a larger set of distinctive semantic directions, and they are the first to allow explicitly accounting for dataset biases.

Acknowledgements   This research was supported by the Israel Science Foundation (grant 852/17) and by the Technion Ollendorff Minerva Center.

REFERENCES

Oron Ashual and Lior Wolf. Specifying object attributes and relations in interactive scene generation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 4561-4569, 2019.

Piotr Bojanowski, Armand Joulin, David Lopez-Paz, and Arthur Szlam. Optimizing the latent space of generative networks. In Proceedings of the 35th International Conference on Machine Learning, pp. 600-609, 2018.

Andrew Brock, Jeff Donahue, and Karen Simonyan. Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations, 2018.

Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8789-8797, 2018.

Emily Denton, Ben Hutchinson, Margaret Mitchell, and Timnit Gebru. Detecting bias with generative counterfactual face attribute augmentation. arXiv preprint arXiv:1906.06439, 2019.

Lore Goetschalckx, Alex Andonian, Aude Oliva, and Phillip Isola. GANalyze: Toward visual definitions of cognitive image properties. In The IEEE International Conference on Computer Vision (ICCV), October 2019.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672-2680, 2014.

Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, and Sylvain Paris. GANSpace: Discovering interpretable GAN controls. arXiv preprint arXiv:2004.02546, 2020.

Ali Jahanian, Lucy Chai, and Phillip Isola. On the steerability of generative adversarial networks. In International Conference on Learning Representations, 2020.

Tero Karras, Timo Aila, Samuli Laine, and Jaakko Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. In Proc. International Conference on Learning Representations (ICLR), 2018.
Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4401-4410, 2019a.

Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. arXiv preprint arXiv:1912.04958, 2019b.

Yannic Kilcher, Aurelien Lucchi, and Thomas Hofmann. Semantic interpolation in implicit models. In International Conference on Learning Representations, 2018.

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. In International Conference on Learning Representations, 2018.

William Peebles, John Peebles, Jun-Yan Zhu, Alexei A. Efros, and Antonio Torralba. The Hessian penalty: A weak prior for unsupervised disentanglement. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.

Antoine Plumerault, Hervé Le Borgne, and Céline Hudelot. Controlling generative models with continuous factors of variations. In International Conference on Learning Representations, 2020.

Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.

Yujun Shen, Jinjin Gu, Xiaoou Tang, and Bolei Zhou. Interpreting the latent space of GANs for semantic face editing. In CVPR, 2020.

Andrey Voynov and Artem Babenko. Unsupervised discovery of interpretable directions in the GAN latent space. arXiv preprint arXiv:2002.03754, 2020.

Tom White. Sampling generative networks. arXiv preprint arXiv:1609.04468, 2016.

Taihong Xiao, Jiapeng Hong, and Jinwen Ma. ELEGANT: Exchanging latent encodings with GAN for transferring multiple face attributes. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 168-184, 2018.

Ceyuan Yang, Yujun Shen, and Bolei Zhou. Semantic hierarchy emerges in deep generative representations for scene synthesis. arXiv preprint arXiv:1911.09267, 2019.

Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 586-595, 2018.

A GAN STEERABILITY WITHOUT OPTIMIZATION: APPENDIX

A.1 QUANTITATIVE EVALUATION

We adopt the method proposed in Jahanian et al. (2020) and utilize the MobileNet-SSD-V1 detector^2 to estimate object bounding boxes. To quantify shifts, we extract the centers of the bounding boxes along the corresponding axis. To quantify zoom, we use the area of the bounding boxes. In the following paragraphs, we elaborate on all quantitative evaluations reported in the main text.

^2 https://github.com/qfgaohao/pytorch-ssd

Figure 4   Here, we show the probability densities of object areas and locations after 2 (top) and 5 (bottom) steps. Since we use unit-norm direction vectors, the lengths of the linear paths we walk through are 2 and 5 as well. As for the nonlinear path, we choose the first step to have the same length. However, the overall length of the path is different. For example, on average, five steps of the nonlinear trajectory have a total length of 5.95, but reach a point at distance only 4.3 from the initial point. We used 100 randomly chosen classes from the ImageNet dataset, and 30k images from each class. The same images are used for both the FID measurement and for generating the PDF plots.
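The sketch below illustrates the evaluation protocol of App. A.1: detect the object in each generated image and record the bounding-box area (a zoom proxy) and center (a shift proxy). The paper uses a MobileNet-SSD-V1 detector from the pytorch-ssd repository; here torchvision's SSDlite model is used as a stand-in, so treat the specific detector and weights argument as assumptions.

```python
# Quantifying zoom/shift via object bounding boxes (App. A.1), with a stand-in detector.
import torch
import torchvision

detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights="DEFAULT").eval()

@torch.no_grad()
def box_stats(image):
    """image: float tensor (3, H, W) in [0, 1]. Returns (relative area, relative center x, y)."""
    _, H, W = image.shape
    out = detector([image])[0]                 # dict with 'boxes', 'scores', 'labels'
    if len(out["boxes"]) == 0:
        return None
    x1, y1, x2, y2 = out["boxes"][out["scores"].argmax()]  # most confident detection
    area = ((x2 - x1) * (y2 - y1)) / (H * W)
    cx, cy = (x1 + x2) / (2 * W), (y1 + y2) / (2 * H)
    return float(area), float(cx), float(cy)

# Usage: compare statistics before and after a walk, e.g., for a zoom-in direction.
img0 = torch.rand(3, 128, 128)                 # placeholder for G(z0); plug in real samples
imgT = torch.rand(3, 128, 128)                 # placeholder for G(z_N)
print(box_stats(img0), box_stats(imgT))
```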
Figure 6   In order to ensure that each step of the linear walk and of the great-circle and small-circle walks has the same geodesic distance, we set

$$\Delta_L = \Delta_G \,\|z_0\| = \Delta_S \,\|P_{V} z_0\|, \qquad (12)$$

where Δ_L, Δ_G and Δ_S are the step sizes of the linear, great-circle and small-circle walks, respectively, and P_V is the projection onto V = span{v, v_ref} as in Eq. (11). This ensures that the arc length of a step on the circles is the same as the length of a step of the linear walk.

Figure 7   Here, we aim to demonstrate a particular second-order dataset bias. We chose 10 classes which we found to exhibit strong coupling between the size and location of the object (for example, dogs, cats and, in general, animals). We plotted 80 level sets of 2D KDEs computed using the seaborn package. In Fig. 7 we show results for a Labrador retriever dog; we observed similar results for the classes: golden retriever (207), Welsh springer spaniel (218), great grey owl (24), Persian cat (283), plane (726), tiger (292), Old English sheepdog (229), passenger car (705), goose (99), husky (248). See Figs. 40 and 41 for additional results.

Table 1   In Tab. 1, we compare the running time and memory usage of all methods. For Jahanian et al. (2020)^3, we measure the time it takes to learn one direction, which includes the training process. For Härkönen et al. (2020)^4, we measure the total time it takes to extract the directions, including the sample collection, the PCA, and the regression. We noticed that the regression stage was the heaviest. As for our method, we measure the time it takes the CPU to perform SVD. The column Memory specifies the required memory for collecting samples; only GANSpace (Härkönen et al., 2020) requires that stage.

^3 https://github.com/ali-design/
^4 https://github.com/harskish/ganspace/

A.2 UNSUPERVISED EXPLORATION OF PRINCIPAL DIRECTIONS

A.2.1 COMPARISONS WITH RANDOM DIRECTIONS

In Figs. 9 and 10 we explore principal directions via linear walks, using the same initial image (in the middle). In Figs. 14-16 we explore the transformations that arise in each hierarchy of BigGAN-128. Specifically, we compare our linear directions, which are based on SVD, with random directions. We draw 5 different directions from an isotropic Gaussian distribution and normalize them to have unit norm, similarly to our directions. Then, we linearly add them to the initial latent code with a fixed number of steps. We can observe that each random direction induces a different complex effect, which cannot be described by a single semantic property. For example, in the first random direction (R1) we can see rotation, zoom and background changes, while in the third (R3), there is a kind of vertical shift. On the other hand, our principal directions show one prominent transformation for each scale.
That is, we modify the property associated with v without modifying the properties associated with any other principal direction, besides vref (which is the least dominant one). In Figs. 29-33 we show some cases in which the initial generated image is not in the middle of the small circle path and therefore in these cases, we need to take a different number of steps to each side. The endpoints are defined as the points where the cosine in Eq. 11 becomes 0 and 1. We do not have to choose the reference direction vref to be the least dominant one. If we choose it to be a dominant direction, then we may obtain various interesting phenomena, depending on the interaction between the directions v and vref. This is illustrated in Figs. 37-39. Specifically, in Fig. 37 and 38, we perform a walk in the direction corresponding to zoom, while allowing only the vertical shift to change. In this case, the walk manages to center the object so as to achieve a significant zoom effect. In Fig. 39, on the other hand, we perform a walk in the direction corresponding to zoom while allowing only the rotation to change. Here, the zoom effect is less dominant, but we do see a strong rotation effect. A.2.3 SECOND ORDER DATASET BIASES In Figs. 40 and 41 we show more examples for second order dataset biases. Specifically, those figures depict the joint distributions of area and horizontal center shift (top) and area and vertical center shift (bottom) at the end of walks that are supposed to induce only zoom-in. Our small circle walks exhibit the smallest undesired shifts. A.2.4 COMPARISONS WITH GANSPACE In Fig. 42-46, we show visual comparisons with GANSpace (H ark onen et al., 2020). We specifically focus on the first 50 directions founded by each method and show that our linear directions lead to stronger effects for most of the directions. All directions were scaled to have a unit norm and are linearly added or subtracted from the initial latent code with the same step size. In Fig. 48, we show that our direction are orthogonal to each other much more then the directions found by (H ark onen et al., 2020). A.3 USER PRESCRIBED SPATIAL MANIPULATIONS We provide additional examples for walks corresponding to user prescribed geometric transformations. We focus on zoom, vertical shift and horizontal shift, and show both linear and our nonlinear trajectories. A.3.1 COMPARISONS WITH JAHANIAN ET AL. In Figs. 49 we show additional comparisons with Jahanian et al. (2020). A.3.2 ADDITIONAL RESULTS In Figs. 50-53 we show additional zoom trajectories and and in Fig. 54, 55 additional shift trajectories. As can be seen, the linear trajectories often remain more loyal to the original image (at the center) after a small number of steps. However, for a large number of steps, the nonlinear trajectories lead to more plausible images. Published as a conference paper at ICLR 2021 A.3.3 RESULTS ON DCGAN In Figs. 56 - 58, we show results with Res Net based GAN presented in Miyato et al. (2018). That GAN has a FC layer as the first stage, which is all we need in order to perform our spatial manipulations and to extract principal components. Since that architecture is not an hierarchical one, we can manipulate the first layer only. Published as a conference paper at ICLR 2021 Lighting ( Shadowing ) Temperature 1st hierarchy 2nd hierarchy 3rd hierarchy 4th hierarchy 5th hierarchy 6th hierarchy Figure 9: Our explored directions in Big GAN. 
Figure 10: Our explored directions in BigGAN (one panel per hierarchy, 1st-6th; examples include B&W and temperature).

Figure 11: Our vs. random directions. We illustrate the effects of five random directions R1, ..., R5 (normally distributed and scaled to have unit norm) in the first and second scales of BigGAN, in comparison with our principal directions. We can see that each random direction leads to different changes, but it is impossible to associate a single dominant property with each direction. For example, in R5 we can see changes in size, location, and pose. This is while our directions separate those effects into unique paths.

Figure 12: Our vs. random directions. We show the effects of five random directions in the third and fourth scales of BigGAN in comparison with our principal directions.

Figure 13: Our vs. random directions. We show the effects of five random directions in the fifth and sixth scales of BigGAN in comparison with our principal directions.

Figure 14: Our vs. random directions. We illustrate the effects of five random directions R1, ..., R5 (normally distributed and scaled to have unit norm) in the first and second scales of BigGAN, in comparison with our principal directions. We can see that each random direction leads to different changes, but it is impossible to associate a single dominant property with each direction. For example, in R5 we can see changes in size, location, and pose. This is while our directions separate those effects into unique paths.

Figure 15: Our vs. random directions. We show the effects of five random directions in the third and fourth scales of BigGAN in comparison with our principal directions.

Figure 16: Our vs. random directions. We show the effects of five random directions in the fifth and sixth scales of BigGAN in comparison with our principal directions.

Figure 17: Our vs. random directions. We illustrate the effects of five random directions R1, ..., R5 (normally distributed and scaled to have unit norm) in the first and second scales of BigGAN, in comparison with our principal directions. We can see that each random direction leads to different changes, but it is impossible to associate a single dominant property with each direction. For example, in R5 we can see changes in size, location, and pose. This is while our directions separate those effects into unique paths.

Figure 18: Our vs. random directions. We show the effects of five random directions in the third and fourth scales of BigGAN in comparison with our principal directions.

Figure 19: Our vs. random directions. We show the effects of five random directions in the fifth and sixth scales of BigGAN in comparison with our principal directions.
Figure 20: 1st principal direction of the first scale in BigGAN.

Figure 21: 2nd principal direction of the first scale in BigGAN.

Figure 22: 3rd principal direction of the first scale in BigGAN.

Figure 23: 1st principal direction of the third scale in BigGAN (background manipulation / background clutter).

Figure 24: First principal direction of the fourth scale in BigGAN (background removal).

Figure 25: Second principal direction of the fourth scale in BigGAN.

Figure 26: Third principal direction of the fourth scale in BigGAN (lighting / shadowing).

Figure 27: First principal direction of the fifth scale in BigGAN.

Figure 28: First three principal directions of the sixth scale in BigGAN (temperature: yellow, red, green).

Figure 29: First principal direction of scale 1 in BigGAN (small circle walks).

Figure 30: Second principal direction of scale 1 in BigGAN (small circle walks).

Figure 31: Third principal direction of scale 1 in BigGAN (small circle walks).

Figure 32: Third principal direction of scale 4 in BigGAN (small circle walks).

Figure 33: First principal direction of scale 4 in BigGAN (small circle walks). When walking enough steps in the linear direction, a total background removal is observed (see Fig. 15). However, it might come with a slight change of object colors. Therefore, we will not always see it within the small circle framework (see the last image in this block in comparison to the other three).

Figure 34: Second principal direction of scale 4 in BigGAN (small circle walks).

Figure 35: Chosen principal direction of scale 3 in BigGAN (small circle walks). When the initial generated image is not at the middle of the path, we need to take a different number of steps to each side.

Figure 36: Chosen principal direction of scale 1 in BigGAN (small circle walks). When the initial generated image is not at the middle of the path, we need to take a different number of steps to each side.

Figure 37: Modifying the second principal direction of scale 1 at the expense of the first principal direction of that scale in BigGAN (small circle walks).

Figure 38: Modifying the second principal direction of scale 1 at the expense of the first principal direction of that scale in BigGAN (small circle walks).
Figure 39: Modifying the third principal direction of scale 1 at the expense of the first principal direction of that scale in BigGAN (small circle walks).

Figure 40: Second-order dataset biases. We explore the coupling between zoom and horizontal translation (top) and zoom and vertical translation (bottom) for the Persian cat class in BigGAN-deep, for Jahanian et al. (2020), GANSpace, our great circle and our small circle. It can be clearly observed that the small circle path exhibits the smallest undesired shifts when increasing the area.

Figure 41: Second-order dataset biases. We explore the coupling between zoom and horizontal translation (top) and zoom and vertical translation (bottom) for the husky class in BigGAN-deep, for Jahanian et al. (2020), GANSpace, our great circle and our small circle. It can be clearly observed that the small circle path exhibits the smallest undesired shifts when increasing the area.

Figure 42: Comparison with GANSpace and random directions in BigGAN-deep (principal vectors 0-25). The image at the center of each block is the original image. We linearly added the vectors with equal steps. Both sets of directions are normalized to have unit norm. We can see that our trajectories induce a stronger change than those of Härkönen et al. (2020). The averaged LPIPS variance is 0.036 and 0.059 for Härkönen et al. (2020) and our method, respectively.

Figure 43: Comparison with GANSpace and random directions in BigGAN-deep (principal vectors 25-50). The image at the center of each block is the original image. We linearly added the vectors with equal steps. Both sets of directions are normalized to have unit norm. It can be observed that our trajectories induce a stronger change than those of Härkönen et al. (2020). The averaged LPIPS variance is 0.03 and 0.049 for Härkönen et al. (2020) and our method, respectively.

Figure 44: Comparison with GANSpace in BigGAN-deep (principal directions 0-25). The image at the center of each block is the original image. We linearly added the vectors with equal steps. Both sets of directions are normalized to have unit norm. It can be observed that our trajectories induce a stronger change than those of Härkönen et al. (2020).

Figure 45: Comparison with GANSpace in BigGAN-deep (principal directions 25-50).

Figure 46: Comparison with GANSpace in BigGAN-deep (principal directions 25-50).

Figure 47: Comparison with Härkönen et al. (2020). An example of the "show horizon" direction, which we apply to edit only layers 1-5 (Härkönen et al., 2020) in BigGAN-deep-512. We can see that our linear directions achieve similar effects to those of GANSpace (blurring the background). However, in both cases, we can also see slight changes in the object size and pose.
On the other hand, when using our small circle walk, we keep the same size and pose.

Figure 48: Evaluating orthogonality. We show the absolute value of the correlation between every two directions among the first 80 directions in (a) GANSpace and (b) our method, for BigGAN-deep-512.

Figure 49: User-prescribed transformations (zoom, shift X, shift Y) with BigGAN: linear and nonlinear trajectories, ours vs. Jahanian et al. (2020).

Figure 50: User-prescribed zoom with BigGAN. Our method, linear vs. non-linear trajectories.

Figure 51: User-prescribed zoom with BigGAN. Our method, linear vs. non-linear trajectories.

Figure 52: User-prescribed zoom with BigGAN. Our method, linear vs. non-linear trajectories.

Figure 53: User-prescribed zoom with BigGAN. Our method, linear vs. non-linear trajectories.

Figure 54: User-prescribed vertical shift with BigGAN. Our method, linear vs. non-linear trajectories.

Figure 55: User-prescribed vertical shift with BigGAN. Our method, linear vs. non-linear trajectories.

Figure 56: Zoom transformation with DCGAN.

Figure 57: Shift X transformation with DCGAN.

Figure 58: Shift Y transformation with DCGAN.

A.4 ATTRIBUTE TRANSFER

We next provide more attribute transfer examples. Figures 59 and 60 show pose transfer examples, which are obtained by swapping the part of the latent vector corresponding to scale 1. Figure 61 depicts texture transfer examples, which correspond to swapping the parts of the latent vector, and the class, corresponding to scales 3, 4 and 5.

Figure 59: Pose transfer by swapping scale 1 of the latent vector.

Figure 60: Pose transfer by swapping scale 1 of the latent vector.

Figure 61: Texture transfer by swapping scales 3, 4 and 5 of the latent vector.
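A hedged sketch of the chunk-swapping attribute transfer of Sec. 4 and App. A.4 for BigGAN-128, whose latent code is split into 6 chunks feeding different hierarchy levels. The 120-dimensional latent code, the chunk size of 20, and the generator call shown in the final comment (in the style of the pytorch-pretrained-biggan package) are assumptions to adapt to the model in use.

```python
# Attribute transfer by copying latent-code chunks from a target image (Sec. 4 / App. A.4).
import numpy as np

CHUNK = 20  # assumed chunk size: BigGAN-128 latent code split into 6 chunks of 20 dims

def swap_chunks(z_source, z_target, levels):
    """Copy the chunks listed in `levels` (1-based hierarchy indices) from z_target into z_source."""
    z = z_source.copy()
    for lvl in levels:
        lo, hi = (lvl - 1) * CHUNK, lvl * CHUNK
        z[lo:hi] = z_target[lo:hi]
    return z

rng = np.random.default_rng(3)
z_src, z_tgt = rng.standard_normal(120), rng.standard_normal(120)

z_pose    = swap_chunks(z_src, z_tgt, levels=[1])        # pose transfer: first hierarchy level
z_color   = swap_chunks(z_src, z_tgt, levels=[4, 5, 6])  # color transfer (same class)
z_texture = swap_chunks(z_src, z_tgt, levels=[3, 4, 5])  # texture transfer (optionally swap class too)

# images = generator(torch.from_numpy(z_pose)[None].float(), class_vector, truncation=1.0)
```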