
The Geometry of Deep Generative Image Models

Interpretable GANs using Hessian EigenDecomposition

Contents

  1. TLDR
  2. The Metric Tensor
  3. Properties of the Hessian matrix
  4. Top eigenvectors capture significant image changes
  5. GAN Latent Spaces are highly anisotropic
  6. Conclusion
FigA. How a generator works. Image created by Sayantan Das

1. TLDR

This paper aims to make GAN inversion and interpretability more tractable. Through the lens of differential geometry, the authors show that the metric tensor of the latent space has useful interpretability properties, by realising it as the Hessian matrix of a squared image distance (LPIPS in this case). Experiments show that perturbing latent codes along eigenvectors of this Hessian gives controllable generation in image space. Moreover, the eigenvalue ranking is meaningful: eigenvectors with larger eigenvalues produce larger image changes.

2. The Metric Tensor

2.1 Establishing Riemannian Geometry

Because a generator provides a smooth map onto image space, one relevant conceptual model for GAN latent space is a Riemannian manifold.
To define a Riemannian geometry, we need a smooth map and a notion of distance on it, given by the metric tensor. For image applications, the relevant notion of distance lives in image space rather than in code space, so we pull the image-space distance function back onto the latent space. Differentiating this pulled-back distance function twice gives a differential-geometric structure (a Riemannian metric) on the image manifold.
A GAN generator $G(z)$, with $z \in \mathbb{R}^n$, parameterizes a submanifold of the image space. Thus, given a distance function on image space $D: \mathbb{I} \times \mathbb{I} \rightarrow \mathbb{R}_+$, $(I_1, I_2) \mapsto L$, we can define the distance between two latent codes as the distance between the images they generate, i.e. pull the distance function back to latent space through $G$:

$$d: \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}_+, \qquad d(z_1, z_2) = D(G(z_1), G(z_2))$$

2.2 Enter Hessian

According to Palais (1957), the Hessian matrix (the matrix of second-order partial derivatives) of the squared distance $d^2$ can be seen as the metric tensor of the image manifold.
Consider $d^2$ with $z_0$ fixed, as a function of $z$: $f_{z_0}(z) = d^2(z_0, z)$. Clearly $z = z_0$ is a local minimum of $f_{z_0}(z)$, so $f_{z_0}(z)$ can be locally approximated by a positive semi-definite quadratic form $H(z_0)$.
This squared vector norm approximates the squared image distance:

$$d^2(z_0, z_0 + \delta z) \approx \lVert \delta z \rVert^2_H = \delta z^T H(z_0)\, \delta z$$
In this paper the Hessian matrix plays the role of the metric tensor, so this report will use the two terms interchangeably.

3. Properties of the Hessian matrix

We call $\alpha_H(v) = \frac{v^T H v}{v^T v}$ the speed of image change along $v$, as measured by the metric $H$.

$$d^2(z_0, z) \approx \delta z^T \left. \frac{\partial^2 d^2(z_0, z)}{\partial z^2} \right|_{z_0} \delta z, \qquad H(z_0) := \left. \frac{\partial^2 d^2(z_0, z)}{\partial z^2} \right|_{z_0}$$


3.1 Numerical Method

The Learned Perceptual Image Patch Similarity (LPIPS) metric is used as the image-space distance because it is twice differentiable, which is required to compute $H$.
We compute the Hessian by building a computational graph for the gradient $g(z) = \partial_z d^2|_{z=z_0}$ and then backpropagating through each element of $g(z)$. This method computes $H$ column by column, so its time cost scales with the latent-space dimensionality $n$ times the backpropagation time through this graph.
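A minimal sketch of this column-by-column computation via double backprop, reusing the stand-in `G` and LPIPS `D` from the earlier snippet (LPIPS is used directly as the squared distance $d^2$ here, a simplification). PyTorch's `torch.autograd.functional.hessian` offers an equivalent convenience routine.

```python
import torch

def hessian_backward(G, D, z0):
    """H(z0): Hessian of the image distance w.r.t. z, computed column by column."""
    z0 = z0.detach()
    ref = G(z0).detach()                      # reference image G(z0)
    z = z0.clone().requires_grad_(True)
    dist = D(G(z), ref).sum()                 # image distance, treated as d^2
    g, = torch.autograd.grad(dist, z, create_graph=True)   # gradient g(z)
    g = g.flatten()
    n = g.numel()
    H = torch.zeros(n, n)
    for i in range(n):                        # one extra backward pass per column
        col, = torch.autograd.grad(g[i], z, retain_graph=True)
        H[i] = col.flatten()
    return 0.5 * (H + H.T)                    # symmetrize against numerical noise
```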
The Hessian is a linear operator, and it is fully defined as long as one can compute Hessian-vector products (HVPs).
Since the gradient with respect to $z$ commutes with the inner product with $v$, the HVP can be rewritten either as the gradient of $v^T g$ or as the directional derivative of the gradient, $v^T \partial_z g$.
The first form, $\partial_z(v^T g)$, is easy to compute with reverse-mode autodiff, while the directional derivative is easy to compute with forward-mode autodiff or finite differences. Lanczos iterations are then applied to the HVP operator, defined in either of these two ways, to solve for the largest eigenpairs, from which an approximate Hessian matrix can be reconstructed.
$$\mathrm{HVP}: v \rightarrow Hv = \partial_z\big(v^T g(z)\big) = v^T \partial_z g(z) \approx \frac{g(z + \epsilon v) - g(z - \epsilon v)}{2 \lVert \epsilon v \rVert}$$

From the appendix,

$$\mathrm{HVP}_{backward}: v \rightarrow \partial_z\big(v^T g(z)\big), \qquad \mathrm{HVP}_{forward}: v \rightarrow \frac{g(z + \epsilon v) - g(z - \epsilon v)}{2 \lVert \epsilon v \rVert}$$
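The two HVP operators and the Lanczos step might look roughly as follows. This is a sketch under the same stand-in `G` and `D` as above, with SciPy's `eigsh` supplying the Lanczos iterations; the function names (`hvp_backward`, `grad_at`, `hvp_forward`, `top_eigenpairs`) are introduced here for illustration and are not the authors' implementation.

```python
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

def hvp_backward(G, D, z0, v):
    """Hv = d_z(v^T g(z)): one extra reverse-mode pass through the gradient graph."""
    ref = G(z0).detach()
    z = z0.detach().clone().requires_grad_(True)
    g, = torch.autograd.grad(D(G(z), ref).sum(), z, create_graph=True)
    hv, = torch.autograd.grad((g.flatten() * v).sum(), z)
    return hv.flatten()

def grad_at(G, D, z0, z):
    """g(z) = d_z D(G(z), G(z0)) as a flat, detached tensor."""
    ref = G(z0).detach()
    z = z.detach().clone().requires_grad_(True)
    g, = torch.autograd.grad(D(G(z), ref).sum(), z)
    return g.flatten()

def hvp_forward(G, D, z0, v, eps=1e-2):
    """Hv ~ (g(z0 + eps*v) - g(z0 - eps*v)) / (2 ||eps*v||), central finite difference."""
    dz = (eps * v).view_as(z0)
    return (grad_at(G, D, z0, z0 + dz) - grad_at(G, D, z0, z0 - dz)) / (2 * eps * v.norm())

def top_eigenpairs(G, D, z0, k=40):
    """Largest-k eigenpairs of H(z0) via Lanczos (scipy eigsh) on the HVP operator."""
    n = z0.numel()
    def matvec(v_np):
        v = torch.as_tensor(v_np, dtype=torch.float32).flatten()
        return hvp_forward(G, D, z0, v).numpy()
    return eigsh(LinearOperator((n, n), matvec=matvec), k=k, which="LA")
```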


3.2 Connection to Jacobian

Let $\phi(z): \mathbb{R}^N \rightarrow \mathbb{R}^M$ map the latent code to the feature map of an intermediate layer of the generator, and define $d_\phi^2(z_1, z_2) = \frac{1}{2} \lVert \phi(z_1) - \phi(z_2) \rVert^2_2$; this again defines a manifold whose metric tensor $H_\phi$ is the Hessian of $d^2_\phi$.
Note that there is a simple relationship between this Hessian $H_\phi$ and the Jacobian $J_\phi$ of $\phi$:

$$H_\phi(z_0) = \left. \frac{\partial^2}{\partial z^2} \frac{1}{2} \lVert \phi(z_0) - \phi(z) \rVert^2_2 \right|_{z_0} = J_\phi(z_0)^T J_\phi(z_0)$$

$$v^T H_\phi(z_0) v = \lVert J_\phi(z_0) v \rVert^2, \qquad J_\phi(z_0) = \left. \partial_z \phi(z) \right|_{z_0}$$

From this we see that the eigenspectrum of the Hessian $H_\phi$ is the square of the singular-value spectrum of the Jacobian $J_\phi$, and the eigenvectors of $H_\phi$ are the right singular vectors of $J_\phi$.
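A quick numerical check of this relationship, using a small random smooth map as a stand-in for an intermediate-layer feature map $\phi$ (the map and dimensions below are illustrative assumptions):

```python
import torch
from torch.autograd.functional import jacobian, hessian

N, M = 8, 32
W = torch.randn(M, N)

def phi(z):                         # stand-in for an intermediate-layer feature map
    return torch.tanh(W @ z)

z0 = torch.randn(N)

def f(z):                           # d_phi^2(z0, z) = 1/2 ||phi(z0) - phi(z)||^2
    return 0.5 * (phi(z0) - phi(z)).pow(2).sum()

H_phi = hessian(f, z0)              # metric tensor at z0
J_phi = jacobian(phi, z0)           # M x N Jacobian at z0
print(torch.allclose(H_phi, J_phi.T @ J_phi, atol=1e-4))   # True: H_phi = J^T J

# Hence eig(H_phi) are the squared singular values of J_phi, and the eigenvectors
# of H_phi coincide with the right singular vectors of J_phi.
```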

4. Top eigenvectors capture significant image changes

Steps

  • Pick $z_0$ randomly.
  • Compute $H(z_0)$.
  • Eigendecompose $H(z_0) = \sum_i \lambda_i v_i v_i^T$.
  • Explore the image space along $G(z_0 + \mu_i v_i)$ (a sketch of these steps follows the list).
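A sketch of these steps, reusing `G`, `D`, and `hessian_backward` from the Section 3.1 snippets. The step sizes and the ranks inspected are arbitrary choices for illustration, not the paper's exact settings.

```python
import torch

z0 = torch.randn(1, 128)                     # step 1: random latent code
H = hessian_backward(G, D, z0)               # step 2: metric tensor at z0 (Section 3.1 sketch)
eigvals, eigvecs = torch.linalg.eigh(H)      # step 3: H = sum_i lambda_i v_i v_i^T
order = eigvals.argsort(descending=True)     # rank eigenvectors by eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

steps = torch.linspace(-4.0, 4.0, 9)         # step 4: explore G(z0 + mu * v_i)
for rank in (0, 10, 100):                    # a top, a middle, and a low-rank direction
    v = eigvecs[:, rank].view(1, -1)
    with torch.no_grad():
        frames = torch.cat([G(z0 + mu * v) for mu in steps])
    # save or plot `frames` as one row per rank to compare how fast the image changes
```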

Observation

  1. The larger the eigenvalue, the larger the image change along its eigenvector, both by visual inspection and as measured by LPIPS.
  2. Eigenvectors at different ranks encode different types of changes.
Extracted from the paper. Images change at different rates along top vs bottom eigenvectors

5. GAN Latent Spaces are highly anisotropic

FigB. Method inspired by Kornblith et al. (2019) to gauge the global geometry of the latent space.

Steps

  1. Randomly sample 100-1000 $z$ in the latent space and compute $H(z)$ using backprop.
  2. Perform EigenDecomposition.

Observation

Only a small fraction of dimensions is responsible for large image changes, as seen from the sudden dip in the eigenvalue spectra. Below are spectra (eigenvalue versus dimension) for layers of a DCGAN generator.

[W&B run set: eigenvalue spectra versus dimension for DCGAN generator layers]
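For readers reproducing such plots outside W&B, here is a minimal sketch that draws eigenvalue spectra at a few random latent points on a log scale; it reuses the LPIPS Hessian sketch from Section 3.1 rather than per-layer feature Hessians, so it only approximates the panels above.

```python
import torch
import matplotlib.pyplot as plt

for _ in range(5):                                    # a handful of random latent points
    H = hessian_backward(G, D, torch.randn(1, 128))   # Section 3.1 sketch
    eigvals = torch.linalg.eigvalsh(H).flip(0)        # sort descending
    plt.semilogy(eigvals.clamp_min(1e-12).numpy())
plt.xlabel("eigenvalue rank")
plt.ylabel("eigenvalue (log scale)")
plt.show()
```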

The authors measure the speed of image change along a vector $v$ as $\alpha_H(v) = \frac{v^T H v}{v^T v}$, and use it to further strengthen the anisotropy argument: they show analytically (see Appendix A.6) that the variance of $\alpha_H(v)$ over random directions $v$ is smaller than the variance among the eigenvalues.
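A small sketch of that comparison, reusing the Hessian routine from Section 3.1: the spread of $\alpha_H(v)$ over random directions is contrasted with the spread of the eigenvalues themselves.

```python
import torch

H = hessian_backward(G, D, torch.randn(1, 128))   # metric tensor at a random z (Section 3.1 sketch)

def alpha(H, v):
    # speed of image change along v, as measured by the metric H
    return (v @ H @ v) / (v @ v)

eigvals = torch.linalg.eigvalsh(H)                # spread across eigen-directions
rand_alphas = torch.stack([alpha(H, v) for v in torch.randn(1000, H.shape[0])])
print("std across eigenvalues:      ", eigvals.std().item())
print("std across random directions:", rand_alphas.std().item())
```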

5.B Comments on the metric tensor's global geometry

$H(z)$, the metric tensor, captures local geometry. To inspect the global consistency of the metric tensor, the authors use the Pearson correlation coefficient in the following way:
- At position $z_i$ we compute $H_i$; its eigendecomposition yields eigenvectors $U_i = [u_1, ..., u_n]$.
- We compute $\Lambda_{ij}$, the action of the metric tensor $H_j$ at position $z_j$ on the eigenvectors $U_i$, i.e. $\Lambda_{ij} = u_i^T H_j u_i$ (one value per eigenvector).
- Similarly, $\Lambda_j$ collects the corresponding terms at position $z_j$ itself, i.e. the eigenvalues of $H_j$.
- We compute the Pearson correlation coefficient $\mathrm{corr}(\Lambda_{ij}, \Lambda_j)$, which measures the consistency of the metric tensor between the two positions (see the sketch below).
Because the spectrum usually spans several orders of magnitude, the authors compute the correlation on a log scale.
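A sketch of this consistency measure for a single pair of positions, again reusing the Hessian routine from Section 3.1; `metric_consistency` is a name introduced here for illustration.

```python
import torch

def metric_consistency(H_i, H_j, eps=1e-12):
    """Pearson correlation (log scale) between H_j evaluated on H_i's eigenvectors
    and H_j's own eigenvalues."""
    _, U_i = torch.linalg.eigh(H_i)                          # eigenvectors of H_i (columns)
    lam_j, _ = torch.linalg.eigh(H_j)                        # eigenvalues of H_j
    lam_ij = torch.einsum("kn,nm,mk->k", U_i.T, H_j, U_i)    # u_k^T H_j u_k for each k
    x = torch.log10(lam_ij.clamp_min(eps))
    y = torch.log10(lam_j.clamp_min(eps))
    return torch.corrcoef(torch.stack([x, y]))[0, 1]

# z_i, z_j = torch.randn(1, 128), torch.randn(1, 128)
# r = metric_consistency(hessian_backward(G, D, z_i), hessian_backward(G, D, z_j))
```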


[W&B run: lilac-firefly-2]


Takeaway

This shows that the local directions that induce image changes of different orders of magnitude are highly consistent at different points in the latent space. Because of this, the notion of a "global" Hessian makes sense.
As latent space gets warped and mapped into image space, directions in latent space are scaled differently by the Jacobian of the map. Picture by Sayantan Das


6. Conclusion

In this work, the authors developed an efficient and architecture-agnostic way to compute the geometry of the manifold learnt by generative networks. The method discovers the axes accounting for the largest variation in image transformation, which frequently correspond to semantically interpretable changes.
This geometric method can thus facilitate image manipulation, increase explainability, and accelerate optimization on the manifold (with or without gradients).


Note

I suggest that readers looking for a more in-depth treatment of this paper thoroughly check out Section 5 of the paper and the remainder of the Appendix.
Readers can check out the remainder of the wandb runs that are not included in this report here, and feel free to drop questions in the comment box below.