Original Paper : https://arxiv.org/abs/1812.04948
Github : https://github.com/NVlabs/stylegan
If you look at the picture on the right side, the direction in which the face is looking (the pose) is the same for all columns, while features like color, beard, and hair change from column to column.
StyleGAN has a different generator architecture than ProGAN. This architecture enables unsupervised separation of high-level attributes (such as pose and identity) from lower-level variation.
It borrows the concept of progressive upsampling from Progressive GANs (ProGANs): the generator starts from a low-resolution image and upsamples it at each layer to yield a high-resolution image.
The major components of StyleGAN are progressive growing, the mapping network, and Adaptive Instance Normalization (AdaIN).
Bilinear upsampling - when you upsample an image, say from 4×4 to 64×64, many of the new pixel positions have no values, and you need to fill them in. There are several methods for doing this, such as nearest-neighbour upsampling, which copies each pixel value to its nearest neighbours. Bilinear upsampling instead linearly interpolates between the nearest neighbours.
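As a minimal sketch of the difference, here is how both modes look in PyTorch (an assumption for illustration; the official repo uses TensorFlow):

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 4, 4)  # a tiny 4x4 RGB "image"

# Nearest neighbour: each new pixel copies its closest source pixel.
nearest = F.interpolate(x, size=(64, 64), mode='nearest')

# Bilinear: each new pixel linearly interpolates its neighbours.
bilinear = F.interpolate(x, size=(64, 64), mode='bilinear',
                         align_corners=False)

print(nearest.shape, bilinear.shape)  # both torch.Size([1, 3, 64, 64])
```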
This is the architecture of the traditional ProGAN, where the latent vector z is fed directly into the network.
But in StyleGAN, a mapping network comes into the picture: the latent vector z is passed through 8 fully connected layers to produce an intermediate vector w, which is then fed into the generator (synthesis) network.
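A minimal sketch of that 8-layer mapping network, again in PyTorch (an assumption; the 512-dimensional width and LeakyReLU activation follow the paper):

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    def __init__(self, dim=512, num_layers=8):
        super().__init__()
        layers = []
        for _ in range(num_layers):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)  # intermediate latent w, also 512-dim

z = torch.randn(4, 512)   # batch of latent codes z
w = MappingNetwork()(z)   # w is what the synthesis network consumes
print(w.shape)            # torch.Size([4, 512])
```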
Per-pixel Gaussian noise is added to the feature maps just before each AdaIN operation, introducing stochasticity into the normalization process.
Dimensions:
The initial input to the synthesis network is a learned constant tensor of size 4×4×512.
Noise:
Gaussian noise is added to the feature maps feeding each AdaIN operation.
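The following sketch ties these pieces together: the learned 4×4×512 constant input, the per-pixel noise, and one AdaIN step. PyTorch and all variable names are illustrative assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class AdaIN(nn.Module):
    def __init__(self, channels, w_dim=512):
        super().__init__()
        # Learned affine map from w to per-channel scale and bias
        # (the style y = (y_s, y_b) in the paper).
        self.affine = nn.Linear(w_dim, channels * 2)

    def forward(self, x, w):
        y = self.affine(w).view(-1, 2, x.shape[1], 1, 1)
        # Normalize each feature map, then apply the style.
        mean = x.mean(dim=(2, 3), keepdim=True)
        std = x.std(dim=(2, 3), keepdim=True) + 1e-8
        return y[:, 0] * (x - mean) / std + y[:, 1]

const = nn.Parameter(torch.ones(1, 512, 4, 4))    # learned constant input
noise_scale = nn.Parameter(torch.zeros(1, 512, 1, 1))

w = torch.randn(1, 512)                            # from the mapping network
x = const + noise_scale * torch.randn(1, 1, 4, 4)  # per-pixel Gaussian noise
x = AdaIN(512)(x, w)
print(x.shape)                                     # torch.Size([1, 512, 4, 4])
```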
• Perceptual path length: this metric measures the weighted difference between the VGG embeddings of two consecutive images when interpolating between two random latent inputs. A smoother latent space yields a lower path length.
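A minimal sketch of the idea; the generator `g` and the VGG-based distance `vgg_dist` are hypothetical placeholders for the paper's actual components (the paper interpolates with slerp in Z and lerp in W):

```python
import numpy as np

def slerp(a, b, t):
    # Spherical interpolation between latent vectors a and b.
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

def ppl(g, vgg_dist, n=100, eps=1e-4):
    dists = []
    for _ in range(n):
        z1, z2 = np.random.randn(512), np.random.randn(512)
        t = np.random.uniform(0, 1)
        # Distance between two images an epsilon apart on the path,
        # scaled by 1/eps^2 as in the paper.
        dists.append(vgg_dist(g(slerp(z1, z2, t)),
                              g(slerp(z1, z2, t + eps))) / eps**2)
    return np.mean(dists)
```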
• Linear separability: this metric looks at how well latent-space points can be separated into two distinct sets by a linear hyperplane, so that each set corresponds to a specific binary attribute of the image. For example, each face image is classified as either male or female.
The authors applied these metrics to both w (the intermediate latent space) and z (the input latent space) and concluded that w is more linearly separable. This also underscores the importance of the 8-layer mapping network.
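A minimal sketch of the separability idea, using a linear SVM's accuracy as a rough proxy (the paper actually scores separability via conditional entropy, and the `latents`/`labels` arrays here are random placeholders for real classifier-labeled data):

```python
import numpy as np
from sklearn.svm import LinearSVC

latents = np.random.randn(1000, 512)    # w (or z) vectors
labels = np.random.randint(0, 2, 1000)  # binary attribute, e.g. male/female

svm = LinearSVC().fit(latents, labels)
# Higher accuracy suggests the attribute is more linearly separable in
# this latent space; the paper finds W more separable than Z.
print("separability proxy (accuracy):", svm.score(latents, labels))
```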