Face Generation using Conditional GANs
Generate realistic human faces using Conditional GANs
Introduction to Conditional GANs
In the previous report on Face Generation using DCGANs, we learned how to generate human faces from random noise using DCGANs. A problem with that approach, however, is that we cannot specify the particular class of image we want to generate. This can be achieved using a Conditional GAN (CGAN).
The concept of CGANs was introduced in the paper Conditional Generative Adversarial Nets. At the time, generative adversarial nets had only recently been introduced as a novel way to train generative models. The authors extended the concept and introduced a conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and the discriminator. As a result, the model can learn a multi-modal mapping from inputs to outputs by being fed different contextual information.
The figure below illustrates a simple conditional adversarial net.

Reference: Conditional Generative Adversarial Nets
Generative adversarial nets can be extended to a conditional model if both the generator and the discriminator are conditioned on some extra information y, where y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into both the discriminator and the generator as an additional input layer.
In the generator, the prior input noise $p_z(z)$ and y are combined in a joint hidden representation, and the adversarial training framework allows for considerable flexibility in how this hidden representation is composed.
In the discriminator, x and y are presented as inputs to a discriminative function (embodied again by an MLP in this case).
The objective function of the two-player minimax game would be

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]$$
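To make the conditioning concrete, the sketch below (an illustrative toy example, not the model trained later in this report) shows an MLP-based conditional generator and discriminator in which a one-hot label y is simply concatenated with the noise z and with the data x, respectively:

import torch
import torch.nn as nn

NOISE_DIM, N_CLASSES, DATA_DIM = 100, 2, 784  # illustrative sizes, not the ones used later

# Generator: the condition y is concatenated with the noise z
toy_generator = nn.Sequential(
    nn.Linear(NOISE_DIM + N_CLASSES, 256), nn.ReLU(),
    nn.Linear(256, DATA_DIM), nn.Tanh()
)

# Discriminator: the condition y is concatenated with the data x
toy_discriminator = nn.Sequential(
    nn.Linear(DATA_DIM + N_CLASSES, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid()
)

z = torch.randn(8, NOISE_DIM)
y = nn.functional.one_hot(torch.randint(0, N_CLASSES, (8,)), N_CLASSES).float()
fake_x = toy_generator(torch.cat([z, y], dim=1))          # (8, DATA_DIM)
score = toy_discriminator(torch.cat([fake_x, y], dim=1))  # (8, 1), probability that (x, y) is real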
Code
Download the Dataset
!wget https://www.dropbox.com/s/rbajpdlh7efkdo1/male_female_face_images.zip
!unzip -q male_female_face_images.zip
Import the Packages
!pip install -q --upgrade wandb
!pip install -q --upgrade torch_snippets

from torch_snippets import *
import torch
from torchvision.utils import make_grid
from PIL import Image
import torchvision
from torchvision import transforms
import torchvision.utils as vutils
from tqdm import tqdm

# Wandb Login
import wandb
wandb.login()
Setup the Configuration
# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# wandb config
WANDB_CONFIG = {'_wandb_kernel': 'neuracort'}
Crop Images to Obtain Faces
First, we store the male and female image paths.
female_images = Glob('/content/females/*.jpg')
male_images = Glob('/content/males/*.jpg')
Then we crop the images to retain only the faces and discard additional details in each image. We do this using the Haar cascade classifier provided by OpenCV.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
We create two new folders, one corresponding to male and another for female images and dump all the cropped face images into the respective folders:
!mkdir cropped_faces_female
!mkdir cropped_faces_male

# Detect, crop, and save the female faces
for i in range(len(female_images)):
    img = read(female_images[i], 1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        img2 = img[y:(y+h), x:(x+w), :]
        cv2.imwrite('cropped_faces_female/'+str(i)+'.jpg', cv2.cvtColor(img2, cv2.COLOR_RGB2BGR))

# Detect, crop, and save the male faces
for i in range(len(male_images)):
    img = read(male_images[i], 1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        img2 = img[y:(y+h), x:(x+w), :]
        cv2.imwrite('cropped_faces_male/'+str(i)+'.jpg', cv2.cvtColor(img2, cv2.COLOR_RGB2BGR))
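As a quick optional check, we can count how many cropped faces were actually written out (the exact counts depend on the detector):

print(len(Glob('cropped_faces_female/*.jpg')), len(Glob('cropped_faces_male/*.jpg')))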
Apply Transformations
Specify the transformations to apply to each image:
transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
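As a quick sanity check (assuming the cropping step above produced at least one female face), applying the transform to a single image should yield a 3x64x64 tensor with values roughly in [-1, 1]:

sample_path = Glob('cropped_faces_female/*.jpg')[0]  # illustrative sample; any cropped face works
sample = transform(Image.open(sample_path))
print(sample.shape, sample.min().item(), sample.max().item())  # torch.Size([3, 64, 64]), values near -1..1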
Define the Dataset Class and Dataloader
We create the Faces dataset class that returns the image and the corresponding gender of the person in it.
class Faces(Dataset):
    def __init__(self, folders):
        super().__init__()
        self.folderfemale = folders[0]
        self.foldermale = folders[1]
        self.images = sorted(Glob(self.folderfemale)) + sorted(Glob(self.foldermale))
    def __len__(self):
        return len(self.images)
    def __getitem__(self, ix):
        image_path = self.images[ix]
        image = Image.open(image_path)
        image = transform(image)
        gender = np.where('female' in str(image_path), 1, 0)
        return image, torch.tensor(gender).long()
Define the ds dataset and dataloader.
ds = Faces(folders=['cropped_faces_female', 'cropped_faces_male'])
dataloader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=8)
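An optional check that the dataloader yields batches of the expected shape:

images, genders = next(iter(dataloader))
print(images.shape)   # torch.Size([64, 3, 64, 64])
print(genders.shape)  # torch.Size([64]); 1 = female, 0 = male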
Weight Initialization
We define the weight initialization so that we do not have a wide variation across the randomly initialized weight values.
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
Discriminator Model
class Discriminator(nn.Module):
    def __init__(self, emb_size=32):
        super(Discriminator, self).__init__()
        self.emb_size = emb_size
        self.label_embeddings = nn.Embedding(2, self.emb_size)
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*2, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*4, 64*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*8, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Flatten()
        )
        self.model2 = nn.Sequential(
            nn.Linear(288, 100),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(100, 1),
            nn.Sigmoid()
        )
        self.apply(weights_init)
    def forward(self, input, labels):
        x = self.model(input)
        y = self.label_embeddings(labels)
        input = torch.cat([x, y], 1)
        final_output = self.model2(input)
        return final_output

discriminator = Discriminator().to(device)
One important thing to note in this model class is the additional parameter emb_size, which is present in Conditional GANs but not in DCGANs.
emb_size represents the size of the embedding into which we convert the input class label; the embedding layer is stored as label_embeddings.
The reason we convert the input class label from a one-hot encoded version into a higher-dimensional embedding is that the model then has a higher degree of freedom to learn and adjust to the different classes.
While the model class remains, to a large extent, the same as the one we saw in the last blog on DCGANs, we initialize another model, model2, that performs the classification exercise.
In the forward method, we fetch the output of the first model (self.model(input)) and the output of passing labels through label_embeddings, and then concatenate the two. Next, we pass the concatenated output through the second model, self.model2, defined earlier, which gives us the discriminator output.
Another thing to note is that self.model2 takes an input of 288 values: the output of self.model has 256 values per data point, which is concatenated with the 32 embedding values of the input class label, resulting in 256 + 32 = 288 input values to self.model2.
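We can verify this shape arithmetic with an optional check on a dummy batch (the tensor names below are illustrative):

dummy_images = torch.randn(4, 3, 64, 64).to(device)
dummy_labels = torch.randint(0, 2, (4,)).to(device)
features = discriminator.model(dummy_images)                # (4, 256): conv stack output after Flatten
embeddings = discriminator.label_embeddings(dummy_labels)   # (4, 32): label embeddings
print(torch.cat([features, embeddings], 1).shape)           # torch.Size([4, 288]), the input to self.model2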
Generator Model
class Generator(nn.Module):
    def __init__(self, emb_size=32):
        super(Generator, self).__init__()
        self.emb_size = emb_size
        self.label_embeddings = nn.Embedding(2, self.emb_size)
        self.model = nn.Sequential(
            nn.ConvTranspose2d(100+self.emb_size, 64*8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(64*8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*8, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*4, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*2, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
            nn.Tanh()
        )
        self.apply(weights_init)
    def forward(self, input_noise, labels):
        label_embeddings = self.label_embeddings(labels).view(len(labels), self.emb_size, 1, 1)
        input = torch.cat([input_noise, label_embeddings], 1)
        return self.model(input)

generator = Generator().to(device)
Note that we use nn.Embedding to convert the input class label (one of two classes) into a self.emb_size-dimensional (32) vector.
nn.ConvTranspose2d is used to upscale the feature maps until we obtain an image as output.
The forward method takes the noise values input_noise and the input labels as inputs, concatenates the label embeddings with the noise, and generates the output image.
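A similar optional check confirms that the generator maps a batch of (100, 1, 1) noise vectors plus labels to 3x64x64 images:

dummy_noise = torch.randn(4, 100, 1, 1).to(device)
dummy_labels = torch.randint(0, 2, (4,)).to(device)
print(generator(dummy_noise, dummy_labels).shape)  # torch.Size([4, 3, 64, 64])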
Training Step and Objects
Define a function noise to generate random noise with 100 values per sample and register it to the device.
def noise(size):
    n = torch.randn(size, 100, 1, 1, device=device)
    return n.to(device)
Next, we define the training function for the discriminator.
def discriminator_train_step(real_data, real_labels, fake_data, fake_labels):
    d_optimizer.zero_grad()
    prediction_real = discriminator(real_data, real_labels)
    error_real = loss(prediction_real, torch.ones(len(real_data), 1).to(device))
    error_real.backward()
    prediction_fake = discriminator(fake_data, fake_labels)
    error_fake = loss(prediction_fake, torch.zeros(len(fake_data), 1).to(device))
    error_fake.backward()
    d_optimizer.step()
    return error_real + error_fake
Then we define the training function for the generator.
def generator_train_step(fake_data, fake_labels):
    g_optimizer.zero_grad()
    prediction = discriminator(fake_data, fake_labels)
    error = loss(prediction, torch.ones(len(fake_data), 1).to(device))
    error.backward()
    g_optimizer.step()
    return error
Additionally, we define the generator and discriminator model objects, the optimizers, and the loss function.
discriminator = Discriminator().to(device)
generator = Generator().to(device)
loss = nn.BCELoss()
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# fixed_noise will be used to generate images from random noise
fixed_noise = torch.randn(64, 100, 1, 1, device=device)
# Half labels correspond to class 0 and remaining to class 1
fixed_fake_labels = torch.LongTensor([0]*(len(fixed_noise)//2) + [1]*(len(fixed_noise)//2)).to(device)

n_epochs = 25
img_list = []
Training and Inference
We will log the discriminator and generator loss using wandb.
# Initialize W&B
run = wandb.init(project='W&B_Generate_Faces_using_ConditionalGAN', config=WANDB_CONFIG)

# Train the model for 25 epochs
for epoch in tqdm(range(n_epochs), total=n_epochs):
    N = len(dataloader)
    for bx, (images, labels) in enumerate(dataloader):
        # Obtain the data
        real_data, real_labels = images.to(device), labels.to(device)
        fake_labels = torch.LongTensor(np.random.randint(0, 2, len(real_data))).to(device)
        fake_data = generator(noise(len(real_data)), fake_labels)
        fake_data = fake_data.detach()

        # Train discriminator
        d_loss = discriminator_train_step(real_data, real_labels, fake_data, fake_labels)

        # Train generator
        fake_labels = torch.LongTensor(np.random.randint(0, 2, len(real_data))).to(device)
        fake_data = generator(noise(len(real_data)), fake_labels).to(device)
        g_loss = generator_train_step(fake_data, fake_labels)

        # Log to wandb
        wandb.log({'d_loss': d_loss.detach(), 'g_loss': g_loss.detach()})

    # Inference
    with torch.no_grad():
        fake = generator(fixed_noise, fixed_fake_labels).detach().cpu()
        imgs = vutils.make_grid(fake, padding=2, normalize=True).permute(1, 2, 0)
        img_list.append(imgs)
        show(imgs, sz=10)
After training is complete, we store the final grid image in a wandb Table, as defined below.
table_generated = wandb.Table(columns=['Image'], allow_mixed_types=True)
table_generated.add_data(wandb.Image("/content/conditional_gan_predictions.jpg"))
wandb.log({"Generated Images by Conditional GAN": table_generated})
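Note that /content/conditional_gan_predictions.jpg is assumed to have been saved by the notebook before this step; a minimal sketch of how the last grid in img_list could be written to that path:

import numpy as np
from PIL import Image

# Assumption: img_list[-1] is an HWC tensor in [0, 1] (make_grid with normalize=True, then permuted)
final_grid = (img_list[-1].numpy() * 255).astype(np.uint8)
Image.fromarray(final_grid).save('/content/conditional_gan_predictions.jpg')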
In this image, we can see that the first 32 images correspond to the male class and the next 32 images to the female class, which confirms that the conditional GAN performs as expected!
Colab Notebook
References