
Face Generation using DCGANs

Generate realistic faces using Deep Convolutional GANs
Created on November 27|Last edited on December 17

Introduction to DCGANs

GANs are a framework for teaching a Deep Learning model to capture the training data’s distribution so we can generate new data from that same distribution.
A DCGAN (Deep Convolutional Generative Adversarial Network) is a direct extension of the GAN that explicitly uses convolutional layers in the discriminator and convolutional-transpose layers in the generator. It was first described by Radford et al. in the paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
The discriminator is made up of strided convolution layers, batch norm layers, and LeakyReLU activations. Its input is a 3x64x64 image and its output is a scalar probability that the input came from the real data distribution.
The generator is composed of convolutional-transpose layers, batch norm layers, and ReLU activations. Its input is a latent vector, z, drawn from a standard normal distribution, and its output is a 3x64x64 RGB image. The strided conv-transpose layers transform the latent vector into a volume with the same shape as an image.
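The spatial sizes quoted above follow from the standard output-size formulas for strided and transposed convolutions. The sketch below is only a sanity check in plain Python, assuming square inputs, no dilation, and no output padding. The helper names are illustrative, and the (kernel, stride, padding) triples are the ones used in the model code later in this report.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a strided convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# (kernel, stride, padding) for each layer, in order
GENERATOR_LAYERS = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]
DISCRIMINATOR_LAYERS = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]


def trace(size, layers, size_fn):
    """Return the spatial size before the first layer and after every layer."""
    sizes = [size]
    for kernel, stride, padding in layers:
        size = size_fn(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


generator_sizes = trace(1, GENERATOR_LAYERS, conv_transpose_out)
discriminator_sizes = trace(64, DISCRIMINATOR_LAYERS, conv_out)
print(generator_sizes)      # [1, 4, 8, 16, 32, 64]
print(discriminator_sizes)  # [64, 32, 16, 8, 4, 1]
```

The generator doubles the spatial size at every stride-2 layer until it reaches 64x64, and the discriminator mirrors that path back down to a single value per image.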
Reference - Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
The authors highlighted a couple of guidelines, in particular:
1. Replacing any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
2. Using batchnorm in both the generator and the discriminator.
3. Removing fully connected hidden layers for deeper architectures.
4. Using ReLU activation in generator for all layers except for the output, which uses tanh.
5. Using LeakyReLU activation in the discriminator for all layers.
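To make guidelines 4 and 5 concrete, here is a small plain-Python sketch of the three activations. The function names are illustrative, and the negative slope of 0.2 is taken from the discriminator code later in this report.

```python
import math


def relu(x):
    """ReLU: negative inputs are clipped to zero."""
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.2):
    """LeakyReLU: negative inputs are scaled down instead of clipped."""
    return x if x >= 0 else negative_slope * x


samples = [-2.0, -0.5, 0.0, 0.5, 2.0]
relu_values = [relu(x) for x in samples]
leaky_values = [leaky_relu(x) for x in samples]
tanh_values = [math.tanh(x) for x in samples]

print(relu_values)   # negatives become 0.0
print(leaky_values)  # negatives keep a small, non-zero value
print(tanh_values)   # every value lies strictly between -1 and 1
```

LeakyReLU still passes a scaled-down signal for negative inputs, while tanh bounds the generator output to (-1, 1).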



Code

Download the Dataset

We begin by downloading and extracting the dataset.
!wget https://www.dropbox.com/s/rbajpdlh7efkdo1/male_female_face_images.zip
!unzip -q male_female_face_images.zip



Import the Packages

!pip install -q --upgrade torch_snippets

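from torch_snippets import *
# Explicit imports for the names used below, so the code does not
# depend on what torch_snippets happens to re-export
import torch
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
from PIL import Image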
import torchvision
from torchvision import transforms
import torchvision.utils as vutils
import cv2
import numpy as np
import pandas as pd
import glob
from tqdm import tqdm
import matplotlib.pyplot as plt

# Wandb Login
import wandb
wandb.login()



Setup the Configuration

# wandb config
WANDB_CONFIG = {
    '_wandb_kernel': 'neuracort'
}

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"



Crop Images to Obtain Faces

From the downloaded dataset, we need only the faces of the people, so we crop the images using OpenCV and discard everything else in the image. To detect the faces, we use the Haar cascade classifier that ships with OpenCV.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
Then we create a new folder and add the cropped images there.
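!mkdir cropped_faces
images = Glob('/content/females/*.jpg') + Glob('/content/males/*.jpg')

for i in range(len(images)):
    # torch_snippets' read(path, 1) returns the image in RGB channel order
    img = read(images[i], 1)
    gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
    # Detect faces (scaleFactor=1.3, minNeighbors=5)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    # Crop faces; images with no detection are skipped. If several
    # faces are found, the last one is the one that stays on disk.
    for (x, y, w, h) in faces:
        img2 = img[y:(y+h), x:(x+w), :]
        # cv2.imwrite expects BGR, so convert back from RGB before saving
        cv2.imwrite('cropped_faces/'+str(i)+'.jpg', cv2.cvtColor(img2, cv2.COLOR_RGB2BGR))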
Cropping matters because it keeps only the faces, i.e., only the information that we want the model to learn to generate.
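When the detector returns several boxes for one photo, one simple policy is to keep the largest one. The sketch below illustrates that policy and the box-to-slice arithmetic on a toy nested-list "image". The selection rule and the helper names `largest_box` and `crop` are assumptions for illustration, not something the code above does.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if there are none."""
    if len(boxes) == 0:
        return None
    return max(boxes, key=lambda box: box[2] * box[3])


def crop(image_rows, box):
    """Crop a row-major image (list of rows) to the given (x, y, w, h) box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image_rows[y:y + h]]


# Toy 4x6 "image" whose pixel value encodes its position as row * 10 + column
toy_image = [[row * 10 + col for col in range(6)] for row in range(4)]
toy_boxes = [(0, 0, 2, 2), (2, 1, 3, 2)]  # hypothetical detections

best = largest_box(toy_boxes)
face = crop(toy_image, best)
print(best)  # (2, 1, 3, 2)
print(face)  # [[12, 13, 14], [22, 23, 24]]
```

Note that rows are indexed by y and columns by x, which is why the slice order is `[y:y+h]` first and `[x:x+w]` second.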



Apply Transformations

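We resize and center-crop each image to 64x64, convert it to a tensor, and normalize every channel from [0, 1] to [-1, 1], which matches the range of the generator's tanh output.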
transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
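The effect of `Normalize` with a mean and standard deviation of 0.5 can be checked by hand: it maps pixel values from [0, 1] to [-1, 1]. The plain-Python sketch below shows the per-pixel arithmetic and its inverse, which is handy when displaying generated images. The function names are illustrative.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-channel normalization: (pixel - mean) / std."""
    return (pixel - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, from [-1, 1] back to [0, 1]."""
    return value * std + mean


print(normalize(0.0))               # -1.0
print(normalize(0.5))               # 0.0
print(normalize(1.0))               # 1.0
print(denormalize(normalize(0.25)))  # 0.25
```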



Define the Dataset Class and DataLoader

class Faces(Dataset):
    def __init__(self, folder):
        super().__init__()
        self.folder = folder
        self.images = sorted(Glob(folder))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, ix):
        image_path = self.images[ix]
        image = Image.open(image_path)
        image = transform(image)
        return image

# Create the dataset object ds
ds = Faces(folder='cropped_faces/')

# Create the dataloader
dataloader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=2)
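Because the dataset size is rarely a multiple of 64, the last batch of an epoch is usually smaller, which is why the training code later sizes its noise and label tensors with `len(real_data)` rather than a fixed 64. The sketch below mimics that batching arithmetic in plain Python. The helper name and the dataset size of 200 are made up for illustration.

```python
def batch_sizes(num_samples, batch_size, drop_last=False):
    """Sizes of the batches produced in one epoch over num_samples items."""
    full_batches, remainder = divmod(num_samples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


sizes = batch_sizes(200, 64)
print(sizes)                                # [64, 64, 64, 8]
print(batch_sizes(200, 64, drop_last=True))  # [64, 64, 64]
```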



Weight Initialization

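We initialize the weights so that they have a small spread: convolutional weights are drawn from a normal distribution with mean 0 and standard deviation 0.02, and batch norm scales from a normal distribution with mean 1 and standard deviation 0.02, with the batch norm biases set to 0: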
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
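To get a feel for how narrow these distributions are, the sketch below draws samples with the same means and standard deviations as `weights_init`, using only the Python standard library. It is an illustration of the statistics, not of how PyTorch samples internally. The sample count and seed are arbitrary.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, only for repeatability

# Same parameters as weights_init: conv weights ~ N(0, 0.02), batch norm scales ~ N(1, 0.02)
conv_weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
bn_scales = [random.gauss(1.0, 0.02) for _ in range(10_000)]

conv_mean = statistics.fmean(conv_weights)
conv_std = statistics.pstdev(conv_weights)
bn_mean = statistics.fmean(bn_scales)

print(f"conv weights: mean={conv_mean:.4f}, std={conv_std:.4f}")
print(f"batch norm scales: mean={bn_mean:.4f}")
print(f"largest conv weight magnitude: {max(abs(w) for w in conv_weights):.4f}")
```

Almost every sampled convolutional weight lies within a few hundredths of zero, so the networks start close to a near-linear regime.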



Discriminator

Next, we define the Discriminator model class, which takes a batch of images of shape batch size x 3 x 64 x 64 and predicts whether each image is real or fake.
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*2, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*4, 64*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )
        self.apply(weights_init)

    def forward(self, input):
        return self.model(input)

discriminator = Discriminator().to(device)



Generator

Now, we define the Generator model class, which generates fake images from an input of shape batch size x 100 x 1 x 1.
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.ConvTranspose2d(100, 64*8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(64*8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*8, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*4, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*2, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
            nn.Tanh()
        )
        self.apply(weights_init)

    def forward(self, input):
        return self.model(input)

generator = Generator().to(device)
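As a rough size check, the number of learnable parameters in the generator can be counted by hand from the layer definitions above. The sketch below does this in plain Python, assuming bias-free 4x4 (transposed) convolutions and a batch norm layer (scale plus shift per channel) after every layer except the last. The helper names are illustrative.

```python
def conv_params(in_channels, out_channels, kernel_size=4):
    """Weights of a bias-free (transposed) convolution with a square kernel."""
    return in_channels * out_channels * kernel_size * kernel_size


def batchnorm_params(channels):
    """Learnable scale and shift, one of each per channel."""
    return 2 * channels


def generator_param_count(channels=(100, 512, 256, 128, 64, 3)):
    """Total learnable parameters for the generator channel progression."""
    layer_pairs = list(zip(channels[:-1], channels[1:]))
    total = 0
    for index, (c_in, c_out) in enumerate(layer_pairs):
        total += conv_params(c_in, c_out)
        # Every layer except the output layer is followed by batch norm
        if index < len(layer_pairs) - 1:
            total += batchnorm_params(c_out)
    return total


generator_total = generator_param_count()
print(generator_total)  # 3576704
```

With the models built, the same number should be reproducible in PyTorch with `sum(p.numel() for p in generator.parameters())`.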



Training Step and Objects

Now that the Generator and Discriminator models have been defined, we define the functions for training them.
def discriminator_train_step(real_data, fake_data):
    d_optimizer.zero_grad()
    # Real images should be classified as 1
    prediction_real = discriminator(real_data)
    error_real = loss(prediction_real.squeeze(), torch.ones(len(real_data)).to(device))
    error_real.backward()
    # Generated images should be classified as 0
    prediction_fake = discriminator(fake_data)
    error_fake = loss(prediction_fake.squeeze(), torch.zeros(len(fake_data)).to(device))
    error_fake.backward()
    d_optimizer.step()
    return error_real + error_fake

def generator_train_step(fake_data):
    g_optimizer.zero_grad()
    prediction = discriminator(fake_data)
    # The generator wants its fakes to be labelled real (1). The target is
    # sized from fake_data instead of relying on a global real_data.
    error = loss(prediction.squeeze(), torch.ones(len(fake_data)).to(device))
    error.backward()
    g_optimizer.step()
    return error
Note that .squeeze() is applied to the prediction because the model output has a shape of batch size x 1 x 1 x 1, while the target tensor it is compared against has a shape of batch size.
Next, we create the generator and discriminator model objects, their optimizers, and the loss function to be optimized.
discriminator = Discriminator().to(device)
generator = Generator().to(device)
loss = nn.BCELoss()
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
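For intuition about the loss values logged during training, the sketch below re-implements mean binary cross-entropy in plain Python and evaluates it for a discriminator that outputs 0.5 for every image, i.e. one that cannot tell real from fake. It is a simplified stand-in for nn.BCELoss (it does not guard against predictions of exactly 0 or 1), and the helper name `bce` is illustrative.

```python
import math


def bce(predictions, targets):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    losses = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for p, t in zip(predictions, targets)
    ]
    return sum(losses) / len(losses)


undecided = [0.5, 0.5, 0.5, 0.5]  # discriminator output for four images

# Mirrors discriminator_train_step: real images -> 1, fake images -> 0
d_loss_value = bce(undecided, [1, 1, 1, 1]) + bce(undecided, [0, 0, 0, 0])
# Mirrors generator_train_step: fake images scored against the label 1
g_loss_value = bce(undecided, [1, 1, 1, 1])

print(round(d_loss_value, 4))  # 1.3863, i.e. 2 * ln(2)
print(round(g_loss_value, 4))  # 0.6931, i.e. ln(2)
```

These two values are useful reference points when reading the loss curves: a d_loss near 1.39 together with a g_loss near 0.69 corresponds to a discriminator that is guessing.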



Training

We now train the model for 25 epochs. To store the training logs in wandb we first initialize it with a project name and the configuration.
Then we use wandb.log() to log the desired parameters.
# Initialize W&B
run = wandb.init(project='W&B_Generate_Faces_using_DCGAN',
                 config=WANDB_CONFIG)

# Model Training
for epoch in tqdm(range(25), total=25):
    print()
    print("Epoch: ", epoch)
    N = len(dataloader)
    for i, images in enumerate(dataloader):
        # Load real data and generate fake data by
        # passing noise through the generator network
        real_data = images.to(device)
        fake_data = generator(torch.randn(len(real_data), 100, 1, 1).to(device)).to(device)
        # Detach so the discriminator step does not backpropagate into the generator
        fake_data = fake_data.detach()

        # Train the discriminator
        d_loss = discriminator_train_step(real_data, fake_data)
        # Generate a new set of images from fresh noise and train
        # the generator
        fake_data = generator(torch.randn(len(real_data), 100, 1, 1).to(device)).to(device)
        g_loss = generator_train_step(fake_data)
        # Log the losses to wandb
        wandb.log(
            {
                'd_loss': d_loss.item(),
                'g_loss': g_loss.item()
            }
        )


You can see how the generator and discriminator losses decrease over time.

Inference

We now use the trained model to generate a sample of images and save it as well.
generator.eval()
noise = torch.randn(64, 100, 1, 1, device=device)
sample_images = generator(noise).detach().cpu()
grid = vutils.make_grid(sample_images, nrow=8, normalize=True)
img = grid.cpu().detach().permute(1,2,0)

# If you want to plot the image
plt.figure(figsize=(10,10))
plt.axis("off")
plt.imshow(img)
plt.savefig("dcgan_predictions.jpg")
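The shape of the grid produced by `make_grid` can be worked out by hand, which helps when choosing a figure size. The sketch below reproduces that arithmetic in plain Python, assuming torchvision's default padding of 2 pixels around every tile. The helper name `grid_shape` is illustrative.

```python
import math


def grid_shape(num_images, channels, height, width, nrow=8, padding=2):
    """Shape (C, H, W) of an image grid with `nrow` images per row."""
    columns = min(nrow, num_images)
    rows = math.ceil(num_images / columns)
    grid_height = rows * (height + padding) + padding
    grid_width = columns * (width + padding) + padding
    return (channels, grid_height, grid_width)


chw = grid_shape(64, 3, 64, 64, nrow=8)
# permute(1, 2, 0) moves channels last, which is the layout plt.imshow expects
hwc = (chw[1], chw[2], chw[0])

print(chw)  # (3, 530, 530)
print(hwc)  # (530, 530, 3)
```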
In addition to plotting the images, you can store them using wandb Tables.
table = wandb.Table(columns=['Image'], allow_mixed_types = True)

table.add_data(
    wandb.Image("/content/dcgan_predictions.jpg"),
)

wandb.log({"Generated Images by DCGAN" : table})

# Reusable helper that logs a saved image to a wandb Table
def save_table(table_name, image_path="/content/dcgan_predictions.jpg"):
    table = wandb.Table(columns=['Image'], allow_mixed_types=True)

    table.add_data(
        wandb.Image(image_path),
    )

    wandb.log({table_name: table})

save_table("Generated Images by DCGAN")


Thus, we learnt how to generate human faces from noise using DCGANs. There is a caveat, though: although we can generate images of faces, we cannot specify which kind of face we want to generate. In the next post, we will learn about Conditional GANs, which help us achieve this.



Colab Notebook

You can try it out yourself using this colab notebook!



References