Face Generation using DCGANs
Generate realistic faces using Deep Convolutional GANs
Introduction to DCGANs
GANs are a framework for teaching a Deep Learning model to capture the training data’s distribution so we can generate new data from that same distribution.
A DCGAN (Deep Convolutional Generative Adversarial Network) is a direct extension of the GAN, except that it explicitly uses convolutional and convolutional-transpose layers in the discriminator and generator, respectively. It was first described by Radford et al. in the paper Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.
The discriminator is made up of strided convolution layers, batch norm layers, and LeakyReLU activations. The input is a 3x64x64 image and the output is a scalar probability that the input came from the real data distribution.
The generator is comprised of convolutional-transpose layers, batch norm layers, and ReLU activations. The input is a latent vector, z, that is drawn from a standard normal distribution and the output is a 3x64x64 RGB image. The strided conv-transpose layers allow the latent vector to be transformed into a volume with the same shape as an image.
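A quick way to see how the latent vector grows into a 64x64 image is to apply the transposed-convolution output-size formula to the layer settings used later in the generator (kernel 4, stride 2, padding 1, with a first layer of stride 1 and no padding). This is just an illustrative sketch, not part of the training code:

# Output size of a transposed convolution (dilation=1, no output_padding):
#   out = (in - 1) * stride - 2 * padding + kernel_size
def convT_out(size, kernel=4, stride=2, padding=1):
    return (size - 1) * stride - 2 * padding + kernel

size = 1                                            # 1x1 latent "image"
size = convT_out(size, kernel=4, stride=1, padding=0)  # 1 -> 4
for _ in range(4):                                  # 4 -> 8 -> 16 -> 32 -> 64
    size = convT_out(size)
print(size)  # 64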

Reference - Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
The authors highlighted several architectural guidelines, in particular (a minimal PyTorch sketch follows the list):
1. Replacing any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
2. Using batchnorm in both the generator and the discriminator.
3. Removing fully connected hidden layers for deeper architectures.
4. Using ReLU activation in generator for all layers except for the output, which uses tanh.
5. Using LeakyReLU activation in the discriminator for all layers.
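As a rough illustration of how guidelines 1, 2, and 5 translate into PyTorch, a single downsampling block of the discriminator could look like the sketch below; the full models used in this report are defined in the Code section.

import torch.nn as nn

# One downsampling block: a strided convolution instead of pooling (guideline 1),
# followed by batch norm (guideline 2) and LeakyReLU (guideline 5).
disc_block = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1, bias=False),
    nn.BatchNorm2d(128),
    nn.LeakyReLU(0.2, inplace=True),
)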
Code
Download the Dataset
We begin by downloading and extracting the dataset.
!wget https://www.dropbox.com/s/rbajpdlh7efkdo1/male_female_face_images.zip
!unzip -q male_female_face_images.zip
Import the Packages
!pip install -q --upgrade torch_snippets
from torch_snippets import *
import torchvision
from torchvision import transforms
import torchvision.utils as vutils
import cv2
import numpy as np
import pandas as pd
import glob
from tqdm import tqdm
import matplotlib.pyplot as plt

# Wandb Login
import wandb
wandb.login()
Setup the Configuration
# wandb config
WANDB_CONFIG = {'_wandb_kernel': 'neuracort'}

# Set device
device = "cuda" if torch.cuda.is_available() else "cpu"
Crop Images to Obtain Faces
From the downloaded dataset, we require only the faces of the people, so we crop the images using OpenCV and discard the rest of each image. To detect the faces, we use OpenCV's Haar cascade classifier.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
Then we create a new folder and add the cropped images there.
!mkdir cropped_faces
images = Glob('/content/females/*.jpg') + Glob('/content/males/*.jpg')
for i in range(len(images)):
    img = read(images[i], 1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Detect faces
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    # Crop faces
    for (x, y, w, h) in faces:
        img2 = img[y:(y+h), x:(x+w), :]
        cv2.imwrite('cropped_faces/' + str(i) + '.jpg', cv2.cvtColor(img2, cv2.COLOR_RGB2BGR))
Cropping is important because it keeps only the faces, i.e., we retain only the information that we actually want the network to learn to generate.
Apply Transformations
Apply the required transformations to the images: resize and center-crop to 64x64, convert to tensors, and normalize each channel with mean 0.5 and standard deviation 0.5, which maps pixel values to the range [-1, 1] and matches the Tanh output of the generator.
transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
Define the Dataset Class and DataLoader
class Faces(Dataset):
    def __init__(self, folder):
        super().__init__()
        self.folder = folder
        self.images = sorted(Glob(folder))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, ix):
        image_path = self.images[ix]
        image = Image.open(image_path)
        image = transform(image)
        return image

# Create the dataset object ds
ds = Faces(folder='cropped_faces/')

# Define the dataloader
dataloader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=2)
Weight Initialization
We define the weight initialization so that the weights have a small spread: convolutional weights are drawn from a normal distribution with mean 0 and standard deviation 0.02, and batch-norm scales from a normal distribution with mean 1 and standard deviation 0.02:
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
Discriminator
Next, we define the Discriminator model class, which takes an image of shape batch size x 3 x 64 x 64 and predicts whether it is real or fake.
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*2, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*4, 64*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )
        self.apply(weights_init)

    def forward(self, input):
        return self.model(input)

discriminator = Discriminator().to(device)
Generator
Now, we define the Generator model class, which generates fake images from an input of shape batch size x 100 x 1 x 1.
class Generator(nn.Module):
    def __init__(self):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.ConvTranspose2d(100, 64*8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(64*8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*8, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*4, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*2, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
            nn.Tanh()
        )
        self.apply(weights_init)

    def forward(self, input):
        return self.model(input)

generator = Generator().to(device)
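As an optional sanity check (assuming the imports and device defined above), we can pass a small batch of random latent vectors through both networks and confirm the shapes line up:

# Optional sanity check: 8 latent vectors -> 8 RGB 64x64 images -> 8 scalar scores
z = torch.randn(8, 100, 1, 1, device=device)
with torch.no_grad():
    fake = generator(z)
    score = discriminator(fake)
print(fake.shape)   # torch.Size([8, 3, 64, 64])
print(score.shape)  # torch.Size([8, 1, 1, 1])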
Training Step and Objects
Now that the Generator and Discriminator models have been defined, we define functions for training them.
def discriminator_train_step(real_data, fake_data):
    d_optimizer.zero_grad()
    # Train on real images with a target of 1
    prediction_real = discriminator(real_data)
    error_real = loss(prediction_real.squeeze(), torch.ones(len(real_data)).to(device))
    error_real.backward()
    # Train on fake images with a target of 0
    prediction_fake = discriminator(fake_data)
    error_fake = loss(prediction_fake.squeeze(), torch.zeros(len(fake_data)).to(device))
    error_fake.backward()
    d_optimizer.step()
    return error_real + error_fake

def generator_train_step(fake_data):
    g_optimizer.zero_grad()
    # The generator tries to make the discriminator output 1 for fake images
    prediction = discriminator(fake_data)
    error = loss(prediction.squeeze(), torch.ones(len(fake_data)).to(device))
    error.backward()
    g_optimizer.step()
    return error
Note that the .squeeze() operation is performed on the prediction because the output of the discriminator has a shape of batch size x 1 x 1 x 1, while the target tensor it is compared against has a shape of batch size.
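For a batch of 64 images, the shapes look like this (illustrative only):

# discriminator(real_data).shape           -> torch.Size([64, 1, 1, 1])
# discriminator(real_data).squeeze().shape -> torch.Size([64])
# torch.ones(len(real_data)).shape         -> torch.Size([64])  (the target tensor)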
Further, we create the generator and discriminator model objects, the binary cross-entropy loss, and an Adam optimizer for each network, using the learning rate of 0.0002 and beta1 of 0.5 recommended in the DCGAN paper.
discriminator = Discriminator().to(device)
generator = Generator().to(device)
loss = nn.BCELoss()
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
Training
We now train the model for 25 epochs. To store the training logs in W&B, we first initialize a run with a project name and the configuration.
Then we use wandb.log() to log the desired parameters.
# Initialize W&B
run = wandb.init(project='W&B_Generate_Faces_using_DCGAN', config=WANDB_CONFIG)

# Model Training
for epoch in tqdm(range(25), total=25):
    print()
    print("Epoch: ", epoch)
    N = len(dataloader)
    for i, images in enumerate(dataloader):
        # Load real data and generate fake data by
        # passing random noise through the generator network
        real_data = images.to(device)
        fake_data = generator(torch.randn(len(real_data), 100, 1, 1).to(device)).to(device)
        fake_data = fake_data.detach()
        # Train the discriminator
        d_loss = discriminator_train_step(real_data, fake_data)
        # Generate a new set of images from noise and train the generator
        fake_data = generator(torch.randn(len(real_data), 100, 1, 1).to(device)).to(device)
        g_loss = generator_train_step(fake_data)
        # Log the losses to wandb
        wandb.log({'d_loss': d_loss.item(), 'g_loss': g_loss.item()})
You can see how the generator and discriminator losses reduce over time.
Inference
We now use the trained model to generate a sample of images and save them as well.
generator.eval()
noise = torch.randn(64, 100, 1, 1, device=device)
sample_images = generator(noise).detach().cpu()
grid = vutils.make_grid(sample_images, nrow=8, normalize=True)
img = grid.cpu().detach().permute(1, 2, 0)

# Plot and save the image grid
plt.figure(figsize=(10, 10))
plt.axis("off")
plt.imshow(img)
plt.savefig("dcgan_predictions.jpg")
Instead of plotting images, you can store them using W&B Tables.
table = wandb.Table(columns=['Image'], allow_mixed_types=True)
table.add_data(wandb.Image("/content/dcgan_predictions.jpg"))
wandb.log({"Generated Images by DCGAN": table})
Thus, we have learned how to generate human faces from noise using DCGANs. There is a caveat, however: although we can generate face images, we cannot specify which kind of image we would like to generate. In the following blog we will learn about Conditional GANs, which address this limitation.
Colab Notebook
References
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv:1511.06434.