Face Generation using Conditional GANs
Generate realistic human faces using Conditional GANs
Introduction to Conditional GANs
In the previous report on Face Generation using DCGANs, we learned how to generate human faces from random noise using DCGANs. A problem with that approach, however, is that we cannot specify the particular class of image we want to generate. This can be achieved using a Conditional GAN (CGAN).
The concept of CGANs was introduced in the paper Conditional Generative Adversarial Nets. At the time, generative adversarial nets had only recently been introduced as a novel way to train generative models. The authors extended the concept and introduced a conditional version of generative adversarial nets, which can be constructed by simply feeding the data, y, we wish to condition on to both the generator and the discriminator. As a result, the model can learn a multi-modal mapping from inputs to outputs by being fed different contextual information.
The figure below illustrates a simple conditional adversarial net.

Reference: Conditional Generative Adversarial Nets
Generative adversarial nets can be extended to a conditional model if both the generator and the discriminator are conditioned on some extra information y, where y could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding y into both the discriminator and the generator as an additional input layer.
In the generator, the prior input noise $p_z(z)$ and y are combined in a joint hidden representation, and the adversarial training framework allows for considerable flexibility in how this hidden representation is composed.
In the discriminator, x and y are presented as inputs to a discriminative function (embodied again by an MLP in this case).
The objective function of the two-player minimax game would be

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y))\big)\big]$$
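To make the conditioning concrete, the sketch below (an illustrative toy example, not the model trained later in this report) shows an MLP-based conditional generator and discriminator in which a one-hot label y is simply concatenated with the noise z and with the data x, respectively:

import torch
import torch.nn as nn

NOISE_DIM, N_CLASSES, DATA_DIM = 100, 2, 784  # illustrative sizes, not the ones used later

# Generator: the condition y is concatenated with the noise z
toy_generator = nn.Sequential(
    nn.Linear(NOISE_DIM + N_CLASSES, 256), nn.ReLU(),
    nn.Linear(256, DATA_DIM), nn.Tanh()
)

# Discriminator: the condition y is concatenated with the data x
toy_discriminator = nn.Sequential(
    nn.Linear(DATA_DIM + N_CLASSES, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid()
)

z = torch.randn(8, NOISE_DIM)
y = nn.functional.one_hot(torch.randint(0, N_CLASSES, (8,)), N_CLASSES).float()
fake_x = toy_generator(torch.cat([z, y], dim=1))          # (8, DATA_DIM)
score = toy_discriminator(torch.cat([fake_x, y], dim=1))  # (8, 1), probability that (x, y) is real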
Code
Download the Dataset
!wget https://www.dropbox.com/s/rbajpdlh7efkdo1/male_female_face_images.zip
!unzip -q male_female_face_images.zip
Import the Packages
!pip install -q --upgrade wandb
!pip install -q --upgrade torch_snippets

from torch_snippets import *
import torch
from torchvision.utils import make_grid
from PIL import Image
import torchvision
from torchvision import transforms
import torchvision.utils as vutils
from tqdm import tqdm

# Wandb Login
import wandb
wandb.login()
Setup the Configuration
# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# wandb config
WANDB_CONFIG = {'_wandb_kernel': 'neuracort'}
Crop Images to Obtain Faces
First, we store the male and female image paths.
female_images = Glob('/content/females/*.jpg')
male_images = Glob('/content/males/*.jpg')
Then we crop the images to retain only the faces and discard additional details in each image. We do this using the Haar cascade classifier provided by OpenCV.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
We create two new folders, one corresponding to male and another for female images and dump all the cropped face images into the respective folders:
!mkdir cropped_faces_female
!mkdir cropped_faces_male

# Detect, crop, and save the female faces
for i in range(len(female_images)):
    img = read(female_images[i], 1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        img2 = img[y:(y+h), x:(x+w), :]
        cv2.imwrite('cropped_faces_female/'+str(i)+'.jpg', cv2.cvtColor(img2, cv2.COLOR_RGB2BGR))

# Detect, crop, and save the male faces
for i in range(len(male_images)):
    img = read(male_images[i], 1)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        img2 = img[y:(y+h), x:(x+w), :]
        cv2.imwrite('cropped_faces_male/'+str(i)+'.jpg', cv2.cvtColor(img2, cv2.COLOR_RGB2BGR))
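As a quick optional check, we can count how many cropped faces were actually written out (the exact counts depend on the detector):

print(len(Glob('cropped_faces_female/*.jpg')), len(Glob('cropped_faces_male/*.jpg')))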
Apply Transformations
Specify the transformations to apply to each image:
transform = transforms.Compose([
    transforms.Resize(64),
    transforms.CenterCrop(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
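As a quick sanity check (assuming the cropping step above produced at least one female face), applying the transform to a single image should yield a 3x64x64 tensor with values roughly in [-1, 1]:

sample_path = Glob('cropped_faces_female/*.jpg')[0]  # illustrative sample; any cropped face works
sample = transform(Image.open(sample_path))
print(sample.shape, sample.min().item(), sample.max().item())  # torch.Size([3, 64, 64]), values near -1..1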
Define the Dataset Class and Dataloader
We create the Faces dataset class that returns the image and the corresponding gender of the person in it.
class Faces(Dataset):
    def __init__(self, folders):
        super().__init__()
        self.folderfemale = folders[0]
        self.foldermale = folders[1]
        self.images = sorted(Glob(self.folderfemale)) + sorted(Glob(self.foldermale))
    def __len__(self):
        return len(self.images)
    def __getitem__(self, ix):
        image_path = self.images[ix]
        image = Image.open(image_path)
        image = transform(image)
        gender = np.where('female' in str(image_path), 1, 0)
        return image, torch.tensor(gender).long()
Define the ds dataset and dataloader.
ds = Faces(folders=['cropped_faces_female', 'cropped_faces_male'])
dataloader = DataLoader(ds, batch_size=64, shuffle=True, num_workers=8)
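An optional check that the dataloader yields batches of the expected shape:

images, genders = next(iter(dataloader))
print(images.shape)   # torch.Size([64, 3, 64, 64])
print(genders.shape)  # torch.Size([64]); 1 = female, 0 = male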
Weight Initialization
We define the weight initialization so that we do not have a wide variation across the randomly initialized weight values.
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
Discriminator Model
class Discriminator(nn.Module):
    def __init__(self, emb_size=32):
        super(Discriminator, self).__init__()
        self.emb_size = emb_size
        self.label_embeddings = nn.Embedding(2, self.emb_size)
        self.model = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*2, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*4, 64*8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64*8, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Flatten()
        )
        self.model2 = nn.Sequential(
            nn.Linear(288, 100),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(100, 1),
            nn.Sigmoid()
        )
        self.apply(weights_init)
    def forward(self, input, labels):
        x = self.model(input)
        y = self.label_embeddings(labels)
        input = torch.cat([x, y], 1)
        final_output = self.model2(input)
        return final_output

discriminator = Discriminator().to(device)
One important thing to note in this model class is the additional parameter emb_size, which is present in Conditional GANs but not in DCGANs.
emb_size represents the size of the embedding into which we convert the input class label; the embedding layer is stored as label_embeddings.
The reason we convert the input class label from a one-hot encoded version into a higher-dimensional embedding is that the model then has a higher degree of freedom to learn and adjust to the different classes.
While the model class remains, to a large extent, the same as the one we saw in the last blog on DCGANs, we initialize another model, model2, that performs the classification exercise.
In the forward method, we fetch the output of the first model (self.model(input)) and the output of passing labels through label_embeddings, and then concatenate the two. Next, we pass the concatenated output through the second model, self.model2, defined earlier, which gives us the discriminator output.
Another thing to note is that self.model2 takes an input of 288 values: the output of self.model has 256 values per data point, which is concatenated with the 32 embedding values of the input class label, resulting in 256 + 32 = 288 input values to self.model2.
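We can verify this shape arithmetic with an optional check on a dummy batch (the tensor names below are illustrative):

dummy_images = torch.randn(4, 3, 64, 64).to(device)
dummy_labels = torch.randint(0, 2, (4,)).to(device)
features = discriminator.model(dummy_images)                # (4, 256): conv stack output after Flatten
embeddings = discriminator.label_embeddings(dummy_labels)   # (4, 32): label embeddings
print(torch.cat([features, embeddings], 1).shape)           # torch.Size([4, 288]), the input to self.model2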
Generator Model
class Generator(nn.Module):
    def __init__(self, emb_size=32):
        super(Generator, self).__init__()
        self.emb_size = emb_size
        self.label_embeddings = nn.Embedding(2, self.emb_size)
        self.model = nn.Sequential(
            nn.ConvTranspose2d(100+self.emb_size, 64*8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(64*8),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*8, 64*4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*4),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*4, 64*2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64*2),
            nn.ReLU(True),
            nn.ConvTranspose2d(64*2, 64, 4, 2, 1, bias=False),
            nn.BatchNorm2d(64),
            nn.ReLU(True),
            nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
            nn.Tanh()
        )
        self.apply(weights_init)
    def forward(self, input_noise, labels):
        label_embeddings = self.label_embeddings(labels).view(len(labels), self.emb_size, 1, 1)
        input = torch.cat([input_noise, label_embeddings], 1)
        return self.model(input)

generator = Generator().to(device)
Note that we use nn.Embedding to convert the input class label (one of two classes) into a self.emb_size-dimensional (32) vector.
nn.ConvTranspose2d is used to upscale the feature maps until we obtain an image as output.
The forward method takes the noise values input_noise and the input labels as inputs, concatenates the label embeddings with the noise, and generates the output image.
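A similar optional check confirms that the generator maps a batch of (100, 1, 1) noise vectors plus labels to 3x64x64 images:

dummy_noise = torch.randn(4, 100, 1, 1).to(device)
dummy_labels = torch.randint(0, 2, (4,)).to(device)
print(generator(dummy_noise, dummy_labels).shape)  # torch.Size([4, 3, 64, 64])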
Training Step and Objects
Define a function noise to generate random noise with 100 values per sample and register it to the device.
def noise(size):
    n = torch.randn(size, 100, 1, 1, device=device)
    return n.to(device)
Next, we define the training function for the discriminator.
def discriminator_train_step(real_data, real_labels, fake_data, fake_labels):
    d_optimizer.zero_grad()
    prediction_real = discriminator(real_data, real_labels)
    error_real = loss(prediction_real, torch.ones(len(real_data), 1).to(device))
    error_real.backward()
    prediction_fake = discriminator(fake_data, fake_labels)
    error_fake = loss(prediction_fake, torch.zeros(len(fake_data), 1).to(device))
    error_fake.backward()
    d_optimizer.step()
    return error_real + error_fake
Then we define the training function for the generator.
def generator_train_step(fake_data, fake_labels):
    g_optimizer.zero_grad()
    prediction = discriminator(fake_data, fake_labels)
    error = loss(prediction, torch.ones(len(fake_data), 1).to(device))
    error.backward()
    g_optimizer.step()
    return error
Additionally, we define the generator and discriminator model objects, the optimizers, and the loss function.
discriminator = Discriminator().to(device)
generator = Generator().to(device)
loss = nn.BCELoss()
d_optimizer = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
g_optimizer = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))

# fixed_noise will be used to generate images from random noise
fixed_noise = torch.randn(64, 100, 1, 1, device=device)
# Half labels correspond to class 0 and remaining to class 1
fixed_fake_labels = torch.LongTensor([0]*(len(fixed_noise)//2) + [1]*(len(fixed_noise)//2)).to(device)

n_epochs = 25
img_list = []
Training and Inference
We will log the discriminator and generator loss using wandb.
# Initialize W&B
run = wandb.init(project='W&B_Generate_Faces_using_ConditionalGAN', config=WANDB_CONFIG)

# Train the model for 25 epochs
for epoch in tqdm(range(n_epochs), total=n_epochs):
    N = len(dataloader)
    for bx, (images, labels) in enumerate(dataloader):
        # Obtain the data
        real_data, real_labels = images.to(device), labels.to(device)
        fake_labels = torch.LongTensor(np.random.randint(0, 2, len(real_data))).to(device)
        fake_data = generator(noise(len(real_data)), fake_labels)
        fake_data = fake_data.detach()

        # Train discriminator
        d_loss = discriminator_train_step(real_data, real_labels, fake_data, fake_labels)

        # Train generator
        fake_labels = torch.LongTensor(np.random.randint(0, 2, len(real_data))).to(device)
        fake_data = generator(noise(len(real_data)), fake_labels).to(device)
        g_loss = generator_train_step(fake_data, fake_labels)

        # Log to wandb
        wandb.log({'d_loss': d_loss.detach(), 'g_loss': g_loss.detach()})

    # Inference
    with torch.no_grad():
        fake = generator(fixed_noise, fixed_fake_labels).detach().cpu()
        imgs = vutils.make_grid(fake, padding=2, normalize=True).permute(1, 2, 0)
        img_list.append(imgs)
        show(imgs, sz=10)
After training is complete, we store the final grid image in a wandb Table, as defined below.
table_generated = wandb.Table(columns=['Image'], allow_mixed_types=True)
table_generated.add_data(wandb.Image("/content/conditional_gan_predictions.jpg"))
wandb.log({"Generated Images by Conditional GAN": table_generated})
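Note that /content/conditional_gan_predictions.jpg is assumed to have been saved by the notebook before this step; a minimal sketch of how the last grid in img_list could be written to that path:

import numpy as np
from PIL import Image

# Assumption: img_list[-1] is an HWC tensor in [0, 1] (make_grid with normalize=True, then permuted)
final_grid = (img_list[-1].numpy() * 255).astype(np.uint8)
Image.fromarray(final_grid).save('/content/conditional_gan_predictions.jpg')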
In this image, we can see that the first 32 images correspond to the male class and the next 32 images to the female class, which confirms that the conditional GAN performs as expected!
Colab Notebook
References