Assignment 4
Instructions
- The goal of this assignment is fivefold: (i) train RBMs using Gibbs Sampling and Contrastive Divergence (ii) generate image samples using a trained RBM (iii) [optional] train a GAN for image-to-image translation (e.g., translating a day image to a night image) (iv) [optional] use a pre-trained StyleGAN to generate images (v) [optional] train and generate images using a Variational Autoencoder
- We strongly recommend that you work on this assignment in a team of size 2. Both members of the team are expected to work together (in a subsequent viva, both members will be expected to answer questions, explain the code, etc.).
- Collaborations and discussions with other groups are strictly prohibited.
- For Part A (RBMs) you must only use Python (numpy and pandas) for your implementation.
- For Parts B and C you can use any and all packages from keras, pytorch, and tensorflow.
- You can run the code in a Jupyter notebook on Colab by enabling GPUs.
- You have to generate the report in the same format as shown below using wandb.ai. You can start by cloning this report using the clone option above. Most of the plots that we have asked for below can be (automatically) generated using the apis provided by wandb.ai. You will upload a link to this report on gradescope.
- You also need to provide a link to your github code as shown below. Follow good software engineering practices and set up a github repo for the project on Day 1. Please do not write all code on your local machine and push everything to github on the last day. The commits in github should reflect how the code has evolved during the course of the assignment.
- You have to check moodle regularly for updates regarding the assignment.
Part A
You will experiment with the Fashion-MNIST dataset and learn hidden representations (h) for the images from the 784-dimensional raw features (V). Specifically, given the 784-dimensional binary Fashion-MNIST data (V), you need to learn an n-dimensional hidden representation (h). You need to convert the real-valued Fashion-MNIST data into binary data using a threshold of 127 (any pixel with a value less than 127 will be treated as 0, and any pixel with a value greater than or equal to 127 will be treated as 1). You will split the 60000 images in the training data into training and validation data. You will keep the 10000 test images aside. A minimal sketch of this preprocessing is shown below; the specific tasks follow.
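The sketch assumes `train_images` (60000 × 784) and `test_images` (10000 × 784) are raw-pixel numpy arrays already loaded from the dataset; the variable names and the 50000/10000 split are our own placeholders, not prescribed by the assignment:

```python
import numpy as np

def binarize(images, threshold=127):
    # Pixels >= 127 become 1; pixels < 127 become 0.
    return (images >= threshold).astype(np.float32)

train_v = binarize(train_images)
test_v = binarize(test_images)

# Hold out part of the 60000 training images for validation.
rng = np.random.default_rng(0)
perm = rng.permutation(train_v.shape[0])
val_v, train_v = train_v[perm[:10000]], train_v[perm[10000:]]
```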
Question 1 (6 Marks)
Build an RBM which contains the following layers: (i) an input layer containing 784 neurons, and (ii) a hidden layer containing n neurons.
The code should be flexible so that the dimension of the hidden layer can be changed; one possible skeleton is sketched below.
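The skeleton exposes the hidden dimension as a constructor argument (the initialisation scale and parameter names are our own choices, not prescribed by the question):

```python
import numpy as np

class RBM:
    """Bernoulli-Bernoulli RBM with 784 visible and n_hidden hidden units."""
    def __init__(self, n_visible=784, n_hidden=128, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases
        self.c = np.zeros(n_hidden)   # hidden biases

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def p_h_given_v(self, v):
        # P(h_j = 1 | v) for all hidden units at once
        return self._sigmoid(v @ self.W + self.c)

    def p_v_given_h(self, h):
        # P(v_i = 1 | h) for all visible units at once
        return self._sigmoid(h @ self.W.T + self.b)
```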
Question 2 (10 Marks)
Write code to train the model using Block Gibbs Sampling (using only Python and numpy; you cannot use any automatic training/differentiation packages from keras, pytorch, tensorflow, etc.). You will use the learned representations as input to a simple one-layer classifier. This classifier takes the n-dimensional hidden representation as input and computes a distribution over the 10 classes using a softmax layer. Use the test data to track the loss of this classifier. The steps, sketched in code after this list, are as follows:
- train the RBM model for 1 epoch
- using this partially trained model compute hidden representations of the validation data
- using these hidden representations train a simple 10-class logistic regression model
- evaluate the logistic regression model on the test data
- log the accuracy on the test data and the cross entropy loss on the test data for the first epoch using wandb
- repeat the above process for every subsequent epoch, logging the accuracy and loss on the test data each epoch so that wandb can automatically generate the plots asked for below
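A minimal sketch of one such epoch, reusing the `RBM` class from Q1; the model expectation is estimated by running the block Gibbs chain for k burn-in steps and then averaging the sufficient statistics over r further samples (the learning rate, batch size, and helper names are our own placeholders):

```python
def sample(p, rng):
    # Draw binary states from elementwise Bernoulli probabilities.
    return (rng.random(p.shape) < p).astype(np.float32)

def train_epoch_gibbs(rbm, data, lr=0.01, k=200, r=10, batch=64, rng=None):
    rng = rng or np.random.default_rng(0)
    for s in range(0, len(data), batch):
        v0 = data[s:s + batch]
        ph0 = rbm.p_h_given_v(v0)                 # positive statistics

        # Burn-in: k full block-Gibbs steps (all of h, then all of v).
        v = v0
        for _ in range(k):
            v = sample(rbm.p_v_given_h(sample(rbm.p_h_given_v(v), rng)), rng)

        # Negative statistics: average over r post-burn-in samples.
        dW = np.zeros_like(rbm.W); db = np.zeros_like(rbm.b); dc = np.zeros_like(rbm.c)
        for _ in range(r):
            ph = rbm.p_h_given_v(v)
            dW += v.T @ ph; db += v.sum(0); dc += ph.sum(0)
            v = sample(rbm.p_v_given_h(sample(ph, rng)), rng)

        m = len(v0)
        rbm.W += lr * (v0.T @ ph0 - dW / r) / m
        rbm.b += lr * (v0.sum(0) - db / r) / m
        rbm.c += lr * (ph0.sum(0) - dc / r) / m
```

The outer loop then alternates `train_epoch_gibbs` with computing `rbm.p_h_given_v(...)` on the validation data, fitting the softmax classifier on those representations, and a `wandb.log({...})` call for the test accuracy and loss.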
Using the sweep feature in wandb, find the best hyperparameter configuration. Here are some suggestions, but you are free to decide which hyperparameters you want to explore (a sample sweep setup is sketched after this list):
- n, i.e., dimension of the hidden layer: 64, 128, 256, ...
- k, i.e., the number of steps for which you will run the Markov chain: 200, 300
- r, i.e., the number of samples drawn after the chain converges : 10, 20, 30
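For reference, a sweep over these hyperparameters can be wired up along the following lines; the search method, metric name, and project name are placeholders, and `train_and_evaluate` stands in for the training loop above:

```python
import wandb

sweep_config = {
    "method": "bayes",
    "metric": {"name": "test_accuracy", "goal": "maximize"},
    "parameters": {
        "n": {"values": [64, 128, 256]},
        "k": {"values": [200, 300]},
        "r": {"values": [10, 20, 30]},
    },
}

def train_and_evaluate():
    with wandb.init():
        cfg = wandb.config
        # Build an RBM with cfg.n hidden units, train it with k=cfg.k
        # and r=cfg.r, and wandb.log the test metrics every epoch.

sweep_id = wandb.sweep(sweep_config, project="assignment4-rbm")
wandb.agent(sweep_id, function=train_and_evaluate)
```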
Based on your sweep please paste the following plots which are automatically generated by wandb:
- accuracy vs. created plot (I would like to see the number of experiments you ran to get the best configuration)
- parallel co-ordinates plot
- correlation summary table (to see the correlation of each hyperparameter with the loss/accuracy)
Also write down the hyperparameters and the values that you swept over.
Question 3 (9 Marks)
Based on the above plots write down some insightful observations. For example,
- Using a higher value of n leads to better performance
- Using a higher value of k leads to better performance
- Using a higher value of r leads to better performance
(Note: I don't know whether any of the above statements is true. I just wrote some random comments that came to mind.)
Of course, each inference should be backed by appropriate evidence.
Question 4 (10 Marks)
Write code to train the model using Contrastive Divergence (using only Python and numpy; you cannot use any automatic training/differentiation packages from keras, pytorch, tensorflow, etc.). You will use the learned representations as input to a simple one-layer classifier. This classifier takes the n-dimensional hidden representation as input and computes a distribution over the 10 classes using a softmax layer. Use the test data to track the loss of this classifier. The steps, sketched in code after this list, are as follows:
- train the RBM model for 1 epoch
- using this partially trained model compute hidden representations of the validation data
- using these hidden representations train a simple 10-class logistic regression model
- evaluate the logistic regression model on the test data and note the accuracy
- log the accuracy on the test data and the cross entropy loss on the test data for the first epoch using wandb
- repeat the above process for every subsequent epoch, logging the accuracy and loss on the test data each epoch so that wandb can automatically generate the plots, as in Q2
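A minimal sketch of one CD-k epoch, reusing the `RBM` class and `sample` helper from the earlier sketches; unlike the block Gibbs version, the chain starts at the data and is truncated after only k steps:

```python
def train_epoch_cd(rbm, data, lr=0.01, k=1, batch=64, rng=None):
    rng = rng or np.random.default_rng(0)
    for s in range(0, len(data), batch):
        v0 = data[s:s + batch]
        ph0 = rbm.p_h_given_v(v0)        # positive phase (clamped to data)

        # Run the chain for just k Gibbs steps starting from the data.
        v = v0
        for _ in range(k):
            v = sample(rbm.p_v_given_h(sample(rbm.p_h_given_v(v), rng)), rng)
        phk = rbm.p_h_given_v(v)         # negative phase after k steps

        m = len(v0)
        rbm.W += lr * (v0.T @ ph0 - v.T @ phk) / m
        rbm.b += lr * (v0 - v).sum(0) / m
        rbm.c += lr * (ph0 - phk).sum(0) / m
```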
Using the sweep feature in wandb find the best hyperparameter configuration. Here are some suggestions but you are free to decide which hyperparameters you want to explore
- n, i.e., dimension of the hidden layer: 64, 128, 256, ...
- k, i.e., the number of steps in Contrastive Divergence: 1, 5, 10
Based on your sweep please paste the following plots which are automatically generated by wandb:
- accuracy vs. created plot (I would like to see the number of experiments you ran to get the best configuration)
- parallel co-ordinates plot
- correlation summary table (to see the correlation of each hyperparameter with the loss/accuracy)
Also write down the hyperparameters and the values that you swept over.
Question 5 (9 Marks)
Based on the above plots write down some insightful observations. For example,
- Using a higher value of n leads to better performance
- Using a higher value of k leads to better performance
(Note: I don't know whether any of the above statements is true. I just wrote some random comments that came to mind.)
Question 6 (6 Marks)
You will now visualise the samples generated from the distribution. Suppose it took m steps for SGD to converge when using Contrastive Divergence. By convergence, I mean that you did not see any benefit of training the model for more than m steps. Plot the samples generated by the Gibbs chain after every m/64 steps of SGD, using an 8 × 8 grid to plot these 64 samples. Write down any interesting observations that you make (e.g., what do the samples look like at the beginning of training? What do they look like as you approach convergence?). A plotting sketch is shown below.
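One way to lay out the grid, assuming a `snapshots` list of 64 samples collected during training (one drawn every m/64 SGD steps with a helper like `draw_sample` below) and that matplotlib may be used for plotting:

```python
import matplotlib.pyplot as plt

def draw_sample(rbm, steps=200, rng=None):
    # Run a Gibbs chain from random noise under the current model.
    rng = rng or np.random.default_rng(0)
    v = (rng.random(784) < 0.5).astype(np.float32)
    for _ in range(steps):
        v = sample(rbm.p_v_given_h(sample(rbm.p_h_given_v(v), rng)), rng)
    return v

# snapshots: 64 samples, one taken every m // 64 SGD steps during training.
fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, img in zip(axes.ravel(), snapshots):
    ax.imshow(img.reshape(28, 28), cmap="gray")
    ax.axis("off")
fig.savefig("gibbs_samples.png")
```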
Question 7 (5 Marks)
Use t-SNE to plot the learned representations in a 2-dimensional space (t-SNE essentially takes the n-dimensional representation and plots it in a 2d space such that images which are close in the n-dimensional space remain close in the 2d space). More specifically, take your best model from above and use it to compute the n-dimensional hidden representations of the test data. Now use t-SNE to plot these representations in a 2-dimensional space, as sketched below. While plotting, use a different color for each of the 10 classes and write down your observations.
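A minimal sketch, assuming scikit-learn and matplotlib may be used for visualisation and that `h_test` (the n-dimensional representations of the test images) and `test_labels` have already been computed with your best model:

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# h_test: (10000, n) hidden representations; test_labels: (10000,) class ids.
emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(h_test)

plt.figure(figsize=(8, 8))
pts = plt.scatter(emb[:, 0], emb[:, 1], c=test_labels, cmap="tab10", s=4)
plt.legend(*pts.legend_elements(), title="class", loc="best")
plt.savefig("tsne_test.png")
```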
Question 8 (10 Marks)
Paste a link to your github code for Part A
Example: https://github.com/<user-id>/cs6910_assignment3/partA
- We will check for coding style, clarity in using functions, and a README file with clear instructions on training and evaluating the model (the 10 marks will be based on this).
- We will also run a plagiarism check to ensure that the code is not copied (0 marks in the assignment if we find that the code is plagiarised).
- We will check the number of commits made by the two team members and award marks accordingly. For example, if we see that 70% of the commits were made by one team member, then that member will get more marks in the assignment (note that this contribution will decide the marks split for the entire assignment and not just this question).
- We will also check whether the training and test splits have been used properly. You will get 0 marks on the assignment if we find any cheating (e.g., adding test data to training data) to get higher accuracy.
Part B
Question 1 (0 Marks)
Note that this question does not carry any marks and will not be graded. This is only for students who are looking for a challenge and want to get something more out of the course.
In this question you will experiment with Google's Maps dataset and train a conditional GAN (the Pix2Pix model) to translate satellite images to Google Maps images, as shown below.
You can replicate the steps in this blog.
Once you get comfortable with the code you can also experiment with a few other datasets listed in Section 4 of this paper
Question 2 (0 Marks)
Note that this question does not carry any marks and will not be graded. This is only for students who are looking for a challenge and want to get something more out of the course.
In this question, you will use a pre-trained StyleGAN to generate realistic images of people who do not exist! You can follow this blog and use the original code released by Nvidia. It is very expensive to train this model, so you are advised to use only a pretrained model. You can also try to disentangle different features and linearly interpolate between two faces (for example, start with your face and generate intermediate images until you reach your friend's face). You will find more details on the blog.
Part C
Question 1 (0 Marks)
Note that this question does not carry any marks and will not be graded. This is only for students who are looking for a challenge and want to get something more out of the course.
In this question you will train a VAE using the MNIST dataset. You can simply follow the steps in this blog.
Self Declaration
- ...
- ...
- ...
- ...
- ...
- ...
- ...
- ...
- ...
- ...