Experiments were run with encoded dimensions [2, 4, 8, 16, 32, 64], kl_coefficient = 1, and batch_size = 100.
The encoder is composed of 3 Dense layers of dimension 512, 256, 128 (plus the bottleneck), with a mirrored decoder, resulting in around 12M parameters (comparable to the CNN architecture).
Validation loss, reconstructions, and generations are reported for each run, with a few comments on the most relevant ones.
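The ~12M parameter count can be sanity-checked directly from the layer sizes. The sketch below assumes a flattened 64×64 RGB input (12288 features, not stated in the report but consistent with the face data and the parameter count) and a Gaussian encoder head producing both a mean and a log-variance vector:

```python
def dense_params(dims):
    # Weights + biases for a chain of consecutive Dense layers.
    return sum(i * o + o for i, o in zip(dims, dims[1:]))

input_dim = 64 * 64 * 3   # assumed flattened 64x64 RGB input (not stated in the report)
encoded_dim = 4
hidden = [512, 256, 128]

# Encoder: input -> 512 -> 256 -> 128 -> (mu, log_var), each of size encoded_dim.
enc = dense_params([input_dim] + hidden) + 2 * dense_params([hidden[-1], encoded_dim])
# Decoder mirrors the encoder: encoded_dim -> 128 -> 256 -> 512 -> input.
dec = dense_params([encoded_dim] + hidden[::-1] + [input_dim])

total = enc + dec
print(f"{total / 1e6:.1f}M parameters")  # lands in the ballpark of the reported ~12M
```

Under these assumptions the count comes out just under 13M; the first encoder layer and last decoder layer dominate, so the bottleneck size barely affects the total.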
Sweep
Encoded_dim = 4
Comments
Surprisingly, the run with encoded_dim = 4 had the lowest val_loss.
The reconstructions only capture the most general features (overall shade, hair color, ...).
The resulting generations all look very similar to one another in the same way; the latent space is not well regularized.
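The regularization referred to here comes from the KL term, weighted by kl_coefficient. A minimal NumPy sketch of the standard per-sample VAE objective with a Gaussian prior (variable names are illustrative, not taken from the actual training code):

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, kl_coefficient=1.0):
    # Reconstruction term: mean squared error over features (binary
    # cross-entropy is another common choice for image data).
    recon = np.mean((x - x_hat) ** 2, axis=-1)
    # KL divergence between q(z|x) = N(mu, exp(log_var)) and the prior N(0, I);
    # this is the term that pulls the latent space toward the prior.
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)
    return recon + kl_coefficient * kl

# When mu = 0 and log_var = 0, q matches the prior and the KL term vanishes,
# so a perfect reconstruction gives zero loss.
z_dim = 4
mu, log_var = np.zeros(z_dim), np.zeros(z_dim)
x = x_hat = np.zeros(8)
print(vae_loss(x, x_hat, mu, log_var))
```

With kl_coefficient = 1 the KL pressure is fixed; raising it trades reconstruction quality for a latent space that more closely matches the prior, which is the usual knob when generations from the prior look off.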
Encoded_dim = 64
Comments
Reconstructions look a little more accurate overall, capturing some lower-level features such as face tilt in some cases.
Generations still look quite 'general' in this case as well.