UNIF

Experiments while trying to implement the UNIF model

UNIF Cosine Positive (unif-pos)

This model tries to minimize the distance between a code embedding and its corresponding description embedding.
The problem we found with this approach is that it does not work for Top-N ranking because it drives every embedding toward the same vector. For instance, all the code and description embeddings end up looking like <0.3, 0.5, 0.1>. In that degenerate state the cosine similarity loss is minimized, but the embeddings are useless for Top-N prediction.
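As a minimal sketch of this objective (tensor names and shapes here are illustrative, not our actual training code), the loss can be expressed with PyTorch's CosineEmbeddingLoss using a target of 1 for every pair:

```python
import torch
import torch.nn as nn

# Illustrative batch of paired embeddings (shapes are hypothetical).
code_emb = torch.randn(8, 32, requires_grad=True)  # code embeddings
desc_emb = torch.randn(8, 32, requires_grad=True)  # matching descriptions

# Positive-only objective: loss = 1 - cos(code, desc) when target = 1.
loss_fn = nn.CosineEmbeddingLoss()
target = torch.ones(8)  # every pair is treated as a positive pair
loss = loss_fn(code_emb, desc_emb, target)
loss.backward()

# Degenerate minimum: if every embedding collapses to the same vector,
# e.g. <0.3, 0.5, 0.1>, cos = 1 for all pairs and the loss is 0, yet the
# embeddings carry no information for Top-N ranking.
```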

UNIF N2 2D

As a sanity check we ran unif-pos on a dataset of 2 records with an embedding size of 2. In the plots below we can see that the training loss is effectively going down. We verified that the cosine similarity gets closer to 1.0 after each epoch.
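The per-epoch check is a one-liner with CosineSimilarity (reusing the illustrative tensors from the sketch above):

```python
cos = nn.CosineSimilarity(dim=1)
# With 2 records and 2 dimensions this is easy to inspect by hand;
# the values should move toward 1.0 as training progresses.
print(cos(code_emb, desc_emb))
```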

[W&B panels: run set, 58 runs]


UNIF N1000 32D

In this experiment we increased the number of records to 1000 and the embedding size to 32, and we set the learning rate to 0.05.
In the plot below we can observe that the training loss goes down as the number of epochs increases. However, this does not help the accuracy or NDCG. Interestingly, there is a slight increase in the Top-N metrics implemented by Angel, but it is so small that it could be attributed to randomness.
Most of the gradients are very close to zero for both the code and description layers; the code embedding layer has slightly larger gradients.
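To see where the gradients vanish we looked at per-layer gradient magnitudes. A sketch of that inspection, assuming a two-tower model with hypothetical layer names:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the two embedding towers of unif-pos.
model = nn.ModuleDict({
    "code_embedding": nn.Embedding(5000, 32),
    "desc_embedding": nn.Embedding(5000, 32),
})

# ...after loss.backward() on a training batch:
for name, param in model.named_parameters():
    if param.grad is not None:
        print(f"{name}: mean |grad| = {param.grad.abs().mean().item():.2e}")
```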

[W&B panel: run set, 1 run]


UNIF N1000 32D Attention

In this experiment we again have 1000 records, an embedding size of 32, and a learning rate of 0.05. Unlike the previous model, we replaced the averaging of the code token embeddings with a self-attention mechanism.
We can see that although the training loss goes down, there is no improvement in the accuracy or NDCG, nor in the Top-N metrics (these are measurements taken from the training set). Again, this can be explained by the fact that the algorithm drives every code and description embedding toward the same value.
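By self-attention we mean UNIF's learned attention vector over the code token embeddings; here is a sketch of that pooling layer (our actual implementation may differ in details):

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Attention-weighted sum of token embeddings, replacing a plain average."""

    def __init__(self, emb_size: int):
        super().__init__()
        # A single learned attention vector shared across all tokens.
        self.attn = nn.Parameter(torch.randn(emb_size))

    def forward(self, token_embs: torch.Tensor) -> torch.Tensor:
        # token_embs: (batch, seq_len, emb_size)
        scores = token_embs @ self.attn           # (batch, seq_len)
        weights = torch.softmax(scores, dim=1)    # attention weights over tokens
        return (weights.unsqueeze(-1) * token_embs).sum(dim=1)

pool = AttentionPooling(32)
code_emb = pool(torch.randn(4, 20, 32))  # 4 snippets of 20 tokens -> (4, 32)
```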

[W&B panels: run set, 58 runs]


UNIF Cosine Negative (unif-neg)

This version of UNIF tries to minimize the cosine similarity between a code embedding and its corresponding description embedding; the goal is for that similarity to reach -1.0. This model is built as a sanity check to confirm that the CosineEmbeddingLoss and CosineSimilarity classes work the way we expect them to. In the actual UNIF model we minimize the cosine similarity between a code embedding and a randomly sampled description embedding that does not correspond to it (a negative sample).
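A sketch of this sanity-check objective, again with illustrative shapes; note the target of -1 and the margin, which is discussed in the next section:

```python
import torch
import torch.nn as nn

code_emb = torch.randn(8, 32, requires_grad=True)
desc_emb = torch.randn(8, 32, requires_grad=True)

# Every pair is treated as a negative pair (target = -1), and margin = -1.0
# lets the loss keep pushing the similarity all the way down to -1.0.
loss_fn = nn.CosineEmbeddingLoss(margin=-1.0)
target = -torch.ones(8)
loss = loss_fn(code_emb, desc_emb, target)
loss.backward()
```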

UNIF-neg N2 2D

As a sanity check we ran unif-neg on a dataset of 2 records with an embedding size of 2. In the plots below we can see that the training loss is effectively going down. We verified that the cosine similarity gets closer to -1.0 after each epoch. For this to happen it is necessary to set the margin parameter of the CosineEmbeddingLoss class to -1.0, as explained below.
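The reason is PyTorch's definition of the loss for a negative pair:

```python
# For target = -1, CosineEmbeddingLoss computes:
#   loss = max(0, cos(x1, x2) - margin)
# With the default margin = 0 the loss is already zero once the similarity
# reaches 0, so training stalls at orthogonal embeddings. Setting
# margin = -1.0 keeps a nonzero gradient until cos(x1, x2) = -1.0.
loss_fn = nn.CosineEmbeddingLoss(margin=-1.0)
```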

[W&B panels: run set, 58 runs]


UNIF-neg N1000 32D

In this experiment we increased the number of records to 1000 and the embedding size to 32, and we set the learning rate to 0.05.
We can see that the training loss goes down, and we manually verified that the cosine similarity between the code and description embeddings gets close to -1.0. As before, this model does not help accuracy, NDCG, or the Top-N metrics.

[W&B panel: run set, 1 run]