Phase 3 - CR Hyperparameters & Variants
Created on September 30 | Last edited on October 13
We first run experiments on the trimmed set and then check whether those insights generalise to the main set.
Sweep: Vary Loss Scales
Run set (5 runs)
Varying the loss scales does not seem to make any difference; I still suspect this factor has no influence, so it's probably best to keep them equal. Next, we vary the learning rate.
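Before moving on: the loss scales swept here presumably weight the individual task losses before they are summed. A minimal sketch of that pattern (the `cr`/`pr` term names are assumptions inferred from the cr/crpr run naming, not the actual codebase):

```python
# Sketch: "loss scales" as weights on individual task losses.
# Term names (cr/pr) are hypothetical, inferred from the cr/crpr run names.
def total_loss(cr_loss: float, pr_loss: float,
               cr_scale: float = 1.0, pr_scale: float = 1.0) -> float:
    # Equal scales (what we keep, per the finding above) reduce this
    # to a plain sum of the two losses.
    return cr_scale * cr_loss + pr_scale * pr_loss
```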
Sweep: LR, ELR
Here we vary the learning rate (LR) and the encoder learning rate (ELR); a sweep-config sketch follows the value list.
LR: 2e-5, 4e-5, 1e-4
ELR: 1e-5, 5e-5, 2e-6
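A grid over these values can be expressed as a W&B sweep. A minimal sketch, with an assumed metric key (`avg_f1`), a hypothetical project name, and a stub in place of the real training entry point; note the run set below shows 16 runs, so the actual grid likely included more values than the nine combinations listed above:

```python
import wandb

sweep_config = {
    "method": "grid",
    "metric": {"name": "avg_f1", "goal": "maximize"},  # assumed metric key
    "parameters": {
        "lr": {"values": [2e-5, 4e-5, 1e-4]},
        "encoder_lr": {"values": [1e-5, 5e-5, 2e-6]},
    },
}

def train():
    run = wandb.init()
    # ... real training would read run.config.lr / run.config.encoder_lr ...
    run.log({"avg_f1": 0.0})  # placeholder
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="phase3-cr")  # hypothetical project
wandb.agent(sweep_id, function=train)
```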
Run set (16 runs)
From this, the best-performing combinations are as follows, topped by LR = 1e-3 with ELR (encoder learning rate) = 5e-5:
| LR | Encoder LR | Avg F1 |
|---|---|---|
| 1e-3 | 5e-5 | 0.4578 |
| 5e-4 | 5e-5 | 0.4497 |
| 2e-4 | 1e-5 | 0.4268 |
| 5e-4 | 1e-5 | 0.4208 |
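For reference, a separate encoder learning rate is typically implemented with optimizer parameter groups. A minimal PyTorch sketch, assuming the model splits into an encoder and a task head (both modules here are hypothetical stand-ins), using the best combination from the table:

```python
import torch

# Hypothetical stand-ins for the actual encoder and task-head modules.
encoder = torch.nn.Linear(768, 768)
head = torch.nn.Linear(768, 2)

# One parameter group per learning rate: LR = 1e-3, ELR = 5e-5 (table above).
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 5e-5},  # encoder LR (ELR)
    {"params": head.parameters(), "lr": 1e-3},     # task-head LR
])
```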
Another thing to note: all of the `crpr` runs are worse than the `cr` runs. That can't be a good sign!
Sweep: CRPR/CR Trim Baselines (default params, varying size)
LR Full Dataset (Sweep LR Results)
Clearly, the 'default' learning rates (LR = 2e-4, ELR = 1e-5) work better than every other combination. So, NO, the insights from the limited 50-instance experiments do not directly translate to the main, whole dataset.
So we need to figure out how large a trimmed dataset has to be before conclusions drawn from it actually carry over to the full set.
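One way to run that check is to evaluate the same configuration over increasing trim sizes until its conclusions match the full-dataset run. A minimal sketch; the sizes and the `run_experiment` stub are hypothetical:

```python
import random

def trim(dataset, size, seed=0):
    # Fixed seed so every trim size draws a reproducible subset.
    rng = random.Random(seed)
    return rng.sample(dataset, size)

def run_experiment(subset):
    # Placeholder for the actual train + eval call on a trimmed set.
    print(f"would train/evaluate on {len(subset)} instances")

full_dataset = list(range(2000))  # placeholder for the real instance list

# Hypothetical sizes, starting from the 50-instance trims used so far.
for size in [50, 100, 200, 500, 1000]:
    run_experiment(trim(full_dataset, size))
```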
Run set (5 runs)
# Unary hdim
The test is inconclusive; the trims still don't represent the real deal. A `unary_hdim` of 1000 works better, but there's little difference between 500 and 100 🤷
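For context, `unary_hdim` is presumably the hidden size of the unary (per-span) scoring FFNN. A minimal sketch of such a scorer; the two-layer structure and dimensions are assumptions, not the actual model code:

```python
import torch

def unary_scorer(input_dim: int = 768, unary_hdim: int = 1000) -> torch.nn.Module:
    # Two-layer FFNN producing one score per candidate span.
    return torch.nn.Sequential(
        torch.nn.Linear(input_dim, unary_hdim),
        torch.nn.ReLU(),
        torch.nn.Linear(unary_hdim, 1),
    )

scorer = unary_scorer(unary_hdim=1000)  # the value that worked best above
```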
# HOI Trainer
There is NO difference between the two. Ignore `hoitrainer`.