Using SGD vs Adam Optimizer
Analyzing SGD and Adam for training a full transformer
Created on February 27|Last edited on March 1
Analysis for Sheepscot River ME
On the Sheepscot River, Adam (purple) seems to perform better in terms of validation loss, but is much less stable.
In contrast, with respect to training loss, the Adam-trained model doesn't seem to converge at all compared to vanilla SGD. This is odd: if the model doesn't converge on the training loss, one might wonder why its validation loss is better.
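To make the comparison concrete, below is a minimal sketch of how the two runs in each run set could be produced: the same transformer is trained twice, once with SGD and once with Adam, and the training and validation losses are logged to W&B. The learning rate, loss function, and project name here are assumptions for illustration, not values taken from the report.

```python
# Minimal sketch (assumed setup, not the exact training script used in this report):
# train the same model with either SGD or Adam and log both losses to W&B.
import torch
import torch.nn as nn
import wandb

def run_experiment(optimizer_name: str, model: nn.Module, train_loader, val_loader,
                   epochs: int = 20, lr: float = 1e-3):
    # Hypothetical learning rate; the report does not state the values used.
    if optimizer_name == "sgd":
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    else:
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    loss_fn = nn.MSELoss()  # assumed loss for a streamflow regression task
    wandb.init(project="sgd-vs-adam", name=optimizer_name, reinit=True)

    for epoch in range(epochs):
        # Training pass
        model.train()
        train_loss = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()

        # Validation pass
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                val_loss += loss_fn(model(x), y).item()

        wandb.log({
            "epoch": epoch,
            "train_loss": train_loss / len(train_loader),
            "val_loss": val_loss / len(val_loader),
        })
    wandb.finish()
```

Running this once with `optimizer_name="sgd"` and once with `"adam"` would yield a two-run run set like the panels shown for each river.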
Run set (2 runs)
Analysis for the Kennebec River Forks
Run set (2 runs)
Analysis for Dead River (ME)
Run set (2 runs)
Analysis of East Branch Wesserunsett
Run set (2 runs)