
Using SGD vs Adam Optimizer

Analyzing SGD and Adam for training a full transformer
Created on February 27 | Last edited on March 1

Analysis for Sheepscot River ME




[Plot: x-axis: epoch (0 to 8); y-axis range: 5000 to 8000]

On the Sheepscot River, Adam (purple) seems to perform better in terms of validation loss, but it is much less stable.
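One way to put a number on "stability" is to look at how much the validation loss jumps from one epoch to the next. The following is a minimal sketch, assuming the per-epoch validation losses of each run have been exported as a plain Python list. The function names `loss_instability` and `best_loss` are hypothetical helpers, not part of any library or of the original runs.

```python
from statistics import pstdev


def loss_instability(losses):
    """Return the standard deviation of epoch-to-epoch changes in a loss curve.

    A smooth, steadily decreasing curve gives a value near 0; a curve that
    oscillates between epochs gives a larger value.
    """
    if len(losses) < 2:
        raise ValueError("need at least two loss values to compute a change")
    # Differences between consecutive epochs.
    deltas = [later - earlier for earlier, later in zip(losses, losses[1:])]
    return pstdev(deltas)


def best_loss(losses):
    """Return the lowest loss reached at any epoch."""
    return min(losses)
```

Comparing `best_loss` and `loss_instability` side by side for the two runs would separate "reaches a lower validation loss" from "gets there smoothly", which are the two properties contrasted above.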

[Plot: x-axis: Step (10 to 30); y-axis range: 0.2 to 1]

In contrast, with respect to training loss, the Adam-trained model doesn't seem to converge at all compared to vanilla SGD. This is odd: if the model doesn't converge, one might wonder why its validation loss is better.
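For reference, the two update rules being compared can be sketched for a single scalar parameter as follows. This is an illustrative sketch only: the learning rates and the Adam constants (`beta1`, `beta2`, `eps`) are the commonly cited defaults and are an assumption here, not the settings used in these runs, and the function names are hypothetical.

```python
import math


def sgd_step(param, grad, lr=0.01):
    """Vanilla SGD: step against the gradient, scaled by the learning rate."""
    return param - lr * grad


def new_adam_state():
    """Fresh Adam state: step count and running first/second moment estimates."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: step along the bias-corrected running mean of the gradient,
    divided by the square root of the bias-corrected running mean of its square."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)
```

Because Adam divides by the running gradient magnitude, its step size depends much less on the raw gradient scale than SGD's does. Whether that difference accounts for the curves in this report is not established here; it is only one thing worth checking.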



Analysis for the Kennebec River Forks






Analysis for Dead River (ME)






Analysis of East Branch Wesserunsett



