Using SGD vs Adam Optimizer

Analyzing SGD and Adam for training the full transformer

Created on February 27 | Last edited on March 1

Analysis for Sheepscot River (ME)

[Figure: Validation Loss by epoch (x-axis: epochs 0 to 8; y-axis: roughly 5000 to 8000)]
On the Sheepscot River, Adam (purple) seems to perform better in terms of validation loss, but it is much less stable.
[Figure: Training Loss by step (x-axis: steps 10 to 30; y-axis: roughly 0.2 to 1)]
In contrast, with respect to training loss, the model trained with Adam doesn't seem to converge at all compared to vanilla SGD. This is odd: if the model doesn't converge, one might wonder why its validation loss is better.
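One way to sanity-check a puzzle like this is to rerun both optimizers while computing training and validation loss with the same function, on the same scale, at the same point in each epoch. The following is only a minimal sketch of that idea, written in pure Python on a toy linear regression rather than the transformer from these runs. The data, the hyperparameters, and every name (`make_data`, `sgd_step`, `adam_step`, `run`) are hypothetical and chosen purely for illustration.

```python
import math
import random


def make_data(n, seed):
    # Hypothetical toy data: y = 3x + 0.5 plus a little Gaussian noise.
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 0.5 + rng.gauss(0.0, 0.1)) for x in xs]


def mse(params, data):
    # Mean squared error of the linear model w * x + b on a dataset.
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def grad(params, batch):
    # Gradient of the MSE with respect to (w, b) on one mini-batch.
    w, b = params
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return [gw, gb]


def sgd_step(params, g, state, lr=0.1):
    # Vanilla SGD: step against the gradient (state is unused).
    return [p - lr * gi for p, gi in zip(params, g)]


def adam_step(params, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected running averages of the gradient and its square.
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, gi) in enumerate(zip(params, g)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * gi
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * gi * gi
        m_hat = state["m"][i] / (1 - b1 ** t)
        v_hat = state["v"][i] / (1 - b2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


def run(step_fn, train, val, epochs=20, batch_size=8, seed=0):
    # Train from the same start and log (train_loss, val_loss) per epoch,
    # both measured with the same mse() after the epoch's last update.
    rng = random.Random(seed)
    params = [0.0, 0.0]
    state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
    history = []
    for _ in range(epochs):
        order = train[:]
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            batch = order[i:i + batch_size]
            params = step_fn(params, grad(params, batch), state)
        history.append((mse(params, train), mse(params, val)))
    return history


train = make_data(64, seed=1)
val = make_data(32, seed=2)
sgd_history = run(sgd_step, train, val)
adam_history = run(adam_step, train, val)

for name, history in (("sgd", sgd_history), ("adam", adam_history)):
    train_loss, val_loss = history[-1]
    print(f"{name}: train={train_loss:.4f} val={val_loss:.4f}")
```

Because both numbers in each `history` entry come from the same `mse` call on the same parameters, any gap between them reflects the data split and not a difference in how or when the losses were logged. Whether such a logging difference explains the curves above is not something this sketch can confirm.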
Analysis for the Kennebec River Forks

Analysis for Dead River (ME)

Analysis of East Branch Wesserunsett
