
Sweep/tuning.

Are our runs reasonable?
Created on March 14|Last edited on March 14
We did not cherry-pick runs, but iterating during development could still produce results that are overfit to the settings we happened to use. We sweep to demonstrate that this is not the case.
This best-of aggregation is a fairer demonstration (than e.g. a box plot of average performance) that the NDT-2.32 runs are actually better, since they are not as clearly converged as the other runs. The NDT-2.32 runs do appear more brittle, however.
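To make the contrast between best-of and average aggregation concrete, here is a minimal sketch. It assumes each sweep run is reduced to a single final metric where lower is better; the tag names, the numbers, and the helper `aggregate_by_tag` are all hypothetical and are not taken from the report's runs.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_tag(runs, lower_is_better=True):
    """Group (tag, metric) pairs by tag and report best, mean, and count.

    Assumption: one scalar metric per run; direction set by lower_is_better.
    """
    by_tag = defaultdict(list)
    for tag, metric in runs:
        by_tag[tag].append(metric)
    pick = min if lower_is_better else max
    return {
        tag: {"best": pick(vals), "mean": mean(vals), "n": len(vals)}
        for tag, vals in by_tag.items()
    }


# Made-up values for illustration only (not results from the report).
example_runs = [
    ("tag_a", 0.40),  # tag_a: best run is strong, but the spread is wide
    ("tag_a", 0.55),
    ("tag_a", 0.70),
    ("tag_b", 0.45),  # tag_b: tighter spread, slightly worse best run
    ("tag_b", 0.47),
]

summary = aggregate_by_tag(example_runs)
for tag, stats in summary.items():
    print(tag, stats)
```

In this toy data, `tag_a` wins on best-of but loses on the mean, and the gap between its best and mean values is one rough way to read the brittleness mentioned above.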


[Plot: x-axis epoch (ticks 1, 10, 100, 1k); y-axis tick 0.4. Legend:]
tag: time_pre-sweep-base_v2
tag: f32_pre-sweep-base_v2
tag: time-sweep-base_v2
tag: stitch-sweep-base_v2
tag: single_f8-sweep-base_v2
tag: f32-sweep-base_v2
Run set: 93 runs


cf. runs used in arch/base.

Run set: 4 runs