TempVerseFormer Training Logs

This report provides an overview of the training logs for different temporal models within the TempVerseFormer project. Each section below links to the corresponding Weights & Biases project and offers a brief description of the experiments conducted.

TempFormer (Vanilla Transformer)


This project tracks the training of the TempFormer, a Vanilla Transformer architecture adapted for temporal sequence modeling. Experiments here explore the performance of a standard Transformer with temporal chaining on the rotating shapes dataset. Logs include training curves for loss, reconstruction metrics, and visualizations of predicted shape rotations. This serves as a key baseline for comparison against the memory-efficient TempVerseFormer.
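
To make the idea of temporal chaining concrete, here is a minimal sketch in which the encoder is applied step by step and each predicted latent is fed back in as the next input. The class name, shapes, and hyperparameters are illustrative assumptions, not the project's actual code.

```python
import torch
import torch.nn as nn

# Minimal sketch of temporal chaining with a vanilla Transformer encoder.
# Names and dimensions are illustrative assumptions, not the project's code.
class ChainedTempFormer(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, z0, n_steps):
        # z0: (batch, tokens, dim) latent of the initial frame
        z, outputs = z0, []
        for _ in range(n_steps):
            z = self.encoder(z)   # predict the latent at the next time step
            outputs.append(z)     # each intermediate state is kept for the loss
        return torch.stack(outputs, dim=1)  # (batch, n_steps, tokens, dim)
```

With ordinary backpropagation every intermediate state in this chain must be cached, which is exactly the memory cost the reversible variant below avoids.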

TempVerseFormer (Reversible Transformer)


This project showcases the training of the TempVerseFormer, the core Reversible Temporal Transformer architecture proposed in our research. These logs demonstrate the training process using time-agnostic backpropagation and reversible blocks for memory efficiency. Experiments cover various temporal patterns (constant, accelerated, oscillating, interrupted, combined rotations) to evaluate the model's predictive accuracy and robustness. Observe the training curves, performance metrics, and generated sample images to assess TempVerseFormer's capabilities.
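
A generic RevNet-style reversible coupling illustrates why such blocks allow activations to be recomputed rather than stored; this is only a sketch, and the actual TempVerseFormer block may be structured differently.

```python
import torch.nn as nn

# Generic reversible residual coupling (RevNet-style); illustrative only.
class ReversibleBlock(nn.Module):
    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g  # e.g. attention and feed-forward sub-layers

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Inputs are reconstructed exactly from outputs, so forward
        # activations do not need to be cached for backpropagation.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```

Because each block's inputs can be recovered from its outputs, memory usage stays roughly constant in the number of chained time steps rather than growing linearly.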

Standard Transformer (Pipe-Transformer)


This project documents the training of a Standard Transformer architecture, referred to as "Pipe-Transformer" in our work. This model processes the entire input context at once, without temporal chaining, serving as a non-recurrent baseline. The logs here show how a typical Transformer performs when applied to temporal data in a non-recurrent manner. Compare these results to TempFormer and TempVerseFormer to highlight the benefits of temporal chaining and reversibility.
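
For contrast with the chained models, the non-recurrent baseline can be pictured as a single forward pass over the whole temporal window; again a sketch with assumed names and shapes, not the project's actual code.

```python
import torch
import torch.nn as nn

# Sketch of the non-recurrent baseline: the whole temporal window is flattened
# into one token sequence and processed in a single pass. Shapes are assumptions.
class PipeTransformer(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, z_seq):
        # z_seq: (batch, time, tokens, dim) -> one long sequence over time and tokens
        b, t, n, d = z_seq.shape
        out = self.encoder(z_seq.reshape(b, t * n, d))
        return out.reshape(b, t, n, d)
```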

LSTM


This project tracks the training of an LSTM (Long Short-Term Memory) network, a traditional recurrent neural network architecture. These logs provide a performance baseline using a conventional sequence model on the rotating shapes dataset. Comparing the LSTM training curves and metrics with the Transformer-based models (TempFormer and TempVerseFormer) illustrates the relative strengths and weaknesses of recurrent vs. Transformer approaches for this temporal modeling task.
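
For reference, the recurrent baseline is conceptually a standard LSTM run over the latent sequence; the hidden size and layer count below are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Minimal LSTM baseline sketch; hyperparameters are illustrative assumptions.
class LSTMBaseline(nn.Module):
    def __init__(self, dim=256, hidden=512, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=dim, hidden_size=hidden,
                            num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, dim)

    def forward(self, z_seq):
        # z_seq: (batch, time, dim); the hidden state carries history step to step
        out, _ = self.lstm(z_seq)
        return self.head(out)  # per-step prediction of the next latent
```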