Prediction Training Dashboard

Created on August 19 | Last edited on September 9

Goals

This dashboard documents the model training code, experiments, and tracked metrics. We analyze the tracked metrics from multiple trials and present our key findings about the prediction model.

For full reproducibility, the code used to train the model is documented below.

Code
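A minimal sketch of the training loop, assuming a recent Ray Train (2.x) API with a PyTorch model. `build_model` and `build_dataloader` are hypothetical stand-ins for this project's model and rasterized-dataset builders, and the config values shown are illustrative; the reported metric names match the `loss` and `avg_loss` columns tracked in the trial table below.

```python
import torch
import torch.nn as nn
from ray import train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_loop(config):
    # Hypothetical builders: the real project constructs the rasterized
    # dataset and a CNN backbone from config["cfg"] (model_params, raster_params).
    model = build_model(config["cfg"])
    loader = build_dataloader(
        config["cfg"],
        batch_size=config["batch_size"],
        shuffle=config["shuffle"],
        num_workers=config["num_workers"],
    )
    # Ray wraps the model/loader for distributed training on the workers.
    model = train.torch.prepare_model(model)
    loader = train.torch.prepare_data_loader(loader)

    optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
    criterion = nn.MSELoss()

    losses = []
    for step, batch in enumerate(loader):
        if step >= config["max_num_steps"]:
            break
        pred = model(batch["image"])
        loss = criterion(pred, batch["target_positions"].flatten(1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        losses.append(loss.item())
        # Per-step loss and running average; these become the `loss` and
        # `avg_loss` columns in the trial table below.
        train.report({"loss": loss.item(), "avg_loss": sum(losses) / len(losses)})

trainer = TorchTrainer(
    train_loop,
    train_loop_config={
        "batch_size": 16,          # illustrative values
        "dataset_key": "scenes/train.zarr",
        "lr": 6e-4,
        "max_num_steps": 5000,
        "num_workers": 4,
        "shuffle": True,
        "cfg": cfg,                # config dict tracked under config/train_loop_config/cfg
    },
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
)
```

Calling `trainer.fit()` launches a single run; the hyperparameter search in the next section wraps this same trainer in a Tuner.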

Analysis of Grouped Trials from Latest Run

We conduct a hyperparameter search using Ray Tune, sketched below, and analyze the results in the following table.
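A minimal sketch of the search setup, assuming Ray Tune with the Optuna search algorithm; the parameter ranges are illustrative, not the exact space used for these runs.

```python
from ray import tune
from ray.tune.search.optuna import OptunaSearch

# Illustrative search space over the same keys tracked in the trial table.
param_space = {
    "train_loop_config": {
        "lr": tune.loguniform(1e-4, 1e-2),
        "batch_size": tune.choice([8, 16, 24]),
        "max_num_steps": tune.choice([2000, 5000, 10000]),
    }
}

tuner = tune.Tuner(
    trainer,  # the TorchTrainer defined under "Code" above
    param_space=param_space,
    tune_config=tune.TuneConfig(
        search_alg=OptunaSearch(),  # TPE sampler: narrows the space as trials complete
        metric="avg_loss",
        mode="min",
        num_samples=56,             # e.g. the 56 trials in the run sets below
    ),
)
results = tuner.fit()
```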

[Trial results table: one row per trial. Columns include the tracked metrics (loss, avg_loss, training_iteration, time_total_s, trial_id, experiment_id, hostname, logdir, ...) and the flattened hyperparameters under config/train_loop_config/ (batch_size, dataset_key, lr, max_num_steps, num_workers, shuffle) and config/train_loop_config/cfg/ (model_params such as model_architecture, history_num_frames, future_num_frames; raster_params such as map_type, pixel_size, raster_size, satellite_map_key, semantic_map_key; train_params; train/val data loader settings).]
Key insights
  • Lower learning rates (~0.0006) yield a lower avg loss (~10 MSE).
  • Training on semantic rasters usually yields a lower avg loss than training on satellite rasters.
  • The model loss stabilizes after roughly 5,000 training steps.
  • Higher learning rates (>0.006) result in the avg loss (>50 MSE) never stabilizing during training.

Training Metrics


[Training metric charts for the run set (56 runs)]

Key insights
  • The largest batch size (24) in our experiments results in roughly 80% GPU utilization, so we could likely increase the batch size further to get more out of the compute.
  • Higher learning rates cause instability in training.

Tuned Hyperparameter Analysis


[Tuned hyperparameter charts for the run set (56 runs)]

Key insights
  • It's evident from the plot above that Ray Tune's Optuna search iteratively narrows the hyperparameter search space to quickly find the best configuration.
  • Lower learning rates (<0.0009) combined with a higher number of training steps yield a lower average loss.
  • Larger batch sizes combined with lower learning rates yield a lower average loss.
  • There is very little correlation between the raster type (semantic vs. satellite) and the average loss.
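As a final check, the best configuration can be pulled directly from the same ResultGrid; a minimal sketch assuming the `results` object from the search above:

```python
# Best trial by average loss across all 56 trials.
best = results.get_best_result(metric="avg_loss", mode="min")
print(best.metrics["avg_loss"])
print(best.config["train_loop_config"])  # e.g. lr, batch_size, max_num_steps
```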
