In this report we investigate the results of experiments that use a modified Transformer model to forecast the spread of COVID-19 in densely populated United States counties. We examine these results both with and without formal transfer learning. There are four experiment scenarios we aim to study:
Our evaluation methods are defined as follows:
Our model architecture is displayed visually below. Fundamentally, every time series problem has a number of measurements at each time step. In this case our measurements are new cases, weekday, month, and six forms of mobility data. Thus we have nine measurements in total, which means the input to our model has shape (batch_size, forecast_hist, 9), where forecast_hist is the number of past time steps the model sees.
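To make the input shape concrete, here is a minimal sketch of slicing a daily series into windows of shape (num_windows, forecast_hist, 9), from which batches of batch_size are drawn. The column names, the `make_windows` helper, the `forecast_length` parameter, and the assumption that the target is the new-cases column are all hypothetical illustrations, not necessarily how our pipeline builds its inputs.

```python
import numpy as np

# Hypothetical column order, assumed for illustration only.
FEATURES = ["new_cases", "weekday", "month"] + [f"mobility_{i}" for i in range(1, 7)]


def make_windows(data: np.ndarray, forecast_hist: int, forecast_length: int):
    """Slice a (num_days, 9) array into model inputs and new-case targets.

    Returns X with shape (num_windows, forecast_hist, 9) and y with shape
    (num_windows, forecast_length), where y holds the next forecast_length
    values of column 0 (assumed here to be new cases).
    """
    if data.ndim != 2 or data.shape[1] != len(FEATURES):
        raise ValueError(f"expected shape (num_days, {len(FEATURES)}), got {data.shape}")
    num_windows = data.shape[0] - forecast_hist - forecast_length + 1
    if num_windows < 1:
        raise ValueError("series is too short for the requested window sizes")
    # Each input window is forecast_hist consecutive days of all nine measurements.
    X = np.stack([data[i : i + forecast_hist] for i in range(num_windows)])
    # Each target is the following forecast_length days of the new-cases column.
    y = np.stack([
        data[i + forecast_hist : i + forecast_hist + forecast_length, 0]
        for i in range(num_windows)
    ])
    return X, y


# 60 days of random stand-in data (not real county data).
rng = np.random.default_rng(0)
series = rng.random((60, len(FEATURES)))
X, y = make_windows(series, forecast_hist=20, forecast_length=5)
print(X.shape, y.shape)  # (36, 20, 9) (36, 5)
```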
Hopefully the diagram explains our model well. The basic idea is that the top linear layer and the forecast-length layer are the only layers initialized from scratch; the rest are initialized with pre-trained weights.
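One way to express "fresh top and forecast-length layers, pre-trained everything else" is to merge two state dicts by parameter name. The sketch below assumes hypothetical layer names (`top_linear.` and `forecast_length_layer.`); the real names in our model may differ, and this is an illustration of the idea rather than our exact loading code.

```python
from typing import Dict, Iterable, Mapping, TypeVar

T = TypeVar("T")


def merge_pretrained(
    pretrained: Mapping[str, T],
    fresh: Mapping[str, T],
    reinit_prefixes: Iterable[str],
) -> Dict[str, T]:
    """Keep freshly initialized weights for parameters whose names start with
    any of reinit_prefixes (or that the pre-trained model lacks), and copy
    every other parameter from the pre-trained model."""
    prefixes = tuple(reinit_prefixes)
    merged: Dict[str, T] = {}
    for name, value in fresh.items():
        if name.startswith(prefixes) or name not in pretrained:
            merged[name] = value  # keep the new, untrained weights
        else:
            merged[name] = pretrained[name]  # reuse the pre-trained weights
    return merged


# Strings stand in for weight tensors; layer names are hypothetical.
pretrained = {"top_linear.weight": "pre_A", "encoder.layers.0.weight": "pre_B",
              "forecast_length_layer.weight": "pre_C"}
fresh = {"top_linear.weight": "new_A", "encoder.layers.0.weight": "new_B",
         "forecast_length_layer.weight": "new_C"}
merged = merge_pretrained(pretrained, fresh, ["top_linear.", "forecast_length_layer."])
print(merged)
# {'top_linear.weight': 'new_A', 'encoder.layers.0.weight': 'pre_B', 'forecast_length_layer.weight': 'new_C'}
```

With PyTorch, the same function would be applied to `pretrained_model.state_dict()` and `model.state_dict()`, and the result passed to `model.load_state_dict(...)`. Matching on a trailing dot in each prefix avoids accidentally catching a layer whose name merely begins with the same characters.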
We aim to study the following counties in this report.
So what does this mean?
Warning: although we tried to train on exactly the same time span, data issues mean that results for some counties may have slightly different train and test days. These differences are generally limited to two or three days at most and should have minimal impact on performance. We are currently re-running the affected experiments on exactly the same rows.
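A simple way to flag the affected counties is to compare each county's train and test spans against a reference span and report the offsets in days. This is a minimal sketch; the `span_mismatches` helper, the county names, and the dates below are made up for illustration and are not our actual splits.

```python
from datetime import date
from typing import Dict, List, Tuple

Span = Tuple[date, date]  # (first day, last day), both inclusive


def span_mismatches(splits: Dict[str, Dict[str, Span]],
                    reference: Dict[str, Span]) -> List[str]:
    """Describe how each county's split boundaries differ from the reference, in days."""
    messages = []
    for county, county_splits in sorted(splits.items()):
        for split, (start, end) in county_splits.items():
            ref_start, ref_end = reference[split]
            if (start, end) != (ref_start, ref_end):
                messages.append(
                    f"{county} {split}: start {(start - ref_start).days:+d} days, "
                    f"end {(end - ref_end).days:+d} days"
                )
    return messages


# Made-up spans for two hypothetical counties.
reference = {"train": (date(2020, 3, 1), date(2020, 4, 15)),
             "test": (date(2020, 4, 16), date(2020, 4, 30))}
splits = {
    "county_a": dict(reference),
    "county_b": {"train": (date(2020, 3, 3), date(2020, 4, 15)),
                 "test": (date(2020, 4, 16), date(2020, 4, 30))},
}
for line in span_mismatches(splits, reference):
    print(line)  # county_b train: start +2 days, end +0 days
```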
Now we will look at the best runs in terms of total test loss (test_loss, the MSE on the test set).
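For reference, ranking runs by test MSE amounts to computing the mean squared error of each run's test predictions and taking the minimum. The sketch below assumes predictions are available as plain lists; the run names and numbers are invented, and in practice one would read the logged test_loss for each run instead of recomputing it.

```python
from typing import Dict, Sequence


def mse(preds: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between predictions and observed values."""
    if len(preds) != len(actual) or not preds:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(preds)


def best_run(run_preds: Dict[str, Sequence[float]], actual: Sequence[float]) -> str:
    """Return the name of the run with the lowest test MSE."""
    return min(run_preds, key=lambda run: mse(run_preds[run], actual))


# Invented values for illustration only.
actual = [10, 12, 15]
runs = {"run_a": [11, 12, 14], "run_b": [10, 13, 17]}
print(best_run(runs, actual))  # run_a (MSE 2/3 versus 5/3 for run_b)
```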
In this next part of the report we will look at how the best models performed when forecasting the fifteen-day period beginning 5/30/2020.
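Counting the start date itself, a fifteen-day period beginning 5/30/2020 runs through 6/13/2020. A small sketch of that date arithmetic, with a hypothetical `forecast_window` helper:

```python
from datetime import date, timedelta
from typing import List


def forecast_window(start: date, num_days: int) -> List[date]:
    """Return the consecutive calendar days covered by a forecast, start included."""
    return [start + timedelta(days=i) for i in range(num_days)]


window = forecast_window(date(2020, 5, 30), 15)
print(window[0], window[-1])  # 2020-05-30 2020-06-13
```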
Things to investigate further