The TransformerEncoder

In this report we will investigate the results of experiments using a modified Transformer model to forecast COVID-19 cases in densely populated United States counties. We will examine these results both with and without formal transfer learning. There are four experiment scenarios we aim to study:

Our evaluation methods are defined as follows:

Our model architecture is displayed visually below. Fundamentally, every time series problem involves a set of measurements; in this case our measurements are new cases, weekday, month, and six forms of mobility data. That gives nine total measurements, so the input to our model has shape (batch_size, forecast_hist, 9).
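To make those shapes concrete, here is a minimal sketch of feeding such an input through PyTorch's nn.TransformerEncoder. The sequence length, model dimension, and layer count below are illustrative assumptions, not the settings used in these experiments.

```python
import torch
import torch.nn as nn

# Illustrative sizes -- forecast_hist is the number of past days the model sees
batch_size, forecast_hist, n_features = 32, 30, 9

# One row per day: new cases, weekday, month, and the six mobility series
x = torch.randn(batch_size, forecast_hist, n_features)

# A linear layer lifts the 9 raw features to the encoder's model dimension
d_model = 64
embed = nn.Linear(n_features, d_model)
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# nn.TransformerEncoder expects (seq_len, batch, d_model) by default
out = encoder(embed(x).permute(1, 0, 2))
print(out.shape)  # torch.Size([30, 32, 64])
```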

Transformer Model Diagram

Hopefully that diagram explains our model reasonably well. The basic idea is that the top linear layer and the forecast-length output layer are the only layers initialized from scratch; the rest reuse their pre-trained weights.
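As a rough sketch of that partial re-initialization (the class, layer names, and checkpoint path below are placeholders for illustration, not the actual code from these experiments):

```python
import torch
import torch.nn as nn

class ForecastModel(nn.Module):
    """Minimal stand-in for the model in the diagram; layer names are illustrative."""
    def __init__(self, n_features=9, d_model=64, forecast_len=15):
        super().__init__()
        self.input_linear = nn.Linear(n_features, d_model)           # trained fresh
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)    # pre-trained
        self.forecast_head = nn.Linear(d_model, forecast_len)        # trained fresh

    def forward(self, x):                        # x: (batch, forecast_hist, 9)
        h = self.encoder(self.input_linear(x).permute(1, 0, 2))
        return self.forecast_head(h[-1])         # forecast from the last time step

model = ForecastModel()

# Copy over only the pre-trained encoder weights; the two fresh layers keep
# their random initialization ("pretrained.pth" is a placeholder path)
state = torch.load("pretrained.pth", map_location="cpu")
keep = {k: v for k, v in state.items()
        if not k.startswith(("input_linear", "forecast_head"))}
model.load_state_dict(keep, strict=False)
```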


Counties to Study

We aim to study the following counties in this report.

Parameter importance

So what does this mean?


Examining Test MSE Metrics

Warning: although effort was taken to train on exactly the same time span, data issues mean that results for some counties may use slightly different train and test days. These differences are generally limited to two or three days at most and should have minimal impact on performance. We are currently re-running the problematic experiments on exactly the same rows.

Now we will look at the best runs in terms of total test_loss (MSE).
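As a reminder, test_loss here is the mean squared error between the model's forecasts and the observed values over the test window. A minimal version of that computation, assuming simple prediction and target tensors, looks like:

```python
import torch

def test_mse(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Mean squared error over every forecast day in the test window."""
    return torch.mean((pred - target) ** 2).item()
```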


Top Performing Runs on Last Week

In this next part of the report we will look at how the best models performed when forecasting a fifteen-day period beginning 5/30/2020.
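For reference, that evaluation window runs from 5/30/2020 through 6/13/2020; a quick way to enumerate it (assuming pandas is available) is:

```python
import pandas as pd

# Fifteen consecutive days starting 5/30/2020
eval_days = pd.date_range("2020-05-30", periods=15, freq="D")
print(eval_days[0].date(), "->", eval_days[-1].date())  # 2020-05-30 -> 2020-06-13
```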


Takeaways and Analysis

Things to investigate further