Diffusion on the Clouds: Short-term solar energy forecasting with Diffusion Models
Using diffusion models to predict cloud movement on satellite imagery to forecast solar energy production
In this article, we trained a diffusion model to predict future cloud movement, training on a dataset of multiband satellite imagery. The idea was to build a proof of concept that repurposes diffusion-model techniques to "outpaint" the future: given a sequence of satellite images, the model generates the missing future frame.
This is the accompanying article for our GTC 2023 session "Diffusion on the Clouds: Short-term solar energy forecasting with Diffusion Models".
Solar Energy: A cheap energy resource but with high variability
Solar energy is a great resource with immense potential to provide clean, renewable, and sustainable energy for many years to come. However, one of its drawbacks is unpredictability: the availability of solar energy depends on many factors, such as weather patterns, geography, and time of day. Forecasting solar energy production is therefore essential to maximize its usage and manage its variability.
Solar energy production is affected by various variables that can change rapidly and unpredictably. Cloud cover, temperature variations, and other weather elements can cause significant fluctuations in solar production throughout the day, making it difficult to determine exactly how much solar power systems will produce on any given day. Accurate forecasting helps energy providers and grid operators to prepare for fluctuations in solar energy, providing them with a better understanding of how much energy they can expect to receive from solar sources.

You basically want to predict when this is going to happen (from SteadySun)
The importance of solar energy forecasting is magnified when we consider the growing trend of integrating renewable generation into the power grid. Grid operators must balance the supply and demand of energy on a minute-by-minute basis, and any fluctuations in supply can cause stability issues or even blackouts. By accurately forecasting solar production, grid operators can better manage energy demand and supply, reducing the challenges associated with grid instability and other operational issues. Additionally, reliable forecasts of solar power output can help energy providers to optimize their energy portfolios, allowing them to strategically plan for energy procurement and maximize the deployment of energy storage solutions.
In conclusion, while solar energy is a great resource, it is unpredictable, and forecasting is essential in understanding how much energy will be produced on any given day. Accurate solar production forecasts can help energy providers and grid operators to balance the supply and demand of energy in real-time, minimize disruption to the power grid, and ultimately help to accelerate the transition to a more environmentally sustainable and renewable energy future.
The Dataset 🛰️
We have partnered with SteadySun, a leader in solar energy forecasting; they have been doing this for a while and clearly understand both the technology and the market. They provided a dataset of satellite imagery for us to work with. We can take a look at the data in the table below: each row represents one day of data, and each column one spectral band. Some bands are not always available, like HRV (High Resolution Visible).
🛰️ Check the info on the satellite bands here.
The Problem: Predicting the future cloud movement
Most of the work shown here is based on our report on next frame prediction on MovingMNIST:
Next Frame Prediction Using Diffusion: The fastai Approach
In this article, we look at how to use diffusion models to predict the next frame on a sequence of images, and we iterate fast over the MovingMNIST dataset.
Using Stable Diffusion VAE to encode satellite images
We take the pretrained Stable Diffusion Variational Autoencoder (VAE) to encode satellite imagery into a latent space.
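As a concrete illustration, here is a minimal sketch of encoding frames with the pretrained Stable Diffusion VAE through the Hugging Face diffusers library. The checkpoint name and the preprocessing are assumptions for the sketch; the exact setup for the satellite bands may differ.

```python
# Minimal sketch: encode/decode frames with the pretrained Stable Diffusion VAE.
# Assumes the `diffusers` library and the public "stabilityai/sd-vae-ft-ema"
# checkpoint; single-band satellite frames can be tiled to 3 channels first,
# e.g. frames.repeat(1, 3, 1, 1).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema").eval()

@torch.no_grad()
def encode_frames(frames: torch.Tensor) -> torch.Tensor:
    """frames: (B, 3, H, W) in [0, 1]. Returns latents of shape (B, 4, H/8, W/8)."""
    x = frames * 2 - 1                       # the VAE expects inputs in [-1, 1]
    latents = vae.encode(x).latent_dist.sample()
    return latents * 0.18215                 # Stable Diffusion's latent scaling factor

@torch.no_grad()
def decode_latents(latents: torch.Tensor) -> torch.Tensor:
    x = vae.decode(latents / 0.18215).sample
    return (x + 1) / 2                       # back to [0, 1]
```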
We formulate the problem as an autoregressive task: given three previous frames, can we predict the fourth?
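To make the setup concrete, here is a minimal sketch of the sliding-window construction; `make_windows` and the tensor shapes are hypothetical helpers for illustration, not our exact dataset code.

```python
import torch

def make_windows(sequence: torch.Tensor, context: int = 3):
    """Slice a day-long sequence (T, C, H, W) into (past frames, target frame) pairs.

    Hypothetical helper illustrating the autoregressive setup: each sample is
    three past frames plus the fourth frame to predict.
    """
    samples = []
    for t in range(len(sequence) - context):
        past = sequence[t : t + context]     # (3, C, H, W)
        target = sequence[t + context]       # (C, H, W)
        samples.append((past, target))
    return samples
```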

Images are captured by the satellite every 15 minutes
Are you able to predict what's going to happen with the cloud in the future frames?

🤯 The cloud is gone!!!
If we can create a model that is expressive enough to account for cloud "destruction" and "creation", we will have something useful.
Repurposing Diffusion Models as Next Frame Predictors
In the previous article, we showed that we could use diffusion models to "outpaint" the future by conditioning the denoising process on past frames. It worked exceptionally well, and the code base is short and straightforward.
We created an initial baseline by retraining the same model used for MovingMNIST with a simple DDPM training script. Input images are resized to 64x64, and only one band is used, for fast iteration. This dataset has around 20k sequence samples.
As with MovingMNIST, the task consists of feeding the model three frames and predicting the fourth. To do so, we sample random noise from a normal distribution and progressively denoise the fourth frame. This formulation of the problem is straightforward and, once the model is trained, enables sampling multiple possible futures.
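Here is a sketch of what that conditioned sampling loop looks like, assuming a diffusers-style UNet and DDPM scheduler; the function and argument names are illustrative rather than our exact training-script API.

```python
import torch

@torch.no_grad()
def sample_next_frame(model, scheduler, past_frames, shape):
    """Sample one future frame conditioned on past frames (DDPM-style loop).

    past_frames: (B, 3, H, W) -- three clean context frames, one band each.
    The model sees [past_frames, noisy_frame] stacked on channels and predicts
    the noise on the last channel only.
    """
    noisy = torch.randn(shape, device=past_frames.device)   # start from pure noise
    for t in scheduler.timesteps:
        inp = torch.cat([past_frames, noisy], dim=1)        # condition by concatenation
        noise_pred = model(inp, t).sample
        noisy = scheduler.step(noise_pred, t, noisy).prev_sample
    return noisy
```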

A batch of input data consisting of 3 past frames and a noisy image. The number represents the timestep for the noise schedule.
Bigger Model + More data
We present different possible generations of 10 future frames conditioned on three past frames (13 frames in total).
You can compare the generation with the Ground Truth column (gt) and frame by frame on the gt/gen column.
These results come from the model retrained on the bands B07 and IR108 stacked together, with input images resized to 128x128 pixels. Two caveats:
- No EMA was used (see the sketch after this list for what adding it could look like).
- No scheduler scaling was applied; we could probably improve by retraining with better scheduler hyperparameters, as explained in this article.
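For reference, a minimal sketch of the exponential moving average (EMA) weight tracking we skipped; this is illustrative, not from our codebase.

```python
import copy
import torch

class EMA:
    """Minimal exponential moving average of model weights (illustrative sketch)."""

    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.ema_model = copy.deepcopy(model).eval()
        for p in self.ema_model.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # ema = decay * ema + (1 - decay) * param, applied after each optimizer step
        for ema_p, p in zip(self.ema_model.parameters(), model.parameters()):
            ema_p.lerp_(p, 1.0 - self.decay)
```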
Full size image model: Simple Diffusion
We use @lucidrains' code from here; this model can perform training and inference on full-size images without a VAE by tiling/patching the input images straight away.
- Full-size image input and output (512x512)
- Simple ViT tiling/patching downsizes images to a 64x64 token grid (see the sketch after this list)
- It can scale to multiple transformer layers, so it is highly parallelizable.
- Trained for 30 epochs
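As referenced above, here is a sketch of the ViT-style patching: with a patch size of 8, a 512x512 image becomes a 64x64 grid of tokens, each token flattening one 8x8 patch. This is an illustrative reimplementation, not @lucidrains' actual code.

```python
import torch

def patchify(imgs: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """Rearrange (B, C, 512, 512) images into a (B, 64*64, C*8*8) token sequence."""
    B, C, H, W = imgs.shape
    # carve the image into non-overlapping patch x patch tiles
    x = imgs.unfold(2, patch, patch).unfold(3, patch, patch)  # (B, C, H/p, W/p, p, p)
    # flatten each tile into one token: (B, num_tokens, token_dim)
    x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, (H // patch) * (W // patch), C * patch * patch)
    return x
```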
It uses a bunch of other tricks to achieve better training stability and faster convergence:
- The noise schedule is adjusted for high resolution images (see the sketch after this list)
- It is sufficient to scale only a particular part of the architecture
- Dropout is added at specific locations in the architecture
- Downsampling is an effective strategy to avoid high resolution feature maps
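The first trick is worth sketching: the Simple Diffusion paper shifts the cosine log-SNR schedule by a resolution-dependent constant so that the per-pixel noise level at high resolution stays comparable to the 64x64 base case. The sketch below is from our reading of the paper; verify against the reference implementation before relying on it.

```python
import math

def shifted_cosine_logsnr(t: float, resolution: int, base: int = 64) -> float:
    """Cosine log-SNR shifted for image resolution (our reading of Simple Diffusion).

    The standard cosine schedule gives logSNR(t) = -2 * log(tan(pi * t / 2)) for
    t in (0, 1); shifting it by 2 * log(base / resolution) makes the schedule
    noisier at higher resolutions, where per-pixel redundancy is larger.
    """
    return -2.0 * math.log(math.tan(math.pi * t / 2.0)) + 2.0 * math.log(base / resolution)
```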