
What does it take to write a NeurIPS paper?

This is how two participants at NASA's Frontier Development Lab (FDL), Sairam Sundaresan and J. Emmanuel Johnson, published a paper at NeurIPS.


NASA's Frontier Development Lab

The Frontier Development Lab (FDL) applies AI to science to push the frontiers of research and develop new tools that help solve some of the biggest challenges humanity faces. Researchers from different scientific disciplines come together to work on an eight-week challenge.


Problem Statement

Magnetic activity in stars manifests as dark spots on their surfaces that modulate the brightness observed by telescopes. These light curves contain important information about stellar rotation. However, accurately estimating rotation periods is computationally expensive due to scarce ground-truth information, noisy data, and large parameter spaces that lead to degenerate solutions.
The team's goal was to accurately predict stellar rotation periods from Kepler light curves.
The team spent the first week understanding the problem statement and brainstorming solutions. Given the online nature of the sprint, collaborating was hard, and the team relied on online services like Miro and W&B.
Simply put, their goal was to develop a model that, given a light curve, would accurately predict the star's rotation period. Solving this supervised learning problem required good-quality labeled data, so the team decided to build a basic pipeline first.
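The article doesn't show the team's data-loading code, but the data side of such a pipeline might look like the minimal sketch below, assuming the open-source lightkurve package (not mentioned in the article) and a purely hypothetical label table mapping Kepler IDs to known rotation periods:

```python
# Sketch only: `lightkurve` and the label values below are assumptions,
# not the team's actual pipeline.
import lightkurve as lk
import numpy as np

def fetch_flux(kic_id: str, quarter: int = 9) -> np.ndarray:
    """Download one Kepler light curve and return cleaned, normalized flux."""
    lc = lk.search_lightcurve(kic_id, mission="Kepler", quarter=quarter).download()
    lc = lc.remove_nans().normalize()
    return np.asarray(lc.flux, dtype=np.float32)

# Hypothetical labeled set: (star ID, rotation period in days).
labeled_stars = [("KIC 10875245", 25.0), ("KIC 3733346", 19.7)]

X = [fetch_flux(kic) for kic, _ in labeled_stars]  # inputs: flux time series
y = np.array([p for _, p in labeled_stars])        # targets: rotation periods
```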


The Two Aha Moments

The team spent the next six to seven weeks developing the solution. At first glance, multiple light curves looked exactly the same, with no discernible pattern; the problem was extremely degenerate. They started with simple models like a random forest and a CNN. However, the loss was flat: the models weren't picking up anything.
They had to compute the rotation periods from the raw data using the physics community's accepted state-of-the-art method, the autocorrelation function (ACF) mentioned in the paper. They realized that these ACF estimates were noisy for all the stars in the catalog, and this was keeping the models from converging. They then cleaned their training, test, and validation sets to keep only the stars for which the McQuillan rotation estimate was available. This was their first "aha" moment.
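To make the ACF idea concrete, here is a minimal sketch assuming an evenly sampled, normalized flux array. The real McQuillan-style pipeline includes detrending, careful smoothing, and peak-quality checks that are omitted here:

```python
# Sketch only: illustrates the ACF intuition, not the paper's full pipeline.
import numpy as np

def acf_rotation_period(flux: np.ndarray, cadence_days: float = 0.0204) -> float:
    """Estimate the rotation period as the lag of the first positive ACF peak."""
    flux = flux - flux.mean()
    acf = np.correlate(flux, flux, mode="full")[flux.size - 1:]
    acf /= acf[0]  # normalize so lag 0 has ACF = 1
    k = 51  # ~1 day of Kepler long-cadence samples
    s = np.convolve(acf, np.ones(k) / k, mode="same")  # suppress noise wiggles
    for i in range(k + 1, s.size - 1):  # skip smoothing edge effects near lag 0
        if s[i - 1] < s[i] > s[i + 1] and s[i] > 0:  # first positive local max
            return i * cadence_days
    return float("nan")  # no significant repetition found

# Example: noisy sinusoid with a 12.5-day period at Kepler's ~29.4 min cadence.
rng = np.random.default_rng(42)
t = np.arange(0, 90, 0.0204)
flux = np.sin(2 * np.pi * t / 12.5) + 0.1 * rng.standard_normal(t.size)
print(acf_rotation_period(flux))  # ~12.5
```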
Initially, the team fed the time series to a 1D CNN regressor. But this model had its limitations and did not achieve high enough accuracy. The second "aha" moment came when they mapped the light curves into images by applying three transformations and stacking the results along the channel axis.
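The article doesn't name the three transformations, so the sketch below uses one common trio for encoding a time series as image channels, a Gramian angular field, a Markov transition field, and a recurrence plot (via the pyts package), purely to illustrate the stacking idea:

```python
# Sketch only: the choice of transformations is an assumption, not the paper's.
import numpy as np
from pyts.image import GramianAngularField, MarkovTransitionField, RecurrencePlot

def light_curve_to_image(flux: np.ndarray, size: int = 64) -> np.ndarray:
    """Map a 1D flux series to a (3, size, size) image for a 2D CNN."""
    # Resample so all three transforms yield size x size images.
    x = np.interp(np.linspace(0, flux.size - 1, size), np.arange(flux.size), flux)
    x = x[np.newaxis, :]  # pyts expects shape (n_samples, n_timestamps)
    channels = [
        GramianAngularField().fit_transform(x)[0],
        MarkovTransitionField().fit_transform(x)[0],
        RecurrencePlot().fit_transform(x)[0],
    ]
    return np.stack(channels, axis=0)  # stack along the channel axis

image = light_curve_to_image(np.sin(np.linspace(0, 20, 1000)))
print(image.shape)  # (3, 64, 64)
```

The resulting (3, 64, 64) array has the same shape as an RGB image, which is what allows a standard pre-trained 2D CNN to be reused on light curves.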


Debugging and Reproducibility

"We could not have done so much if we were not able to log and experiment "
When talking about debugging their experiments, the authors described using W&B to sanity-check their results. It was through this debugging that the team realized they needed better-quality labels, and W&B let them verify that the pipeline worked correctly. To narrow down the search space and find the right hyperparameters, the team used W&B Sweeps: they launched a large number of hyperparameter sweeps and found the configuration that gave the lowest loss.
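A minimal sketch of a W&B sweep follows; the project name, hyperparameter names, and the placeholder train() function are illustrative, since the team's actual search space isn't given in the article:

```python
# Sketch only: hyperparameters and train() are placeholders.
import wandb

sweep_config = {
    "method": "bayes",  # "grid" and "random" are also available
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2,
        },
        "batch_size": {"values": [32, 64, 128]},
    },
}

def train():
    run = wandb.init()          # the agent injects this trial's hyperparameters
    lr = run.config.learning_rate
    bs = run.config.batch_size
    # ... build the model and train with lr and bs ...
    run.log({"val_loss": 0.1})  # placeholder for the real validation loss

sweep_id = wandb.sweep(sweep_config, project="stellar-rotation")
wandb.agent(sweep_id, function=train, count=20)  # run 20 trials
```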
"W&B has changed the way I think about model training"
Thanks to W&B, the team did not have to maintain a whole suite of bookkeeping code of their own. The authors ran a large number of simulations and consolidated the results in W&B. They are extremely proud of the fact that every aspect of the experiment is completely reproducible. Without reproducibility, they could not have achieved this level of consistency, especially when they needed to debug. It gave them peace of mind and faith in their code, since they could verify it at any time.
The authors also took advantage of the built-in visualizations to debug their experiments.
When asked about their experience with W&B, the authors explained that it took them only 10 minutes to set up and was extremely easy to understand. They valued having W&B as a central platform for collaboration.
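A minimal sketch of this kind of logging, with hypothetical project and metric names and synthetic data standing in for a real training loop (running it requires a W&B account):

```python
# Sketch only: project, config, and the synthetic metrics/data are placeholders.
import numpy as np
import matplotlib.pyplot as plt
import wandb

run = wandb.init(project="stellar-rotation", config={"lr": 3e-4, "epochs": 50})

rng = np.random.default_rng(0)
for epoch in range(run.config.epochs):
    # Fake decreasing losses standing in for a real training loop.
    run.log({"epoch": epoch,
             "train_loss": 1.0 / (epoch + 1),
             "val_loss": 1.2 / (epoch + 1)})

# Log a custom figure, e.g. predicted vs. true rotation periods (fake data).
true_p = rng.uniform(1, 50, 100)
pred_p = true_p + rng.normal(0, 2, 100)
fig, ax = plt.subplots()
ax.scatter(true_p, pred_p, s=10)
ax.set_xlabel("True period (days)")
ax.set_ylabel("Predicted period (days)")
run.log({"pred_vs_true": wandb.Image(fig)})
run.finish()
```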


Results

Exoplanet-hunting missions and surveys (e.g., Kepler and TESS) have already generated terabytes of stellar light curves that are a treasure trove of data for understanding exoplanet hosts, stellar rotation, and magnetism. However, traditional algorithms used to estimate stellar properties, like the ACF, are expensive and require long observational baselines. The team demonstrated that their pipeline (RotNet), based on a supervised, pre-trained convolutional neural network, estimates stellar rotation periods with accuracy similar to the full ACF approach, while using 65 times fewer data points and running 10,000 times faster.


Advice on Structuring ML Projects

Lastly, the authors gave some great advice on how to structure ML projects based on their experience participating in FDL 2020.
  • Structuring code correctly is extremely important.
  • It is essential to follow naming conventions for branch and commit names.
  • The choice between notebooks and scripts depends on the project you are developing.
  • When the team tried TensorBoard, they found it lacked clear organization, sweeps, and customizable visualizations.
  • The team chose PyTorch Lightning. With the basic training and evaluation loops standardized, they could focus on constructing experiments and custom data loaders instead (a minimal sketch follows this list).
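The sketch below shows the PyTorch Lightning pattern the team describes: the LightningModule standardizes the train/eval loop, and the WandbLogger forwards logged metrics to W&B. The architecture and names are illustrative, not the team's actual model:

```python
# Sketch only: a toy stand-in for the pre-trained backbone in the article.
import torch
import torch.nn as nn
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger

class RotationRegressor(pl.LightningModule):
    def __init__(self, lr: float = 3e-4):
        super().__init__()
        self.save_hyperparameters()
        # Small CNN over (3, 64, 64) light-curve images.
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self(x), y)
        self.log("train_loss", loss)  # forwarded to W&B by the logger
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

# Usage (with a real DataLoader of (image, period) pairs):
# trainer = pl.Trainer(max_epochs=10,
#                      logger=WandbLogger(project="stellar-rotation"))
# trainer.fit(RotationRegressor(), train_dataloaders=train_loader)
```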



