Check out the accompanying Kaggle kernel →


Numerai is a crowdsourced AI hedge fund that operates on predictions made by data scientists worldwide (like you)! In this report, we show you how to get started with Numerai and compete in the hardest data science tournament on the planet using Weights & Biases.

Numerai was founded by Richard Craib in 2015. Some very experienced people in quantitative finance, like Howard Morgan (Co-Founder of Renaissance Technologies) and Marcos López de Prado (Professor at Cornell University and scientific advisor to Numerai), are involved with the project.

By combining the predictions of thousands of data scientists, Numerai can gain a competitive edge over other quantitative hedge funds. In return, data scientists can benefit financially by contributing their predictions to the platform.

To get a feel for what Numerai is all about, check out this short video:


The hardest data science tournament on the planet

Practically the first thing you see on the Numerai homepage is the bold statement "The hardest data science tournament on the planet." Why is competing on Numerai (not) so hard?

Why is Numerai Hard?

Why is Numerai not That Hard?

So, How Does This all Work in Practice?

The basis of all transactions on the Numerai platform is the Numeraire (NMR) token. This token operates on the Ethereum platform and enables Numerai to facilitate transactions to its data scientists easily.

Each week, data scientists deliver predictions based on Numerai's datasets, and these predictions are combined into Numerai's meta-model, which is used for stock investments. You can stake as much NMR on your model as you want. Depending on the quality of your model, your NMR stake will increase or decrease. Staking ensures that users deliver sensible models and guards against "Sybil attacks." By delivering steady predictions every week, you build your reputation and improve your leaderboard position. Note that you do not have to share any details about your model, which makes it almost impossible for Numerai to reverse-engineer it. Numerai and its users are, therefore, dependent on each other and share the risks in a balanced way.


Data Processing

Numerai has its own API (NumerAPI) that provides a convenient interface to download the datasets, get information about the competition and upload your predictions. We can download the latest data, unzip and load it in just a few lines of code.


import numerapi
import pandas as pd
NAPI = numerapi.NumerAPI(verbosity="info")

# Download new data
DIR = "my_data_directory"
NAPI.download_current_dataset(dest_path=DIR, unzip=True)

# Load data
full_path = f'{DIR}/numerai_dataset_{NAPI.get_current_round()}/'
train = pd.read_csv(full_path + 'numerai_training_data.csv')
test_df = pd.read_csv(full_path + 'numerai_tournament_data.csv')

# Split validation and test
# .copy() gives independent frames, so adding prediction columns later is safe
val = test_df[test_df['data_type'] == 'validation'].copy()
test = test_df[test_df['data_type'] != 'validation'].copy()

The Numerai Dataset



Spearman Correlation

When competing on Numerai, your model will be evaluated on the "Spearman Correlation" metric. I have made a Kaggle Kernel dedicated to this metric that you can check out here. SciPy provides an excellent implementation to calculate the Spearman correlation:


from scipy.stats import spearmanr

def spearman(y_true, y_pred, axis=0):
    """ Calculate Spearman correlation """
    # spearmanr returns (correlation, p-value); keep only the correlation
    return spearmanr(y_true, y_pred, axis=axis)[0]
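To make the metric more concrete, the sketch below shows that the Spearman correlation is the Pearson correlation computed on ranks, so only the ordering of your predictions matters and not their scale. The toy arrays are made up for illustration and are not Numerai data.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

# Toy data (made up for illustration): the predictions order the targets
# perfectly, but sit on a very different scale.
y_true = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y_pred = np.array([0.10, 0.11, 0.40, 0.41, 0.99])

# Spearman correlation straight from SciPy (index 0 is the correlation).
rho_scipy = spearmanr(y_true, y_pred)[0]

# The same value computed as the Pearson correlation of the ranks.
ranks_true = pd.Series(y_true).rank()
ranks_pred = pd.Series(y_pred).rank()
rho_ranks = np.corrcoef(ranks_true, ranks_pred)[0, 1]

print(rho_scipy, rho_ranks)
```

Because the ordering is identical, both values come out as a perfect correlation even though the raw numbers differ.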

Sharpe Ratio

Even though Spearman Correlation is the main metric, it does not take into account how stable your model is across multiple eras. Therefore, it is generally more useful to monitor the "Sharpe ratio". This metric is used a lot in quantitative finance. The basic Sharpe ratio for Numerai predictions can be calculated by taking the average correlation per era and dividing by the standard deviation of the correlations per era.

In Python code the calculation looks something like this:

import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def sharpe(df: pd.DataFrame) -> np.float32:
    """
    Calculate the Sharpe ratio by using grouped per-era data

    :param df: A Pandas DataFrame containing the columns "era", "target_kazutsugi" and "prediction_kazutsugi"
    :return: The Sharpe ratio for your predictions.
    """
    def _score(sub_df: pd.DataFrame) -> np.float32:
        """ Calculate Spearman correlation for Pandas' apply method """
        # spearmanr returns (correlation, p-value); index 0 is the correlation
        return spearmanr(sub_df["target_kazutsugi"], sub_df["prediction_kazutsugi"])[0]

    corrs = df.groupby("era").apply(_score)
    return corrs.mean() / corrs.std()

# Get Sharpe Ratio for validation data
# (val must already contain a "prediction_kazutsugi" column)
numerai_sharpe = sharpe(val)
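As a small worked example of why stability matters, the sketch below compares two hypothetical models that have the same average per-era correlation. The per-era values are invented for illustration, and `sharpe_from_corrs` is a helper name introduced here, not part of the original code.

```python
import pandas as pd

def sharpe_from_corrs(corrs: pd.Series) -> float:
    """Mean of the per-era correlations divided by their standard deviation."""
    return corrs.mean() / corrs.std()

# Invented per-era Spearman correlations for two hypothetical models.
stable = pd.Series([0.02, 0.03, 0.02, 0.03])
erratic = pd.Series([0.10, -0.05, 0.08, -0.03])

# Both models average the same correlation across eras...
print(stable.mean(), erratic.mean())

# ...but the stable model earns a much higher Sharpe ratio.
print(sharpe_from_corrs(stable), sharpe_from_corrs(erratic))
```

Ranking these two models by mean correlation alone would call them equal, while the Sharpe ratio clearly separates them.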


For this report, we will be monitoring the Spearman correlation, Sharpe Ratio, Numerai Payout Ratio, and the Mean Absolute Error (MAE) metrics. Additionally, we calculate the feature exposure, which I will talk about in the next section.

Feature Engineering and Selection

The features have a remarkably low correlation to the target variable. Even the most correlated features only have around 1.5% correlation with the target. Engineering useful features out of feature and era groupings is key to creating good Numerai models.

Also, the importance of features may change over time. By selecting a limited number of features, we risk having a high "feature exposure." Feature exposure can be quantified as the standard deviation of the correlations between your predictions and each feature. You can mitigate this risk by using dimensionality reduction techniques like Principal Component Analysis (PCA) to integrate almost all features into your model. In this starter example, we take the 150 features that are most correlated with the target variable.

# Calculate correlations with target (numeric columns only)
full_corr = train.corr(numeric_only=True)
corr_with_target = full_corr["target_kazutsugi"].abs().sort_values(ascending=False)

# Select the 150 features with the highest absolute correlation to the target.
# The target is dropped first so that it does not take up one of the 150 slots.
features = corr_with_target.drop("target_kazutsugi")[:150]
feature_list = features.index.tolist()
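The feature exposure described above can be sketched in a few lines. This is a minimal illustration that follows the definition in the text (standard deviation of the predictions' correlation with each feature). The use of Spearman correlation, the toy data and the `feature_0` to `feature_2` column names are assumptions made for this example and may differ from the full code.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def feature_exposure(df, feature_cols, pred_col="prediction_kazutsugi"):
    """Standard deviation of the predictions' correlation with each feature."""
    corrs = [spearmanr(df[pred_col], df[col])[0] for col in feature_cols]
    return float(np.std(corrs))

# Toy data: three independent random features (hypothetical column names).
rng = np.random.default_rng(0)
feature_cols = ["feature_0", "feature_1", "feature_2"]
toy = pd.DataFrame(rng.random((1000, 3)), columns=feature_cols)

# A "model" that leans on a single feature is heavily exposed to it.
toy["prediction_kazutsugi"] = toy["feature_0"]
exposure_single = feature_exposure(toy, feature_cols)

# A "model" that spreads its weight over all features is far less exposed.
toy["prediction_kazutsugi"] = toy[feature_cols].mean(axis=1)
exposure_spread = feature_exposure(toy, feature_cols)

print(exposure_single, exposure_spread)
```

The first exposure is much larger than the second, which matches the intuition that relying on a few features is riskier if their importance shifts over time.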

Modeling / Hyperparameter Optimization

To get a good first model for Numerai, we will train a LightGBM model and use Weights & Biases to do a hyperparameter sweep. In this example, it will be a grid search over some of the most important hyperparameters for LightGBM. First, we define the configuration of the sweep.

import wandb

sweep_config = {
   'method': 'grid',
   'metric': {
          'name': 'mse',
          'goal': 'minimize'
   },
   'parameters': {
       "num_leaves": {'values': [30, 40, 50]},
       "max_depth": {'values': [4, 5, 6]},
       "learning_rate": {'values': [0.05, 0.01, 0.005]},
       "bagging_freq": {'values': [7]},
       "bagging_fraction": {'values': [0.6, 0.7, 0.8]},
       "feature_fraction": {'values': [0.85, 0.75, 0.65]},
   }
}
sweep_id = wandb.sweep(sweep_config, project="numerai_tutorial")
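Since a grid search tries every combination, it helps to know how many runs a configuration implies before starting the agent. The sketch below counts the combinations for the value lists used in this report with plain Python. It does not call Weights & Biases, and the `parameters` dictionary simply restates the values from the sweep configuration.

```python
from itertools import product
from math import prod

# Value lists restated from the sweep configuration in this report.
parameters = {
    "num_leaves": [30, 40, 50],
    "max_depth": [4, 5, 6],
    "learning_rate": [0.05, 0.01, 0.005],
    "bagging_freq": [7],
    "bagging_fraction": [0.6, 0.7, 0.8],
    "feature_fraction": [0.85, 0.75, 0.65],
}

# Number of grid points: the product of the number of values per parameter.
n_runs = prod(len(values) for values in parameters.values())

# Enumerating the grid explicitly gives the same count.
combos = list(product(*parameters.values()))

print(n_runs)  # 243
```

With 243 runs, each requiring a full LightGBM training, trimming a value list or two can save a lot of compute.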

After that, we define a function (_train) that reads its hyperparameters from wandb.config so Weights & Biases can perform the grid search. We make sure to log all the metrics and can then start the agent!

# Prepare data for LightGBM
dtrain = lgb.Dataset(train[feature_list], label=train["target_kazutsugi"])
dvalid = lgb.Dataset(val[feature_list], label=val["target_kazutsugi])
watchlist = [dtrain, dvalid]

def _train():
    # Configure and train model
    wandb.init(project="numerai_tutorial", name="LightGBM_sweep")
    lgbm_config = {"num_leaves": wandb.config.num_leaves, 
                   "max_depth": wandb.config.max_depth, 
                   "learning_rate": wandb.config.learning_rate,
                   "bagging_freq": wandb.config.bagging_freq, 
                   "bagging_fraction": wandb.config.bagging_fraction, 
                   "feature_fraction": wandb.config.feature_fraction,
                   "metric": 'mse', 
                   "random_state": seed}
    lgbm_model = lgb.train(lgbm_config, 
    # Create predictions for evaluation
    val_preds = lgbm_model.predict(val[feature_list], num_iteration=lgbm_model.best_iteration)
    val.loc[:, "prediction_kazutsugi"] = val_preds
    # W&b log metrics
    spearman, payout, feature_exposure, numerai_sharpe, mae = evaluate(val)
    wandb.log({"Spearman": spearman, "Payout": payout, "Feature Exposure": feature_exposure, 
               "Numerai Sharpe Ratio": numerai_sharpe, "Mean Absolute Error": mae})
# Run sweep
wandb.agent(sweep_id, function=_train)


The results reveal that the learning rate and max_depth are the most important hyperparameters for our LightGBM model. The parallel coordinates plot below shows that the model with the highest Spearman correlation will not necessarily lead to the highest Sharpe ratio. Be sure to compare multiple metrics when evaluating your Numerai models.



It is possible to upload a CSV file with your predictions directly on the Numerai tournament page. However, doing this by hand every week becomes tedious, and NumerAPI makes it easy to upload your predictions from code. This requires you to pass your API keys when initializing NumerAPI.


Once you have obtained your API keys, you can easily submit your predictions with a few lines of code.

SUB_PATH = "my_submission_directory/submission1.csv"

# Initialize API with API Keys
# PUBLIC_ID and SECRET_KEY hold the API keys from your Numerai account
NAPI = numerapi.NumerAPI(public_id=PUBLIC_ID, secret_key=SECRET_KEY)
# Upload predictions for current round
test[["id", "prediction_kazutsugi"]].to_csv(SUB_PATH, index=False)
NAPI.upload_predictions(SUB_PATH)
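Before uploading, it can be worth running a quick sanity check on the submission frame. The sketch below is one possible check and not part of NumerAPI. `check_submission` is a hypothetical helper, and the requirement that predictions lie between 0 and 1 is an assumption you should verify against the current tournament rules.

```python
import pandas as pd

def check_submission(df, pred_col="prediction_kazutsugi"):
    """Return a list of problems found in a submission frame (empty if none)."""
    if list(df.columns) != ["id", pred_col]:
        return [f"columns must be exactly ['id', '{pred_col}']"]
    problems = []
    if df["id"].duplicated().any():
        problems.append("duplicate ids")
    if df[pred_col].isna().any():
        problems.append("missing predictions")
    # Assumption: valid predictions lie in the closed interval [0, 1].
    if not df[pred_col].dropna().between(0, 1).all():
        problems.append("predictions outside [0, 1]")
    return problems

# Toy frames (made up for illustration).
good = pd.DataFrame({"id": ["a", "b", "c"],
                     "prediction_kazutsugi": [0.48, 0.51, 0.50]})
bad = pd.DataFrame({"id": ["a", "a", "c"],
                    "prediction_kazutsugi": [0.48, None, 1.20]})

print(check_submission(good))
print(check_submission(bad))
```

Running such a check right before writing the CSV catches the most common mistakes locally instead of after the upload.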


Caveat Emptor (Things to be Aware of)

Final Tips

I hope this introduction got you excited about starting with Numerai! If so, be sure to check out the accompanying Kaggle kernel with this report.


If you have any questions or feedback, feel free to comment below. You can also contact me on Twitter @carlolepelaars.