Financial Sentiment Analysis on Stock Market Headlines With FinBERT & HuggingFace
In this article, we analyze the sentiment of stock market news headlines with the HuggingFace framework using a BERT model fine-tuned on financial texts, FinBERT.
Financial news headlines are a fertile source of NLP data, especially when it comes to predicting how a stock will perform. Frequently, this is done via sentiment analysis, an NLP task that buckets phrases into positive, negative, and neutral categories.
In this article, we'll briefly discuss FinBERT and sentiment analysis before digging into our experiment. But first: note that you can follow along in the Google Colab below and look at some of our data (and associated predictions) in a Table.
You can follow along on this Google Colab ↓
Here's a W&B Table example:
Table of Contents
What is FinBERT?
Downloading the Stock Market News Data from Kaggle
FinBERT Using HuggingFace
Running Inference with FinBERT and Stock Market News Headlines
Visualizing the Results Interactively as a W&B Table
Visualizing the W&B Table
Tips & Tricks for Interacting with W&B Tables
Conclusion
What is FinBERT?
FinBERT is a pre-trained NLP model based on BERT, Google's revolutionary transformer model. Simply put: FinBERT is just a version of BERT trained on financial data (hence the "Fin" part of its name), specifically for sentiment analysis.
Remember: BERT is a general language model. Financial news and stock reports often involve a lot of domain-specific jargon (there's plenty in the Table above, in fact), so a model like BERT can't generalize well in this domain. You'd see similar problems using BERT on, say, legal filings or medical literature.
The Importance Of Sentiment Analysis in Finance ML
Sentiment analysis, meanwhile, is a very common task in NLP that aims to assign a "feeling" or an "emotion" to text. Typically, it predicts whether the sentiment is positive, negative, or neutral.
You often see sentiment analysis around social media responses to hot-button issues or to determine the success of an ad campaign. But it's promising in the financial domain as changes in sentiment around a company could help predict a rise or fall in that company's stock.
What data was FinBERT trained on?
The data used to train FinBERT is text from financial news services, as well as the FiQA dataset. For the financial news:
the annotators were asked to give labels according to how they think the information in the sentence might affect the mentioned company stock price.
In other words, sentiment here is more or less a proxy for how people felt certain news and information would affect a company's stock price: negative sentiment suggests the stock losing value, while positive sentiment, of course, suggests it climbing.
Downloading the Stock Market News Data from Kaggle
We want to take the fine-tuned FinBERT model but put it to the test on a different dataset. For this experiment, we'll use the "Daily Financial News for 6000+ Stocks" from Kaggle.

The dataset is made up of over 1.8M stock market news headlines. In the example Colab accompanying this blog post, we'll run inference on just 300 headlines. But do feel free to check out the full dataset too.
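By the way, if you'd rather draw your own sample straight from the full Kaggle download instead of using our Gist, a minimal sketch could look like the snippet below (the file name is hypothetical; check what your Kaggle download is actually called):
import pandas as pd

# Hypothetical file name for the full Kaggle dataset
full_df = pd.read_csv('raw_analyst_ratings.csv')

# Reproducible random sample of 300 headlines
sample_df = full_df.sample(n=300, random_state=42)
sample_df.to_csv('300_stock_headlines.csv', index=False)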
The Google Colab Notebook Code
Looking at the code in the Colab notebook, we'll start off by git cloning a GitHub Gist with a small portion of that dataset.
!git clone https://gist.github.com/c1a8c0359fbde2f6dcb92065b8ffc5e3.git
We'll also read the .csv file with the Pandas Python library and print a little snippet.
import pandas

headlines_df = pandas.read_csv('c1a8c0359fbde2f6dcb92065b8ffc5e3/300_stock_headlines.csv')
headlines_df.head(5)
Then, we'll use NumPy to shuffle the entries and convert them to a normal Python list containing the headlines. This list can be used as an input to the FinBERT model.
import numpy as np

headlines_array = np.array(headlines_df)
np.random.shuffle(headlines_array)
headlines_list = list(headlines_array[:, 2])  # column 2 holds the headline text
print(headlines_list)

Here's a quick example of what our headlines look like:
FinBERT Using HuggingFace
HuggingFace makes it easy for us to try out different NLP models. We can find the FinBERT model on the HuggingFace model hub — and even run a test inference using a little text box on their website!

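Incidentally, once the transformers library is installed (we do that in the next section), you can run a quick local test with the pipeline API, which bundles tokenization, inference, and post-processing into a couple of lines. A minimal sketch, with a made-up example headline:
from transformers import pipeline

# A ready-made text-classification pipeline backed by FinBERT
classifier = pipeline("text-classification", model="ProsusAI/finbert")
print(classifier("Shares plunged after the company missed earnings estimates."))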
Back to the Colab Notebook
We'll start working with the NLP model by installing the HuggingFace transformers library.
!pip install transformers
Then, from the HuggingFace Model Hub, we'll download the pre-trained tokenizer, which is used to convert text into tokens that NLP models can understand. We also load the pre-trained model itself in a similar way.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")
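Before running inference, it's worth a quick sanity check of which output index corresponds to which sentiment. Every HuggingFace model config exposes an id2label mapping, so we can simply print it; the ordering should line up with how we slice the predictions later (index 0 = positive, 1 = negative, 2 = neutral):
# Which logit index maps to which sentiment label?
print(model.config.id2label)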
Running Inference with FinBERT and Stock Market News Headlines
Alright, so now we can take the Python list of headlines we prepared earlier (note: yes, the tokenizer accepts a Python list of strings as input) and pass it through the tokenizer, which preprocesses the text before it goes into the model, with the following lines of code.
inputs = tokenizer(headlines_list, padding=True, truncation=True, return_tensors='pt')
print(inputs)
Let's run the inference now!🔥
outputs = model(**inputs)
print(outputs.logits.shape)
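One small refinement that isn't in the original notebook: since we're only running inference, we can disable gradient tracking with torch.no_grad(), which saves memory and time on larger batches. A minimal sketch:
import torch

# Inference only: skip building the autograd graph
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)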
Next, all that's left is to post-process the outputs with the softmax activation function. It's useful because the raw logits aren't constrained: nothing stops all three classes from taking the maximum value at once, which would nonsensically mean a headline is extremely positive, negative, and neutral all at the same time.
Softmax makes all of the scores for our classes (positive, negative, and neutral, in our case) add up to 1 and, thus, lets them be interpreted as probabilities. Larger input components also correspond to larger probabilities.
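For the record, the function itself is simple: softmax(z)_i = exp(z_i) / Σ_j exp(z_j). Exponentiating keeps every score positive, and dividing by the sum of exponentials forces the scores to add up to 1.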
import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
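And if you want a single predicted label per headline rather than three probabilities, here's a minimal sketch building on the id2label mapping we printed earlier:
# Pick the highest-probability class for each headline
predicted_ids = predictions.argmax(dim=-1).tolist()
predicted_labels = [model.config.id2label[i] for i in predicted_ids]

# Peek at the first few predictions alongside their headlines
for headline, label in zip(headlines_list[:5], predicted_labels[:5]):
    print(f"{label:>8}: {headline}")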
Visualizing the Results Interactively as a W&B Table
W&B Tables is an awesome Weights & Biases feature that lets you interactively visualize and explore tabular data. And it's extremely easy to create one.
All we need to do is define a Pandas DataFrame (basically, a table defined in Python) with the four columns relevant to us.
Note: To populate the "Headline" column, we use headlines_list, which contains just 300 headlines, as we noted above.
💡
import pandas as pd

positive = predictions[:, 0].tolist()
negative = predictions[:, 1].tolist()
neutral = predictions[:, 2].tolist()

table = {'Headline': headlines_list,
         "Positive": positive,
         "Negative": negative,
         "Neutral": neutral}

df = pd.DataFrame(table, columns=["Headline", "Positive", "Negative", "Neutral"])
df.head(5)

Preview of how the Pandas DataFrame table looks
Logging a W&B Table
Okay. We're just 5 lines of code away from logging a W&B Table! The first thing we do is pip install and import the W&B library.
!pip install wandb
import wandb
Then, we initialize a new W&B project (if you've never used us before, think of it as creating a new GitHub repo but for your machine learning experiments). Here's what I call mine:
wandb.init(project="FinBERT_Sentiment_Analysis_Project")
Note: At this stage, you may be asked to paste your API key and log into your existing W&B account. If you don't have an account, you can quickly create one and then proceed with creating those wonderful tables.
💡
And all that's left for us to do now is to wrap the Pandas DataFrame in a wandb.Table and pass it to the wandb.run.log() function.
wandb.run.log({"Financial Sentiment Analysis Table": wandb.Table(dataframe=df)})
wandb.run.finish()
Visualizing the W&B Table
Now that we've logged the Table, the run will show up in our project and print something like this in the console. We can click on the "Run page" link to open our Run Page dashboard and see the W&B Table we've created.


Tips & Tricks for Interacting with W&B Tables
Of course, the benefit of reading this in a W&B Report is that I can paste Tables directly from my project dashboard and show you a few cool tips & tricks on what we can do with them.
Filtering🔥
For example, you can write simple filter expressions to show only the desired entries. Here, I'm displaying only entries with positive scores higher than 0.9.

Sorting🔥
This one is probably my favorite. We can sort the Positive, Negative, and Neutral columns in ascending or descending order. You can think of this as finding the "most positive/negative/neutral" headlines. Pretty cool stuff, huh?

Here's an interactive example of how it looks when we look for the most negative ones.😈
Also, feel free to just click around and see what cool ways you can come up with to analyze this financial sentiment data using W&B Tables.
Also, here's the Colab notebook I am featuring in this report ↓
Conclusion
In this tutorial, we learned what financial sentiment analysis is, why it's hard, and why it's important. We took a model from the HuggingFace Model Hub, ran inference with it on a different dataset from Kaggle, and then visualized our results using W&B Tables.
If you enjoyed this report, give it a heart and feel free to leave your comments down below. I really hope you found it useful!
Tags: Intermediate, NLP, Sentiment Analysis, HuggingFace, Experiment, Tutorial, FinBERT, Tables, FiQA, Financial, Exemplary, Large Models, LLM