A Beginner's Guide to Named Entity Recognition (NER)

This article will explore the world of NER, examining the underlying concepts and techniques that make it possible and showcasing real-world applications of NER in business and beyond.
Named Entity Recognition using SpaCy | Colab Notebook | Image by Author

Introduction

We generate and consume vast amounts of information every day, and extracting meaningful insights from this data can mean the difference between success and failure in many fields. However, the sheer volume and complexity of the information we encounter can make this a daunting task.
Enter Named Entity Recognition (NER), a subfield of Natural Language Processing (NLP) that allows us to identify and extract important entities such as names, places, organizations, and other relevant information from unstructured text data.
The applications of NER are vast and varied, ranging from business and finance to healthcare and law enforcement. By leveraging NER, we can perform tasks such as entity extraction, entity level sentiment analysis, information extraction for knowledge graphs, and many more.
In this article, we will explore the world of NER in-depth, examining the underlying concepts and techniques that make it possible and showcasing real-world applications of NER in business and beyond.
Please note that this article contains numerous examples and code snippets, but the complete code is not included (we've skipped pip installs, dependency issue workarounds, etc.). In order to make the most of this article, we highly recommend that you refer to the notebooks throughout.

What is Named Entity Recognition (NER)?

Named Entity Recognition is a fundamental task in NLP and Information Extraction that helps us in identifying and extracting named entities from unstructured text data. So, what exactly are named entities?
A named entity is a word or a sequence of words that represent a specific object or concept, such as a person, organization, location, product, date, and time. Basically, any proper name or numerical expression can be a named entity. For example, given the text:
Pikachu was captured by Team Rocket when Ash was busy challenging the gym leader Erika in Celadon City.
The NER model should extract 'Pikachu', 'Team Rocket', 'Ash', 'Erika', and 'Celadon City', along with their respective entity labels/categories.
Example Section in Colab Notebook | Image by Author
In the above example, you can see the PERSON, ORG, and GPE entity labels. Notice that the model identifies Pikachu as a PERSON, which isn't correct, as Pikachu is not a person but a Pokémon! Why?
We see this false positive because the model used for this example is a SpaCy pre-trained model, en_core_web_trf, which was trained on a combination of newswire and web data. The entity labels of this model include ['CARDINAL', 'DATE', 'EVENT', 'FAC', 'GPE', 'LANGUAGE', 'LAW', 'LOC', 'MONEY', 'NORP', 'ORDINAL', 'ORG', 'PERCENT', 'PERSON', 'PRODUCT', 'QUANTITY', 'TIME', 'WORK_OF_ART']. As you can see, a Pokémon entity isn't present in this list of named entities the model can identify.
Since this model wasn't trained with a Pokémon entity label, we can further fine-tune it with some newly annotated text data so that it can identify Pokémon entities.
Fine-tuning large language models such as en_core_web_trf can be computationally expensive and time-consuming as it requires loads of annotated training data. However, there are other approaches that can be used to get our desired output with less effort!
Here's one way to do this:
Since it's relatively easy to create a list of Pokémon names (which we can obtain from a Pokémon fandom page), one approach is to combine a Rule-Based NER method, which assigns the Pokémon label by checking whether any name from the list appears in the given text, with a pre-trained NER model that handles the other entities.
In short:
Step 1: Use Rule-Based NER methods to extract Pokémon entity labels.
Step 2: Add a pre-trained NER model to our pipeline for other named entity labels.
This approach, which combines multiple NER methods, is known as the Hybrid Approach. If you're new to these methods and feeling confused, fret not; we'll learn how they work and how to implement them in the "Named Entity Recognition Techniques" section below.
If you're short on time and want to learn how to implement Hybrid NER, feel free to check out this notebook.
SpaCy Matcher section in Colab notebook | Image by Author
In the example above, how is the model able to identify 'Team Rocket' as a single entity, even though it consists of two separate words ('Team' and 'Rocket')?
To label a sequence of tokens as a single named entity, NER models use BIO (or IO, BIOES, BMEWO, BILOU) tags along with named entity tags. These tags are used to indicate the boundaries of named entities that span multiple tokens. The BIO acronym stands for "Beginning", "Inside", and "Outside", which represent the positions of each token in a named entity.

In this sentence, by assigning Beginning to the token 'Team' and Inside to the token 'Rocket', the model can join these two tokens into a single ORG entity.
Here's an example of how this sentence can be labelled with IO, BIO and BIOES tags.
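One possible labelling of the opening phrase, shown for each scheme (illustrative):
IO: Team/I-ORG Rocket/I-ORG caused/O chaos/O
BIO: Team/B-ORG Rocket/I-ORG caused/O chaos/O
BIOES: Team/B-ORG Rocket/E-ORG caused/O chaos/O
(Under BIOES, a single-token entity such as 'Pikachu' would receive an S tag instead.)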


By identifying and classifying these named entities, NER can help us in extracting useful information from unstructured text and enable more accurate and efficient analysis of textual data.

A Brief History of NER

Named Entities were first used in the Message Understanding Conferences (MUC) for Information Extraction (IE) research in the 1990s. To be more specific, named entities were used to extract structured information about company activities from unstructured information sources such as newspaper articles [Grishman and Sundheim 1996] by using Rule-Based approaches, where the rules were manually created by domain experts and linguists.
Many NER evaluation-based projects like Multilingual Entity Tracking (MET) for Japanese, Chinese and Spanish, and Conference on Computational Natural Language Learning (CoNLL) for English, German, Dutch and Spanish were initiated in the early 2000s.
Along with these evaluation projects, the NER landscape experienced a rise in supervised and deep learning techniques for NER, like Maximum Entropy Model, Support Vector Machine, Conditional Random Field, RNN, LSTMs, etc. These models were better at capturing the complex relationships between the words and identifying the named entities.
For example, if the text has a misspelt named entity, the supervised models will be able to extract these entities, whereas rule-based systems could have a hard time dealing with misspellings.
In recent years, large language models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer) are transforming the NLP field, including named entity recognition tasks.
BERT, for instance, has been fine-tuned for various NER tasks and has achieved state-of-the-art results on standard benchmarks. BERT-based models have also been used for Named Entity Linking (NEL), where the goal is not only to recognize named entities but also to link them to a knowledge base, such as Wikipedia.
For instance, in the medical domain, a NEL system could use a medical ontology such as SNOMED-CT or UMLS to identify and link named entities such as diseases, symptoms, and drugs. Check out how to use UMLS for NEL in this notebook section.
The code below uses DBpedia (a knowledge base built from Wikipedia) for entity linking.
# import spacy
import spacy
# import the spacy_dbpedia_spotlight
import spacy_dbpedia_spotlight
# load your model as usual
nlp = spacy.load('en_core_web_trf')
# add the pipeline stage
nlp.add_pipe('dbpedia_spotlight')
# get the document
doc = nlp('Team Rocket caused chaos in the city as they attempted to capture Pikachu')
# see the entities
print('Entities', [(ent.text, ent.label_, ent.kb_id_) for ent in doc.ents])
# inspect the raw data from DBpedia spotlight
print(doc.ents[0]._.dbpedia_raw_result)
Now that we know what named entity recognition is and how its approaches have evolved over the years, let's have a look at how to build these NER systems, starting with some of the main features used in rule-based NER systems.

Features Used for Named Entity Recognition

Selecting and constructing relevant features from input data is crucial for building a successful NER system. This step, known as feature engineering, helps the model learn and recognize entities. In rule-based systems, it's especially important for machine learning (ML) engineers to identify which features can help the model extract the desired entities from the text data. Some common features used in NER are:
Gazetteer and Dictionary features: These are lists of known entities, such as people, places, and organizations. They can be used to match the input against a pre-defined list of entities. The Pokémon example above uses this method.
Word-level features: Word-level features are the surface form of the word, its part-of-speech tag, and its context (e.g., the surrounding words).
Character-level features: Character-level features like the length of the word, whether it contains digits or punctuation, and whether it is capitalized.
Lexical features: Lexical features refer to the characteristics of words themselves, such as their spelling, pronunciation, and meaning. For example: the frequency of the word in the training data, the number of times it appears in the document, and the number of times it appears in a window around the current word.
Syntactic features: Syntactic features refer to the grammatical properties of words and phrases that determine how they are used in a sentence and how they relate to other words and phrases within that sentence. Syntactic features consist of features like the parse tree of the sentence, the dependency relations between words, and the headword of the current word.
Semantic features: Semantic features refer to the basic components or attributes of meaning that are used to describe a word's or phrase's sense. These include the word's meaning, such as its WordNet synset or its vector representation in a pre-trained embedding space.
Please note that the features mentioned here are not an exhaustive list. There can be various other features that can be considered and incorporated into an NER system, depending on the entities to be extracted and the requirements of the end-user. The choice of features can vary from one NER system to another and may impact the accuracy and effectiveness of the system.
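To make these concrete, here's a minimal sketch of a hand-crafted feature extractor for a single token, combining a few of the feature types above (word-level, character-level, context, and gazetteer features). The gazetteer contents and feature names here are illustrative:
POKEMON_GAZETTEER = {"pikachu", "charizard", "squirtle"}  # toy gazetteer

def token_features(tokens, i):
    word = tokens[i]
    return {
        "word.lower": word.lower(),      # word-level: surface form
        "word.istitle": word.istitle(),  # character-level: capitalization
        "word.isdigit": word.isdigit(),  # character-level: digits
        "word.length": len(word),        # character-level: length
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",               # context
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>", # context
        "in_gazetteer": word.lower() in POKEMON_GAZETTEER,                      # gazetteer lookup
    }

tokens = "Pikachu was captured by Team Rocket".split()
print(token_features(tokens, 0))
# {'word.lower': 'pikachu', 'word.istitle': True, 'word.isdigit': False, ...}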

Named Entity Recognition Techniques

As we saw earlier, the main NER methods include Rule-Based systems, Statistical models, Deep Learning models (which include RNN-based LSTMs, GRUs, and Transformer architecture-based LLMs) and Hybrid Systems.
Let's now have a look at all these NER techniques, starting with the OG — Rule-Based Systems. Use this notebook section to follow along!

Rule-Based Approaches

Rule-based approaches are one of the earliest and simplest techniques used for named entity recognition. In rule-based approaches, a set of handcrafted rules and patterns are defined to identify entities in text data. There are several techniques used in rule-based approaches such as dictionary and gazetteer-based approaches, rule-based pattern matching, and knowledge-based approaches.

Dictionary and Gazetteer-Based Approach

In this approach, a predefined set of dictionaries and gazetteers containing lists of named entities such as names, places, organizations, and other categories are used. During the NER process, the text data is scanned and compared against these dictionaries and gazetteers. If a match is found, then the corresponding named entity is extracted. This approach is useful for identifying entities that have a specific name or label, but it may not work well for identifying entities that do not have a predefined name or label.
# Import required libraries
import spacy
from spacy.tokens import Span
from spacy.matcher import PhraseMatcher

# Define a list of Pokemon names
POKEMON_NAMES = ['Pikachu', 'Charmander', 'Bulbasaur', 'Squirtle']

@spacy.Language.component("pokemon_ner")
def pokemon_ner(doc):
    # Create a PhraseMatcher object with the vocabulary from the doc
    matcher = PhraseMatcher(doc.vocab)
    # Tokenize the phrases in the POKEMON_NAMES list
    patterns = list(nlp.tokenizer.pipe(POKEMON_NAMES))
    # Add the patterns to the PhraseMatcher object (spaCy v3 signature)
    matcher.add("POKEMON_NAMES", patterns)
    # Find all matches in the doc using the PhraseMatcher object
    matches = matcher(doc)
    # Create a new Span object for each match
    spans = [Span(doc, start, end, label="POKEMON") for match_id, start, end in matches]
    # Set the entities of the doc to the new spans
    doc.ents = spans
    # Return the updated doc
    return doc

# Create a blank spacy model and add the custom component to it
nlp = spacy.blank("en")
nlp.add_pipe("pokemon_ner", name="pokemon_ner")

# Define some text to be processed
poke_txt = "I choose you, Pikachu!"
# Process the text with the spacy model
doc_poke = nlp(poke_txt)
# Print the detected entities and their labels
print([(ent.text, ent.label_) for ent in doc_poke.ents])
# Output
# [('Pikachu', 'POKEMON')]

Rule-Based Pattern Matching

In this approach, a set of predefined rules and patterns are used to identify named entities. These rules and patterns are defined based on the structure and characteristics of the text data. For example, a rule-based pattern may look for specific keywords or phrases that are associated with named entities. This approach is useful for identifying entities that have a specific structure or pattern, but it may not work well for identifying entities that do not follow a specific pattern.
# Import the English language model from spaCy
from spacy.lang.en import English

# Load the English language model
nlp = English()

# Create a new pipeline component for entity recognition
ruler = nlp.add_pipe("entity_ruler")

# Define some patterns to match entities
patterns = [
    {"label": "ORG", "pattern": [{"LOWER": "team"}, {"LOWER": "rocket"}]},  # match "Team Rocket" as an organization
    {"label": "GPE", "pattern": [{"LOWER": "vermilion"}, {"LOWER": "city"}]},  # match "Vermilion City" as a geographic location
    {"label": "Pokemon", "pattern": "Pikachu"}  # match "Pikachu" as a Pokemon
]

# Add the patterns to the entity ruler
ruler.add_patterns(patterns)

# Process some text with the pipeline
doc = nlp("Team Rocket caused chaos in the Vermilion city as they attempted to capture Pikachu")

# Print the recognized entities and their labels
print([(ent.text, ent.label_) for ent in doc.ents])
# Output
# [('Team Rocket', 'ORG'), ('Vermilion city', 'GPE'), ('Pikachu', 'Pokemon')]
Example using Regular Expression (RegEx):
import spacy
import re

nlp = spacy.load("en_core_web_sm")
doc = nlp("Team Rocket caused chaos in the Vermilion city as they attempted to capture Pikachu")

expression = r"[Tt](eam|\.?) ?[Rr](ocket|\.?)"
for match in re.finditer(expression, doc.text):
    start, end = match.span()
    span = doc.char_span(start, end)
    # This is a Span object or None if the match doesn't map to a valid token sequence
    if span is not None:
        print("Found match:", span.text)
# Output
# Found match: Team Rocket
Although rule-based NER systems are simple and easy to explain, they rely on hand-crafted rules and dictionaries, which can be time-consuming to build and difficult to scale to large datasets or multiple domains. Statistical models, on the other hand, can learn the patterns present in the text and can better deal with complex or ambiguous entity types.
As you might've guessed already, I'm trying to lure you into the next section, i.e., statistical models. Let's see how well they perform compared to rule-based systems.

Statistical and Probabilistic Models

Statistical NER models can perform better than rule-based NER in cases where the text data is complex and noisy, and/or when the named entity classes are not well-defined. For example, in medical text, named entities can have multiple variants and synonyms, and may be difficult to define using rules alone. In such cases, statistical models can capture the patterns and context of the text to accurately identify named entities, whereas rule-based models may struggle due to the variability and ambiguity of the data.
When we say that a statistical NER model learns the patterns and relationships in the given text, what we really mean is that the model learns the probabilities of entity labels given the word occurrences present in the training data.
Let's now have a look at a few popular statistical NER models to understand the previous statement better.

Hidden Markov Models (HMM)

The Hidden Markov Model (HMM) is a generative probabilistic model frequently used in NLP and Time Series Forecasting (both of which involve sequential and temporal context). An HMM, as the name suggests, consists of a sequence of hidden states and a sequence of observable outputs.

In NER, the hidden states represent the underlying named entities (such as "person", "organization", "location", etc.) of the observed words in a sentence. The observable outputs represent the actual words themselves.
The model assumes that the probability of an observable output (i.e. a word) depends only on the hidden state at that time step (Output Independence Assumption).
Also, the hidden states follow the (first-order) Markov assumption, i.e., the probability of the next hidden state H_{t+1} depends only on the present state H_t.
Now that we know the types of states and the basic assumptions in HMM, let's try to understand how a HMM can be used to extract entities using a Pokémon example:
  1. First, let's define the problem we are trying to solve. We know that NER is the task of identifying and classifying named entities (such as "person", "location", "organization", etc.) in unstructured text data. To keep things simple, we will focus only on identifying and classifying Pokémon entities in a sentence in this example.
  2. Now, let's define the components of an HMM for NER:
  • Hidden States: These are the underlying labels we want to predict (in our case, Pokémon entity labels like "Pikachu", "Charizard", etc.).
  • Observable Outputs: These are the words in the sentence that we can observe (in our case, the words in the sentence).
  • Transition Probabilities: These are the probabilities of transitioning between hidden states.
  • Emission Probabilities: These are the probabilities of observing a particular output given a hidden state.
3. We can represent these components in the form of mathematical equations:
  • Initial probability distribution: P(t_1 = s_i) = π_i, where π_i is the initial probability of starting in state s_i.
  • State transition probability: P(t_{i+1} = s_j | t_i = s_i) = a_{ij}, where a_{ij} is the probability of transitioning from state s_i to state s_j.
  • Emission probability: P(w_i | t_i = s_j) = b_j(w_i), where b_j(w_i) is the probability of observing output w_i given hidden state s_j.
4. Let's assume we have a sentence: "Pikachu, I choose you to battle against Charizard!" We want to identify and classify the Pokémon entities in this sentence.
5. We start by creating a set of possible hidden states (i.e. Pokémon entities) we want to predict. Let's say we have the following set: {Pikachu, Charizard, Squirtle, Bulbasaur, etc.}.
6. Next, we calculate the transition probabilities between hidden states. These probabilities represent the likelihood of transitioning from one hidden state to another. For example, the probability of transitioning from Pikachu to Charizard might be higher than the probability of transitioning from Pikachu to Squirtle.
7. We also calculate the emission probabilities, which represent the likelihood of observing a particular output given a hidden state. For example, the probability of observing the word "Pikachu" given the hidden state 'Pokémon' is 1.0, but the probability of observing the word "Squirtle" given the hidden state 'O' is 0.0. These probabilities (both transition and emission, the unknown parameters of the HMM) can be estimated from training data using the Baum-Welch algorithm.
8. Now, we apply the Viterbi algorithm to find the most likely sequence of hidden states given the observed words in the sentence. This involves calculating the probability of each possible sequence of hidden states given the observed words and finding the sequence with the highest probability.
9. In our example sentence, the most likely sequence of hidden states might be: {B-POKEMON, O, O, O, B-POKEMON, O, O}. The "O" stands for "outside" and represents words that do not belong to any Pokémon entity.
10. Finally, we can use this sequence of hidden states to extract the named entities from the sentence. In our example, we can extract the named entities "Pikachu" and "Charizard".
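To make steps 7 and 8 concrete, here's a minimal, self-contained Viterbi sketch for this toy example. The states and the transition/emission probabilities below are invented for illustration; in practice, they would be estimated from annotated data (e.g., with the Baum-Welch algorithm):
states = ["B-POKEMON", "O"]
start_p = {"B-POKEMON": 0.5, "O": 0.5}
trans_p = {
    "B-POKEMON": {"B-POKEMON": 0.1, "O": 0.9},
    "O": {"B-POKEMON": 0.3, "O": 0.7},
}
emit_p = {
    "B-POKEMON": {"pikachu": 0.5, "charizard": 0.5},
    "O": {},  # all non-Pokemon words
}

def emission(state, word):
    # Words unseen in the emission table get a small default probability
    return emit_p[state].get(word.lower(), 0.01 if state == "O" else 0.0001)

def viterbi(words):
    # V[t][s] = (best probability of any path ending in state s at step t, backpointer)
    V = [{s: (start_p[s] * emission(s, words[0]), None) for s in states}]
    for t in range(1, len(words)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p][0] * trans_p[p][s])
            V[t][s] = (V[t - 1][prev][0] * trans_p[prev][s] * emission(s, words[t]), prev)
    # Backtrack from the most probable final state
    state = max(states, key=lambda s: V[-1][s][0])
    path = [state]
    for t in range(len(words) - 1, 0, -1):
        state = V[t][state][1]
        path.append(state)
    return list(reversed(path))

words = "Pikachu , I choose you to battle against Charizard !".split()
print(list(zip(words, viterbi(words))))
# [('Pikachu', 'B-POKEMON'), (',', 'O'), ..., ('Charizard', 'B-POKEMON'), ('!', 'O')]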
While the independence and Markov assumptions simplify the modeling process and make it computationally efficient, they can also lead to limitations in the model's ability to capture complex dependencies in the data. For example, the independence assumption may not hold in cases where the observations are correlated, such as in language modeling where the context of a word can affect its meaning. Similarly, the Markov assumption may not hold in cases where the current hidden state depends on multiple previous hidden states or observations, such as in some types of sequential data.
HMMs are constrained to discrete states and rely solely on the previous state, making it challenging to create a state that functions based on multiple others. Additionally, HMMs have limited feature options, which restricts their ability to utilize the entire sequence's context. As a result, this restricts their effectiveness when the context of the entire sequence is crucial, as opposed to solely the previous state.
Check out this notebook section if you're interested in implementing NER using HMMs.

Maximum Entropy Markov Models

Maximum Entropy Markov Models (MEMMs) were introduced to overcome some of these limitations. MEMMs predict the state sequence given an observation sequence and use a maximum entropy framework for features and local normalization. Unlike HMMs, MEMMs aim to model the conditional probability distribution P(O | H), where O is the label sequence and H is the input observation sequence, rather than the joint distribution P(O, H).
Furthermore, MEMMs do not assume independence between the observations, allowing for dependencies between non-consecutive observations to be captured. This makes MEMMs more suitable for modeling complex, non-Markovian relationships in the input sequence.
MEMMs also use a different normalization approach than HMMs. Rather than globally normalizing the joint distribution, MEMMs use local normalization of the conditional distribution to ensure that the output distribution is a valid probability distribution.
Using local normalization brings new problems to the table.
Due to local normalization, MEMMs suffer from what's known as the label bias problem: because each state's outgoing transition probabilities must sum to one, states with fewer outgoing transitions are unfairly favored, regardless of the observations. In other words, the model can prefer a label simply because it competes with fewer alternatives, even if a different label is actually more appropriate.

Conditional Random Fields (CRF)

One way to address label bias in MEMMs is to use Conditional Random Fields. Instead of using local normalization, CRFs globally normalize the model parameters to ensure that the output distribution is a valid probability distribution. Furthermore, CRFs use an undirected graphical structure, which allows them to model complex, non-Markovian relationships between observations.
In CRFs, the output labels are modeled as a sequence of random variables that are connected through an undirected graph. This graph represents the dependencies between the labels, where each node represents a label, and each edge represents the dependence between two labels. Unlike in MEMMs, where the output labels are modeled as a sequence of conditional probabilities, CRFs directly model the joint probability distribution of the output labels given the input sequence.
Since a CRF models the distribution of the entire output label sequence given the input sequence, the output labels can have arbitrary relationships with each other. This makes CRFs more expressive, as they can model complex interactions between the output labels. However, they can also be computationally expensive to train and decode, especially for large output label spaces.
This is where Linear Chain CRF comes in.
A Linear Chain CRF is a specific type of CRF where the labels form a linear sequence. This means that the label at a particular position depends only on the label immediately before it and the observation at that position.
This makes Linear Chain CRF less expressive than CRF, but it is much more computationally efficient to train and decode.
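For a taste of what this looks like in practice, here's a minimal linear-chain CRF sketch using the sklearn-crfsuite library (assumed installed). The feature function and one-sentence training set are purely illustrative:
import sklearn_crfsuite

def sent_features(tokens):
    # One feature dict per token; real systems use much richer features
    return [{"word.lower": w.lower(), "word.istitle": w.istitle()} for w in tokens]

# A single toy training sentence with BIO labels
X_train = [sent_features("Pikachu was captured by Team Rocket".split())]
y_train = [["B-POKEMON", "O", "O", "O", "B-ORG", "I-ORG"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))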
I intentionally skipped using a lot of math to explain these concepts. Why so?
I don't want this article to be intimidating for beginners, and I want to ease the learning curve as much as I can. Having said that, if you, my friend, are a math fanatic, check out this blog on CRF by Edwin Chen. It's a valuable read!
As always, if you're interested in getting your hands dirty with implementing CRFs, take a look at this notebook section.

Deep Learning Approaches

Deep learning models, in general, are good at learning complex patterns in data. This is thanks to their architecture, which consists of multiple layers of artificial neurons that are able to recognize and process increasingly complex patterns in the data. As the data flows through these layers, the model is able to learn and extract important features, which can be used to make accurate predictions or classifications.
This process is known as feature learning or representation learning, and it is what allows deep learning models to perform so well on complex tasks like NER. By automatically learning and extracting high-level features from raw data, deep learning models are able to capture the complex patterns and relationships in the data that are often missed by traditional statistical models.
Furthermore, some deep neural network (DNN) architectures like transformers can also be used to train language models. Once a language model has been pre-trained on a large corpus of text data, it can be fine-tuned for a specific NER task by adding an additional output layer that predicts the named entity labels based on the input text. During fine-tuning, the weights of the pre-trained model are updated using a supervised learning approach, where the model is trained on a labelled NER dataset.
Let's now have a look at how these DNN models can be implemented for NER.

Recurrent Neural Network Models

A Recurrent Neural Network (RNN) is a type of neural network designed to process sequential data. Unlike other types of neural networks, such as feedforward networks, RNNs maintain an internal state, or "memory", which allows them to process sequences of input data.
The basic building block of an RNN is a simple neuron with a feedback loop. The feedback loop allows the output of the neuron to be fed back into the input, along with the current input, in order to compute the next output. This allows the RNN to maintain a "memory" of previous inputs, which can be used to inform future outputs.
Although RNNs are good at learning relations in sequential data, they suffer from Exploding/Vanishing Gradient Problem (EVGP). To solve this issue, new architectures such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) were developed. More on this in my Language Models article.
Take a look at this Kaggle notebook for implementing Bi-LSTM-CRF for NER.

Transformer-Based Large Language Models and Fine-Tuning

The attention mechanism in Transformer models allows the network to selectively attend to the most relevant parts of the input sequence, while ignoring other parts that are less important. This selective attention is achieved by computing attention weights for each position in the input sequence, based on a query and a set of key-value pairs. This allows the model to capture long-term dependencies in the input sequence, without being constrained by the sequential nature of RNN models.
Furthermore, the use of multi-head attention in Transformer models enables parallel processing of different parts of the input sequence, allowing for highly efficient and scalable training of the model. This is in contrast to RNN models, which process the input sequence sequentially and therefore cannot be parallelized as easily.
We will learn how to fine-tune Transformer-based Large Language Models such as distilbert-base-uncased (DistilBERT) and en_core_web_trf (RoBERTa) using Hugging Face and SpaCy respectively for our ReliefNer app (more on this later).
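Here's a minimal sketch of what the Hugging Face fine-tuning setup looks like; the label set below is an assumption for illustration:
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Illustrative label set; a real NER dataset defines its own
labels = ["O", "B-POKEMON", "I-POKEMON"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)
# From here, a tokenized, label-aligned dataset can be passed to transformers.Trainer
# to fine-tune the newly initialized token-classification head.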
The code below uses Hugging Face Transformers to create a NER pipeline:
from transformers import pipeline

classifier = pipeline(task="ner")
preds = classifier("My name is Madhana and I'm from India.")
preds = [
    {
        "entity": pred["entity"],
        "score": round(pred["score"], 4),
        "index": pred["index"],
        "word": pred["word"],
        "start": pred["start"],
        "end": pred["end"],
    }
    for pred in preds
]
print(*preds, sep="\n")

Hybrid Approach

The hybrid approach to NER combines the strengths of both rule-based and machine learning-based approaches. We can use rule-based systems to capture simple entities with high precision and machine learning-based systems to capture more complex entities with high recall!
For example, to extract phone numbers from text, we can use regular expressions to match patterns such as "(XXX) XXX-XXXX" or "XXX-XXX-XXXX". Similarly, to extract dates, we can use regular expressions to match patterns such as "MM/DD/YYYY" or "YYYY-MM-DD". These patterns can be easily defined in a set of rules, as in the sketch below.
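Here's a minimal sketch of such rules using Python's re module (the patterns are illustrative and far from exhaustive):
import re

text = "Call 555-123-4567 before 2023-04-01 to confirm."

# Simple patterns for US-style phone numbers and ISO dates
phone_pattern = r"\b\d{3}-\d{3}-\d{4}\b"
date_pattern = r"\b\d{4}-\d{2}-\d{2}\b"

print(re.findall(phone_pattern, text))  # ['555-123-4567']
print(re.findall(date_pattern, text))   # ['2023-04-01']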
For entities such as country names or city names, we can also use a rule-based approach by defining a list of all possible values for these entities. For example, we can create a list of all countries in the world and use it to match country names in text. This approach can be highly effective for simple entities with a limited set of possible values.
However, for more complex entities with a larger set of possible values, a rule-based approach may not be sufficient. In such cases, machine learning-based approaches can be used to achieve higher recall.
Building Hybrid NER Systems is an easy task with the help of SpaCy's rule-based matching and language processing pipeline.
Let's have a look at how we can do the same:
  • Objective: Include Pokémon entity label in our NER System.
  1. Import the necessary modules from the SpaCy library and define a custom NER component called "pokemon_ner", which matches the given list of Pokémon names using a PhraseMatcher object.
  2. Load the pre-trained SpaCy English language model and add the custom "pokemon_ner" component to the pipeline before the default "ner" component.
  3. Apply the loaded SpaCy model to a sample text containing the name "Pikachu" and print the detected named entity along with its label using the .ents property of the document object.
import spacy
from spacy.tokens import Span
from spacy.matcher import PhraseMatcher

POKEMON_NAMES = ['Pikachu', 'Charmander', 'Bulbasaur', 'Squirtle']

@spacy.Language.component("pokemon_ner")
def pokemon_ner(doc):
    matcher = PhraseMatcher(doc.vocab)
    patterns = list(nlp.tokenizer.pipe(POKEMON_NAMES))
    matcher.add("POKEMON_NAMES", patterns)
    matches = matcher(doc)
    spans = [Span(doc, start, end, label="POKEMON") for match_id, start, end in matches]
    doc.ents = spans
    return doc

nlp = spacy.load("en_core_web_trf")
nlp.add_pipe("pokemon_ner", name="pokemon_ner", before='ner')

poke_txt = "I choose you, Pikachu!"
doc_poke = nlp(poke_txt)
print([(ent.text, ent.label_) for ent in doc_poke.ents])

Evaluation Metrics for Named Entity Recognition

Since NER falls under the Token Labelling task in NLP, we can use the standard evaluation metrics that we use for Classification tasks for NER as well. Let's have a look at some of those metrics now:
  1. Precision: This measures the proportion of predicted named entities that are actually correct. It is calculated as true positives divided by the sum of true positives and false positives.
  2. Recall: This measures the proportion of actual named entities that are correctly identified by the model. It is calculated as true positives divided by the sum of true positives and false negatives.
  3. F1-score: This is the harmonic mean of precision and recall, and provides a balance between the two metrics. It is calculated as 2 times the product of precision and recall divided by their sum.
  4. Accuracy: This measures the proportion of all tokens (both named entities and non-entities) that are correctly labeled by the model.
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
# let's use some dummy values for this example
# true labels
y_true = ['ORG', 'PER', 'LOC', 'MISC', 'O', 'PER']
# predicted labels
y_pred = ['ORG', 'PER', 'LOC', 'O', 'O', 'PER']
precision = precision_score(y_true, y_pred, average='weighted')
recall = recall_score(y_true, y_pred, average='weighted')
f1 = f1_score(y_true, y_pred, average='weighted')
accuracy = accuracy_score(y_true, y_pred)

print('Precision:', precision)
print('Recall:', recall)
print('F1-score:', f1)
print('Accuracy:', accuracy)
Going one step further, we can also use Per-Class Metrics and perform Error Analysis to improve our NER system's performance.
# For per-class metrics
from sklearn.metrics import classification_report

report = classification_report(y_true, y_pred, output_dict=True)
print('Report:', report)
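Note that the metrics above are computed per token. For NER, entity-level (span-based) evaluation is often more informative, since an entity should only count as correct when both its boundary and its type match. The seqeval library (assumed installed) computes exactly that from BIO-tagged sequences; here's a minimal sketch:
from seqeval.metrics import classification_report, f1_score

# Each inner list is one sentence's BIO tag sequence
y_true = [['B-ORG', 'I-ORG', 'O', 'B-PER', 'O']]
y_pred = [['B-ORG', 'O', 'O', 'B-PER', 'O']]

print('Entity-level F1:', f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))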

Challenges in Named Entity Recognition

Despite its advantages, there are several challenges associated with NER.
  • Named Entities with Multiple Labels: Named Entities with multiple labels are entities that belong to more than one category or class. For example, a word like "Apple" can refer to a fruit or a company, depending on the context.
  • Spelling variations: Named entities can have different spellings, including variations in capitalization, abbreviation, and punctuation. This can make it difficult for a model to identify all instances of the same entity. For example, "New York" can be referred to in different ways, such as "New-York" or "NY" or "nyc".
  • Named Entities in Social Media Text: Social media text often contains abbreviations, slang, and non-standard language that may not be present in traditional language models, which can also contribute to the difficulty of accurately identifying named entities. For example, if we consider a tweet written in Thanglish (a mixture of Tamil and English), well, this is a colloquial language with no proper grammar or spelling rules, which can lead to variations in the way named entities are represented.
  • Lack of labeled data: NER models require large amounts of labelled data to train effectively. However, labelling data can be a time-consuming and expensive process, which makes it challenging to obtain sufficient training data. Also, collecting data in low-resource languages can be a daunting task.
  • Handling new named entities: NER models may not be able to recognize new named entities that were not present in the training data. This is known as the problem of out-of-vocabulary entities.

Applications of Named Entity Recognition

NER has found applications across diverse sectors, transforming the way we extract and utilize information. When I was researching for this article, I came across this impactful work by Merve Noyan, Alara Dirik and team.
They built an application named afetharita which uses various ML techniques to aid the Turkey-Syria earthquake disaster relief team in rescuing survivors.

They used an OCR-NER pipeline to extract information like 'Name', 'Phone Number', and 'Address' from screenshots and messages, stored the data in a structured format, and handed it over to the authorities.
When I read this, I was inspired to see how advancements in machine learning can be applied in such meaningful ways to help people in need. With that, I decided to recreate their work with some changes and document the whole process, so that anyone can build their own NER-powered apps.


NER in Healthcare

It is well established that the healthcare industry generates vast amounts of data, including both structured and unstructured data. This data can come from electronic health records, medical imaging, wearable devices, clinical trials, and other sources. Managing and processing this data can be challenging, and can lead to errors, inefficiencies, and missed opportunities for improving patient care. We can leverage NER models to extract medical concepts and relationships and maybe build an app that could store the whole medical history of a patient in a well-structured format.
For example, we could build an OCR-NER pipeline (just like the one in afetharita app) to extract the data from a prescription and store the data in a database.
In this example below, we'll be using the en_core_med7_trf NER model. This model was trained on the Med7 dataset, which contains medical entities such as 'DOSAGE', 'DRUG', 'DURATION', 'FORM', 'FREQUENCY', 'ROUTE', 'STRENGTH'. We can use this model to extract these important entities from loads of unstructured medical text data.
import spacy

nlp = spacy.load("en_core_med7_trf")

# example medical text
text = "The doctor prescribed the patient with 30ml of Magnesium hydroxide 400mg/5ml \
suspension to be taken orally twice a day for the following 5 days."

# parse the text with spacy
doc = nlp(text)

# iterate over the entities in the text
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
Output:
30ml: DOSAGE
Magnesium hydroxide: DRUG
suspension: FORM
orally: ROUTE
twice a day: FREQUENCY
for the following 5 days: DURATION

NER in Business

Companies often receive a large volume of unstructured data in the form of emails, customer feedback, social media posts, etc. NER can be used to extract important entities such as product names, company names, locations, and people mentioned in these texts. This can help businesses to better understand what their customers are talking about and identify key issues or trends.
The example below uses a SpaCy pre-trained NER model to:
  1. Extract named entities from the text and create nodes for a knowledge graph.
  2. Identify relationships between entities and create edges for the knowledge graph.
  3. Create a directed graph using the nodes and edges.
  4. Plot the graph using the networkx and matplotlib libraries.
import spacy
from spacy.tokens import Span
from spacy import displacy
import networkx as nx
import matplotlib.pyplot as plt

nlp = spacy.load('en_core_web_sm')

# df is assumed to be a pandas DataFrame with a 'reviews.text' column
for text in df['reviews.text']:
    doc = nlp(text)

    # extract named entities and create nodes for the knowledge graph
    nodes = []
    for ent in doc.ents:
        node = (ent.text, ent.label_)
        nodes.append(node)

    # create edges for the knowledge graph based on relationships between entities
    edges = []
    for sent in doc.sents:
        for token in sent:
            if token.dep_ == "nsubj" and token.head.pos_ == "VERB":
                subject = token
                for child in token.children:
                    if child.ent_type_:
                        edges.append((subject.text, child.text))

    # create a directed graph using the nodes and edges
    graph = nx.DiGraph()
    graph.add_nodes_from(nodes)
    graph.add_edges_from(edges)

    # plot the graph
    pos = nx.spring_layout(graph)
    nx.draw_networkx_nodes(graph, pos)
    nx.draw_networkx_edges(graph, pos)
    nx.draw_networkx_labels(graph, pos)
    plt.show()

Tools and Libraries for Named Entity Recognition

There is a plethora of tools that can be used when constructing a NER system. Here are some that I found useful:

Data Sources

A successful NER system depends on high-quality, unbiased data that represents the target population or use case. A diverse, balanced dataset is crucial to avoid bias and out-of-sample data, which falls outside of the NER system's scope. Defining the system's scope beforehand and curating the dataset carefully can prevent irrelevant data and increase diversity through augmentation.
Here are some NER benchmark datasets:
  1. CoNLL 2003 dataset: This is a commonly used benchmark dataset for named entity recognition, consisting of news articles from Reuters corpus annotated with four entity types - person, location, organization, and miscellaneous.
  2. WikiNER corpus: This is another widely used dataset for named entity recognition, consisting of Wikipedia articles annotated with various entity types.
Although benchmark datasets are useful for testing the performance of NER models, we need to create a custom dataset that is specific to our target use case.
  1. Use APIs: APIs can serve as a data source for NER systems. Many APIs provide access to structured data such as product listings, customer reviews, news articles, and social media posts, etc. Check out this github repository for a list of free APIs.
  2. Web Scraping: Use web scraping tools like Scrapy, Beautiful Soup and Selenium to scrape task specific data. Web scraping can be a powerful tool for gathering data, but it's important to abide by certain rules and best practices to ensure that you're not violating any laws or ethical standards.
  3. Zero-Shot and Few-Shot Methods: Use LLMs like GPT-3 to generate NER data by designing prompts. For example, to generate disaster messages in a particular format, I used this prompt:
Generate 10 disaster distress messages with names, phone number, address, for training a NER model in this format: Hello, my name is XYZ and I live on XYZ Street. There is a fire in my neighbor's house and I can hear someone calling for help. Please send the fire department right away! My neighbor's name is Mr. XYZ and his address is 123 XYZ Street in Pittsburgh. His phone number is 9876543210.

Data Annotation Tools

Here are some open-source text data annotation tools for NER:
  1. Doccano
  2. Ner-annotator (provides a simple web app)
Tools with a free tier:
  1. UBIAI
Paid tools:
  1. Prodigy (from the creators of SpaCy; it offers few-shot and zero-shot (using GPT-3.5) annotation support, which makes the annotation process less painful)

Data Augmentation Tools

There are several data augmentation techniques that can be used specifically for Named Entity Recognition (NER), including:
  1. Synonym replacement: replacing named entities with their synonyms in the training data.
  2. Back-translation: translating named entities into another language and then translating them back to the original language to create new training examples.
  3. Typo insertion: adding typos to named entities in the training data to make the model more robust to spelling variations.
  4. Perturbation: randomly changing words in the training data to create new examples.
  5. Entity swapping: swapping named entities between two or more training examples to create new examples (see the sketch below).
Check out Adept Augmentations repository to perform few-shot data augmentation for Named Entity Recognition.
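Here's a minimal sketch of the entity swapping technique from the list above, operating directly on BIO-tagged examples; the toy data and helper names are illustrative:
import random

# Toy BIO-tagged training examples: (tokens, tags)
examples = [
    (["Pikachu", "fled", "to", "Celadon", "City"],
     ["B-POKEMON", "O", "O", "B-GPE", "I-GPE"]),
    (["Squirtle", "hid", "in", "Vermilion", "City"],
     ["B-POKEMON", "O", "O", "B-GPE", "I-GPE"]),
]

def entity_spans(tokens, tags):
    # Yield (start, end, label) spans from a BIO tag sequence
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                yield (start, i, label)
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]

def swap_entities(ex_a, ex_b, label="POKEMON"):
    # Build a new example by replacing a `label` entity in ex_a with one from ex_b
    (tok_a, tag_a), (tok_b, tag_b) = ex_a, ex_b
    spans_a = [s for s in entity_spans(tok_a, tag_a) if s[2] == label]
    spans_b = [s for s in entity_spans(tok_b, tag_b) if s[2] == label]
    if not spans_a or not spans_b:
        return ex_a
    (sa, ea, _), (sb, eb, _) = random.choice(spans_a), random.choice(spans_b)
    return tok_a[:sa] + tok_b[sb:eb] + tok_a[ea:], tag_a[:sa] + tag_b[sb:eb] + tag_a[ea:]

print(swap_entities(examples[0], examples[1]))
# (['Squirtle', 'fled', 'to', 'Celadon', 'City'], ['B-POKEMON', 'O', 'O', 'B-GPE', 'I-GPE'])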

NER Libraries and Frameworks

  1. SpaCy: SpaCy has built-in NER capabilities that can be trained on custom data to recognize specific named entities that are relevant to a particular use case.
  2. Hugging Face: Hugging Face is a company and an open-source community that provides various Natural Language Processing (NLP) tools, including pre-trained models for Named Entity Recognition (NER). These models are built using deep learning techniques and are available in various languages. Check out SpanMarker for NER by Tom Aarsen and Hugging Face team. SpanMarker is based on the PL-Marker paper.
  3. Concise Concepts: This is a library that provides support for few-shot learning, which can be used to train NER models with limited labeled data. In addition to its few-shot NER capabilities, Concise Concepts also includes entity scoring, which allows users to rank the identified entities by their relevance or importance to the text. Check out this notebook section to get started!
  4. Flair: This is a popular NLP library that provides support for named entity recognition, among other tasks. It provides pre-trained models for multiple languages, and allows users to train their own models as well. It also supports contextual string embeddings, which can improve the performance of NER models. 
  5. OpenAI: We can use OpenAI APIs like GPT-3 for zero-shot NER. To use GPT-3 for NER, you can use the OpenAI API to send text input to the model, along with the prompt: f"Extract named entities from the following text: \"{text}\"" and receive the named entities identified in the text.
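Here's a minimal sketch of that zero-shot prompting approach. Note that this uses the legacy openai.Completion endpoint; the exact model name and client interface depend on the version of the openai library you have installed:
import openai

openai.api_key = "your_openai_api_key_here"

text = "Team Rocket caused chaos in Vermilion City as they attempted to capture Pikachu."

response = openai.Completion.create(
    model="text-davinci-003",  # assumed legacy completion model
    prompt=f"Extract named entities from the following text: \"{text}\"",
    max_tokens=100,
    temperature=0,
)
print(response["choices"][0]["text"])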

Example App: NewsTrackr

In this section, we'll learn how to build a web app that extracts news data on a given topic (a stock or index, in our case) using News API, then performs Aspect-Based Sentiment Analysis (ABSA) to determine the sentiment of different aspects related to that stock or index. We'll use AI21 to build our model, Streamlit as our web framework, and News API to collect news articles.
But wait, why do we need to determine the sentiment of different aspects related to a stock? More importantly, what's ABSA?

Problem Statement and Workflow

Please note that the content presented in the following section is solely for educational purposes and should not be interpreted as financial or investment advice. It is crucial to independently conduct thorough research and consult with a qualified financial professional prior to making any investment decisions.
Let's say that a friend of mine is new to trading and needs some guidance on how to make informed trading decisions.
The right way to start trading would involve understanding the basics of technical and fundamental analysis, which are two primary methods used to analyze the markets and make informed investment decisions.
Technical analysis involves analyzing charts and past market data to identify trends and patterns in the market. One effective technique in technical analysis is Time Series Forecasting, which involves analyzing past market data to predict future trends.
On the other hand, fundamental analysis involves evaluating the financial health and performance of a company or index. Performing sentiment analysis on news data is a popular method in fundamental analysis.
However, when using news data for sentiment analysis, plain sentiment analysis may not always be effective. For example, a news article may mention two different stocks, but the sentiment towards each stock could be different, making it challenging to make informed trading decisions based on plain sentiment analysis.
To address this issue, Entity-Level Sentiment Analysis (ELSA) can be used to analyze the sentiment towards specific entities, such as a company or industry. This approach can provide more accurate sentiment analysis and help traders make more informed trading decisions.
But there's an even better approach: Aspect Based Sentiment Analysis (ABSA).
ABSA goes beyond ELSA by analyzing sentiment for each aspect of an entity. For example, let's say you want to analyze the sentiment towards Apple Inc. using news data. ELSA would provide an overall sentiment score for Apple Inc., while ABSA would provide sentiment scores for each aspect of Apple Inc., such as its products (iPhone, iPad, etc.), services (Apple Music, iCloud, etc.), and financials (revenue, earnings, etc.). This provides a more granular view of the sentiment towards Apple Inc. and allows for more accurate sentiment analysis.
We could fine-tune any pre-trained Large Language models to perform ABSA with the help of SpaCy, Hugging Face, or any other NLP libraries.
But to make things easier, we can build this ABSA app with (almost) no coding by leveraging the AI21 Studio platform. AI21 Studio provides API access to their Jurassic-2 large language models which we can use for our ABSA task.
With AI21 Studio, we can simply send requests to their API and receive the output, which can then be used for sentiment analysis on news data. The platform also provides a user-friendly interface that allows us to easily interact with the model and customize the analysis according to our needs.
Without any further ado, let's start building our ABSA app! Here's the architecture for the same:
ABSA Architecture using AI21 Studio, Image by Author

ABSA Model Fine-Tune Pipeline




Why do we need to fine-tune for our use-case?

Although Large Language Models like Jurassic 2 Jumbo-Instruct can be used for zero-shot ABSA, the performance of the model will be limited. Here's a comparison:
AI21 Studio provides 5 Foundational Models out of which Jurassic 2 Jumbo-Instruct is the most powerful language model and Jurassic 2 Large is the most affordable and cost-effective model. Visit here to know more about the Jurassic 2 models.





As we can see, the fine-tuned Jurassic 2 Large model outperforms the Jumbo-Instruct model for our specific use-case. This is because the fine-tuned model is tailored to the task at hand, resulting in improved performance.

Data Collection

As mentioned, we used the SEntFiN-v1.1 dataset to fine-tune the Jurassic 2 Large model.
SEntFiN-v1.1 is a human-annotated dataset of 10,700+ news headlines with entity-sentiment annotations, aimed at addressing the challenging task of fine-grained financial sentiment analysis, especially in settings where multiple entities are present in a news headline and may have conflicting sentiments.
Here's the Kaggle Data Card for SEntFiN-v1.1

How do I add my Custom Dataset to AI21 Studio?

Here's how you can add your custom dataset to AI21 Studio:
  1. Go to Fine-Tuning Sets page in AI21 Studio.
  2. Click on Upload File and select your dataset (csv/jsonl).
That's it!
Check out their Build a Dataset docs page for more details.

Fine-Tuning Jurassic 2 Models for ABSA

Follow these steps to fine-tune the Jurassic 2 Large model using SEntFiN-v1.1 dataset:
  1. Go to Custom Models page in AI21 Studio.
  2. Click on Train a Model and select your uploaded SEntFiN-v1.1 dataset.
  3. Set model name, learning rate, epochs for the J2 Large model, then click 'Train Model' to start the model fine-tuning process.

Model Inference in AI21 Studio Playground

Once the model is trained, we can evaluate its performance using the AI21 Studio Playground. Follow these steps to do so:
  1. Go back to the Custom Models page.
  2. Click on the Playground Button near the model of your choice.
  3. In the Playground page, try experimenting with various temperature, top p, penalty levels and prompts.
Here's the whole ABSA Model Fine-Tune Pipeline process for your reference:


ABSA Inference using AI21, NEWS API and Streamlit




Setting up NEWS API Endpoint

The News API is a REST API that allows users to search and retrieve news articles from various sources on the web. With this API, users can obtain top stories from news websites or search for news related to specific keywords. Retrieving news based on certain criteria such as a specific channel or topic is possible, but an API key is required to access the service.
Visit newsapi.org to get your own API key.
import os
import requests
import streamlit as st

# Set up News API endpoint
NEWS_API_KEY = os.getenv("NEWS_API_KEY")
url = f"https://newsapi.org/v2/everything?apiKey={NEWS_API_KEY}"

# Get user topic, start, end input from Streamlit
topic = st.text_input("Enter the NEWS title for aspect-based sentiment analysis")
start = st.text_input("Please specify the date from which you would like to collect the news articles. (YYYY-MM-DD)")
end = st.text_input("Please specify the date till which you would like to collect the news articles. (YYYY-MM-DD)")

# Set query parameters and fetch news articles from the API
params = {"q": topic, "sortBy": "relevancy", "language": "en", "from": start, "to": end} #
response = requests.get(url, params=params)
articles = response.json()["articles"]
In the code above, we use the Streamlit library to create a user interface for collecting news articles from the News API. It starts by importing the necessary libraries, including os, requests, and streamlit.
Then we need to set up the News API endpoint by retrieving the NEWS_API_KEY from the operating system environment variables and appending it to the URL. If you're not familiar with .env file usage, check out this guide.
After that, we use Streamlit to get user input for the news topic, start date, and end date. These values are collected using the text_input() function and assigned to the variables 'topic', 'start', and 'end', respectively.
Finally, we set up the query parameters for the News API, including the user's input for the topic and dates. Then we use the requests library to make a GET request to the News API with the specified parameters and store the response in the 'response' variable.
Note: You can try experimenting with News API alternatives such as the gnews, GoogleNews, and gnewsclient libraries, which provide a simple interface for searching and retrieving news articles from Google News using the Google News RSS feed. Check out the NEWS API alternative section in our notebook to learn more.

Setting Up AI21 API Endpoint

To use the AI21 API, you'd need to set up your personal AI21 API key. Visit the API Key page to get your personal API Key.
The code below sets up the API key to access the AI21 language model.
import os
import ai21

# Set up ai21 API key
# ai21.api_key = os.getenv("AI21_API_KEY")
ai21.api_key = 'your_ai21_api_key_here'
The commented out code shows an alternative way to set the API key using an environment variable, which is often preferred for security reasons, as it avoids exposing the key in the source code. The key can be stored as an environment variable in the OS, and accessed in the code using the os.getenv() function. In this case, the name of the environment variable containing the key would be "AI21_API_KEY".
Either method is valid, but using environment variables is generally considered more secure, especially when working on a team where multiple people have access to the code.
The code below uses the AI21 API to generate aspect-based sentiment analysis (ABSA) for news articles. The loop iterates over the list of articles and for each article, it extracts the title, content, and URL. Then, it sets up a prompt for the AI model using the title and passes it along with some other parameters to the ai21.Completion.execute() function to get the predicted ABSA for the article. The generated ABSA is stored in the ABSA variable.
The ai21.Completion.execute() function takes several arguments:
  • model: The name of the AI model to use.
  • custom_model: The name of the custom model to use for aspect-based sentiment analysis (in our case, "ASBA-j2-large-v2")
  • prompt: The prompt to use for generating the ABSA (in our case, "find aspect based sentiment analysis for this text" + prompt)
  • numResults: The number of results to generate.
  • maxTokens: The maximum number of tokens to generate in the response.
  • temperature: A value that controls the "creativity" of the response.
  • topKReturn: The number of tokens to consider at each step of generation.
  • topP: A value that controls the "diversity" of the response.
  • countPenalty, frequencyPenalty, and presencePenalty: Penalty parameters that control certain aspects of the generated text.
  • stopSequences: A list of strings that, if generated, will cause the response to be truncated.
# Process the articles and get predicted ABSA from AI model
for article in articles:
    title = article["title"]
    content = article["content"]
    article_url = article['url']
    prompt = article["title"]
    response = ai21.Completion.execute(
        model="j2-large",
        custom_model="ASBA-j2-large-v2",
        prompt="find aspect based sentiment analysis for this text" + prompt,
        numResults=1,
        maxTokens=200,
        temperature=0.7,
        topKReturn=0,
        topP=1,
        countPenalty={
            "scale": 0,
            "applyToNumbers": False,
            "applyToPunctuations": False,
            "applyToStopwords": False,
            "applyToWhitespaces": False,
            "applyToEmojis": False
        },
        frequencyPenalty={
            "scale": 0,
            "applyToNumbers": False,
            "applyToPunctuations": False,
            "applyToStopwords": False,
            "applyToWhitespaces": False,
            "applyToEmojis": False
        },
        presencePenalty={
            "scale": 0,
            "applyToNumbers": False,
            "applyToPunctuations": False,
            "applyToStopwords": False,
            "applyToWhitespaces": False,
            "applyToEmojis": False
        },
        stopSequences=[]
    )
    ABSA = response.completions[0].data.text
We can also use the playground interface of AI21 Studio to set all these arguments with the help of an interactive user interface, then directly copy-paste the API call code with ease.


Streamlit Output

Once we get the ABSA response from the AI21 API call, all we need to do is use the st.write() function to display our results.
# Display the article details and the predicted sentiment
st.write(f"Article title: {title}")
st.write(f"Aspect Sentiment: {ABSA}")
st.write(f"Article content: {content}")
st.write(f"Article URL: {url}")
st.write("---")
Here's the complete code:
import os
import ai21
import requests
import streamlit as st

# Set up the AI21 API key
ai21.api_key = os.getenv("AI21_API_KEY")

# Set up the News API endpoint
NEWS_API_KEY = os.getenv("NEWS_API_KEY")
url = f"https://newsapi.org/v2/everything?apiKey={NEWS_API_KEY}"

st.title("NewsTrackr: AI News Aspect Based Sentiment Analysis")

# Get user topic input
topic = st.text_input("Enter the NEWS title for aspect-based sentiment analysis")
start = st.text_input("Please specify the date from which you would like to collect the news articles. (YYYY-MM-DD)")
end = st.text_input("Please specify the date till which you would like to collect the news articles. (YYYY-MM-DD)")

if st.button("Search"):
    # Set query parameters and fetch news articles from the API
    params = {"q": topic, "sortBy": "relevancy", "language": "en", "from": start, "to": end}
    response = requests.get(url, params=params)
    articles = response.json()["articles"]

    # Process the articles and get the predicted ABSA from the AI model
    for article in articles:
        title = article["title"]
        content = article["content"]
        url = article["url"]
        prompt = article["title"]
        response = ai21.Completion.execute(
            model="j2-large",
            custom_model="ASBA-j2-large-v2",
            prompt="find aspect based sentiment analysis for this text: " + prompt,
            numResults=1,
            maxTokens=200,
            temperature=0.7,
            topKReturn=0,
            topP=1,
            # penalty dicts elided for brevity; see the full call above
            countPenalty={...},
            frequencyPenalty={...},
            presencePenalty={...},
            stopSequences=[]
        )
        ABSA = response.completions[0].data.text

        # Display the article details and the predicted sentiment
        st.write(f"Article title: {title}")
        st.write(f"Aspect Sentiment: {ABSA}")
        st.write(f"Article content: {content}")
        st.write(f"Article URL: {url}")
        st.write("---")
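With the script saved locally as, say, app.py (a placeholder filename), you can test everything with streamlit run app.py (after exporting AI21_API_KEY and NEWS_API_KEY in the same shell) before moving on to deployment.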

Deploying the Streamlit App on Streamlit Cloud

Streamlit Cloud is a platform provided by Streamlit that allows users to easily deploy, manage, and share their Streamlit applications online. It provides a convenient way to host your Streamlit app without the need to set up your own server or worry about infrastructure.
Streamlit Cloud offers a free tier that includes a limited number of hours of CPU time and memory usage per month, as well as paid tiers that offer more resources and features. Additionally, it offers a set of tools and integrations to help you build and deploy your app more efficiently, such as GitHub integration, environment variables management, and integration with popular data storage solutions like AWS S3 and Google Cloud Storage.
To deploy the app on Streamlit Cloud, all you need to do is:
  1. Create a GitHub repository for your app code.
  2. Sign up for a Streamlit Cloud account and link it to your GitHub account.
  3. Create a new app in Streamlit Cloud and connect it to your GitHub repository.
  4. Configure the deployment settings for your app in Streamlit Cloud.
  5. Push your app code changes to GitHub to trigger a deployment in Streamlit Cloud.
  6. Access your deployed app URL and share it with others as needed.
Here's the NewsTrackr GitHub repository and the deployed Streamlit Cloud app for your reference.
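One practical note for step 4: your local environment variables won't exist on Streamlit Cloud, so API keys are typically supplied through the app's Secrets settings and read via st.secrets. Here's a minimal sketch, assuming the same key names we used earlier:
import ai21
import streamlit as st

# On Streamlit Cloud, add the keys under the app's Secrets settings, e.g.:
#   AI21_API_KEY = "your_ai21_api_key_here"
#   NEWS_API_KEY = "your_news_api_key_here"
# They are then available through st.secrets (dictionary-style access)
ai21.api_key = st.secrets["AI21_API_KEY"]
NEWS_API_KEY = st.secrets["NEWS_API_KEY"]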

LangChain and WandbTracer (Optional)

In this optional section, we'll learn how we can leverage LangChain and WandbTracer from W&B Prompts to track our custom Aspect Based Sentiment Analysis model's performance and to log our prompts.

What's LangChain?

Think of LangChain as the interface between LLMs and LLM-based applications. We can use LangChain to create prompt templates, build chains, manage memory, run agents, and more.
In this section, we will concentrate on LangChain prompt templates. Prompt templates are used to define and generate prompt values in the LangChain framework. These templates provide a structured and reusable way to create prompt values by combining user input, dynamic information, and fixed template strings.
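To make this concrete, here's a minimal sketch of a prompt template in action (the template string mirrors the ABSA prompt we used earlier):
from langchain import PromptTemplate

# A reusable template with a single input variable
template = "find aspect based sentiment analysis for this text: {news}"
prompt = PromptTemplate(template=template, input_variables=["news"])

# Substitute the variable to produce the final prompt string
print(prompt.format(news="Infosys shares slid after its Q4 earnings report."))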

What's WandbTracer?

WandbTracer from W&B Prompts lets us securely log our LLM calls, inspect the flow of the LLMs, and view the inputs, outputs, and in-between results! Visit W&B prompt docs to learn more.
Let's now see how we can integrate WandbTracer into our project:
First, we install the required packages and import AI21, PromptTemplate, and LLMChain from LangChain, along with requests.
# install the packages
!pip install ai21
!pip install langchain
!pip install wandb

from langchain.llms import AI21
from langchain import PromptTemplate, LLMChain
import requests
To integrate WandbTracer in our project, all we need to do is:
  1. Use WandbTracer.init() to initialize the tracer.
from wandb.integration.langchain import WandbTracer
WandbTracer.init({"project": "Named_Entity_Recognition"})
2. Then, in our LLMChain, pass the [WandbTracer()] callback:
template = """find aspect based sentiment analysis for this text: {news} """
prompt = PromptTemplate(template=template, input_variables=["news"])
llm = AI21(
    ai21_api_key=AI21_API_KEY,
    model="ASBA-j2-large-v2",
    maxTokens=200,
    temperature=0.7,
    topP=1,
    # penalty dicts elided for brevity; see the full Completion call above
    countPenalty={...},
    frequencyPenalty={...},
    presencePenalty={...},
)

# `news` holds the article text we want to analyze
llm_chain = LLMChain(prompt=prompt, llm=llm, callbacks=[WandbTracer()])
results = llm_chain.run(news)
print(results)
That's it!
Once we use the WandbTracer with our LLM call, the trace table, trace timeline, and model architecture will be automatically added to a panel in the wandb workspace.


What we just saw was a brief intro to WandbTracer, and there's a lot more we can do with it. Check out this piece, where the author creates a BadActorChain to build an app that responds to malicious inputs and adjusts the LLM's behavior to maintain adherence to ethical principles.

NewsTrackr for Fundamental Analysis

Let's now see how our app NewsTrackr can help us in understanding the market sentiment.
In the demo below, I used the INFY (Infosys) ticker as the search criterion for the period 13-04-23 to 17-04-23. The ABSA results from our NewsTrackr app show mixed sentiment: analyzing further, we find that the sentiment toward the Infosys stock as a whole is positive, whereas the sentiment toward its current performance is negative.
These results can be attributed to the company's fourth-quarter financial report and the culmination of its fiscal year 2023, which concluded on March 31.


In the historical price chart of INFY below, we can see a huge red candle (indicating the closing price is lower than the opening price for that timeframe) right at the market open. We see similar behavior in the ^NSEI (Nifty 50) and ^NSEBANK (Nifty Bank) indices. This might (emphasis on might) be due to the negative sentiment toward the stock that we saw earlier.
Note: Both NSE and BSE (stock exchanges in India) were closed on 14-04-23 in observance of Ambedkar Jayanti, which is why the ^NSEI and ^NSEBANK charts are empty on the 14th and the huge red candle (bearish/negative price action) appears on 17-04-23 instead (the next trading day).
💡
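If you'd like to pull the underlying price data yourself, here's a minimal sketch using the yfinance library (not used elsewhere in this article; INFY.NS is the NSE listing of Infosys):
import yfinance as yf  # pip install yfinance

# Daily candles for Infosys (NSE) and the two indices around the demo window
data = yf.download(["INFY.NS", "^NSEI", "^NSEBANK"], start="2023-04-13", end="2023-04-18")

# Compare opening and closing prices; 14-04-23 is absent (market holiday)
print(data["Open"])
print(data["Close"])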



Conclusion

In this article, we learned a lot about Named Entity Recognition: how NER works, how the field has evolved over the years, and the challenges and applications of NER in the real world. We then explored its wide-ranging applications in finance, healthcare, disaster relief, and more.
As I mentioned, although NER might not be as prominent as other NLP applications such as chatbots and AI agents, it still has diverse use cases:
  • Want to extract and store receipt/invoice data in a structured format?
Use an OCR + NER pipeline (or use a document-understanding model such as LayoutLM, or Donut, which bypasses the OCR engine entirely).
  • Want to understand the sentiment towards your product's marketing campaign?
Use Entity Level Sentiment Analysis or Aspect Based Sentiment Analysis on social media data. The sky is the limit!
So, while NER may not always steal the spotlight, its impact and utility in processing and analyzing text data should not be underestimated. It serves as a fundamental building block for unlocking valuable insights and empowering various NLP applications.
I hope this article gives you the knowledge and guidance you need to delve into the fascinating realm of NER. If you have any questions or need further clarification on anything covered here, please don't hesitate to reach out.
Happy learning!
