
Named-Entity Recognition on HuggingFace

This tutorial shows how to train an NER model using transformers, with code and visualizations.
Created on June 8|Last edited on November 18

Introduction

Named-Entity Recognition (NER) is a subtask of information extraction that seeks to locate named entities mentioned in unstructured text and classify them into predefined categories such as person names, locations, organizations, quantities, or other expressions.
Here we will use Hugging Face transformers to fine-tune the pretrained bert-base-cased model on the CoNLL-2003 dataset.
The CoNLL-2003 dataset consists of word tokens, POS tags, chunk tags, and NER tags. To train the NER model we will use only the word tokens and NER tags. There are 7 unique NER tags in total.
If you want to use your own dataset, it needs to be in the same form after preprocessing: each word token followed by its NER tag, separated by a space.
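As a rough illustration of that format, the sketch below writes tokenized sentences with one "token tag" pair per line and a blank line between sentences. The function name `write_conll` and the sample sentence are made up for this example; only the file layout follows the description above.

```python
import io

def write_conll(sentences, out):
    """Write sentences as 'token tag' lines, separated by blank lines.

    `sentences` is a list of (tokens, tags) pairs and `out` is any writable
    text stream. Both names are illustrative, not part of any library.
    """
    for tokens, tags in sentences:
        if len(tokens) != len(tags):
            raise ValueError("each token needs exactly one tag")
        for token, tag in zip(tokens, tags):
            out.write(f"{token} {tag}\n")
        out.write("\n")  # a blank line marks the end of a sentence

# Hypothetical sample data, not taken from CoNLL-2003
sample = [(["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"])]
buffer = io.StringIO()
write_conll(sample, buffer)
print(buffer.getvalue())
```

Passing an open file handle instead of the `StringIO` buffer would produce a file that can be fed to the preprocessing step.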

Try it in colab

1. Preprocessing

First, we need to preprocess the data before training the NER model. Here we keep only the word tokens and NER tags and drop the other columns from the dataset.
# Download dataset
# A GitHub "tree" URL cannot be cloned, so clone the repository root and copy the corpus files.
# Assumption: the eng.* files sit directly in corpus/CoNLL-2003.
!git clone https://github.com/synalp/NER
!mkdir -p data && cp NER/corpus/CoNLL-2003/eng.* data/
# Preprocess dataset
# Keep only the word token (column 1) and the NER label (column 4)
# Note: eng.testa is the CoNLL-2003 development set and eng.testb is the test set;
# this tutorial uses eng.testa as its test split.
!cat data/eng.train | cut -d " " -f 1,4 > data/train.txt
!cat data/eng.testa | cut -d " " -f 1,4 > data/test.txt
!cat data/eng.testb | cut -d " " -f 1,4 > data/dev.txt

# preprocess.py splits sentences that are longer than MAX_LENGTH subword tokens
!wget "https://raw.githubusercontent.com/stefan-it/fine-tuned-berts-seq/master/scripts/preprocess.py"
MODEL = 'bert-base-cased'
MAX_LENGTH = 120
!python preprocess.py data/train.txt $MODEL $MAX_LENGTH > data/train-f.txt
!python preprocess.py data/test.txt $MODEL $MAX_LENGTH > data/test-f.txt
!python preprocess.py data/dev.txt $MODEL $MAX_LENGTH > data/dev-f.txt

# Collect the unique labels into labels.txt
!cat data/train-f.txt data/test-f.txt data/dev-f.txt | cut -d " " -f 2 | grep -v "^$" | sort | uniq > data/labels.txt
After preprocessing, the data looks like this:

Preprocessed dataset consisting of word tokens and NER tags
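The sentence-splitting step can be sketched roughly as follows. This is a simplified illustration of the idea, not the code of preprocess.py: it inserts a sentence break whenever adding the next token would push the running subword count past a limit. The `count_subwords` callable and the `toy_count` stand-in are assumptions; a real run would count the pieces produced by the model's tokenizer.

```python
def split_long_sentences(lines, count_subwords, max_len):
    """Insert sentence breaks so no sentence exceeds max_len subword tokens.

    `lines` are 'token tag' strings (an empty string is a sentence boundary).
    `count_subwords` returns how many subword pieces a token produces.
    """
    output = []
    running = 0  # subword tokens in the current sentence so far
    for line in lines:
        line = line.rstrip()
        if not line:
            output.append("")
            running = 0
            continue
        token = line.split()[0]
        n = count_subwords(token)
        if running and running + n > max_len:
            output.append("")  # start a new sentence before this token
            running = 0
        running += n
        output.append(line)
    return output

# Stand-in for a real tokenizer (assumption): one piece per 4 characters
def toy_count(token):
    return (len(token) + 3) // 4

lines = ["Weights O", "and O", "Biases O", "builds O", "tools O"]
result = split_long_sentences(lines, toy_count, max_len=4)
print(result)
```

With the toy counter and a limit of 4, the five tokens are broken into three short sentences, which mirrors what happens to over-long sentences at a realistic limit such as 120.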

2. Finetuning NER model

After preprocessing, we fine-tune the pretrained bert-base-cased model for token classification to obtain the NER model.
For fine-tuning, we will use the run_ner.py and utils_ner.py scripts.
# Training Hyperparameters
MAX_LENGTH = 128
EPOCHS = 3
MODEL = 'bert-base-cased'
SAVE_STEPS = 100
LOGGING_STEPS = 100
BATCH_SIZE = 32
OUTPUT_DIR = 'bert-ner'
SEED = 42
It's time to use transfer learning. The pretrained model already encodes general knowledge of the English language, so we only need to fine-tune it for the NER task.
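Run the command below to fine-tune the model on the dataset. Note that --output_dir is given the literal name OUTPUT_DIR (there is no $ in front of it), so checkpoints and predictions are written to a directory called OUTPUT_DIR rather than bert-ner; the evaluation and inference steps below read from that directory. The relative paths suggest the command is run from inside the data directory.
!python /content/run_ner.py --data_dir ./ --model_type bert --labels ./labels.txt --model_name_or_path "bert-base-cased" --output_dir OUTPUT_DIR --max_seq_length $MAX_LENGTH --num_train_epochs $EPOCHS --per_gpu_train_batch_size $BATCH_SIZE --save_steps $SAVE_STEPS --logging_steps $LOGGING_STEPS --seed $SEED --do_train --do_eval --do_predict --overwrite_output_dir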
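For intuition about what fine-tuning for token classification involves, the sketch below shows one common way to align word-level labels with subword pieces: the first piece of each word keeps the word's label, and the remaining pieces get an ignore index so they do not contribute to the loss. This is an illustration under assumptions, not the code of run_ner.py or utils_ner.py; `align_labels`, `toy_tokenize`, and the label map are hypothetical names and values.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss

def align_labels(words, labels, label_map, tokenize):
    """Expand word-level labels to subword level.

    The first piece of each word keeps the word's label id; the remaining
    pieces get IGNORE_INDEX so they are skipped when computing the loss.
    """
    pieces, label_ids = [], []
    for word, label in zip(words, labels):
        word_pieces = tokenize(word)
        if not word_pieces:
            continue  # skip words the tokenizer maps to nothing
        pieces.extend(word_pieces)
        label_ids.extend(
            [label_map[label]] + [IGNORE_INDEX] * (len(word_pieces) - 1)
        )
    return pieces, label_ids

# Toy WordPiece-like tokenizer (assumption): chunks of 4 characters
def toy_tokenize(word):
    chunks = [word[i:i + 4] for i in range(0, len(word), 4)]
    if not chunks:
        return []
    return [chunks[0]] + ["##" + chunk for chunk in chunks[1:]]

# Hypothetical label map and sentence
label_map = {"O": 0, "B-ORG": 1, "I-ORG": 2}
pieces, label_ids = align_labels(
    ["Weights", "builds"], ["B-ORG", "O"], label_map, toy_tokenize
)
print(pieces, label_ids)
```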


3. Evaluation

After training the model and saving checkpoints, it's time to evaluate the performance of the fine-tuned model.
Here we will use the scikit-learn classification report to evaluate it.
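def read_file(filepath):
    """Read a 'token label' file into lists of sentences (words and labels)."""
    example = {'words': [], 'labels': []}
    words = []
    labels = []
    with open(filepath, encoding='utf-8') as f:
        for line in f:
            if line.startswith('-DOCSTART-') or line.strip() == '':
                # Sentence boundary: store the sentence collected so far
                if words:
                    example['words'].append(words)
                    example['labels'].append(labels)
                    words = []
                    labels = []
            else:
                splits = line.rstrip('\n').split(' ')
                words.append(splits[0])
                # Fall back to 'O' when the label column is missing
                labels.append(splits[-1] if len(splits) > 1 else 'O')
    # Keep the last sentence if the file does not end with a blank line
    if words:
        example['words'].append(words)
        example['labels'].append(labels)
    return example

test_real = read_file('/content/data/test-f.txt')['labels'][:3250]
test_predictions = read_file('/content/data/OUTPUT_DIR/test_predictions.txt')['labels']

import numpy as np
from sklearn.metrics import classification_report

# Flatten the per-sentence label lists; the predictions are truncated to
# 51275 tokens so that both arrays have the same length
print(classification_report(np.concatenate(test_real), np.concatenate(test_predictions)[:51275]))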

Output
Evaluation Metrics
Note: all evaluation metrics are 0.00 for the B-MISC class because its support is only 4. That is, only 4 occurrences of this class appear in the test set, so the model could not predict it accurately.
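To make the columns of such a report concrete, the sketch below computes token-level precision, recall, F1, and support per label with the standard library only. It is a simplified stand-in for illustration, not scikit-learn's implementation, and the label sequences are invented. It also shows how a rare class ends up at 0.00 when none of its few occurrences are predicted correctly.

```python
from collections import Counter

def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for token labels."""
    tp, fp, fn, support = Counter(), Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[t] += 1  # true label was missed
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        report[label] = (precision, recall, f1, support[label])
    return report

# Invented example: the single B-MISC token is mislabeled
y_true = ["O", "O", "I-PER", "B-MISC", "O", "I-PER"]
y_pred = ["O", "O", "I-PER", "I-MISC", "O", "O"]
report = per_class_report(y_true, y_pred)
for label, row in report.items():
    print(label, row)
```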

4. Inference

After evaluation, we can load the model and tokenizer, pass them to the Hugging Face transformers pipeline, and predict entities for a given sequence.
import transformers
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from a saved checkpoint
model = transformers.AutoModelForTokenClassification.from_pretrained('/content/data/OUTPUT_DIR/checkpoint-1300')
tokenizer = transformers.AutoTokenizer.from_pretrained('/content/data/OUTPUT_DIR/checkpoint-1300')

model_infer = pipeline('ner', model=model, tokenizer=tokenizer)
model_infer('Weights & Biases is a california based company known for building machine learning tools for ML engineers and Researchers')
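The pipeline returns one prediction per tagged token, so multi-word entities and subword pieces come back as separate items. The sketch below shows one way to merge them into entity spans. It assumes each prediction is a dict with `word`, `entity`, and `index` keys; the sample predictions are hand-written for illustration and are not actual model output, and `group_entities` is a hypothetical helper.

```python
def group_entities(predictions):
    """Merge consecutive token predictions of the same type into spans."""
    spans = []
    for pred in predictions:
        tag = pred["entity"]
        entity_type = tag.split("-")[-1]  # 'B-ORG' or 'I-ORG' -> 'ORG'
        word = pred["word"]
        last = spans[-1] if spans else None
        adjacent = last is not None and pred["index"] == last["end_index"] + 1
        if adjacent and word.startswith("##"):
            # Subword continuation: glue it to the previous piece
            last["text"] += word[2:]
            last["end_index"] = pred["index"]
        elif adjacent and last["type"] == entity_type and not tag.startswith("B-"):
            # Next word of the same entity
            last["text"] += " " + word
            last["end_index"] = pred["index"]
        else:
            spans.append(
                {"text": word, "type": entity_type, "end_index": pred["index"]}
            )
    return [(span["text"], span["type"]) for span in spans]

# Hand-written predictions in the assumed output shape
sample_predictions = [
    {"word": "Weights", "entity": "I-ORG", "index": 1},
    {"word": "&", "entity": "I-ORG", "index": 2},
    {"word": "Bi", "entity": "I-ORG", "index": 3},
    {"word": "##ases", "entity": "I-ORG", "index": 4},
    {"word": "California", "entity": "I-LOC", "index": 8},
]
entities = group_entities(sample_predictions)
print(entities)
```

Recent versions of transformers can perform a similar grouping inside the pipeline through its `aggregation_strategy` argument, which may be preferable to hand-rolled merging when it is available.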




After evaluation we got f1_score = 0.9521008403361344 and loss = 0.033555489841252804, which is a strong result on this dataset.
If you are working with a more complex dataset and bert-base-cased does not reach this level of performance, using ELECTRA as the pretrained model for token classification may improve results.
