Train and Debug Your OCR Models With PaddleOCR and W&B
This article provides a quick tutorial on using the Weights & Biases integration in PaddleOCR to track training and evaluation metrics along with model checkpoints.
In this article, we're going to train a text detection model with a MobileNetV3 backbone on the ICDAR2015 dataset, which contains 1,000 training images and 500 test images captured with wearable cameras. We'll use Weights & Biases' PaddleOCR integration to track metrics.
Here's what we'll be covering:
Table of Contents
- Introduction to PaddleOCR
- Setting Things Up
- Downloading the ICDAR2015 Dataset
- Downloading the Pre-trained MobileNetV3 Model
- Training the Model
- Visualizing the Training and Validation Metrics
- Downloading and Using the Trained Model
- Bonus: Logging Annotated Images to Your W&B Dashboard
- Conclusion
Let's get going!
Introduction to PaddleOCR
PaddleOCR aims to create multilingual, leading, and practical OCR tools that help users train better models and apply them in practice using PaddlePaddle. The W&B integration in the library lets you track metrics on the training and validation sets during training, along with model checkpoints and their metadata.
We also have an amazing Colab that uses PaddleOCR for the text detection module of the OCR pipeline, complete with working code, if you'd like to follow along!
Setting Things Up
Installing the W&B SDK
First, let's install and log into our W&B account:
pip install wandb
wandb login
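If you'd rather authenticate from a notebook, wandb.login() does the same thing. Here's a minimal sketch, assuming your API key is either set in the WANDB_API_KEY environment variable or pasted when prompted:
import wandb

# Prompts for an API key if one isn't already configured
wandb.login()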
Installing PaddleOCR
Next, let's install PaddlePaddle along with a couple of dependencies:
pip install paddlepaddle-gpu pyclipper attrdict -qqq
Next, clone the PaddleOCR GitHub repository to install the package and get the training scripts for the pre-implemented models:
git clone https://github.com/PaddlePaddle/PaddleOCR
cd PaddleOCR
pip install -e .
Awesome! Now that we have both W&B and PaddleOCR good to go, we can move on to setting up our dataset and training the text detection model.
Downloading the ICDAR2015 Dataset
We will use the ICDAR2015 dataset available here. The data has been logged as W&B artifacts for ease of use:
import wandb

api = wandb.Api()
artifact = api.artifact("manan-goel/icdar2015/icdar2015-dataset:latest")
artifact.download(root="./train_data/icdar2015")
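If you're working from your own copy of the dataset rather than the artifact above, you can version it the same way. Here's a minimal sketch, assuming the data sits in a local ./train_data/icdar2015 directory and that the project and artifact names are placeholders you'd replace with your own:
import wandb

# Start a run to log the dataset from
run = wandb.init(project="icdar2015", job_type="upload-dataset")

# Package the local directory as a dataset-type artifact
artifact = wandb.Artifact("icdar2015-dataset", type="dataset")
artifact.add_dir("./train_data/icdar2015")

run.log_artifact(artifact)
run.finish()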
Downloading the Pre-trained MobileNetV3 Model
For this tutorial, we'll use a pre-trained MobileNetV3 model as the backbone for our text detection model. We'll fetch the model weights from PaddlePaddle's library of image models:
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
Training the Model
To automatically start W&B experiment tracking for your training pipeline, add the following snippet to the configuration YAML file that is passed to the training script:
wandb:
  project: CoolOCR
  entity: my_team
  name: MyOCRModel
Any argument that you would otherwise pass to wandb.init can be added under the wandb header in the YAML file. The configuration file used for the experiments in this tutorial is available here. Adding the above lines at the bottom of the file activates the W&B logger.
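For example, you could tag and annotate the run by extending the same header. This is just a sketch; the tags and notes values below are placeholders rather than part of the tutorial's config:
wandb:
  project: CoolOCR
  entity: my_team
  name: MyOCRModel
  tags: ["text-detection", "icdar2015"]
  notes: "MobileNetV3 backbone trained on ICDAR2015"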
To train the model using this yaml file, use the following command in the PaddleOCR repository:
python tools/train.py -c configs/det/det_mv3_db.yml \
    -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
Visualizing the Training and Validation Metrics
Finally, we train the model for 5 epochs with an evaluation step after every 10 training steps. Here's a look at some of our metrics:
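Both of those settings live in the Global section of the config (or can be overridden with -o on the command line). As a sketch, assuming the standard PaddleOCR keys epoch_num and eval_batch_step (double-check them against your config file):
Global:
  epoch_num: 5               # total number of training epochs
  eval_batch_step: [0, 10]   # start evaluating at step 0, then every 10 steps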
Training Metrics
System Metrics
W&B also automatically keeps track of GPU and CPU utilization for every run!
Validation Metrics
Downloading and Using the Trained Model
The checkpoints are logged as W&B artifacts at the end of every epoch and at every model-saving step, with the corresponding metadata and tags. They can be downloaded for further training or evaluation using the following snippet:
import wandb

artifact = wandb.Api().artifact('manan-goel/text_detection/model-2138qk4h:best', type='model')
artifact_dir = artifact.download()
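If you want to continue training from these weights instead of the ImageNet-pretrained backbone, you can point the trainer at the downloaded checkpoint. This is a sketch: Global.checkpoints is PaddleOCR's usual resume option, and the exact artifact path depends on where the download landed on your machine.
python tools/train.py -c configs/det/det_mv3_db.yml \
    -o Global.checkpoints=./artifacts/model-2138qk4h:v9/model_ckpt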
To run text detection with the trained model, use the inference script in the PaddleOCR repo:
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml \
    -o Global.infer_img="./doc/imgs_en/" \
       Global.pretrained_model="./artifacts/model-2138qk4h:v9/model_ckpt"
This will annotate all the images in the ./doc/imgs_en/ folder and store the results in the output folder.
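If you plan to deploy the detector, PaddleOCR also ships an export script that converts a training checkpoint into an inference model. A sketch, assuming tools/export_model.py and the Global.save_inference_dir option behave as in the current repo:
python3 tools/export_model.py -c configs/det/det_mv3_db.yml \
    -o Global.pretrained_model="./artifacts/model-2138qk4h:v9/model_ckpt" \
       Global.save_inference_dir="./inference/det_db/"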
Bonus: Logging Annotated Images to Your W&B Dashboard
A cool way to visualize model performance at the end of training is to log the input and output images to a W&B table. Let's start by initializing a new W&B run and loading the paths to the images:
import wandb
import glob

wandb.init(project="text_detection")
wandb.use_artifact('manan-goel/text_detection/model-2138qk4h:best')

table = wandb.Table(columns=["Input Image", "Annotated Image"])

inp_imgs = sorted(glob.glob("./doc/imgs_en/*.jpg"), key=lambda x: x.split("/")[-1])
out_imgs = sorted(glob.glob("./output/det_db/det_results/*.jpg"), key=lambda x: x.split("/")[-1])
We then add the images to the W&B table and log it to W&B.
for inp in inp_imgs:
    for out in out_imgs:
        if out.split("/")[-1] != inp.split("/")[-1]:
            continue
        table.add_data(
            wandb.Image(inp),
            wandb.Image(out)
        )

wandb.log({"Predictions": table})
wandb.finish()
Conclusion
This tutorial gives a quick run-through of how you can use W&B in conjunction with PaddleOCR to support all your OCR model development needs. Check out the Colab for a version of this report with executable code.
Related Work
Information Extraction from Scanned Receipts: Fine-tuning LayoutLM on SROIE
An OCR demo with LayoutLM fine-tuned for information extraction on receipts data.
Information Extraction From Documents Using Machine Learning
In this article, we'll extract information from templated documents like invoices, receipts, loan documents, bills, and purchase orders, using a model.