Train and Debug Your OCR Models With PaddleOCR and W&B
This article provides a quick tutorial on using the Weights & Biases integration in PaddleOCR to track training and evaluation metrics along with model checkpoints.
In this article, we're going to train a text detection model with a MobileNetV3 backbone on the ICDAR2015 dataset, which contains 1,000 training images and 500 test images captured with wearable cameras. We'll use Weights & Biases' PaddleOCR integration to track metrics.
Here's what we'll be covering:
Table of Contents
- Introduction to PaddleOCR
- Setting Things Up
- Downloading the ICDAR2015 Dataset
- Downloading the Pre-trained MobileNetV3 Model
- Training the Model
- Visualizing the Training and Validation Metrics
- Downloading and Using the Trained Model
- Bonus: Logging Annotated Images to Your W&B Dashboard
- Conclusion
Let's get going!
Introduction to PaddleOCR
PaddleOCR aims to create multilingual, leading, and practical OCR tools that help users train better models and apply them in practice using PaddlePaddle. The W&B integration in the library lets you track metrics on the training and validation sets during training, along with model checkpoints and their metadata.
We also have an amazing Colab that uses PaddleOCR for the text detection module of the OCR pipeline, complete with working code, if you'd like to follow along!
Setting Things Up
Installing the W&B SDK
First, let's install and log into our W&B account:
pip install wandb
wandb login
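If you'd rather authenticate from a notebook, wandb.login() does the same thing. Here's a minimal sketch, assuming your API key is either set in the WANDB_API_KEY environment variable or pasted when prompted:
import wandb

# Prompts for an API key if one isn't already configured
wandb.login()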
Installing PaddleOCR
Next, let's install PaddlePaddle along with a couple of dependencies:
pip install paddlepaddle-gpu pyclipper attrdict -qqq
Next, clone the PaddleOCR GitHub repository to install the package and get the training scripts for the pre-implemented models:
git clone https://github.com/PaddlePaddle/PaddleOCR
cd PaddleOCR
pip install -e .
Awesome! Now that we have both W&B and PaddleOCR good to go, we can move on to setting up our dataset and training the text detection model.
Downloading the ICDAR2015 Dataset
We will use the ICDAR2015 dataset available here. The data has been logged as W&B artifacts for ease of use:
import wandb

api = wandb.Api()
artifact = api.artifact("manan-goel/icdar2015/icdar2015-dataset:latest")
artifact.download(root="./train_data/icdar2015")
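If you're working from your own copy of the dataset rather than the artifact above, you can version it the same way. Here's a minimal sketch, assuming the data sits in a local ./train_data/icdar2015 directory and that the project and artifact names are placeholders you'd replace with your own:
import wandb

# Start a run to log the dataset from
run = wandb.init(project="icdar2015", job_type="upload-dataset")

# Package the local directory as a dataset-type artifact
artifact = wandb.Artifact("icdar2015-dataset", type="dataset")
artifact.add_dir("./train_data/icdar2015")

run.log_artifact(artifact)
run.finish()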
Downloading the Pre-trained MobileNetV3 Model
For this tutorial, we'll use a pre-trained MobileNetV3 model as the backbone for our text detection model. We'll fetch the model weights from PaddlePaddle's library of image models:
wget -P ./pretrain_models/ https://paddle-imagenet-models-name.bj.bcebos.com/dygraph/MobileNetV3_large_x0_5_pretrained.pdparams
Training the Model
To automatically start W&B experiment tracking for your training pipeline, add the following snippet to the configuration YAML file that is passed to the training script:
wandb:
  project: CoolOCR
  entity: my_team
  name: MyOCRModel
Any argument that you would otherwise pass to wandb.init can be added under the wandb header in the YAML file. The configuration file used for the experiments in this tutorial is available here. Adding the above lines at the bottom of the file activates the W&B logger.
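For example, you could tag and annotate the run by extending the same header. This is just a sketch; the tags and notes values below are placeholders rather than part of the tutorial's config:
wandb:
  project: CoolOCR
  entity: my_team
  name: MyOCRModel
  tags: ["text-detection", "icdar2015"]
  notes: "MobileNetV3 backbone trained on ICDAR2015"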
To train the model using this yaml file, use the following command in the PaddleOCR repository:
python tools/train.py -c configs/det/det_mv3_db.yml \
    -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained
Visualizing the Training and Validation Metrics
Finally, we train the model for 5 epochs with an evaluation step after every 10 training steps. Here's a look at some of our metrics:
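Both of those settings live in the Global section of the config (or can be overridden with -o on the command line). As a sketch, assuming the standard PaddleOCR keys epoch_num and eval_batch_step (double-check them against your config file):
Global:
  epoch_num: 5               # total number of training epochs
  eval_batch_step: [0, 10]   # start evaluating at step 0, then every 10 steps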
Training Metrics
System Metrics
W&B also automatically keeps track of GPU and CPU utilization for every run!
Validation Metrics
Downloading and Using the Trained Model
The checkpoints are logged as W&B artifacts at the end of every epoch and at every model-saving step, with the corresponding metadata and tags. They can be downloaded for further training or evaluation using the following snippet:
import wandb

artifact = wandb.Api().artifact('manan-goel/text_detection/model-2138qk4h:best', type='model')
artifact_dir = artifact.download()
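If you want to continue training from these weights instead of the ImageNet-pretrained backbone, you can point the trainer at the downloaded checkpoint. This is a sketch: Global.checkpoints is PaddleOCR's usual resume option, and the exact artifact path depends on where the download landed on your machine.
python tools/train.py -c configs/det/det_mv3_db.yml \
    -o Global.checkpoints=./artifacts/model-2138qk4h:v9/model_ckpt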
To run text detection with the trained model, use the inference script in the PaddleOCR repo:
python3 tools/infer_det.py -c configs/det/det_mv3_db.yml \
    -o Global.infer_img="./doc/imgs_en/" \
       Global.pretrained_model="./artifacts/model-2138qk4h:v9/model_ckpt"
This will annotate all the images in the ./doc/imgs_en/ folder and store the results in the output folder.
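If you plan to deploy the detector, PaddleOCR also ships an export script that converts a training checkpoint into an inference model. A sketch, assuming tools/export_model.py and the Global.save_inference_dir option behave as in the current repo:
python3 tools/export_model.py -c configs/det/det_mv3_db.yml \
    -o Global.pretrained_model="./artifacts/model-2138qk4h:v9/model_ckpt" \
       Global.save_inference_dir="./inference/det_db/"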
Bonus: Logging Annotated Images to Your W&B Dashboard
A cool way to visualize model performance at the end of training is to log the input and output images to a W&B table. Let's start by initializing a new W&B run and loading the paths to the images:
import wandb
import glob

wandb.init(project="text_detection")
wandb.use_artifact('manan-goel/text_detection/model-2138qk4h:best')

table = wandb.Table(columns=["Input Image", "Annotated Image"])

inp_imgs = sorted(glob.glob("./doc/imgs_en/*.jpg"), key=lambda x: x.split("/")[-1])
out_imgs = sorted(glob.glob("./output/det_db/det_results/*.jpg"), key=lambda x: x.split("/")[-1])
We then add the images to the W&B table and log it to W&B.
for inp in inp_imgs:
    for out in out_imgs:
        if out.split("/")[-1] != inp.split("/")[-1]:
            continue
        table.add_data(
            wandb.Image(inp),
            wandb.Image(out)
        )

wandb.log({"Predictions": table})
wandb.finish()
Conclusion
This tutorial gives a quick run-through of how you can use W&B in conjunction with PaddleOCR to support all your OCR model development needs. Check out the Colab for a version of this report with executable code.
Related Work
Information Extraction from Scanned Receipts: Fine-tuning LayoutLM on SROIE
An OCR demo with LayoutLM fine-tuned for information extraction on receipts data.
Information Extraction From Documents Using Machine Learning
In this article, we'll extract information from templated documents like invoices, receipts, loan documents, bills, and purchase orders, using a model.