
ASR Guide

Automatic Speech Recognition (ASR), also known as speech-to-text, is the task of automatically transcribing spoken language. In this blog, you will learn how to use NVIDIA's Neural Modules (NeMo) toolkit to train an end-to-end ASR system and Weights & Biases to keep track of your experiments and performance metrics.

Setting up the Environment

Now that we have some idea of Automatic Speech Recognition and the tools we'll use in this blog post, the first step is to set up an environment where we can run the code.
We'll first launch an AWS instance and then install the dependencies NeMo needs to run on the machine, using NVIDIA NGC and Jupyter Notebooks along the way.
Here are the steps to set up the environment for NVIDIA NeMo:
  1. Launch an AWS instance (p2.xlarge) with the NVIDIA GPU-Optimized AMI.
  2. SSH into the AWS instance and forward port 8888.
  3. Download the Jupyter Notebook from NGC; this downloads files.zip.
  4. Pull the NVIDIA NeMo Docker container from NGC: docker pull nvcr.io/nvidia/nemo:1.6.1
  5. Run the container: docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 --ulimit memlock=-1 --ulimit stack=67108864 -v $(pwd):/notebooks nvcr.io/nvidia/nemo:1.6.1
  6. Once inside the container, launch Jupyter Notebook: jupyter notebook --port 8888
  7. Open localhost:8888 in your browser to access Jupyter Notebook.
  8. Upload the files.zip downloaded in step 3 and unzip it to access the ASR W&B notebook.
And that's it! With these 8 simple steps we should be inside an AWS instance with the NeMo code ready to run.
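
Once the notebook server is up, it can help to run a quick sanity check from a fresh cell to confirm that the container sees the GPU, that NeMo imports cleanly, and that Weights & Biases is reachable. The snippet below is a minimal sketch: it assumes the nemo:1.6.1 container already ships with PyTorch and the nemo package, that wandb is available (run pip install wandb first if it isn't), and it uses QuartzNet15x5Base-En only as an example of a publicly available pretrained checkpoint.

```python
# Minimal environment sanity check (run in a notebook cell inside the container).
# Assumption: the nemo:1.6.1 container provides PyTorch and NeMo; wandb may need
# a separate `pip install wandb` if it is not already present.
import torch
import nemo
import nemo.collections.asr as nemo_asr
import wandb

print("NeMo version:", nemo.__version__)
print("CUDA available:", torch.cuda.is_available())  # should be True on a p2.xlarge GPU instance

# Log in to Weights & Biases (prompts for an API key the first time).
wandb.login()

# Optionally pull a small pretrained checkpoint to confirm the ASR collection works.
# "QuartzNet15x5Base-En" is just an example of a publicly available NeMo model.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")
print(type(asr_model).__name__, "loaded successfully")
```

If everything prints as expected, the environment is ready for the data preparation, training, and experiment-tracking steps in the rest of this guide.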