ASR Guide
Automatic Speech Recognition (ASR), also known as speech-to-text, is the task of transcribing spoken language into written text. In this blog, you will learn how to use NVIDIA's Neural Modules (NeMo) toolkit to train an end-to-end ASR system and Weights & Biases to keep track of your experiments and performance metrics.
Setting up the Environment
Now that we have some idea of Automatic Speech Recognition and the tools we're going to use in this blog post, the first step is to set up an environment where we can run code.
We'll launch an AWS instance and then install the dependencies NeMo needs on that machine, using NVIDIA NGC and Jupyter Notebooks. The steps are listed below, and the full command sequence is collected in a sketch after the list.
- SSH into the AWS instance and forward port 8888.
- Pull the NVIDIA NeMo Docker container from NGC: `docker pull nvcr.io/nvidia/nemo:1.6.1`.
- Run the container: `docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 --ulimit memlock=-1 --ulimit stack=67108864 -v $(pwd):/notebooks nvcr.io/nvidia/nemo:1.6.1`.
- Once inside the container, launch Jupyter Notebook: `jupyter notebook --port 8888`.
- Go to localhost:8888 in your browser to access Jupyter Notebook.
- Upload the files.zip you downloaded earlier and unzip it to get access to the ASR W&B notebook.
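For reference, here is a minimal sketch that collects the commands from the steps above into one sequence. The SSH key, username, and instance address are placeholders you'll need to replace with your own values, and the extra Jupyter flags mentioned in the comments are common additions for running inside a container rather than anything specific to NeMo.

```bash
# 1. On your local machine: SSH into the AWS instance and forward port 8888.
#    <key.pem> and <instance-address> are placeholders for your own key and instance.
ssh -i <key.pem> -L 8888:localhost:8888 ubuntu@<instance-address>

# 2. On the instance: pull the NeMo 1.6.1 container from NGC.
docker pull nvcr.io/nvidia/nemo:1.6.1

# 3. Start the container, exposing port 8888 and mounting the current
#    directory into /notebooks inside the container.
docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 \
  --ulimit memlock=-1 --ulimit stack=67108864 \
  -v $(pwd):/notebooks nvcr.io/nvidia/nemo:1.6.1

# 4. Inside the container: move to the mounted directory and launch Jupyter.
#    Depending on your setup you may also need --ip=0.0.0.0 --allow-root --no-browser.
cd /notebooks
jupyter notebook --port 8888
```

Once Jupyter starts, it prints a URL with an access token in the container's terminal; use that token to log in at localhost:8888 on your local machine.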
And that's it! With these six simple steps, we should be inside an AWS instance with the NeMo code ready to run.