CS6910: Project
Collect speech data for Indian languages, Build ASR models for Indian languages
Created on January 18|Last edited on January 18
Instructions
- This project will be divided into two parts: (i) Collect speech data for Indian languages (ii) Build ASR models for Indian languages and use them for a specific downstream application (e.g., enable digital payments in Indian languages)
- We strongly recommend that you work on this assignment in a team of size 2. Both the members of the team are expected to work together (in a subsequent viva both members will be expected to answer questions, explain the code, etc).
- Collaborations and discussions with other groups are strictly prohibited.
- At the end of the project you have to generate a report in a specified format (will be announced soon) using wandb.ai. You will upload a link to this report on Gradescope.
- You also need to provide a link to your GitHub code as shown below. Follow good software engineering practices and set up a GitHub repo for the project on Day 1. Please do not write all code on your local machine and push everything to GitHub on the last day. The commits in GitHub should reflect how the code has evolved during the course of the assignment.
- You have to check Moodle regularly for updates regarding the assignment.
Part 1
Each team will collect a total of 10 hours of speech data and corresponding transcripts from 10 districts of India using an app that will be provided by the course TAs (the team members can choose the districts). The team will have to enhance the app to add features to automatically detect the following:
- Background noise (such audio should be rejected)
- Low volume (such audio should be rejected)
- Skipped words (e.g., if the user was shown the text "I live in India" but says only "I live India")
The last feature can be implemented using an existing ASR model which will be provided to the students.
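As a rough illustration of the three checks listed above, here is a minimal Python sketch. It assumes the audio is available as a plain list of float samples in [-1, 1] and that the ASR output is already a text string (the interface of the provided ASR model is not described here). All function names, the thresholds (`threshold_dbfs`, `min_snr_db`) and the frame length are hypothetical and would need tuning on real recordings. The noise check is only a heuristic: it compares loud and quiet frames, so it assumes the recording contains some pauses.

```python
import difflib
import math
import string


def rms_dbfs(samples):
    """RMS level in dBFS for float samples in [-1, 1]; silence is floored at -100 dB."""
    if not samples:
        return -100.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_square, 1e-10))


def is_low_volume(samples, threshold_dbfs=-35.0):
    # threshold_dbfs is an assumed cut-off, not a value given by the course.
    return rms_dbfs(samples) < threshold_dbfs


def estimate_snr_db(samples, frame_len=400):
    """Heuristic SNR estimate: level of loud frames minus level of quiet frames."""
    levels = sorted(
        rms_dbfs(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, frame_len)
    )
    if not levels:
        return 0.0
    noise_floor = levels[int(0.1 * (len(levels) - 1))]   # roughly the 10th percentile
    speech_level = levels[int(0.9 * (len(levels) - 1))]  # roughly the 90th percentile
    return speech_level - noise_floor


def is_noisy(samples, min_snr_db=15.0):
    # min_snr_db is an assumed cut-off, not a value given by the course.
    return estimate_snr_db(samples) < min_snr_db


def normalise(text):
    """Lowercase, split on whitespace, strip ASCII punctuation from word edges."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def skipped_words(prompt, asr_transcript):
    """Return prompt words that have no counterpart in the ASR transcript."""
    ref = normalise(prompt)
    hyp = normalise(asr_transcript)
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" marks prompt words absent from the transcript;
        # "replace" (a misrecognised word) is deliberately not counted.
        if tag == "delete":
            missing.extend(ref[i1:i2])
    return missing


print(skipped_words("I live in India", "I live India"))  # → ['in']
```

Because the skipped-word check depends on the ASR transcript being accurate, a recognition error can look like a skipped or substituted word; a real app would probably want to tolerate a small number of mismatches before rejecting a recording.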
The timelines would be as follows:
- Adding above features in the app: 15-Feb-2022
- 5 hours of speech data: 28-Feb-2022 (we will automatically check this in the backend)
- 10 hours of speech data: 10-Mar-2022 (we will automatically check this in the backend)
- Report: 12-Mar-2022
Part 2
The goal here is to fine-tune an ASR model using the data collected in Part 1 (as well as additional data provided by the course instructors).
The timelines would be as follows:
- 1-page project proposal describing the end application, languages that will be supported and the social impact: 31-Jan-2022
- Plan to collect data specific to this application to fine-tune/benchmark your model (e.g., for digital banking you would have to ensure that your data has terms like "balance", "interest", etc.): 28-Feb-2022
- A report on the performance of the ASR models that you have built for all the languages supported by your application: 20-Apr-2022
- Demo of the final application: 25-Apr-2022 to 27-Apr-2022
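One way to check an application-specific data plan against the transcripts actually collected is to count how often each domain term appears. The sketch below is a minimal illustration, assuming transcripts are plain strings and terms are single words; `term_coverage`, `missing_terms` and the sample sentences are hypothetical. It splits on whitespace instead of using the regex `\w+`, since Python's `\w` can break words at the combining vowel signs used in many Indic scripts.

```python
import string
from collections import Counter

# ASCII punctuation plus the danda and double danda used in several Indic scripts.
PUNCTUATION = string.punctuation + "\u0964\u0965"


def term_coverage(transcripts, domain_terms):
    """Count occurrences of each (single-word) domain term across all transcripts."""
    counts = Counter()
    for text in transcripts:
        for word in text.lower().split():
            word = word.strip(PUNCTUATION)
            if word:
                counts[word] += 1
    return {term: counts[term.lower()] for term in domain_terms}


def missing_terms(transcripts, domain_terms, min_count=1):
    """Return the terms that appear fewer than min_count times."""
    coverage = term_coverage(transcripts, domain_terms)
    return [term for term, count in coverage.items() if count < min_count]


sample = ["What is my account balance?", "Balance enquiry please"]
print(term_coverage(sample, ["balance", "interest"]))  # → {'balance': 2, 'interest': 0}
print(missing_terms(sample, ["balance", "interest"]))  # → ['interest']
```

Terms reported as missing indicate prompts that still need to be added to the collection plan before fine-tuning or benchmarking.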
Self Declaration
List down the contributions of the two team members:
For example,
CS20Mzzzz: (70% contribution)
- ...
- ...
- ...
- ...
- ...
- ...
CS20Myyyy: (30% contribution)
- ...
- ...
- ...
- ...
We, Name_XXX and Name_ZZZ, swear on our honour that the above declaration is correct.
Note: Your marks in the assignment will be in proportion to the above declaration. Honesty will be rewarded (Help is always given in CS6910 to those who deserve it!).
This is an opportunity for you to come clean. If one of the team members has not contributed then it should come out clearly from the above declaration. There will be a viva after the submission. If your performance in the viva does not match the declaration above then both the team members will be penalised (50% of the marks earned in the assignment).