MultiTask-MultiLanguage NLU design
How to bring chatbot intelligence closer to human intelligence
Introduction
Text and token classification are two of the most popular downstream tasks in Natural Language Processing (NLP), enabling semantic and lexical analysis of utterances, respectively. The two problems are intrinsically linked, even though they are usually treated separately, so it is natural to ask whether there is a way to combine them in a single network so that each task helps solve the other, and vice versa.
A simple example motivates the problem: suppose a patient is looking for a medical specialist, and the word cancer appears in their query (this would be an entity in our dataset). Even with no further information about the patient's message, would that help you determine the appropriate specialist? How likely is it that the patient ends up being assisted by an oncologist (this would be our intent)?
It will be shown that, with the appropriate model architecture and training strategy, a MultiTask setup can outperform single-task models. For implementation details, please check this repo.
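The exact implementation lives in the repo linked above. As a rough illustration of the idea, a minimal sketch of a shared-encoder MultiTask model might look like the following; the xlm-roberta-base checkpoint, the head sizes, and the unweighted sum of losses are illustrative assumptions, not necessarily what the repo uses:

```python
import torch.nn as nn
from transformers import AutoModel

class MultiTaskNLU(nn.Module):
    """Shared encoder with two heads: a sentence-level intent
    classifier (IC) and a token-level entity tagger (NER)."""

    def __init__(self, encoder_name="xlm-roberta-base",  # assumed checkpoint
                 num_intents=10, num_entity_tags=21):    # illustrative sizes
        super().__init__()
        # One encoder is shared by both tasks, so each task shapes
        # the representations the other one consumes.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.intent_head = nn.Linear(hidden, num_intents)   # utterance-level
        self.ner_head = nn.Linear(hidden, num_entity_tags)  # token-level

    def forward(self, input_ids, attention_mask,
                intent_labels=None, ner_labels=None):
        states = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        intent_logits = self.intent_head(states[:, 0])  # first-token pooling
        ner_logits = self.ner_head(states)              # per-token logits

        loss = None
        if intent_labels is not None and ner_labels is not None:
            ce = nn.CrossEntropyLoss(ignore_index=-100)  # -100 masks padding
            # Plain sum of the two task losses; a weighted combination
            # is a common alternative training strategy.
            loss = (ce(intent_logits, intent_labels)
                    + ce(ner_logits.view(-1, ner_logits.size(-1)),
                         ner_labels.view(-1)))
        return {"loss": loss, "intent_logits": intent_logits,
                "ner_logits": ner_logits}
```

Training then just backpropagates the combined loss through the shared encoder, which is where the cross-task transfer described above comes from.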
Overall Results
To make a fair comparison of model performance, we carried out three experiments:
- Train a single-task intent classification (IC) model (1st row).
- Train a MultiTask model (2nd row).
- Train a single-task named-entity recognition (NER) model (3rd row).
As shown in the table below, the MultiTask model improves macro F1-score by up to 6% compared to the single-task IC model, at the cost of less than a 1% drop in macro F1-score compared to the single-task NER approach.
[Run set, 5 runs: macro F1-score comparison across the three experiments]
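A quick note on the metric itself: macro F1 averages the per-class F1 scores, so every intent and entity class counts equally regardless of frequency. Both scores can be computed with standard tooling; in the sketch below, the label names are made up for illustration, and scoring NER at the span level with seqeval is an assumption rather than the repo's confirmed setup:

```python
from sklearn.metrics import f1_score
from seqeval.metrics import f1_score as span_f1_score

# Intent classification: one gold/predicted label per utterance.
intent_true = ["oncology", "cardiology", "oncology"]
intent_pred = ["oncology", "oncology", "oncology"]
print("IC macro F1:", f1_score(intent_true, intent_pred, average="macro"))

# NER: BIO-tagged token sequences, scored at the span level.
ner_true = [["O", "B-DISEASE", "O"], ["B-DISEASE", "I-DISEASE", "O"]]
ner_pred = [["O", "B-DISEASE", "O"], ["B-DISEASE", "O", "O"]]
print("NER macro F1:", span_f1_score(ner_true, ner_pred, average="macro"))
```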
Language performance
Looking at the per-language performance of the models, the MultiTask results are remarkably strong, especially in the most widely spoken languages. It is also worth mentioning that the lowest F1-score for the MultiTask model is attained in Korean (76.1%), whereas for the single-task IC model it is 68.4%.
[Run set, 4 runs: per-language F1-scores]
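Per-language numbers like these come from simply grouping the evaluation set by language before scoring. A minimal sketch, assuming each example carries a language tag (the tuple layout and label names are hypothetical):

```python
from collections import defaultdict
from sklearn.metrics import f1_score

def per_language_macro_f1(examples):
    """Group (language, gold_intent, predicted_intent) triples
    by language and compute macro F1 within each group."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in examples:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    return {lang: f1_score(golds, preds, average="macro")
            for lang, (golds, preds) in by_lang.items()}

# Toy usage: two Korean and one English prediction.
examples = [("ko", "oncology", "oncology"),
            ("ko", "cardiology", "oncology"),
            ("en", "oncology", "oncology")]
print(per_language_macro_f1(examples))
```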