The Answer Key: Unlocking the Potential of Question Answering With NLP

A deep dive into question answering in machine learning, examining its challenges, techniques, and models, along with a step-by-step Python code illustration.
Mostafa Ibrahim
Created on February 16|Last edited on March 10
In this article, we'll be exploring the basics of question-answering in machine learning, including an overview of creating such models, their challenges, and applications. 
Whether you're new to the field or just looking to gain a better understanding, this article will provide a comprehensive introduction to question-answering. Get ready to learn about the exciting world of question-answering and the ways it can enhance various NLP tasks. 
Here's what we'll be covering: 
Table of Contents
What Is Question Answering in AI?
What Is Question Answering in NLP?
What Are the Challenges of Question Answering?
What Is Generative Question Answering?
How Do You Create a Question-Answering System?
Datasets for Question Answering
Using HuggingFace for Question Answering
Tutorial
Step 1: Import the Necessary Libraries
Step 2: Extract the Pre-Trained Model
Step 3: Creating Our Testing Sample
Step 4: Tokenize the Answer_text and Question
Step 5: Create an Attention Mask
Step 6: Obtain the Model’s Output
Step 7: Defining the Start and End Indexes
Step 8: Decode the Final Answer
Output
Conclusion
What Is Question Answering in AI?
Question Answering (QA) in Artificial Intelligence refers to the capability of a machine to respond to questions asked in natural language. The main objective of this technology is to extract relevant information from vast amounts of data and present it in the form of a concise answer.
To accomplish this, AI researchers and engineers have developed a special type of model known as the Question-Answering model.
These models take in a question and then process a large amount of text data to determine the most accurate answer. For example, if the question is "What is the highest mountain peak in the world?" the QA model will scan its database and return the answer "Mount Everest." 
The ultimate goal of a Question Answering model is to truly understand the meaning behind the question and provide an answer that is relevant and fitting for the context. The success of a QA model is measured by its ability to provide accurate and meaningful answers to a wide range of questions.
What Is Question Answering in NLP?
Question Answering in NLP is an area of research and development that aims to provide human-like responses to questions in natural language. The goal is to create a system that can understand the context of a question, search for relevant information, and present an accurate answer.
In recent years, the demand for conversational AI systems and virtual assistants has grown, which has driven the development of Question Answering in NLP. These systems rely on sophisticated NLP techniques, including information retrieval, text classification, and machine learning, to analyze questions, find relevant information, and generate answers. By doing so, they aim to provide users with the same experience they would get when asking a knowledgeable human for information.
What Are the Challenges of Question Answering?
Question Answering is a difficult task, as it involves overcoming multiple challenges in order to work effectively. Here are some of the challenges that need to be addressed:
Understanding the meaning of the question: The first challenge is to understand the intention behind the question, even if the wording is complicated or unclear. This requires the AI system to have a deep understanding of natural language and the ability to differentiate between similar-sounding words and phrases. This is where a model's capacity shows: in general, more complex models capture more of the actual context. This is crucial because a question may be rephrased, may be unclear, or may even require combining several distant sentences to answer effectively.
Retrieving Information: Another significant challenge is to extract relevant information from large amounts of data. The AI system must be able to search efficiently through vast amounts of text data to find the information it needs to provide an answer. This involves the use of sophisticated information retrieval techniques such as semantic analysis and information extraction. For example, GPT-3, the model family that ChatGPT builds on, was reportedly trained on data filtered from roughly 45 terabytes of raw text, yet ChatGPT is still capable of returning the required answer in a matter of seconds.
Representing Knowledge: To provide accurate answers, the AI system must be able to understand and represent the knowledge contained in the data it processes. This requires the use of advanced knowledge representation techniques, like ontologies and semantic networks, to categorize and organize information. These techniques are mainly performed to allow the model to understand how different words relate in a sentence. These approaches require word tokenization, which breaks down a piece of text into smaller units called tokens, which can then be analyzed and processed by a computer program.
Contextual Reasoning: Apart from understanding the meaning of the question, the AI system must also be aware of the context of the question and provide an answer that is appropriate and relevant to that context. This involves the system having a deep understanding of the relationships between different pieces of information and the ability to make decisions based on those relationships.
Verifying Answers: Finally, the AI system must be able to validate the accuracy of its answer by taking into account factors such as the reliability of sources and potential biases. This requires the system to critically evaluate information and make decisions based on multiple sources of evidence. It goes without saying that the accuracy of these models should not be relied upon heavily, as they are trained on human-generated data that often includes inaccuracies.
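To make the retrieval challenge above more concrete, here is a minimal sketch that ranks a few hand-written passages against a question using TF-IDF weighted word overlap. It relies only on Python's standard library and is meant as an illustration, not as a description of how any production system retrieves information. The passages and the helper names `tokenize` and `rank_passages` are hypothetical and were introduced just for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_passages(question, passages):
    """Return (score, passage) pairs sorted from most to least relevant."""
    tokenized = [tokenize(p) for p in passages]
    n = len(passages)
    # Document frequency: in how many passages does each word appear?
    df = Counter(word for tokens in tokenized for word in set(tokens))
    question_words = set(tokenize(question))
    scored = []
    for passage, tokens in zip(passages, tokenized):
        tf = Counter(tokens)
        score = 0.0
        for word in question_words:
            if word in tf:
                # Smoothed inverse document frequency: rare words weigh more.
                idf = math.log((1 + n) / (1 + df[word])) + 1.0
                score += (tf[word] / len(tokens)) * idf
        scored.append((score, passage))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hand-written toy corpus (an assumption for illustration only).
passages = [
    "Mount Everest is the highest mountain peak in the world.",
    "The Great Barrier Reef is located in the Coral Sea.",
    "The Nile is a major river in Africa.",
]
best_score, best_passage = rank_passages(
    "Where is the Great Barrier Reef located?", passages
)[0]
print(best_passage)
```

Words that appear in every passage, such as "the" and "is", contribute little to the score, so the passage sharing the rarer words with the question ranks first.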
What Is Generative Question Answering?
Generative Question Answering (GQA) is a cutting-edge approach to the classic problem of question-answering. Unlike conventional QA systems that merely fetch information from already existing sources, GQA systems can compose their own answers from what they have learned.
This opens up new avenues for more imaginative and subtle responses.
OpenAI's GPT-3, the model family that ChatGPT builds on, can be considered a generative question-answering model. GPT-3 is a large-scale language model that generates text in response to questions and prompts, effectively answering questions. It can produce new answers based on the information it has learned from vast amounts of text data, making it a form of generative question answering.
For example, when using ChatGPT, you can ask a question, and the model will generate a text answer drawn from what it learned during training. This allows for a more dynamic and creative Question Answering experience, as the model can generate new answers rather than simply retrieving information from pre-existing sources.
GQA systems are making their way into an array of applications, including virtual assistants, customer service bots, and educational platforms. The ability to generate new answers adds a new dimension of interactivity and user engagement to these fields.
How Do You Create a Question-Answering System?
Creating a question-answering system can be broken down into several steps:
Data Collection and Preprocessing: The first thing you'll need is a large corpus of text data for the system to learn from. This data can come from sources like news articles, books, or databases. Then, you'll want to clean and format the data, so it's ready for further processing. This may involve removing irrelevant information, stemming or lemmatizing words, and tokenizing the text into individual words or phrases.
Information Retrieval: Next, you'll need to develop algorithms that can extract relevant information from the text corpus to answer questions. This can involve techniques like keyword search, text classification, and named entity recognition. For example, a keyword search algorithm can help identify the most relevant articles in the corpus based on the words in the question.
Question Analysis: It's important to analyze the question to understand its intent and identify keywords or phrases that will guide the information retrieval process. This can involve using techniques like part-of-speech tagging, dependency parsing, and named entity recognition to identify important words and phrases in the question.
Answer Generation: Once the relevant information has been retrieved, the Question Answering system needs to generate a response in natural language. This often involves techniques like text generation and summarization. For example, a text generation algorithm can generate a response based on the most relevant information retrieved in the previous step.
Model Training: The Question Answering system must be trained on the preprocessed text data and the answers generated to improve its performance and accuracy. This can involve using supervised or unsupervised machine learning algorithms to identify patterns in the data and improve the accuracy of the answers generated.
Model Evaluation: Finally, you'll want to evaluate the performance of the Question Answering system using metrics like precision, recall, and F1 score. This will help you determine how well the system is answering questions correctly and identify areas for improvement.
These steps can be repeated as needed to continue improving the Question Answering system. As new techniques in NLP and machine learning become available, they can be incorporated to make the system even more effective. The goal is to create a QA system that can understand the context of a question and generate accurate, relevant answers in natural language.
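The evaluation step above can be sketched for extractive QA, where a predicted answer string is compared against a reference answer. The example below computes exact match and token-level F1 after normalizing both strings (lowercasing, removing punctuation and English articles), in the style commonly used for SQuAD-like benchmarks. The function names and the sample strings are assumptions made for this illustration.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """Return 1 if the normalized strings are identical, else 0."""
    return int(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Token-level F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # If either side is empty, only two empty answers count as a match.
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Coral Sea", "the Coral Sea."))
print(f1_score("in the Coral Sea", "Coral Sea"))
```

Exact match is strict, while F1 gives partial credit when the prediction overlaps the reference but includes extra or missing words.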
Datasets for Question Answering
There are several datasets available for question-answering research and development, including SQuAD (Stanford Question Answering Dataset), MS MARCO (Microsoft Machine Reading Comprehension), TREC QA (Text REtrieval Conference Question Answering Track), HotpotQA, BioASQ, CoQA, MultiRC, and RACE.
SQuAD (Stanford Question Answering Dataset) is one of the most popular and widely used datasets in the field, containing over 100,000 questions and answers based on Wikipedia articles.
MS MARCO focuses on the ability of question-answering systems to understand complex questions and provides answers from web pages. 
TREC QA, which has been used for over 20 years in the field, contains questions and answers from a variety of sources.
HotpotQA is a multi-hop question-answering dataset that requires reasoning over multiple paragraphs to find the answer.
BioASQ is a benchmark for biomedical semantic indexing and question-answering. 
CoQA is a conversational question-answering dataset that requires the model to maintain context and understand the relationship between questions and answers.
MultiRC is a dataset for multi-sentence reasoning in machine reading comprehension.
RACE is a large-scale dataset for machine reading comprehension and question answering in English.
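To give a feel for what an extractive QA example looks like, the sketch below builds a single record modelled on the SQuAD layout: a context passage, a question, and an answer given as text plus the character offset where it starts in the context. The record is hand-written for illustration and is not taken from any dataset; the `id` and `title` values and the `is_consistent` helper are hypothetical.

```python
context = (
    "The Great Barrier Reef is located in the Coral Sea, "
    "off the coast of Australia."
)
answer_text = "the Coral Sea"

# Hand-written record in a SQuAD-like shape (illustrative values only).
record = {
    "id": "example-0001",
    "title": "Great_Barrier_Reef",
    "context": context,
    "question": "Where is the Great Barrier Reef located?",
    "answers": {
        "text": [answer_text],
        # Character offset of the answer inside the context.
        "answer_start": [context.index(answer_text)],
    },
}


def is_consistent(rec):
    """Check that every answer text matches the context at its offset."""
    answers = rec["answers"]
    return all(
        rec["context"][start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


print(is_consistent(record))
```

Checking that each answer really occurs at its stated offset is a cheap sanity test worth running before training on any span-annotated dataset.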
Using HuggingFace for Question Answering
HuggingFace is a company that provides a platform for NLP research and development. They offer a massive collection of pre-trained models for various NLP tasks, such as sentiment analysis, text classification, language translation, and question answering.
These models are designed using deep learning techniques and have been trained on large datasets. They can be fine-tuned for specific use cases as well.
HuggingFace makes it incredibly easy for developers and researchers to access and use these models with their API, which provides a user-friendly interface. The platform also provides resources and tools for NLP research, such as fine-tuning scripts, training data, and evaluation metrics. The community of users and contributors is also large and active, making it a great place for exchanging knowledge and expertise.
For question answering specifically, HuggingFace offers several pre-trained models such as BERT, GPT-2, and RoBERTa. Between them, these models cover both extractive and generative question answering. To use one of these models, all you need to do is load it, prepare your input data, and run it to get the answer.
Having said that, the steps for building a QA model using HuggingFace are quite similar to the normal approach. First, you install the HuggingFace Transformers library and load the pre-trained model of your choice. Second, you prepare your input data and run your model.
Tutorial
In this part of the article, we will create a Question Answering model using Python. We will use a pre-trained BERT model, loaded through HuggingFace's Transformers library, to perform the QA functionality. One commonly used dataset for fine-tuning the BertForQuestionAnswering model is the Stanford Question Answering Dataset (SQuAD), and the checkpoint we load below has already been fine-tuned on it.
Step 1: Import the Necessary Libraries
In this step, we import the PyTorch library (torch) and two classes from the Transformers library: BertForQuestionAnswering and BertTokenizer. BertForQuestionAnswering is a BERT (Bidirectional Encoder Representations from Transformers) model with a question-answering head on top that predicts where the answer span starts and ends. BertTokenizer tokenizes the input text and converts it into a numerical representation (token IDs).
import torch
from transformers import BertForQuestionAnswering, BertTokenizer
Step 2: Extract the Pre-Trained Model
The two lines of code below instantiate the BertForQuestionAnswering and BertTokenizer classes, respectively, each loading the pre-trained BERT checkpoint named "bert-large-uncased-whole-word-masking-finetuned-squad".
model = BertForQuestionAnswering.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased-whole-word-masking-finetuned-squad")
Step 3: Creating Our Testing Sample
Below is an answer_text string that holds the context, or knowledge, from which the model will extract the answer to the specified question.
answer_text = "The Great Barrier Reef is located in the Coral Sea, off the coast of Australia. It is the largest coral reef system in the world, stretching over 2,300 km and covering an area of approximately 344,400 km². The Great Barrier Reef is home to a diverse range of marine life and is considered one of the seven natural wonders of the world. It is also a UNESCO World Heritage Site threatened by climate change and other environmental factors."
question = "Where is the Great Barrier Reef located?"
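Step 4: Tokenize the Answer_text and Question
Just as the text used to train the model was tokenized, the new inputs must be tokenized before the model can process them. Passing the question and the answer_text together encodes them as a single sequence, with special tokens marking the start of the input and the boundary between the two.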
input_ids = tokenizer.encode(question, answer_text)
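BERT-style models usually also receive segment IDs (called token_type_ids in the Transformers library) that mark which tokens belong to the question and which to the context. The encode call above returns only the token IDs, so the sketch below shows one way such segment IDs could be derived from them. The `build_segment_ids` helper is hypothetical, and the toy IDs are made up for illustration, with 101 and 102 standing in for the [CLS] and [SEP] tokens.

```python
def build_segment_ids(input_ids, sep_token_id):
    """Mark question tokens with 0 and context tokens with 1.

    Assumes the layout [CLS] question [SEP] context [SEP].
    """
    first_sep = input_ids.index(sep_token_id)
    # Segment 0 covers [CLS], the question, and the first [SEP].
    question_len = first_sep + 1
    return [0] * question_len + [1] * (len(input_ids) - question_len)


# Toy sequence (illustrative IDs): [CLS] q q [SEP] c c [SEP]
toy_input_ids = [101, 2073, 2003, 102, 1996, 12909, 102]
segment_ids = build_segment_ids(toy_input_ids, sep_token_id=102)
print(segment_ids)
```

Calling the tokenizer object directly on the question and context, rather than using encode, returns token_type_ids alongside the input IDs, which avoids building them by hand.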
Step 5: Create an Attention Mask
The line of code below creates an attention mask, a sequence of 1s and 0s indicating which tokens in the input_ids sequence the model should attend to. Because our single input contains no padding, every position is set to 1.
attention_mask = [1] * len(input_ids)
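The mask becomes more interesting once several inputs of different lengths are batched together, because shorter sequences must be padded and the padding should be ignored. The sketch below pads a toy batch and builds the matching masks in plain Python. The `pad_batch` helper and the token IDs are hypothetical, and the padding ID of 0 is an assumption made for this example.

```python
def pad_batch(sequences, pad_token_id=0):
    """Pad sequences to a common length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


# Two toy sequences of different lengths (illustrative IDs).
batch = [[101, 7, 102], [101, 7, 8, 9, 102]]
padded, masks = pad_batch(batch)
print(padded)
print(masks)
```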
Step 6: Obtain the Model’s Output
The model output consists of two parts: the first contains the logits for the start token index, and the second contains the logits for the end token index.
output = model(torch.tensor([input_ids]), attention_mask=torch.tensor([attention_mask]))
Step 7: Defining the Start and End Indexes
The start_index and end_index mark where the answer begins and ends in the input_ids sequence. They are found by selecting the token with the highest logit score from the start token logits and the end token logits, respectively. A logit score represents the model's confidence that a given position is the start or end of the answer. Because the answer must come from the context, we search only the positions after the first [SEP] token, which separates the question from the context, and then add that offset back to get indices into the full sequence.
sep_index = input_ids.index(tokenizer.sep_token_id)  # first [SEP] closes the question
context_start = sep_index + 1  # first token of the context
start_index = context_start + torch.argmax(output[0][0, context_start:]).item()
end_index = context_start + torch.argmax(output[1][0, context_start:]).item()
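Picking the start and end positions independently can occasionally produce an end that comes before the start. One common remedy is to score every valid pair jointly and keep the best one. The sketch below does this in plain Python over toy logit lists; the `best_span` helper, its `max_answer_len` parameter, and the numbers are all hypothetical and chosen for illustration.

```python
def best_span(start_logits, end_logits, offset=0, max_answer_len=30):
    """Return the (start, end) pair with the highest combined score.

    Only spans with start <= end and at most max_answer_len tokens are
    considered, and positions before `offset` are skipped.
    """
    best_score, best_start, best_end = float("-inf"), offset, offset
    for i in range(offset, len(start_logits)):
        last = min(i + max_answer_len, len(end_logits))
        for j in range(i, last):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score, best_start, best_end = score, i, j
    return best_start, best_end


# Toy logits where independent argmax would give start=2, end=0.
toy_start = [0.1, 0.2, 5.0, 0.3, 4.0]
toy_end = [6.0, 0.1, 0.2, 4.5, 0.3]
print(best_span(toy_start, toy_end))
```

In the tutorial, the offset would correspond to the first context position, so question tokens are never selected as part of the answer.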
Step 8: Decode the Final Answer
In this step, we reverse the tokenization process so that the final answer is readable text.
answer = tokenizer.decode(input_ids[start_index:end_index + 1], skip_special_tokens=True)
print("Answer:", answer)
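Part of what decoding does is glue sub-word pieces back together. BERT's tokenizer splits rare words into pieces, marking each continuation piece with a leading "##". The sketch below shows the joining idea in plain Python. The `join_wordpieces` helper and the example pieces are hypothetical, and the real decode method also handles details such as spacing around punctuation that are left out here.

```python
def join_wordpieces(tokens):
    """Merge '##' continuation pieces into the preceding word."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            # Continuation piece: append to the previous word without a space.
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


# Made-up pieces for illustration, not actual tokenizer output.
print(join_wordpieces(["un", "##believ", "##able", "reef"]))
```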
Output
The final output of the model for the given context and question:
Answer: coral sea
Conclusion
Question Answering is a fascinating field within machine learning that aims to develop algorithms capable of answering questions in natural language. NLP, or Natural Language Processing, is a crucial aspect of QA as it provides the tools and techniques necessary to process and understand human language.
Having said that, you can go on to create your own Question Answering model using the pre-trained model of your choice. You can even train your own model on the dataset of your choice, such as SQuAD (Stanford Question Answering Dataset), which is widely used for training and evaluating QA models and serves as a benchmark for the performance of QA systems.
With the increasing availability of large amounts of text data from sources such as the Internet and digital libraries, efficient and effective Question Answering systems have become increasingly important. These systems have the potential to greatly improve the way we interact with information by allowing us to quickly and easily access the information we need without having to spend time manually searching through vast amounts of data.