Taking Chatbot Conversations to the Next Level by Fine-Tuning GPT-3

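This guide walks through building a GPT-3-based chatbot, with a focus on fine-tuning for realistic conversations. It combines practical insights with a step-by-step procedure, making it a useful resource for developers and AI enthusiasts interested in conversational AI. This article was translated by AI; if you spot a mistranslation, please let us know in the comments.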
Mostafa Ibrahim
Created on September 15|Last edited on September 15
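Introduction
GPT-family models have transformed the field of machine learning. The current state of the art is GPT-4 and GPT-4o, but for some use cases many organizations still choose earlier versions. The reason is simple: earlier GPT models are cheaper to run, and GPT-2 and GPT-3 can still perform well enough. Chatbots are one such use case.
In this article, we look at GPT-3 for exactly that use case. In particular, we examine why fine-tuning is essential for keeping the technology relevant and efficient in real-world chatbot applications. We also show how Weights & Biases (W&B) streamlines the fine-tuning process, and improves the overall experience, through visualization and comparison of results.
From the prerequisites for GPT-3 chatbot development, through a step-by-step guide to fine-tuning the model, to a final performance evaluation, this article aims to provide a roadmap for the whole process.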
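Why fine-tune GPT?
Fine-tuning is essential for optimizing GPT-3 for specific needs, because it lets the model adapt to the particular context and requirements of each use case. It allows you to incorporate up-to-date information, integrate new datasets that were not part of the original training, and customize how the model responds. As a result, a fine-tuned model can understand and answer user queries more accurately, provide current and relevant information, and interact in a way that is more natural and better tailored to the intended audience or domain.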
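Preparing for GPT-3 chatbot development
To fine-tune a GPT model, you first need access to the OpenAI API, which is used to create the training job and fine-tune the model. You also need to set up a Jupyter notebook and import the required libraries. No special hardware is needed, because training runs on OpenAI's servers.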
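The purpose of fine-tuning
Fine-tuning a GPT-3 model lets you customize and strengthen its capabilities to better serve specific needs. For example, because GPT-3's training data stops at a fixed cutoff (no later than 2022), it contains no information about more recent events. Ultimately, the goal is to turn a general-purpose model into a specialized tool, which you achieve by training it on new data and extending its knowledge.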
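In this article, we use the widely used SQuAD dataset for fine-tuning. Fine-tuning the model on it teaches the model to understand passages of text and accurately answer questions about them.
Let's get started.
Building a fine-tuned chatbot
The following sections walk step by step through fine-tuning a GPT-3 model. We first take a close look at the dataset and convert it into a format suitable for fine-tuning, then go through the actual fine-tuning procedure.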
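Step 1: Install the required packages
First, install the libraries needed for fine-tuning by running the code below in your notebook environment.
!pip install wandb pandas
# The code in this article uses the legacy (pre-1.0) interface of the openai package
!pip install "openai==0.28.1"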
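Step 2: Prepare the dataset
We chose the SQuAD dataset for fine-tuning; it can be downloaded from the dataset's site. SQuAD is a question-answering dataset containing more than 100,000 question-answer pairs generated from over 500 Wikipedia articles. Each data point consists of a passage from a Wikipedia article (the context), a question about it, and the corresponding answer. The code below reads the v2.0 files, in which some questions are flagged is_impossible because the context does not answer them.
First, we need to convert the dataset into the JSONL format used for fine-tuning.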
import json


def convert_dataset_to_jsonl(input_json_path, output_jsonl_path):
    """Flatten a SQuAD JSON file into prompt/completion pairs, one JSON object per line."""
    with open(input_json_path, 'r', encoding='utf-8') as f:
        data = json.load(f)

    with open(output_jsonl_path, 'w', encoding='utf-8') as outfile:
        for article in data['data']:
            for paragraph in article['paragraphs']:
                context = paragraph['context']
                for qa in paragraph['qas']:
                    question = qa['question']
                    is_impossible = qa.get('is_impossible', False)
                    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"

                    if is_impossible:
                        # The context does not contain an answer to this question
                        completion = " Impossible"
                    else:
                        # Use the first reference answer; fall back if none is listed
                        answer = qa['answers'][0]['text'] if qa['answers'] else "Unknown"
                        completion = f" {answer}"

                    jsonl_entry = json.dumps({"prompt": prompt, "completion": completion})
                    outfile.write(jsonl_entry + '\n')


convert_dataset_to_jsonl('/train-v2.0.json', '/train.jsonl')
convert_dataset_to_jsonl('/dev-v2.0.json', '/dev.jsonl')
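Before uploading, it can help to check the converted files. The following is a minimal sketch of such a check, not part of the original workflow: the helper name validate_jsonl_lines is hypothetical, and it only assumes that every line should be a JSON object with the prompt and completion keys written by the conversion function above.

```python
import json


def validate_jsonl_lines(lines, required_keys=("prompt", "completion")):
    """Return (valid_count, problems) for an iterable of JSONL lines."""
    valid_count, problems = 0, []
    for line_no, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {line_no}: not valid JSON")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {line_no}: expected a JSON object")
            continue
        missing = [key for key in required_keys if key not in record]
        if missing:
            problems.append(f"line {line_no}: missing keys {missing}")
            continue
        valid_count += 1
    return valid_count, problems


# Small in-memory demo; for a real file, pass an open file handle instead
sample = [
    '{"prompt": "Context: ...\\nQuestion: ...\\nAnswer:", "completion": " France"}',
    'not json',
    '{"prompt": "missing completion"}',
]
count, issues = validate_jsonl_lines(sample)
print(count, issues)
```

To check a converted file, open it with open('/train.jsonl', encoding='utf-8') and pass the file handle to validate_jsonl_lines; an empty problem list suggests the file has the expected shape.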
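Step 3: Configure and initialize OpenAI
Now we import the modules we will use and set up the basic configuration. Find your API key in your OpenAI account, then either set it as an environment variable or use it directly as shown below.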
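import os

import openai
import wandb

# Set the OpenAI API key: read it from the OPENAI_API_KEY environment variable,
# or replace the placeholder below with your own key
openai.api_key = os.environ.get('OPENAI_API_KEY', 'your api key')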
Step 4: Log in to wandb
Next, let's log in!
wandb.login()
wandb.init(project='project_name', entity='entity_name')
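Step 5: Upload the training and validation datasets to OpenAI for fine-tuning
Now that the datasets are ready in JSONL format, it is time to upload them to OpenAI. Upload the data only once and keep the returned train_file_id and dev_file_id, so that you can reuse the same dataset across multiple runs instead of uploading it again and again until you run out of storage.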
def upload_file_to_openai(file_path, purpose='fine-tune'):
    # Open in binary mode and close the file once the upload finishes
    with open(file_path, 'rb') as f:
        response = openai.File.create(file=f, purpose=purpose)
    return response.id


train_file_id = upload_file_to_openai("/train.jsonl")
dev_file_id = upload_file_to_openai("/dev.jsonl")
print(train_file_id)
print(dev_file_id)
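One way to keep the two file IDs around between notebook sessions is to write them to a small local JSON file. This is only a sketch under that assumption: the helper names save_file_ids and load_file_ids, the file name file_ids.json, and the example IDs are all hypothetical and not part of the OpenAI or W&B libraries.

```python
import json
import os


def save_file_ids(path, train_file_id, dev_file_id):
    """Write the two uploaded-file IDs to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"train_file_id": train_file_id, "dev_file_id": dev_file_id}, f)


def load_file_ids(path):
    """Return (train_file_id, dev_file_id), or None if nothing was saved yet."""
    if not os.path.exists(path):
        return None
    with open(path, "r", encoding="utf-8") as f:
        ids = json.load(f)
    return ids["train_file_id"], ids["dev_file_id"]
```

In the notebook, you could call load_file_ids("file_ids.json") first and only upload (and then call save_file_ids) when it returns None.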
Step 6: Define hyperparameters and log them to wandb
# Define hyperparameters
hyperparameters = {
    "n_epochs": 2,  # Number of training epochs
    "batch_size": 4,  # Batch size for training
    "learning_rate_multiplier": 0.1,  # Learning rate adjustment factor
}

# Log hyperparameters to wandb
wandb.config.update(hyperparameters)
Step 7: Start the fine-tuning job on OpenAI and log the job ID to Weights & Biases
Now it is time to start the fine-tuning job. To do so, we call openai.FineTuningJob.create(), passing the train_file_id and dev_file_id saved during the upload step. We then check the status of the job with openai.FineTuningJob.retrieve(fine_tune_id).
fine_tune_response = openai.FineTuningJob.create(
    training_file=train_file_id,
    validation_file=dev_file_id,
    model="babbage-002",
    hyperparameters=hyperparameters
)

fine_tune_id = fine_tune_response['id']
print(f"Fine-tuning started with ID: {fine_tune_id}")
wandb.log({"fine_tune_id": fine_tune_id})

# Retrieve the job to check its current status
fine_tune_status = openai.FineTuningJob.retrieve(fine_tune_id)
print(f"Fine-tuning job status: {fine_tune_status['status']}")
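A single status check only shows where the job stands at that moment. The sketch below polls until the job reaches a final state. The helper wait_for_job and its parameters are hypothetical; the status lookup is passed in as a function so the loop itself does not depend on any particular client, and the set of final statuses ("succeeded", "failed", "cancelled") is an assumption based on the statuses the fine-tuning API reports.

```python
import time

# Statuses after which the job will not change any more (assumed set)
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(get_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Call get_status() repeatedly until a terminal status or max_polls is reached.

    Returns the last status seen.
    """
    status = get_status()
    polls = 1
    while status not in TERMINAL_STATUSES and polls < max_polls:
        sleep(poll_seconds)
        status = get_status()
        polls += 1
    return status
```

With the legacy client used in this article, the call would look like wait_for_job(lambda: openai.FineTuningJob.retrieve(fine_tune_id)["status"]). The max_polls limit keeps the notebook cell from blocking forever if the job stalls.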
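Step 8: Monitor the fine-tuning job and retrieve the results
Now that the fine-tuning job has started, we want to know whether it has finished and, if so, see the details of its events. The script below first captures the fine-tuning job ID, then registers a signal handler for the interrupt signal (SIGINT).
When interrupted, the handler looks up and reports the current status of the fine-tuning job. The script then requests the events associated with the job and prints each one with a formatted timestamp and its message. If the request is interrupted or raises an error, it reports that as well.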
import signal
import datetime

fine_tune_id = fine_tune_response['id']


def signal_handler(sig, frame):
    # On Ctrl+C, report the job's current status
    status = openai.FineTuningJob.retrieve(fine_tune_id)['status']
    print(f"Stream interrupted. Job is still {status}.")


print(f"Streaming events for the fine-tuning job: {fine_tune_id}")
signal.signal(signal.SIGINT, signal_handler)

try:
    events_response = openai.FineTuningJob.list_events(id=fine_tune_id)
    events = events_response['data']  # List of event objects

    for event in events:
        event_time = datetime.datetime.fromtimestamp(event['created_at']).strftime('%Y-%m-%d %H:%M:%S')
        print(f"{event_time} {event['message']}")
except Exception as e:
    print(f"Stream interrupted (client disconnected). Error: {str(e)}")
The output is ready. The next step is to log it to wandb.
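The article does not show the logging code itself, so the following is only a sketch of one way to do it. It assumes that progress events carry messages shaped like "Step 10/100: training loss=1.25", optionally followed by ", validation loss=1.30"; that message format is an assumption and should be checked against your own event output. The helpers parse_event_message and log_events are hypothetical.

```python
import re

# Assumed message shape: "Step 10/100: training loss=1.25[, validation loss=1.30]"
STEP_PATTERN = re.compile(
    r"Step (\d+)(?:/\d+)?: training loss=(\d+(?:\.\d+)?)"
    r"(?:, validation loss=(\d+(?:\.\d+)?))?"
)


def parse_event_message(message):
    """Return a metrics dict for a progress message, or None for other messages."""
    match = STEP_PATTERN.search(message)
    if match is None:
        return None
    metrics = {"step": int(match.group(1)), "train_loss": float(match.group(2))}
    if match.group(3) is not None:
        metrics["valid_loss"] = float(match.group(3))
    return metrics


def log_events(messages, log_fn=print):
    """Send every parsable message to log_fn; return how many were logged."""
    logged = 0
    for message in messages:
        metrics = parse_event_message(message)
        if metrics is not None:
            log_fn(metrics)
            logged += 1
    return logged
```

With an active W&B run, log_events((event["message"] for event in events), log_fn=wandb.log) would send each parsed step to the run so the losses can be plotted. Messages that do not match the assumed pattern are skipped.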
Graphs plotted in W&B from the data above.
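Step 9: Evaluate the fine-tuned model
Now that fine-tuning is complete, we evaluate the model. For the evaluation we prepared a small dataset of questions and answers. We query the model with each question and its context, collect the model's answers, and organize them in a pandas DataFrame. Finally, we log the results as a table to the Weights & Biases (wandb) project and optionally save them to a CSV file for local use.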
wandb.login()
wandb.init(project='project_name', entity='entity_name')
Let's take a look at the test dataset:
# Test dataset
test_data = [
    {
        "context": "The Normans (Norman: Nourmands; French: Normands; Latin: Normanni) were the people who in the 10th and 11th centuries gave their name to Normandy, a region in France. They were descended from Norse (\"Norman\" comes from \"Norseman\") raiders and pirates from Denmark, Iceland and Norway who, under their leader Rollo, agreed to swear fealty to King Charles III of West Francia. Through generations of assimilation and mixing with the native Frankish and Roman-Gaulish populations, their descendants would gradually merge with the Carolingian-based cultures of West Francia. The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.",
        "qas": [
            {
                "question": "In what country is Normandy located?",
                "answer": "France"
            },
            {
                "question": "When were the Normans in Normandy?",
                "answer": "10th and 11th centuries"
            },
            {
                "question": "From which countries did the Norse originate?",
                "answer": "Denmark, Iceland and Norway"
            },
            {
                "question": "Who was the Norse leader?",
                "answer": "Rollo"
            },
            {
                "question": "What century did the Normans first gain their separate identity?",
                "answer": "10th century"
            }
        ]
    },
    {
        "context": "The distinct cultural and ethnic identity of the Normans emerged initially in the first half of the 10th century, and it continued to evolve over the succeeding centuries.",
        "qas": [
            {
                "question": "Who was the duke in the battle of Hastings?",
                "answer": "William the Conqueror"
            },
            {
                "question": "Who ruled the duchy of Normandy",
                "answer": "Richard I"
            }
        ]
    }
]
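The function normalize_answer(s) standardizes a text answer by converting it to lowercase, removing punctuation, and deleting the common articles 'a', 'an', and 'the'. This gives answers a consistent form, which makes them easier to compare and evaluate.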
# Function to normalize answers (removing punctuation, lowercase, etc.)
def normalize_answer(s):
    import re
    def remove_articles(text):
        return re.sub(r'\b(a|an|the)\b', ' ', text)
    def white_space_fix(text):
        return ' '.join(text.split())
    def remove_punct(text):
        return re.sub(r'[\W]', ' ', text)
    def lower(text):
        return text.lower()
    return white_space_fix(remove_articles(remove_punct(lower(s))))
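The function f1_score(prediction, truth) computes the F1 score. It first normalizes both the predicted answer and the ground-truth answer with the function defined above, then calculates the F1 score from the number of tokens the two have in common.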
from collections import Counter


# Calculate F1 score
def f1_score(prediction, truth):
    prediction_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection: tokens shared by prediction and ground truth
    common_tokens = Counter(prediction_tokens) & Counter(truth_tokens)
    num_same = sum(common_tokens.values())
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(prediction_tokens)
    recall = 1.0 * num_same / len(truth_tokens)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1
The function exact_match_score(prediction, truth) computes the exact match (EM) score, which indicates whether the predicted answer matches the ground truth exactly. It returns 1 if the normalized prediction equals the normalized ground truth and 0 otherwise.
# Calculate Exact Match score
def exact_match_score(prediction, truth):
    return int(normalize_answer(prediction) == normalize_answer(truth))
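To make the two metrics concrete, here is a small worked example. It restates the three functions in compact form so that it runs on its own; the logic is meant to mirror the definitions above, and the sample answers are made up for illustration.

```python
import re
from collections import Counter


def normalize_answer(s):
    s = re.sub(r"[\W]", " ", s.lower())    # lowercase, punctuation -> spaces
    s = re.sub(r"\b(a|an|the)\b", " ", s)  # drop English articles
    return " ".join(s.split())             # collapse repeated whitespace


def f1_score(prediction, truth):
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match_score(prediction, truth):
    return int(normalize_answer(prediction) == normalize_answer(truth))


# "France." normalizes to "france", so it counts as an exact match
print(exact_match_score("France.", "france"))  # 1

# Prediction tokens: [10th, century]; truth tokens: [10th, and, 11th, centuries]
# One shared token -> precision 1/2, recall 1/4, F1 = 1/3
print(round(f1_score("the 10th century", "10th and 11th centuries"), 3))  # 0.333
```

The second case shows why F1 is reported alongside exact match: the prediction is partly right, which exact match scores as 0 while F1 gives partial credit.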
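The function query_model(question, context, model) uses the OpenAI API to query a trained language model (specified by the model parameter) with a question and its context. It supplies the question and context as the prompt, retrieves the model's answer, and returns it with leading and trailing whitespace stripped.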
# Function to query the model and get the answer
def query_model(question, context, model):
    # Uses the API key configured in step 3
    response = openai.Completion.create(
        model=model,
        # Same layout as the training prompts built in step 2
        prompt=f"Context: {context}\nQuestion: {question}\nAnswer:",
        temperature=0,
        max_tokens=50,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0,
        stop=["\n"]
    )
    return response.choices[0].text.strip()
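The following snippet takes the test data listed above, iterates over each item to extract the context and questions, queries the model for an answer, stores the detailed results in a list, converts that list to a DataFrame, and logs it to Weights & Biases as a table.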
import pandas as pd

# Compile the results into a DataFrame
detailed_results = []  # List to store detailed results

for item in test_data:
    context = item['context']
    for qa in item['qas']:
        question = qa['question']
        true_answer = qa['answer']
        # "modelid" is a placeholder: replace it with the name of your fine-tuned model
        model_answer = query_model(question, context, model="modelid")

        # Append detailed results for each question
        detailed_results.append({
            "question": question,
            "model_answer": model_answer,
            "true_answer": true_answer
        })

# Convert detailed results list to DataFrame
df_results = pd.DataFrame(detailed_results)

# Log the entire DataFrame as a table to W&B
wandb.log({"results_table": wandb.Table(dataframe=df_results)})

# Optional: Save the DataFrame to CSV for local use
df_results.to_csv('evaluation_results.csv', index=False)
Below is the final table from wandb, showing the true answers to our questions alongside the answers the model gave. You can use this table to evaluate the fine-tuned model and the quality of its answers!
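Besides reading the table row by row, the per-question scores can be averaged into summary numbers. The sketch below is one possible way to do that; the helper summarize_scores is hypothetical, and it takes the scoring functions as arguments so it can be used with the f1_score and exact_match_score functions defined in step 9. The demo uses a plain string comparison as a stand-in scorer.

```python
def summarize_scores(rows, f1_fn, em_fn):
    """Average F1 and exact match over rows holding 'model_answer' and 'true_answer'."""
    if not rows:
        return {"mean_f1": 0.0, "mean_exact_match": 0.0, "n": 0}
    f1_total = sum(f1_fn(row["model_answer"], row["true_answer"]) for row in rows)
    em_total = sum(em_fn(row["model_answer"], row["true_answer"]) for row in rows)
    return {
        "mean_f1": f1_total / len(rows),
        "mean_exact_match": em_total / len(rows),
        "n": len(rows),
    }


# Demo with a stand-in scorer (plain string equality) and made-up rows
def same_text(prediction, truth):
    return int(prediction == truth)


demo_rows = [
    {"model_answer": "France", "true_answer": "France"},
    {"model_answer": "Rollo", "true_answer": "William the Conqueror"},
]
print(summarize_scores(demo_rows, same_text, same_text))
```

In the notebook, summarize_scores(detailed_results, f1_score, exact_match_score) would produce the averages for the test set, and passing the returned dictionary to wandb.log would record them next to the results table.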