Snowflake Arctic을 활용해 PubMed 논문을 이해하는 생의학 RAG 애플리케이션 만드는 법

방대한 의학 정보 코퍼스를 더 잘 이해하기 위한 RAG 애플리케이션 구축 튜토리얼 이 문서는 AI 번역본입니다. 오역이 있을 경우 댓글로 알려주세요.
Created on September 15|Last edited on September 15
Comment
﻿
다음 링크를 통해 이 프로젝트의 코드를 확인하고 W&B Weave에서 실행 내역을 살펴볼 수 있습니다:
코드: https://github.com/ash0ts/snowflake-arctic-weave-demo﻿
Weave: https://wandb.ai/a-sh0ts/bioasq_example/weave/traces﻿﻿﻿
💡
소개분주한 병원에서 소아과 의사가 신생아의 희귀 유전 질환처럼 복잡한 환자 사례에 대해 빠르고 정확한 결정을 내려야 하는 상황을 상상해 보세요. 이 소아과 의사는 유전적 요인, 치료 옵션, 최신 연구 결과를 파악해야 합니다. 하지만 PubMed에 축적된 방대한 생의학 문헌 속에서 관련 정보를 골라내는 일은 매우 벅찰 수 있습니다.
이러한 맥락에서 Snowflake Arctic과 통합된 검색 증강 생성(RAG) 시스템은 매우 유용합니다. 작동 방식은 다음과 같습니다:
시간 민감 정보 검색: 소아과 의사는 “히르슈스프룽병은 멘델 유전 질환인가요, 아니면 다인자 질환인가요?”와 같은 임상 질문을 입력합니다. 시스템은 이 질문을 신속하게 최적화된 의미 기반 검색 질의로 변환하고, PubMed에서 가장 관련성이 높은 문서를 검색해 가져옵니다.
정확하고 관련성 높은 문서 검색: 고급 임베딩 모델을 활용해 시스템은 방대한 생의학 데이터베이스를 검색하고, 필요한 정보를 포함할 가능성이 가장 높은 논문을 선별해 가져옵니다.
효율적인 맥락 분석: 시스템은 검색된 문서의 초록을 평가하여 해당 임상 질문과의 관련성을 판별합니다. 이를 통해 가장 관련성이 높은 문서만 검토 대상으로 삼아, 임상의의 소중한 시간을 절약합니다.
간결한 요약 이후 시스템은 이러한 문서에서 핵심 내용을 요약하여 임상 질문에 답하는 명확하고 간결한 개요를 제공합니다.
근거에 기반한 의사 결정 마지막으로 시스템은 요약된 정보를 종합해 일관된 답변을 생성합니다. 소아과 의사는 히르슈스프룽병의 유전적 측면을 이해하는 데 도움이 되고 치료 결정에 참고할 수 있는 간결하고 근거 기반의 응답을 받습니다.
이러한 RAG 시스템을 도입하면 임상의가 문헌 검토에 소요하는 시간을 크게 줄여 환자 진료에 더 집중할 수 있습니다. Snowflake Arctic과 같은 견고한 데이터 플랫폼에 최신 머신러닝 모델을 결합함으로써 임상 의사 결정 지원이 강화되어, 더 나은 환자 결과와 보다 효율적인 의료 제공으로 이어집니다.
예시:
Question: 
﻿
Autophagy is the process where a virus obtains nutrients from it's host, yes or no?
﻿
Ground Truth: 
﻿
No, autophagy is important in cellular homeostasis for the cell survival mechanism and is involved apoptosis.
﻿
Prediction:
﻿
No, autophagy is not the process where a virus obtains nutrients from its host. Autophagy is a cellular process that recycles or eliminates cell components and damaged organelles through lysosomal degradation. Some viruses, like Herpes simplex virus type I (HSV-1), can manipulate the autophagic process for their replication and survival, but it is not a means for the virus to obtain nutrients from its host.
﻿
Context:
﻿
Autophagy is a homeostatic process involved in the turnover or elimination of
cytoplasmic components, damaged organelles, and protein aggregates via a
lysosomal degradation mechanism. Autophagy also provides a mechanism of innate
immunity, known as xenophagy, designed to protect cells from intracellular
pathogens, but it may unfortunately be subverted to act as a pro-viral pathway
facilitating the replication of certain viruses. Herpes simplex virus type I
(HSV-1) is a neurotropic virus that remains latent in host neurons; it is the
most common cause of sporadic viral encephalitis. Moreover, HSV-1 has been
related to the pathogenesis of Alzheimer's disease. HSV-1 can modulate the
autophagic process through a mechanism mediated by the viral protein ICP34.5.
Here we report that HSV-1 induces a strong increase in GFP-LC3 and endogenous
LC3 lipidation, and triggers the accumulation of intracellular autophagic
compartments (mainly autophagosomes) without enhancing autophagic long-lived
protein degradation in the late stages of infection. Autophagy inhibition
mediated by ATG5 gene silencing had no effect on viral growth. The present
results suggest that HSV-1 infection activates the host autophagic machinery and
strongly controls the autophagic process, blocking the fusion of autophagosomes
with lysosomes. These events might be important in the neurodegenerative process
associated with HSV-1 infection. (Score: 0.5636628997173461)
배경
생의학 정보 문제
﻿
  아이디어 기반: https://arxiv.org/pdf/2310.16146﻿
PubMed나 Cochrane과 같은 플랫폼을 통한 의학 지식의 집약과 유통은 의료 전문가와 연구자가 최신 과학적 발견을 지속적으로 파악하도록 돕습니다. 그러나 매년 100만 편이 넘는 논문이 PubMed에 추가되기 때문에 모든 새로운 연구 결과를 따라잡는 것은 사실상 불가능합니다.
기존 기술은 의료 전문가와 연구자의 정보 요구를 충족하지 못하는 경우가 많습니다. 임상의는 보통 환자 두 명 중 한 명꼴로 진료와 직접 관련된 질문을 가지며, 답을 찾기 위해 PubMed나 UpToDate와 같은 출처를 참조합니다. 그러나 2~3분 내에 답할 수 없는 질문은 종종 포기되며, 이는 환자 진료와 결과에 부정적인 영향을 미칠 수 있습니다.
체계적 문헌고찰(SR) 논문이 신속한 답을 제공할 수는 있지만, 기존 검토에서 다루지 않는 질문이 매우 많습니다. 출판된 검토 논문 없이 여러 1차 문헌의 결과를 수작업으로 종합하는 일은 매우 시간이 많이 듭니다. 검토 논문은 평균 67.3주가 소요되며, 최신 연구가 반영되지 않을 수도 있습니다.
자주 업데이트되는 외부 전자 자료를 활용하는 질의응답 도구는 최신 정보를 효율적으로 제공하여 과학적 발견과 환자 진료 모두에 이점을 제공합니다. 이전 수십 년 동안 임상 시스템을 온라인 정보와 통합한 애플리케이션(예: “infobuttons”)은 보통 시맨틱 네트워크에 의해 구동되었습니다. CHiQA와 같은 다른 연구들은 지식 기반 접근, 머신러닝, 딥러닝을 결합하여 환자 지향적 자원을 활용하는 질의응답 시스템을 개발했습니다.
자연어 생성 문제대규모 언어 모델(LLM)로 구동되는 에이전트의 새로운 기능은 자동 문헌 요약 도구의 개발을 가속화했습니다. 대부분의 솔루션은 검색 증강 LLM(RetA LLM)에 기반한 비공개 개발, 클로즈드 소스 시스템입니다. 그러나 안전하고 책임 있는 사용을 보장하기 위한 공개 기술 보고서, 가이드라인, 규제, 평가가 부족하다는 점은 큰 우려 사항입니다.
이러한 자연어 생성(NLG) 문제는 (1) 대표성 있는 데이터셋과 과제의 부재, (2) RetA LLM을 평가하기 위한 자동화 지표의 부족으로 더욱 악화됩니다. 다행히 LLM 평가 분야의 발전으로, 의학을 포함한 도메인 특화 시나리오에서도 자동화 지표가 인간 선호와 중간 정도로 상관한다는 사실이 확인되었습니다.
바이오에이에스큐
﻿
﻿
이러한 과제를 해결하기 위한 한 가지 노력은 BioASQ 프로젝트BioASQ는 대규모 생의학 시맨틱 색인과 질의응답에 초점을 맞추며, 관련 문서와 함께 생의학 질문 모음을 제공합니다. 이는 검색 증강 생성 시스템의 성능을 평가하기 위한 이상적인 벤치마크가 됩니다. BioASQ 데이터셋을 활용하면, 연구자들은 생의학 도메인에서의 실제 정보 요구를 반영하는 대표적인 과제를 만들 수 있습니다.
 참고: 이 실험군에서는 PubMed의 대체로 이를 사용하여 생의학 텍스트 이해 평가에 더 적합한 자료를 바탕으로, 보다 나은 평가 파이프라인을 구축하겠습니다.
💡
스노우플레이크 아틱
﻿
  아이디어 https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/﻿
Snowflake Arctic는 비용 효율적인 학습과 개방성을 목표로 설계된 최첨단 엔터프라이즈 LLM으로, 엔터프라이즈급 AI의 지형을 혁신하고 있습니다. SQL 생성, 코딩, 지시 따르기, 복잡한 질의 응답과 같은 작업에서 뛰어난 성능을 보이며, 훨씬 더 큰 연산 자원을 사용하는 오픈소스 모델들을 능가합니다.
혁신적인 Dense–MoE 하이브리드 트랜스포머 아키텍처를 활용하여, Arctic는 100억 매개변수의 조밀(dense) 트랜스포머 모델과 잔차(residual) 구조의 128×36.6억 매개변수 MoE MLP를 결합해 총 4800억, 활성 170억 매개변수를 달성함으로써 최고 수준의 지능과 자원 효율적인 학습·추론을 구현합니다. Apache 2.0 라이선스 하에 제공되는 Arctic는 가중치, 코드, 데이터 레시피, 연구 인사이트에 대한 비가림(ungated) 접근을 보장하여 엔터프라이즈 AI 애플리케이션에 매우 높은 접근성과 비용 효율성을 제공합니다.
Arctic의 높은 학습 효율성 덕분에 Snowflake 고객은 3000 GPU 주 미만의 리소스로, 학습 비용을 200만 달러 이하로 유지하면서 고품질의 커스텀 모델을 경제적으로 만들 수 있습니다. Arctic는 학습 연산 예산을 절반 이하만 사용하면서도 엔터프라이즈 지표에서 Llama 3 8B와 Llama 2 70B 같은 모델을 능가합니다. 또한 NVIDIA와의 협업을 통해 최적화된 구현을 제공하여 추론 속도가 더 빠르며, 대화형 및 대규모 배치 추론 환경에서 실용적인 선택이 됩니다.
최첨단 아키텍처, 비용 효율성, 개방형 접근의 조합은 Snowflake Arctic를 엔터프라이즈 AI에 이상적인 솔루션으로 만들어, 다양한 애플리케이션에서 강력하고 효율적인 AI 모델을 배포하는 역량을 강화합니다.
생의학 RAG 모델을 구축하는 방법생의학 연구의 급속한 발전으로 방대한 문헌에서 관련 정보를 효율적으로 추출하고 종합하는 방법이 필수적입니다. 본 글에서는 PubMed 논문을 기반으로 복잡한 생의학 질문을 이해하고 답변하기 위해 검색 증강 생성(RAG) 모델을 활용하는 실용적인 예시를 소개합니다.
이 애플리케이션에서 Weave, Snowflake Arctic, Streamlit을 통합하면 생의학 질의 응답을 위한 실용적이고 효율적인 해법을 제공합니다. Weave의 프레임워크는 버전 관리와 모듈화를 지원하여 다양한 모델 구성으로 빠르게 실험하고, 업데이트를 원활하게 배포할 수 있게 합니다. 
이는 최신 연구 결과의 반영이 정확성에 필수적인 생의학처럼 빠르게 발전하는 분야에서 특히 중요합니다. Snowflake Arctic의 비용 효율적이면서도 강력한 LLM 아키텍처, 특히 하이브리드 dense–MoE 구조를 활용하면 낮은 추론 지연 시간을 유지하면서도 복잡한 생의학적 추론 과제를 견고하게 처리할 수 있습니다. 이러한 효율성은 응답 속도 향상으로 이어지며, 시간에 민감한 임상 애플리케이션에서 매우 중요합니다. Streamlit 인터페이스는 이 파이프라인에 접근하기 쉬운 프런트엔드를 제공하여, 연구자와 임상의 모두가 전문적인 기술 역량 없이도 효율적으로 정보를 검색할 수 있게 합니다. 
이 결합된 접근 방식은 직관적인 인터페이스로 사용자 경험을 단순화하면서, 수작업 문헌 검토나 상용 지식 베이스 의존과 같은 전통적 방법에 비해 속도와 비용 효율성 측면에서 큰 이점을 제공합니다.
﻿
핵심 구성 요소질의 변환: GenericLLMModel이라는 사용자 지정 weave.Model을 사용해 생의학 질문을 최적화된 의미 기반 검색 질의로 변환합니다.
문서 검색: 고급 임베딩 모델과 벡터 스토어를 활용해 BioASQ 데이터셋에서 가장 관련성 높은 문서를 찾습니다.
컨텍스트 점수화: 검색된 문서의 초록을 기반으로 관련성을 평가하기 위해 특화된 모델을 활용합니다.
요약: 관련 문서에서 핵심 내용을 요약해 생의학적 질문에 답하는 일관된 개��를 제공합니다.
최종 답안 합성: 요약된 정보를 종합해 임상 질문에 대한 명확하고 간결한 답을 도출합니다.
1. 질의 변환
﻿
바이오메디컬 RAG 모델의 첫 단계는 입력 질문을 최적화된 의미 검색 질의로 변환하는 것입니다. 이를 위해 맞춤형 모델을 활용합니다. GenericLLMModel이는 이 작업을 위해 특별히 설계된 Weave 모델입니다.
질의 변환 과정은 다음과 같이 진행됩니다:
먼저, 우리는 정의합니다 GenericLLMModel Weave 사용하기:
class GenericLLMModel(weave.Model):
    model_name: str = "replicate/snowflake/snowflake-arctic-instruct"
    prompt_template: PromptTemplate
    temperature: float = 0.0
    name: str = "GenericLLMModel"
﻿
    def __init__(
        self,
        system_prompt: Optional[str] = None,
        human_prompt: Optional[str] = None,
        model_name: str = "gpt-3.5-turbo",
        temperature: float = 0.0,
    ):
        super().__init__(
            model_name=model_name,
            prompt_template=PromptTemplate(
                system_prompt=system_prompt, human_prompt=human_prompt
            ),
            temperature=temperature,
        )
﻿
    @weave.op()
    def predict(
        self,
        human_prompt_args: Optional[dict] = {},
        system_prompt_args: Optional[dict] = {},
    ) -> dict:
        messages = self.prompt_template.format_prompt(
            human_prompt_args=human_prompt_args, system_prompt_args=system_prompt_args
        )
        # ...
        response = completion(**completion_args)
        answer = response.choices[0].message.content
        return {"answer": answer}
﻿
그다음, 우리는 생성합니다 question_2_query_model 이를 사용하여 GenericLLMModel:
question_2_query_model = GenericLLMModel(
    system_prompt=question_2_query_system_prompt,
    human_prompt=question_2_query_human_prompt
)
질문을 질의로 변환하기 위해 우리는 다음을 사용합니다 predict 우리 모델의 메서드로, 다음으로 데코레이트되어 있습니다 @weave.op():
transformed_query = question_2_query_model.predict(human_prompt_args={"question": question})['answer']
이 프로세스는 질문을 받아 최적화된 시맨틱 검색 질의로 변환합니다. The @weave.op() 데코레이터는 기본 호출이 이미 자동 로깅되더라도 이 작업이 추적되도록 보장합니다.
여기에서 사용하는 프롬프트는 다음과 같습니다:
question_2_query_system_prompt = """
### Instruction ###
You are an expert biomedical researcher tasked with converting biomedical questions into optimized semantic search queries. Your goal is to generate queries that will retrieve the most relevant documents from the BioASQ dataset to answer the given question.
﻿
### Process ###
Follow these steps to create the semantic search query:
1. Carefully analyze the biomedical question to identify the most important keywords, concepts, and entities
2. Construct a search query using those keywords, aiming to retrieve all potentially relevant documents
3. Optimize the query by incorporating synonyms, related terms, and expanding acronyms if applicable
4. Double check that the query captures the core intent of the question and will match pertinent documents
5. Provide only the final semantic search query in your response, without any additional commentary
﻿
### Context ###
The BioASQ dataset consists of biomedical questions along with relevant documents. Your semantic search queries will be used to find the most relevant documents from this dataset to answer each question. The ideal answers have been removed, so your query should focus solely on the question text.
﻿
### Examples ###
Question: Is Hirschsprung disease a mendelian or a multifactorial disorder?
Semantic Search Query: Hirschsprung disease AND (mendelian OR multifactorial OR complex) AND (inheritance OR genetics OR genes)
﻿
Question: List signaling molecules (ligands) that interact with the receptor EGFR?  
Semantic Search Query: EGFR AND (ligands OR "signaling molecules") AND (EGF OR BTC OR EPR OR HB-EGF OR TGF-α OR AREG OR EPG)
﻿
Question: Is the protein Papilin secreted?
Semantic Search Query: Papilin AND (secreted OR extracellular OR "secretory pathway")
﻿
### Evaluation ###
Your performance will be evaluated on:  
- Inclusion of the most salient keywords, concepts and entities from the biomedical question
- Appropriate use of synonyms and related terms to improve retrieval
- Ability of the query to capture the full scope and intent of the question
- Overall likelihood of the query retrieving documents that can answer the question
- Adherence to the response format instructions
﻿
You MUST provide a well-constructed query that fulfills the given criteria. You will be penalized for queries that are too narrow, off-topic, or poorly formulated.
"""
예시 하나:
Question: List signaling molecules (ligands) that interact with the receptor EGFR?  
Semantic Search Query: EGFR AND (ligands OR "signaling molecules") AND (EGF OR BTC OR EPR OR HB-EGF OR TGF-α OR AREG OR EPG)
2. 문서 검색
﻿
문서 검색 단계는 우리의 RAG 파이프라인에서 질의 변환 다음에 수행됩니다. 이 단계에서는 맞춤형 VectorStore BioASQ 데이터셋에서 관련 문서를 효율적으로 검색하기 위한 클래스입니다.
The VectorStore 클래스는 다음과 같습니다 weave.Object 다음과 같은 핵심 구성 요소와 함께:
임베딩 모델(기본값: "text-embedding-3-small")
임베딩 함수
기사 저장 및 임베딩
순위 지정 방법(기본값: 코사인 유사도)
검색 프로세스는 다음을 포함합니다:
변환된 쿼리 임베딩
사전 계산된 문서 임베딩과 쿼리 임베딩 비교
선택한 유사도 지표에 기반한 문서 순위화
가장 관련성이 높은 상위 N개 문서 반환
vector_store = weave.ref('VectorStore:latest').get()
embedding_model = weave.ref('SentenceTransformersModel:latest').get()
vector_store.set_embedding_model(embedding_model)
﻿
relevant_docs = vector_store.get_most_relevant_documents(
    query=transformed_query,
    n=5,
    ranking_method="cosine"
)
바이오메디컬 RAG 파이프라인 전반에서 우리는 활용합니다 weave.ref 사전 계산된 리소스와 데이터셋에 효율적으로 접근하기 위해:
vector_store = weave.ref('VectorStore:latest').get()
embedding_model = weave.ref('SentenceTransformersModel:latest').get()
qap = weave.ref('QuestionAnswerPairsTrainFiltered:latest').get()
이 접근 방식은 다음과 같은 여러 가지 장점을 제공합니다:
버전 관리: 모델과 데이터셋의 특정 버전에 손쉽게 접근할 수 있습니다.
재현성: 여러 번 실행해도 일관된 결과를 보장합니다.
리소스 효율성: 사전 계산된 리소스를 재사용하여 중복 계산을 피합니다.
유연성: 실험을 위해 구성 요소를 신속하게 교체할 수 있습니다.
활용하여 weave.ref, 워크플로를 간소화하고 모듈식으로 구성된, 쉽게 업데이트할 수 있는 RAG 파이프라인을 유지합니다.
3. 컨텍스트 점수화
﻿
잠재적으로 관련성이 있는 문서를 검색한 후, 우리의 RAG 모델은 문서 선정을 한층 더 정교화하기 위해 컨텍스트 점수화 단계를 수행합니다. 이 단계는 요약과 답변 합성에 가장 적합한 정보만 사용되도록 보장합니다.
컨텍스트 점수화 과정은 우리의 또 다른 인스턴스를 활용합니다 GenericLLMModel, 검색된 문서의 관련성을 평가하도록 특별히 맞춤화된:
article_relevance_model = GenericLLMModel(
    system_prompt=article_relevance_system_prompt,
    human_prompt=article_relevance_human_prompt
)
이 모델은 각 문서가 원 질문과 관련이 있는지에 대해 이진형 “예” 또는 “아니오” 답을 제공하도록 설계되었습니다. 점수화 과정은 다음과 같이 진행됩니다:
검색된 각 문서에 대해, 우리는 article_relevance_model 관련성을 예측하기 위해:
for doc in _context:
    doc["relevance"] = article_relevance_model.predict(
        human_prompt_args={
            "question": question,
            "article_text": doc["document"]["passage"]
        }
    )['answer']
그런 다음 관련성 점수에 따라 문서를 필터링합니다:
relevant_context = [doc for doc in _context if doc["relevance"].lower() == "yes"]
이 과정은 요약 단계로 넘어가는 문서가 가장 관련성 높은 것들로만 제한되도록 보장하여, 최종 답변의 품질과 정확성을 향상시킵니다.
The article_relevance_model 의사결정 과정을 안내하기 위해 세심하게 설계된 시스템 프롬프트를 사용합니다:
article_relevance_system_prompt = """
### Instruction ###
You are an expert medical researcher librarian. Your task is to determine whether articles from the BioASQ dataset may be relevant to questions from clinicians based on the articles' abstracts. You MUST provide a yes or no answer. You will be penalized for answers that are not a clear yes or no.
﻿
### Process ###
1. Carefully read the provided clinical question.
2. Analyze the given article abstract in the context of the question.
3. Determine if the abstract contains information potentially relevant to answering the question.
4. Provide a definitive yes or no answer. Do not hedge or equivocate.
﻿
### Evaluation ###
Your performance will be evaluated on:
- Ability to identify abstracts with information relevant to the clinical question
- Providing a clear, unambiguous yes or no answer
- Avoiding reliance on stereotypes or biases in your determination
- Adherence to the required answer format
﻿
You MUST provide a yes or no answer. Any other response will be penalized.
"""
이 컨텍스트 점수화 단계는 다음과 같은 이유로 중요합니다:
이후 단계에 전달되는 입력의 노이즈 줄이기
최종 답변의 관련성과 정확성 향상
요약 과정의 효율성 향상
이 단계를 포함함으로써, 우리의 RAG 모델은 잠재적으로 관련성 있는 문서가 매우 많더라도 복잡한 생의학적 질문에 대해 더 집중적이고 정확한 응답을 제공할 수 있습니다.
4. 요약
﻿
﻿
The @weave.op() 데코레이터를 해당 predict 메서드는 이 요약과 이후의 합성 단계를 추적하고 더 큰 RAG 파이프라인에 통합되도록 보장합니다:
💡
관련 문서를 검색하고 점수를 매긴 뒤, 우리의 생의학 RAG 모델에서 다음 단계는 요약입니다. 이 과정은 여러 관련 문서의 정보를 압축해 원래 질문에 답하는 간결한 요약으로 제공합니다.
우리는 우리의 다른 인스턴스를 사용하여 GenericLLMModel 이 작업을 위해:
summarization_model = GenericLLMModel(
    system_prompt=summarization_system_prompt,
    human_prompt=summarization_human_prompt
)
요약 과정은 먼저 관련 문서들을 결합해 컨텍스트 문자열을 준비하는 단계로 시작합니다:
context_str = "\\\\n\\\\n".join([f"{doc['document']['passage']} (Score: {doc['score']})" for doc in relevant_context])
그다음, 요약 모델을 사용해 요약을 생성합니다:
summary = summarization_model.predict(human_prompt_args={"question": question, "context_str": context_str})['answer']
요약 모델은 다음과 같은 시스템 프롬프트를 사용합니다:
summarization_system_prompt = """
### Instruction ###
You are an expert medical researcher tasked with summarizing relevant excerpts from biomedical literature to provide background information necessary to answer clinicians' questions. Your summary should be concise yet informative, capturing the key points from the provided context.
﻿
### Process ###
1. Read the provided clinical question to understand the information needed.
2. Analyze the given context, which includes excerpts from biomedical literature along with relevance scores.
3. Identify the most pertinent information from the context in relation to the question.
4. Summarize the key points from the relevant excerpts, considering their relevance scores.
5. Synthesize the individual summaries into a coherent overview addressing the question.
6. If the context is not sufficient to answer the question, indicate that more information is needed.
﻿
### Format ###
Question: <question>
Summary: <summary_of_relevant_information>
Relevant Excerpts: <excerpts_in_order_of_relevance>
﻿
### Evaluation ###
Your performance will be evaluated on:
- Ability to identify and summarize relevant information from the provided context
- Synthesis of individual excerpt summaries into a coherent overview
- Consideration of excerpt relevance scores in the final summary
- Clarity and conciseness of the summary
- Adherence to the specified response format
﻿
Provide a summary that directly addresses the given question using the most relevant excerpts from the context. If the provided context is insufficient to answer the question, state "Insufficient information to answer the question."
"""
﻿
이 요약 단계는 여러 가지 목적을 수행합니다:
정보 종합: 여러 출처의 정보를 결합해 일관된 서사로 구성합니다.
관련성 필터링: 질문과 가장 밀접하게 관련된 핵심 정보에 집중합니다.
간결성: 과학 텍스트를 핵심만 추린 요약으로 정제합니다.
문맥 인식: 각 발췌문의 관련성 점수를 고려해, 더 관련성이 높은 정보를 우선합니다.
The @weave.op() 데코레이터를 해당 predict 이 메서드는 이 요약 단계를 추적하고 더 큰 RAG 파이프라인에 통합되도록 보장합니다:
@weave.op()
def predict(self, question: str, context_str: str) -> str:
    return self.model.predict(human_prompt_args={"question": question, "context_str": context_str})['answer']
이 요약 단계는 문서 검색을 최종 답변 합성으로 연결하며, 관련 생의학 정보를 간결하게 개관합니다. 이를 통해 RAG 모델이 복잡한 생의학 질문에 문맥적으로 적합한 응답을 생성할 수 있습니다.
5. 최종 답변 합성
﻿
The @weave.op() 데코레이터를 해당 predict 메서드는 이 합성 단계가 추적되고 더 큰 RAG 파이프라인에 통합되도록 보장합니다:
💡
생의학 RAG 모델의 마지막 단계는 요약된 정보를 바탕으로 간결한 답변을 합성하는 것입니다. 이 단계에서는 상세한 요약을 원래 질문에 대한 명확하고 직접적인 응답으로 변환합니다.
우리는 우리의 다른 인스턴스를 사용하여 GenericLLMModel 이 작업을 위해:
synthesis_model = GenericLLMModel(
    system_prompt=synthesis_system_prompt,
    human_prompt=synthesis_human_prompt
)
합성 과정에서는 먼저 원래 질문과 생성된 요약을 합성 모델에 전달합니다.
answer = synthesis_model.predict(human_prompt_args={"question": question, "summary": summary})['answer']
우리는 다음과 같은 시스템 프롬프트를 사용합니다:
synthesis_system_prompt = """
### Instruction ###
You are an expert medical assistant. Your task is to provide accurate, concise answers to medical questions based on summaries of relevant biomedical literature. Ensure responses are clear, informative, unbiased, and avoid stereotypes. Answer in a natural, human-like manner.
﻿
### Process ###
1. Analyze the provided question to understand the key information needed.
2. Review the summary of relevant excerpts from biomedical literature.
3. Identify the most pertinent information in the summary for answering the question.
4. Synthesize the key points into a coherent, concise answer.
5. If the summary lacks sufficient information to conclusively answer the question, state "There is insufficient information provided to conclusively answer the question."
﻿
### Format ###
Question: <question>
Answer: <final_answer_based_on_summary>
﻿
### Evaluation ###
Your performance will be evaluated on:
- Accuracy and relevance of the answer based on the provided summary
- Clarity and conciseness of the response
- Ability to identify when the summary is insufficient to conclusively answer the question
- Avoidance of bias and stereotyping
- Adherence to the specified format
﻿
Provide an answer that directly addresses the question using only the information in the summary. If the summary is insufficient, state that conclusively answering is not possible. Produce the answer in a clear, natural style.
"""
﻿
이 합성 단계는 여러 가지 목적을 수행합니다:
증류: 상세한 요약을 핵심 답변으로 압축합니다.
명확성: 응답이 원래 질문을 직접적으로 다루도록 보장합니다.
일관성: 요약에 제공된 정보와의 정합성을 유지합니다.
불확실성 처리: 결정적인 답변을 내리기에 정보가 불충분할 때 이를 인정합니다.
이 최종 합성 단계는 RAG 파이프라인을 완성하며, 원래의 생의학적 질문에 대해 간결하고 관련성 높은 답변을 생성합니다. 이전 단계에서 얻은 컨텍스트와 요약을 활용하여 정보성이 높고 특정 질의에 맞춤화된 응답을 만들어 냅니다.
생의학 RAG 모델 평가하기
BioASQAdvancedRAGModel 실험 정의하기
﻿
실험을 만들기 위해, 우리는 a를 정의합니다 BioASQAdvancedRAGModel 상속하는 클래스 weave.Model이 클래스는 우리의 RAG 파이프라인의 모든 구성 요소를 하나의 응집력 있는 모델로 캡슐화합니다.
class BioASQAdvancedRAGModel(RAGModel):
    def __init__(self, vector_store, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.vector_store = vector_store
﻿
    @weave.op()
    def score_context(self, _context, question) -> str:
        for doc in _context:
            doc["relevance"] = article_relevance_model.predict(
                human_prompt_args={"question": question, "article_text": doc["document"]["passage"]}
            )['answer']
﻿
    @weave.op()
    def predict(self, question: str, n_documents: int = 5) -> str:
        # Query transformation
        transformed_query = question_2_query_model.predict(
            human_prompt_args={"question": question}
        )['answer']
﻿
        # Document retrieval
        _context = self.vector_store.get_most_relevant_documents(query=transformed_query, n=n_documents)
﻿
        # Context scoring
        self.score_context(_context, question)
        relevant_context = [doc for doc in _context if doc["relevance"].lower() == "yes"]
        if len(relevant_context) == 0:
            relevant_context = [_context[0]]
﻿
        # Summarization
        context_str = "\\\\n\\\\n".join([f"{doc['document']['passage']} (Score: {doc['score']})" for doc in relevant_context])
        summary = summarization_model.predict(
            human_prompt_args={"question": question, "context_str": context_str}
        )['answer']
﻿
        # Final answer synthesis
        answer = synthesis_model.predict(
            human_prompt_args={"question": question, "summary": summary}
        )['answer']
﻿
        return {
            "answer": answer,
            "context": [doc["document"]["passage"] for doc in relevant_context],
            "all_context": _context
        }
이 predict 이 메서드는 앞서 논의한 모든 단계를 결합합니다.
쿼리 변환
문서 검색
컨텍스트 점수화
요약
최종 답안 종합
각 단계는 적절한 모델을 사용하여 구현됩니다(예: question_2_query_model, summarization_model, synthesis_model) 우리가 앞서 파이프라인에서 정의한 것.
평가 설정우리 모델을 평가하려면 평가 프레임워크를 설정해야 합니다:
# Create an instance of our RAG model
rag_model = BioASQAdvancedRAGModel(vector_store=vector_store)
﻿
# Load the evaluation dataset
qap = weave.ref('QuestionAnswerPairsTrainFiltered:latest').get()
sub_qap = qap.rows[:10]  # Using first 10 questions for this example
﻿
# Define evaluation metrics
from weave_example_demo.scorers.llm_guard_scorer import LLMGuardScorer
from weave_example_demo.scorers.tonic_validate_scorer import TonicValidateScorer
﻿
scorers = [
    TonicValidateScorer(
        metrics=[
            "AnswerSimilarityMetric",
            "AugmentationPrecisionMetric",
            "AnswerConsistencyMetric",
        ]
    ),
    LLMGuardScorer(
        metrics=["NoRefusal", "Relevance", "Sensitive"]),
]
﻿
평가 실행
﻿
모델과 평가 설정을 완료했으니, 이제 평가를 실행할 수 있습니다:
이 평가 프로세스는 다음을 수행합니다:
우리의 실행 BioASQAdvancedRAGModel BioASQ 데이터셋 하위 집합의 각 질문에 대해.
정의된 각 지표를 모델의 출력에 적용합니다.
결과를 집계하여 전반적인 성능을 평가합니다.
결과 해석평가를 실행한 후에는 결과를 분석하여 모델의 성능을 파악할 수 있습니다:
답변 유사도이 지표는 우리 모델의 답변이 BioASQ 데이터셋의 정답과 얼마나 가까운지를 보여줍니다.
증강 정밀도: 이는 모델이 직접적인 답변을 넘어 추가 정보를 얼마나 정확하게 제공하는지를 측정합니다.
답변 일관성: 동일한 질문을 여러 번 했을 때 우리 모델이 일관된 답변을 제공하는지 확인합니다.
거절 없음이는 불필요한 거절 없이 모든 유효한 질문에 대해 우리 모델이 답변을 시도하도록 보장합니다.
관련성: 이 지표는 우리 모델의 답변이 주어진 질문을 실제로 얼마나 잘 충족하는지를 평가합니다.
민감한 정보: 이는 우리 모델이 응답에 민감하거나 부적절한 정보를 부주의하게 포함하는지 여부를 확인합니다.
이 지표들을 분석하면 우리 RAG 파이프라인의 강점과 약점을 파악할 수 있습니다. 예를 들어, 답변 유사도 점수는 낮지만 관련성 점수가 높게 나타난다면, 모델이 관련된 정보를 제공하되 BioASQ 데이터셋이 기대하는 정확한 형식으로 제시하지 못하고 있음을 시사할 수 있습니다.
반복적 개선평가 결과를 바탕으로 우리 모델을 반복적으로 개선할 수 있습니다:
답변 유사도가 낮다면 요약 또는 종합 모델을 미세 조정할 필요가 있을 수 있습니다.
증강 정밀도가 낮다면 문서 검색이나 컨텍스트 점수화 단계를 개선할 수 있습니다.
답변 일관성에 문제가 있다면 언어 모델의 온도 설정을 조정할 필요가 있을 수 있습니다.
LLMGuard 지표(거부 없음, 관련성, 민감성) 점수가 낮다면, 프롬프트를 조정하거나 파이프라인에 안전 점검을 추가할 필요가 있을 수 있습니다.
지속적으로 평가하고 개선함으로써 BioASQAdvancedRAGModel을 통해 복잡한 생의학적 질문에 더 정확하고 신뢰할 수 있으며 안전하게 답변하는 시스템을 구축할 수 있습니다.
Streamlit에서 모델 서비스하기기반: https://github.com/streamlit/snowflake-arctic-st-demo﻿
💡
﻿
모델을 쉽게 사용할 수 있도록 간단한 Streamlit 앱을 만들 수 있습니다. 최소 구현 예시는 다음과 같습니다:
import streamlit as st
import weave
﻿
# Load the RAG model
@st.cache_resource
def load_rag_model():
    return weave.ref('BioASQAdvancedRAGModel:latest').get()
﻿
rag_model = load_rag_model()
﻿
st.title("Biomedical Question Answering")
﻿
# User input
question = st.text_input("Enter your biomedical question:")
﻿
if question:
    with st.spinner("Generating answer..."):
        # Get response from the model
        response = rag_model.predict(question)
﻿
    # Display the answer
    st.subheader("Answer:")
    st.write(response['answer'])
﻿
    # Display relevant context
    st.subheader("Relevant Context:")
    for context in response['context']:
        st.write(context)
﻿
이 최소 앱은:
다음을 사용하여 RAG 모델을 로드합니다 weave.ref
사용자의 질문을 입력할 수 있는 텍스트 입력란을 제공합니다
모델을 사용하여 응답을 생성합니다
답변과 관련 맥락을 표시합니다
…의 사용 weave.ref 손쉬운 모델 버전 관리와 배포를 지원합니다. 참조만 변경하면 앱 코드를 수정하지 않고도 모델을 빠르게 업데이트할 수 있습니다.
이 Streamlit 인터페이스는 연구자와 임상의가 Biomedical RAG 모델과 쉽게 상호작용하고, 생의학 문헌에서 관련 정보를 신속하게 얻을 수 있도록 돕습니다.
결론다음은 우리 앱이 실제로 동작하는 또 다른 예시입니다: 
Question:
﻿
Which animal bite can cause Capnocytophaga canimorsus infection?
﻿
Ground Truth:
﻿
Capnocytophaga canimorsus infection is typically associated with dog bites, especially in asplenic or immunocompromised patients, and typically manifest as sepsis and/or bacteremia.
﻿
Response:
﻿
The animal bite that can cause Capnocytophaga canimorsus infection is from dogs. Capnocytophaga canimorsus is a commensal bacterium found in dogs' mouths, and it can lead to septicemia or meningitis in humans through bites or scratches.
﻿
Context:
﻿
Capnocytophaga canimorsus, a commensal bacterium from dogs' mouths, can cause
septicemia or meningitis in humans through bites or scratches. Here, we describe
and characterize the inflammatory response of human and mouse macrophages on C.
canimorsus infection. Macrophages infected with 10 different strains failed to
release tumor necrosis factor (TNF)- alpha and interleukin (IL)-1 alpha .
Macrophages infected with live and heat-killed (HK) C. canimorsus 5 (Cc5), a
strain isolated from a patient with fatal septicemia, did not release IL-6,
IL-8, interferon- gamma , macrophage inflammatory protein-1 beta , and nitric
oxide (NO). This absence of a proinflammatory response was characterized by the
inability of Toll-like receptor (TLR) 4 to respond to Cc5. Moreover, live but
not HK Cc5 blocked the release of TNF- alpha and NO induced by HK Yersinia
enterocolitica. In addition, live Cc5 down-regulated the expression of TLR4 and
dephosphorylated p38 mitogen-activated protein kinase. These results highlight
passive and active mechanisms of immune evasion by C. canimorsus, which may
explain its capacity to escape from the host immune system.
Snowflake Arctic와 BioASQ 데이터세트를 기반으로 구축한 우리의 생의학 RAG 모델은 생의학 정보 검색과 종합의 과제를 실용적으로 해결하는 접근법을 제시합니다. 밀집 레이어와 Mixture-of-Experts 레이어를 결합한 Snowflake Arctic의 고급 아키텍처는 효율성을 유지하면서도 복잡한 생의학 추론에 필요한 연산 성능을 제공합니다. 이를 통해 모델은 의학 용어와 개념의 미묘한 차이를 높은 정확도와 속도로 처리할 수 있습니다.
질의 변환, 문서 검색, 문맥 점수화, 요약, 최종 답안 합성이라는 구분되고 추적 가능한 단계로 프로세스를 세분화함으로써, 우리는 복잡한 생의학 질문을 효율적으로 처리할 수 있는 파이프라인을 구축했습니다.
우리 구현 전반에서 Weave를 사용하면 다음과 같은 이점이 있습니다:
모델과 데이터세트의 버전 관리와 재현성
파이프라인의 각 작업을 효율적으로 추적하기
실험을 위해 구성 요소를 유연하게 교체할 수 있는 기능
TonicValidate와 LLMGuard의 지표를 결합한 우리의 평가 프레임워크는 모델 성능을 종합적으로 평가합니다. 이 다각적 접근을 통해 답변의 정확성뿐 아니라 관련성, 일관성, 그리고 안전 지침 준수 여부까지 함께 측정할 수 있습니다.
하지만 이 구현은 어디까지나 출발점일 뿐이라는 점을 유의해야 합니다. 개선과 실험을 위한 여지가 크게 남아 있습니다:
생의학 특이성을 위해 파이프라인 각 구성 요소 미세 조정하기
더 발전된 검색 기법을 탐색하고, 필요하다면 생의학 온톨로지를 통합하기
서로 다른 임베딩 모델이 검색 성능에 미치는 영향 탐구
모델 복잡성과 추론 속도의 균형 최적화
이 시스템을 계속 다듬어 가면서 우리는 궁극적인 목표를 잊지 말아야 합니다. 즉, 임상의에게 환자 진료를 지원할 수 있는 빠르고 정확하며 관련성 높은 정보를 제공하는 것입니다. 생의학 연구의 빠른 진척 속도는 이를 도전적이지만 매우 중요한 과제로 만들며, 우리의 RAG 모델은 이 과제를 해결하기에 적합한 위치에 있습니다.
이 모델을 반복적으로 개선하고 의료 전문가의 피드백을 반영하며, 머신러닝과 생의학 연구의 최신 발전을 꾸준히 따라간다면, 의료 제공자가 방대한 생의학 지식을 더 쉽게 접근하고 활용할 수 있도록 능력을 실질적으로 향상시키는 시스템에 한 걸음 더 다가갈 수 있습니다.
﻿
﻿
 이 글은 AI 번역본입니다. 오역 가능성이 있다면 댓글로 알려 주세요. 원문 보고서는 아래 링크에서 확인할 수 있습니다: 원문 보고서 보기﻿
﻿
Add a comment