In this webinar, we explore the potential of leveraging out-of-domain data to enhance the fine-tuning of MistralAI language models for detecting factual inconsistencies, also known as hallucinations.
Inspired by Eugene Yan’s article on bootstrapping hallucination detection, we use the Factual Inconsistency Benchmark (FIB) dataset and initially fine-tune a MistralAI-based model solely on this dataset, achieving limited success.
We then employ pre-fine-tuning on Wikipedia summaries from the Unified Summarization Benchmark (USB) before applying task-specific fine-tuning on FIB, which significantly improves performance.
Our methodology incorporates Weights & Biases Weave to automate model evaluation, demonstrating that pre-fine-tuning on related but out-of-domain data can effectively bootstrap the detection of factual inconsistencies, thus reducing the need for extensive task-specific data collection. This technique offers a promising strategy for enhancing the accuracy and applicability of natural language inference models in production environments.
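The automated evaluation loop described above can be sketched in plain Python. In the webinar this is driven by Weights & Biases Weave; the stub detector and labeled triples below are purely illustrative stand-ins for the fine-tuned model and the FIB examples.

```python
def detect_inconsistency(document: str, summary: str) -> bool:
    """Hypothetical stub for the fine-tuned model: returns True if the
    summary is judged factually inconsistent with the document.
    A real implementation would call the fine-tuned MistralAI model."""
    # Naive placeholder heuristic: flag summaries containing tokens
    # that never appear in the source document.
    return any(tok not in document.lower() for tok in summary.lower().split())


def evaluate(examples):
    """Score the detector against labeled
    (document, summary, is_inconsistent) triples and return accuracy."""
    correct = sum(
        detect_inconsistency(doc, summ) == label
        for doc, summ, label in examples
    )
    return correct / len(examples)


# Illustrative labeled examples (not from the FIB dataset).
examples = [
    ("The Eiffel Tower is in Paris.", "the eiffel tower is in paris.", False),
    ("The Eiffel Tower is in Paris.", "the tower is in berlin.", True),
]
print(f"accuracy: {evaluate(examples):.2f}")
```

In the Weave-based setup, the scorer and dataset would instead be registered with a Weave evaluation so each model variant's scores are tracked and compared automatically.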

ML Engineer
Weights & Biases

Head of Developer Relations
Mistral AI