
LongWriter: Breaking the 2,000-Word Barrier in Long Context LLMs

A new method for expanding the generation capabilities of LLMs!
Created on August 23 | Last edited on August 23
Advancements in long context large language models have enabled these models to process extensive inputs of over 100,000 tokens. However, generating outputs that exceed 2,000 words remains a significant challenge. A new paper by researchers from Tsinghua University and Zhipu AI addresses this issue through a series of experiments and the introduction of a novel agent-based pipeline, AgentWrite.

Understanding the Output Length Limitation

The study begins by identifying the core issue: current long context LLMs struggle to generate outputs beyond 2,000 words due to the limitations of the supervised fine-tuning datasets they are trained on. These datasets typically do not contain enough examples of long outputs, which inherently limits the models' ability to produce lengthy text. This limitation is not due to the models' architecture or capability, but rather the scarcity of long-output examples in the training data.

AgentWrite: A New Approach to Ultra-Long Generation

To overcome this limitation, the researchers introduced AgentWrite, an agent-based pipeline that enables existing LLMs to generate coherent outputs exceeding 20,000 words. AgentWrite first decomposes a large writing task into smaller, manageable subtasks, then has the model complete each subtask sequentially; the subtask outputs are concatenated to produce the final extended text. This method effectively bypasses the output-length ceiling imposed by current SFT datasets.
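The plan-then-write loop described above can be sketched in a few lines of Python. This is an illustrative approximation, not the paper's implementation: `call_llm` is a stand-in stub for whatever chat-completion API you use, and the prompt wording is assumed.

```python
# Hypothetical sketch of an AgentWrite-style plan-then-write pipeline.
# `call_llm` is a stub; swap in a real chat-completion API call.

def call_llm(prompt: str) -> str:
    """Stub LLM call. A real model would return a numbered paragraph
    plan for PLAN prompts and prose for WRITE prompts."""
    if prompt.startswith("PLAN"):
        return ("1. Introduction (500 words)\n"
                "2. Main analysis (1500 words)\n"
                "3. Conclusion (500 words)")
    return "...generated paragraph text...\n\n"

def agent_write(instruction: str) -> str:
    # Step I: ask the model for a paragraph-level plan with a
    # word budget for each paragraph.
    plan = call_llm(f"PLAN: Break this writing task into paragraphs, "
                    f"each with a word budget:\n{instruction}")
    outline = [line for line in plan.splitlines() if line.strip()]

    # Step II: generate each paragraph in turn, conditioning on the
    # instruction, the full plan, and everything written so far.
    written: list[str] = []
    for step in outline:
        prompt = (f"WRITE: Task: {instruction}\nPlan:\n{plan}\n"
                  f"Already written:\n{''.join(written)}\n"
                  f"Now write only this section: {step}")
        written.append(call_llm(prompt))

    # Concatenate the subtask outputs into the final long-form text.
    return "".join(written)
```

Generating sections sequentially, with the prior text in context, is what keeps the concatenated output coherent rather than reading as disjoint fragments.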

Scaling Output Length with LongWriter-6k

Building on the capabilities of AgentWrite, the team created LongWriter-6k, a dataset containing 6,000 supervised fine-tuning data points with outputs ranging from 2,000 to 32,000 words. Incorporating this dataset into model training allowed the researchers to successfully scale the output length of existing models to over 10,000 words without sacrificing output quality. They also developed LongBench-Write, a comprehensive benchmark for evaluating ultra-long generation capabilities.

Experimental Results and Impact

The experiments demonstrated that the models trained with LongWriter-6k significantly outperformed existing models in generating long-form text, achieving state-of-the-art performance even compared to much larger proprietary models. The research highlights that the potential for ultra-long output generation already exists in current LLMs and can be unlocked with the appropriate training data.

Conclusion and Future Directions

The LongWriter study opens up new possibilities for LLMs in tasks requiring extensive output, such as detailed reports, long-form articles, and other comprehensive documents. The research suggests that further advancements could be made by expanding datasets and refining the AgentWrite framework, potentially pushing the boundaries of LLM output lengths even further.

Tags: ML News