SWE-1: Windsurf Launches Models Built for Software Engineering
Created on May 16 | Last edited on May 16
Windsurf has launched SWE-1, its first family of models explicitly designed for the full software engineering workflow. Unlike traditional "coding models," which mostly optimize for code completion and generation, SWE-1 aims to assist across the entire engineering lifecycle—from reasoning over incomplete states and interfacing with tools to managing long-lived development tasks.
Why SWE-1 Matters
Most AI coding tools today excel at small-scale code suggestions or unit-test-driven outputs. SWE-1 was built around a different hypothesis: that helping software engineers requires much more than writing code. It also means understanding and navigating the rest of the process: terminal work, debugging, refactoring, reasoning over user feedback, and maintaining long-running projects that are often left in partial states. SWE-1 is designed to assist with each of these.
The SWE-1 Model Family
Windsurf has introduced three versions of SWE-1 to cover different performance and speed needs. SWE-1 is the main model; it is competitive with Claude 3.5 Sonnet in tool usage and is free to paid users during a promotional period. SWE-1-lite is a smaller model that replaces Cascade Base at higher quality and is free for all users. SWE-1-mini is a lightweight, fast model that powers the Tab experience, running passively and providing real-time feedback.
What Makes SWE-1 Different
The core innovation behind SWE-1 isn’t just in its parameters or speed, but in how it was trained. Windsurf used insights from its own IDE, the Windsurf Editor, to guide model development. SWE-1 was trained to understand long task timelines, support work across multiple surfaces (terminal, browser, editor), and adapt to human-in-the-loop workflows. This attention to real-world developer behavior separates it from models trained narrowly on codebases.
Evaluation and Performance
In offline benchmarking, SWE-1 performs comparably with top foundation models and outperforms the open-weight models it was compared against. It was tested using two custom benchmarks: one for in-progress coding sessions ("Conversational SWE Task Benchmark") and one for solving problems independently from scratch ("End-to-End SWE Task Benchmark"). These benchmarks were chosen to reflect both collaborative and autonomous use, and the first in particular highlights SWE-1’s strength in interactive environments.
Production Experiments and Real-World Use
Windsurf also validated SWE-1 using blind production testing. Users interacted with SWE-1 without being told which model was serving them, and metrics like "daily lines contributed per user" and "contribution rate" were tracked. SWE-1 reportedly outperformed legacy models like Claude on both metrics, which points to practical effectiveness beyond artificial tests: it helps users write more code, and better code, in real-world usage.
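To make the two production metrics concrete, here is a minimal sketch of how numbers like these could be computed from usage logs. The definitions are assumptions made for illustration, since this article does not spell them out: "daily lines contributed per user" is taken as model-written lines the user kept, averaged over user-days, and "contribution rate" as the model's share of all lines changed. The `UserDay` record and both function names are hypothetical, not Windsurf's.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class UserDay:
    """One user's activity on one day (hypothetical log record)."""
    user: str
    model_lines_kept: int   # lines written by the model and kept by the user
    human_lines: int        # lines written by the user directly


def daily_lines_contributed_per_user(logs: Iterable[UserDay]) -> float:
    """Average model-written, user-kept lines per user-day (assumed definition)."""
    records: List[UserDay] = list(logs)
    if not records:
        return 0.0
    return sum(r.model_lines_kept for r in records) / len(records)


def contribution_rate(logs: Iterable[UserDay]) -> float:
    """Share of all changed lines that came from the model (assumed definition)."""
    records: List[UserDay] = list(logs)
    total = sum(r.model_lines_kept + r.human_lines for r in records)
    if total == 0:
        return 0.0
    return sum(r.model_lines_kept for r in records) / total


if __name__ == "__main__":
    logs = [UserDay("a", 30, 70), UserDay("b", 10, 90)]
    print(daily_lines_contributed_per_user(logs))  # 20.0
    print(contribution_rate(logs))                 # 0.2
```

In a blind comparison, the same two functions would be run over the logs of each model's user cohort and the results compared side by side.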
Flow Awareness and Timeline Architecture
A major differentiator for SWE-1 is its deep integration with Windsurf’s flow-aware system. Flow awareness is the concept of building a shared, continuous understanding of what both the AI and human are doing across various tools—terminal, browser, editor, etc. This shared timeline is the backbone of how SWE-1 reasons and collaborates. It enables more intelligent handoffs between human and AI, and makes it possible to maintain long-term context even in complex tasks.
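The shared-timeline idea can be pictured as a single ordered log that both the human and the AI append to, regardless of which tool the action happened in. The sketch below is only an illustration of that concept, not Windsurf's implementation; the class names, fields, and the text rendering of the context are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Surfaces and actors named in the article; the string values are assumptions.
SURFACES = {"terminal", "browser", "editor"}
ACTORS = {"human", "ai"}


@dataclass(frozen=True)
class TimelineEvent:
    actor: str     # who acted: "human" or "ai"
    surface: str   # where it happened: "terminal", "browser", or "editor"
    summary: str   # short description of the action


@dataclass
class SharedTimeline:
    """One ordered log of actions shared by the human and the AI."""
    events: List[TimelineEvent] = field(default_factory=list)

    def record(self, actor: str, surface: str, summary: str) -> None:
        if actor not in ACTORS:
            raise ValueError(f"unknown actor: {actor}")
        if surface not in SURFACES:
            raise ValueError(f"unknown surface: {surface}")
        self.events.append(TimelineEvent(actor, surface, summary))

    def context(self, last_n: int = 10) -> str:
        """Render the most recent events as text a model could condition on."""
        if last_n <= 0:
            return ""
        recent = self.events[-last_n:]
        return "\n".join(f"[{e.surface}] {e.actor}: {e.summary}" for e in recent)

    def last_actor(self) -> Optional[str]:
        """Who acted most recently; useful for deciding on a handoff."""
        return self.events[-1].actor if self.events else None


if __name__ == "__main__":
    timeline = SharedTimeline()
    timeline.record("human", "editor", "renamed parse() to parse_config()")
    timeline.record("ai", "terminal", "ran test suite, 2 failures")
    print(timeline.context())
```

Because every action lands in the same log, a handoff in either direction only requires reading the tail of the timeline rather than reconstructing state from each tool separately.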
The Broader Vision and What’s Next
SWE-1 is just the beginning. Windsurf is using its flow-aware architecture to build a flywheel effect, where user data continually improves model behavior. By deeply integrating model, tool, and user activity, Windsurf is positioning itself to compete with and potentially surpass foundation model labs in the software engineering domain. The company plans to aggressively expand its research and development efforts, and is actively hiring.
Windsurf’s SWE-1 family marks a shift in how we think about developer tools. Rather than just aiming for better autocomplete, the models are being designed to truly understand and assist with the whole of software engineering. If they succeed, the future of coding won’t just be faster—it could become more collaborative, contextual, and intelligent from start to finish.
Tags: ML News