How Upstage proved the origin of their leading AI model and set a new standard for reproducible model development
Upstage is on a mission to become one of the world’s most capable frontier AI labs. Founded in 2020, the company builds compact, high-performance AI models designed to deliver enterprise-grade capabilities to leading organizations worldwide.
Training a model with end-to-end transparency
Solar Open is a 102B-parameter large language model (LLM) released under an open license. It was developed as part of the Sovereign AI Foundation Model project, an initiative of South Korea’s Ministry of Science and ICT to create a homegrown, sovereign LLM that rivals the performance of top U.S. and Chinese models.
Upstage was the only startup selected for the project, which gives participants access to thousands of government-funded GPUs — enabling large-scale LLM development that would otherwise have been financially prohibitive.
The scale of compute was only one dimension of the challenge. Public-sector projects require every stakeholder — from ministry officials to the public — to have complete transparency into data usage, training methodology, and reproducibility.
“We couldn’t just show results,” said Kyle Yi, Executive Director at Upstage. “We had to demonstrate how we got there. Full transparency on training runs, configurations, and decisions was non-negotiable.”
Full experiment lineage, from first experiment to models released into production
When claims emerged suggesting Solar Open was copied from a Chinese AI model, Upstage was able to respond with verifiable evidence to prove otherwise. Weights & Biases served as their trust-and-accountability platform, making that response factual and reproducible. Because every model experiment was tracked in W&B Models, Upstage could transparently demonstrate the model’s incremental evolution over time, the specific datasets and configurations used at every stage, and proof that each version was independently produced.
As the backbone connecting research, engineering, and production teams, W&B Models empowers Upstage to:
- Track and compare large-scale experiments.
- Manage dataset and model versioning.
- Link research results directly to production-ready models.
- Ensure that individual experiments become permanent organizational knowledge rather than relying on an individual’s memory.
Surpassing benchmarks with full trust
With W&B Models as the operational engine of Solar Open’s training process, Upstage delivered a model that exceeded its benchmarks and set new standards for open Korean LLMs:
- Surpassed comparable open-source models, including OpenAI’s gpt-oss-120b, on leading benchmarks, with 100% higher performance on major Korean benchmarks.
- Pre-trained on 19.7 trillion tokens, ensuring broad knowledge coverage and robust reasoning capabilities across diverse domains.
- Achieved superior performance across critical dimensions, including mathematics, instruction-following, and agentic tasks.
- Delivered deep knowledge with the inference speed and cost-efficiency of a much smaller model.
Moreover, W&B Models empowered Upstage to provide full transparency to government stakeholders. Every training run, configuration, and outcome was verifiable and time-stamped. This established a trust-and-accountability framework, making W&B Models a structural part of Upstage’s public-sector reporting process. As Kyle puts it:
“Weights & Biases became our organization’s memory. For a government-backed project, that kind of transparent, auditable record isn’t optional — it’s what makes the work credible.”