Just like EXAONE-Deep but for FREE.

Most of our training runs are coming to an end, and by our final count we've used about 30,000 H100 hours for training and another 20,000 H100 hours for data generation. With such an enormous amount of compute poured in, we had better have something nice in the oven.
And luckily, we managed to build our own 8B reasoning model (KO-REAson-8B), which performs on par with EXAONE-Deep-7.8B.
Here are some of the final calls we made along the way.

To Pack or Not to Pack

Sequence packing has always been an attractive illusion: it saves a great deal of compute, but in our experiments it has consistently returned negative results. For the very last time, we ran the identical training once with and once without packing to measure the tradeoff.
Result. Despite being more than three times faster, packing consistently reduced accuracy, with the largest drops in General Knowledge and Reasoning on both benchmark suites. Accordingly, we proceeded without packing.
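For readers unfamiliar with the technique, here is a minimal sketch of what packing does; the greedy strategy and names are illustrative, not our exact pipeline.

```python
# Minimal sketch of greedy sequence packing (illustrative, not our pipeline).
# Tokenized examples are concatenated, separated by an EOS token, and emitted
# as fixed-length blocks so that no positions are wasted on padding.
from typing import Iterable, Iterator

def pack_sequences(
    examples: Iterable[list[int]], block_size: int, eos_id: int
) -> Iterator[list[int]]:
    buffer: list[int] = []
    for tokens in examples:
        buffer.extend(tokens + [eos_id])
        while len(buffer) >= block_size:
            yield buffer[:block_size]  # a block may span document boundaries
            buffer = buffer[block_size:]
    # the trailing partial block is dropped (or padded, depending on the recipe)

# Toy usage: three short examples packed into blocks of 8 tokens
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(list(pack_sequences(docs, block_size=8, eos_id=0)))
# -> [[1, 2, 3, 0, 4, 5, 0, 6]]
```

The speedup comes from those dense blocks; the accuracy drop plausibly comes from examples sharing a block and, unless the attention mask is reset at every EOS, attending across document boundaries.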


Selecting the Base Model

We primarily used Gemma and Kanana during development, but to decide on the final base model we trained five candidates on the final dataset and compared the results. The five bases are Gemma-3-4B, Llama-3.1-8B, Koni-Llama-3.1-8B, Kanana-1.5-8B, and A.X-3.1-Light.
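As a rough sketch, this bake-off amounts to the loop below; `run_sft` and `evaluate_suite` are hypothetical stand-ins for our training and evaluation pipelines, not released code.

```python
# Hedged sketch of the base-model bake-off: apply the identical SFT recipe to
# every candidate, score each checkpoint on the held-out suite, keep the best.
# run_sft and evaluate_suite are hypothetical stand-ins, not our actual code.
from typing import Callable

CANDIDATE_BASES = [
    "Gemma-3-4B",
    "Llama-3.1-8B",
    "Koni-Llama-3.1-8B",
    "Kanana-1.5-8B",
    "A.X-3.1-Light",
]

def select_base(
    dataset: object,
    run_sft: Callable[[str, object], object],
    evaluate_suite: Callable[[object], float],
) -> tuple[str, dict[str, float]]:
    scores = {
        base: evaluate_suite(run_sft(base, dataset))  # same recipe for every base
        for base in CANDIDATE_BASES
    }
    best = max(scores, key=scores.get)
    return best, scores
```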
Evaluation. We scored the models on five benchmarks across three categories: General (KMMLU-Hard, KMMLU-Pro), Korean-specific (KoBALT-700, CLIcK), and Reasoning (KSM).
Result. All five bases saw large gains from our recipe, but A.X-3.1-Light achieved the strongest overall performance. We therefore release our fine-tuned KO-REAson-8B based on A.X-3.1-Light. KO-REAson-8B scores an average of 44.56 on our held-out evaluation suite, on par with EXAONE-Deep-7.8B (44.40) and ahead of open models such as Nemotron-Nano-8B (NVIDIA) and R1-Distill-7B/8B (DeepSeek).
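For clarity, the headline number is (we assume) an unweighted mean over the five benchmark scores; a minimal sketch, with placeholder per-benchmark numbers:

```python
# Assumed aggregation: an unweighted mean over the five benchmark scores.
def suite_average(scores: dict[str, float]) -> float:
    expected = {"KMMLU-Hard", "KMMLU-Pro", "KoBALT-700", "CLIcK", "KSM"}
    assert set(scores) == expected, "one score per benchmark"
    return sum(scores.values()) / len(scores)

# Placeholder numbers only (not real per-benchmark results):
demo = {"KMMLU-Hard": 40.0, "KMMLU-Pro": 42.0,
        "KoBALT-700": 44.0, "CLIcK": 46.0, "KSM": 48.0}
print(suite_average(demo))  # 44.0
```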


Conclusion

Building strong reasoning models has long felt like a black box, with ongoing debate about whether reinforcement learning is required. In this blog series with KISTI and Oracle, we systematically explore datasets, training setups, and ablations, showing that a well-curated supervised fine-tuning (SFT) corpus can be powerful enough on its own.
Note 1. The model and full dataset are coming soon. Stay tuned.
Note 2. We are also planning to leverage RL to further sharpen the model and expand into multimodal reasoning. If you’re interested in joining the journey, contact spthsrbwls123@yonsei.ac.kr