
BigCode: Hugging Face And ServiceNow Partner To Develop LLM For Code

A new large language model for code is on its way thanks to BigCode, a new project from Hugging Face and ServiceNow inspired by BigScience.
Created on September 27 | Last edited on September 27
Following in the footsteps of BigScience and BLOOM comes BigCode, a project jointly supported by Hugging Face and ServiceNow to develop an open-source large language model built entirely for code. A call for collaboration has also gone out to interested researchers; you can find out more about it here.

Code-generating machine learning models have been on people's minds lately, with high-profile releases such as OpenAI's Codex, DeepMind's AlphaCode, and Amazon's CodeWhisperer. Many of them, however, have drawn controversy of their own, chiefly over how the code in their training datasets is harvested.
BigCode's core ideal, in line with its inspiration BigScience, is to build in an open and responsible way.
BigCode's first goal is to develop a dataset of code collected in the most ethically responsible way possible. Whereas other projects scrape GitHub for all of its code regardless of license, BigCode will only collect code from repositories with permissive licenses.
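BigCode has not published its collection pipeline, but as a rough illustration, here is a minimal sketch of how one might restrict collection to permissively licensed repositories using GitHub's public search API. The license allowlist and query parameters are illustrative assumptions, not BigCode's actual criteria.

```python
# Minimal sketch: finding GitHub repositories with permissive licenses.
# NOTE: the license allowlist and query below are illustrative assumptions,
# not BigCode's actual collection criteria.
import requests

# SPDX identifiers commonly considered permissive (assumption)
PERMISSIVE_LICENSES = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "unlicense"}


def permissively_licensed_repos(language: str, per_page: int = 20):
    """Yield (full_name, license) for repos in `language` with a permissive declared license."""
    for spdx in PERMISSIVE_LICENSES:
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": f"language:{language} license:{spdx}", "per_page": per_page},
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()  # unauthenticated requests are heavily rate-limited
        for repo in resp.json()["items"]:
            yield repo["full_name"], spdx


if __name__ == "__main__":
    for name, license_id in permissively_licensed_repos("python", per_page=5):
        print(f"{name}  ({license_id})")
```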
This ethically sourced code dataset will be made large enough to train large language models. With it, the project's next goal is to train a 15-billion-parameter language model, with NVIDIA's Megatron as a jumping-off point.

Find out more

Tags: ML News