Side & Sphere: Meta AI's Newest Open-Source AI Models For Assisting Wikipedia Citations
Meta AI's newest open-source release will automate the most tedious part of verifying citations on Wikipedia.
Created on July 11 | Last edited on July 12
Wikipedia's library of articles is so massive that keeping everything factual is an overwhelming job. To back up its claims, Wikipedia pages are filled with citations pointing to news articles and other sources that verify them.
However, you'll also undoubtedly find many instances of [citation needed], which mark claims that have yet to be verified.
Today, Meta AI announced another big release, and just like NLLB-200, this project is designed to assist Wikipedia editors.
The first new model, named Side, was built to quickly verify whether or not a citation actually supports a claim. Alongside it comes Sphere, a retrieval engine that is built to efficiently query a massive dataset of 134 million public web pages to find ones that might contain a valid citation for a claim. Together, they can help Wikipedia editors quickly verify and edit citations that are wrong or missing.
What is Meta AI's Side model?

Side is an open-source model that uses natural language processing to identify whether the citation attached to a claim actually supports it. Sometimes, citations on Wikipedia are incorrect even if they seem right at first glance, or might fool less advanced AI models. Side helps eliminate those faulty citations by flagging them en masse for Wikipedia editors.
Unlike similar models that are trained to verify citations against just a handful of sentences, Side was trained on entire web pages of content. This broader training scope proved fruitful: Side can judge whether a citation holds up much more accurately.
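To get a feel for what this kind of check looks like in practice, here's a minimal sketch of claim-versus-evidence verification using an off-the-shelf natural language inference model from Hugging Face. To be clear, this isn't Side's actual architecture or code, just an illustration of the entailment check a citation verifier performs:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Generic NLI model used here as a stand-in for Side, not Side itself.
MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

claim = "The Eiffel Tower was completed in 1889."
evidence = "Construction of the Eiffel Tower finished in March 1889, ahead of the World's Fair."

# Score the (evidence, claim) pair: does the evidence entail the claim?
inputs = tokenizer(evidence, claim, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1).squeeze()

# roberta-large-mnli labels: 0 = contradiction, 1 = neutral, 2 = entailment
label = model.config.id2label[int(probs.argmax())]
print(f"{label} (entailment probability = {probs[2].item():.2f})")
```
A high entailment score suggests the cited text really does back up the claim; a low one is the kind of case Side would flag for a human editor.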
What is Meta AI's Sphere retrieval engine?

The Sphere retrieval engine is open-source and consists of an AI retriever model that efficiently queries a massive dataset of 134 million public web pages. The model estimates how likely a page is to contain the information needed to verify a claim and passes a list of those pages to a Wikipedia editor, who can look through them for the best citation.
The dataset Sphere relies on is a subset of CCNet, a cleaned-up version of Common Crawl. Working within a fixed dataset like this allows for much more flexibility in model architecture than using live web search engines, which can only be queried through human-readable text strings.
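If you're curious what retrieval over a fixed corpus looks like, here's a toy sketch in the spirit of Sphere: embed passages once, index them, and find the ones most likely to verify a claim. The encoder, corpus, and index type here are stand-ins rather than Sphere's actual components:
```python
import faiss
from sentence_transformers import SentenceTransformer

# Generic sentence encoder used as a stand-in for Sphere's retriever.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Sphere indexes roughly 134 million web pages from a cleaned Common Crawl
# snapshot; here we use a toy corpus of three passages.
passages = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Barrier Reef is located off the coast of Queensland, Australia.",
]
passage_vecs = encoder.encode(passages, normalize_embeddings=True)

# Inner product on normalized vectors is cosine similarity.
index = faiss.IndexFlatIP(passage_vecs.shape[1])
index.add(passage_vecs)

claim = "The Great Barrier Reef lies off the Queensland coast."
query_vec = encoder.encode([claim], normalize_embeddings=True)
scores, ids = index.search(query_vec, 2)  # top-2 candidate passages

for score, i in zip(scores[0], ids[0]):
    print(f"{score:.3f}  {passages[i]}")
```
Because the corpus is embedded ahead of time, a query is just a nearest-neighbor lookup, which is exactly the kind of flexibility a fixed dataset buys you over a live search engine.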
How do Side and Sphere work together?
Side and Sphere work together with Wikipedia editors to find the best possible citation for a claim. Side first determines whether a claim's existing citation is accurate. If it isn't, the claim is fed into the Sphere retrieval engine, which returns a collection of the most promising candidate sources. After some internal scoring, a ranked list of suggested citations is handed to Wikipedia editors to verify themselves.
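Here's a rough sketch of that workflow end to end. The verify and retrieve helpers are hypothetical stand-ins for the two models (mirroring the description above, not Meta's actual API):
```python
from typing import Callable

def suggest_citations(
    claim: str,
    current_citation: str,
    verify: Callable[[str, str], float],        # entailment score for (evidence, claim)
    retrieve: Callable[[str, int], list[str]],  # top-k candidate passages for a claim
    threshold: float = 0.8,
    k: int = 20,
) -> list[str]:
    # Step 1: Side-style check -- does the existing citation support the claim?
    if verify(current_citation, claim) >= threshold:
        return []  # citation looks fine, nothing to flag

    # Step 2: Sphere-style retrieval -- pull candidate passages for the claim.
    candidates = retrieve(claim, k)

    # Step 3: re-rank candidates by how strongly they support the claim and
    # hand the ranked list to a human editor for the final call.
    return sorted(candidates, key=lambda passage: verify(passage, claim), reverse=True)
```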

Thanks to Side and Sphere, the most tedious part of citation verification is handled automatically. Instead of manually searching the web for often very obscure claims, a Wikipedia editor can just sit back and let the AI do the heavy lifting. Though the editors will still have to manually review what the AI thinks is best, it's a much less overwhelming task than it was before.
The whole project is open-source and each piece is individually available in its own GitHub repository, but if you want to jump right into trying it out for yourself, Meta has set up an interactive demo web page where you can see how it works first-hand. The demo is available here: https://verifier.sideeditor.com/
Find out more