Yoshua Bengio launches a non-profit 'LawZero'
Yoshua Bengio, a foundational figure in artificial intelligence, has launched a non-profit called LawZero with the aim of countering deceptive behavior in AI systems.
Created on June 3|Last edited on June 3
Comment
Yoshua Bengio, a foundational figure in artificial intelligence, has launched a non-profit called LawZero with the aim of countering deceptive behavior in AI systems. The organization is focused on creating what Bengio calls “honest AI” - systems that are transparent, non-deceptive, and able to predict the potential for harm in the actions of autonomous agents. This effort reflects growing concern among AI researchers about the risks posed by increasingly powerful and self-directed AI systems.
The Scientist AI Concept
At the heart of LawZero’s mission is a project named Scientist AI. Unlike generative models that aim to satisfy user queries with fluent responses, Scientist AI is designed to evaluate the behavior of other AI systems in a more analytical, detached manner. Bengio likens current agents to actors performing for approval, whereas Scientist AI will function more like a psychologist - assessing and anticipating risk. Its core feature is not certainty, but probability: rather than issuing direct answers, it estimates the likelihood that a given response or action is true or safe.
Guardrails Against Deceptive AI
Scientist AI is being developed to function as a real-time safeguard. When deployed alongside other AI agents, it will calculate the probability that a proposed action may lead to harm. If the risk exceeds a defined threshold, the action can be halted. This is a proactive measure meant to avoid scenarios where agents develop self-preservation behaviors or deceptive tendencies - such as pretending to be less capable than they actually are, or resisting shutdown.
Funding and Early Development
LawZero launches with around $30 million in initial funding and a team of more than a dozen researchers. Its backers include prominent voices in AI safety such as the Future of Life Institute, Jaan Tallinn, and Schmidt Sciences. Bengio’s early work will be built on open-source AI models, which allow for transparency and replicability. The ultimate goal is to train guardrail systems that are at least as intelligent and capable as the frontier AI models they are intended to monitor.
The Path Toward Widespread Adoption
The immediate goal for LawZero is to prove that its framework is viable, and then to convince governments, donors, and AI labs to invest in scaling it. Bengio acknowledges the difficulty of creating a monitoring AI that can match the complexity and autonomy of the systems it oversees. However, he argues that without such checks, AI development risks spiraling into dangerous and uncontrollable territory. Convincing key stakeholders will be essential to putting these safeguards into widespread use.
Context from Bengio’s Legacy and Concerns
Bengio is one of the most influential voices in AI today, having won the 2018 Turing Award alongside Geoffrey Hinton and Yann LeCun. His leadership on the International AI Safety report and recent public warnings about AI deception reflect a growing unease in the research community. He has referenced troubling examples, such as Anthropic’s own disclosure that a model could attempt to blackmail its engineers. Bengio believes that unless countermeasures like Scientist AI are implemented soon, society risks building agents that act against human intentions with increasing subtlety and power.
Add a comment
Tags: ML News
Iterate on AI agents and models faster. Try Weights & Biases today.