New “GIM” method tops MIB global leaderboard, offering the industry a scalable way to inspect and improve billion-parameter models.
COPENHAGEN, Denmark and NEW YORK, Dec. 19, 2025 /PRNewswire/ — Corti, a healthcare AI infrastructure company, today announced a major advance in AI transparency, achieving the #1 position on the Hugging Face Mechanistic Interpretability Benchmark (MIB).
The benchmark, maintained by researchers from MIT, Stanford, Cambridge, Boston, and ETH Zurich, is the industry standard for evaluating interpretability methods. By outperforming established approaches from Meta, DeepMind-affiliated researchers, and Harvard, Corti’s new Gradient Interaction Modification (GIM) technique proves that specialized labs can lead in fundamental AI research – the kind that shapes how the entire industry builds, understands, and deploys intelligent systems.
The “surgical” shift
As models grow toward AGI-level complexity, the industry is recognizing that comprehension must keep pace – we cannot control what we cannot see. The focus is shifting from simply scaling parameters to understanding the mechanisms behind them.
“Until now, much of AI development has been empirical rather than mechanistic – we train models and observe results, but struggle to understand why they work,” said Lars Maaløe, Co-Founder and CTO at Corti. “GIM reveals the internal logic that standard tools miss, turning model improvement from trial-and-error into precision engineering.”
The technical breakthrough
Traditional interpretability methods test neurons in isolation – like testing light switches one at a time. However, if the light only turns off when both switches are flipped together, testing them individually falsely suggests that neither switch matters.
GIM solves this by analyzing how components interact when changed simultaneously, revealing the actual circuits driving behavior rather than backup mechanisms. This enables researchers to:
- Pinpoint Causality: Identify the specific circuits responsible for outputs, not just correlated signals.
- Debug Surgically: Trace failures like hallucinations to their root cause.
- Scale Efficiently: Run deep analysis on production models in seconds – critical for meeting regulatory requirements like the EU AI Act (Article 13).
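The light-switch problem above can be sketched in a few lines of Python. This is a hypothetical toy model, not Corti’s actual GIM implementation: two redundant components back each other up, so ablating either one alone shows zero effect, and only a joint ablation reveals the circuit driving the output.

```python
# Toy illustration of why isolated ablations miss redundant circuits.
# (Hypothetical components and scores; not the real GIM method.)
from itertools import combinations

def model(components):
    # Output is "on" if at least one of the two redundant
    # components "a" or "b" is active (an OR-style backup circuit).
    return 1.0 if components.get("a") or components.get("b") else 0.0

baseline = {"a": True, "b": True, "c": False}

def ablation_effect(names):
    # Change in model output when the named components are
    # deactivated, relative to the intact baseline.
    patched = dict(baseline, **{n: False for n in names})
    return model(baseline) - model(patched)

# Single-component ablations: every component scores zero effect.
single = {n: ablation_effect([n]) for n in ["a", "b", "c"]}
print(single)  # {'a': 0.0, 'b': 0.0, 'c': 0.0}

# Joint ablations: only the interacting pair {a, b} matters.
pairs = {p: ablation_effect(p) for p in combinations(["a", "b", "c"], 2)}
print(pairs)  # ('a', 'b') -> 1.0; the other pairs -> 0.0
```

Per-component scores suggest nothing matters, while the joint perturbation exposes the backup pair – the kind of interaction a simultaneous-change analysis is designed to surface.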
Higher standards for the highest stakes
Healthcare AI demands higher explainability standards than consumer applications. When models influence clinical decisions, institutions need forensic-level precision to trace reasoning and validate safety – not just aggregate accuracy.
“In healthcare, we don’t have the luxury of ‘good enough’ – we need to know exactly why a model made a specific decision,” said Andreas Cleve, Co-Founder and CEO of Corti. “High-stakes verticals require surgical precision. We built GIM because we needed it for our own clinical infrastructure, but the result is a tool that accelerates progress for the entire field.”
This addresses a problem that’s becoming universal. As transparency requirements expand across finance, autonomous systems, and government services, the methods necessary for clinical deployment today are becoming table stakes for all regulated AI tomorrow.
Corti has released GIM as open source, making the method immediately available to researchers, developers, and organizations worldwide. Developers can access the implementation as a Python package.
About Corti
Corti is a research and development company that specializes in state-of-the-art AI foundation models and infrastructure for healthcare. Its mission is to eliminate administrative hurdles in healthcare and life sciences and to bring expert-level reasoning to every corner of the globe – driving down costs and improving quality of care through purpose-built AI models you can trust.
Corti’s models integrate seamlessly into any healthcare application through Corti’s SDKs and APIs, enabling vendors, providers, and payers to leverage safe, cutting-edge AI across the entire care journey.
CONTACT: press@corti.ai