OpenAI has raised tens of billions of dollars to develop AI technologies that are changing the world.
But there’s one glaring problem: it’s still struggling to understand how its tech actually works.
During last week’s International Telecommunication Union AI for Good Global Summit in Geneva, Switzerland, OpenAI CEO Sam Altman was stumped after being asked how his company’s large language models (LLMs) really function under the hood.
“We certainly have not solved interpretability,” he said, as quoted by the Observer, essentially saying the company has yet to figure out how to trace its AI models’ often bizarre and inaccurate output back to the decisions they made to arrive at those answers.
When pushed during the event by The Atlantic CEO Nicholas Thompson, who asked if that shouldn’t be an “argument to not keep releasing new, more powerful models,” Altman was seemingly baffled, countering with a half-hearted reassurance that the AIs are “generally considered safe and robust.”
Altman’s unsatisfying answer highlights a real problem in the emerging AI space. Researchers have long struggled to explain the freewheeling “thinking” that goes on behind the scenes, with AI chatbots almost magically and effortlessly reacting to any query that’s being thrown at them (lies and gaslighting aside).
But try as they might, tracing the output back to the original material the AI was trained on has proved extremely difficult. OpenAI, despite the company’s own name and origin story, has also kept the data it trains its AIs on extremely close to its chest.
A panel of 75 experts recently concluded in a landmark scientific report commissioned by the UK government that AI developers “understand little about how their systems operate” and that scientific knowledge is “very limited.”
“Model explanation and interpretability techniques can improve researchers’ and developers’ understanding of how general-purpose AI systems operate, but this research is nascent,” the report reads.
Other AI companies are trying to find new ways to “open the black box” by mapping the artificial neurons of their algorithms. For instance, OpenAI competitor Anthropic recently took a detailed look at the inner workings of one of its latest LLMs called Claude Sonnet as a first step.
“Anthropic has made a significant investment in interpretability research since the company’s founding, because we believe that understanding models deeply will help us make them safer,” reads a recent blog post.
“But the work has really just begun,” the company admitted. “The features we found represent a small subset of all the concepts learned by the model during training, and finding a full set of features using our current techniques would be cost-prohibitive.”
“Understanding the representations the model uses doesn’t tell us how it uses them; even though we have the features, we still need to find the circuits they are involved in,” Anthropic wrote. “And we need to show that the safety-relevant features we have begun to find can actually be used to improve safety.”
AI interpretability is an especially pertinent topic, given the heated debate surrounding AI safety and the risks of having an artificial general intelligence go rogue, which to some experts represents an extinction-level danger for humanity.
Altman himself recently dissolved the company’s entire so-called “Superalignment” team, which was dedicated to finding ways to “steer and control AI systems much smarter than us” — only to anoint himself as the leader of a replacement “safety and security committee.”
Given the embattled CEO’s latest comments, the company has a long way to go before it’d be able to rein in any superintelligent AI.
Of course, it’s in Altman’s best financial interest to keep reassuring investors that the company is dedicated to safety and security — despite having no clue how its core products actually work.
“It does seem to me that the more we can understand what’s happening in these models, the better,” he said during last week’s conference. “I think that can be part of this cohesive package to how we can make and verify safety claims.”