OpenAI and its competitors are exploring innovative approaches to advance artificial intelligence as existing techniques reach their constraints.
Artificial intelligence companies such as OpenAI are working to overcome unexpected delays and setbacks in the race to build ever-larger language models by developing training techniques that let algorithms "think" in more human-like ways.
A group of AI experts, researchers, and investors shared with Reuters their belief that these new techniques, which underpin OpenAI's recently launched o1 model, could significantly alter the competitive landscape of AI and affect the types of resources that AI companies relentlessly pursue, such as energy and specific chip types.
OpenAI declined to comment. Since the launch of the wildly popular ChatGPT chatbot two years ago, technology companies whose valuations have soared on the AI boom have publicly maintained that scaling up existing models with more data and computing power will reliably produce better AI.
However, several leading AI researchers are now voicing concerns regarding the limitations of the "bigger is better" mindset.
Ilya Sutskever, co-founder of Safe Superintelligence (SSI) and OpenAI, recently informed Reuters that the benefits of scaling up pre-training—the stage where an AI model is trained on a large volume of unlabeled data to grasp language patterns—have reached a plateau.
Sutskever is recognized as a pioneer in advocating for significant advancements in generative AI through the extensive use of data and computational resources during pre-training, which ultimately led to the creation of ChatGPT. He departed from OpenAI earlier this year to establish SSI.
“The 2010s were characterized by scaling; now we are returning to an era of exploration and innovation. Everyone is searching for the next breakthrough,” Sutskever remarked. “It is now more crucial than ever to scale the right elements.”
Sutskever refrained from providing further specifics on how his team is tackling this challenge, only mentioning that SSI is pursuing a different strategy for enhancing pre-training.
Researchers at leading AI laboratories are encountering setbacks and unsatisfactory results in their efforts to develop a large language model that surpasses OpenAI’s nearly two-year-old GPT-4, as reported by three sources with insider knowledge.
Training runs for these large models can cost tens of millions of dollars, since they require hundreds of chips running simultaneously. The complexity of such systems makes hardware failures more likely, and researchers may not know how a model will ultimately perform until the end of the run, which can take months.
Additionally, large language models require vast amounts of data, and the readily available data sources have been largely depleted. Energy shortages have further complicated the training processes, as they demand significant power resources.
To address these obstacles, researchers are investigating a technique known as “test-time compute,” which improves the performance of existing AI models during the “inference” phase, when the model is actively utilized. For instance, rather than selecting a single answer right away, a model could generate and assess multiple options in real-time, ultimately identifying the most effective solution.
This approach enables models to allocate more computational resources to difficult tasks, such as mathematical calculations, coding challenges, or intricate operations that require reasoning and decision-making akin to human capabilities.
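The generate-and-select idea behind test-time compute can be sketched as a simple best-of-n loop. The snippet below is a minimal illustration, not any lab's actual system: `generate_candidates` stands in for sampling several answers from a model at nonzero temperature, and `score` stands in for a learned verifier. Both are hypothetical toy functions built around a single arithmetic prompt.

```python
def generate_candidates(prompt: str, n: int) -> list[str]:
    # Stand-in for sampling n diverse answers from a language model.
    # Here: a fixed pool of plausible answers to the prompt "17 * 24".
    pool = ["398", "408", "418", "428", "388"]
    return pool[:n]

def score(prompt: str, candidate: str) -> float:
    # Stand-in for a verifier or reward model that rates each candidate.
    # For this arithmetic toy we can check correctness directly;
    # real verifiers are themselves learned models.
    a, _, b = prompt.partition("*")
    return 1.0 if int(candidate) == int(a) * int(b) else 0.0

def best_of_n(prompt: str, n: int = 5) -> str:
    # Spending more inference-time compute (a larger n) raises the chance
    # that at least one candidate is correct and gets selected.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("17 * 24"))  # -> 408
```

The key trade-off is that quality now scales with how much computation is spent at inference time (the number of candidates generated and scored), rather than only with model size or training duration.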
Noam Brown, a researcher at OpenAI involved in the o1 project, noted at the TED AI conference in San Francisco last month that allowing a bot to think for just 20 seconds during a poker hand yielded performance improvements comparable to scaling up the model by 100,000 times and extending the training duration by the same factor.
OpenAI has adopted this approach in its recently released o1 model, previously known as Q* and Strawberry, as Reuters reported in July. The o1 model works through problems in a multi-step manner, similar to human reasoning, and also draws on data and feedback curated from PhD holders and industry experts. What sets the o1 series apart is an additional layer of training applied on top of base models such as GPT-4, and the company plans to apply the technique to larger and more advanced base models.
Simultaneously, researchers from leading AI organizations, including Anthropic, xAI, and Google DeepMind, are also engaged in developing their own iterations of this technique, according to sources familiar with the initiatives.
Kevin Weil, OpenAI's chief product officer, remarked at a tech conference in October, “We see a lot of low-hanging fruit that we can go pluck to make these models better very quickly. By the time people do catch up, we're going to try and be three more steps ahead.”
Requests for comments from Google and xAI went unanswered, and Anthropic did not provide an immediate response.
This development could significantly impact the competitive dynamics of AI hardware, which has been largely influenced by the soaring demand for Nvidia’s AI chips. Notable venture capital firms, including Sequoia and Andreessen Horowitz, which have invested billions in the costly development of AI models across various labs, are closely monitoring this shift and its potential effects on their investments.
Sonya Huang, a partner at Sequoia Capital, stated to Reuters, “This shift will move us from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference.”
Nvidia’s AI chips, widely regarded as the most advanced on the market, helped make the company the world’s most valuable, surpassing Apple in October. While Nvidia dominates the market for training chips, it could face more competition in the inference market.
In response to inquiries regarding the potential effects on product demand, Nvidia highlighted recent presentations that emphasize the significance of the methodology underlying the o1 model. CEO Jensen Huang has noted a growing demand for the application of its chips in inference tasks.
"Recently, we have identified a second scaling law, which pertains specifically to inference... These elements have contributed to an exceptionally high demand for Blackwell," Huang stated last month during a conference in India, referring to the company's newest AI chip.