Hello, I’m Ylli Bajraktari, CEO of the Special Competitive Studies Project. In this week’s edition of 2-2-2, PJ Maykish, Abby Kukura, Nyah Stewart, and I discuss the three megatrends leading toward artificial general intelligence (AGI) - a topic many on our team have been tracking since our early days with the National Security Commission on Artificial Intelligence (NSCAI).
AI+ Summit Series
SCSP is excited about our upcoming AI+ Summit Series, a set of high-level events dedicated to enabling rapid advancements in artificial intelligence as it transforms our country and becomes a keystone of our national security.
The AI+ Energy Summit, the first in this series, will take place on September 26, 2024, in Washington, D.C. We recently announced Dr. Brian Spears, Dr. Steven Cowley, Dr. Philippe Larochelle, and Dr. Peter Battaglia as speakers - stay tuned to see who else will join us! The next event in this series is the AI+ Robotics Summit, which will take place on October 23, 2024; keep an eye out for more details soon! The AI+ Summit Series will culminate with our next AI+ Expo and the Ash Carter Exchange on June 2-4, 2025. We hope to see you there!
We are excited to introduce new video segments to our newsletters! Check out Ylli & PJ’s overview of today’s newsletter, AGI Will Arrive in Three Ways.
AGI Will Arrive in Three Ways
As artificial intelligence gets stronger, we are being forced to grapple with the mysteries of the human mind once more. The adult brain has approximately 86 billion neurons. Those neurons form trillions of connections in a single mind, a network far larger than the internet, which links only tens of billions of devices. This comparison underscores the brain's unparalleled complexity and processing power, which exceed even our most advanced tech creations. With this complexity comes a lack of consensus over how our minds actually work. Why do we dream, for example? Some think dreams help us process emotions or memories, while others believe they're a way for our brain to practice for real-life situations. The truth is, we're still deciphering the dream code. And what about memory? How can a smell or a song suddenly transport us back to a specific moment from childhood? While we know a lot about the brain structures involved in memory, the precise mechanisms by which memories are encoded, stored, and retrieved remain elusive. Just as we keep working to understand our own minds, it is essential to step back and imagine how a more powerful form of artificial intelligence (AI) will arrive in leaps, including over the next 12-18 months.
By examining three pivotal megatrends, we can imagine the rapid development and deployment of a markedly more powerful form of AI that will lead to artificial general intelligence or AGI. We can also imagine the possibility of these three megatrends collectively producing something greater.
First, generative AI models like GPT-4 will continue to improve. If you were impressed with GPT-4, imagine GPT-7 and its competitors. This phenomenon is generally referred to as the LLM “scaling” hypothesis: large generative AI models improve their performance by getting bigger, faster, and stronger (see chart below). This path is driven by the few global labs that have the funding, know-how, compute, and power to build these advanced models. The scaling vector of improved AI performance will likely continue unless, or until, scale “breaks,” meaning LLM progress halts or plateaus because of energy costs, the amount of data available, or the cost of building ever-larger models relative to the returns of spending more. More-general AI may be so useful that hyperscale companies find technological offsets to these scaling barriers. Sam Altman has stated that he sees no plateau in sight for the scaling vector. Dario Amodei has argued that it isn’t necessary to fixate on the specific day AGI will arrive; the simple phenomenon is that these models are getting better and better and may already be better than even the best humans at some things. Dario further notes that today’s models cost about $1B to train and that the next foundation models, as soon as 2025, will cost $10B to create. Yet even if scale does “break,” LLMs will plateau at some useful cost point and continue to serve as the user interface (UI) between people and AI systems-of-systems, because LLMs “speak our language.”
Note: Some leading-edge models, such as Gemini 1.5, lack publicly available FLOP data and are therefore missing from the graph.
Sources: Data on Notable AI Models, Epoch AI (last accessed 2024); Alan D. Thompson, GPT-5, Life Architect (last accessed 2024); MMLU Benchmark (Multi-task Language Understanding), Papers With Code (last accessed 2024); How Bad Will the AI Power Crunch Be?, Special Competitive Studies Project (2024); Sarah Chudleigh, Everything You Should Know About GPT-5, Botpress (2024); Norges Bank Investment Management, In Good Company Podcast: Dario Amodei - CEO of Anthropic, YouTube at 13 minutes (2024); Dylan Patel et al., Multi-Datacenter Training: OpenAI's Ambitious Plan To Beat Google's Infrastructure, SemiAnalysis (2024); Yih-Khai Wong, How Many Data Centers Are There and Where Are They Being Built?, ABI Research (2024); Cade Metz, Robots Learn, Chatbots Visualize: How 2024 Will Be AI's 'Leap Forward', New York Times (2024); Matthias Bastian, Sam Altman says GPT-5 could be a "Significant Leap Forward," But There's Still "A Lot of Work to Do", The Decoder (2024).
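To make the scaling hypothesis concrete, here is a minimal Python sketch of one published parametric form, the Chinchilla-style loss L(N, D) = E + A/N^α + B/D^β (Hoffmann et al., 2022), together with the common rule of thumb that training compute is roughly 6·N·D FLOPs. The constants are that paper's reported fits, used here only for illustration; they are not the data behind this newsletter's chart.

```python
# Chinchilla-style fitted constants (Hoffmann et al., 2022); illustrative only.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(params: float, tokens: float) -> float:
    """Predicted pre-training loss for a model with `params` parameters
    trained on `tokens` tokens, under the assumed power-law form."""
    return E + A / params**ALPHA + B / tokens**BETA


def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token."""
    return 6 * params * tokens


# Scale parameters and data together by 10x per step and watch loss fall,
# with diminishing returns, toward the irreducible term E.
for params, tokens in [(1e9, 2e10), (1e10, 2e11), (1e11, 2e12), (1e12, 2e13)]:
    print(f"N={params:.0e}  D={tokens:.0e}  "
          f"C={training_flops(params, tokens):.1e} FLOPs  "
          f"loss={predicted_loss(params, tokens):.3f}")
```

Each 10x step in parameters and data costs about 100x more compute yet buys a smaller loss reduction than the step before, which is one way to frame the debate over whether scale will “break.”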
Second, new approaches to specific AI functions will both improve and combine with each other to perform tasks that are more human-like, while simultaneously supporting the scaling megatrend by improving “effective compute.” A survey of several of these AI capabilities presents a simple thought experiment about what kind of AI humans will use when these vectors combine. If you combine today’s AI with capabilities akin to human functions such as better reasoning, planning, creativity, memory, and sensing, as well as the ability to orchestrate the inputs from over 700 specialized GPT models, what would you have? The answer is something much more general. The second megatrend has economic consequences, as it opens a dynamic space in which small and medium-sized companies can flourish.
As the global AI community continues to innovate daily, new paradigms and functions are constantly emerging. In this rapidly evolving field, these stand out as the most significant advancements today:
Multimodality. The ability to fuse data from different kinds of sensors and sources grants an AI system capabilities akin to the human senses. Consider how modern electric vehicles combine cameras, radar, lidar, and ultrasonic sensors into one advanced driver-assistance system. That is a modern example of multimodality, and when connected to an AI system, it is like giving an algorithm a synthetic equivalent of the human senses. Multimodality is central to AI systems: it bridges the cyber-physical divide, lets AI affect life beyond a computer screen, and spans subjects ranging from computer vision to digital smell. The sub-vectors of multimodality are vast, but they include this year’s “Segment Anything Model” (SAM), which can accurately identify and isolate any object or region of interest within an image or video, even without specific prior training on those particular objects or scenes.
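One simple way to fuse modalities is “late fusion”: each sensor’s model outputs its own class probabilities, and the system combines them. The sketch below assumes three hypothetical sensors with made-up probabilities and fuses them with a weighted sum of log-probabilities (a weighted geometric mean); real driver-assistance stacks typically use far more sophisticated, often learned, fusion.

```python
import math


def late_fusion(per_sensor_probs: dict[str, dict[str, float]],
                weights: dict[str, float]) -> dict[str, float]:
    """Combine per-sensor class probabilities with a weighted sum of
    log-probabilities, then renormalize into a probability distribution."""
    classes = next(iter(per_sensor_probs.values())).keys()
    scores = {c: sum(weights[s] * math.log(probs[c])
                     for s, probs in per_sensor_probs.items())
              for c in classes}
    # Softmax over the fused log-scores turns them back into probabilities.
    max_score = max(scores.values())
    exp_scores = {c: math.exp(v - max_score) for c, v in scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}


# Hypothetical per-sensor outputs for one object in front of the vehicle.
readings = {
    "camera": {"pedestrian": 0.6, "cyclist": 0.3, "nothing": 0.1},
    "radar":  {"pedestrian": 0.5, "cyclist": 0.4, "nothing": 0.1},
    "lidar":  {"pedestrian": 0.7, "cyclist": 0.2, "nothing": 0.1},
}
fused = late_fusion(readings, weights={"camera": 1.0, "radar": 0.5, "lidar": 1.0})
print(max(fused, key=fused.get), fused)
```

The per-sensor weights (here, trusting radar less) are an arbitrary illustrative choice; they let one modality compensate when another is degraded.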
Chain-of-thought (CoT) and reasoning. Reasoning refers to an AI system’s ability to use logic and inference to draw conclusions, make predictions, or solve problems based on available information. It involves understanding relationships, identifying patterns, and evaluating evidence in a manner similar to human thought processes. CoT is one part of this: an algorithm breaks a complex problem down into a series of smaller, more manageable steps. Just as humans often use intermediate thoughts and logical deductions to arrive at a conclusion, CoT prompting encourages AI models to generate intermediate reasoning steps before reaching a final answer.
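In practice, CoT is often elicited purely through the prompt. The sketch below builds a few-shot CoT prompt in which a worked example shows its intermediate steps, followed by the widely used zero-shot cue “Let’s think step by step.” The exemplar and the `build_cot_prompt` helper are hypothetical, and no particular model API is assumed.

```python
# A hypothetical worked exemplar whose answer spells out intermediate steps.
EXEMPLAR = (
    "Q: A library has 3 shelves with 12 books each and lends out 7 books. "
    "How many books remain?\n"
    "A: 3 shelves x 12 books = 36 books. 36 - 7 = 29. The answer is 29.\n"
)


def build_cot_prompt(question: str, few_shot: bool = True) -> str:
    """Return a prompt that nudges a model to reason step by step
    before giving its final answer."""
    prefix = EXEMPLAR + "\n" if few_shot else ""
    return f"{prefix}Q: {question}\nA: Let's think step by step."


print(build_cot_prompt("If a train travels 60 km/h for 2.5 hours, how far does it go?"))
```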
Planning/scaffolding/framing. Framing or scaffolding combines planning and reasoning into something akin to strategy, using a family of AI planning applications. As the engineer/investor Leopold Aschenbrenner puts it, “Think of CoT++: rather than just asking a model to solve a problem, have one model make a plan of attack, have another propose a bunch of possible solutions, have another critique it, and so on.” AI models are advancing toward human-like strategy and problem-solving, combining tools and algorithmic techniques to generate results greater than the sum of their parts.
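Aschenbrenner’s “CoT++” description can be sketched as a small pipeline of roles. In the toy below, each “model” is a hypothetical stand-in Python function (no LLM is called): a planner turns the goal into steps, a proposer drafts every candidate solution, and a critic scores them so the best one is kept. The item-bundling task is an illustrative assumption.

```python
from itertools import combinations


def planner(goal: dict) -> list[str]:
    """Hypothetical planner: turn a goal into an ordered plan of attack."""
    return [f"choose {goal['k']} items",
            f"keep total cost <= {goal['budget']}",
            "maximize total value"]


def proposer(items: dict[str, tuple[int, int]], k: int) -> list[tuple[str, ...]]:
    """Hypothetical proposer: enumerate every candidate bundle of k items."""
    return list(combinations(sorted(items), k))


def critic(bundle: tuple[str, ...], items: dict[str, tuple[int, int]],
           budget: int) -> float:
    """Hypothetical critic: reject over-budget bundles, otherwise score by value."""
    cost = sum(items[name][0] for name in bundle)
    value = sum(items[name][1] for name in bundle)
    return float("-inf") if cost > budget else float(value)


def solve(goal: dict, items: dict[str, tuple[int, int]]):
    plan = planner(goal)
    candidates = proposer(items, goal["k"])
    best = max(candidates, key=lambda b: critic(b, items, goal["budget"]))
    return plan, best


items = {"sensor": (3, 5), "drone": (6, 9), "radio": (2, 3), "laptop": (5, 8)}  # name: (cost, value)
plan, best = solve({"k": 2, "budget": 8}, items)
print(plan, best)
```

Real scaffolds usually loop, feeding the critic’s feedback back to the proposer; this single pass just shows the division of labor.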
Agentic AI. This is a class of AI systems designed to act as autonomous agents that can do general-purpose work, such as completing a task end-to-end using planning and software tool-calling skills. These systems can make decisions and interact with their environments without constant human intervention, and can even interact agent-to-agent (“multi-agent”) via the internet to learn from each other, cooperate, and even develop their own language. Agentic AI systems emphasize goal-oriented behavior and adaptive decision-making, often leveraging advanced algorithms and sensory inputs to act in real time and learn from continuous feedback. The bleeding edge of this subject includes automated design of agentic systems (ADAS), in which agent building blocks are combined automatically to design new agentic systems. AI agents have even been created to fill each role on a software development team in a functioning multi-agent framework. These agents collaborate using standardized operating procedures and a shared “memory” to complete complex tasks, producing outputs like project requirements, design documents, and even functional code.
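At its core, an agent alternates between deciding on an action and calling a tool, feeding each result back in until the task is done. The sketch below stubs the model with a hypothetical rule-based `policy` function so it runs without any LLM; the tool names, flight data, and task are illustrative assumptions.

```python
from typing import Callable

# Hypothetical tools the agent may call, keyed by name.
TOOLS: dict[str, Callable[..., object]] = {
    "search_flights": lambda city: [{"flight": "XY100", "price": 420},
                                    {"flight": "XY200", "price": 310}],
    "book": lambda flight: f"booked {flight}",
}


def policy(goal: str, history: list[tuple[str, object]]) -> tuple[str, dict]:
    """Stand-in for an LLM: choose the next tool call from what has happened so far."""
    if not history:
        return "search_flights", {"city": goal}
    last_tool, last_result = history[-1]
    if last_tool == "search_flights":
        cheapest = min(last_result, key=lambda f: f["price"])
        return "book", {"flight": cheapest["flight"]}
    return "finish", {}


def run_agent(goal: str, max_steps: int = 5) -> list[tuple[str, object]]:
    history: list[tuple[str, object]] = []
    for _ in range(max_steps):  # step cap keeps a misbehaving agent from looping forever
        tool, args = policy(goal, history)
        if tool == "finish":
            break
        history.append((tool, TOOLS[tool](**args)))
    return history


print(run_agent("Boston"))
```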
Specialization/Pruning/Quantization. Specialization refers to taking pre-trained AI models like Llama 2 and tailoring, or fine-tuning, them for a specific purpose. GitHub Copilot, an early example, is a code-completion app built on OpenAI’s general Codex model that works in various programming languages (think of the way word processing tools can complete a sentence for you). Specialization has emerged as a computer science phenomenon that amplifies the power of large open-source LLMs, because it lets others take expensive, highly trained LLMs and specialize them without incurring such high costs. Model pruning removes unnecessary or redundant parameters (weights) from a trained model to reduce its size and complexity, leading to faster inference and potentially lower memory requirements. Quantization reduces the precision of the model's numerical representations (weights and activations). This typically means converting 32-bit floating-point numbers (the standard in most training) to lower-precision formats such as 16-bit floating point or 8-bit integers.
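Both compression ideas reduce to simple arithmetic on a model’s weights. The pure-Python sketch below shows magnitude pruning (zeroing the smallest-magnitude weights) and symmetric 8-bit quantization (mapping floats to integers in [-127, 127] with one shared scale). The weight values are made up, and production toolkits add per-channel scales, calibration data, and fine-tuning that this omits.

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    n_prune = int(len(weights) * sparsity)
    # Indices of the n_prune smallest-magnitude weights.
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:n_prune]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [scale * v for v in q]


w = [0.82, -0.03, 0.41, -1.27, 0.005, 0.66, -0.19, 0.0]
sparse = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_int8(sparse)
print(sparse)
print(q, scale)
print(dequantize(q, scale))  # close to `sparse`, within about scale / 2 per weight
```

Storing each weight as one signed byte instead of a 32-bit float cuts memory for the weights by about 4x, at the cost of the small rounding error visible in the last line.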
Liquid networks. Otherwise known as liquid neural networks (LNNs), liquid networks are a compute-efficient type of neural network designed for enhanced adaptability and robustness in handling time-series data. Unlike traditional neural networks with fixed architectures, LNNs possess a "liquid" quality, allowing them to dynamically adjust their internal parameters and equations in response to new incoming data, even after the initial training phase is complete.
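The “liquid” behavior comes from making a neuron’s effective time constant depend on its input. Below is a minimal sketch of a single liquid time-constant (LTC) neuron using the form described by Hasani et al., dx/dt = -[1/τ + f]·x + f·A, integrated with a simple Euler step. For simplicity, f is assumed here to be a sigmoid of the input alone, and τ, A, the weights, and the input signal are arbitrary illustrative values.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ltc_step(x: float, inp: float, dt: float, tau: float = 1.0, a: float = 1.0,
             w: float = 2.0, b: float = -1.0) -> float:
    """One Euler step of dx/dt = -(1/tau + f) * x + f * a, with f = sigmoid(w*inp + b).
    Because f depends on the input, the effective time constant 1/(1/tau + f)
    shrinks or grows as the incoming data changes."""
    f = sigmoid(w * inp + b)
    dxdt = -(1.0 / tau + f) * x + f * a
    return x + dt * dxdt


x, trace = 0.0, []
signal = [0.0] * 20 + [3.0] * 20   # a quiet period, then a strong input
for inp in signal:
    x = ltc_step(x, inp, dt=0.1)
    trace.append(x)
print(f"state after quiet period: {trace[19]:.3f}, after strong input: {trace[-1]:.3f}")
```

The state stays bounded below A and settles faster under strong input, which is the adaptive, input-dependent dynamics the paragraph describes.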
Text-to-action; embedded AI, cyber-physical advances. Since 2000, AI has been used to optimize industrial processes and help create zero-downtime systems. Now text-to-action puts that technology in the hands of individuals on the Internet of Things. In AI studies, "text to action" refers to an AI system interpreting written or spoken language, translating that understanding into specific, executable actions, and then carrying them out: told to "set a timer for 10 minutes," it sets the timer. These capabilities will grow to take on more complex actions. Beyond setting timers, they could be used to create autonomous cyberattack systems that execute tailored-access operations like “turning off” the power in Washington, D.C. Additionally, AI systems functioning in standalone machines (embedded AI) are accelerating robotics and manufacturing, independent trends worth following.
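The timer example can be made concrete: parse the command into a structured action, then execute it. The sketch below uses a hypothetical regular-expression grammar for a single intent; real assistants typically use learned language models for the parsing step, but the output, a machine-executable action, is the same idea. The `Action` type and `set_timer` name are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    name: str
    seconds: int


UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600}
TIMER_PATTERN = re.compile(r"set a timer for (\d+) (second|minute|hour)s?", re.IGNORECASE)


def text_to_action(command: str) -> Optional[Action]:
    """Translate a natural-language command into an executable Action, or None."""
    match = TIMER_PATTERN.search(command)
    if not match:
        return None
    amount, unit = int(match.group(1)), match.group(2).lower()
    return Action(name="set_timer", seconds=amount * UNIT_SECONDS[unit])


print(text_to_action("Please set a timer for 10 minutes"))
# Action(name='set_timer', seconds=600)
```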
Mixture of Experts (MOE) and Composition of Experts (COE). Specialization of generative AI models takes on a whole new meaning if hundreds of them can be coordinated to act as one in a mixture of experts. This AI front has two main paths. The first path is MOE. MOE uses a gating network (or router) that dynamically selects which expert models to activate for a given input, based on its characteristics. The second is COE. In a COE, multiple smaller expert models are chained together to form a larger, more powerful model. Each expert processes the output of the previous one, allowing them to specialize in different levels of abstraction or aspects of the problem.
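The MOE routing step can be sketched in a few lines: a gate scores every expert for an input, only the top-k experts run, and their outputs are blended using the gate weights renormalized over those k. The four experts and the gate weights below are hypothetical toy functions; in a real MOE layer both are learned neural networks.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Toy experts: each maps an input vector to a single number.
EXPERTS = [
    lambda x: sum(x),          # expert 0
    lambda x: max(x),          # expert 1
    lambda x: x[0] * x[1],     # expert 2
    lambda x: -sum(x),         # expert 3
]
# Toy gate: one weight vector per expert (learned in practice).
GATE_WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]]


def moe_forward(x: list[float], k: int = 2) -> tuple[float, list[int]]:
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in GATE_WEIGHTS]
    top_k = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate over only the selected experts; the rest never run.
    gates = softmax([logits[i] for i in top_k])
    output = sum(g * EXPERTS[i](x) for g, i in zip(gates, top_k))
    return output, top_k


out, chosen = moe_forward([2.0, 0.5])
print(chosen, round(out, 3))
```

Only 2 of the 4 experts execute per input, which is why MOE models can grow total parameters without a matching growth in compute per token.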
Memory/context strings. A context string, or context window, is the maximum number of tokens (words or subwords) that a model can consider at once when processing a prompt or generating a response. It is analogous to the AI system's "short-term memory," its ability to retain and use information from the immediate conversation or document. As Aschenbrenner notes, it is extraordinary that “Gemini 1.5 Pro, with its 1M+ token context, was even able to learn a new language… from scratch.”
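A fixed context window means anything older than the last N tokens simply falls out of view. The sketch below assumes whitespace splitting as a stand-in for a real subword tokenizer and an arbitrary toy limit of 8 tokens (Gemini 1.5 Pro’s window, by contrast, exceeds a million); `fit_to_context` is a hypothetical helper.

```python
def fit_to_context(history: list[str], new_prompt: str, max_tokens: int) -> list[str]:
    """Keep the most recent tokens that fit in the window; older ones are 'forgotten'.
    Whitespace splitting stands in for a real subword tokenizer."""
    tokens = [t for turn in history + [new_prompt] for t in turn.split()]
    return tokens[-max_tokens:] if max_tokens > 0 else []


history = ["My name is Ada.", "I work on compilers."]
window = fit_to_context(history, "What is my name?", max_tokens=8)
print(window)
```

With only 8 tokens of “short-term memory,” the name from the first turn has already dropped out of the window, so the model could not answer the question; a larger window keeps it.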
Inference engines. Inference engines in AI are like the brains behind the operation, taking in information and using it to make decisions or predictions. They take the knowledge stored within a trained AI model and apply it to new, unseen data, allowing the AI to understand and respond to real-world situations. This is crucial for a wide range of AI applications, from self-driving cars making split-second decisions based on sensor data, to chatbots understanding and responding to human language. In essence, inference engines are what enable AI systems to move from theoretical models to practical, real-world problem-solvers.
Co-piloting and program synthesis. AI systems working on computer code have followed a basic three-step progression. First, they helped humans complete code (co-piloting); then they refined and tested code (fine-tuning); and now they autonomously generate their own code to solve problems posed by a human (program synthesis). In 2017, two researchers described program synthesis as the holy grail for AI: "The grand dream of program synthesis is to make programmers obsolete. The holy grail is to be able to simply state one’s intent in some natural form, and have the computer automatically synthesize an efficient program that meets that intent." Seven years later, this front is advancing, and some freely available models will pause to build code for whatever task is in your prompt.
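The classic form of program synthesis is a search for a program that satisfies a specification. The sketch below uses enumerative search over a tiny hypothetical language of integer operations, with intent stated as input-output examples (programming by example) rather than in natural language; modern LLM-based code generation works very differently, but the goal of turning stated intent into a working program is the same.

```python
from itertools import product

# A tiny hypothetical DSL: programs are sequences of these primitive operations.
PRIMITIVES = {
    "add1": lambda v: v + 1,
    "double": lambda v: v * 2,
    "square": lambda v: v * v,
    "negate": lambda v: -v,
}


def run(program: tuple[str, ...], value: int) -> int:
    for op in program:
        value = PRIMITIVES[op](value)
    return value


def synthesize(examples: list[tuple[int, int]], max_len: int = 3):
    """Enumerate programs from shortest to longest; return the first one
    consistent with every input-output example (the user's stated intent)."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None


# Intent expressed only as examples: f(1) = 4, f(2) = 6, f(5) = 12.
print(synthesize([(1, 4), (2, 6), (5, 12)]))
```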
Grounded search. This technique aims to reduce hallucinations (incorrect or nonsensical outputs) by anchoring a model's responses in reliable external knowledge sources such as the internet. Grounded search is one way for an AI system, which cannot experience the world itself, to have an external reference for accuracy after it has been trained. Relatedly, Retrieval-Augmented Generation (RAG) augments an LLM with external knowledge sources like databases or private documents, enabling the model to access up-to-date information and generate more factual and contextually relevant responses.
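The core RAG loop is: retrieve the passages most relevant to a question, then place them in the prompt so the model answers from them. The sketch below reuses event facts from this newsletter as a toy knowledge base and scores relevance with word-count cosine similarity; real systems typically use learned embeddings and a vector index, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter

DOCUMENTS = [  # toy knowledge base built from this newsletter's announcements
    "The AI+ Energy Summit takes place on September 26, 2024 in Washington, D.C.",
    "The AI+ Robotics Summit takes place on October 23, 2024.",
    "The AI+ Expo and Ash Carter Exchange run June 2-4, 2025.",
]


def vectorize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9+]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q = vectorize(query)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]


def augmented_prompt(query: str, docs: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


print(augmented_prompt("When is the robotics summit?", DOCUMENTS))
```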
RLHF and Auto-HF. Reinforcement learning from human feedback (RLHF) builds on the kind of signal you give when you “thumbs up/thumbs down” an LLM response to your question: feedback telling the machine whether it succeeded or not. Automatic human feedback (Auto-HF) is the ability of an AI model to obtain the human feedback it needs autonomously, for example by automatically characterizing and learning from human reference behavior in videos on the internet. Auto-HF lets a trained model predict or approximate human feedback on its generated outputs without requiring direct input from human evaluators for each instance. The “Constitutional AI” technique is an example of auto-HF: the core idea is to use AI feedback itself, rather than relying solely on human labels, to keep the AI’s behavior in line with a set of values, or a constitution, determined by the model developers. The constitution is a written list of principles rather than a model; an AI judge applies it to compare outputs, and those AI-generated comparisons, alongside human preference data, are used to train the reward (preference) model.
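The reward-model step shared by RLHF and AI-feedback methods is commonly trained with a pairwise (Bradley-Terry) loss: for each comparison, the preferred response should score above the rejected one, and the loss is -log σ(r_chosen - r_rejected). The sketch below fits a hypothetical linear reward over two hand-made features with plain gradient descent; real reward models are fine-tuned LLMs, and the preference labels could come from humans or from an AI judge applying a constitution.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Each response is summarized by two hypothetical features: (helpfulness, toxicity).
# Each pair is (chosen, rejected), as labeled by a human or an AI judge.
PREFERENCES = [
    ((0.9, 0.1), (0.4, 0.2)),
    ((0.7, 0.0), (0.8, 0.9)),
    ((0.6, 0.1), (0.2, 0.1)),
    ((0.5, 0.0), (0.9, 0.7)),
]


def reward(w: list[float], features: tuple[float, float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, features))


def train_reward_model(pairs, lr: float = 0.5, epochs: int = 500) -> list[float]:
    w = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of -log(sigmoid(margin)) with respect to the margin.
            coeff = -(1.0 - sigmoid(margin))
            for j in range(len(w)):
                w[j] -= lr * coeff * (chosen[j] - rejected[j])
    return w


w = train_reward_model(PREFERENCES)
print(w)  # helpfulness weight ends positive, toxicity weight negative
```

Once trained, such a reward model stands in for the human (or AI) judge, scoring new outputs during the reinforcement learning stage.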
Compositionality and creativity. When an image generator creates artwork from a human’s words, or an LLM generates a decent new poem, these are early signs of human-like compositionality in AI. AI creativity drew wide attention in 2016, when AlphaGo's unexpected move 37 in its match against Lee Sedol shocked the Go world and went on to change how humans play the game. Compositionality and creativity are key challenges in AI research on the path toward AGI, because these qualities enable systems to go beyond simple pattern recognition, demonstrate a deeper understanding of the underlying structure and meaning of their inputs, and co-create with us.
Learning with less/one-shot/zero-shot learning. In a nutshell, machine learning with less data is about being “data efficient” with your training resources. One-shot learning is about learning quickly from minimal examples, like an AI model seeing one picture of an animal and then recognizing that animal as a human would. Zero-shot learning is about generalizing knowledge to completely new situations, such as categories the model has never seen labeled examples of.
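One common recipe for one-shot classification is nearest-prototype matching: embed the single labeled example per class, then give a new input the label of the closest example. In the sketch below, the feature vectors are made-up stand-ins for what a pretrained encoder might produce; in practice that encoder, trained on lots of other data, does most of the work, and the one example per class just anchors the label.

```python
import math

# One labeled example ("shot") per class, as hypothetical feature vectors that a
# pretrained encoder might produce: (size, stripes, ear_length).
SUPPORT = {
    "zebra": (0.8, 0.9, 0.3),
    "rabbit": (0.2, 0.0, 0.9),
    "cat": (0.3, 0.4, 0.4),
}


def classify_one_shot(query: tuple[float, ...]) -> str:
    """Assign the label of the closest single example (Euclidean distance)."""
    return min(SUPPORT, key=lambda label: math.dist(query, SUPPORT[label]))


print(classify_one_shot((0.75, 0.8, 0.35)))  # a new, unlabeled animal
```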
Reflexion. Reflexion lets an AI agent critique its own attempts in natural language and carry those reflections forward to improve its next try. A closely related technique has a model generate multiple diverse reasoning chains, or "thoughts," before arriving at a final answer; these intermediate thoughts are then evaluated, and the likeliest correct one is selected as the output. Both mimic the human process of reflection, in which multiple possibilities are considered before making a decision. It is easy to imagine how this computer science front could combine with AI agents to make them more capable of general applications.
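The selection step of sampling several reasoning chains and keeping the answer most of them agree on resembles the technique usually called self-consistency. The minimal sketch below uses hard-coded hypothetical chains in place of real model samples and assumes each chain ends with “The answer is X.”

```python
from collections import Counter


def final_answer(chain: str) -> str:
    """Extract the final answer, assuming each chain ends with 'The answer is X.'"""
    return chain.rsplit("answer is", 1)[-1].strip(" .")


def self_consistent_answer(chains: list[str]) -> str:
    """Majority vote over the final answers of independently sampled chains."""
    votes = Counter(final_answer(c) for c in chains)
    return votes.most_common(1)[0][0]


chains = [  # hypothetical reasoning chains sampled from a model for one question
    "3 x 12 = 36, 36 - 7 = 29. The answer is 29.",
    "12 + 12 + 12 = 36, minus 7 gives 29. The answer is 29.",
    "3 x 12 = 36, 36 - 5 = 31. The answer is 31.",
    "36 books total, 7 lent, so 29 left. The answer is 29.",
    "3 shelves, 12 books: 36. The answer is 36.",
]
print(self_consistent_answer(chains))
```

Individual chains can slip, but independent errors rarely agree, so the vote filters out the two faulty chains here.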
Attention modeling. Attention modeling – the ability to measure (descriptively) and guide (prescriptively) where an AI system turns for data to perform its tasks – is not new but it continues to grow in significance. Attention mechanisms are being used in computer vision tasks like image recognition, object detection, and image captioning. They enable models to focus on the most relevant parts of an image. For advanced scientific discovery, attention-based models are being used to analyze complex scientific data, such as protein sequences and molecular structures and can identify patterns difficult for humans to detect. In these ways, attention modeling helps AI systems process information more intelligently and effectively, thus contributing to more general and sophisticated AI applications.
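In modern Transformer models, “guiding where the system turns for data” has a precise form: scaled dot-product attention, softmax(QKᵀ/√d_k)V, where the softmax weights say how much each query draws on each key’s value. The pure-Python sketch below uses made-up two-dimensional vectors, framed as image patches only for illustration.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries: list[list[float]], keys: list[list[float]],
              values: list[list[float]]):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    Returns the outputs and the attention weights, which show where each query 'looks'."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
        all_weights.append(weights)
    return outputs, all_weights


# Three hypothetical image patches (keys/values) and one query.
keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
values = [[10.0, 0.0], [0.0, 10.0], [8.0, 1.0]]
query = [[2.0, 0.0]]
out, weights = attention(query, keys, values)
print([round(w, 2) for w in weights[0]], [round(o, 2) for o in out[0]])
```

The query aligns with the first and third keys, so nearly all of its attention goes there and the second patch is largely ignored; inspecting these weights is the “descriptive” side of attention modeling.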
A third way AGI will arrive is by transforming the fundamental technologies upon which AI depends – such as compute, data, microelectronics, networks, and energy. Leaps in these fundamental parts of the AI stack will take AI performance to another level.
Compute. Different paradigms like quantum, neuromorphic, and reversible computing, plus existing forms of classical supercomputing, each bring strengths and weaknesses for AI performance. We should expect that large-scale, fault-tolerant quantum computers may arrive before 2030. While quantum computing is not a driver of AI innovation per se, when this new paradigm of compute arrives, AI systems will have a powerful external source for modeling the real world. Thus, a real frontier lies in integrating the advantages of each kind of computing into one functioning system optimized for AI performance. SCSP has previously called for the United States to dominate the hybrid computing space in a way that resembles its current leadership in microelectronics design (i.e., as a future economic position worth achieving as a nation).
Microelectronics. Both the scaling of LLMs and the novel paradigms mentioned above depend, in whole or in part, on improvements in microelectronics. For example, the context strings that give an AI system human-like memory capabilities and the COE algorithms that can pull over 700 specialized GPTs into concert were made possible only by the development of new microelectronics.
Data science. Innovation in data management includes advances in the data center itself, in better organizing and labeling data, and in identifying novel sources of data for AI systems. Hyperscale companies like Microsoft are opening a new data center every three days, and building them better, faster, and cheaper is its own broad field. On organization, labeling data (structuring it to be machine-readable for AI use) was still a barrier for machine learning systems in 2021. Now auto-labeling has arrived: AI systems can do the labeling autonomously, which means machine learning models can be trained much faster and more efficiently. Another profound front in data innovation is access to different categories of data for training and grounding AI models, such as open internet data, synthetic/simulated data, multimodal data, provided databases, and proprietary “future” data such as that in scientific labs. Innovation across the entire data science vertical could fundamentally improve AI performance and scale.
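One common way to auto-label is confidence-thresholded pseudo-labeling: a trained model labels raw data, its confident labels are kept, and uncertain items go to human review. The sketch below assumes that approach; `toy_classifier` is a hypothetical keyword-based stand-in for a real model, and the 0.9 threshold is an arbitrary choice.

```python
def toy_classifier(text: str) -> dict[str, float]:
    """Hypothetical stand-in for a trained model's class probabilities."""
    text = text.lower()
    if "invoice" in text:
        return {"finance": 0.95, "other": 0.05}
    if "meeting" in text:
        return {"calendar": 0.92, "other": 0.08}
    return {"finance": 0.4, "calendar": 0.3, "other": 0.3}


def auto_label(unlabeled, predict_proba, threshold: float = 0.9):
    """Keep machine-generated labels only where the model is confident;
    route everything else to human review."""
    labeled, needs_review = [], []
    for item in unlabeled:
        probs = predict_proba(item)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            labeled.append((item, label))
        else:
            needs_review.append(item)
    return labeled, needs_review


docs = ["Invoice #442 attached", "Team meeting moved to 3pm", "Quick question about the report"]
labeled, needs_review = auto_label(docs, toy_classifier)
print(labeled)
print(needs_review)
```

Raising the threshold trades fewer automatic labels for fewer mislabeled examples entering the training set.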
Networks, IoT, and edge. As networks advance, AI applications will be freed to function more like a system of systems. Think of super-fast internet networks like optical networks or 6G as lightning-fast highways for information. On these highways, AI systems connected to the Internet of Things (IoT) can exchange massive amounts of data in the blink of an eye. This means AI systems can react and make decisions almost instantly, like a self-driving car avoiding an accident before you even see the danger. This speed also lets AI models learn from the constant stream of IoT data in real time, making them smarter and more capable every second. In short, advanced networks transform AI models from brainy but slow thinkers into super-fast, super-smart decision-makers, revolutionizing how they interact with our connected world.
Energy. Imagine a world where new forms of clean energy like fusion drive energy costs closer to “zero.” Cheaper energy would lower the price of owning and operating complex AI systems (including data storage, transport, and training). Yet as SCSP has noted, much needs to be done to reach this positive future.
The likelihood is that these three megatrends will develop, interact, intertwine and combine to create the fourth way to AGI. Big questions remain about how the megatrends will combine: when it will happen, who will do it, and which drivers will matter most. Imagine a combination of a future LLM or AI-agent as a user interface, with AI systems that combine better memory, reasoning, creativity, and learning plus advanced modeling on quantum computers integrated with classical computers like the one you are using now. As PsiQuantum has suggested, that combination has the potential to transform whole industries.
The resulting path to AGI will proceed in four simple stages of 1) AI of today, 2) some intermediate stage that is “more-general” as discussed by the NSCAI, 3) AGI that begins to disrupt (positively and negatively) whole verticals of the economy, and 4) something better than humans in some aspects called artificial super-intelligence (ASI). As the confluence of these megatrends accelerates us towards the threshold of AGI, getting positioned and organized for the arrival of this general purpose technology is a defining challenge of our era.