Hello, I’m Ylli Bajraktari, CEO of the Special Competitive Studies Project. In this week’s edition of 2-2-2, Rama Elluru and Venkat Somala discuss Agentic AI – the next frontier in AI development. SCSP recently convened a meeting of private sector leaders, academics, and government representatives to discuss the early-stage adoption of Agentic AI. That conversation inspired this newsletter, which highlights the transformative benefits of advanced Agentic AI, the current landscape and challenges surrounding agent adoption, and recommendations to address those challenges.
SCSP is excited about our upcoming AI+ Summit Series, a set of high-level events dedicated to enabling rapid advancements in artificial intelligence as it transforms our country and becomes a keystone of our national security.
The AI + Energy Summit, the first in this series, will take place on September 26, 2024, in Washington, D.C. The series will culminate with our next AI Expo and the Ash Carter Exchange on June 2-4, 2025. We hope to see you there!
Agentic AI: The Key to Unlocking Innovation Power
Agentic AI is the latest buzz in the AI world. So what is Agentic AI and what’s it all about? Picture this: As you sip your morning coffee, you give your digital assistant – let’s call it “Agent V” – your goals for the day. You ask it to go through your email and send responses, schedule your meetings, plan your meals, and organize your tasks. Agent V, with extensive knowledge of your objectives, habits, and preferences, starts executing your goals. You could even tell Agent V to write a newsletter on agents. (Is an agent writing this newsletter? Stay with us, and we’ll tell you at the end.)
Agentic AI is the next frontier in AI. It represents a shift from passive AI systems to goal-driven, action-oriented AI systems. This advancement has the potential to revolutionize workflows across sectors, including energy, manufacturing, defense, education, and healthcare.
As SCSP’s Vision for Competitiveness report demonstrates, a nation’s innovation power determines its strength. Given Agentic AI’s capacity to transform the paradigm of innovation, the United States’ ability to harness its benefits will be a decisive factor in maintaining its geopolitical influence. That’s why the United States, along with its allies and partners, must drive and shape this technology for the benefit of democratic societies.
What is Agentic AI?
Agentic AI can pursue complex goals, needing only a human nudge in the right direction (“go write me a newsletter on agents in the style of prior newsletters and send it to all my subscribers”). Large Language Models (LLMs), or other generative AI models, respond to a prompt and produce output that a human can act upon or ignore. But an AI-enabled agent takes a prompt, breaks the goal down into subtasks, takes action, checks its work along the way, and adapts its approach as needed – all without step-by-step human guidance.
For example, an AI agent tasked with writing a newsletter on Agentic AI technology must be able to autonomously plan and execute a series of actions, including conducting research using a search engine, drafting content with a text editor, self-critiquing its draft to identify areas for improvement, finding and adding additional research to fill gaps, incorporating edits, formatting the newsletter on Substack, and distributing it via email.
Most of the agents currently being adopted are built as applications on top of foundation models (such as GPT-4, Claude, Gemini, or Llama) and have access to the internet, software development environments, Gmail, and other tools. That way, the agent can get the information and access it needs to interact with various systems and perform a wide range of tasks.
By combining intelligence with agency, Agentic AI represents another long step towards artificial general intelligence, the kind that can broadly match, and even exceed, human capabilities.
The Big Question - How does Agentic AI play out in the real world?
Well, Agentic AI isn’t just here to have a digital assistant like “Agent V” make your mornings smoother; it’s set to revolutionize industries including healthcare, energy, and manufacturing, by increasing productivity and efficiency, and, thus, reducing costs.
In drug discovery, for example, advanced AI agents could accelerate each step of the scientific process. An agent could independently scour mountains of scientific literature, generate hypotheses, simulate virtual experiments in digital labs, identify and test promising compounds, and analyze results – all autonomously, because it was instructed to find a treatment for a disease. The enhanced speed and scale of intelligent AI agents could significantly reduce development timelines and, because fewer resources are used, lower costs for patients. These advantages ultimately lower barriers to entry in the business and innovation ecosystem, fostering a more dynamic and competitive market.
Or think about the public sector. An agent could be fine-tuned on the labyrinth of the Department of Veterans Affairs’ policies, regulations, and data to streamline the provision of veterans’ benefits. The AI agent could act as a veteran’s 24/7 customer service representative, autonomously navigating benefit eligibility policies, accessing the programs that best meet their needs, and filling out their forms. This approach could promote efficiency and improve resource management by requiring fewer resources to ensure that veteran inquiries and needs are met, all while a small team of humans stands ready to step in for complex or high-consequence cases.
Agentic AI also holds promise for public-private uses, such as automating cybersecurity. These agents could conduct vulnerability scanning, patch vulnerabilities, and even develop and test possible solutions in controlled environments. By incorporating AI agents into their cybersecurity infrastructure, companies and national security entities could reduce costs by maintaining smaller cybersecurity teams and avoiding the expenses associated with a breach. Cybersecurity agents could also lower the barriers to entry for small and medium-sized enterprises, thereby diversifying the innovation ecosystem, and making it more resilient.
And agents can make our everyday lives even smoother. Imagine an AI agent that knows you so well – your preferences, goals, constraints, and habits – that it can anticipate your needs and plan your entire day or week. Personalized AI agents could act as tireless digital assistants, handling everything from scheduling and shopping to serving as your career coach or wedding planner. The result? More time, more energy, and greater access to a broader range of products and services.
Current Agentic AI Landscape
So, why isn’t your agent currently taking care of everything for you? Well, Agentic AI is still in its nascent stages, with adoption in areas like customer relationship management (CRM) and software development. Businesses are integrating Agentic AI into CRM platforms to better understand and utilize their customer data to provide more personalized services, recommend solutions for diverse customer scenarios, and execute actions through integration with other platforms like Slack and Gmail.
Similarly, Agentic AI is gaining traction in coding systems. Many companies have developed agentic coding assistants, such as GitHub’s Copilot, Amazon’s Q Developer, Cognition AI’s Devin, and Codeium, which generate code from natural language. These coding agents must, therefore, understand various programming languages, software architectures, and best practices, while also having access to an integrated development environment, to autonomously write and edit code.
For an Agentic AI system to successfully achieve complex goals and take actions in multi-faceted environments, it must be highly reliable. Consider an agent that was asked to plan your vacation with your friends. The travel planning agent must coordinate calendars, find destinations, check the weather, book flights and hotels, and plan activities. Even a small mistake, like overlooking time zones when booking connecting flights, could derail the entire trip. To show why high reliability matters, imagine an AI planning a vacation with 40 steps. Even if the AI is great at executing individual steps, the error rate compounded over the 40 steps decreases the accuracy rate for the larger task of planning the trip. This demonstrates why AI systems tackling complex real-world tasks need extremely high reliability – even a small error can significantly impact the overall success of a multi-step task.
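To put rough numbers on this, here is a minimal sketch. It assumes that every step succeeds independently and with the same probability, which is a simplification, and the per-step rates are illustrative rather than measurements of any real agent.

```python
def task_success_rate(step_success_rate: float, num_steps: int) -> float:
    """Chance that every step succeeds, assuming steps fail independently."""
    if not 0.0 <= step_success_rate <= 1.0:
        raise ValueError("step_success_rate must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must not be negative")
    # Independent steps multiply: p * p * ... * p, once per step.
    return step_success_rate ** num_steps


if __name__ == "__main__":
    # Illustrative per-step reliability levels for the 40-step vacation plan.
    for rate in (0.90, 0.95, 0.99, 0.999):
        overall = task_success_rate(rate, 40)
        print(f"{rate:.1%} per step -> {overall:.1%} for the whole trip")
```

Under these assumptions, an agent that gets each step right 95% of the time completes the full 40-step trip without error only about 13% of the time, and even 99% per step yields roughly 67%.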
Research shows that current agents are good only at simple tasks, and struggle at complex tasks. They can successfully complete nearly 60% of tasks that typically take humans 1-4 minutes, but their success rate drops to approximately 25% for more complex tasks that usually take humans 1-4 hours.
This is why the current adoption of Agentic AI is primarily focused on narrow, specialized tasks that have a ground truth answer or a mechanism to verify the agent output for accuracy. For example, today’s agents have been applied to software development with testable solutions, math problems with provable answers, and games like chess or Go with defined win conditions.
Challenges of General Agents
Coming back to your AI agent, when will Agent V be able to do your grocery shopping, keep up with your email, plan your vacations, and make your products and services cheaper to provide? That will require overcoming at least four challenges: reliability, testing and evaluation, existing system resilience, and multi-agent orchestration.
Reliability: It is possible that agents will become more capable as the performance of foundation AI systems, upon which agents are typically built, improves. For example, in early April of this year, the highest score on the SWE-bench benchmark, which evaluates an AI system’s ability to solve real-world software issues, was 18%. Three months later, the top score on the benchmark was 43%. This rapid progress is promising, and the reliability and performance of these models may continue to improve for two reasons.
First, Agentic AI capabilities will be enhanced as the underlying foundation models become more intelligent. For example, Agentic AI will improve as foundation models gain larger context windows – the amount of information the model can consider at once – enabling them to process more information, keep track of long-range dependencies, and handle complex tasks.
Second, research is leading to techniques, such as “chain of thought reasoning” and “multi-agent debate,” which have been shown to increase agent performance. “Chain of thought” reasoning allows an AI system to explicitly articulate its step-by-step reasoning process. Having the AI system output its chain of thought reasoning forces the model to “think” through its approach. This process often leads to improved performance, enhances transparency, and enables human and cross-agent interpretability of the AI’s decision-making process. “Multi-agent debate” involves multiple instances of the agent proposing, critiquing, and refining answers over several rounds. Each agent generates an initial response and then reviews and critiques the other agents’ answers. They iteratively update their own responses based on the feedback, which allows for parallel reasoning across agents and aims to produce a well-vetted, consensus answer that leads to greater performance. While these techniques are effective, research avenues could uncover additional novel techniques to elicit greater AI performance.
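The propose, critique, and revise loop behind “multi-agent debate” can be sketched in a few lines. This is a toy illustration, not any published implementation: the agents below are hypothetical numeric stubs that “revise” by moving toward their peers’ answers, whereas a real system would wrap calls to a language model. The names `make_stub_agent` and `debate` are invented for this example.

```python
from statistics import median
from typing import Callable, List

# An "agent" is any function that takes the question plus the other agents'
# latest answers and returns its own (possibly revised) answer.
Agent = Callable[[str, List[float]], float]


def make_stub_agent(initial_guess: float) -> Agent:
    """Build a toy agent that starts from a guess and moves toward its peers."""
    state = {"answer": initial_guess}

    def agent(question: str, peer_answers: List[float]) -> float:
        if peer_answers:
            # Stand-in for "critique and refine": move halfway to the peers' median.
            state["answer"] = (state["answer"] + median(peer_answers)) / 2
        return state["answer"]

    return agent


def debate(question: str, agents: List[Agent], rounds: int = 3) -> float:
    """Run a simple propose / review / revise loop and return a consensus."""
    # Opening round: every agent proposes an answer without seeing its peers.
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent reviews the others' previous answers and revises its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Use the median of the final answers as the consensus.
    return median(answers)


if __name__ == "__main__":
    panel = [make_stub_agent(guess) for guess in (10.0, 20.0, 60.0)]
    print(debate("How many days should the trip last?", panel, rounds=2))
```

In this sketch the three starting guesses span 50 units, and after two rounds the answers sit within about 3 units of each other. Whether real agents converge on a correct answer, rather than merely a shared one, depends on the quality of their critiques.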
Recommendation: Industry has incentives to improve AI reliability and reasoning. However, given the economic and societal benefits of Agentic AI, and, thus, the national security implications, the United States government should pay close attention to ensure sufficient market incentives exist to advance this transformative technology. For example, the United States government can provide funding and compute resources to academia and startups, especially small- and medium-sized entities, which have promising proposals for overcoming reliability and reasoning challenges. This support could be facilitated through programs including the National AI Research Resource (NAIRR), enabling broader participation in AI innovation and development.
Testing and Evaluation (T&E): Agentic AI adoption will depend on public trust. Public trust hinges on robust testing and evaluation of AI systems to ensure they work as intended. Unlike LLMs, however, agents can operate in varying environments and take actions. This complexity makes it difficult to predict and test for all the ways an agent will interact and respond in the world, rendering exhaustive evaluation nearly impossible.
Recommendation: Given these challenges, greater research is needed on the robust testing and evaluation methods necessary to ensure that Agentic AI systems align with human objectives and values. Industry may have incentives to prioritize investment in improving the intelligence of their models, rather than in developing testing and evaluation techniques. Thus, the United States government should facilitate T&E research through initiatives like NAIRR and the U.S. AI Safety Institute (US AISI). NAIRR would enable a wider set of actors, including academia and startups, to research and develop promising T&E techniques. The U.S. AI Safety Institute can drive collaboration on T&E research both domestically and with international partners, and help administer third-party testing. Additionally, it will be crucial to prioritize highly consequential uses of Agentic AI that require thorough evaluation.
Existing System Resilience: Introducing Agentic AI into existing systems poses risks, principally “system overload” and “uncoordinated optimization.”
“System overload” occurs when agents overwhelm existing systems with their capacity to operate at a greater speed and scale relative to humans. For instance, if an agent makes restaurant bookings at every available venue just to secure an option, the reservation system could collapse, rendering it useless for everyone. Even if the system didn’t collapse, its integrity would be challenged by the presence of agents.
“Uncoordinated optimization” between agents occurs when an agent solely optimizes for its own objective, but does not consider the long-term implications for the entire system. An example failure is a flash crash, where millions of individuals deploy different instances of a financial trading agent. If these agents, built on the same logic, simultaneously interpret market information as a signal to sell, their collective action could trigger a dramatic and potentially destabilizing stock market crash.
Both of these issues arise from introducing AI agents into our systems without fully understanding the interaction and all the implications. Strengthening our systems to be more resilient to Agentic AI actions with monitoring mechanisms, remediation processes, and product features is critical. Examples of product features include Community Notes on X, spam email filters, user verification badges, and agent rate limits. We should also leverage AI to bolster system defenses. For example, we could have cyberdefense agents scanning and patching vulnerabilities or an agent in our financial market systems checking for market manipulation or risks from multiple agents acting on their objective without regard to the entire system.
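As one illustration of an agent rate limit, here is a minimal token-bucket sketch. The token bucket is a common rate-limiting technique, but the class below, its parameters, and the restaurant-booking scenario are assumptions made for this example, not a description of any particular product’s feature.

```python
import time


class TokenBucket:
    """Allow short bursts of up to `capacity` actions, refilled at `rate` per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate              # tokens added back per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock            # injectable so the limiter can be tested
        self.last = clock()

    def allow(self) -> bool:
        """Return True and spend a token if the action is permitted right now."""
        now = self.clock()
        # Refill according to the time elapsed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    # Hypothetical policy: each agent may hold 3 pending bookings in a burst.
    bucket = TokenBucket(rate=1.0, capacity=3)
    attempts = [bucket.allow() for _ in range(10)]
    print(f"{attempts.count(True)} of {len(attempts)} booking attempts allowed")
```

A reservation system that gave each agent its own bucket would let a normal user’s agent book a table, while throttling an agent that tries to reserve every venue at once.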
Recommendation: NIST and the U.S. AI Safety Institute should develop standards and frameworks – built upon efforts such as the AI Risk Management Framework – to promote agents safely interacting with various systems and work with the relevant agencies to test, strengthen, and ensure that the systems under the agency’s purview are resilient to the adoption of AI agents.
Multi-Agent Orchestration: As agentic technologies improve in capabilities, agents will proliferate across our digital environments. Mature multi-agent applications of Agentic AI could include self-driving cars that communicate with each other and smart city infrastructure that optimizes traffic flow and enhances safety. Others include robot factories that utilize multiple agents working in tandem to increase efficiency and productivity, and trading platforms employing different agents to represent buyers and sellers, negotiate prices, and execute trades autonomously.
A key benefit of multi-agent systems is the ability to better identify, interpret, and address errors and failures that occur in AI systems. Agents can be specialized with functions that are clearly delineated. This approach enables more efficient troubleshooting of errors and failures, as investigations can focus on the specific agent responsible for a particular function. Another advantage of multi-agent systems is scalable oversight. Multi-agent systems with built-in checks and balances can significantly improve AI robustness and reliability by creating a network of interconnected agents that monitor each other’s actions, providing distributed oversight.
Recommendation: Orchestrating reliable and trustworthy multi-agent environments will require mechanisms for these systems to communicate, coordinate, and collaborate to take action. The United States government should fund and direct NIST to develop comprehensive standards and processes for multi-agent communication, interoperability, and coordination in networked environments.
Conclusion
Agentic AI is the next milestone in AI development, representing a shift towards systems capable not only of analyzing information but also of acting on it. To realize the full potential of Agentic AI in critical fields including healthcare, education, and energy, as well as in our everyday lives, we must improve Agentic AI reliability, testing and evaluation, system resilience, and multi-agent orchestration. Through collaboration between industry, government, academia, and civil society, we must exercise responsibility in its development and deployment. By proactively addressing these challenges, the United States can leverage agents to unlock tremendous innovation power and ensure that this technology benefits society.
Did an agent write this newsletter? No, the technology is not quite there yet, but we did consult an LLM to be funnier. And, yes, we think a successful AI agent is one with a sense of humor.