Quantifying Generative AI in Defense Applications

Defense Advanced Research Projects Agency (DARPA)
Arlington, VA

Relying on credible, concrete information is essential in high-stakes decision-making. So, how can society be sure generative artificial intelligence (AI) will be safe and effective for such applications?

Over the past century, one of humanity's most significant innovations has been the ability to move people and things quickly over large scales. Everything from bridges to jets and rockets uses mathematical foundations to understand the physical world and reliably build these systems and structures.

Yet, as society catapults into an era of exploring and applying AI to quickly deliver information to people, methods for guaranteeing the capabilities (and limitations) of generative AI systems do not exist. Neither do insights into when and why those capabilities manifest.

The Defense Advanced Research Projects Agency (DARPA) has been a long-term investor in AI research and development. With the influx of large language models, the agency continues to invest in areas that show promise in filling the fundamental gaps between state-of-the-art systems and national security applications, including the Department of Defense's (DoD) mission-critical needs.

As decision-making becomes faster due to generative AI, the agency seeks to develop mathematical foundations for assessing generative AI and providing guarantees necessary to deploy the technology safely and effectively across the DoD and society.

According to DARPA's Artificial Intelligence Quantified (AIQ) Program Manager, Dr. Patrick Shafto, it all boils down to math.

"AI has achieved near human-level performance in domains including text generation, game playing, and such, which raises the prospect of widespread integration with human partners in the military and society," he explained. "And at the most general level, we're interested in determining how to ensure AI systems will have the properties needed to solve various problems."

AIQ will explore the hypothesis that mathematical foundations, combined with advances in measurement and modeling, can guarantee an AI system's capabilities and predict when they will or will not manifest, and why.

Today, if you ask a generative AI chatbot a question, there's no guarantee that it will get the answer right.

Furthermore, even slight rewordings of the same question or simply changing the order of the words can result in a completely different answer.

Shafto says that mathematical foundations, combined with advances in measurement and modeling, may make it possible to guarantee AI capabilities in a quantified way. Generalization, he adds, is key.

Current AI evaluation focuses on giving AI systems quizzes, much as we would test a person. However, there is no reason to believe the answers would stay the same even for simple rewordings of the same question, never mind real-world applications. That is, we want guarantees about generalization, and math is required for that.
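To make the distinction concrete, here is a minimal sketch, not DARPA's or NIST's methodology, of why quiz-style accuracy can mask brittleness. The `toy_model` below is an invented stand-in whose answer flips with incidental wording, and `consistency` simply measures how often paraphrases of one question receive the same answer:

```python
def toy_model(question: str) -> str:
    # Hypothetical stand-in for a generative model: its answer depends
    # on surface wording, mimicking the brittleness described above.
    q = question.lower()
    if "capital" in q and "france" in q:
        # The answer flips based on how the question happens to start.
        return "Paris" if q.startswith("what") else "Lyon"
    return "unknown"

def consistency(model, paraphrases: list[str]) -> float:
    """Fraction of paraphrase pairs that receive the same answer."""
    answers = [model(p) for p in paraphrases]
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    if not pairs:
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)

paraphrases = [
    "What is the capital of France?",
    "France's capital is which city?",
    "Name the capital city of France.",
]
score = consistency(toy_model, paraphrases)  # well below 1.0 here
```

A single-phrasing quiz could score this model perfectly while the consistency score reveals the failure, which is the kind of gap that generalization guarantees, rather than test-set scores, are meant to close.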

Through AIQ, DARPA will work closely with partners at the National Institute of Standards and Technology (NIST) and the DoD to ensure that when AI systems are deployed in high-stakes situations, one can have confidence in predicting their performance.
