The "A3" Paradox: Designing LLM Applications When Less is More
The Hard Choices in Every LLM Deployment
This month’s AI Center of Excellence roundtable will be led by Tapas Moturu, Chief Architect at Intuit, a founding member of Decibel’s AI Center of Excellence and co-author of the A3 Theorem. Inspired by the CAP Theorem for distributed data stores, the A3 Theorem details the tradeoffs between Applicability, Adaptability, and Affordability (together, the AAA, or “A3”) of LLMs when deployed in production.
AI has inspired every product and engineering leader over the past several quarters, and 2024 is the year that our promising proofs-of-concept finally move into production. In the lab, AI seems magical, capable of solving incredibly complex problems without much prompting. As we scale up its use, we encounter the practical limitations of LLMs and must weigh the important tradeoffs of price, performance, and ultimately, ROI. What seems like an infinite canvas on which to design AI-native experiences narrows to very specific tradeoffs of cost, accuracy, and ability. Is there an optimal solution to these competing constraints?
Many executives have wrestled with these tensions, and we are grateful to our friends Tapas Moturu, Mallik Mahalingam, and the team at Intuit, who have developed a full library of LLMs powering experiences for more than 100 million consumer and SMB customers across their TurboTax, Credit Karma, QuickBooks, and Mailchimp platforms. Underlying their optimization choices is the A3 Theorem, which suggests that every LLM application can optimize for any two, but not all three, of the critical properties of an AI-native use case:
Applicability (how consistently and with what quality the model performs the task)
Adaptability (general-purpose flexibility vs. narrow domain or task specificity)
Affordability (the underlying cost and/or the speed and latency of the task)
The Paradox of Choice: Why Choosing 1 or 2 is Better than 3
The A3 Theorem mirrors the CAP theorem proposed by Eric Brewer, which states that any distributed system or data store can simultaneously provide only two of three guarantees: consistency, availability, and partition tolerance (CAP). After several years of moving AI-enabled products from prototype to production, the Intuit team leverages the A3 Theorem to drive much faster decisions on LLM product design, increasing the velocity and efficiency of their pipeline of LLM-powered features. We believe it serves as an expert guide for organizations seeking to deploy AI into production, and we have distilled some of the key lessons to help executives navigate the paradox, and the perils, of choosing what to trade off.
A3 Theorem: How to Choose Your Path Wisely
Design Principle #1: Users Are Impatient.
Most AI-native experiences are initially designed to be real-time and interactive, such as customer service chatbots or co-pilots. In these cases, it is often desirable to offer wide-ranging, low-latency interactions to keep users engaged. In practice, however, serving thousands of highly accurate responses in real time quickly becomes too expensive to sustain as a design principle. A best practice for pairing low latency with lower accuracy is to design co-pilot experiences around human-in-the-loop interaction, so users can correct the model and provide real-time feedback. In general, we would all rather get an imperfect response we can react to quickly than wait for something perfect.
If the use case is synchronous, speed (affordability) and adaptability (range) generally trump applicability (quality).
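To make the pattern concrete, here is a minimal sketch in Python of a synchronous co-pilot turn that favors speed and range over polish and captures human-in-the-loop feedback. The `call_fast_general_model` stub is a hypothetical stand-in for whatever low-latency, general-purpose endpoint you actually use; the control flow, not the provider, is the point.

```python
# Minimal sketch of a synchronous, human-in-the-loop co-pilot turn.
# `call_fast_general_model` is a hypothetical placeholder, not a real API.
import time

def call_fast_general_model(prompt: str) -> str:
    """Hypothetical low-latency call; favors speed and range over polish."""
    return f"Draft answer for: {prompt!r}"

def copilot_turn(prompt: str) -> dict:
    start = time.monotonic()
    draft = call_fast_general_model(prompt)           # imperfect but fast
    latency_ms = (time.monotonic() - start) * 1000
    # Surface the draft immediately and let the human refine it, rather than
    # paying (in cost and latency) for a near-perfect first response.
    return {"draft": draft, "latency_ms": latency_ms, "needs_review": True}

def record_feedback(turn: dict, accepted: bool, edited_text: str | None = None) -> dict:
    # Human-in-the-loop signal: accept/edit events become training and eval data
    # that can raise applicability later without slowing the live experience.
    turn["feedback"] = {"accepted": accepted, "edited_text": edited_text}
    return turn

if __name__ == "__main__":
    turn = copilot_turn("Summarize this customer's last three support tickets")
    turn = record_feedback(turn, accepted=False, edited_text="Shorter summary...")
    print(turn)
```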
Design Principle #2: Finding the Truth Can Take Time.
If your AI application can afford delays, you are more likely to be able to engineer an accurate outcome for a broad range of possible user interactions. For example, combining multiple models to perform diverse tasks and cross-check one another's results will yield a high-quality end product, provided the AI process can run asynchronously. It also helps to give users step-by-step explanations of what is happening while they wait, which is rapidly becoming a best practice in AI-native design. We can all find a little extra patience when an application is working to keep us engaged.
If a use case is asynchronous, applicability (quality) and adaptability (range) trump affordability (speed).
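As an illustration, here is a minimal asynchronous sketch, assuming two hypothetical model endpoints (`model_a` and `model_b`): it fans the task out, cross-checks the answers, and reports step-by-step progress while the user waits.

```python
# Minimal sketch of the asynchronous "multiple models confirm the result" pattern.
# `model_a` and `model_b` are hypothetical stand-ins for real model calls.
import asyncio

async def model_a(task: str) -> str:
    await asyncio.sleep(0.5)          # stands in for a slower, higher-quality call
    return "42"

async def model_b(task: str) -> str:
    await asyncio.sleep(0.7)          # a second, independent model for confirmation
    return "42"

async def report(step: str) -> None:
    print(f"[progress] {step}")       # step-by-step explainability for the waiting user

async def answer_with_confirmation(task: str) -> dict:
    await report("dispatching the task to two independent models")
    a, b = await asyncio.gather(model_a(task), model_b(task))
    await report("cross-checking the two results")
    if a == b:
        return {"answer": a, "confidence": "high"}
    await report("results disagree; escalating for another pass or human review")
    return {"answer": None, "candidates": [a, b], "confidence": "low"}

if __name__ == "__main__":
    print(asyncio.run(answer_with_confirmation("Reconcile this quarter's ledger totals")))
```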
Design Principle #3: When Performance Matters, Train for Results.
If an AI product is expected to generate revenue, investing in higher accuracy at low latency is likely justified to ensure a superior customer experience and competitive differentiation. There are also vertical use cases, such as handling health information or financial data, where errors can have significant consequences. In these cases, fine-tuning or skilling a model for a single, well-defined purpose is the best way to achieve the desired results. A smaller and/or open source option will often yield the best price/performance, as it is not easily distracted once properly instructed.
If a use case is domain-specific and/or strategic, applicability (quality) and affordability (speed) trump adaptability (range).
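For illustration, here is a minimal fine-tuning sketch assuming the Hugging Face `transformers` and `datasets` libraries; the base model, dataset file, and hyperparameters are illustrative placeholders, not Intuit's actual setup.

```python
# Minimal sketch: skilling a small open-source model on narrow, domain-specific data.
# Model name, file path, and hyperparameters are placeholders for illustration only.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base_model = "HuggingFaceTB/SmolLM2-360M"   # placeholder: a small open model is often enough
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Narrow, domain-specific examples (e.g. bookkeeping Q&A) in a JSONL file with a
# "text" field; quality matters more than volume here.
dataset = load_dataset("json", data_files="domain_examples.jsonl")["train"]
tokenized = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                        remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-tuned", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Whatever the exact recipe, evaluating the tuned model against a held-out set of domain questions is what confirms the applicability gain before it reaches customers.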
Strategic Levers to Pull
While the priority of applicability, adaptability, and affordability ultimately drives the architectural decisions and AI-native design, there are other critical levers that can be pulled to achieve optimal results:
Selecting Open vs. Closed Source Models: In recent months, the quality of models from proprietary and open source providers has reached near parity, giving developers many options for deployment. Open source models can be more cost-effective but often require more tuning to achieve the desired applicability, or accuracy. Closed source models, while potentially offering higher accuracy out of the box, often come with a higher cost to customize. Smaller models tend to be easier to skill when all is said and done.
Creating High Quality LLM Ops Pipelines: Effective AI deployment hinges on high-quality, well-managed data engineering pipelines for pre-training, fine-tuning, and ultimately ensuring the accuracy and adaptability of AI models. This includes data cleaning, enrichment, and curating high-quality training datasets to improve model performance across various scenarios. Data pipelines are essential to repeatable deployment success, and they are one of the only ways to upgrade your specialized skills as new foundation models are released.
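As a sketch of one such pipeline stage, the snippet below cleans and de-duplicates raw text records before they become a training set; the field names and length threshold are illustrative assumptions, not a specific production pipeline.

```python
# Minimal sketch of one LLM Ops pipeline stage: clean, de-duplicate, and persist
# raw examples as a fine-tuning/evaluation dataset. Thresholds are illustrative.
import hashlib, json, re

def clean(record: dict) -> dict | None:
    text = re.sub(r"\s+", " ", record.get("text", "")).strip()   # normalize whitespace
    if len(text) < 20:                                            # drop fragments unlikely to help
        return None
    return {"text": text, "source": record.get("source", "unknown")}

def build_training_set(raw_records: list[dict], out_path: str) -> int:
    seen, kept = set(), 0
    with open(out_path, "w") as f:
        for rec in raw_records:
            cleaned = clean(rec)
            if cleaned is None:
                continue
            digest = hashlib.sha256(cleaned["text"].lower().encode()).hexdigest()
            if digest in seen:                                    # exact-duplicate removal
                continue
            seen.add(digest)
            f.write(json.dumps(cleaned) + "\n")
            kept += 1
    return kept

if __name__ == "__main__":
    raw = [{"text": "How do I categorize a refund?  ", "source": "support"},
           {"text": "How do I categorize a refund?", "source": "forum"},
           {"text": "ok", "source": "chat"}]
    print(build_training_set(raw, "training_set.jsonl"), "records kept")
```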
Scalability and Elasticity of GPU Infrastructure: As AI applications grow, the underlying training and inference infrastructure must scale accordingly. Cloud-based solutions offer flexibility and scalability, allowing organizations to adjust resources as demand changes. Hardware pricing and availability can play a significant role in the overall cost and latency of your deployment. In general, it is better to rent than to own unless you are one of the few companies that can claim to be “GPU Rich” rather than “GPU Poor.”
Designing Novel AI Experiences vs. Copilots: One of the great opportunities in any application is to redesign it so that AI is embedded seamlessly into the user experience, moving beyond traditional chatbots and copilots to an experience that is AI-native. This relaxes some of the constraints of the A3 Theorem, as a new user experience often offers a far less constrained path to products that are accurate, adaptable, and cost-effective. The opportunities are only limited by our imagination!
Conclusion and Thank You
The A3 Theorem is one of the first major architectural frameworks to provide clarity on the difficult choices that product and engineering leaders must make in every LLM deployment. A special thank you again to Tapas Moturu and Mallik Mahalingam from Intuit who introduced us to the A3 Theorem at our AI Pioneers Summit last October and have open sourced the framework for all. We are excited to host you at our AI Center of Excellence roundtable to share more advice on how to navigate these choices in your LLM deployment.
Tapas Moturu and Mallik Mahalingam at the AI Pioneers Summit 2023