LLM Product Patterns: The General, The Judge, and the Vizier
The advent of widely available language models has created a tremendous amount of excitement, as well as a decent amount of chaos. Understanding how everything fits together can be difficult.
At Cisco, we have dozens of product and service groups that are working to harness the power of Large Language Models, building on a long history of productizing Machine Learning solutions. I have been talking to many of these people to learn what use cases are being explored and how they are implementing them. I feel immense gratitude for having access to such a large community of AI professionals here at Cisco where we can learn from a diverse set of business contexts ranging from networking and security to applications and collaboration.
From my discussions with AI leaders at Cisco, I can see three broad product patterns emerging. To explain them, I like to use the metaphor of a royal court, where the council gathers to help rule the kingdom (your product) and take on some of the difficult work of serving the people (your users). Each member of the council plays a different role, and while they are all disposed to help the realm, they think and operate in different ways:
- The Vizier, “How may I help?”
- The Judge, “Here’s what I think…”
- The General, “Tell me what to do!”
The Vizier
The first pattern I call the Vizier. They are trained on a body of knowledge that we want to give people access to. “How may I help?”, asks the Vizier, who is here to help people make sense of challenging problems, a Socratic partner of sorts. They will answer questions to explain complex material or help turn a goal into an appropriate course of action. The Vizier may be an advisor to the royal family, or they may go directly to the people and help them with their problems. The Vizier is always waiting, ready to serve when called upon.
In this pattern, the user interacts directly with the LLM to get access to information available in the model. It is an inbound workflow, where users are deliberately and knowingly interacting with an AI. The pattern is usually implemented as either a conversational agent or a co-pilot. The distinction between the two is important — a chatbot-style agent has an open natural language interface, while the co-pilot is constrained to making suggestions that guide the user through a task or workflow (think Clippy, but one that actually helps).
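To make the shape of this concrete, here is a minimal sketch of a Vizier-style conversational agent in Python. The `call_llm` and `search_docs` helpers are hypothetical stand-ins for whatever model API and knowledge-base retrieval you use, not any specific library; the point is that answers are grounded in the body of knowledge we chose to expose.

```python
# A minimal sketch of the Vizier as a conversational agent.
# `call_llm` and `search_docs` are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your model provider's completion call.
    return "<model response>"

def search_docs(query: str, k: int = 3) -> list[str]:
    # Placeholder: swap in retrieval over your body of knowledge
    # (e.g. an index of training materials).
    return ["<relevant passage>"] * k

def vizier_answer(question: str, history: list[str]) -> str:
    # Ground the answer in retrieved passages so the Vizier speaks
    # from the knowledge we want to expose, not just its weights.
    passages = search_docs(question)
    prompt = (
        "You are a helpful advisor. Answer using only the context below.\n"
        "Context:\n" + "\n---\n".join(passages) + "\n\n"
        "Conversation so far:\n" + "\n".join(history) + "\n"
        f"User: {question}\nAdvisor:"
    )
    return call_llm(prompt)

# The chatbot-style agent is just this loop over open questions; a
# co-pilot would instead call vizier_answer() at fixed points in a
# task to offer constrained suggestions.
if __name__ == "__main__":
    history: list[str] = []
    question = "How do I configure the widget?"
    answer = vizier_answer(question, history)
    history += [f"User: {question}", f"Advisor: {answer}"]
    print(answer)
```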
This pattern is the most obvious and prevalent, typified, of course, by plain-old ChatGPT. Most companies will already be looking at possibilities to simplify their product experiences or enhance self-service workflows using this pattern. For example, at Cisco, we have a vast body of training materials that customers need to use to understand their products. Simplifying access to that information is a real, tangible, and immediate benefit to our customers, and it is one of the earliest Gen AI capabilities we have launched. In fact, we now have a Vizier interacting directly with users as a bot in our forums (we make sure to let users know they are interacting with a bot, of course).
The Judge
The next pattern is The Judge. Here we have an expert who can look at a situation and use their knowledge and skills to provide opinions, summaries, and judgments to our users. “Here’s what I think”, says the Judge, who is adept at boiling down a complex situation into something that is understandable and useful. The Judge can sometimes get things wrong, so we may need to review their opinion before actually presenting it to the user, perhaps by a jury of their peers (i.e. other LLMs), or even the user’s peers (other humans). Sometimes, the user does not agree with the decision, so they need the ability to appeal and ask for a second, third, or fourth judgment.
In this pattern, we take a user’s situation and interpret it through an LLM. This is an outbound workflow, where the user generally does not get to interact directly with the model — we are providing them with the output of the model to meet a need in a specific context. We do need to disclose to the user that the content is AI-generated and warn appropriately so they can inspect and assess it accordingly. We may allow the user to regenerate the content in a different way if the results are poor, and we need to collect feedback so that we can continuously improve the quality of the output and use it in reinforcement loops.
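As a rough illustration, here is one way the Judge’s plumbing might look: the model’s output is delivered with an AI-generated disclosure, the user can appeal for a regenerated judgment, and feedback is captured for improvement loops. Again, `call_llm` is a hypothetical placeholder, and regenerating at a higher temperature is just one way to vary the output.

```python
# A rough sketch of the Judge pattern, under assumed names.

from dataclasses import dataclass, field

def call_llm(prompt: str, temperature: float = 0.2) -> str:
    # Placeholder: swap in your model provider's completion call.
    return "<model-generated summary>"

@dataclass
class Judgment:
    text: str
    disclosure: str = "This content was generated by AI; please review it."
    feedback: list[str] = field(default_factory=list)

def summarize_case(case_notes: str, temperature: float = 0.2) -> Judgment:
    prompt = f"Summarize the following case in five sentences:\n{case_notes}"
    return Judgment(text=call_llm(prompt, temperature))

def appeal(case_notes: str) -> Judgment:
    # The second (or third, or fourth) judgment: regenerate with more
    # variance so the user gets a genuinely different take.
    return summarize_case(case_notes, temperature=0.9)

def record_feedback(judgment: Judgment, note: str) -> None:
    # Collected feedback drives quality monitoring and can feed
    # reinforcement loops later.
    judgment.feedback.append(note)
```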
This pattern shows up anywhere we want to help our users by generating the content they need to complete their tasks. The main activities are summarization, completion, and categorization. Examples include summarizing the recording of a meeting into a set of notes, boiling down a case into a summary, or analyzing a set of data for patterns. I would also put the coding use cases under this category, like GitHub Copilot, which I would actually not call a ‘co-pilot’, as it is really a completion pattern in the style of The Judge.
It is interesting to note that both of the above patterns also call for new design patterns in our products. Normally, products are supposed to get things right — if we cannot get the thing right, why would anyone use it? But here, we are generating original content, and the user’s opinion of it is subjective, so we can give the user controls to generate content differently to meet their expectations. For example, Midjourney offers interesting methods to recreate your picture without changing the prompt by adding more or less variance, expanding or changing the perspective, and so on.
It’s also important that we take a moment to acknowledge the Expert Paradox, wherein the user is asking for help because they don’t know the answer, but only an expert would be able to determine whether the output is accurate. LLM output is very convincing in a way that avoids setting off our spider-sense. Users need to be properly instructed to avoid treating the Vizier like an Oracle, or the Judge like a God, and attention to end-user feedback becomes something of an ethical obligation and part of the commitment to responsible use of AI.
The General
The final pattern is The General. Here we have an agent who can help you execute your plans. “Tell me what to do!”, says The General. Give them orders, and they will carry them out to the best of their ability, returning dutifully with the results. The job must be done correctly, and there is a right way and a wrong way to do it. We have to make sure that the General is extremely well trained on the tasks being given, because the stakes are high: coming back with the job done wrong means the plan will fail.
In this pattern, we are placing the LLM in the middle of a workflow and using it to accomplish some or all of the steps in that workflow. The user does not know that the LLM is involved in what they are doing. The LLM is being used in the backend as a kind of logic engine or embedded kernel that can take an input and produce a desired output in a sequence of operations. The goal is to improve a piece of the workflow by simplifying a complex or repetitive task to produce output that can be used in downstream operations.
Examples in this pattern involve incorporating operations like classification, categorization, labeling, and translation into a larger workflow. For example, in security products, we may want to categorize a document so that we can decide whether it is content that needs to be blocked. The user has created a policy to block malicious or unwanted content, and we will use the LLM to classify it as such. Clearly, if we start blocking content that should be allowed, we will cause problems for the customer; worse still if we allow content that should be blocked.
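A minimal sketch of what this might look like, assuming an illustrative three-label policy and a placeholder `call_llm`: the model’s free-text answer is constrained to a known label set, and anything unexpected fails closed (blocked). This is not any real product’s implementation, just the shape of the pattern.

```python
# An illustrative sketch of the General embedded in a policy
# workflow. The label scheme and call_llm are assumptions.

ALLOWED_LABELS = {"benign", "unwanted", "malicious"}

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your model provider's completion call.
    return "benign"

def classify_document(text: str) -> str:
    prompt = (
        "Classify the document as exactly one of: benign, unwanted, "
        f"or malicious. Reply with the label only.\n\nDocument:\n{text}"
    )
    label = call_llm(prompt).strip().lower()
    # The model is a probability engine: constrain its free-text
    # answer to the known label set, and fail closed when it strays.
    return label if label in ALLOWED_LABELS else "malicious"

def should_block(text: str) -> bool:
    # The policy decision that downstream operations consume.
    return classify_document(text) in {"unwanted", "malicious"}
```

Note that failing closed (treating an unparseable answer as malicious) trades false positives for safety; as the paragraph above points out, both failure directions have real costs, so where the default lands is a product decision.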
The General holds a huge amount of promise if we can get it to work. The opportunities are truly incredible. However, it is also the most difficult pattern to implement. There are a few problems:
- The probability problem: we need to remind ourselves that an LLM is a probability engine, not a logic engine, and there is a non-zero probability of generating the “wrong” answer. When we are building application workflows, we need to have a guarantee of accuracy. What level of correctness are we okay with? What happens if the model is incorrect? In what ways can we check and monitor correctness? (One sketch of such a check follows this list.)
- The explainability problem: it is difficult to ascertain why an LLM has arrived at a given decision. This is problematic because when we have a failure in reasoning, we don’t know why it happened or what we need to do to correct it. Is there a problem with the data? In the model? In the embeddings? In the prompt? When you change these things, how do you know whether you are helping or hurting the model?
- The data problem: We’ve been collecting data for years, so there is a lot of it to feed the LLM for our tasks, right? Wrong! A lot of the training data that we need to do these kinds of tasks just isn’t there. Most datasets do not contain all the contextual details that are needed: before LLMs came along, we generally would not store entire documents and the lengthy descriptions that an LLM needs for training.
- The drift problem: over time, our world may no longer match the world that was captured by the LLM. At some point the meaning of words will change, categories will expand or contract, and new concepts and ideas enter the milieu. At what point this happens is never certain, nor is it clear how we would introduce these changes into the LLM.
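There is no general fix for these problems yet, but the probability problem at least admits some mitigation. One hedged approach, sketched below, is a self-consistency check: sample the classification several times and accept the answer only when the votes agree; otherwise refuse and escalate to a human or a deterministic fallback. The names and threshold are illustrative, and `call_llm` remains a placeholder.

```python
# A self-consistency check as one mitigation for the probability
# problem. This raises confidence; it is not a guarantee.

from collections import Counter
from typing import Optional

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your model provider's completion call,
    # sampled with nonzero temperature so votes can differ.
    return "malicious"

def classify_with_consensus(text: str, votes: int = 5,
                            threshold: float = 0.8) -> Optional[str]:
    prompt = f"Classify as benign, unwanted, or malicious (label only):\n{text}"
    counts = Counter(call_llm(prompt).strip().lower() for _ in range(votes))
    label, n = counts.most_common(1)[0]
    # Below the agreement threshold, don't guess: hand the case to a
    # human reviewer or a deterministic fallback rule instead.
    return label if n / votes >= threshold else None
```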
While Generative AI has proven incredible at (surprise!) generating content, using LLMs in logical workflows is still an area of exploration. With the Vizier and the Judge, the output is up for interpretation, and acceptable answers fall within a range of responses. We can let users play with the output to get what they are looking for. With the General, the constraints on “correctness” are strict — right or wrong. It appears that LLMs will take some time to mature in this area.
LLM-powered product development is coalescing into a set of standard implementations. The patterns identified here are a first cut at trying to understand the broad strokes for LLMs in our applications. I would love to hear your feedback!