Securing the LLMOps Pipeline

John Rauser
Jun 23, 2023 · 8 min read


The current discussion about securing AI is all about the prompt, a natural focal point: it's the interface, generally available to the public and capturing the world's imagination. We can clearly see the weaknesses: the ability to generate malicious content, the problems of data privacy, the threat of injection and exploitation, and general adversarial dialogue. But this is not what I want to talk about. I am interested in what is underneath.

I am interested in how we will operate our own deployments of large language models, and what happens when people start attacking them.

As corporations begin to leverage in-house LLMs, this will become a primary concern. And make no mistake about it: running models in-house will be the only way that many large businesses can apply AI to their own data, in the same way that we run on-prem versions of tools like GitHub and Jira, and for the same reasons that Apple, Samsung and others have blocked their employees from using public AI services. In fact, we just shipped a blocking capability in our own security product here at Cisco.

It’s true that, for access to large and powerful LLMs, subscription providers like Google, OpenAI and Anthropic will remain dominant for some time. But with an increasing push towards open access models (thanks Meta!), a deluge of new developments to run models more efficiently, and constantly emerging techniques like fine-tuning, distillation, and quantization, we can expect the reality of “LLMs for everyone” to arrive sometime soon. Over the long term, there is no moat.

When it comes to operating and securing our own models, we need to contend with the novel elements introduced with this new technological paradigm. The pattern at hand is this:

  • With new technology comes new patterns of insecurity

Think about it: email, databases, websites, the internet. All of these technological paradigms were, at first, deployed and operated completely insecurely. In fact, at the time it would have taken a crystal ball to understand what "insecure" even meant in the new context. It is not until we actually see a technology being used at scale that we discover the many ways in which we misunderstood it and its edge cases.

And, of course, what happens when we come to rely on a new technology? It becomes the target of attacks: to disrupt the business of a company, to steal or falsify information, to undertake malicious activity in a variety of ways.

Today, very few people know what it takes to run their own LLM in production, and even fewer know what parts of it are subject to interference, meddling or unreliability at scale. But we can change that. It is easy to get a toy system running quickly on your computer so you can start to understand the different elements of the system. We can play around to get a feel for how to deploy and operate models, and start to understand what is to come in the world of LLM security.
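To make that concrete, here is a rough sketch of a toy setup using the Hugging Face transformers library. The model name is just an assumption (a small StarCoder variant); swap in whatever fits on your machine, and note that some checkpoints on the Hub require you to accept a license before downloading.

```python
# Minimal local "toy" LLM, assuming transformers and torch are installed
# (pip install transformers torch). The model name is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigcode/starcoderbase-1b"  # small code model; swap for any causal LM

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Even a toy like this surfaces the moving parts we care about: a tokenizer, a set of weights, and an inference loop.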

LLMOps

Let’s imagine we want to deploy an LLM internally to help unblock employees on their day-to-day tasks. In particular, we want to give our coders access to a tool that will make them more productive without exposing data to a service provider.

For this, we will want to provide an LLM-powered internal application that is optimized for generating code and that also knows about all code and code-related artifacts within the company: where they are, how they are organized, who wrote them. To do this we will need to give it access to every internal repository in the company, as well as the documentation (internal and external) around it, and we will want to update it regularly with new materials so we can use it to generate fresh insights. What do we need to do to get this going?

  1. The Dataset. Our own data, our crown jewels. It is going to be ingested regularly, so we will need a continuous sourcing pipeline from the relevant data sources. This means crawling doc stores, connecting to raw data sources, and then properly cleaning up and tokenizing the data (a minimal sourcing-and-tokenization sketch follows this list). Data preparation and refinement is an already mature area, but note that aspects of the transformer architecture in our LLMs cause the choice of tokenization technique to uniquely impact the performance of the system.
  2. The Model. The model is the structure of the learning machine that will do our inferencing. The models we are using today are neural language models that have been around for a while, but the transformer architecture and its self-attention mechanism are a more recent breakthrough that has caused the explosion of interest in AI. This is fueling a fast-growing community of contributors from which we can source our model. It also means that, whatever model we choose, we will likely want to change it in the near future, so we need to make it easy to test and swap in new ones.
  3. Parameters. Our model needs to be trained on an initial dataset that will generate the parameters used to make inferences. Training is very expensive, and we will not be doing it ourselves: we will use a pre-trained model that comes with an existing set of parameters. Parameters number in the billions; the files can be huge (tens to hundreds of gigabytes), they are loaded into GPU memory, and they can be challenging to deal with. For our example, we will take an open access model like StarCoder, which is pre-trained on a large coding dataset. The model was trained on 512 NVIDIA A100 80 GB GPUs, but you can run it yourself on a single A100 (see the loading sketch after this list).
  4. Fine-tuning: This is the process of adjusting the model's parameters to optimize it with our own dataset. Though not as intensive as the initial training that we get for free, it can still be significant and requires specialized hardware. We will need to build the capability to do automated retraining of our models for two reasons: the first is to update our model as we acquire more data; the second is that, in general, models are susceptible to degradation over time. Model performance SLIs like accuracy, recall and latency must be monitored and tracked to identify such issues and retrain when necessary (a monitoring sketch follows this list).
  5. Framework: These tools and libraries help us run our LLM in production and make it available via a set of APIs. A framework lets us load and unload our models independently of the actual user interface, and takes care of optimizations like layer fusion, precision calibration, and kernel auto-tuning that help deployed models run more efficiently and effectively in production environments. There are also many different scaling and performance implications here, depending on the number of users and the amount of activity (a minimal serving sketch follows this list).
  6. Interface: This is the actual application experience that our users will have, provided through APIs and UI, serving up our precious prompt, which we have all come to know and love so much.
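To make step 1 concrete, here is a minimal sketch of the sourcing-and-tokenization stage. The repository path, file types, and tokenizer are illustrative assumptions; a real pipeline would also handle deduplication, secrets scrubbing, and incremental updates.

```python
# Sketch of dataset preparation: crawl source files, lightly clean them,
# and tokenize with the same tokenizer the model uses. Paths and file
# types below are placeholders.
from pathlib import Path
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")  # assumed tokenizer

def iter_source_files(root: str, suffixes=(".py", ".md")):
    """Yield cleaned text from code and doc files under a repo checkout."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            yield path.read_text(encoding="utf-8", errors="ignore").strip()

def tokenize_corpus(root: str, max_length: int = 2048):
    """Turn raw files into bounded token sequences for fine-tuning."""
    for text in iter_source_files(root):
        encoded = tokenizer(text, truncation=True, max_length=max_length)
        yield encoded["input_ids"]

# Example: ids = list(tokenize_corpus("/data/repos/internal-service"))
```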
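For step 3, loading a large pre-trained checkpoint is mostly a memory-management exercise. A rough sketch follows; whether it fits on a single A100 depends on the precision you load at (half precision needs roughly two bytes per parameter, and 8-bit or 4-bit quantization shrinks that further).

```python
# Sketch: loading a large pre-trained checkpoint. Assumes transformers,
# torch and accelerate are installed, and that you have access to the
# checkpoint on the Hub (StarCoder requires accepting its license).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/starcoder"  # pre-trained on a large code dataset

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,  # fp16 weights: ~2 bytes per parameter
    device_map="auto",          # let accelerate place layers on available GPUs
)
print(f"{model.num_parameters() / 1e9:.1f}B parameters loaded")
```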
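For step 4, the monitoring half is straightforward to sketch: keep a held-out evaluation set, measure the SLIs on a schedule, and flag the model for retraining when they drift past a threshold. The thresholds and the accuracy heuristic below are placeholders, not recommendations.

```python
# Sketch: track model SLIs against a held-out evaluation set and flag
# when re-fine-tuning is needed. Thresholds are illustrative only.
import time

SLI_THRESHOLDS = {"accuracy": 0.80, "p95_latency_s": 2.0}

def evaluate(generate_fn, eval_set):
    """eval_set: list of (prompt, expected_substring) pairs."""
    latencies, correct = [], 0
    for prompt, expected in eval_set:
        start = time.monotonic()
        completion = generate_fn(prompt)
        latencies.append(time.monotonic() - start)
        correct += int(expected in completion)
    latencies.sort()
    return {
        "accuracy": correct / len(eval_set),
        "p95_latency_s": latencies[int(0.95 * (len(latencies) - 1))],
    }

def needs_retraining(slis):
    return (slis["accuracy"] < SLI_THRESHOLDS["accuracy"]
            or slis["p95_latency_s"] > SLI_THRESHOLDS["p95_latency_s"])
```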
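For step 5, dedicated inference servers (Hugging Face's text-generation-inference, vLLM, and the like) do the heavy lifting in practice, but the shape of the thing is easy to sketch as a thin HTTP layer in front of the model. The FastAPI wrapper below assumes the model and tokenizer from the previous sketch are already loaded; it is an illustration, not a production recipe.

```python
# Sketch: expose the model behind an internal API, decoupled from the UI.
# Assumes fastapi and uvicorn are installed, and that `model` and
# `tokenizer` were loaded as in the previous sketch.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CompletionRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/v1/complete")
def complete(req: CompletionRequest):
    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=req.max_new_tokens)
    return {"completion": tokenizer.decode(outputs[0], skip_special_tokens=True)}

# Run with: uvicorn serve:app --host 0.0.0.0 --port 8000
```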

We call this process LLMOps: the pipeline to continuously leverage updated datasets, evolve models, fine-tune parameters, and deliver our system.

We will need to be able to quickly and easily version, test, verify, upgrade, swap, monitor, and deploy all of these components. We will also need experimentation infrastructure in place to allow us to try different models, test our fine-tuning, and evaluate/validate the results with real people using A/B strategies and control groups. With these capabilities in place, we will have a continuously evolving process that allows us to iterate on our internal LLM deployments.
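One piece of that experimentation infrastructure can start very simply: bucket users deterministically across model variants and log which variant served each request, so SLIs can be compared per variant. The variant names below are hypothetical.

```python
# Sketch: deterministic assignment of users to model variants for A/B tests.
import hashlib

VARIANTS = ["starcoder-base", "starcoder-finetuned-v2"]  # hypothetical deployments

def assign_variant(user_id: str) -> str:
    """Hash the user id so the same user always gets the same variant."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % len(VARIANTS)
    return VARIANTS[bucket]

# assign_variant("jdoe") returns the same variant on every call, which keeps
# the user experience stable and the experiment groups clean.
```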

We are already seeing many vendors entering the LLMOps space to help companies more easily handle these tasks, and leading the pack is Hugging Face with their amazing toolset and model zoo. While many bottlenecks in the current process make the state-of-the-art cumbersome and unwieldy, the industry as a whole is moving forward very quickly, and we can expect to see production LLMs proliferate in the coming years.

Security

Now that we have a model of our LLMOps pipeline, we can ask the pertinent question — how might an attacker exploit it? Little has been published so far on this topic.

Based on my research and experience with running small models, I’ve imagined a list of potential attack vectors, but please note that this is really just a thought experiment. It’s a good way to start, but as stated earlier, we will also have to wait and see what attacks emerge as these systems start getting used in production.

  1. Dataset Attacks: The dataset has obvious security implications, as it is going to be a main target, not only for data theft but also for poisoning attacks during the preparation process. This can be addressed using traditional data protection methods, which I would characterize as a mostly solved problem in the existing infosec domain. Still, like any sensitive dataset, ours needs to be handled carefully.
  2. Malicious Models: For the model, we need to consider malicious model attacks and understand the tampering and repudiation dynamics in the model supply chain: how we can validate that our model is not malicious, that the model we downloaded is the legitimate model, and that it has not been tainted in some way, whether by embedded exploitation code or by being silently re-trained (a minimal integrity-check sketch follows this list).
  3. Parameter Poisoning: Various attacks could be made on the model parameters used to make inferences, forcing them into misclassification patterns. Parameters can be degraded in a number of ways that deny service by making inferences useless or misleading.
  4. Hyperparameter Interference: We need to consider a variety of denial-of-service attacks that involve the hyperparameters of both the model and the fine-tuning process. In this way the system can be de-optimized to the point of making it useless.
  5. Framework Disruption: The infrastructure that we use to deploy our model is highly vulnerable. There are model inversion attacks that can reconstruct training data from the model parameters, and there is the potential for the model to be altered and made unusable or otherwise non-performant.
  6. Interface Attacks: In our example, the application interface is for internal use only, and therefore we need to ensure secure access control is in place. There are already great ways to protect internal applications using Zero Trust Access with IPS, DLP, and more (this happens to be the product I am building at Cisco right now). If the interface is exposed to external users, then we need to be more concerned with prompt injections, data leakage and other attacks, a topic that is already covered elsewhere.
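For the malicious-model case, the most basic control is easy to sketch: pin the artifacts you vetted and verify their hashes before loading anything. The file names and digests below are placeholders; in practice you would record them when you first vet a model, or lean on the revision pinning and checksums your model hub already provides.

```python
# Sketch: verify downloaded model artifacts against pinned SHA-256 digests
# before loading. File names and digest values are placeholders.
import hashlib
from pathlib import Path

PINNED_DIGESTS = {
    "model-00001-of-00002.safetensors": "<expected sha256>",
    "model-00002-of-00002.safetensors": "<expected sha256>",
}

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model_dir(model_dir: str) -> None:
    for name, expected in PINNED_DIGESTS.items():
        if sha256_of(Path(model_dir) / name) != expected:
            raise RuntimeError(f"Integrity check failed for {name}")

# Only load weights after verify_model_dir() passes, and prefer safetensors
# over pickle-based formats, which can execute arbitrary code when loaded.
```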

Overall, empowering our businesses with in-house LLMs means developing a good understanding of LLMOps and creating a taxonomy of vulnerabilities for the LLMOps pipeline. Some resources that I have found interesting as I start my own journey:

First, for a comprehensive list of AI resources, nothing beats A16Z’s new article, AI Canon.

On LLMOps, check out:

On LLM Security:

Finally, you should definitely check out the papers for the major models to understand how they were built: BLOOM, LLaMA, Vicuna, GPT-4 and more.

The potential for the proliferation of in-house LLMs is increasing every day. Expect this to be a ripe space for investment as AI adoption increases. As their value gets proven out, we will see both LLMOps and LLM Security become exciting ecosystems with their own hype cycles. You can get ahead of the curve today by experimenting on your own. If you do, reach out! I would love to hear from you.

Written by John Rauser

Director of Engineering @ Cisco Umbrella
