Last Updated

Oct 2, 2024

How to build an AI Agent for SRE (Part 1)

Eric Abruzzese

Software Engineer, Aptible AI

Get started building your AI SRE Agent

In this guide, we’re going to cover:

  1. Some background: why so many companies (including Aptible) are investing in building an AI Agent to supplement incident response

  2. Some philosophical considerations: decisions we made with Aptible AI, what we’ve built so far, and a few pro tips

  3. Build your own agent 101: a basic step-by-step lab to get you started on your own bot (code snippets and all)

Note: this is the first of three guides, so stay tuned!

Why use an AI Agent for incident response?

Incident response is messy. Knowledge silos force you to rely on a handful of subject matter experts and tenured engineers; SREs waste time on manual investigations during incidents at the expense of high-impact work; and all of this leads to more downtime, higher MTTR, and potentially lost revenue. Aptible isn’t the first team to feel those pains, and we won’t be the last. But we’ve found a better way by implementing an AI Agent to act as an assistant to our SREs. Here’s why:

AI is really good at mundane, time-consuming tasks like quickly retrieving information from disparate sources, summarizing huge amounts of data, and pattern matching. These strengths, when combined with human instinct, logic, and decision-making skills, make AI well suited to augmenting on-call engineering teams during incidents.

At Aptible, we saw the potential here for our own incident response processes and decided to build an AI SRE Agent to help lower MTTR, eliminate knowledge silos, and decrease reliance on our most senior engineers.

We’ve spoken to quite a few engineering leads who are building the same type of tool for their teams, and we can all agree that building any kind of nontrivial AI assistant is hard for several reasons:

  • Given that AI is still so new, there are few established standards or best practices

  • Frameworks are changing all the time

  • So much information online is already outdated (or will be soon)

Because the demand for an AI Agent that actually works is so high — and since we’ve put so much time into building our own already — we’re going to share a detailed explanation of what we’ve built so far, the main considerations we had to take into account, and suggestions on how you can do it yourself.

What we’ve built so far + decisions we’ve made along the way


  1. Is there already a tool for that?

🤔 Considerations: If you don’t need to build something new, then you shouldn’t. There are so. many. tools on the market. And depending on your specific use case, you may be able to find one that works for you.

The impetus for Aptible’s search, and the eventual development of our own incident response tool, was that we struggled with knowledge silos and tended to rely on three or four subject matter experts any time we encountered an issue with a particular system.

So we started with an open source tool called Danswer to improve information retrieval (similar to its popular commercial competitor, Glean). It plugged directly into Slack and could retrieve answers to natural language questions from various information sources. The problem was that it was limited to indexed data only (i.e., just our docs and our chat history).

⚒️ What we built: What we needed was a tool that could integrate with all our other systems (not just docs and Slack). We needed to retrieve logs and metrics, application health status, and generate reports and postmortems after our incidents were over. So we designed an AI Agent that’s essentially built on a series of integrations that allow you to connect with both real-time and indexed data. More on that later!

💡 Pro tip:
Before you decide to build your own product, look into what’s already available. A great place to start might be crowdsourcing ideas from Reddit (check out this thread, for one) or checking out some of the open source tools out there (here’s a good place to start on GitHub if you’re looking for incident response tools specifically). There’s also a long list of open source AI Agents that you could start using tomorrow.


  2. What’s your integration strategy?

🤔 Considerations: As mentioned above, the biggest consideration here is: what sort of information do you need your Agent to have access to? You could maybe get away with simply integrating it with third-party providers via an API, but if you need the integration to be more specific to your needs then you’ll need to be more thoughtful with how your integrations work.

By carefully considering what you’ll need to integrate with before you start building, you’ll save yourself some headache later on. Do you need your Agent to be able to execute custom scripts to query your databases? Do you need real-time retrieval of logs and metrics, and how will you design the Agent to retrieve that information? Will it return the link to the source? Will it return a chunk of lines of logs that you still have to manually sift through, or will it be able to deduce where the anomaly may be?

⚒️ What we built: At its core, Aptible AI is built on a series of integrations. An integration is more than just a connection to a third-party provider; it’s also a collection of configurations that are unique to how our team uses that provider. For example, Aptible AI supports multiple integrations for the same provider since we may want to use that provider in different ways. Different teams use Datadog differently and care about different metrics or use different tags, so each team can configure its own integration to the same tool in the way that it needs.

Aptible AI supports a range of common SRE tooling, including:

  • Chat and other highly synchronous communications

  • Documentation and other knowledge repositories

  • Observability

  • Alerting

The actual implementation of these integrations fits into one of three categories of customizability:

  1. For starters, you have a basic integration that requires no customization (PagerDuty is one example). Since it’s just pulling data from PagerDuty and adding it to the AI’s context, every single team that leverages the PagerDuty integration uses it in the same way.

  2. Next, we have more customizable integrations (like the Datadog example from before) that are built on top of a generic InfluxDB integration but customized to the specific use cases of looking up container metrics and looking up restart activity.

  3. Finally, there are fully custom tools that would likely make no sense to anyone outside of Aptible (an example here would be our integration that gets containers for an application). These are entirely specific to how we run our infrastructure and can be implemented either by a lightweight PubSub interface or a websocket-based “safe” proxy.
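To make the idea of "an integration is a provider plus team-specific configuration" more concrete, here is a minimal sketch in Python. It is not how Aptible AI is implemented; the class names, field names, and the tag and metric values are all hypothetical and only illustrate how one provider can back several differently configured integrations.

```python
from dataclasses import dataclass, field


@dataclass
class Integration:
    """One configured use of a provider (all field names are hypothetical)."""
    name: str          # unique name the agent sees, e.g. "datadog-platform"
    provider: str      # the third-party provider behind it, e.g. "datadog"
    instructions: str  # prompt text telling the model when and how to use it
    config: dict = field(default_factory=dict)  # team-specific settings


class IntegrationRegistry:
    """Stores integrations by name so one provider can appear many times."""

    def __init__(self) -> None:
        self._by_name: dict[str, Integration] = {}

    def register(self, integration: Integration) -> None:
        if integration.name in self._by_name:
            raise ValueError(f"duplicate integration name: {integration.name}")
        self._by_name[integration.name] = integration

    def for_provider(self, provider: str) -> list[Integration]:
        return [i for i in self._by_name.values() if i.provider == provider]


registry = IntegrationRegistry()
# Two teams share one provider but care about different tags and metrics.
# The tag and metric strings are placeholders, not real settings.
registry.register(Integration(
    name="datadog-platform", provider="datadog",
    instructions="Use for container CPU and memory questions.",
    config={"tags": ["team:platform"], "metrics": ["example.cpu"]},
))
registry.register(Integration(
    name="datadog-data", provider="datadog",
    instructions="Use for database latency questions.",
    config={"tags": ["team:data"], "metrics": ["example.latency"]},
))
# A basic integration needs no customization at all.
registry.register(Integration(
    name="pagerduty", provider="pagerduty",
    instructions="Use to look up the current incident and who is on call.",
))

datadog_names = [i.name for i in registry.for_provider("datadog")]
```

Keeping the `instructions` text next to the configuration means the prompt describing an integration travels with it, which makes it easier to expose only the few integrations that are relevant to a given team.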

💡 Pro tip:
Less is more! If you give the model too many tools to choose from, it can start choosing incorrect tools and confuse itself. More on that in the next section.


  3. So many models, how do you pick one?!

🤔 Considerations: Here’s the thing with models… new ones pop up every day, and there are several considerations to keep in mind when choosing one (mainly to do with your specific use cases). Should you self-host? Do you need your Agent to be conversational or task-based or both? Will it be conducting simple or complex tasks? Do you need real-time performance?

There’s no need for us to go through all the models that exist since that content is already all over the place (if you want a deep dive, this is a great resource), but we can walk through the decisions that we had to make when building Aptible AI and the options we considered.

It’s a tricky process because you can’t really avoid tradeoffs. If you need your Agent to conduct complex tasks, then you’ll have to sacrifice a bit on speed and cost.

The model’s size, capability, and architecture depend heavily on whether the tasks require simple classification or highly complex reasoning and interaction. If simple, a smaller, lightweight model like a decision tree, random forest, or simple neural network would suffice. If more complex, then you may consider a more powerful model like GPT-4, BERT, or a similar transformer-based architecture.

If you choose to self-host to avoid the security headache, you’ll likely have to sacrifice on features and functionality since your self-hosted version will lag behind the hosted options.

If you need your Agent to be trained on domain-specific knowledge, then you’ll need to curate or create your own datasets for fine-tuning. See if you can get away with using a pre-trained model that’s already been trained on large datasets to avoid the data quality issue (though this may be impossible depending on the data you need your Agent to have access to).

⚒️ What we built: We’re currently using GPT-4o for Aptible AI because we believe that it’s most likely to give us the highest quality answers. However, we recognize that customers using Aptible AI may want to use their own models (including self-hosted models). As such, we’re keeping that in mind as we build.

💡 Pro tip:
Your Agent will only be as smart as the information that you give it. LLMs need help understanding how and when to use the information you give them, and if you don’t give them instructions on how to interpret that information, they’ll just make something up. Spend real effort upfront curating the information you feed to your LLM!


  4. What about prompting techniques?

🤔 Considerations: You might be tempted to retrieve as much data as possible (documentation, Slack conversations, code repositories, issue trackers, etc.), throw it all at a RAG application, and ask it questions. But in our experience, there’s almost always going to be too much noise for this to be useful. That’s where prompt engineering comes in.

We’ve alluded to this already, but prompt engineering is a critical piece of the puzzle here (for a great overview on prompting techniques, check this out). The better your prompt engineering, the better your Agent will be.

For context, here are a few techniques that we considered (over time) when building Aptible AI:

  • Zero-shot prompting: this is what most people do when they talk to ChatGPT; they just ask it a question and get a response. If the response is bad, they ask the question differently.

  • Few-shot prompting: this is what slightly-more-experienced people do when talking to ChatGPT; they ask it a question and include examples of the output they want. You might use zero- and/or few-shot prompting for very simple tasks that the underlying model already knows how to do.

  • Retrieval Augmented Generation (RAG): this is a technique that allows the model to retrieve additional context and use it to answer the question. This is particularly useful for AI-powered document search (see also: Glean and Danswer).

  • ReAct: this technique allows an agent to generate “thoughts” and take “actions” in an iterative way to solve a problem, most similar to human reasoning. ReAct is great for moderately complex problems, like navigating references through documentation and tools in real time to compose an answer.
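The first three techniques differ mainly in what gets packed into the prompt, which a short sketch can show. The example below assumes the common role/content chat message format; the function names, the sample questions, and the retrieved text are made up for illustration, and no model is actually called.

```python
def zero_shot(question: str) -> list[dict]:
    """Just the question, with no examples and no extra context."""
    return [{"role": "user", "content": question}]


def few_shot(question: str, examples: list[tuple[str, str]]) -> list[dict]:
    """Prepend worked question/answer pairs so the model can copy the format."""
    messages = []
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": question})
    return messages


def rag(question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Put retrieved context in front of the question.

    Retrieval itself (search, embeddings, and so on) is out of scope here;
    the chunks are assumed to have been fetched already.
    """
    context = "\n\n".join(retrieved_chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return [{"role": "user", "content": prompt}]


question = "How do I restart the api service?"
zero = zero_shot(question)
few = few_shot(question, [
    ("How do I restart the worker service?", "Summary: ...\nSteps: ..."),
    ("How do I scale the web service?", "Summary: ...\nSteps: ..."),
])
grounded = rag(question, ["Hypothetical runbook: restart services from the dashboard."])
```

ReAct does not fit in a single prompt builder because it is a loop: the model's output decides which action runs next, and the result of that action is fed back in.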

An important thing to keep in mind is that you can mix and match with these techniques (we’ll cover the multi-agent approach next). Here’s what we did…

⚒️ What we built: Because Aptible AI has a multi-agent structure (more on that later), we’ve implemented a mix of ReAct and RAG depending on the complexity of the task/question.

So when you ask the AI a question, we hand off all of the integrations (with instructions on how to use them) to the AI. The AI then makes decisions about what tools to call based on the information it has available to it. After each integration call, the AI has the option of deciding it has enough information to provide an answer, or deciding that additional integrations are relevant and could potentially yield additional information.
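The loop described above can be sketched roughly as follows. This is a simplified illustration, not Aptible AI's code: `decide` stands in for the LLM call, the action dictionary shape is an assumption, and `scripted_decide` is a fake that lets the sketch run without a model.

```python
from typing import Callable

Tool = Callable[[str], str]
Observation = tuple[str, str]  # (integration name, its output)


def run_agent(
    question: str,
    tools: dict[str, Tool],
    decide: Callable[[str, list[Observation]], dict],
    max_steps: int = 5,
) -> str:
    """Call integrations until the model decides it can answer."""
    observations: list[Observation] = []
    for _ in range(max_steps):
        action = decide(question, observations)
        if action["type"] == "answer":
            return action["text"]
        # Otherwise the model asked for an integration call.
        tool = tools[action["tool"]]
        observations.append((action["tool"], tool(action["input"])))
    # The step limit keeps a confused model from looping forever.
    return "I could not find enough information to answer."


def scripted_decide(question: str, observations: list[Observation]) -> dict:
    """Stand-in for the LLM: call the logs integration once, then answer."""
    if not observations:
        return {"type": "call", "tool": "logs", "input": "app=api"}
    name, output = observations[0]
    return {"type": "answer", "text": f"Based on {name}: {output}"}


tools = {"logs": lambda query: f"3 OOM kills for {query}"}
answer = run_agent("Why did api restart?", tools, scripted_decide)
print(answer)  # Based on logs: 3 OOM kills for app=api
```

In a real agent, `decide` would send the question, the integration instructions, and the observations so far to the model and parse its reply into one of the two action types.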

Throughout the process, we’re trying to help the AI make better decisions about what integrations to leverage via a few different mechanisms:

  • Extensive prompt engineering for the integrations, to make sure it’s really clear when and how to use each integration, as well as how to interpret the output.

  • We’ve built a self-rating system that asks the AI to self-rate the value of the response from an integration. Even when the AI makes a dumb decision in calling an integration (or provides bad inputs), it’s typically able to recognize that after the fact if you ask it to self-rate whether or not the output of the integration was useful. We can then use that to influence how much a specific output factors into a response. We can also block the AI from proceeding if it’s consistently making bad decisions.

  • We’ve implemented a Naïve Bayes approach based on past experience. For example, if most of the time you call integration A and then B, and that yields useful results, it’s probably useful to continue doing so. The Agent can also compare against previous similar incidents to further narrow down which integrations are useful, and when, in specific scenarios.
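As a rough illustration of the self-rating mechanism, the sketch below filters integration outputs by the rating the model gave them and stops the run after a streak of poor ratings. The 1 to 5 scale, the thresholds, and all names are assumptions made for this example; the ratings themselves would come from asking the model, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class RatedOutput:
    integration: str
    output: str
    rating: int  # 1 (useless) to 5 (very useful), as reported by the model


def select_context(outputs: list[RatedOutput], min_rating: int = 3) -> list[RatedOutput]:
    """Keep only useful outputs, most useful first, for the final answer."""
    useful = [o for o in outputs if o.rating >= min_rating]
    return sorted(useful, key=lambda o: o.rating, reverse=True)


def should_block(outputs: list[RatedOutput], window: int = 3, min_rating: int = 3) -> bool:
    """Stop the agent if its last `window` calls were all rated as unhelpful."""
    recent = outputs[-window:]
    return len(recent) == window and all(o.rating < min_rating for o in recent)


history = [
    RatedOutput("docs", "Runbook for api restarts", rating=4),
    RatedOutput("metrics", "No data for that tag", rating=1),
    RatedOutput("logs", "3 OOM kills in the last hour", rating=5),
]
context = [o.integration for o in select_context(history)]
print(context)                # ['logs', 'docs']
print(should_block(history))  # False
```

The same rating history is also the kind of signal a statistical model could learn from when deciding which integration to try next.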

💡 Pro tip:
To avoid nonsense answers that sound correct but aren’t, be sure to take a step back and consider where your most useful information typically comes from for the problems that you’re trying to solve with AI – then design your Agent based on that.


  5. Multi-agent or single agent?

🤔 Considerations: Multi-agent approaches are becoming more popular, but they can be complicated and potentially unnecessary depending on your use case. It can be quite useful to have a team of agents working together with different techniques to solve complex problems.

For example, if you ask your bot a question in Slack that has nothing to do with your specific infrastructure (maybe you just want to know who won the World Series in 1995), you could have an Agent built on zero-shot prompting to simply act as a ChatGPT that’s integrated with your Slack (or wherever you have it).

But if your question or need is complex, it would be useful to have a team of Agents that basically act as your little research team, gathering and analyzing data from disparate sources in an intelligent way.

⚒️ What we built: Aptible AI uses a multi-agent approach, starting with a broker Agent that determines what type of question or task needs to be addressed.
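A broker of this kind can be sketched as a small router. The version below is only an illustration under stated assumptions: in practice the classification step would likely be an LLM call, but here a hypothetical keyword check (`keyword_classify`) stands in so the example runs on its own, and the two handlers are stubs.

```python
from typing import Callable

Handler = Callable[[str], str]


def make_broker(
    classify: Callable[[str], str],
    handlers: dict[str, Handler],
    default: str,
) -> Handler:
    """Build a broker that labels each question and routes it to one agent."""
    def broker(question: str) -> str:
        label = classify(question)
        # Unknown labels fall back to the default agent.
        handler = handlers.get(label, handlers[default])
        return handler(question)
    return broker


def keyword_classify(question: str) -> str:
    """Stand-in classifier; the keyword list is made up for this example."""
    infra_words = ("deploy", "container", "latency", "incident", "database")
    lowered = question.lower()
    return "infrastructure" if any(w in lowered for w in infra_words) else "general"


broker = make_broker(
    classify=keyword_classify,
    handlers={
        "general": lambda q: f"[general] {q}",           # e.g. a zero-shot agent
        "infrastructure": lambda q: f"[research] {q}",   # e.g. a ReAct agent with tools
    },
    default="general",
)

print(broker("Who won the World Series in 1995?"))
print(broker("Why is the api container restarting?"))
```

Because the broker only depends on a label and a handler table, starting with a single handler and adding more later is a small change.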

💡 Pro tip: It’s easier to refactor into a multi-agent approach than out of it! So make sure you need it before you start building your Agent that way.


  6. Can’t forget about security…

🤔 Considerations: Here’s a topic that comes up a lot when we chat with early users of Aptible AI. Most engineering teams eventually have to face their security team when it comes to implementing new tools, and it’s critical to ensure that the data is safe (especially if you’re working in a highly regulated industry). So the first thing you have to do is know your organization’s AI security policy; then there are a few things you can do to protect against potential data leaks or external threats.

⚒️ What we built: For starters, we use a model that doesn’t train on our data. We're still doing a lot of discovery around what customers need regarding security, whether that's self-hosting or something else! Stay tuned.

💡 Pro tip:
Be careful with the data you give your AI access to or include in prompts, especially if that data shouldn’t be shared with the end user! If you need to include unpredictable data like logs, consider using a tool like Nightfall to ensure that what’s passed to the LLM and end users is sanitized.
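To show where sanitization sits in the flow, here is a deliberately simple, regex-based sketch that redacts a few obvious patterns from log lines before they reach a prompt. It is not Nightfall's API and is far less thorough than a dedicated tool; the patterns and placeholder labels are assumptions chosen for the example.

```python
import re

# (pattern, replacement) pairs applied in order. These cover only a few
# obvious cases and will miss plenty of real-world sensitive data.
REDACTIONS = [
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    (re.compile(r"(?i)\b(password|token|secret|api_key)=\S+"), r"\1=[REDACTED]"),
]


def sanitize(text: str) -> str:
    """Redact known-sensitive patterns before text is sent to the LLM."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


log_line = "login failed for bob@example.com from 10.0.0.12 token=abc123"
print(sanitize(log_line))
# login failed for [EMAIL] from [IP] token=[REDACTED]
```

Running the same function over the model's answer before it is shown to the user covers the second half of the tip: sanitizing what reaches end users, not only what reaches the LLM.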


  7. Oh, and of course, it needs to be usable!

🤔 Considerations: How do you plan to use your Agent? Does it need to have a UI? Will it be used across the organization?

You likely don’t need to spend time reinventing the wheel when it comes to the UX around your bot. Frameworks like Chainlit, Gradio, and Streamlit give you out-of-the-box tools for building user interfaces and/or integrating with your other workflow tools like Slack. Use one of these tools to start so that you can focus on getting good answers out of your Agent!

⚒️ What we built: Because our Agent was built specifically for incident response, and because we handle incidents within Slack, we mainly use Slack as our UI. It has its limitations, though, so we do our best to work around them (e.g., instead of showing that the Agent is responding by mimicking typing as seen in ChatGPT, the bot reacts to the question in Slack with an 👀 emoji). We also designed a web UI for configuration, reporting, auditing, and analytics.

💡 Pro tip:
Be sure to keep your LLM code as decoupled as you can, so that you can easily refactor away into another UX if the need arises.
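One way to keep that separation is to have the agent know nothing about the chat surface and to put everything UI-specific behind a small interface. The sketch below is an assumption about how this could look, not a description of Aptible AI: `ChatSurface`, `handle_message`, and `InMemorySurface` are hypothetical names, and a real Slack adapter would implement the same two methods with the Slack SDK.

```python
from typing import Protocol


class Agent:
    """LLM-facing logic only; no Slack or web concepts in here."""

    def answer(self, question: str) -> str:
        # Placeholder for the real model and integration calls.
        return f"Looking into: {question}"


class ChatSurface(Protocol):
    """Everything the agent needs from a UI, and nothing more."""

    def acknowledge(self, message_id: str) -> None: ...
    def reply(self, message_id: str, text: str) -> None: ...


def handle_message(agent: Agent, surface: ChatSurface, message_id: str, question: str) -> None:
    """Glue code: the only place where the agent and the UI meet."""
    surface.acknowledge(message_id)  # e.g. an 👀 reaction in Slack
    surface.reply(message_id, agent.answer(question))


class InMemorySurface:
    """Test double that records what a real UI adapter would send."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str, str]] = []

    def acknowledge(self, message_id: str) -> None:
        self.events.append(("ack", message_id, ""))

    def reply(self, message_id: str, text: str) -> None:
        self.events.append(("reply", message_id, text))


surface = InMemorySurface()
handle_message(Agent(), surface, "msg-1", "Why is api slow?")
```

Swapping Slack for a web UI then means writing one new adapter class, and the in-memory version doubles as a way to test the agent without any chat platform.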

Okay, let’s move on from the theoretical talk about models, techniques, and frameworks! Time to get your hands dirty and start building your own Agent.

Get started building your AI SRE Agent

In this guide, we’re going to cover:

  1. Some background: why so many companies (including Aptible) are investing in building an AI Agent to supplement incident response

  2. Some philosophical considerations: decisions we made with Aptible AI, what we’ve built so far, and a few pro tips

  3. Build your own agent 101: a basic step-by-step lab to get you started on your own bot (code snippets and all)

Note: this is the first of three guides, so stay tuned!

Why use an AI Agent for incident response?

Incident response is messy. Knowledge silos end up forcing you to rely on a handful of subject matter experts and tenured engineers; SREs waste time on manual investigations during incidents at the expense of high impact work; and all of this leads to more downtime, higher MTTR, and potentially lost revenue. Aptible isn’t the first team to feel those pains, and we won’t be the last. But we’ve found a better way by implementing an AI Agent to act as an assistant to our SREs. Here’s why:

AI is really good at mundane, time consuming tasks like quickly retrieving information from disparate sources, summarizing huge amounts of data, and pattern matching. These benefits — when combined with human instinct, logic, and decision making skills — make AI ripe for augmenting on-call engineering teams during incidents.

At Aptible, we saw the potential here for our own incident response processes and decided to build an AI SRE Agent to help lower MTTR, eliminate knowledge silos, and decrease reliance on our most senior engineers.

We’ve spoken to quite a few engineering leads who are building the same type of tool for their teams, and we can all agree that building any kind of nontrivial AI assistant is hard for several reasons:

  • Given that AI is still so new, there are few established standards or best practices

  • Frameworks are changing all the time

  • So much information online is already outdated (or will be soon)

Because the demand for an AI Agent that actually works is so high — and since we’ve put so much time into building our own already — we’re going to share a detailed explanation of what we’ve built so far, the main considerations we had to take into account, and suggestions on how you can do it yourself.

What we’ve built so far + decisions we’ve made along the way


  1. Is there already a tool for that?

🤔 Considerations: If you don’t need to build something new, then you shouldn’t. There are so. many. tools on the market. And depending on your specific use case, you may be able to find one that works for you.

The impetus for Aptible’s search and eventual development of our own incident response was that we struggled with knowledge silos and had a tendency to rely on three or four subject matter experts any time we encountered an issue with a particular system.

So we started with an open source tool called Danswer to improve information retrieval (similar to its popular commercial competitor, Glean). It plugged directly into Slack and could retrieve answers to natural language questions from various information sources. The problem was that it was limited to indexed data only (i.e., just our docs and our chat history).

⚒️ What we built: What we needed was a tool that could integrate with all our other systems (not just docs and Slack). We needed to retrieve logs and metrics, application health status, and generate reports and postmortems after our incidents were over. So we designed an AI Agent that’s essentially built on a series of integrations that allow you to connect with both real-time and indexed data. More on that later!

💡 Pro tip:
Before you decide to build your own product, look into what’s already available. A great place to start might be crowdsourcing ideas from Reddit (check out this thread, for one) or checking out some of the open source tools out there (here’s a good place to start in github if you’re looking for incident response tools specifically). There’s also a long list of open source AI Agents that you could start using tomorrow.


  1. What’s your integration strategy?

🤔 Considerations: As mentioned above, the biggest consideration here is: what sort of information do you need your Agent to have access to? You could maybe get away with simply integrating it with third-party providers via an API, but if you need the integration to be more specific to your needs then you’ll need to be more thoughtful with how your integrations work.

By carefully considering what you’ll need to integrate with before you start building, you’ll save yourself some headache later on. Do you need your Agent to be able to execute custom scripts to query your databases? Do you need real-time retrieval of logs and metrics, and how will you design the Agent to retrieve that information? Will it return the link to the source? Will it return a chunk of lines of logs that you still have to manually sift through, or will it be able to deduce where the anomaly may be?

⚒️ What we built: At its core, Aptible AI is built on a series of integrations. An integration is more than just a connection to a third-party provider, it’s also a collection of configurations that are unique to how our team uses that provider. For example, Aptible AI supports multiple integrations for the same provider since we may want to use that provider in different ways. Different teams use Datadog differently and care about different metrics or use different tags, so each team can use the integration to the same tool in the way that they need.

Aptible AI supports a range of common SRE tooling, including:

  • Chat and other highly synchronous communications

  • Documentation and other knowledge repositories

  • Observability

  • Alerting

The actual implementation of these integrations fits into one of three categories of customizability:

  1. For starters, you have a basic integration that requires no customization (PagerDuty is one example). Since it’s just pulling data from PagerDuty and adding it to the AI’s context, every single team that leverages the PagerDuty integration uses it in the same way.

  2. Next, we have more customizable integrations (like the Datadog example from before) that are built on top of a generic InfluxDB integration but customized to the specific use cases of looking up container metrics and looking up restart activity.

  3. Finally, there are fully custom tools that would likely make no sense to anyone outside of Aptible (an example here would be our integration that gets containers for an application). These are entirely specific to how we run our infrastructure and can be implemented either by a lightweight PubSub interface or a websocket-based “safe” proxy.

💡 Pro tip:
Less is more! If you give the model too many tools to choose from, it can start choosing incorrect tools and confuse itself. More on that in the next section


  1. So many models, how do you pick one?!

🤔 Considerations: Here’s the thing with models… new ones pop up every day, and there are several considerations to keep in mind when choosing one (mainly to do with your specific use cases). Should you self-host? Do you need your Agent to be conversational or task-based or both? Will it be conducting simple or complex tasks? Do you need real-time performance?

There’s no need for us to go through all the models that exist since that content is already all over the place (if you want a deep dive, this is a great resource), but we can walk through the decisions that we had to make when building Aptible AI and the options we considered.

It’s a tricky process because you can’t really avoid tradeoffs. If you need your Agent to conduct complex tasks, then you’ll have to sacrifice a bit on speed and cost.

The model’s size, capability, and architecture depend heavily on whether the tasks require simple classification or highly complex reasoning and interaction. If simple, a smaller, lightweight model like a decision tree, random forest, or simple neural network would suffice. If more complex, then you may consider a more powerful model like GPT-4, BERT, or a similar transformer-based architecture.

If you choose to self-host to avoid the security headache, you’ll likely have to sacrifice on features and functionality since your self-hosted version will lag behind the hosted options.

If you need your Agent to be trained on domain-specific knowledge, then you’ll need to curate or create your own datasets for fine-tuning. See if you can get away with using a pre-trained model that’s already been trained on large datasets to avoid the data quality issue (though this may be impossible depending on the data you need your Agent to have access to).

⚒️ What we built: We’re currently using GPT-4o for Aptible AI because we believe that it’s most likely to give us the highest quality answers. However, we recognize that customers using Aptible AI may want to use their own models (including self-hosted models). As such, we’re keeping that in mind as we build.

💡 Pro tip:
Your Agent will only be as smart as the information that you give it. LLMs need help understanding how and when to use the information you give it, and if you don’t give it instructions on how to interpret information, it’ll just make something up. Spend real effort upfront curating the information you feed to your LLM!


  1. What about prompting techniques?

🤔 Considerations: You might be tempted to retrieve as much data as possible (documentation, Slack conversations, code repositories, issue trackers, etc.), throw it all at a RAG application**,** and ask it questions. But in our experience, there’s almost always going to be too much noise for this to be useful. That’s where prompt engineering comes in.

We’ve alluded to this already, but prompt engineering is a critical piece of the puzzle here (for a great overview on prompting techniques, check this out). The better your prompt engineering, the better your Agent will be.

For context, here are a few that we considered (over time) when building Aptible AI:

  • Zero-shot prompting: this is what most people do when they talk to ChatGPT; they just ask it a question then they get a response. If the response is bad, then they just ask the question differently.

  • Few-shot prompting: this is what slightly-more-experienced people do when talking to ChatGPT; they ask it a question and include examples of the output they want. You might use zero- and/or few-shot prompting for very simple tasks that the underlying model already knows how to do.

  • Retrieval Augmented Generation (RAG): this is a technique that allows the model to retrieve additional context and use it to answer the question. This is particularly useful for AI-powered document search (see also: Glean and Danswer).

  • ReAct: this technique allows an agent to generate “thoughts” and take “actions” in an iterative way to solve a problem, most similar to human reasoning. ReAct is great for moderately complex problems, like navigating references through documentation and tools in real time to compose an answer.

An important thing to keep in mind is that you can mix and match with these techniques (we’ll cover the multi-agent approach next). Here’s what we did…

⚒️ What we built: Because Aptible AI has a multi-agent structure (more on that later), we’ve implemented a mix of ReAct and RAG depending on the complexity of the task/question.

So when you ask the AI a question, we hand off all of the integrations (with instructions on how to use them) to the AI. The AI then makes decisions about what tools to call based on the information it has available to it. After each integration call, the AI has the option of deciding it has enough information to provide an answer, or deciding that additional integrations are relevant and could potentially yield additional information.

Throughout the process, we’re trying to help the AI make better decisions about what integrations to leverage via a few different mechanisms:

  • Extensive prompt engineering for the integrations, to make sure it’s really clear when and how to use each integration, as well as how to interpret the output.

  • We’ve built a self-rating system that asks the AI to self-rate the value of the response from an integration. Even when the AI makes a dumb decision in calling a integration (or provides bad inputs), it’s typically able to recognize that after the fact if you ask it to self-rate whether or not the output of the integration was useful. We can then use that to influence how much a specific output factors into a response. We can also block the AI from proceeding if it’s consistently making bad decisions.

  • We’ve implemented Naïve Bayes based on past experience. For example, if most of the time you call integration A and then B, and that yields useful results, it’s probably useful to continue doing so. The Agent can also use things like comparing to previous similar incidents to further narrow what integrations are useful, and when, in specific scenarios.

💡 Pro tip:
To avoid nonsense answers that sound correct but aren’t, be sure to take a step back and consider where your most useful information typically comes from for the problems that you’re trying to solve with AI – then design your Agent based on that.


  1. Multi-agent or single agent?

🤔 Considerations: Multi-agent approaches are becoming more popular, but they can be complicated and potentially unnecessary depending on your use case. It can be quite useful to have a team of agents working together with different techniques to solve complex problems.

For example, if you ask your bot a question in Slack that has nothing to do with your specific infrastructure (maybe you just want to know who won the World Series in 1995), you could have an Agent built on zero-shot prompting to simply act as a ChatGPT that’s integrated with your Slack (or wherever you have it).

But if your question or need is complex, it would be useful to have a team of Agents that basically act as your little research team, gathering and analyzing data from disparate sources in an intelligent way.

⚒️ What we built: Aptible AI uses a multi-agent approach, starting with a broker Agent that determines what type of question or task needs to be addressed.

💡 Pro tip: It’s easier to refactor into a multi-agent approach than out of it! So make sure you need it before you start building your Agent that way.


  1. Can’t forget about security…

🤔 Considerations: Here’s a topic that comes up a lot when we chat with Aptible AI early users. Most engineering teams eventually have to face their security team when it comes to implementing new tools, and it’s critical to ensure that the data is safe (especially if you’re working in a highly regulated industry). So the first thing you have to do is to know your organization’s AI security policy, then there are a few things you can do to protect against potential data leaks or external threats.

⚒️ What we built: For starters, we use a model that doesn’t train on our data. We're still doing a lot of discovery around what customers need regarding security, whether that's self-hosting or something else! Stay tuned.

💡 Pro tip:
Be careful with the data you give your AI access to or include in prompts, especially if that data shouldn’t be shared with the end user! If you need to include unpredictable data like logs, consider using a tool like Nightfall to ensure that what’s passed to the LLM and end users is sanitized.
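If you just want to see the shape of a sanitization step, the sketch below redacts a few obvious patterns from a log line before it goes anywhere near a prompt. It is a simple regex-based stand-in, not Nightfall’s API, and the patterns and the `sanitize` helper are assumptions chosen for illustration; a production setup would need far broader coverage.

```python
import re

# Hypothetical redaction rules, applied in order: (pattern, replacement).
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "[IP]"),
    (re.compile(r"(?i)\b(password|token|secret|api_key)=\S+"), r"\1=[REDACTED]"),
]


def sanitize(line: str) -> str:
    """Replace anything matching a redaction rule before the text reaches the LLM."""
    for pattern, replacement in REDACTIONS:
        line = pattern.sub(replacement, line)
    return line


raw = "login failed for bob@example.com from 10.0.0.12 token=abc123"
print(sanitize(raw))
# login failed for [EMAIL] from [IP] token=[REDACTED]
```

Running the same function over the model’s output before it is shown to users covers the second half of the tip.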


  7. Oh, and of course, it needs to be usable!

🤔 Considerations: How do you plan to use your Agent? Does it need to have a UI? Will it be used across the organization?

You likely don’t need to spend time reinventing the wheel when it comes to the UX around your bot. Frameworks like Chainlit, Gradio, and Streamlit give you out-of-the-box tools for building user interfaces and/or integrating with your other workflow tools like Slack. Use one of these tools to start so that you can focus on getting good answers out of your Agent!

⚒️ What we built: Because our Agent was built specifically for incident response, and because we handle incidents within Slack, we mainly use Slack as our UI. It has its limitations, though, so we do our best to work around them (e.g., instead of showing that the Agent is responding by mimicking typing as seen in ChatGPT, the bot reacts to the question in Slack with an 👀 emoji). We also designed a web UI for configuration, reporting, auditing, and analytics.
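The 👀 acknowledgement can be sketched in a few lines. This assumes the `reactions_add` method exposed by `slack_sdk`’s `WebClient`; the `acknowledge` helper, the `ReactionClient` protocol, and the `RecordingClient` test double are hypothetical names used so the example runs without contacting Slack.

```python
from typing import Any, Protocol


class ReactionClient(Protocol):
    """The one method this sketch needs; slack_sdk's WebClient provides it."""

    def reactions_add(self, *, channel: str, name: str, timestamp: str) -> Any: ...


def acknowledge(client: ReactionClient, channel: str, message_ts: str) -> None:
    """React with the eyes emoji so the user knows the question was seen."""
    client.reactions_add(channel=channel, name="eyes", timestamp=message_ts)


class RecordingClient:
    """Hypothetical test double that records calls instead of calling Slack."""

    def __init__(self) -> None:
        self.calls: list[dict[str, str]] = []

    def reactions_add(self, *, channel: str, name: str, timestamp: str) -> None:
        self.calls.append({"channel": channel, "name": name, "timestamp": timestamp})


fake = RecordingClient()
acknowledge(fake, channel="C0123456789", message_ts="1727875200.000100")
print(fake.calls)
```

In a real bot you would pass a configured Slack client in place of `RecordingClient`, using the channel and timestamp of the message that triggered the Agent.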

💡 Pro tip:
Be sure to keep your LLM code as decoupled as you can, so that you can easily refactor away into another UX if the need arises.
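One way to read that tip, sketched under assumptions: keep the part that produces answers behind a small interface, and make each front end a thin adapter around it. The `Responder` protocol, `EchoResponder`, and both handler functions below are hypothetical; the payload shapes are simplified and not taken from any real Slack or web framework.

```python
from typing import Protocol


class Responder(Protocol):
    """Anything that can turn a question into an answer (e.g. an LLM-backed agent)."""

    def answer(self, question: str) -> str: ...


class EchoResponder:
    """Stand-in for the LLM-backed core; it knows nothing about Slack or web UIs."""

    def answer(self, question: str) -> str:
        return f"You asked: {question}"


def handle_slack_event(responder: Responder, event: dict) -> dict:
    """Thin Slack-style adapter: unpack the event, delegate, shape the reply."""
    return {"channel": event["channel"], "text": responder.answer(event["text"])}


def handle_web_request(responder: Responder, body: dict) -> dict:
    """Thin web-style adapter around the same core."""
    return {"answer": responder.answer(body["question"])}


core = EchoResponder()
print(handle_slack_event(core, {"channel": "C0123456789", "text": "Is the API up?"}))
print(handle_web_request(core, {"question": "Is the API up?"}))
```

Because neither adapter contains any prompt or model logic, swapping Slack for another UX only means writing another adapter.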

Okay, let’s move on from the theoretical talk about models, techniques, and frameworks! Time to get your hands dirty and start building your own Agent.


Hands-on lab, part 1: set up your application and give it a “brain”

As previously mentioned, this will be the first of several labs designed to help you build your own interactive AI SRE Agent, with a focus on simplicity and iterative learning. Each installment will lay out a specific goal, then walk you through the implementation step by step, explaining each decision along the way.


  1. Set up a Chainlit application

Before we go delving into the endless rabbit hole of building AI, we’re going to set ourselves up for success by setting up Chainlit, a popular framework for building conversational assistant interfaces.

Why Chainlit?

Chainlit provides an opinionated set of building blocks for modeling conversational interactions — like threads, messages, and steps — as well as a ChatGPT-like user interface for interacting with the LLM.

It also offers out-of-the-box integrations with popular chat tools like Slack and Teams, as well as libraries for interfacing with popular tooling like React and FastAPI, so you can build it into a larger application, if you want.

In short: Chainlit is going to eliminate a lot of the scaffolding and grunt work for us so that we can focus on developing our AI assistant and getting feedback from our users, instead of fiddling with UI and configuration.

The Goal

By the end of this step, you’ll have a working Chainlit application that simply echoes back what you say. We’ll jump into the AI integration in the next step.

Prerequisites

Before we get started, you’ll need to get set up with a few things:

  1. A working Python 3.12+ environment. We recommend using pyenv.

  2. A Python package manager. We’ll be using Poetry, but you can use whatever you’re comfortable with.

Once you’re set up, continue on.

Project Setup

First, set up your project, and add chainlit as a dependency:


mkdir roger
cd roger
poetry init --no-interaction
poetry add chainlit

Chainlit Boilerplate

Next, create an app.py file in the root of your project with the following content:


import chainlit as cl


@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Echo the message back to the user.
    await cl.Message(
        content=f"Received: {message.content}",
    ).send()

The code above is registering the handle_message function with Chainlit, so that any time a message is received, this function will run.

For the moment, our function simply echoes the message back to the user, prefixed with “Received: ”.

Try it out

Finally, spin it up! You can use --watch to hot-reload your code when you make changes.


poetry run chainlit run app.py --watch

Running this command will start your Chainlit app and open your browser to its UI, where you can send a message and get a response back.


  2. Make your application smarter by connecting an LLM

With our Chainlit app scaffolded, we can connect it to an LLM so that we can talk to it and get a human-like response.

We’ll use OpenAI’s hosted gpt-4o model for simplicity, but using another provider is just a matter of syntax.

The Goal

By the end of this step, you’ll be able to prompt the gpt-4o model and get a response, similar to how you’d interact with ChatGPT. We’ll also make sure that the bot maintains conversation context so that you can ask follow-up questions.

Prerequisites

Before you get started, you’ll need:

  1. An OpenAI account and an API key

Configure an OpenAI API client

First, we’ll configure an API client to interface with OpenAI’s APIs. Add the following code to the top of your app.py:


import os
from openai import AsyncOpenAI

##
# Settings
#
try:
    OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
except KeyError as ex:
    # Fail fast with a readable error if the key is not configured.
    raise LookupError(f"Missing required environment variable: {ex}") from ex


client = AsyncOpenAI(api_key=OPENAI_API_KEY)

# ...

Send Messages to the LLM

Next, we’ll need to update our handle_message function to send the user’s message to OpenAI and get a response instead of just echoing it back. Replace your handle_message function with this one:


# ...

@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Retrieve the response from the LLM
    response = await client.chat.completions.create(
        messages=[{"content": message.content, "role": "user"}],
        model="gpt-4o",
    )

    await cl.Message(content=response.choices[0].message.content).send()

Try it out

Now, if you run your application (or if you left it running with the --watch flag), you’ll be able to ask a question and get a response.

Curing Amnesia

If you’ve played around a bit and asked follow-up questions, you may have noticed that the bot doesn’t “remember” anything you’ve talked about.

This is happening because every time we send a message, we’re sending only that one message to the LLM, which has no notion of the “conversation” by default.

To cure this amnesia, we’ll need to send all of the messages in the conversation every time we send a new one.

Chainlit makes this easy for us by providing a cl.chat_context.to_openai() helper, which gives us all of the messages exchanged so far, conveniently in the format that OpenAI (and most other providers) expects.
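To see why resending the history matters, here is a framework-free sketch of the idea. It uses the role/content message shape that OpenAI’s chat API expects, but `fake_llm`, `ask`, and the module-level `history` list are hypothetical stand-ins; no real model is called.

```python
# Conversation so far, in the {"role": ..., "content": ...} message shape.
history: list[dict[str, str]] = []


def fake_llm(messages: list[dict[str, str]]) -> str:
    """Stand-in for the model: reports how many earlier messages it was shown."""
    return f"I can see {len(messages) - 1} earlier message(s)."


def ask(question: str) -> str:
    """Append the question, send the whole history, and record the reply."""
    history.append({"role": "user", "content": question})
    reply = fake_llm(history)
    history.append({"role": "assistant", "content": reply})
    return reply


first = ask("What is MTTR?")
second = ask("How do I lower it?")
print(first)   # I can see 0 earlier message(s).
print(second)  # I can see 2 earlier message(s).
```

The second call only has context because the first question and its answer were sent along with it; the model itself keeps nothing between requests.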

Update your handle_message function to prepend historical messages before the latest one:


# ...

@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Retrieve the response from the LLM
    response = await client.chat.completions.create(
        messages=[
            # Prepend all previous messages to maintain the conversation.
            *cl.chat_context.to_openai(),
            {"content": message.content, "role": "user"},
        ],
        model="gpt-4o",
    )

    await cl.Message(content=response.choices[0].message.content).send()

Now we can ask follow-up questions!

Coming soon…

In Part 2, we’ll show you how to make your Agent faster for a better user experience (this will be particularly helpful before we move on to Part 3 where we cover tool calls). Stay tuned!

Hands-on lab, part 1: set up your application and give it a “brain”

As previously mentioned, this will be the first of several labs designed to help you build your own interactive AI SRE Agent, with a focus on simplicity and iterative learning. Each installation will lay out a specific goal then walk you through the implementation step-by-step, explaining each decision along the way.


  1. Set up a Chainlit application

Before we go delving into the endless rabbit hole of building AI, we’re going to set ourselves up for success by setting up Chainlit, a popular framework for building conversational assistant interfaces.

Why Chainlit?

Chainlit provides an opinionated set of building blocks for modeling conversational interactions — like threads, messages, and steps — as well as a ChatGPT-like user interface for interacting with the LLM.

It also offers out-of-the-box integrations with popular chat tools like Slack and Teams, as well as libraries for interfacing with popular tooling like React and FastAPI, so you can build it into a larger application, if you want.

In short: Chainlit is going to eliminate a lot of the scaffolding and grunt work for us so that we can focus on developing our AI assistant and getting feedback from our users, instead of fiddling with UI and configuration.

The Goal

By the end of this lab, you’ll have a working Chainlit application that will simply echo back what you say. We’ll jump into the AI integration in the next article.

Prerequisites

Before we get started, you’ll need to get set up with a few things:

  1. A working Python 3.12+ environment. We recommend using pyenv.

  2. A Python package manager. We’ll be using Poetry, but you can use whatever you’re comfortable with.

Once you’re set up, continue on.

Project Setup

First, set up your project, and add chainlit as a dependency:


mkdir roger
cd roger
poetry init --no-interaction
poetry add chainlit
Chainlit Boilerplate

Next, create an app.py file in the root of your project with the following content:


import chainlit as cl


@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Echo the message back to the user.
    await cl.Message(
        content=f"Received: {message.content}",
    ).send()

The code above is registering the handle_message function with Chainlit, so that any time a message is received, this function will run.

For the moment, our function simply echoes the message back to the user, prefixed with “Received: ”.

Try it out

Finally, spin it up! You can use --watch to hot-reload your code when you make changes.


poetry run chainlit run app.py --watch

Running this command will start your Chainlit app and open your browser to its UI, where you can send a message and get a response back:


  1. Make your application smarter by connecting an LLM

With our Chainlit app scaffolded, we can connect it to an LLM so that we can talk to it and get a human-like response.

We’ll use OpenAI’s hosted gpt-4o model for simplicity, but using another provider is just a matter of syntax.

The Goal

By the end of this article, you’ll be able to prompt the gpt-4o model and get a response, similar to how you’d interact with ChatGPT. We’ll also make sure that the bot maintains conversation context so that you can ask follow-up questions.

Prerequisites

Before you get started, you’ll need:

  1. An OpenAI account and an API key

Configure an OpenAI API client

First, we’ll configure an API client to interface with OpenAI’s APIs. Add the following code to the top of your app.py:


import os
from openai import AsyncOpenAI

##
# Settings
#
try:
    OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
except KeyError as ex:
    raise LookupError(f"Missing required environment variable: {ex}")
    
    
client = AsyncOpenAI(api_key=OPENAI_API_KEY)

# ...
Send Messages to the LLM

Next, we’ll need to update our handle_message function to send the user’s message to OpenAI and get a response instead of just echoing it back. Replace your handle_message function with this one:


# ...

@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Retrieve the response from the LLM
    response = await client.chat.completions.create(
        messages=[{"content": message.content, "role": "user"}],
        model="gpt-4o",
    )

    await cl.Message(content=response.choices[0].message.content).send()
Try it out

Now, if you run your application (or if you left it running with the --watch flag), you’ll be able to ask a question and get a response.

Curing Amnesia

If you’ve played around a bit and asked follow-up questions, you may have noticed that the bot doesn’t “remember” anything you’ve talked about. For example:

This is happening because every time we send a message, we’re sending only that one message to the LLM, which has no notion of the “conversation” by default.

To cure this amnesia, we’ll need to send all of the messages in the conversation every time we send a new one.

Chainlit makes this easy for us by providing a cl.chat_context.to_openai() helper, which gives us all of the messages exchanged so far, conveniently in the format that OpenAI (and most other providers) expects.

Update your handle_message function to prepend historical messages before the latest one:


# ...

@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Retrieve the response from the LLM
    response = await client.chat.completions.create(
        messages=[
            # Prepend all previous messages to maintain the conversation.
            *cl.chat_context.to_openai(),
                {"content": message.content, "role": "user"}
            ],
        model="gpt-4o",
    )

    await cl.Message(content=response.choices[0].message.content).send()

Now we can ask follow-up questions!

Coming soon…

In Part 2, we’ll show you how to make your Agent faster for a better user experience (this will be particularly helpful before we move on to Part 3 where we cover tool calls). Stay tuned!

Hands-on lab, part 1: set up your application and give it a “brain”

As previously mentioned, this will be the first of several labs designed to help you build your own interactive AI SRE Agent, with a focus on simplicity and iterative learning. Each installation will lay out a specific goal then walk you through the implementation step-by-step, explaining each decision along the way.


  1. Set up a Chainlit application

Before we go delving into the endless rabbit hole of building AI, we’re going to set ourselves up for success by setting up Chainlit, a popular framework for building conversational assistant interfaces.

Why Chainlit?

Chainlit provides an opinionated set of building blocks for modeling conversational interactions — like threads, messages, and steps — as well as a ChatGPT-like user interface for interacting with the LLM.

It also offers out-of-the-box integrations with popular chat tools like Slack and Teams, as well as libraries for interfacing with popular tooling like React and FastAPI, so you can build it into a larger application, if you want.

In short: Chainlit is going to eliminate a lot of the scaffolding and grunt work for us so that we can focus on developing our AI assistant and getting feedback from our users, instead of fiddling with UI and configuration.

The Goal

By the end of this lab, you’ll have a working Chainlit application that will simply echo back what you say. We’ll jump into the AI integration in the next section.

Prerequisites

Before we get started, you’ll need to get set up with a few things:

  1. A working Python 3.12+ environment. We recommend using pyenv.

  2. A Python package manager. We’ll be using Poetry, but you can use whatever you’re comfortable with.

Once you’re set up, continue on.

Project Setup

First, set up your project, and add chainlit as a dependency:


mkdir roger
cd roger
poetry init --no-interaction
poetry add chainlit

Chainlit Boilerplate

Next, create an app.py file in the root of your project with the following content:


import chainlit as cl


@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Echo the message back to the user.
    await cl.Message(
        content=f"Received: {message.content}",
    ).send()

The code above registers the handle_message function with Chainlit so that it runs any time a message is received.

For the moment, our function simply echoes the message back to the user, prefixed with “Received: ”.

Try it out

Finally, spin it up! You can use --watch to hot-reload your code when you make changes.


poetry run chainlit run app.py --watch

Running this command will start your Chainlit app and open your browser to its UI, where you can send a message and get a response back.


  2. Make your application smarter by connecting an LLM

With our Chainlit app scaffolded, we can connect it to an LLM so that we can talk to it and get a human-like response.

We’ll use OpenAI’s hosted gpt-4o model for simplicity, but switching to another provider is mostly a matter of changing the client and request syntax.

The Goal

By the end of this section, you’ll be able to prompt the gpt-4o model and get a response, similar to how you’d interact with ChatGPT. We’ll also make sure that the bot maintains conversation context so that you can ask follow-up questions.

Prerequisites

Before you get started, you’ll need:

  1. An OpenAI account and an API key

Configure an OpenAI API client

First, we’ll configure an API client to interface with OpenAI’s APIs. Add the following code to the top of your app.py:


import os
from openai import AsyncOpenAI

##
# Settings
#
try:
    OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
except KeyError as ex:
    raise LookupError(f"Missing required environment variable: {ex}") from ex


client = AsyncOpenAI(api_key=OPENAI_API_KEY)

# ...
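
If your app grows to need more than one required setting, the same fail-fast check can be pulled into a small helper. This is a minimal sketch using only the standard library: `require_env` and `EXAMPLE_SETTING` are hypothetical names chosen for illustration, not part of Chainlit or the OpenAI client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear error."""
    try:
        return os.environ[name]
    except KeyError as ex:
        # Chain the original KeyError so the traceback shows the root cause.
        raise LookupError(f"Missing required environment variable: {name}") from ex


# Illustration only: set a placeholder value so the example runs on its own.
os.environ.setdefault("EXAMPLE_SETTING", "placeholder")
example_value = require_env("EXAMPLE_SETTING")
```

Failing at import time, as the settings block above does, means a missing key surfaces as soon as the app starts and not on the first user message.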

Send Messages to the LLM

Next, we’ll need to update our handle_message function to send the user’s message to OpenAI and get a response instead of just echoing it back. Replace your handle_message function with this one:


# ...

@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Retrieve the response from the LLM
    response = await client.chat.completions.create(
        messages=[{"content": message.content, "role": "user"}],
        model="gpt-4o",
    )

    await cl.Message(content=response.choices[0].message.content).send()

Try it out

Now, if you run your application (or if you left it running with the --watch flag), you’ll be able to ask a question and get a response.

Curing Amnesia

If you’ve played around a bit and asked follow-up questions, you may have noticed that the bot doesn’t “remember” anything you’ve talked about.

This is happening because every time we send a message, we’re sending only that one message to the LLM, which has no notion of the “conversation” by default.

To cure this amnesia, we’ll need to send all of the messages in the conversation every time we send a new one.

Chainlit makes this easy for us by providing a cl.chat_context.to_openai() helper, which gives us all of the messages exchanged so far, conveniently in the format that OpenAI (and most other providers) expects.
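
To make the shape of that payload concrete, here is a minimal sketch in plain Python with no Chainlit or OpenAI dependency. `ChatMessage`, `build_messages`, and the sample conversation are hypothetical, made up for illustration. The only thing taken from the text above is the role/content message format.

```python
from typing import TypedDict


class ChatMessage(TypedDict):
    role: str
    content: str


def build_messages(history: list[ChatMessage], user_input: str) -> list[ChatMessage]:
    """Return prior turns followed by the newest user message."""
    return [*history, {"role": "user", "content": user_input}]


# A hypothetical earlier exchange, in the role/content format described above.
history: list[ChatMessage] = [
    {"role": "user", "content": "What is an SLO?"},
    {"role": "assistant", "content": "A service level objective."},
]

messages = build_messages(history, "How is it different from an SLA?")
for m in messages:
    print(f"{m['role']}: {m['content']}")
```

Because the model sees the earlier turns on every request, it can resolve a word like "it" in the follow-up question. Without the history, that reference would have nothing to point to.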

Update your handle_message function to prepend historical messages before the latest one:


# ...

@cl.on_message
async def handle_message(message: cl.Message) -> None:
    # Retrieve the response from the LLM
    response = await client.chat.completions.create(
        messages=[
            # Prepend all previous messages to maintain the conversation.
            *cl.chat_context.to_openai(),
            {"content": message.content, "role": "user"},
        ],
        model="gpt-4o",
    )

    await cl.Message(content=response.choices[0].message.content).send()

Now we can ask follow-up questions!

Coming soon…

In Part 2, we’ll show you how to make your Agent faster for a better user experience (this will be particularly helpful before we move on to Part 3 where we cover tool calls). Stay tuned!

Want a sneak peek at Part 2?

Don't want to build your own? Try Aptible AI.

© APTIBLE INC.
