Building out a basic RAG architecture to support an LLM

There’s no question that AI is a key technological advancement for the future. Everywhere you turn, ChatGPT and other models are being discussed as ways of improving productivity. With that, many teams are trying to build their own solutions that feed these models their own private data to support chat-based or agentic scenarios.

So the question is, how do I bring my own data to a chat solution? The answer is RAG.

What is RAG?

So, not going to lie: when I think of RAG, I am the level of nerd where my non-neurotypical brain goes directly to “The Hitchhiker’s Guide to the Galaxy.”

But that’s not what we are thinking about here, even if I would say “Don’t Panic, bring RAG” works. RAG stands for “Retrieval Augmented Generation,” and refers to the idea of bringing data to your model in a way that doesn’t require retraining the model. An LLM can be very costly to train and update, but if I provide a vector database that the LLM knows how to query, I make it easier for the model to find the data it needs.

RAG is an architectural pattern in which your LLM is given a data source it can look up information in to help support its responses.
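At its core the pattern is simple: embed the user’s question, find the stored chunks whose vectors are closest, and hand those chunks to the model as context. Here’s a minimal sketch of that lookup, where the toy `embed` function and the sample chunks are stand-ins for a real embeddings model and a real vector index like the ones deployed later in this post:

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embeddings model: a tiny bag-of-words vector.
    vocab = ["password", "reset", "invoice", "refund", "login"]
    words = [w.strip(".,?") for w in text.lower().split()]
    return [float(words.count(w)) for w in vocab]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: how closely two vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# The "vector database": chunks stored alongside their embeddings.
chunks = [
    "To reset your password, use the login page.",
    "Refunds are issued against the original invoice.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(question: str, k: int = 1) -> list[str]:
    # Rank stored chunks by similarity to the question and keep the top k.
    qv = embed(question)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

# The retrieved chunk is then prepended to the prompt sent to the LLM.
context = retrieve("How do I reset my password?")[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: How do I reset my password?"
```

The real solution swaps the toy pieces for Azure OpenAI embeddings and Azure AI Search, but the retrieve-then-prompt shape stays the same.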

What did I build to help?

To implement this, we need an architectural solution that pairs a vector database with a storage account. In short, an architecture like this:

architecture-beta
    group vnet(cloud)[Virtual Network]

    service aisearch(database)[AI Search] in vnet
    service jumpbox(server)[Jump Box] in vnet
    service blob(disk)[Blob Storage] in vnet
    service logicapp(internet)[Logic App] in vnet
    service openai(internet)[Azure OpenAI] in vnet

    blob:L -- R:logicapp
    openai:L -- R:logicapp
    aisearch:T -- B:logicapp

And that will lead to the following flow:

sequenceDiagram
    participant File
    participant Blob
    participant LogicApp
    participant OpenAI
    participant AiSearch
    File->>Blob: File uploaded to blob storage
    Blob->>LogicApp: Logic app triggered to perform index
    LogicApp->>LogicApp: Chunk up the data
    LogicApp->>OpenAI: Send chunks to embeddings model
    OpenAI->>LogicApp: Receive back embedding results
    LogicApp->>AiSearch: Output vectors to database
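The chunking step in that flow is essentially a sliding window over the document text. A rough Python equivalent of the chunk-then-embed-then-upload loop is below; the `embed` and `upload` callables are hypothetical stubs for what are, in the real flow, the Azure OpenAI embeddings call and the push to the AI Search index:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    # Sliding window: each chunk shares `overlap` characters with the previous
    # one, so a sentence cut at a boundary still appears whole somewhere.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def index_document(text: str, embed, upload) -> int:
    # embed: callable mimicking the Azure OpenAI embeddings request.
    # upload: callable mimicking the document push to the AI Search index.
    chunks = chunk_text(text)
    for n, chunk in enumerate(chunks):
        upload({"id": str(n), "content": chunk, "vector": embed(chunk)})
    return len(chunks)
```

The overlap is a judgment call: larger overlaps cost more embedding tokens but reduce the chance that the answer to a question is split across two chunks.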

Now the good news is that, for the infrastructure, I’ve created Bicep templates which will deploy the above architecture in a network-isolated manner:

Using the above code, you can deploy this entire environment with the following steps:

Deploying the repo

First, you will need to clone the repo, and then use the following to log into the Azure CLI:

az cloud set --name AzureUSGovernment
az login --use-device-code

For this code, I assume you already have a virtual network; if you don’t, you will need to create one:

RESOURCE_GROUP_NAME="search-rag-demo-rg"
VNET_NAME="search-rag-vnet"
LOCATION="usgovvirginia"
SUBNET_NAME="default"

# Create the resource group
az group create --name $RESOURCE_GROUP_NAME --location $LOCATION

# Create the virtual network
az network vnet create --name $VNET_NAME --resource-group $RESOURCE_GROUP_NAME --subnet-name $SUBNET_NAME

Then you will want to deploy the basic environment:

RESOURCE_GROUP_NAME="search-rag-demo-rg"
PROJECT_PREFIX="rag"
ENV_PREFIX="dev1"
EXISTING_NETWORK_NAME="search-rag-vnet"
DEFAULT_TAG_NAME="environment"
DEFAULT_TAG_VALUE="search-rag"

az deployment group create --resource-group $RESOURCE_GROUP_NAME --template-file ./main.bicep --parameters project_prefix=$PROJECT_PREFIX env_prefix=$ENV_PREFIX existing_network_name=$EXISTING_NETWORK_NAME default_tag_name=$DEFAULT_TAG_NAME default_tag_value=$DEFAULT_TAG_VALUE deploy_openai=true deploy_jumpbox=true

NOTE: If you only need parts of this deployment, there are boolean parameters to deploy individual pieces, like blob storage, the logic app, etc.

Then, if you need the jump box, run the following to deploy it:

ADMIN_USERNAME=""
ADMIN_PASSWORD=""
JUMPBOX_SUBNET_ID=""

az deployment group create --resource-group $RESOURCE_GROUP_NAME --template-file ./main-jumpbox.bicep --parameters project_prefix=$PROJECT_PREFIX env_prefix=$ENV_PREFIX default_tag_name=$DEFAULT_TAG_NAME default_tag_value=$DEFAULT_TAG_VALUE admin_username=$ADMIN_USERNAME admin_password=$ADMIN_PASSWORD jumpbox_subnet_id=$JUMPBOX_SUBNET_ID

Then run the following to deploy the Azure OpenAI (AOAI) instance:

ADMIN_EMAIL=""
AOAI_SUBNET_ID=""

az deployment group create --resource-group $RESOURCE_GROUP_NAME --template-file ./main-aoai.bicep --parameters project_prefix=$PROJECT_PREFIX env_prefix=$ENV_PREFIX default_tag_name=$DEFAULT_TAG_NAME default_tag_value=$DEFAULT_TAG_VALUE admin_email=$ADMIN_EMAIL subnet_id=$AOAI_SUBNET_ID

From there, you can leverage the built-in logic app workflow to set up indexing of the data. You can find more details on how to do this in this article from Microsoft.
