Getting started with LLMs, understanding the jargon

This past week I’ve been doing a lot of research into LLMs and how to use them. You can’t look at technology news without seeing AI and LLM solutions everywhere. There is zero question that LLMs are the future of software. Now, generative AI solutions and chat-based capabilities are pretty common. My own family uses these solutions to help with a variety of things.

But for me, I’m interested in Agentic AI solutions and their implications. These represent a unique change in how we build software-based solutions, and when you start to peel the layers back, you start to see how endless the possibilities are. The idea of feeding models information such as uptime / availability logs and teaching AI to identify availability problems and make real-time adjustments is a pretty compelling use case.

Cybersecurity is another scenario, where teaching a model to find gaps in your security, or even to identify attacks and respond, is really powerful. As I’ve started to examine this space, I’ve found a lot of new terminology around LLMs, and digging through the jargon takes some work. Below is an outline of the terms.

So here are the different terms to help navigate what you are looking at:

What is the difference between GPT-4 and GPT-4o? These are the machine learning models themselves. GPT is the architecture of the model, specifically “Generative Pre-trained Transformer,” and it denotes the type of model being implemented. Each new version of these models has been trained on larger and larger datasets and refined to provide better results.
What are prompts? A prompt is the request sent to the model. When you interact with a UI it takes the form of a chat message, but it can also be passed via an API in a coded interaction. This is the primary difference between AI Chat and Agentic AI: the models are the same, but the difference is where the prompts originate.
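As a minimal sketch of the coded side of this, here is a prompt passed via an API using the OpenAI Python SDK. The model name and prompt text are just placeholders, and it assumes an OPENAI_API_KEY environment variable is set:

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# The "prompt" is just the message content; an agent could build this
# string from logs, alerts, or any other data source instead of a human.
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": "You are a helpful operations assistant."},
        {"role": "user", "content": "Summarize the last hour of availability alerts."},
    ],
)

print(response.choices[0].message.content)
```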

What is prompt engineering? This refers to the practice of refining prompts. The simple fact is that depending on the model, you may need to give more or less information to get the right answer to your question. And because Agentic AI systems pass prompts to a model from a data source, and model versions change over time, there is a need to go back and test / evaluate your prompts to ensure the best result. This creates an interesting tech-debt / automated-testing question: when you change models, what happens? A rough sketch of that testing loop is below.
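This sketch runs the same prompt against two model names and prints the answers side by side so you can compare them when a model version changes. The model names, the prompt, and the evaluation step are assumptions; a real regression suite would score outputs against expected results:

```python
from openai import OpenAI

client = OpenAI()

PROMPT = "Classify this log line as 'ok' or 'outage': 'HTTP 503 from api-gateway, 14:02 UTC'"
MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholder model names to compare

for model in MODELS:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        temperature=0,  # keep answers as repeatable as possible for comparison
    )
    answer = response.choices[0].message.content
    print(f"{model}: {answer}")
    # A real prompt-regression suite would assert against an expected answer here.
```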

What are tokens? Tokens are how interactions are broken down. This includes the input from the user, the text sent back, and the contents of the context window. Instead of working at the word or character level, the model splits text into tokens, and those tokens are the units it uses downstream for semantic ranking.
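To make that concrete, here is a small sketch using the tiktoken library to see how a sentence splits into tokens. The specific encoding name is an assumption; the right one depends on which model you are targeting:

```python
import tiktoken

# cl100k_base is the encoding used by several recent OpenAI models;
# other models use different encodings.
encoding = tiktoken.get_encoding("cl100k_base")

text = "Agentic AI can watch availability logs in real time."
token_ids = encoding.encode(text)

print(len(token_ids), "tokens")
# Decode each token id back to its text fragment to see how the sentence was split.
print([encoding.decode([t]) for t in token_ids])
```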

What is semantic ranking? When an LLM receives a prompt, it takes those words and assigns a numeric value to each of them to capture their context for when it tries to generate a response. This process is called semantic ranking, and it is critical to how LLMs break things down into tokens and figure out which parts of your prompt are important and which are not.

What are embeddings? Embeddings are the mechanism the LLM uses to understand text for semantic ranking. They are the next step in the process, driving the understanding behind the implementation. The quality of the embeddings has a big impact on the effectiveness of the LLM, so the model you choose to perform the embeddings matters for the math behind the implementation. You can also augment an LLM’s knowledge by enabling solutions like RAG (Retrieval-Augmented Generation) and creating a database of embeddings that the model can search.
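Here is a minimal sketch of that RAG-style lookup, assuming the OpenAI embeddings endpoint and a tiny in-memory “database” of documents. The model name and documents are placeholders; a real system would use a vector store instead of a Python list:

```python
import math
from openai import OpenAI

client = OpenAI()

def embed(texts):
    """Turn a list of strings into embedding vectors."""
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in result.data]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# A toy "database" of documents to search.
documents = [
    "Checkout service was down for 12 minutes on Tuesday.",
    "The marketing team launched a new campaign.",
    "Database failover completed without data loss.",
]
doc_vectors = embed(documents)

query = "What availability incidents happened this week?"
query_vector = embed([query])[0]

# Rank documents by similarity to the query and show the best match.
scores = [cosine(query_vector, v) for v in doc_vectors]
best = max(range(len(documents)), key=lambda i: scores[i])
print(documents[best])
```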

What are attention mechanisms? The short version is that LLMs perform semantic math on a question to attach a weight to each word. The goal is to identify the words that are important to the question, and that context is then used for processing. This conversation math continues to grow the longer the conversation goes on, and so do its implications; this is part of why an LLM eventually starts to hallucinate and loses context.
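A stripped-down version of that weighting is scaled dot-product attention. The numpy sketch below shows the core math on made-up vectors; real models do this across many heads and layers over the whole conversation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute attention weights and the weighted sum of values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how relevant each token is to each other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights

# Three "tokens" with made-up 4-dimensional query, key, and value vectors.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))

output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # each row sums to 1: how much each token attends to the others
```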

What are hallucinations? Because the model keeps a running record of weights for which words are important, a long conversation eventually strains that bookkeeping and the model loses context. Ask something specific and the model can produce an answer that appears correct but is simply made up; that is a hallucination. When you shift to a new idea, you should start a new chat, and some LLMs will even push you to a new window.

What are context windows? The context window is a measure of the working memory of an LLM. Think of it this way: if you and I got together and talked for four hours, we might not remember how the conversation started. LLMs are no different; each model is capable of holding a certain number of tokens in its memory as context for the conversation. Larger context windows mean the model has better context, but they can also slow down the model’s responsiveness.
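One practical consequence is that long chat histories eventually have to be trimmed to fit the window. Below is a rough sketch of keeping only the most recent messages under a token budget, using tiktoken as an approximate counter; the budget number and encoding are assumptions:

```python
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

def trim_to_budget(messages, max_tokens=1000):
    """Keep the most recent messages that fit inside a rough token budget."""
    kept, total = [], 0
    for message in reversed(messages):  # walk backwards from the newest message
        tokens = len(encoding.encode(message["content"]))
        if total + tokens > max_tokens:
            break
        kept.append(message)
        total += tokens
    return list(reversed(kept))  # restore chronological order

history = [
    {"role": "user", "content": "Here is a very long incident report... " * 50},
    {"role": "assistant", "content": "Summary of the report."},
    {"role": "user", "content": "What should we fix first?"},
]
print(len(trim_to_budget(history, max_tokens=200)), "messages kept")
```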

What are the implications of expanding it? The model has limits on what it can support, and there are limits on the vRAM and GPU backing it; if you overload the context window, it comes back to what that hardware can support. The benefit of the cloud is that you can max out the context window and explore the options.

What is Semantic Kernel? Semantic Kernel is made up of several parts, including a planner, skills, memories / context, and connectors to other steps. It is an option for embedding AI capabilities quickly into an application; the focus is on bringing AI into existing applications, and it integrates with your existing code base through libraries that make it easy to implement. It is open source and hosted on GitHub; see aka.ms/sk/learn and the project’s Discord channel.

How do I get started with Semantic Kernel? It is open source and easy to find. See aka.ms/sk/samples for sample code, aka.ms/sk/sample-chat for a chat application, and aka.ms/sk/sample-book for another sample.
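As a starting point, here is a minimal sketch of wiring the Semantic Kernel Python package to an OpenAI chat model and running a single prompt. The class names and call shape reflect my assumption about a recent version of the SDK (it has changed between releases), so treat the official samples above as the source of truth:

```python
import asyncio

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

async def main():
    kernel = sk.Kernel()

    # Register a chat completion service; assumes OPENAI_API_KEY is set in the
    # environment and that the model name is valid for your account.
    kernel.add_service(OpenAIChatCompletion(service_id="chat", ai_model_id="gpt-4o-mini"))

    # Run a one-off prompt through the kernel.
    result = await kernel.invoke_prompt(prompt="Summarize why context windows matter in one sentence.")
    print(result)

asyncio.run(main())
```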

So those are some of the high-level terms involved in how LLMs work. For me, knowing the key differences here is critical to understanding how these pieces can fit together to build an agentic software solution. The key for me is understanding how context windows and prompt engineering behave inside a software solution.

I’m going to be digging a lot more into this topic and will continue to post here as I research building these solutions. But for me, this is exciting.
