JackGPT - so why don't you tell me about myself?

Let's say you want to know a little bit about me. You could go and read my CV - but that's not much fun. What if I could be on hand to answer any questions you had 24/7? I can't...but AI can!
JackGPT is my personal chatbot assistant that will answer questions about me by using Retrieval Augmented Generation (we will call it RAG) to give it the information it needs to answer said question. How this works (and how it could work better) will be answered in this post. We will also look at some of the challenges faced when using Generative AI for this and similar use cases.
I - How it works
Most of us have probably used ChatGPT by now, whether that's to write an email, fix a bug in our code, or tell us what to have for dinner. Broadly speaking, it is very good at giving answers to anything that is common knowledge, or at least common enough knowledge that it was trained on this information. GPT is not connected to the internet, but other models like Google's Gemini are and can therefore provide even more up-to-date information. Unfortunately, I am not (yet) famous enough to be common knowledge to these models.

We therefore need to help the AI by giving it some of the information that it needs to answer the question. This is very similar to pasting a relevant code file into the query box when we ask AI about some code, so that it can help us with our specific problem. The catch is, we want to do this automatically, and we want to give it only the relevant information that it needs.
RAG Magic (or just vector algebra)
This is where our good friend RAG comes in. Retrieval Augmented Generation allows us to retrieve relevant information that I have stored about me, and give it to the GPT model to help answer the question. The first thing we need is relevant information about me, and the very site you are reading this on is a great place to start!
Every time I push my site to main on GitHub (i.e. when the main content on my site changes), a workflow I have set up runs an ingestion script in Python. In simple terms, it takes all the text information on my website, carries out a small amount of cleaning, splits it up into chunks, and finally uploads it to my vector store.
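The chunking step of that pipeline can be sketched in a few lines of plain Python. This is a hypothetical illustration, not my actual ingestion script: the function name, the sentence-splitting regex, and the 200-character limit are all assumptions for the example, and a real pipeline would then embed each chunk and upload it to the vector store.

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split cleaned page text into chunks, breaking on sentence boundaries."""
    # Light cleaning: collapse whitespace left over from HTML extraction.
    text = re.sub(r"\s+", " ", text).strip()
    # Split after sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?]) ", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

page = (
    "Jack is a software engineer.   He built JackGPT as a personal chatbot. "
    "It answers questions about him using RAG."
)
chunks = chunk_text(page, max_chars=60)
# Each chunk would then be embedded and uploaded to the vector store.
```

Splitting on sentence boundaries (rather than at an arbitrary character count) keeps each chunk self-contained, which matters later when only a few chunks are handed to the model.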

Vectors are numbers though...right? Right. Numbers are much better for doing rapid calculations efficiently than words, and in this case we use this to our advantage. We use an embeddings model to convert our information into these vectors. Then, when we ask JackGPT a question, it will turn this question into a vector and compare this vector with all the vectors in my database. The closer these vectors are together, the higher the similarity score we give them. We can then send only the most similar vectors (turned back into real words, of course) to the GPT model to help answer our question.
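That similarity comparison is often just cosine similarity. Here is a toy sketch with made-up 3-dimensional "embeddings" (real embedding models output hundreds or thousands of dimensions, and the chunk names and values here are invented for illustration):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how closely two vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings.
question = [0.9, 0.1, 0.3]
scores = {
    "chunk_about_jack": cosine_similarity(question, [0.8, 0.2, 0.4]),
    "chunk_about_cooking": cosine_similarity(question, [0.1, 0.9, 0.2]),
}
best = max(scores, key=scores.get)  # the chunk we'd send to the model
```

The question vector lands much closer to the Jack chunk than the cooking chunk, so only the Jack chunk would be retrieved and passed along.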
I don't want to go into huge technical detail here, so the key takeaway is that we can take lots of information about me and return only the relevant pieces to GPT to answer our question. This is the same as pasting only our code file instead of our entire codebase to try and fix a bug. If you do want to dive deeper into the maths, I enjoyed this article from Pinecone about vector similarity. I also think their deep dive into RAG is very easy to grasp (not an ad, but I did learn in part by reading Pinecone docs!).
Now we've given it the information it needs (hopefully), we also need to make sure we give it instructions on how it should answer the question.
I'm a software engineer not a prompt engineer!
I think primary school teachers would make incredible prompt engineers. Why? In my experience, asking AI to do something is a bit like teaching a young child to do something. You need to spell out, in no uncertain terms:
- What do you want me to do?
- How do you want me to do it?
- What do you NOT want me to do?
- How should I return all this back to you?
This can all be pretty tedious, especially seeing as most software engineers I know aren't big fans of creative writing. It can also be hard to objectively measure the success of your prompting (more on that later) since it is such a qualitative variable.
We can dive deeper into the art of prompt engineering another time (it is incredibly useful even if you are just using ChatGPT), but until then a list of short and clear instructions is a great place to start. We call this set of instructions a 'System Prompt', and an example might look like this:
You are an AI assistant who is an expert in answering questions about Jack Woods. Your job is to respond to a user's question about Jack using the information provided below. If you don't know the answer, just say that you don't know. Do not make up the answer. You should only answer questions about Jack. Do NOT answer any questions not relevant to Jack. If someone asks an irrelevant question, let them know that you can only answer questions about Jack. Use three sentences maximum and keep the answer concise.
This is quite the mouthful, and could probably be optimised, but the main thing is it works! Note that there is a particular focus on what the AI shouldn't be answering - more on that next.
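To tie the two halves together: the system prompt and the retrieved chunks get combined into a single request to the model. Below is a hedged sketch of that assembly step using the common chat-message format (list of role/content dicts); the function name, the context layout, and the example chunks are assumptions for illustration, not my actual implementation:

```python
SYSTEM_PROMPT = (
    "You are an AI assistant who is an expert in answering questions about "
    "Jack Woods. Your job is to respond to a user's question about Jack using "
    "the information provided below. If you don't know the answer, just say "
    "that you don't know. Do not make up the answer. You should only answer "
    "questions about Jack. Use three sentences maximum and keep the answer concise."
)

def build_messages(question: str, retrieved_chunks: list[str]) -> list[dict]:
    """Combine the system prompt, retrieved context, and the user's question."""
    context = "\n\n".join(retrieved_chunks)
    return [
        {"role": "system", "content": f"{SYSTEM_PROMPT}\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]

messages = build_messages(
    "What does Jack do for a living?",
    ["Jack is a software engineer.", "Jack built JackGPT using RAG."],
)
# `messages` would then be sent to the chat completion API of your chosen model.
```

Keeping the retrieved context in the system message (rather than mixing it into the user's question) makes it easier to keep the instructions and the evidence cleanly separated.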
II - Challenges
Do you really need to go to all this trouble?
It seems like an awful lot of work to build an ingestion pipeline and RAG architecture for a CV and a few extra details, and in reality it probably is! When the set of information is relatively small, there are diminishing returns to using this approach since we could probably just give the AI all the information at once anyway. Of course, that would make each call more expensive (the more text we send to the API, the more it costs us), but skipping the retrieval step could make it faster too!
When we have large sets of information, such as detailed documentation or regulatory texts, RAG can be incredibly effective at only returning us the information we want in an efficient manner. It still relies heavily on your chunked vectors being of high enough quality so that we return relevant information, and that process is a whole art of its own. Nonetheless, it still feels a little bit magic when the AI seems to know what you are talking about when you ask it a specific question!
What if I want to ask it about where I should go on holiday?
I, personally, would recommend Krakow. It's a beautiful city that is inexpensive and full of culture. I don't really want JackGPT to answer that question for you though...
The issue with a GPT model is that it will answer questions about, well, pretty much anything (like ChatGPT). This is why we have to make sure that our system prompt is robust and will not allow you to use it for anything other than its intended use case. This is particularly important when we are being billed for our usage, and some people would love free access to a chatbot that would fix their code for them!

There are several amusing instances of even large corporations tripping over themselves when it comes to building AI interfaces where the user can ask anything - I will definitely be covering this soon.
Part of the reason this keeps happening is that it is so hard to measure and cover all use cases. After all, when people can ask your chatbot anything, you are at the mercy of people on the internet and whatever they come up with. Whereas with code the result tends to be binary, pass or fail, AI can give us a variety of responses depending on what exactly we give it to work with. This will continue to be a challenge of developing AI applications for consumers, one that tools like LangSmith (and many others) are looking to begin to solve.
Wrapping Up
Generative AI took, and is still taking, the world by storm. Playing around with it by building applications like this shows how much even a relatively basic implementation can change the way users interact with machines. The idea of doing this on my website even two years ago would be practically unthinkable, and this really highlights the pace of change we are seeing in this space.
You can also apply this basic RAG architecture to many other use cases. Any scenario where you want to take a large store of information and use specific pieces of it to help AI answer questions will follow the same structure. This is only one tool in the Gen AI toolbox, but JackGPT ends up making pretty good use of it.