In an age where language models can generate fluent responses on virtually any topic, the challenge is no longer just getting answers. The real question is: are those answers accurate, and useful? That's what led me to build my own Retrieval-Augmented Generation (RAG) system. Not as a chatbot clone, but as a focused, document-aware question answering system. It lets a language model answer using the most relevant pieces of context instead of just guessing. Over the past few weeks, I've been working on implementing this, and it feels genuinely useful. I built a RAG system from scratch, not by following a tutorial word for word, but by figuring things out step by step. I started with a simple goal: to build a system that could answer user questions more intelligently.
The Problem I Wanted to Solve
Large Language Models (LLMs) are incredibly good at sounding correct. But they don't actually know what's true unless they're given reliable context. I wanted to fix that. Instead of relying on pretraining alone, I set out to build a system that answers questions using information drawn directly from documents I provide. I didn't want just any answer; I wanted answers grounded in real documents, filtered for relevance, with a way to see exactly where each answer came from. That meant combining retrieval and generation, which is what RAG is all about.
Getting the Documents Ready
The first step was loading documents in PDF format. For this, I used PyPDFLoader, which extracts the text while preserving metadata such as the filename and page number; a rough sketch of that loading step is shown below. To make the text usable for retrieval, I then split it into semantically meaningful chunks using RecursiveCharacterTextSplitter, as in the function that follows the sketch.
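Here is a minimal sketch of that loading step, assuming LangChain's PyPDFLoader and a simple list-of-dicts format that the splitter below expects; the load_documents helper and its pdf_paths argument are illustrative, not the exact code I used:

from langchain_community.document_loaders import PyPDFLoader

def load_documents(pdf_paths):
    # Each PDF page becomes a dict holding its text plus source metadata
    documents = []
    for path in pdf_paths:
        loader = PyPDFLoader(path)
        for page in loader.load():
            documents.append({
                "content": page.page_content,
                "metadata": {"source": path, "page": page.metadata.get("page")},
            })
    return documents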
from langchain_text_splitters import RecursiveCharacterTextSplitter

def split_documents(documents):
    # Recursive splitting keeps chunks near 512 characters while preferring natural
    # boundaries; the 100-character overlap preserves context across chunk edges.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=100,
        length_function=len,
        is_separator_regex=True,
    )
    chunks = []
    for doc in documents:
        split_texts = text_splitter.split_text(doc["content"])
        for i, chunk_content in enumerate(split_texts):
            # Carry the document's metadata forward and tag each chunk with its position
            chunks.append({
                "content": chunk_content,
                "metadata": {**doc["metadata"], "chunk_id": i},
            })
    return chunks
This ensured that when a relevant passage is retrieved later, I'd know exactly where it came from, an important part of building trust in the answer.
Semantic Embeddings and Vector Storage
Next, I embedded the text using the all-mpnet-base-v2 model from SentenceTransformers. I went with this one because its 768-dimensional embeddings strike a good balance, neither too small nor overkill, while still capturing semantics well.
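The model is loaded once and reused for every chunk and query; a minimal sketch, assuming the embedding_model name used in the function below:

from sentence_transformers import SentenceTransformer

# all-mpnet-base-v2 produces 768-dimensional sentence embeddings
embedding_model = SentenceTransformer("all-mpnet-base-v2")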
def get_embedding(text):
    # Encode the text and convert the numpy array to a plain list for Pinecone
    embedding = embedding_model.encode(text).tolist()
    return embedding
These embeddings were stored in Pinecone, a vector database that supports real-time similarity search. This meant that when a user asked a question, my system could quickly identify the most relevant chunks.
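Connecting to the index looks roughly like this, assuming the current Pinecone Python client; the index name is a placeholder, and the index itself is created ahead of time with a dimension of 768 to match the embeddings:

from pinecone import Pinecone

# The API key and index name here are placeholders
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")
index = pc.Index("rag-chunks")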
from uuid import uuid4

def upsert_chunks_to_pinecone(index, chunks, batch_size=100):
    vectors = []
    for i, chunk in enumerate(chunks):
        content = chunk["content"]
        metadata = chunk.get("metadata", {})
        # Store the raw text alongside the vector so retrieved matches carry their passage
        metadata["text"] = content
        embedding = get_embedding(content)
        vector_id = str(uuid4())
        vectors.append((vector_id, embedding, metadata))
        # Flush a full batch, or whatever remains once the last chunk is reached
        if len(vectors) == batch_size or i == len(chunks) - 1:
            index.upsert(vectors=vectors)
            print(f"Upserted batch ending at chunk {i + 1}")
            vectors = []
    print(f"All {len(chunks)} vectors upserted to Pinecone.")
Adding a Layer of Safety and Relevance Filtering
Before letting the model answer, though, I implemented a small filtering layer. I didn't want the system to respond to unsafe or out-of-scope questions. I wrote a function that checks for things like violence, hate, or explicit content, and for domain relevance I used a language model to decide whether the question had anything to do with data science, AI, or linear algebra, which is the domain I built the system for. A rough sketch of both checks follows.
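This is a minimal sketch of that filter, assuming a plain keyword blocklist for the safety check and a strict yes/no LLM prompt for relevance; the llm_client callable, the blocklist contents, and the prompt wording are all illustrative rather than the exact implementation:

# Illustrative blocklist; the real safety check covers far more cases
UNSAFE_KEYWORDS = {"violence", "hate", "explicit"}

def is_safe(question):
    # Reject questions that contain obviously unsafe terms
    lowered = question.lower()
    return not any(word in lowered for word in UNSAFE_KEYWORDS)

def is_in_domain(question, llm_client):
    # Ask an LLM for a yes/no relevance judgement; llm_client is a hypothetical wrapper
    prompt = (
        "Answer only 'yes' or 'no'. Is the following question about data science, "
        f"AI, or linear algebra?\n\nQuestion: {question}"
    )
    return llm_client(prompt).strip().lower().startswith("yes")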
Answer Generation with Groq and LLaMA 3
Then comes the final step: generating an actual answer. For that, I used Groq's API with the Llama 3.3 70B model. It's fast, accurate, and doesn't waste time. I pass in the user's question, and it returns an answer that is relevant to the material given. It also shows the exact chunks it pulled from, so I can see where the answer is coming from. Not hallucinated nonsense.
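A sketch of that generation step, assuming the groq Python SDK and the llama-3.3-70b-versatile model id on Groq; the prompt wording and the generate_answer helper are illustrative:

from groq import Groq

client = Groq(api_key="YOUR_GROQ_API_KEY")

def generate_answer(question, retrieved_chunks):
    # Concatenate the retrieved passages into a single context block
    context = "\n\n".join(chunk["text"] for chunk in retrieved_chunks)
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. If the context is insufficient, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content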
What's Next
This project is still a work in progress. I've already extended it by adding a lightweight web interface with Gradio, and I'm looking into multi-hop retrieval for more complex, multi-part questions. The core idea, though, will stay the same: answering questions with confidence because the answers are grounded in context. Building this RAG system has been one of the most practical and eye-opening projects I've worked on so far.