Retrieval Augmented Generation in SQLite

This is the second in a two-part collection on utilizing SQLite for Machine Learning. In my last article, I dove into how SQLite is quickly changing into a production-ready database for net functions. On this article, I’ll talk about the best way to carry out retrieval-augmented-generation utilizing SQLite.

Should you’d like a customized net software with generative AI integration, go to losangelesaiapps.com

The code referenced on this article could be discovered here.

Once I first realized the best way to carry out retrieval-augmented-generation (RAG) as a budding information scientist, I adopted the conventional path. This normally seems one thing like:

Google retrieval-augmented-generation and search for tutorials
Discover the most well-liked framework, normally LangChain or LlamaIndex
Discover the most well-liked cloud vector database, normally Pinecone or Weaviate
Learn a bunch of docs, put all of the items collectively, and success!

In actual fact I truly wrote an article about my expertise constructing a RAG system in LangChain with Pinecone.

There may be nothing terribly improper with utilizing a RAG framework with a cloud vector database. Nonetheless, I might argue that for first time learners it overcomplicates the scenario. Do we actually want a complete framework to learn to do RAG? Is it essential to carry out API calls to cloud vector databases? These databases act as black bins, which isn’t good for learners (or frankly for anybody).

On this article, I’ll stroll you thru the best way to carry out RAG on the only stack doable. In actual fact, this ‘stack’ is simply Sqlite with the sqlite-vec extension and the OpenAI API to be used of their embedding and chat fashions. I like to recommend you re ad part 1 of this collection to get a deep dive on SQLite and the way it’s quickly changing into manufacturing prepared for net functions. For our functions right here, it is sufficient to perceive that SQLite is the only sort of database doable: a single file in your repository.

So ditch your cloud vector databases and your bloated frameworks, and let’s do some RAG.

SQLite-Vec

One of many powers of the SQLite database is using extensions. For these of us aware of Python, extensions are quite a bit like libraries. They’re modular items of code written in C to increase the performance of SQLite, making issues that have been as soon as unattainable doable. One fashionable instance of a SQLite extension is the Full-Text Search (FTS) extension. This extension permits SQLite to carry out environment friendly searches throughout giant volumes of textual information in SQLite. As a result of the extension is written purely in C, we are able to run it anyplace a SQLite database could be run, together with Raspberry Pis and browsers.

On this article I shall be going over the extension often known as sqlite-vec. This provides SQLite the facility of performing vector search. Vector search is much like full-text search in that it permits for environment friendly search throughout textual information. Nonetheless, quite than seek for a precise phrase or phrase within the textual content, vector search has a semantic understanding. In different phrases, looking for “horses” will discover matches of “equestrian”, “pony”, “Clydesdale”, and many others. Full-text search is incapable of this.

sqlite-vec makes use of digital tables, as do most extensions in SQLite. A digital desk is much like an everyday desk, however with further powers:

Customized Information Sources: The info for the standard desk in SQLite is housed in a single db file. For a digital desk, the info could be housed in exterior sources, for instance a CSV file or an API name.
Versatile Performance: Digital tables can add specialised indexing or querying capabilities and assist advanced information varieties like JSON or XML.
Integration with SQLite Question Engine: Digital tables combine seamlessly with SQLite’s commonplace question syntax e.g. SELECT , INSERT, UPDATE, and DELETE choices. In the end it’s as much as the writers of the extensions to assist these operations.
Use of Modules: The backend logic for the way the digital desk will work is applied by a module (written in C or one other language).

The everyday syntax for making a digital desk seems like the next:

CREATE VIRTUAL TABLE my_table USING my_extension_module();

The essential a part of this assertion is my_extension_module(). This specifies the module that shall be powering the backend of the my_table digital desk. In sqlite-vec we are going to use the vec0 module.

Code Walkthrough

The code for this text could be discovered here. It’s a easy listing with the vast majority of information being .txt information that we’ll be utilizing as our dummy information. As a result of I’m a physics nerd, the vast majority of the information pertain to physics, with only a few information referring to different random fields. I cannot current the total code on this walkthrough, however as a substitute will spotlight the essential items. Clone my repo and mess around with it to research the total code. Beneath is a tree view of the repo. Be aware that my_docs.db is the single-file database utilized by SQLite to handle all of our information.

.

├── information

│   ├── cooking.txt

│   ├── gardening.txt

│   ├── general_relativity.txt

│   ├── newton.txt

│   ├── personal_finance.txt

│   ├── quantum.txt

│   ├── thermodynamics.txt

│   └── journey.txt

├── my_docs.db

├── necessities.txt

└── sqlite_rag_tutorial.py

Step 1 is to put in the required libraries. Beneath is our necessities.txt file. As you possibly can see it has solely three libraries. I like to recommend making a digital surroundings with the most recent Python model (3.13.1 was used for this text) after which working pip set up -r necessities.txt to put in the libraries.

# necessities.txt

sqlite-vec==0.1.6

openai==1.63.0

python-dotenv==1.0.1

Step 2 is to create an OpenAI API key when you don’t have already got one. We shall be utilizing OpenAI to generate embeddings for the textual content information in order that we are able to carry out our vector search.

# sqlite_rag_tutorial.py

import sqlite3

from sqlite_vec import serialize_float32

import sqlite_vec

import os

from openai import OpenAI

from dotenv import load_dotenv

# Arrange OpenAI shopper

shopper = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

Step 3 is to load the sqlite-vec extension into SQLite. We shall be utilizing Python and SQL for our examples on this article. Disabling the power to load extensions instantly after loading your extension is an efficient safety apply.

# Path to the database file

db_path="my_docs.db"

# Delete the database file if it exists

db = sqlite3.join(db_path)

db.enable_load_extension(True)

sqlite_vec.load(db)

db.enable_load_extension(False)

Subsequent we are going to go forward and create our digital desk:

db.execute('''

   CREATE VIRTUAL TABLE paperwork USING vec0(

       embedding float[1536],

       +file_name TEXT,

       +content material TEXT

   )

''')

paperwork is a digital desk with three columns:

sample_embedding : 1536-dimension float that can retailer the embeddings of our pattern paperwork.
file_name : Textual content that can home the identify of every file we retailer within the database. Be aware that this column and the next have a + image in entrance of them. This means that they’re auxiliary fields. Beforehand in sqlite-vec solely embedding information could possibly be saved within the digital desk. Nonetheless, just lately an update was pushed that enables us so as to add fields to our desk that we don’t actually need embedded. On this case we’re including the content material and identify of the file in the identical desk as our embeddings. This can permit us to simply see what embeddings correspond to what content material simply whereas sparing us the necessity for additional tables and JOIN statements.
content material : Textual content that can retailer the content material of every file.

Now that now we have our digital desk arrange in our SQLite database, we are able to start changing our textual content information into embeddings and storing them in our desk:

# Perform to get embeddings utilizing the OpenAI API

def get_openai_embedding(textual content):

   response = shopper.embeddings.create(

       mannequin="text-embedding-3-small",

       enter=textual content

   )

   return response.information[0].embedding

# Iterate over .txt information within the /information listing

for file_name in os.listdir("information"):

   file_path = os.path.be part of("information", file_name)

   with open(file_path, 'r', encoding='utf-8') as file:

       content material = file.learn()

       # Generate embedding for the content material

       embedding = get_openai_embedding(content material)

       if embedding:

           # Insert file content material and embedding into the vec0 desk

           db.execute(

               'INSERT INTO paperwork (embedding, file_name, content material) VALUES (?, ?, ?)',

               (serialize_float32(embedding), file_name, content material)

# Commit adjustments

db.commit()

We basically loop by every of our .txt information, embedding the content material from every file, after which utilizing an INSERT INTO assertion to insert the embedding, file_name, and content material into paperwork digital desk. A commit assertion on the finish ensures the adjustments are persevered. Be aware that we’re utilizing serialize_float32 right here from the sqlite-vec library. SQLite itself doesn’t have a built-in vector kind, so it shops vectors as binary giant objects (BLOBs) to avoid wasting house and permit quick operations. Internally, it makes use of Python’s struct.pack() operate, which converts Python information into C-style binary representations.

Lastly, to carry out RAG, you then use the next code to do a Ok-Nearest-Neighbors (KNN-style) operation. That is the center of vector search.

# Carry out a pattern KNN question

query_text = "What's normal relativity?"

query_embedding = get_openai_embedding(query_text)

if query_embedding:

   rows = db.execute(

       """

       SELECT

           file_name,

           content material,

           distance

       FROM paperwork

       WHERE embedding MATCH ?

       ORDER BY distance

       LIMIT 3

       """,

       [serialize_float32(query_embedding)]

   ).fetchall()

   print("High 3 most related paperwork:")

   top_contexts = []

   for row in rows:

       print(row)

       top_contexts.append(row[1])  # Append the 'content material' column

We start by taking in a question from the person, on this case “What’s normal relativity?” and embedding that question utilizing the identical embedding mannequin as earlier than. We then carry out a SQL operation. Let’s break this down:

The SELECT assertion means the retrieved information can have three columns: file_name, content material, and distance. The primary two now we have already talked about. Distance shall be calculated throughout the SQL operation, extra on this in a second.
The FROM assertion ensures you might be pulling information from the paperwork desk.
The WHERE embedding MATCH ? assertion performs a similarity search between all the vectors in your database and the question vector. The returned information will embody a distance column. This distance is only a floating level quantity measuring the similarity between the question and database vectors. The upper the quantity, the nearer the vectors are. sqlite-vec supplies a number of choices for the best way to calculate this similarity.
The ORDER BY distance makes certain to order the retrieved vectors in descending order of similarity (excessive -> low).
LIMIT 3 ensures we solely get the highest three paperwork which can be nearest to our question embedding vector. You possibly can tweak this quantity to see how retrieving kind of vectors impacts your outcomes.

Given our question of “What’s normal relativity?”, the following paperwork have been pulled. It did a fairly good job!

High 3 most related paperwork:

(‘general_relativity.txt’, ‘Einstein’s concept of normal relativity redefined our understanding of gravity. As a substitute of viewing gravity as a pressure performing at a distance, it interprets it because the curvature of spacetime round large objects. Gentle passing close to an enormous star bends barely, galaxies deflect beams touring thousands and thousands of light-years, and clocks tick at totally different charges relying on their gravitational potential. This groundbreaking concept led to predictions like gravitational lensing and black holes, phenomena later confirmed by observational proof, and it continues to information our understanding of the cosmos.’, 0.8316285610198975)

(‘newton.txt’, ‘In classical mechanics, Newton’s legal guidelines of movement type the muse of how we perceive the motion of objects. Newton’s first legislation, usually known as the legislation of inertia, states that an object at relaxation stays at relaxation and an object in movement continues in movement until acted upon by an exterior pressure. This idea extends into extra advanced physics issues, the place analyzing internet forces on objects permits us to foretell their future trajectories and behaviors. Over time, making use of Newton’s legal guidelines has enabled engineers and scientists to design safer automobiles, extra environment friendly machines, and even information spacecraft by intricate gravitational fields.’, 1.2036118507385254)

(‘quantum.txt’, ‘Quantum mechanics revolutionized our understanding of the microscopic world. Not like classical particles, quantum entities corresponding to electrons can exhibit each wave-like and particle-like behaviors. Phenomena like quantum superposition counsel that particles can exist in a number of states directly, and the act of measurement usually “collapses” these states into one noticed end result. This unusual and counterintuitive concept underpins fashionable applied sciences like semiconductors and lasers, and it supplies a conceptual framework for rising fields like quantum computing and cryptography.’, 1.251380205154419)

We are able to then stuff the context of the mannequin with these three paperwork and have it try to reply our query.

# Put together the context for the question

context = "nn".be part of(top_contexts)

system_message = "You're a useful assistant. Use the next context to reply the question."

# Ship question and context to OpenAI

attempt:

   completion = shopper.chat.completions.create(

       mannequin="gpt-4o-mini",

       messages=[

           {"role": "system", "content": system_message},

           {"role": "user", "content": f"Context: {context}nnQuery: {query_text}"}

       ]

   )

   print("Response:")

   print(completion.selections[0].message.content material)

besides Exception as e:

   print(f"Error producing response: {e}")

Here’s what the mannequin stated:

Basic relativity is a concept developed by Albert Einstein that redefines our understanding of gravity. As a substitute of viewing gravity as a pressure performing at a distance, normal relativity interprets it because the curvature of spacetime attributable to the presence of mass. In accordance with this concept, large objects like stars and planets create a distortion within the material of spacetime, and this curvature impacts the movement of different objects, together with gentle, which may bend when it passes close to an enormous physique.

That is faithfully sticking to the paperwork we gave the mannequin. Nice job 4o-mini!

Conclusion

sqlite-vec is a venture sponsored by the Mozilla Builders Accelerator program, so it has some important backing behind it. Have to provide an enormous due to Alex Garcia, the creator of sqlite-vec , for serving to to push the SQLite ecosystem and making ML doable with this straightforward database. It is a nicely maintained library, with updates coming down the pipeline frequently. As of November twentieth, they even added filtering by metadata! Maybe I ought to re-do my aforementioned RAG article utilizing SQLite 🤔.

The extension additionally gives bindings for a number of fashionable programming languages, together with Ruby, Go, Rust, and extra.

The truth that we’re capable of radically simplify our RAG pipeline to the naked necessities is outstanding. To recap, there isn’t any want for a database service to be spun up and spun down, like Postgres, MySQL, and many others. There is no such thing as a want for API calls to cloud distributors. Should you deploy to a server instantly through Digital Ocean or Hetzner, you possibly can even keep away from costly and unnecessary complexity related to managed cloud providers like AWS, Azure, or Vercel.

I imagine this straightforward structure can work for quite a lot of functions. It’s cheaper to make use of, simpler to keep up, and quicker to iterate on. When you attain a sure scale it is going to probably make sense emigrate to a extra sturdy database corresponding to Postgres with the pgvector extension for RAG capabilities. For extra superior capabilities corresponding to chunking and doc cleansing, a framework will be the proper alternative. However for startups and smaller gamers, it’s SQLite to the moon.

Have enjoyable attempting out sqlite-vec for your self!

Source link

Is Reading Dead? Why Gen Z Prefers AI Voices Over Books

From Pixels to Perfect Replicas

AI Twin Generator from Image (Unfiltered): My Experience

Why Netflix Seems to Know You Better Than Your Friends | by Rahul Mishra | Coding Nexus | Aug, 2025

I Tried Buying a Car Through Amazon: Here Are the Pros, Cons

Amazon and eBay to pay ‘fair share’ for e-waste recycling

Artificial Intelligence Concerns & Predictions For 2025

Barbara Corcoran: Entrepreneurs Must ‘Embrace Change’

Most Popular

End-to-End AWS RDS Setup with Bastion Host Using Terraform

Streamlit Made Simple: Building Interactive Dashboards for Data Science | by Harshit Kandoi | Feb, 2025

Equilibrium in the Embedding Space: When Novelty Becomes Familiar | by Faruk Alpay | Jul, 2025

Our Picks

Why Netflix Seems to Know You Better Than Your Friends | by Rahul Mishra | Coding Nexus | Aug, 2025

EdgeConneX and Lambda to Build AI Factory Infrastructure in Chicago and Atlanta

French streamer’s death ‘not traumatic’, autopsy finds

Retrieval Augmented Generation in SQLite

SQLite-Vec

Code Walkthrough

Conclusion

Related Posts