Within the Kingfisher Group AI team, we've built and launched a Visual Search engine we call "LENS" that uses "Vector Similarity Search" to retrieve products relevant to a user's video or image. Vector Similarity Search is a key technique for our team, since we use it across a range of AI solutions as a building block for everything from recommendation systems to Athena, our in-house generative AI orchestration platform. Understanding this technique well, and knowing how to apply and evaluate it, is a key skill for today's Machine Learning practitioners.
What is LENS, and why build it?
As a DIY customer working on home improvements or repairs, sometimes you may need a replacement part for something in your home, or a tool, but you might not know its name or how best to search for it with a traditional text search.
And while a tradesperson may know the exact item they need on site, searching with a photo they can take on their phone is much quicker than typing out all the details of the exact product.
With our LENS Visual Search engine, users can upload their own video or image to search for the product they're looking for, and the app will return a list of relevant products.
How does LENS work?
At a high level, our LENS Visual Search engine works by taking a user-submitted image or video, running it through a Machine Learning model to produce an "embedding", searching with this embedding for the most similar images in our product catalogue, and finally returning the corresponding products to the user.
Wait, what are embeddings?
An "embedding" is just a way of representing something (here an image/video, but it could instead be text or other information) as a list of numbers like this:
[0.12, 0.80, 0.52]
This kind of list of numbers is commonly known as a "vector", hence "Vector Similarity Search".

For these 3 numbers, we could think of them as the coordinates of a point on a 3D graph like this:

In this case, we've plotted this as an arrow. This is a common way of plotting vectors: one way of defining a vector is as a direction and a magnitude (you may remember this from school maths/physics). By representing it as an arrow, the direction it points is the direction of the vector, while its length is its magnitude, so plotting vectors this way represents both aspects clearly.
When searching for similar items, we can find the smallest distance between points on this graph to find the most "similar" items. For example, consider these 3 points:

p1 and p2 are quite close together:

This means they are "similar" to each other. In comparison, p1 is quite far from p3:
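These distances can be computed directly. Here's a minimal Python sketch; the coordinates for p1, p2 and p3 are made up for illustration, not taken from the plots:

```python
import math

def euclidean_distance(a, b):
    # Straight-line distance between two points (vectors of equal length)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative coordinates only
p1 = [0.12, 0.80, 0.52]
p2 = [0.15, 0.75, 0.50]
p3 = [0.90, 0.10, 0.05]

print(euclidean_distance(p1, p2))  # small distance: p1 and p2 are "similar"
print(euclidean_distance(p1, p3))  # large distance: p1 and p3 are not
```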
Our Machine Learning model is trained to produce embeddings that are close to each other in the "embedding space" when images/videos are similar in terms of the objects present, and far apart in the embedding space when they are different. As a result, the distance corresponds to how related two images/videos are for our use case. Different models can be trained in quite different ways to achieve this, but hopefully you now have a basic understanding of what embeddings are and why we can use them for Vector Similarity Search.
In reality, these "embeddings" aren't just 3 numbers each; they're usually hundreds or even thousands of numbers per embedding. This means that instead of a three-dimensional space in which we calculate a distance, these spaces have hundreds of dimensions. The underlying intuition still applies though, and so does the maths for calculating the distance between embeddings. We standardly refer to the length of the embedding as its "dimension" in exactly this sense.
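To make that concrete, the same distance formula works unchanged whatever the dimension; here's a quick sketch with randomly generated 512-dimensional vectors (512 is just an illustrative size):

```python
import random

random.seed(0)
dim = 512  # real embeddings are often hundreds or thousands of numbers long

a = [random.random() for _ in range(dim)]
b = [random.random() for _ in range(dim)]

# Exactly the same Euclidean distance formula as in 3 dimensions
distance = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
print(distance)
```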
Storing our embeddings: vector databases
Vector Similarity Search has become increasingly commoditised in the last few years, partly due to the rise of Retrieval-Augmented Generation (RAG) approaches used with Large Language Models (LLMs), in which Vector Similarity Search is a core component. RAG is now very widely used, including in some of our internal AI systems built to help colleagues.
Today, there are several providers and libraries for Vector Similarity Search, and it's increasingly easy for developers to use when building solutions. Options range from open source libraries focusing on the core search algorithms, like FAISS and ANNOY, through to more fully-featured, production-ready options like Qdrant and Pinecone, which both offer fully hosted solutions that take care of as much as possible for you while you focus on solving your business problem. With vector databases now so commoditised, the various detailed public benchmarks are a good place to start when choosing a platform.
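Under the hood, all of these tools answer the same question: given a query embedding, which stored embeddings are closest? A brute-force version is easy to sketch in NumPy; libraries like FAISS and ScaNN exist because answering this approximately, over millions of vectors, at low latency, is much harder (the toy catalogue below is random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy "index" of 10,000 product embeddings, 128 dimensions each
catalogue = rng.normal(size=(10_000, 128)).astype(np.float32)

def top_k(query: np.ndarray, embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k embeddings closest to the query."""
    distances = np.linalg.norm(embeddings - query, axis=1)  # distance to every row
    return np.argsort(distances)[:k]

# A query very close to catalogue item 0: item 0 should come back first
query = catalogue[0] + rng.normal(scale=0.01, size=128).astype(np.float32)
print(top_k(query, catalogue))
```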
At Kingfisher Group AI, our vector database of choice is Google's Vector Search, which provides highly competitive search speeds using the same technology that powers Google Search and YouTube. Google have shared multiple pieces of impressive research describing their "ScaNN" algorithm and benchmarking its performance:
This graph shows how Google's ScaNN algorithm performs well across 3 different criteria. The further right you go on the graph, the faster the vector search is at runtime when retrieving results for customers. The higher up on the graph, the quicker it is to build the index that enables fast searching. The smaller the dot, the less RAM the vector database uses on an ongoing basis, which saves on running costs. As you can see, while other options evaluated are equally good or better on one of these three counts, none of the other algorithms compared come close to Google's ScaNN on all 3 criteria, as it sits in an area of its own on the graph.
In addition to the impressive performance of Google's Vector Search, another consideration in our decision was that we use Google Cloud as our primary cloud provider, and by staying within Google Cloud we can keep things like infrastructure management, runtime monitoring/debugging, networking and authentication simple. Given we use Vector Search fairly heavily across a range of use cases, competitive pricing is also a key factor in our choice.
More recently, Google's AlloyDB has added support for ScaNN indices, which makes it a viable alternative too, with the attractive proposition of being able to easily store both our vectors and our metadata (like product names and categories) together. This is an option we'll be investigating further in future.
Populating our vector database
So, we've chosen Google Vector Search as our vector database. Next, we need to populate it with embeddings of product images from our catalogue, so that we can search it to find similar products.
We need to carry out this process regularly, as new products are added to the catalogue and old products stop being sold. To run it in a repeatable and traceable way, we use Google Cloud's Vertex AI Pipelines, which we've discussed previously, both in terms of how its serverless nature meets our needs and how we approach the developer experience.
Our pipeline for updating the LENS Visual Search index has the following steps:
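Conceptually, an index-update pipeline like this can be sketched in plain Python. The function names and stubbed return values below are illustrative, not our actual pipeline code; on Vertex AI Pipelines each function would become a pipeline component:

```python
# Illustrative sketch of an index-update pipeline (hypothetical names/data).

def fetch_catalogue_changes():
    """Find products added to or removed from the catalogue since the last run."""
    return {"added": ["new-drill-123"], "removed": ["old-drill-456"]}

def embed_product_images(product_ids):
    """Run each product's images through the embedding model (stubbed here)."""
    return {pid: [0.1, 0.2, 0.3] for pid in product_ids}

def update_index(embeddings, removed_ids):
    """Upsert new embeddings into the vector index and delete stale entries."""
    return {"upserted": len(embeddings), "deleted": len(removed_ids)}

def run_pipeline():
    changes = fetch_catalogue_changes()
    embeddings = embed_product_images(changes["added"])
    return update_index(embeddings, changes["removed"])

print(run_pipeline())
```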
Hosting our Machine Learning model at runtime
Of course, as well as processing our product catalogue to produce embeddings, we also need to convert user-submitted videos and images into embeddings at runtime using our Machine Learning model, which means hosting and deploying the model.
Currently we use Torchserve, which has for several years been a standard way of deploying PyTorch-based Machine Learning models, providing features like multi-model serving, built-in batched inference and GPU support. Unfortunately, it was announced relatively recently that Torchserve is no longer actively maintained, so we are investigating potential replacements such as NVIDIA Triton Inference Server and LitServe. This goes to show the ever-changing landscape of Machine Learning tooling, and the ongoing maintenance and dependency changes to expect for production solutions. You can read more about our thoughts on hosting live GPU-based inference here as part of our broader AI deployment strategy.
Our runtime Visual Search API
We deploy our runtime LENS Visual Search API as a FastAPI deployment, using our internal APIHandler wrapper, described as part of our general deployment strategy here. The key FastAPI endpoint for LENS carries out the following steps:
- Validate that the request contains image/video data as expected
- Load the image/video data according to its format
- Preprocess the image/video as expected by the Machine Learning model
- Send a request to the Torchserve API to run the image/video through the model and get an embedding
- Use the returned embedding to search against the Google Vector Search index for similar products
- Return the similar products found back to the requester
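These steps can be sketched as plain Python with stubbed helpers. The helper names and return values are illustrative, not our actual handler code; in production the stubs would be real HTTP calls to Torchserve and Vector Search, wrapped in a FastAPI endpoint:

```python
import asyncio

async def get_embedding_from_model(model_input) -> list[float]:
    """Stub: in production, an HTTP request to the Torchserve API."""
    return [0.12, 0.80, 0.52]

async def search_vector_index(embedding, k: int = 5) -> list[str]:
    """Stub: in production, a query against the Google Vector Search index."""
    return ["product-a", "product-b"]

def validate_and_load(data: bytes):
    """Check the payload looks like supported image/video data and decode it."""
    if not data:
        raise ValueError("empty request body")
    return data

def preprocess(media):
    """Resize/normalise the media as the model expects (no-op in this sketch)."""
    return media

async def visual_search(data: bytes) -> list[str]:
    media = validate_and_load(data)       # step 1 and 2
    model_input = preprocess(media)       # step 3
    embedding = await get_embedding_from_model(model_input)  # step 4
    return await search_vector_index(embedding)              # steps 5 and 6

print(asyncio.run(visual_search(b"fake-image-bytes")))
```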
Why separate the hosting of this final API (in FastAPI) from the model hosting (in Torchserve)? Since we're running our model through Torchserve on a GPU, which is relatively expensive, we want to maximise utilisation of the GPU, and the overall process above involves a fair amount of CPU usage and I/O work, like receiving and sending HTTP requests. By separating the two, we can use resources more efficiently. It also means using FastAPI and Torchserve each for the tasks they're good at, in a way that's easily understandable to developers familiar with these tools. Torchserve is great at model serving, but it isn't designed for orchestrating multiple steps like those above in an asynchronous manner, which FastAPI is much better suited to.
Evaluating our search's performance
To evaluate how well our LENS Visual Search is actually performing, a natural starting place is so-called "offline evaluation" metrics. This is where we collect a "test set" of videos and images that we believe are representative of the kinds of videos/images users may submit in searches, and annotate them with our expectations of what good results would be. This allows us to calculate metrics for how well our Visual Search is performing. Crucially, it also allows us to track this over time: if we want to experiment with a new change that we hope will improve the system overall, we can run this evaluation process to see whether it does indeed improve things.
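As an illustration, one common offline metric for search is recall@k: of the products annotated as good results for a query, what fraction appear in the top k returned? The product IDs below are invented for the example:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the annotated relevant products found in the top-k results."""
    hits = sum(1 for product in retrieved[:k] if product in relevant)
    return hits / len(relevant)

# One test-set query: our annotations say these products are good results
relevant = {"tap-01", "tap-02", "washer-9"}
# What the search actually returned, best match first
retrieved = ["tap-01", "pipe-3", "washer-9", "tap-02", "glue-7"]

print(recall_at_k(retrieved, relevant, k=3))  # 2 of the 3 relevant products in the top 3
```

Averaging such a metric over the whole test set gives a single number to track across experiments.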
Since this is a process we want to run fairly regularly, and something we want traceability for, we again use Vertex AI Pipelines:

The diagram above shows the key parts of our evaluation pipeline: we load our evaluation data and "ground truth" (our expectations of what "good" results are), run the evaluation data through the model, and then calculate and record our metrics.
However, using offline metrics alone has a couple of issues. Firstly, collecting this test set, along with our expectations of good results, is a time-consuming process. Secondly, it's likely that customers will use the Visual Search engine for a somewhat different mix of products than makes up our test set.
For example, imagine that plumbing products make up 25% of our test set, like this:

but in reality, perhaps DIY customers struggle to identify plumbing products in particular, so more than 60% of searches are actually for plumbing products, and a typical mix of products searched for might look more like this:

If some of these plumbing products are a weak point for our system, we might be overestimating how well we're performing overall.
Even aspects of the video like the lighting on the product and the background could affect results (although of course we try to build our system to be robust to changes in lighting and background).

As a result, there's a risk that when we evaluate the system using our test set, it appears to perform really well, but we don't know for sure whether that performance translates to the system working equally well for the videos and images our customers will actually want to search with.
Fortunately, there's another kind of metric we can collect. Once the solution is in production for customers (or part of an A/B test), we can also calculate "online" metrics like Click-Through Rate, which help address these deficiencies of offline metrics. For example, if we know that a customer clicked through to the product page of one of the results, and ultimately ended up purchasing that product, that's a strong signal that we've found relevant products that solve their problem. This helps us verify that performance on our test set actually translates into customers getting real value out of the tool. Additionally, it allows us to collect new information on how well the system is performing much more cheaply than assembling and annotating a (larger) test set, since it's based on customer behaviour at runtime rather than work we need to carry out within the AI team.
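Click-Through Rate itself is simple to compute from search logs; the log records below are invented, and real logs would carry far more detail:

```python
def click_through_rate(search_logs: list[dict]) -> float:
    """Share of searches where the user clicked at least one returned product."""
    if not search_logs:
        return 0.0
    clicked = sum(1 for log in search_logs if log["clicked_products"])
    return clicked / len(search_logs)

logs = [
    {"query_id": 1, "clicked_products": ["tap-01"]},
    {"query_id": 2, "clicked_products": []},
    {"query_id": 3, "clicked_products": ["drill-7", "bit-2"]},
    {"query_id": 4, "clicked_products": []},
]
print(click_through_rate(logs))  # 2 of 4 searches led to a click -> 0.5
```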
Using both offline and online metrics gives us as clear a picture as possible of how well our LENS Visual Search engine is working for customers, and allows us to validate potential improvements to the system in future.
Putting AI tools to work for our customers
Every day, tradespeople and DIY enthusiasts alike are trying to get the parts and tools they need as quickly and easily as possible, so they can focus on the job at hand. LENS provides Kingfisher's customers with another easy way of finding what they're looking for, with just a video or image.

Behind that simple user experience, we're using sophisticated yet robust and reliable AI tools. Building on top of production-ready components, including Google's Vector Search and Vertex AI Pipelines, allows us to focus on tailoring our solution to our customers' needs.

Of course, meeting our customers' needs means understanding how our solution performs, and by combining both offline and online metrics we can be confident we're helping our customers find what they're looking for.

LENS is powered by the same vector similarity search used in our recommendation engines and our Athena platform. By building our AI systems on strong, stable foundations, we make sure we focus on the most important thing: making our customers' lives easier and empowering them to find solutions to their problems.
If you're interested in joining us on our journey, please check out our careers page.