
    From Pixels to Plots | Towards Data Science



Quickly turning tedious lab work into actionable insights

During my time as a physics student, manually extracting and analysing experimental measurements was often an unavoidable and frustrating part of physics labs. Reading values from instruments, writing them down, transferring them into spreadsheets, and finally plotting the results was slow, repetitive, and error-prone.

Now that I work in generative AI, I wondered: why not automate this with AI?

This led me to build AI-OCR, an open-source prototype that uses AI to extract numerical data from images and turn it into insightful plots. The process of extracting text or numbers from images is commonly known as Optical Character Recognition (OCR), hence the name of this project.

How it works:

1. Upload images of measurements (or structured PDFs such as financial reports)
2. Prompt the AI to extract specific values into a clean DataFrame
3. Prompt the AI to generate visualisations such as time series, histograms, scatter plots, etc.

By automating what used to be tedious, AI-OCR helps reduce manual work while also breaking free from vendor lock-in. In many lab and industrial environments, even digital data often lives in proprietary formats, requiring expensive and/or restrictive software for access and analysis. With AI-OCR, you can simply photograph the measurements, extract the data from the image, and analyse as well as visualise the results with a simple prompt.

While conceived with simplifying lab workflows in mind, the tool's applications extend far beyond science. From tracking health metrics to analysing utility bills or financial statements, AI-OCR can support a wide range of everyday data tasks.

In this article, I'll walk through:

• Real-world use cases for the prototype
• A breakdown of how it works under the hood
• Challenges, limits, and trade-offs encountered
• Potential approaches for further development

Practical use cases: Where AI-OCR shines

Since I no longer work in a physics lab and unfortunately don't have one in my basement, I was not able to test AI-OCR in its originally intended setting. Instead, I discovered several everyday use cases where this prototype proved surprisingly helpful.

In this section, I'll walk through four real-world examples. I used AI-OCR to extract numerical data from everyday images and documents like the ones in the image below and to generate meaningful plots with minimal effort. For each of these use cases, I used OpenAI's API with the GPT-4.1 model for both the OCR and the data visualisation (more technical details below, in the section on how AI-OCR works).

Four real-world examples where AI-OCR can be used. Image by author.

Blood pressure monitoring

In this first use case, I used AI-OCR to track my blood pressure and heart rate throughout the day. You can see a full demonstration of this use case in the following video:

🎥 https://youtu.be/pTk9RgQ5SkM

Here is how I used it in practice:

1. I recorded my blood pressure roughly every 30 minutes by taking photos of the monitor's display.
2. I uploaded the images and prompted the AI to extract: systolic pressure, diastolic pressure, and heart rate.
3. AI-OCR returned a pandas.DataFrame with the extracted values, timestamped using the image metadata.
4. Finally, I asked the AI to plot systolic and diastolic pressure as a time series, together with horizontal lines indicating standard healthy ranges, as well as the heart rate in a separate subplot.

AI-generated plot of my blood pressure and heart rate over the course of a day. Image by author.

The result? A visual overview of my (slightly elevated) blood pressure fluctuations throughout the day, with a clear drop after lunch at 1 PM. What is particularly encouraging is that the plot doesn't show any obvious outliers, a good sanity check indicating that the AI extracted the values from the images correctly.
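To make the final plotting step more tangible, here is a minimal sketch of the kind of Matplotlib code the model might generate for such a request. The column names and the reference thresholds (120/80 mmHg) are illustrative assumptions, not the exact output of AI-OCR.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_blood_pressure(df: pd.DataFrame) -> None:
    """Plot pressure and heart rate from an extracted DataFrame.

    Assumes columns 'timestamp', 'systolic pressure', 'diastolic pressure'
    and 'heart rate'; the names are illustrative.
    """
    df = df.sort_values("timestamp")
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

    # Blood pressure time series with reference lines for typical healthy limits
    ax1.plot(df["timestamp"], df["systolic pressure"], marker="o", label="systolic")
    ax1.plot(df["timestamp"], df["diastolic pressure"], marker="o", label="diastolic")
    ax1.axhline(120, linestyle="--", color="grey", label="systolic reference")
    ax1.axhline(80, linestyle=":", color="grey", label="diastolic reference")
    ax1.set_ylabel("pressure [mmHg]")
    ax1.legend()

    # Heart rate in a separate subplot
    ax2.plot(df["timestamp"], df["heart rate"], marker="o", color="tab:red")
    ax2.set_ylabel("heart rate [bpm]")
    ax2.set_xlabel("time")

    fig.tight_layout()
    plt.show()
```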

Most modern blood pressure monitors only store a limited number of readings internally. The device I used, for example, can hold up to 120 values. However, many inexpensive models (like mine) don't support data export. Even when they do, they often require proprietary apps, locking your health data into closed ecosystems. As you can see, that is not the case here.

Body weight tracking

In another health-related use case, I used AI-OCR to track my body weight over several weeks during a personal weight-loss effort.

Traditionally, you might weigh yourself and manually enter the result into a fitness app. Some modern scales offer synchronisation via Bluetooth, but again the data is usually locked inside proprietary apps. These apps typically restrict both data access and the kinds of visualisations you can generate, making it difficult to truly own or analyse your own health data.

With AI-OCR, I simply took a photo of my scale reading every morning. For someone who is not exactly a morning person, this felt far easier than fiddling with an app before my breakfast tea. Once I had a batch of images, I uploaded them and asked AI-OCR to extract the weight values and generate a time series plot of my weight.

AI-generated plot of my body weight over time, together with a linear regression. Image by author.

From the resulting graph, you can see that I lost around 3 kg over roughly two months. I also asked the AI to perform a linear regression, estimating a weight-loss rate of ~0.4 kg/week. With this approach, the user has full control over the analysis: I can ask the AI to generate a trend line, estimate my weight-loss rate, or apply any custom logic I want.
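For illustration, a trend line like this takes only a few lines of NumPy. The sketch below assumes the extracted DataFrame has 'date' and 'weight' columns, which is not necessarily the exact schema AI-OCR returns.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_weight_trend(df: pd.DataFrame) -> float:
    """Fit and plot a linear weight trend; returns the rate in kg/week.

    Assumes columns 'date' (datetime) and 'weight' (kg); names are illustrative.
    """
    df = df.sort_values("date")
    days = (df["date"] - df["date"].iloc[0]).dt.days.to_numpy()

    # Least-squares fit: weight ≈ slope * days + intercept
    slope, intercept = np.polyfit(days, df["weight"].to_numpy(), deg=1)

    plt.plot(df["date"], df["weight"], "o", label="measurements")
    plt.plot(df["date"], slope * days + intercept, "-", label="linear trend")
    plt.ylabel("weight [kg]")
    plt.legend()
    plt.show()

    return slope * 7  # kg per week
```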

Financial data analysis

AI-OCR is not just useful for health monitoring. It can also help make sense of your personal finances. In my case, I found that the analytics provided by my brokerage app offered only basic summaries of my portfolio and often missed key insights about my investment strategy. Some numbers were even inaccurate or incomplete.

One example: after moving my portfolio to a new brokerage, I wanted to verify that my buy-in values had been transferred correctly. This can be cumbersome, especially when shares are accumulated over time through savings plans or multiple partial purchases. Doing it manually would mean digging through many PDFs, copying numbers into spreadsheets, and double-checking formulas, all of which is time-consuming and error-prone.

AI-OCR automated the entire workflow. I uploaded all the PDF purchase confirmations from my previous broker and prompted the AI to extract share name, nominal value, and purchase price. In a second step, I asked it to compute the buy-in values for each share and generate a bar plot of the results. In the prompt, I explained how to calculate the buy-in value:

"Buy-in value = share price × nominal value, normalized over the total nominal value."

The generated plot let me quickly spot inconsistencies in the transfer of the buy-in values. In fact, this plot allowed me to catch a few errors in the numbers shown by my new brokerage app.
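As an illustration of the calculation described in the prompt, here is one way to express it in pandas. The column names are assumptions that depend on the extraction prompt, not the actual AI-OCR schema.

```python
import pandas as pd

def buy_in_values(df: pd.DataFrame) -> pd.Series:
    """Compute the buy-in value per share as a nominal-weighted average purchase price.

    Assumes columns 'share name', 'nominal value' and 'purchase price';
    the names are illustrative.
    """
    df = df.assign(weighted=df["purchase price"] * df["nominal value"])
    grouped = df.groupby("share name")
    # Buy-in value = sum(price × nominal) / sum(nominal) for each share
    return grouped["weighted"].sum() / grouped["nominal value"].sum()

# buy_in_values(df).plot(kind="bar") then reproduces a bar plot like the one described.
```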

Similarly, you can prompt AI-OCR to calculate realised gains or losses over time based on your transaction history. This is a metric my brokerage app doesn't even provide.

Electricity meter readings

For the final use case, I'll demonstrate how I digitised and tracked my electricity consumption using this prototype.

Like many older houses in Germany, mine still uses an analogue electricity meter, which makes daily monitoring nearly impossible with modern (digital) technology. If I want to analyse consumption over a time interval, I have to read the meter manually at the beginning and at the end of the interval, and then repeat this for every interval or day. Doing this over several days quickly becomes mundane and error-prone.

Instead, I photographed the meter (almost) every day for a few weeks and uploaded the images to AI-OCR. With two simple prompts, the tool extracted the readings and generated a time-series plot of my cumulative electricity consumption in kWh.

AI-generated plot of my cumulative electricity consumption over time. Image by author.

The plot shows a generally linear trend, a sign that my daily consumption was relatively steady. However, three outliers are visible. These were not caused by my secret bitcoin mining rigs but resulted from misread digits during the OCR process: in three out of the 27 images, the model simply made a recognition error.
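Misreads like these are easy to flag programmatically, since a cumulative meter reading can never decrease and should not jump implausibly between readings. A minimal sketch, assuming a DataFrame with 'timestamp' and 'reading_kwh' columns (both the names and the threshold are illustrative):

```python
import pandas as pd

def flag_suspicious_readings(df: pd.DataFrame, max_daily_kwh: float = 50.0) -> pd.DataFrame:
    """Return rows whose cumulative reading decreases or jumps implausibly."""
    df = df.sort_values("timestamp").reset_index(drop=True)
    diff = df["reading_kwh"].diff()
    suspicious = (diff < 0) | (diff > max_daily_kwh)
    return df[suspicious]
```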

These glitches point to current limitations of AI-OCR, which I'll explore in more detail shortly. But first, let's take a closer look at how this prototype actually works under the hood.

Under the hood: How AI-OCR works

AI-OCR is split into two main components: a frontend and a backend. The frontend is built with Streamlit, a Python library that lets you quickly turn Python scripts into web apps with little effort. It's a popular choice for machine learning prototypes and proofs of concept thanks to its simplicity. That said, Streamlit is not meant for production-scale applications.
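For context, a Streamlit frontend of this kind can be as simple as the following sketch. It is a generic illustration rather than the actual AI-OCR frontend; the backend URL and payload format are placeholders.

```python
import requests
import streamlit as st

st.title("AI-OCR")

images = st.file_uploader("Upload measurement images", accept_multiple_files=True)
ocr_prompt = st.text_area("Which values should be extracted?")

if st.button("Extract data") and images:
    # Placeholder call to a hypothetical backend endpoint
    response = requests.post(
        "http://localhost:8000/ocr",
        files=[("images", img.getvalue()) for img in images],
        data={"prompt": ocr_prompt},
    )
    st.dataframe(response.json())
```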

The main focus of this article is therefore on the backend, which is where data extraction and visualisation take place. It is designed around two distinct processes:

1. OCR (Optical Character Recognition): recognising the numerical data in images or documents using AI.
2. Data visualisation: transforming the extracted data into insightful plots.

One of AI-OCR's strengths is its flexibility: it is model-agnostic, so you are not locked into a single Large Language Model (LLM) vendor. Both commercial and open-source models can be configured and swapped depending on the use case. Each process is powered by configurable LLMs. Besides OpenAI models such as GPT-4.1, the prototype supports (so far) quantised models in GGUF format, a binary file format that packages model weights and metadata together. These are loaded and run locally via the llama.cpp Python library.

For the OCR task, Hugging Face offers a huge selection of quantised models such as LLaVA, DeepSeek-VL, or Llama-3-vision. For the code generation in the visualisation component, models with strong coding capabilities are ideal. Due to a lack of computational resources at home (I don't have access to a powerful GPU), I have only fully tested this prototype with OpenAI models via the API.

The OCR component: Extracting the data

To turn images into insights, the relevant data must first be recognised in the images, which is handled by the OCR component. The process begins when the user uploads images and submits a prompt describing which values should be recognised, plus optional additional context to assist the model. The output is a pandas.DataFrame containing the extracted values alongside the timestamps of the images.

The diagram below illustrates the design of the data extraction pipeline. The outer box represents the Streamlit-based frontend, while the inner section details the backend architecture, a REST API. Arrows connecting the frontend and backend represent API calls. Within the backend, each icon stands for a distinct component.

Design of the OCR component. Image by author.

At the core of the backend is the OCR Modelling object. When a prompt is submitted, this object receives it together with the selected model configuration. It loads the appropriate model and accesses the uploaded images.

One particularly instructive part of this design is the way the prompt is handled. Before the actual OCR task is performed, the user's prompt is enhanced with the help of a Small Language Model (SLM). The SLM's role is to identify the specific values mentioned in the user's prompt and return them as a list. For example, in the blood pressure use case, the SLM would return:

["heart rate", "diastolic pressure", "systolic pressure"].

This information is used to automatically enhance the original user prompt. The LLM is always asked to return structured output, so the prompt needs to be extended with the specific JSON output format, which for the blood pressure case reads:

{"heart rate": "value", "diastolic pressure": "value", "systolic pressure": "value"}

Note that the SLM used here runs locally via llama.cpp. For the use cases discussed above, I used Gemma-2 9B (in quantised GGUF format). This technique highlights how smaller, lightweight models can be used for efficient, automatic prompt optimisation.
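A minimal sketch of this prompt-enhancement step, using the llama-cpp-python bindings to run a local GGUF model. The model path, system prompt, and wrapper function are my own assumptions, not the project's actual code.

```python
import json

from llama_cpp import Llama

# Local quantised SLM in GGUF format (placeholder path)
slm = Llama(model_path="models/gemma-2-9b-it.Q4_K_M.gguf", n_ctx=2048)

def enhance_prompt(user_prompt: str) -> str:
    """Ask the SLM for the requested field names, then append a JSON output format."""
    response = slm.create_chat_completion(
        messages=[
            {"role": "system",
             "content": "List the measurement values the user wants extracted "
                        "as a JSON array of strings. Return only the array."},
            {"role": "user", "content": user_prompt},
        ],
        temperature=0.0,
    )
    fields = json.loads(response["choices"][0]["message"]["content"])

    # Build the structured-output instruction, e.g.
    # {"heart rate": "value", "diastolic pressure": "value", ...}
    schema = {field: "value" for field in fields}
    return (f"{user_prompt}\n\nReturn the result strictly as JSON "
            f"in the following format: {json.dumps(schema)}")
```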

This enhanced prompt is then sent sequentially, together with each image, to the selected LLM. The model infers the requested values from the image, and the responses are aggregated into a pandas.DataFrame, which is finally returned to the user for viewing and download.
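A simplified sketch of this per-image extraction loop, using the OpenAI Python SDK. The article only states that GPT-4.1 is accessed via the API; the request construction and the use of the file's modification time as a timestamp are my own simplifications (the real tool uses the image metadata).

```python
import base64
import json
from pathlib import Path

import pandas as pd
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_from_images(image_paths: list[Path], enhanced_prompt: str) -> pd.DataFrame:
    """Send each image with the enhanced prompt to the LLM and aggregate the results."""
    rows = []
    for path in image_paths:
        b64 = base64.b64encode(path.read_bytes()).decode()
        response = client.chat.completions.create(
            model="gpt-4.1",
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": enhanced_prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        values = json.loads(response.choices[0].message.content)
        values["timestamp"] = path.stat().st_mtime  # simplification; EXIF metadata in the real tool
        rows.append(values)
    return pd.DataFrame(rows)
```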

Visualising the result

The second part of turning your images into insights is the visualisation process. Here, the numerical data extracted into the DataFrame during the OCR step is transformed into meaningful plots based on the user's request.

The user provides a prompt describing the type of visualisation they want (e.g., time series, histogram, scatter plot). The LLM then generates Python code to create the requested plot. This generated code is executed on the frontend, and the resulting visualisation is displayed directly there.
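Conceptually, the code-generation step can look like the sketch below: the LLM sees the user's prompt plus lightweight metadata about the DataFrame, and returns a code string that the frontend later executes (the governance checks described next are omitted here). The exact metadata passed and the prompt wording are assumptions.

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()

def generate_plot_code(df: pd.DataFrame, user_prompt: str) -> str:
    """Ask the LLM for Matplotlib code that operates on a DataFrame named `df`."""
    df_description = (
        f"Columns and dtypes: {df.dtypes.to_dict()}\n"
        f"First rows:\n{df.head().to_string()}"
    )
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system",
             "content": "Return only Python Matplotlib code that plots the DataFrame `df`."},
            {"role": "user", "content": f"{user_prompt}\n\n{df_description}"},
        ],
    )
    return response.choices[0].message.content

# On the frontend, the returned string is executed roughly like:
# exec(code, {"df": df, "plt": matplotlib.pyplot, "pd": pd})
```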

The diagram below once again illustrates this process in detail. At the core of this particular process is the Plot Modelling object. It receives two key inputs:

• The user's prompt describing the desired visualisation
• The pandas.DataFrame generated by the OCR process

Design of the data visualisation component. Image by author.

Before the prompt and metadata about the DataFrame are passed to the LLM, the prompt first goes through a Governance Gateway. Its job is to ensure security by preventing the generation or execution of malicious code. It is implemented as an SLM; as before, I used Gemma-2 9B (in quantised GGUF format) running locally via llama.cpp.

Specifically, the Governance Gateway first uses the SLM to verify that the user's prompt contains a valid data visualisation request and does not include any harmful or suspicious instructions. Only if the prompt passes this initial check is it forwarded to the LLM to generate the Python plotting code. Once the code has been generated, it is sent back to the SLM for a second security review to ensure it is safe to execute.

After passing this second security validation, the generated code is sent back to the frontend, where it is executed to produce the requested plot. This two-factor governance approach helps ensure that AI-generated code runs safely and securely, while giving the user the flexibility to generate any desired data visualisation within the Matplotlib ecosystem.
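Put together, the two checks can be sketched as a thin wrapper around the code-generation call. `slm_check` and `governed_plot_code` are hypothetical helpers, `slm` refers to the local model loaded in the earlier sketch, and `generate_plot_code` is the sketch shown above; none of these correspond to the project's actual function names.

```python
import pandas as pd

def slm_check(question: str) -> bool:
    """Ask the local SLM (e.g. Gemma-2 via llama.cpp) a yes/no safety question."""
    answer = slm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        temperature=0.0,
    )["choices"][0]["message"]["content"]
    return answer.strip().lower().startswith("yes")

def governed_plot_code(df: pd.DataFrame, user_prompt: str) -> str:
    """Generate plotting code only if both governance checks pass."""
    # Check 1: is the prompt a harmless visualisation request?
    if not slm_check(
        "Answer yes or no: is this a valid, harmless data visualisation request?\n"
        + user_prompt
    ):
        raise ValueError("Prompt rejected by the Governance Gateway.")

    code = generate_plot_code(df, user_prompt)  # LLM call, see the sketch above

    # Check 2: is the generated code safe to execute?
    if not slm_check("Answer yes or no: is this Python code safe to execute?\n" + code):
        raise ValueError("Generated code rejected by the Governance Gateway.")

    return code
```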

Challenges, limits, and trade-offs

As already touched upon in the use case section, this prototype, particularly the OCR component, has notable limitations due to the constraints of the underlying language models.

In this section, I want to explicitly demonstrate two scenarios (illustrated in the image below) where the tool currently struggles significantly, and why, in some cases, it might not be the most suitable solution. Both scenarios require interpreting analogue data. Despite the growing digitisation of lab equipment, this is still an important requirement for applying such a tool in many physics labs.

Two example scenarios where AI-OCR struggles. Image by author.

On the left is an attempt to measure the length of an object (in this case, a book) with a physical ruler. On the right is an image of my car's analogue RPM meter. In both cases, I processed multiple images with the prototype: static photos for measuring the length of the book and video frames for reading the RPM meter. Despite supplying high-quality inputs and carefully crafted prompts, the resulting measurements were imprecise. While the extracted values always fell within the expected numeric range, they were consistently too far off for real-world applications.

While AI-OCR offers convenience, in some cases the overall cost may outweigh the benefits. In cases like the body weight tracker, the tool provides convenience, but at a cost in memory and token usage: each image may be several megabytes, while the extracted data (a single float) is just a few bytes. Image analysis with LLMs is also expensive. These trade-offs highlight the need to always align AI applications with clear business value.

Conclusion: Custom AI agents for tomorrow's lab

In this article, we explored how to build an LLM-powered prototype that transforms measurement images into structured data and insightful plots. Users can upload images and describe both the values they want recognised and the type of data visualisation to be performed; they then receive both the raw values and their visual interpretation.

If you have tried ChatGPT or other LLM platforms, you may have noticed that they can already do much of this, and perhaps more. Simply upload an image to the chat, describe your desired data visualisation (optionally with some context), and the system (e.g. ChatGPT) figures out the rest. Under the hood, this likely relies on a system of AI agents working in concert.

That same type of architecture is what a future version of AI-OCR could embrace. But why bother building it if one could simply use ChatGPT instead? Because of customisation and control. Unlike ChatGPT, the AI agents in AI-OCR can be tailored to your specific needs (such as those of a lab assistant), and with local models you retain full control over your data. For instance, you would very likely prefer not to upload your personal finance documents to ChatGPT.

A possible architecture for such a system of AI agents (one that ChatGPT very likely relies on as well) is illustrated in the diagram below:

Possible architecture for a system of AI agents that could be integrated into a lab assistant. Image by author.

At the top level, a Root Agent receives the user's input and delegates tasks via an Agent Communication Protocol (ACP). It can choose between two auxiliary agents:

• OCR Agent: extracts the relevant numerical data from images and interfaces with a Model Context Protocol (MCP) server that manages CSV data storage.
• Data Vis(ualisation) Agent: connects to a separate MCP plot server capable of executing Python code. This server includes the Governance Gateway powered by an SLM, which ensures all code is safe and appropriate before execution.

Unlike ChatGPT, this setup can be fully tailored: from local LLMs for data security to system-prompt tuning of the agents for niche tasks. AI-OCR is not meant to replace ChatGPT, but rather to complement it. It could evolve into an autonomous lab assistant that streamlines data extraction, plotting, and analysis in specialised environments.
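As a rough illustration of this delegation pattern (independent of any concrete ACP or MCP implementation, neither of which is used here), a root agent could route requests as in the sketch below. Every class and method name is hypothetical.

```python
class OcrAgent:
    """Hypothetical agent wrapping the OCR pipeline and an MCP CSV-storage server."""
    def handle(self, request: dict) -> dict:
        raise NotImplementedError

class DataVisAgent:
    """Hypothetical agent wrapping code generation, governance checks, and plotting."""
    def handle(self, request: dict) -> dict:
        raise NotImplementedError

class RootAgent:
    """Hypothetical root agent that delegates tasks to the auxiliary agents."""
    def __init__(self) -> None:
        self.agents = {"ocr": OcrAgent(), "visualisation": DataVisAgent()}

    def route(self, task_type: str, request: dict) -> dict:
        # In a real system the routing decision itself could be made by an LLM
        # over an agent communication protocol; here it is a plain dictionary lookup.
        return self.agents[task_type].handle(request)
```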

    Acknowledgement

If you're curious about the future of AI-OCR or interested in exploring ideas and collaborations, feel free to connect with me on LinkedIn.

Finally, I want to thank Oliver Sharif, Tobias Jung, Sascha Niechciol, Oisín Culhane, and Justin Mayer for their feedback and sharp proofreading. Your insights greatly improved this article.



