
    Building an Android Smart Gallery App for Image Organization & Search: Transitioning from Classification to Embedding Models | by D41 | Mar, 2025

    By Team_AIBS News, March 25, 2025 (4 min read)


    SmartScan app banner

    In my previous approach for SmartScan, I designed it primarily for image organization using a classification model. While the app functioned as intended, the rigid nature of a classification model eventually became a limitation. This approach required users to train the model to categorize their images accurately, making it less flexible when dealing with the variety of user images. In contrast, embedding models generate feature vectors, allowing us to compute cosine similarity between images and class representations, an approach that adapts far more gracefully to diverse inputs. Although frameworks like ONNX and LiteRT support on-device training, implementing this added a layer of complexity that could negatively affect the usability of the app.

    By transitioning to an embedding-based approach, the app not only improved its image organization capabilities but also gained a powerful text-to-image search feature, enabling users to find images using natural language queries.

    Initially, I used CLIP embedding models (image and text) for a zero-shot classification approach by computing the cosine similarity between the textual representation of folder names and the embeddings of new images. While this allowed for dynamic categorization, the method was not always accurate because a folder name alone did not capture the full variability of its contents. For example, during testing with Twitter and Reddit screenshots, the model frequently misclassified Twitter screenshots as Reddit, with the cosine similarity scores for the two folders often being extremely close.
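
    The zero-shot step can be sketched as follows. This is a minimal illustration, not the app's actual code: the 3-dimensional vectors are made-up stand-ins for real CLIP embeddings, and the names `cosine_similarity` and `zero_shot_classify` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_classify(image_embedding, folder_text_embeddings):
    """Pick the folder whose name embedding is most similar to the image."""
    scores = {
        name: cosine_similarity(image_embedding, text_emb)
        for name, text_emb in folder_text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Toy vectors (assumed values): in practice these would come from the
# CLIP text encoder (folder names) and the CLIP image encoder (new image).
folder_text_embeddings = {
    "Twitter": [0.9, 0.1, 0.3],
    "Reddit": [0.8, 0.2, 0.4],
}
new_image = [0.85, 0.15, 0.35]

best_folder, scores = zero_shot_classify(new_image, folder_text_embeddings)
margin = abs(scores["Twitter"] - scores["Reddit"])
print(best_folder, scores, margin)
```

    With these toy values the two scores differ by well under 0.01, which mirrors the failure mode described above: when folder-name embeddings sit close together, the winning folder is decided by a very thin margin.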

    To improve accuracy, I introduced a few-shot learning strategy using prototype embeddings. With this new approach, each destination folder is represented by a prototype embedding: the average of all image embeddings within that folder. New images are then compared to these prototype embeddings, leading to much better matching accuracy.
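
    A minimal sketch of the prototype idea, under stated assumptions: the embeddings are toy 3-dimensional vectors rather than real CLIP outputs, the function names are hypothetical, and the optional `threshold` parameter is an illustrative addition that the article does not mention.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean_embedding(embeddings):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]

def build_prototypes(folders):
    """Map each folder name to the average of its image embeddings."""
    return {name: mean_embedding(embs) for name, embs in folders.items()}

def classify(image_embedding, prototypes, threshold=0.0):
    """Return (folder, score) for the closest prototype, or (None, score)
    if the best score falls below the assumed threshold."""
    best_name, best_score = max(
        ((name, cosine_similarity(image_embedding, proto))
         for name, proto in prototypes.items()),
        key=lambda pair: pair[1],
    )
    if best_score < threshold:
        return None, best_score
    return best_name, best_score

# Toy image embeddings already sorted into folders (assumed values).
folders = {
    "Twitter": [[1.0, 0.0, 0.2], [0.8, 0.2, 0.0]],
    "Reddit": [[0.0, 1.0, 0.2], [0.2, 0.8, 0.0]],
}
prototypes = build_prototypes(folders)
label, score = classify([0.7, 0.3, 0.1], prototypes)
print(label, round(score, 3))
```

    Because each prototype is built from example images rather than from the folder name, it reflects what the folder actually contains, which is the reason given above for the improved accuracy.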

    Another key reason for switching to an embedding model was its suitability for implementing a text-to-image search feature, which has now been added. Embedding models naturally map images and text into a shared feature space, making similarity comparison seamless and intuitive for users searching their gallery with textual queries.
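
    The search feature reduces to ranking stored image embeddings by similarity to a query embedding. The sketch below assumes toy vectors in place of real CLIP outputs; the file names, the `search` function, and the `top_k` parameter are hypothetical.

```python
import math

def normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def search(query_embedding, image_index, top_k=3):
    """Return the top_k (image_id, score) pairs, best match first."""
    query = normalize(query_embedding)
    scored = []
    for image_id, embedding in image_index.items():
        score = sum(q * e for q, e in zip(query, normalize(embedding)))
        scored.append((score, image_id))
    scored.sort(reverse=True)
    return [(image_id, score) for score, image_id in scored[:top_k]]

# Toy index (assumed values): image embeddings would be computed once by
# the image encoder; the query vector would come from the text encoder.
image_index = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "receipt.png": [0.0, 0.2, 0.9],
    "dog.jpg": [0.1, 0.9, 0.1],
}
query_embedding = [0.2, 0.8, 0.1]  # stand-in for a query such as "a dog"

results = search(query_embedding, image_index, top_k=2)
print(results)
```

    In a real gallery the image embeddings could be stored already normalized, so that each query costs one text-encoder call plus a dot product per image.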

    Overall, using the embedding approach for classification offers several significant advantages:

    • Flexibility: Embedding models easily adapt to various user-defined categories without the constraints of discrete labels.
    • Dynamic User Control: Users can seamlessly add or modify categories without retraining a complex classification model.
    • Enhanced Scalability: The prototype embedding method scales gracefully as new image types and categories are added.
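
    One way to see why no retraining is needed: a prototype is just a mean, so it can be updated incrementally whenever a user adds an image to a folder. The running-mean update below is an assumption about how this could be done, not a description of the app's implementation, and `add_image` is a hypothetical helper.

```python
def add_image(prototype, count, embedding):
    """Fold one new embedding into a folder's running-mean prototype.

    prototype: current mean vector (use zeros for an empty folder)
    count: number of embeddings already averaged into the prototype
    Returns the updated prototype and the new count.
    """
    new_count = count + 1
    updated = [p + (e - p) / new_count for p, e in zip(prototype, embedding)]
    return updated, new_count

# Creating a new category is just starting a new entry (toy 2-d vectors).
prototype, count = [0.0, 0.0], 0
prototype, count = add_image(prototype, count, [1.0, 0.0])
prototype, count = add_image(prototype, count, [0.0, 1.0])
print(prototype, count)
```

    Each update touches only one folder's vector, so the cost of adding a category or an image does not depend on how many other categories exist.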

    To enable efficient use of the CLIP models on user devices, I performed the following steps:

    1. Conversion to ONNX: I converted the CLIP visual and text encoder models to the ONNX format using ONNX Runtime, ensuring compatibility with mobile environments.
    2. Model Quantization: Both models were quantized, reducing their sizes by roughly 4x, with the image encoder model being reduced from 351.6MB to 95.6MB and the text encoder model being reduced from 254.2MB to 64.4MB. This quantization not only minimizes the APK size but also optimizes inference performance on mobile devices.
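
    As a quick check on the reported sizes, the reduction factors can be computed directly. The figures are the ones quoted above; the variable names are illustrative only.

```python
# Model sizes in MB as reported in the article: (before, after) quantization.
sizes_mb = {
    "image_encoder": (351.6, 95.6),
    "text_encoder": (254.2, 64.4),
}

# Per-model reduction factor: original size divided by quantized size.
factors = {name: before / after for name, (before, after) in sizes_mb.items()}

total_before = sum(before for before, _ in sizes_mb.values())
total_after = sum(after for _, after in sizes_mb.values())
total_factor = total_before / total_after
saved_mb = total_before - total_after

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x smaller")
print(f"combined: {total_before:.1f} MB -> {total_after:.1f} MB "
      f"({total_factor:.2f}x, {saved_mb:.1f} MB saved)")
```

    The factors come out at about 3.7x for the image encoder and 3.9x for the text encoder, so "roughly 4x" is a fair summary, and the two models together shrink from about 606 MB to 160 MB.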

    Switching to an embedding model has significantly enhanced the app’s flexibility and overall user experience. The initial zero-shot classification approach using embeddings served as a valuable stepping stone. However, by transitioning to the few-shot learning approach using prototype embeddings, the categorization process became far more accurate and robust. Moreover, the embedding approach now fully supports the built-in text-to-image search feature, providing a natural and powerful way to explore image collections through simple text queries.

    Reflecting on my shift from zero-shot classification to few-shot learning, I couldn’t help but consider the debate on AI replacing software developers. Not to toot my own horn, but this exemplifies why that won’t happen entirely. While AI will dominate code generation, the complexities of system design, debugging, and problem-solving will always need human insight.

    For those interested in exploring the app further, it’s open source. You can download it and check out the GitHub repository here. If you find it useful or interesting, support it by giving it a star! It should also be released on F-Droid sometime this week.


