    Machine Learning

    Rethinking Data Architecture in the Age of GenAI: From ETL to Embeddings | by Dr. Junaid Farooq | Jul, 2025

    By Team_AIBS News · July 11, 2025 · 5 min read


    As organizations adopt large language models (LLMs) to drive intelligent automation, information extraction, and semantic search, many are beginning to realize that their traditional data architectures are fundamentally misaligned with the demands of AI-native systems.

    The legacy stack, designed for SQL queries, dashboards, and tabular ML, simply cannot support the retrieval-first, unstructured-data-centric workflows required by modern LLM-based systems. It is not just a tooling mismatch; it is an architectural divergence.

    In this article, I break down how the AI-native data architecture differs at the systems level, and what it means for organizations looking to operationalize LLMs beyond experimentation.

    The classical data architecture, refined over decades for business intelligence and predictive analytics, typically follows this pattern:

    Source Systems → ETL → Data Lake / Warehouse → Dashboards & ML Models
    • Data Lakes serve as centralized repositories for structured and semi-structured data.
    • ETL Pipelines normalize data into schema-aligned formats optimized for joins, aggregations, and BI consumption.
    • Downstream Consumption is geared toward dashboards, reporting layers, and tabular ML (regression, classification, etc.).
    • Training Data is extracted from historical tables and logs, often requiring heavy manual labeling.
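    As a toy illustration of that pattern, here is a minimal extract-transform-load pass using SQLite as a stand-in warehouse; the `raw_events` records and the `sales` schema are invented for the example:

```python
import sqlite3

# Hypothetical raw records from a source system (values chosen for the example).
raw_events = [
    {"id": 1, "amount": "19.99", "region": "EU "},
    {"id": 2, "amount": "5.00", "region": "us"},
]

# Transform: coerce types and normalize values into a schema-aligned shape.
rows = [(e["id"], float(e["amount"]), e["region"].strip().upper()) for e in raw_events]

# Load into a warehouse-like table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Downstream consumption: a typical BI-style aggregation.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

    Everything here revolves around rows and schemas, which is exactly the assumption that breaks down in the next section.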

    While this pipeline supports KPIs and metric-driven reporting at scale, it becomes a bottleneck when tasked with ingesting unstructured content or enabling intelligent retrieval, ranking, and reasoning.

    The AI-native architecture reorients the entire stack around model readiness, semantic access, and feedback loops. Here is how the pipeline evolves:

    Data Sources → Vector Store + Semantic Layer → LLM Pipeline → Feedback Loop → Fine-Tuning / Distillation

    Let's dissect each architectural layer in turn.

    The system must be designed to handle high-dimensional, context-rich input: PDFs, knowledge base articles, research papers, support logs, source code, emails, chat transcripts, and so on.

    • Ingested via streaming or batch into object/blob storage (e.g., AWS S3, GCS).
    • Metadata extraction (source, type, author, timestamp) is tightly coupled with ingest.
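    A minimal sketch of metadata coupled to ingest, assuming a content-addressed object key; the `ingest` helper and its field names are hypothetical, not a real storage API:

```python
import hashlib
from datetime import datetime, timezone

def ingest(blob: bytes, source: str, doc_type: str, author: str) -> dict:
    """Write-path sketch: every blob lands in object storage together with
    a metadata record. Fields here are illustrative."""
    return {
        "key": hashlib.sha256(blob).hexdigest(),  # content-addressed object key
        "source": source,
        "type": doc_type,
        "author": author,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "bytes": len(blob),
    }

record = ingest(b"Q3 support transcript ...", "zendesk-export", "chat_transcript", "support-bot")
```

    The point is that metadata is not an afterthought: it is produced at ingest time and travels with the document into the vector layer below.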

    Traditional schema design is replaced by document-centric structuring: every file is a knowledge asset, not a row in a table.

    This layer is the semantic transformation layer, the analogue of ETL for LLM-native systems.

    • Data is embedded using sentence transformers, domain-tuned LLMs, or open embedding models.
    • Output vectors are stored in vector databases (FAISS, Pinecone, Weaviate, Elasticsearch), paired with source metadata.
    • Supports dense retrieval, similarity search, and hybrid filtering.
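    To make the layer's contract concrete, here is a deliberately tiny stand-in: a hashed bag-of-words "embedding" and a brute-force cosine-similarity store. A real system would use a sentence-transformer model and FAISS or Pinecone instead; everything below is illustrative:

```python
import math
import zlib
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalized. Stands in for a real
    embedding model, which would capture meaning far better."""
    vec = [0.0] * dim
    for tok, cnt in Counter(text.lower().split()).items():
        vec[zlib.crc32(tok.encode()) % dim] += cnt
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class VectorStore:
    """Brute-force stand-in for a vector DB: vectors paired with metadata."""
    def __init__(self):
        self.items = []  # list of (vector, metadata)

    def add(self, text: str, meta: dict):
        self.items.append((embed(text), meta))

    def search(self, query: str, k: int = 3):
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), m) for v, m in self.items]
        return sorted(scored, key=lambda s: -s[0])[:k]

store = VectorStore()
store.add("reset your password from the account page", {"doc": "help-1"})
store.add("quarterly revenue grew in the EMEA region", {"doc": "finance-7"})
hits = store.search("how do I reset a password", k=1)
```

    Swap `embed` for a real model and `VectorStore` for an ANN index and the interface stays the same: text in, (score, metadata) candidates out.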

    This layer abstracts meaning into high-dimensional latent space, enabling contextual recall rather than mere keyword matching.

    RAG (Retrieval-Augmented Generation) pipelines dominate modern LLM systems.

    • Query vectorization + metadata filters → candidate document set.
    • May include scoring, reranking, and context windows.
    • Retrieval can be purely vector-based, keyword-filtered, or hybrid (BM25 + ANN).
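    The hybrid case can be sketched as a weighted blend of a lexical score and an ANN similarity. The `keyword_score` stand-in below is far cruder than BM25, and the vector scores are supplied by hand for illustration:

```python
def keyword_score(query: str, doc: str) -> float:
    """Crude lexical signal: fraction of query terms present in the doc.
    A real system would use BM25 here."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_rank(query, docs, vec_scores, alpha=0.5):
    """Blend lexical and vector similarity; alpha weights the lexical side.
    In production, vec_scores would come from the ANN index."""
    scored = [
        (alpha * keyword_score(query, d) + (1 - alpha) * vec_scores[i], d)
        for i, d in enumerate(docs)
    ]
    return sorted(scored, key=lambda s: -s[0])

docs = ["rotate api keys monthly", "the eiffel tower is in paris"]
# Hypothetical ANN similarities of the query against each doc.
ranked = hybrid_rank("how to rotate keys", docs, vec_scores=[0.8, 0.1])
```

    Tuning `alpha` is the usual knob: lexical-heavy for exact-term queries, vector-heavy for paraphrased ones.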

    This is not "search." It is semantic orchestration: dynamically assembling the context that guides generation, classification, or summarization.

    Once relevant context is retrieved, LLMs operate via carefully structured prompts or autonomous agents.

    • Agents combine tools, memory, documents, and APIs into multi-step reasoning chains.
    • Outputs are not just sentences; they can be structured data, JSON, executable code, or analytic summaries.
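    A sketch of that structured-output contract, with the model call stubbed out (`fake_llm` is a placeholder, not a real client):

```python
import json

def fake_llm(prompt: str) -> str:
    """Stub standing in for a real model call; a production system would
    call its LLM provider here. The reply is a canned example."""
    return '{"intent": "refund", "order_id": "A-1042", "confidence": 0.92}'

def extract_structured(prompt: str, required: set[str]) -> dict:
    """Parse the model's reply as JSON and validate required fields, so
    downstream code consumes structured data rather than prose."""
    data = json.loads(fake_llm(prompt))
    missing = required - data.keys()
    if missing:
        raise ValueError(f"model omitted fields: {missing}")
    return data

result = extract_structured("Classify this support email ...", {"intent", "order_id"})
```

    The validation step matters: treating model output as typed data with a checked schema is what lets agents chain steps reliably.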

    Prompt engineering gives way to prompt routing and chaining as system-level design patterns.

    AI-native stacks are never static. Feedback loops are engineered into the system.

    • Human and system feedback (ratings, clicks, corrections) are logged.
    • This data becomes the foundation for preference modeling, reward shaping, and continual fine-tuning.
    • Weak supervision, distillation, and bootstrapped labels are also integrated here.
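    One way such a loop might start: log rated answers as JSONL events and derive (preferred, rejected) pairs as raw material for preference modeling. The storage and field names here are illustrative:

```python
import io
import json

feedback_log = io.StringIO()  # stands in for an append-only JSONL store

def log_feedback(query: str, answer: str, rating: int) -> None:
    """Append one feedback event per line."""
    feedback_log.write(json.dumps({"q": query, "a": answer, "r": rating}) + "\n")

def preference_pairs(lines):
    """Group events by query; emit (preferred, rejected) answer pairs
    whenever two answers to the same query got different ratings."""
    by_q = {}
    for line in lines:
        e = json.loads(line)
        by_q.setdefault(e["q"], []).append(e)
    pairs = []
    for events in by_q.values():
        ranked = sorted(events, key=lambda e: -e["r"])
        if len(ranked) >= 2 and ranked[0]["r"] > ranked[-1]["r"]:
            pairs.append((ranked[0]["a"], ranked[-1]["a"]))
    return pairs

log_feedback("reset password?", "Use the account page.", 5)
log_feedback("reset password?", "Contact the CEO.", 1)
pairs = preference_pairs(feedback_log.getvalue().splitlines())
```

    Those pairs are exactly the input shape preference-tuning methods expect, which is what closes the loop back into training.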

    This is the foundation for self-improving AI: data becomes fuel for model evolution, not just reporting.

    Unlike traditional MLOps, where models are retrained infrequently, LLM-native systems support modular, iterative refinement:

    • LoRA adapters, QLoRA modules, instruction-tuning datasets, and feedback-derived reward models are deployed and versioned.
    • Larger teacher models are used to distill smaller, efficient agents for edge deployment or high-throughput inference.
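    The LoRA idea itself fits in a few lines: freeze the base weight matrix W and learn a low-rank update, applying W' = W + (alpha / r) · B·A. The dimensions and values below are toy numbers chosen to keep the arithmetic visible:

```python
# Toy dimensions: a d_out x d_in layer, adapted with rank-r factors.
d_out, d_in, r, alpha = 4, 6, 2, 8

full_params = d_out * d_in            # what full fine-tuning would update
lora_params = d_out * r + r * d_in    # what a LoRA adapter stores instead

def matmul(B, A):
    """Plain nested-list matrix product (no numpy, to stay self-contained)."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

W = [[0.0] * d_in for _ in range(d_out)]  # frozen base weights (toy values)
B = [[1.0] * r for _ in range(d_out)]     # trained low-rank factor (toy)
A = [[0.5] * d_in for _ in range(r)]      # trained low-rank factor (toy)

scale = alpha / r
delta = matmul(B, A)
W_adapted = [[W[i][j] + scale * delta[i][j] for j in range(d_in)]
             for i in range(d_out)]
```

    With these toy dimensions the saving is modest, but at realistic sizes (say a 4096x4096 projection with r = 16) the adapter holds roughly 1% of the matrix's parameters, which is what makes per-domain adapters cheap to version and swap.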

    Model adaptation is modularized. We are no longer just training models; we are shaping agents that understand our domain.

    | Aspect              | Traditional Stack            | AI-Native Stack                           |
    | ------------------- | ---------------------------- | ----------------------------------------- |
    | **Primary Data**    | Structured (Tabular, Events) | Unstructured (Text, Media, Logs)          |
    | **Data Store**      | Data Lake / Warehouse        | Vector DB + Object Storage                |
    | **Query Interface** | SQL, OLAP                    | Natural Language, Semantic Search         |
    | **Output Format**   | Dashboards, Reports, Metrics | Summaries, JSON, Embeddings, Instructions |
    | **Learning Loop**   | Offline, Static              | Online, Feedback-Driven                   |
    | **Reusability**     | Features, Aggregations       | Prompts, Embeddings, Retrieval Contexts   |

    The most significant shift is philosophical: in AI-native systems, schemas no longer drive intelligence. Semantics do.

    We architect systems not to retrieve rows from tables, but to retrieve knowledge from context: encoded in vectors, shaped by prompts, and grounded in feedback.

    As we move toward agentic AI, this stack becomes foundational. It enables agents to search, reason, explain, and learn, not just answer.

    If your infrastructure is still optimizing for BI dashboards and metric aggregation, you are not just behind; you are architecturally incompatible with what comes next.

    AI-native infrastructure is not a minor iteration on the data warehouse era; it is a paradigm shift. And it demands architectural leadership, not just LLM enthusiasm.

    If you are designing for real-world LLM deployments, build for retrieval-first thinking, semantic scalability, and feedback as a first-class citizen.

    The future of AI is not just data-driven. It is context-native.


