GraphRAG in Action: A Simple Agent for Know-Your-Customer Investigations

In the world of financial services, Know-Your-Customer (KYC) and Anti-Money Laundering (AML) are critical lines of defense against illicit activities. KYC is naturally modelled as a graph problem, where customers, accounts, transactions, IP addresses, devices, and locations are all interconnected nodes in a vast network of relationships. Investigators sift through these complex webs of connections, trying to connect seemingly disparate dots to uncover fraud, sanctions violations, and money laundering rings.

This is a great use case for AI grounded by a knowledge graph (GraphRAG). The intricate web of connections requires capabilities beyond standard document-based RAG (typically based on vector similarity search and reranking techniques).

Disclosure

I’m a Senior Product Manager for AI at Neo4j, the graph database featured in this post. Although the snippets focus on Neo4j, the same patterns can be applied with any graph database. My main aim is to share practical guidance on building GraphRAG agents with the AI/ML community. All code in the linked repository is open-source and free for you to explore, experiment with, and adapt.

All images in this blog post were created by the author.

A GraphRAG KYC Agent

This blog post provides a hands-on guide for AI engineers and developers on how to build an initial KYC agent prototype with the OpenAI Agents SDK. We’ll explore how to equip our agent with a set of tools to uncover and investigate potential fraud patterns.

The diagram below illustrates the agent processing pipeline to answer questions raised during a KYC investigation.

Image by the Author, generated using Napkin AI

Let’s walk through the main components:

The KYC Agent: It leverages the OpenAI Agents SDK and acts as the “brain,” deciding which tool to use based on the user’s query and the conversation history. It plays the role of MCP Host and MCP client to the Neo4j MCP Cypher Server. Most importantly, it runs a very simple loop that takes a question from the user, invokes the agent, and processes the results, while keeping the conversation history.
The Toolset: A set of tools available to the agent.
- GraphRAG Tools: These are graph data retrieval functions that each wrap a very specific Cypher query. For example:
  - Get Customer Details: A graph retrieval tool that, given a Customer ID, retrieves information about a customer, including their accounts and recent transaction history.
- Neo4j MCP Server: A Neo4j MCP Cypher Server exposing tools to interact with a Neo4j database. It provides three essential tools:
  1. Get Schema from the Database.
  2. Run a READ Cypher Query against the database.
  3. Run a WRITE Cypher Query against the database.
- A Text-To-Cypher tool: A Python function wrapping a fine-tuned Gemma3-4B model running locally via Ollama. The tool translates natural language questions into Cypher graph queries.
- A Memory Creation tool: This tool enables investigators to document their findings directly in the knowledge graph. It creates a “memory” (of an investigation) in the knowledge graph and links it to all relevant customers, transactions, and accounts. Over time, this helps build a valuable knowledge base for future investigations.
A KYC Knowledge Graph: A Neo4j database storing a knowledge graph of 8,000 fictitious customers, their accounts, transactions, devices and IP addresses. It is also used as the agent’s long-term memory store.
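
The agent loop described above (take a question, invoke the agent, keep the history) can be sketched without any framework. This is only an illustration: `echo_agent` is a hypothetical stand-in for the real agent call, which in this post is handled by the OpenAI Agents SDK, and the message format is an assumption.

```python
from typing import Callable

Message = dict[str, str]

def run_turn(agent: Callable[[list[Message]], str],
             history: list[Message],
             question: str) -> str:
    """Run one turn: append the question, invoke the agent, record the answer."""
    history.append({"role": "user", "content": question})
    answer = agent(history)  # the agent sees the whole conversation so far
    history.append({"role": "assistant", "content": answer})
    return answer

def echo_agent(history: list[Message]) -> str:
    """Hypothetical stand-in agent: reports how many user turns it has seen."""
    user_turns = sum(1 for m in history if m["role"] == "user")
    return f"(turn {user_turns}) you asked: {history[-1]['content']}"

history: list[Message] = []
first = run_turn(echo_agent, history, "Get me the schema of the database")
second = run_turn(echo_agent, history, "Show me 5 watchlisted customers")
```

Because the full history is passed on every turn, a later question such as "for each of these customers..." can be resolved against earlier answers.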

Want to try out the agent now? Just follow the instructions on the project repo. You can come back and read how the agent was built later.

Why GraphRAG for KYC?

Traditional RAG techniques focus on finding information within large bodies of text that are chunked up into fragments. KYC investigations rely on finding interesting patterns in a complex web of interconnected data: customers linked to accounts, accounts connected through transactions, transactions tied to IP addresses and devices, and customers associated with personal and employer addresses.

Understanding these relationships is key to uncovering sophisticated fraud patterns.

“Does this customer share an IP address with someone on a watchlist?”
“Is this transaction part of a circular payment loop designed to obscure the source of funds?”
“Are multiple new accounts being opened by individuals working for the same, newly-registered, shell company?”

These are questions of connectivity. A knowledge graph, where customers, accounts, transactions, and devices are nodes and their relationships are explicit edges, is the ideal data structure for this task. GraphRAG (data retrieval) tools make it simple to identify unusual patterns of activity.
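
To make the first question concrete, here is a minimal sketch on a toy in-memory graph. The customer IDs, IP addresses and watchlist are made up for illustration; in the agent itself this kind of lookup would be a Cypher query against Neo4j rather than Python.

```python
from collections import defaultdict

# Hypothetical toy data: (customer_id, ip_address) edges and a watchlist.
uses_ip = [
    ("CUST_1", "10.0.0.1"),
    ("CUST_2", "10.0.0.1"),
    ("CUST_3", "10.0.0.2"),
    ("CUST_4", "10.0.0.3"),
]
watchlist = {"CUST_2"}

def shares_ip_with_watchlist(edges, watchlist):
    """Map each non-watchlisted customer to the watchlisted customers sharing an IP."""
    customers_by_ip = defaultdict(set)
    for customer, ip in edges:
        customers_by_ip[ip].add(customer)

    hits = defaultdict(set)
    for customers in customers_by_ip.values():
        flagged = customers & watchlist
        if not flagged:
            continue
        for customer in customers - watchlist:
            hits[customer] |= flagged
    return dict(hits)

result = shares_ip_with_watchlist(uses_ip, watchlist)
```

The question reduces to a two-hop traversal (customer, IP, customer), which is why it maps so naturally onto a graph.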

Image by the Author, generated using Napkin AI

A Synthetic KYC Dataset

For the purposes of this blog, I’ve created a synthetic dataset with 8,000 fictitious customers and their accounts, transactions, registered addresses, devices and IP addresses.

The image below shows the “schema” of the database after the dataset is loaded into Neo4j. In Neo4j, a schema describes the types of entities and relationships stored in the database. In our case, the main entities are: Customer, Address, Account, Device, IP Address, Transaction. The main relationships among them are as illustrated below.

The dataset contains a few anomalies. Some customers are involved in suspicious transaction rings. There are a few isolated devices and IP addresses (not linked to any customer or account). There are some addresses shared by a large number of customers. Feel free to explore the synthetic dataset generation script if you want to understand or modify the dataset to your requirements.
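
Two of these anomalies, isolated devices and heavily shared addresses, are easy to illustrate on toy data. The sketch below uses made-up identifiers and plain Python sets; it is not taken from the dataset generation script.

```python
from collections import Counter

# Hypothetical toy data; identifiers are made up for illustration.
devices = {"DEV_1", "DEV_2", "DEV_3"}
customer_device = [("CUST_1", "DEV_1"), ("CUST_2", "DEV_1")]
customer_address = [
    ("CUST_1", "ADDR_A"), ("CUST_2", "ADDR_A"), ("CUST_3", "ADDR_A"),
    ("CUST_4", "ADDR_B"),
]

def isolated_nodes(nodes, edges):
    """Return nodes that never appear as the target of an edge."""
    linked = {target for _, target in edges}
    return nodes - linked

def shared_addresses(edges, min_customers):
    """Return addresses linked to at least `min_customers` distinct customers."""
    counts = Counter(address for _, address in set(edges))
    return {addr: n for addr, n in counts.items() if n >= min_customers}

isolated = isolated_nodes(devices, customer_device)
shared = shared_addresses(customer_address, min_customers=3)
```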

A Basic Agent with OpenAI Agents SDK

Let’s walk through the key components of our KYC Agent.

The implementation is mostly within kyc_agent.py. The full source code and step-by-step instructions on how to run the agent are available on GitHub.

First, let’s define the agent’s core identity with suitable instructions.

import os
from agents import Agent, Runner, function_tool
# ... other imports

# Define the instructions for the agent
instructions = """You are a KYC analyst with access to a knowledge graph. Use the tools to answer questions about customers, accounts, and suspicious patterns.
You are also a Neo4j expert and can use the Neo4j MCP server to query the graph.
If you get a question about the KYC database that you can not answer with GraphRAG tools, you should
- use the Neo4j MCP server to get the schema of the graph (if needed)
- use the generate_cypher tool to generate a Cypher query from question and the schema
- use the Neo4j MCP server to query the graph to answer the question
"""

The instructions are crucial. They set the agent’s persona and provide a high-level strategy for how to approach problems, especially when a pre-defined tool doesn’t fit the user’s request.

Now, let’s start with a minimal agent. No tools. Just the instructions.

# Agent definition; we will add tools later.
kyc_agent = Agent(
   name="KYC Analyst",
   instructions=instructions,
   tools=[...],      # We will populate this list
   mcp_servers=[...] # And this one
)

Let’s add some tools to our KYC Agent

An agent is only as good as its tools. Let’s examine five tools we’re giving our KYC analyst.

Tools 1 & 2: Pre-defined Cypher Queries

For common and critical queries, it’s best to have optimized, pre-written Cypher queries wrapped in Python functions. You can use the @function_tool decorator from the OpenAI Agents SDK to make these functions available to the agent.

Tool 1: `find_customer_rings`

This tool is designed to detect recursive patterns characteristic of money laundering, specifically ‘circular transactions’ where funds cycle through multiple accounts to disguise their origin.

In a KYC graph, this translates directly to finding cycles or paths that return to or near their starting point within a directed transaction graph. Implementing such detection involves complex graph traversal algorithms, often using variable-length paths to explore connections up to a certain ‘hop’ distance.
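
To build intuition for what a bounded cycle search does, here is a minimal sketch in plain Python. It is a stand-in for the idea, not the Cypher used by the agent: it assumes transfers are simplified to direct (from_account, to_account) pairs, and the account names are made up.

```python
def find_rings(transfers, max_hops=6):
    """Find simple directed cycles of at most `max_hops` transfers.

    `transfers` is a list of (from_account, to_account) pairs. Each ring is
    reported once, starting from its smallest account id.
    """
    graph = {}
    for src, dst in transfers:
        graph.setdefault(src, set()).add(dst)

    rings = []

    def walk(start, node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                rings.append(path + [start])  # closed the loop
            elif nxt > start and nxt not in path and len(path) < max_hops:
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return rings

transfers = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
rings = find_rings(transfers)
```

The hop limit keeps the search tractable: without it, the number of candidate paths grows very quickly on a densely connected transaction graph.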

The code snippet below shows a find_customer_rings function that executes a Cypher query against the KYC database and returns up to 10 potential customer rings. For each ring, the following information is returned: the customers, accounts and transactions involved in the ring.

@function_tool
def find_customer_rings(max_number_rings: int = 10, customer_in_watchlist: bool = True, ...):
   """
   Detects circular transaction patterns (up to 6 hops) involving high-risk customers.
   Finds account cycles where the accounts are owned by customers matching specified
   risk criteria (watchlisted and/or PEP status).
   Args:
       max_number_rings: Maximum rings to return (default: 10)
       customer_in_watchlist: Filter for watchlisted customers (default: True)
       customer_is_pep: Filter for PEP customers (default: False)
       customer_id: Specific customer to focus on (not implemented)
   Returns:
       dict: Contains ring paths and associated high-risk customers
   """
   logger.info(f"TOOL: FIND_CUSTOMER_RINGS")
   with driver.session() as session:
       result = session.run(
           f"""
           MATCH p=(a:Account)-[:FROM|TO*6]->(a:Account)
           WITH p, [n IN nodes(p) WHERE n:Account] AS accounts
           UNWIND accounts AS acct
           MATCH (cust:Customer)-[r:OWNS]->(acct)
           WHERE cust.on_watchlist = $customer_in_watchlist
           // ... more Cypher to collect results ...
           """,
           max_number_rings=max_number_rings,
           customer_in_watchlist=customer_in_watchlist,
       )
       # ... Python code to process and return results ...

It’s worth noting that the documentation string (docstring) is automatically used by the OpenAI Agents SDK as the tool description! So good Python function documentation pays off!
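
To see why the docstring matters, here is a rough sketch of how a tool description can be derived from a plain Python function using the standard `inspect` module. This is an illustration of the idea only, not the SDK’s actual implementation; `describe_tool` and `get_customer` are hypothetical names.

```python
import inspect

def describe_tool(func):
    """Build a simple tool description from a function's name, docstring and signature."""
    signature = inspect.signature(func)
    params = {}
    for name, param in signature.parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        params[name] = {
            "type": getattr(param.annotation, "__name__", str(param.annotation)),
            "required": not has_default,
        }
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": params,
    }

def get_customer(customer_id: str, tx_limit: int = 5):
    """Return a customer's profile with accounts and recent transactions."""
    return {"customer_id": customer_id, "tx_limit": tx_limit}

spec = describe_tool(get_customer)
```

Everything the model knows about when to call a tool comes from text like this, so a vague docstring tends to produce vague tool selection.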

Tool 2: `get_customer_and_accounts`

A simple, yet essential, tool for retrieving a customer’s profile, including their accounts and most recent transactions. This is the bread-and-butter of any investigation. The code is similar to our previous tool: a function that takes a customer ID and wraps a simple Cypher query.

Once again, the function is decorated with @function_tool to make it available to the agent.

The Cypher query wrapped by this Python function is shown below:

result = session.run(
           """
           MATCH (c:Customer {id: $customer_id})-[o:OWNS]->(a:Account)
           WITH c, a
           CALL (c,a) {
               MATCH (a)-[:TO|FROM]->(t:Transaction)
               ORDER BY t.timestamp DESC
               LIMIT $tx_limit
               RETURN collect(t) as transactions
           }
           RETURN c as customer, a as account, transactions
           """,
           customer_id=input.customer_id
           # $tx_limit must also be supplied as a query parameter
       )

A notable aspect of this tool’s design is the use of Pydantic to specify the function’s output. The OpenAI Agents SDK uses Pydantic models returned by the function to automatically generate a text description of the output parameters.

If you look carefully, the function returns

return CustomerAccountsOutput(          
 customer=CustomerModel(**customer),
 accounts=[AccountModel(**a) for a in accounts],
)

The CustomerModel and AccountModel include each of the properties returned for each Customer, its accounts and a list of recent transactions. You can see their definition in schemas.py.

Tools 3 & 4: Where Neo4j MCP Server meets Text-To-Cypher

This is where our KYC agent gets some more interesting powers.

A significant challenge in building versatile AI agents is enabling them to interact dynamically with complex data sources, beyond pre-defined, static functions. Agents need the ability to perform general-purpose querying, where new insights might require spontaneous data exploration, without requiring a priori Python wrappers for every possible action.

This section explores a common architectural pattern to address this: a tool to translate natural language questions into Cypher, coupled with another tool to allow dynamic query execution.

We demonstrate this mechanism using the Neo4j MCP Server to expose dynamic graph query execution and a Google Gemma3-4B fine-tuned model for Text-to-Cypher translation.

Tool 3: Adding the Neo4j MCP server toolset

For a robust agent to operate effectively with a knowledge graph, it needs to understand the graph’s structure and to execute Cypher queries. These capabilities enable the agent to introspect the data and execute dynamic ad-hoc queries.

The MCP Neo4j Cypher server provides the basic tools: get-neo4j-schema (to retrieve the graph schema dynamically), read-neo4j-cypher (for executing arbitrary read queries), and write-neo4j-cypher (for create, update, delete queries).

Fortunately, the OpenAI Agents SDK has support for MCP. The code snippet below shows how easy it is to add the Neo4j MCP Server to our KYC Agent.

# Tool 3: Neo4j MCP server setup
neo4j_mcp_server = MCPServerStdio(
   params={
       "command": "uvx",
       "args": ["[email protected]"],
       "env": {
           "NEO4J_URI": NEO4J_URI,
           "NEO4J_USERNAME": NEO4J_USER,
           "NEO4J_PASSWORD": NEO4J_PASSWORD,
           "NEO4J_DATABASE": NEO4J_DATABASE,
       },
   },
   cache_tools_list=True,
   name="Neo4j MCP Server",
)

You can learn more about how MCP is supported in OpenAI Agents SDK here.

Tool 4: A Text-To-Cypher Tool

The ability to dynamically translate natural language into powerful graph queries often relies on specialized Large Language Models (LLMs), fine-tuned for schema-aware query generation.

We can use open-weights, publicly available Text-to-Cypher models hosted on Hugging Face, such as neo4j/text-to-cypher-Gemma-3-4B-Instruct-2025.04.0. This model was specifically fine-tuned to generate accurate Cypher queries from a user question and a schema.

In order to run this model on a local device, we can turn to Ollama. Using llama.cpp, it’s relatively straightforward to convert any Hugging Face model to GGUF format, which is required to run a model in Ollama. Using the ‘convert-hf-to-GGUF’ Python script, I generated a GGUF version of the Gemma3-4B fine-tuned model and uploaded it to Ollama.

If you are an Ollama user, you can download this model to your local device with:

ollama pull ed-neo4j/t2c-gemma3-4b-it-q8_0-35k

What happens when a user asks a question that doesn’t match any of our pre-defined tools?

For example, “For customer CUST_00001, find his addresses and check if they are shared with other customers”

Instead of failing, our agent can generate a Cypher query on the fly…

@function_tool
async def generate_cypher(request: GenerateCypherRequest) -> str:
   """
   Generate a Cypher query from natural language using a local finetuned text2cypher Ollama model
   """
   USER_INSTRUCTION = """...""" # Detailed prompt instructions

   user_message = USER_INSTRUCTION.format(
       schema=request.database_schema,
       question=request.question
   )
   # Generate Cypher query using the text2cypher model
   model: str = "ed-neo4j/t2c-gemma3-4b-it-q8_0-35k"
   response = await chat(
       model=model,
       messages=[{"role": "user", "content": user_message}]
   )
   return response['message']['content']

The generate_cypher tool addresses the challenge of Cypher query generation, but how does the agent know when to use this tool? The answer lies in the agent instructions.

You may remember that at the start of the blog, we defined the instructions for the agent as follows:

instructions = """You are a KYC analyst with access to a knowledge graph. Use the tools to answer questions about customers, accounts, and suspicious patterns.
   You are also a Neo4j expert and can use the Neo4j MCP server to query the graph.
   If you get a question about the KYC database that you can not answer with GraphRAG tools, you should
   - use the Neo4j MCP server to get the schema of the graph (if needed)
   - use the generate_cypher tool to generate a Cypher query from question and the schema
   - use the Neo4j MCP server to query the graph to answer the question
   """

This time, note the specific instructions to handle ad-hoc queries that can not be answered by the graph retrieval based tools.

When the agent goes down this path, it goes through the following steps:

The agent gets a novel question.
It first calls `neo4j-mcp-server.get-neo4j-schema` to get the schema of the database.
It then feeds the schema and the user’s question to the `generate_cypher` tool. This will generate a Cypher query.
Finally, it takes the generated Cypher query and runs it using `neo4j-mcp-server.read-neo4j-cypher`.

If there are errors, in either the Cypher generation or the execution of the Cypher, the agent retries: it generates the Cypher again and reruns it.
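
The steps above can be sketched as an explicit control flow. In the real agent the LLM decides when to retry; the function below just makes that loop visible. All three callables and the stubs (`fake_get_schema`, `fake_generate_cypher`, `fake_run_read_query`) are hypothetical stand-ins for the MCP tools and the text2cypher model, and the retry limit is an assumption.

```python
def answer_with_generated_cypher(question, get_schema, generate_cypher, run_read_query,
                                 max_attempts=3):
    """Schema -> generate Cypher -> execute, retrying when generation or execution fails."""
    schema = get_schema()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            cypher = generate_cypher(question=question, schema=schema)
            rows = run_read_query(cypher)
            return {"cypher": cypher, "rows": rows, "attempts": attempt}
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
    raise RuntimeError(f"no valid query after {max_attempts} attempts") from last_error

# Hypothetical stubs: the first draft has unbalanced parentheses, the second is valid.
drafts = iter(["MATCH (c:Customer RETURN c", "MATCH (c:Customer) RETURN c.id AS id"])

def fake_get_schema():
    return "(:Customer {id})"

def fake_generate_cypher(question, schema):
    return next(drafts)

def fake_run_read_query(cypher):
    if cypher.count("(") != cypher.count(")"):
        raise ValueError("syntax error")
    return [{"id": "CUST_00001"}]

outcome = answer_with_generated_cypher(
    "List customer ids", fake_get_schema, fake_generate_cypher, fake_run_read_query
)
```

Capping the number of attempts matters in practice: a model that keeps producing invalid Cypher should fail loudly rather than loop.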

As you can see, the above approach is not bullet-proof. It relies heavily on the Text-To-Cypher model to produce valid and correct Cypher. In most cases, it works. However, in cases where it doesn’t, you should consider:

Defining explicit Cypher retrieval tools for this type of question.
Adding some form of end user feedback (thumbs up / down) in your UI/UX. This will help flag questions that the agent is struggling with. You can then decide the best approach to handle this class of questions (e.g. a Cypher retrieval tool, better instructions, improvements to the text2cypher model, guardrails, or just getting your agent to politely decline to answer the question).

Tool 5: Adding Memory to the KYC Agent

The topic of agent memory is getting lots of attention lately.

While agents inherently manage short-term memory through conversational history, complex, multi-session tasks like financial investigations demand a more persistent and evolving long-term memory.

This long-term memory isn’t just a log of past interactions; it’s a dynamic knowledge base that can accumulate insights, track ongoing investigations, and provide context across different sessions or even different agents.

The create_memory tool implements a form of explicit knowledge graph memory, where summaries of investigations are stored as dedicated nodes and explicitly linked to relevant entities (customers, accounts, transactions).

@function_tool
def create_memory(content: str, customer_ids: list[str] = [], account_ids: list[str] = [], transaction_ids: list[str] = []) -> str:
   """
   Create a Memory node and link it to specified customers, accounts, and transactions
   """
   logger.info(f"TOOL: CREATE_MEMORY")
   with driver.session() as session:
       result = session.run(
           """
           CREATE (m:Memory {content: $content, created_at: datetime()})
           WITH m
           UNWIND $customer_ids as cid
           MATCH (c:Customer {id: cid})
           MERGE (m)-[:FOR_CUSTOMER]->(c)
           WITH m
           UNWIND $account_ids as aid
           MATCH (a:Account {id: aid})
           MERGE (m)-[:FOR_ACCOUNT]->(a)
           WITH m
           UNWIND $transaction_ids as tid
           MATCH (t:Transaction {id: tid})
           MERGE (m)-[:FOR_TRANSACTION]->(t)
           RETURN m.content as content
           """,
           content=content,
           customer_ids=customer_ids,
           account_ids=account_ids,
           transaction_ids=transaction_ids
           # ...
       )
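
The value of linking memories to entities shows up at retrieval time. The sketch below mimics the same structure with an in-memory stand-in, so you can see how a later investigation could pull every memory attached to a customer or account. `MemoryGraph` and its methods are hypothetical and only mirror the relationship names used above; the agent itself stores memories in Neo4j.

```python
from datetime import datetime, timezone

class MemoryGraph:
    """Tiny in-memory stand-in for Memory nodes and their FOR_* relationships."""

    def __init__(self):
        self.memories = {}   # memory_id -> {"content", "created_at"}
        self.links = set()   # (memory_id, relationship_type, entity_id)

    def create_memory(self, content, customer_ids=(), account_ids=(), transaction_ids=()):
        memory_id = f"MEM_{len(self.memories) + 1}"
        self.memories[memory_id] = {
            "content": content,
            "created_at": datetime.now(timezone.utc),
        }
        for rel, ids in (("FOR_CUSTOMER", customer_ids),
                         ("FOR_ACCOUNT", account_ids),
                         ("FOR_TRANSACTION", transaction_ids)):
            for entity_id in ids:
                self.links.add((memory_id, rel, entity_id))
        return memory_id

    def memories_for(self, entity_id):
        """Return the content of every memory linked to the given entity."""
        ids = sorted({m for m, _, e in self.links if e == entity_id})
        return [self.memories[m]["content"] for m in ids]

graph = MemoryGraph()
graph.create_memory("Shared address with CUST_2", customer_ids=["CUST_1", "CUST_2"])
graph.create_memory("Ring via ACC_9", customer_ids=["CUST_1"], account_ids=["ACC_9"])
notes = graph.memories_for("CUST_1")
```

The sketch uses tuple defaults for the ID arguments, which avoids the shared-state pitfall of mutable default arguments in Python.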

Additional considerations for implementing “agent memory” include:

Memory Architectures: Exploring different types of memory (episodic, semantic, procedural) and their common implementations (vector databases for semantic search, relational databases, or knowledge graphs for structured insights).
Contextualization: How the knowledge graph structure allows for rich contextualization of memories, enabling powerful retrieval based on relationships and patterns, rather than just keyword matching.
Update and Retrieval Strategies: How memories are updated over time (e.g., appended, summarized, refined) and how they are retrieved by the agent (e.g., through graph traversal, semantic similarity, or fixed rules).
Challenges: The complexities of managing memory consistency, handling conflicting information, preventing ‘hallucinations’ in memory retrieval, and ensuring the memory stays relevant and up-to-date without becoming overly large or noisy.

This is an area of active development that is evolving rapidly, with many frameworks addressing some of the considerations above.

Putting it all together: An Example Investigation

Let’s see how our agent handles a typical workflow. You can run this yourself (or feel free to follow along with the step-by-step instructions on the KYC agent GitHub repo).

1. “Get me the schema of the database”

Agent Action: The agent identifies this as a schema query and uses the Neo4j MCP Server’s `get-neo4j-schema` tool.

2. “Show me 5 watchlisted customers involved in suspicious rings”

Agent Action: This directly matches the purpose of our custom tool. The agent calls `find_customer_rings` with `customer_in_watchlist=True`.

3. “For each of these customers, find their addresses and find out if they are shared with other customers”

Agent Action: This is a question that can’t be answered with any of the GraphRAG tools. The agent should follow its instructions:
- It already has the schema (from our first interaction above).
- It calls `generate_cypher` with the question and schema. The tool returns a Cypher query that tries to answer the investigator’s question.
- It executes this Cypher query using the Neo4j MCP Cypher Server `read-neo4j-cypher` tool.

4. “For the customer whose address is shared, can you get me more details”

Agent Action: The agent determines that the `get_customer_and_accounts` tool is the right fit and calls it with the customer’s ID.

5. “Write a 300-word summary of this investigation. Store it as a memory. Make sure to link it to every account and transaction belonging to this customer”

Agent Action: The agent first uses its internal LLM capabilities to generate the summary. Then, it calls the `create_memory` tool, passing the summary text and the list of all customer, account, and transaction IDs it has encountered during the conversation.

Key Takeaways

If you got this far, I hope you enjoyed the journey of getting familiar with a basic implementation of a KYC GraphRAG Agent. Lots of cool technologies here: OpenAI Agents SDK, MCP, Neo4j, Ollama and a Gemma3-4B fine-tuned Text-To-Cypher model!

I hope you gained some appreciation for:

GraphRAG, or more specifically graph-powered data retrieval, as an essential for connected-data problems. It allows agents to answer questions about heavily connected data that would be impossible to answer with standard RAG.
The importance of a balanced toolkit. Combine MCP Server tools with your own optimized tools.
MCP Servers are a game-changer. They allow you to connect your agents to an increasing set of MCP servers.
- Experiment with more MCP Servers so you get a better sense of the possibilities.
Agents should be able to write back to your data store in a controlled way.
- In our example we saw how an analyst can persist their findings (e.g., adding Memory nodes to the knowledge graph), creating a virtuous cycle in which the agent improves the underlying knowledge base for entire teams of investigators.
- The agent adds information to the knowledge graph and it never updates or deletes existing information.

The patterns and tools discussed here are not limited to KYC. They can be applied to supply chain analysis, digital twin management, drug discovery, and any other domain where the relationships between data points are as important as the data itself.

The era of graph-aware AI agents is here.

What’s Next?

You have built a simple AI agent on top of OpenAI Agents SDK with MCP, Neo4j and a Text-to-Cypher model. All running on a single device.

While this initial agent provides a strong foundation, transitioning to a production-level system involves addressing several additional requirements, such as:

Agent UI/UX: This is the central place for your users to interact with your agent. It will ultimately be a key driver of the adoption and success of your agent.
Long-running tasks and multi-agent systems: Some tasks are valuable but take a significant amount of time to run. In these cases, agents should be able to offload parts of their workload to other agents.
- OpenAI does provide some support for handing off to subagents, but it might not be suitable for long-running agents.
Agent Guardrails: OpenAI Agents SDK provides some support for Guardrails.
Agent Hosting: It exposes your agent to your users.
Securing comms to your agent: End user authentication and authorization to your agent.
Database access controls: Managing access control to the data stored in the KYC Knowledge Graph.
Conversation History.
Agent Observability.
Agent Memory.
Agent Evaluation: What is the impact of changing agent instructions and/or adding/removing a tool?
And more…

In the meantime, I hope this has inspired you to keep learning and experimenting!

Learning Resources
