
    LangChain Meets Home Assistant: Unlock the Power of Generative AI in Your Smart Home | by Lindo St. Angel | Jan, 2025

By Team_AIBS News | January 5, 2025


Learn how to create an agent that understands your home's context, learns your preferences, and interacts with you and your home to accomplish actions you find valuable.

    Towards Data Science

Photo by Igor Omilaev on Unsplash

This article describes the architecture and design of a Home Assistant (HA) integration called home-generative-agent. The project uses LangChain and LangGraph to create a generative AI agent that interacts with and automates tasks within a HA smart home environment. The agent understands your home's context, learns your preferences, and interacts with you and your home to accomplish actions you find valuable. Key features include creating automations, analyzing images, and managing home states using various LLMs (Large Language Models). The architecture involves both cloud-based and edge-based models for optimal performance and cost-effectiveness. Installation instructions, configuration details, and information on the project's architecture and the different models used are included and can be found on the home-generative-agent GitHub. The project is open-source and welcomes contributions.

These are some of the features currently supported:

• Create complex Home Assistant automations.
• Image scene analysis and understanding.
• Home state analysis of entities, devices, and areas.
• Full agent control of allowed entities in the home.
• Short- and long-term memory using semantic search.
• Automatic summarization of home state to manage LLM context length.

This is my personal project and an example of what I call learning-directed hacking. The project is not affiliated with my work at Amazon, nor am I associated with the organizations responsible for Home Assistant or LangChain/LangGraph in any way.

Creating an agent to monitor and control your home can lead to unexpected actions and potentially put your home and yourself at risk due to LLM hallucinations and privacy concerns, especially when exposing home states and user information to cloud-based LLMs. I have made reasonable architectural and design choices to mitigate these risks, but they cannot be completely eliminated.

One key early decision was to rely on a hybrid cloud-edge approach. This enables the use of the most sophisticated reasoning and planning models available, which should help reduce hallucinations. Simpler, more task-focused edge models are employed to further minimize LLM errors.

Another critical decision was to leverage LangChain's capabilities, which allow sensitive information to be hidden from LLM tools and supplied only at runtime. For instance, tool logic may require using the ID of the user who made a request. However, such values should generally not be controlled by the LLM. Allowing the LLM to manipulate the user ID could pose security and privacy risks. To mitigate this, I utilized the InjectedToolArg annotation.
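The general pattern behind injected tool arguments can be illustrated in plain Python (this is a sketch, not the LangChain API; the function and argument names here are hypothetical): the model proposes tool arguments, but trusted values such as the user ID are supplied by the runtime, and any attempt by the model to set them is discarded.

```python
def analyze_request(text: str, *, user_id: str) -> str:
    """Hypothetical tool; `user_id` is a trusted, runtime-injected argument."""
    return f"user={user_id} request={text}"

def run_tool(tool, llm_args: dict, injected: dict) -> str:
    # Discard any injected-arg names the model tried to set itself,
    # then merge in the trusted runtime values.
    safe_args = {k: v for k, v in llm_args.items() if k not in injected}
    return tool(**safe_args, **injected)

# Even if the model tries to spoof user_id, the runtime value wins.
result = run_tool(
    analyze_request,
    {"text": "check the porch", "user_id": "attacker"},
    {"user_id": "alice"},
)
```

LangChain's InjectedToolArg achieves the same effect declaratively: annotated parameters are hidden from the tool schema the model sees and are filled in when the tool is invoked.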

Additionally, using large cloud-based LLMs incurs significant cloud costs, and the edge hardware required to run LLM edge models can be expensive. The combined operational and installation costs are likely prohibitive for the average user today. An industry-wide effort to "make LLMs as cheap as CNNs" is needed to bring home agents to the mass market.

It is important to be aware of these risks and understand that, despite these mitigations, we are still in the early stages of this project and of home agents in general. Significant work remains to make these agents truly useful and trustworthy assistants.

Below is a high-level view of the home-generative-agent architecture.

    Diagram by Lindo St. Angel

The general integration architecture follows the best practices described in Home Assistant Core and is compliant with Home Assistant Community Store (HACS) publishing requirements.

The agent is built using LangGraph and uses the HA conversation component to interact with the user. The agent uses the Home Assistant LLM API to fetch the state of the home and understand the HA native tools it has at its disposal. I implemented all other tools available to the agent using LangChain. The agent employs several LLMs: a large and very accurate primary model for high-level reasoning, and smaller specialized helper models for camera image analysis, primary model context summarization, and embedding generation for long-term semantic search. The primary model is cloud-based, and the helper models are edge-based and run under the Ollama framework on a computer located in the home.

The models currently being used are summarized below.

    LangGraph-based Agent

LangGraph powers the conversation agent, enabling you to create stateful, multi-actor applications utilizing LLMs as quickly as possible. It extends LangChain's capabilities, introducing the ability to create and manage cyclical graphs essential for developing complex agent runtimes. A graph models the agent workflow, as seen in the image below.

    Diagram by Lindo St. Angel

The agent workflow has five nodes, each a Python module modifying the agent's state, a shared data structure. The edges between the nodes represent the allowed transitions between them, with solid lines unconditional and dashed lines conditional. Nodes do the work, and edges tell what to do next.

The __start__ and __end__ nodes tell the graph where to start and stop. The agent node runs the primary LLM, and if it decides to use a tool, the action node runs the tool and then returns control to the agent. The summarize_and_trim node processes the LLM's context to manage growth while maintaining accuracy if the agent has no tool to call and the number of messages meets the conditions mentioned below.
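The conditional edge out of the agent node can be sketched in plain Python (this is illustrative only, not the LangGraph API; the state fields and threshold names here are assumptions for the sketch):

```python
def next_node(state: dict) -> str:
    """Pick the next graph node after the agent node runs.

    If the LLM requested a tool, go to the action node; otherwise,
    summarize when the message count passes the threshold, else end.
    """
    if state.get("tool_call"):
        return "action"  # run the tool, then control returns to agent
    if state["num_messages"] > state["summarize_threshold"]:
        return "summarize_and_trim"
    return "__end__"
```

In the real project this role is played by LangGraph's conditional edges, which route between nodes based on the shared state in exactly this fashion.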

LLM Context Management

You need to carefully manage the context length of LLMs to balance cost, accuracy, and latency, and to avoid triggering rate limits such as OpenAI's Tokens per Minute restriction. The system controls the context length of the primary model in two ways: it trims the messages in the context if they exceed a max parameter, and the context is summarized once the number of messages exceeds another parameter. These parameters are configurable in const.py; their descriptions are below.

• CONTEXT_MAX_MESSAGES | Messages to keep in context before deletion | Default = 100
• CONTEXT_SUMMARIZE_THRESHOLD | Messages in context before summary generation | Default = 20
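In plain terms, the two-parameter policy works like this (a simplified sketch; the real node uses LangChain's trim_messages and an LLM for summarization):

```python
CONTEXT_MAX_MESSAGES = 100        # keep at most this many messages
CONTEXT_SUMMARIZE_THRESHOLD = 20  # summarize once this many accumulate

def manage_context(messages: list[str]) -> tuple[bool, list[str]]:
    """Return (should_summarize, trimmed_messages).

    Summarization fires past the threshold; trimming keeps only the
    most recent CONTEXT_MAX_MESSAGES messages.
    """
    should_summarize = len(messages) > CONTEXT_SUMMARIZE_THRESHOLD
    trimmed = messages[-CONTEXT_MAX_MESSAGES:]
    return should_summarize, trimmed
```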

The summarize_and_trim node in the graph will trim the messages only after content summarization. You can see the Python code associated with this node in the snippet below.

async def _summarize_and_trim(
    state: State, config: RunnableConfig, *, store: BaseStore
) -> dict[str, list[AnyMessage]]:
    """Coroutine to summarize and trim message history."""
    summary = state.get("summary", "")

    if summary:
        summary_message = SUMMARY_PROMPT_TEMPLATE.format(summary=summary)
    else:
        summary_message = SUMMARY_INITIAL_PROMPT

    messages = (
        [SystemMessage(content=SUMMARY_SYSTEM_PROMPT)] +
        state["messages"] +
        [HumanMessage(content=summary_message)]
    )

    model = config["configurable"]["vlm_model"]
    options = config["configurable"]["options"]
    model_with_config = model.with_config(
        config={
            "model": options.get(
                CONF_VLM,
                RECOMMENDED_VLM,
            ),
            "temperature": options.get(
                CONF_SUMMARIZATION_MODEL_TEMPERATURE,
                RECOMMENDED_SUMMARIZATION_MODEL_TEMPERATURE,
            ),
            "top_p": options.get(
                CONF_SUMMARIZATION_MODEL_TOP_P,
                RECOMMENDED_SUMMARIZATION_MODEL_TOP_P,
            ),
            "num_predict": VLM_NUM_PREDICT,
        }
    )

    LOGGER.debug("Summary messages: %s", messages)
    response = await model_with_config.ainvoke(messages)

    # Trim message history to manage context window length.
    trimmed_messages = trim_messages(
        messages=state["messages"],
        token_counter=len,
        max_tokens=CONTEXT_MAX_MESSAGES,
        strategy="last",
        start_on="human",
        include_system=True,
    )
    messages_to_remove = [m for m in state["messages"] if m not in trimmed_messages]
    LOGGER.debug("Messages to remove: %s", messages_to_remove)
    remove_messages = [RemoveMessage(id=m.id) for m in messages_to_remove]

    return {"summary": response.content, "messages": remove_messages}

    Latency

The latency between user requests, or the agent taking timely action on the user's behalf, is critical to consider in the design. I used several techniques to reduce latency, including using specialized, smaller helper LLMs running on the edge and facilitating primary model prompt caching by structuring the prompts to put static content, such as instructions and examples, up front and variable content, such as user-specific information, at the end. These techniques also considerably reduce primary model usage costs.

You can see typical latency performance below.

• HA intents (e.g., turn on a light) | < 1 second
• Analyze camera image (initial request) | < 3 seconds
• Add automation | < 1 second
• Memory operations | < 1 second
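The prompt-ordering idea behind prompt caching can be sketched as follows (illustrative only; the prefix text and function are hypothetical): because the static prefix is byte-identical across requests, a provider-side prompt cache can reuse it, and only the variable tail is processed fresh.

```python
# Static, cacheable content first: instructions and examples shared
# by every request.
STATIC_PREFIX = (
    "You are a smart home agent.\n"
    "Tools: get_and_analyze_camera_image, add_automation, upsert_memory.\n"
    "Example: 'turn on the porch light' -> HassTurnOn\n"
)

def build_prompt(user_context: str, request: str) -> str:
    # Variable, user-specific content goes last so the shared prefix
    # stays identical across requests.
    return STATIC_PREFIX + f"User context: {user_context}\nRequest: {request}\n"

p1 = build_prompt("alice, home", "check the porch camera")
p2 = build_prompt("bob, away", "lock the front door")
```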

Tools

The agent can use HA tools as specified in the LLM API and other tools built in the LangChain framework as defined in tools.py. Additionally, you can extend the LLM API with tools of your own. The code gives the primary LLM the list of tools it can call, together with instructions on using them, in its system message and in the docstring of each tool's Python function definition. You can see an example of docstring instructions in the code snippet below for the get_and_analyze_camera_image tool.

@tool(parse_docstring=False)
async def get_and_analyze_camera_image( # noqa: D417
    camera_name: str,
    detection_keywords: list[str] | None = None,
    *,
    # Hide these arguments from the model.
    config: Annotated[RunnableConfig, InjectedToolArg()],
) -> str:
    """
    Get a camera image and perform scene analysis on it.

    Args:
        camera_name: Name of the camera for scene analysis.
        detection_keywords: Specific objects to look for in image, if any.
            For example, if user says "check the front porch camera for
            boxes and dogs", detection_keywords would be ["boxes", "dogs"].

    """
    hass = config["configurable"]["hass"]
    vlm_model = config["configurable"]["vlm_model"]
    options = config["configurable"]["options"]
    image = await _get_camera_image(hass, camera_name)
    return await _analyze_image(vlm_model, options, image, detection_keywords)

If the agent decides to use a tool, the LangGraph node action is entered, and the node's code runs the tool. The node uses a simple error recovery mechanism that will ask the agent to try calling the tool again with corrected parameters in the event of a mistake. The code snippet below shows the Python code associated with the action node.

async def _call_tools(
    state: State, config: RunnableConfig, *, store: BaseStore
) -> dict[str, list[ToolMessage]]:
    """Coroutine to call Home Assistant or langchain LLM tools."""
    # Tool calls will be the last message in state.
    tool_calls = state["messages"][-1].tool_calls

    langchain_tools = config["configurable"]["langchain_tools"]
    ha_llm_api = config["configurable"]["ha_llm_api"]

    tool_responses: list[ToolMessage] = []
    for tool_call in tool_calls:
        tool_name = tool_call["name"]
        tool_args = tool_call["args"]

        LOGGER.debug(
            "Tool call: %s(%s)", tool_name, tool_args
        )

        def _handle_tool_error(err: str, name: str, tid: str) -> ToolMessage:
            return ToolMessage(
                content=TOOL_CALL_ERROR_TEMPLATE.format(error=err),
                name=name,
                tool_call_id=tid,
                status="error",
            )

        # A langchain tool was called.
        if tool_name in langchain_tools:
            lc_tool = langchain_tools[tool_name.lower()]

            # Provide hidden args to tool at runtime.
            tool_call_copy = copy.deepcopy(tool_call)
            tool_call_copy["args"].update(
                {
                    "store": store,
                    "config": config,
                }
            )

            try:
                tool_response = await lc_tool.ainvoke(tool_call_copy)
            except (HomeAssistantError, ValidationError) as e:
                tool_response = _handle_tool_error(repr(e), tool_name, tool_call["id"])
        # A Home Assistant tool was called.
        else:
            tool_input = llm.ToolInput(
                tool_name=tool_name,
                tool_args=tool_args,
            )

            try:
                response = await ha_llm_api.async_call_tool(tool_input)

                tool_response = ToolMessage(
                    content=json.dumps(response),
                    tool_call_id=tool_call["id"],
                    name=tool_name,
                )
            except (HomeAssistantError, vol.Invalid) as e:
                tool_response = _handle_tool_error(repr(e), tool_name, tool_call["id"])

        LOGGER.debug("Tool response: %s", tool_response)
        tool_responses.append(tool_response)
    return {"messages": tool_responses}

The LLM API instructs the agent to always call tools using HA built-in intents when controlling Home Assistant, and to use the intents `HassTurnOn` to lock and `HassTurnOff` to unlock a lock. An intent describes a user's intention generated by user actions.

You can see the list of LangChain tools that the agent can use below.

• get_and_analyze_camera_image | run scene analysis on the image from a camera
• upsert_memory | add or update a memory
• add_automation | create and register an HA automation
• get_entity_history | query HA database for entity history
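The action node looks these tools up by name at call time, which amounts to a simple dispatch table. A plain-Python sketch of that lookup (the handlers below are hypothetical stand-ins, not the project's implementations):

```python
import asyncio

async def upsert_memory(content: str) -> str:
    """Hypothetical stand-in for the real memory tool."""
    return f"memory saved: {content}"

async def add_automation(yaml_blob: str) -> str:
    """Hypothetical stand-in for the real automation tool."""
    return "automation registered"

# Name-to-coroutine registry, mirroring langchain_tools in _call_tools.
LANGCHAIN_TOOLS = {
    "upsert_memory": upsert_memory,
    "add_automation": add_automation,
}

async def dispatch(name: str, **kwargs) -> str:
    # Look the tool up by name and await it with the model's arguments.
    return await LANGCHAIN_TOOLS[name](**kwargs)

result = asyncio.run(dispatch("upsert_memory", content="cat likes the sofa"))
```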

Hardware

I built the HA installation on a Raspberry Pi 5 with SSD storage, Zigbee, and LAN connectivity. I deployed the edge models under Ollama on an Ubuntu-based server with an AMD 64-bit 3.4 GHz CPU, Nvidia 3090 GPU, and 64 GB system RAM. The server is on the same LAN as the Raspberry Pi.

I have been using this project at home for a few weeks and have found it useful but frustrating in a few areas that I will be working to address. Below is a list of pros and cons of my experience with the agent.

Pros

• The camera image scene analysis is very useful and flexible, since you can query for almost anything and don't have to worry about having the right classifier as you would for a traditional ML approach.
• Automations are very easy to set up and can be quite complex. It's mind-blowing how good the primary LLM is at generating HA-compliant YAML.
• Latency in general is quite acceptable.
• It's very easy to add additional LLM tools and graph states with LangChain and LangGraph.

    Cons

• The camera image analysis seems less accurate than traditional ML approaches. For example, detecting packages that are partially obscured is very difficult for the model to handle.
• The primary model cloud costs are high. Running a single package detector once every 30 minutes costs about $2.50 per day.
• Using structured model outputs for the helper LLMs, which would make downstream LLM processing easier, considerably reduces accuracy.
• The agent needs to be more proactive. Adding a planning step to the agent graph will hopefully address this.

Here are a few examples of what you can do with the home-generative-agent (HGA) integration, as illustrated by screenshots of the Assist dialog taken during interactions with my HA installation.

Image by Lindo St. Angel
• Create an automation that runs periodically.
Image by Lindo St. Angel

The snippet below shows that the agent is fluent in YAML, based on what it generated and registered as an HA automation.

alias: Check Litter Box Waste Drawer
triggers:
  - minutes: /30
    trigger: time_pattern
conditions:
  - condition: numeric_state
    entity_id: sensor.litter_robot_4_waste_drawer
    above: 90
actions:
  - data:
      message: The Litter Box waste drawer is more than 90% full!
    action: notify.notify

Image by Lindo St. Angel
• Check multiple cameras (video by the author).

    https://github.com/user-attachments/assets/230baae5-8702-4375-a3f0-ffa981ee66a3

• Summarize the home state (video by the author).

    https://github.com/user-attachments/assets/96f834a8-58cc-4bd9-a899-4604c1103a98

• Long-term memory with semantic search.
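The core of semantic memory retrieval can be sketched with cosine similarity over embedding vectors (illustrative only; the project uses an edge embedding model and a real vector store, and the toy 3-dimensional vectors below are made up):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy memory store: text mapped to a (made-up) embedding vector.
memories = {
    "the dog sleeps in the kitchen": [0.9, 0.1, 0.0],
    "front door code changed":       [0.0, 0.2, 0.9],
}

def search(query_vec: list[float]) -> str:
    # Return the stored memory most similar to the query embedding.
    return max(memories, key=lambda m: cosine(query_vec, memories[m]))
```

In the real integration, the query text is embedded by the edge model and matched against stored memories the same way, so "where does the dog sleep?" retrieves the relevant memory even without keyword overlap.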


