Build Interruptible Agents with LangGraph
- Javith Abbas

Most AI demos follow the same pattern.
The model decides it needs a tool, the backend runs it immediately, and the result goes straight back to the model. Clean. Simple. Also a little too trusting.
That works fine for prototypes. It breaks down fast in production.
In real systems, you often do not want an LLM to trigger actions directly. Sometimes a tool call should pause execution instead of running right away. Maybe a human needs to approve it. Maybe another service needs to handle it. Maybe the work is long-running or asynchronous.
That is where LangGraph interrupts become useful.
Instead of letting the agent execute a tool call automatically, you can turn that call into a structured request that your system can inspect, route, approve, or execute later.
The flow looks like this:
- the agent asks for a tool
- the graph pauses
- an external system handles the request
- the graph resumes with the result
That small change turns a basic demo into something much closer to how production AI systems actually behave.
The Problem with Blind Tool Calls
Most tool-calling agents treat tools like synchronous functions.
From the model’s point of view, the flow is simple:
1. The model requests a tool
2. The tool runs immediately
3. The result goes back to the model
That is convenient, but it removes control from the rest of your system.
A few examples where that gets risky:
- approving a purchase
- mutating a database
- triggering a deployment
- calling a regulated API
- starting a long-running job
In those cases, you usually want the model to propose an action, not execute it directly.
What you really want is:
1. The model proposes an action
2. The system inspects the request
3. A human, policy engine, or worker decides what happens
4. The agent continues with the result
LangGraph interrupts make that pattern straightforward.
The tool stops being the place where the action happens. It becomes a request generator.
That was the key shift for me: the tool does not have to do the work itself.
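Before looking at LangGraph's API, the propose-then-decide loop above can be sketched in plain Python. This is only an illustration of the control flow, not LangGraph code; all names here are invented:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class ActionRequest:
    """A tool call expressed as data, not as an immediate function call."""
    tool: str
    args: dict
    status: Status = Status.PROPOSED


def review(request: ActionRequest,
           policy: Callable[[ActionRequest], bool]) -> ActionRequest:
    """A human, policy engine, or worker decides what happens to the request."""
    request.status = Status.APPROVED if policy(request) else Status.DENIED
    return request
```

The point of the sketch: the model only produces an `ActionRequest`; something else decides whether it runs.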
Interrupts: The Core Pattern
The core mechanism is `interrupt(...)`.
Instead of returning a result immediately, a tool can raise an interrupt with structured data. LangGraph pauses execution at that point. Later, when the workflow resumes, the value passed back becomes the return value of the interrupt.
In other words: the tool behaves like it returned a result, even though the actual work happened somewhere else.
Here is the minimal pattern:
```python
from langchain.tools import ToolRuntime
from langchain_core.tools import tool
from langgraph.types import interrupt


@tool
def get_weather(location: str, runtime: ToolRuntime | None = None) -> str:
    """Pause and let an external system provide the tool result."""
    resume_value = interrupt({
        "type": "tool-call",
        "tool_call_id": runtime.tool_call_id if runtime else "unknown",
        "tool_call_name": "get_weather",
        "tool_call_args": {"location": location},
    })
    return resume_value["content"]
```

A couple of details matter here.
First, the tool accepts `ToolRuntime`. That gives you metadata about the active tool call, including `tool_call_id`.
That ID is important because it lets you correlate the model’s request with the eventual response.
Second, the interrupt payload is structured. It includes:
- the tool name
- the arguments
- the call ID
Treat that payload like an API contract. Whatever handles the request later will depend on it.
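If you do treat it as a contract, it is worth enforcing. Here is a minimal sketch of a validator your handler might run before routing a request; the field names match the payload built in `get_weather` above, but the validator itself is an illustration, not part of LangGraph:

```python
# Fields every tool request in this article's payload format must carry.
REQUIRED_FIELDS = {"type", "tool_call_id", "tool_call_name", "tool_call_args"}


def validate_tool_request(payload: dict) -> dict:
    """Reject payloads that do not satisfy the tool-request contract."""
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"malformed tool request, missing: {sorted(missing)}")
    if payload["type"] != "tool-call":
        raise ValueError(f"unexpected payload type: {payload['type']!r}")
    return payload
```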
When the interrupt fires, the graph pauses and returns control to your application.
At that point, the tool has not run yet. It has only requested that the tool be run.
Wiring the Model with Azure OpenAI
Once the tool can pause execution, the next step is giving the model a reason to call it.
For this example, I used a LangChain agent backed by Azure OpenAI.
The setup is simple, but there is one Azure-specific detail that people often miss.
Here is the minimal configuration:
```python
import os

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT"],
    base_url=f"{os.environ['AZURE_OPENAI_ENDPOINT'].rstrip('/')}/openai/v1/",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    temperature=0,
)

agent = create_agent(
    model=model,
    tools=[get_weather],
)
```

The important part is `model`.
With Azure OpenAI, this should not be the model family name, like `gpt-4o`.
It should be the deployment name you created in Azure.
Azure routes requests through deployments, so the deployment name is what LangChain uses to reach the model.
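Concretely, the environment might look like this. The deployment and resource names below are placeholders for whatever you created in the Azure portal:

```python
import os

# Placeholder values -- substitute your own Azure resource and deployment
# names. Note the deployment name, not "gpt-4o", goes in the first variable.
os.environ.setdefault("AZURE_OPENAI_DEPLOYMENT", "my-gpt4o-deployment")
os.environ.setdefault("AZURE_OPENAI_ENDPOINT", "https://my-resource.openai.azure.com")
os.environ.setdefault("AZURE_OPENAI_API_KEY", "<your-api-key>")
```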
Once the agent is set up, the model can decide when to call the weather tool. The difference comes after the call is requested.
Checkpointing with LangGraph
Interrupts only matter if you can resume the run later.
That is what LangGraph handles.
LangGraph orchestrates the agent, keeps track of state, and checkpoints execution so a paused run can continue later.
Here is the minimal setup:
```python
from typing import Annotated

from typing_extensions import TypedDict

from langgraph.checkpoint.memory import InMemorySaver
from langgraph.graph import StateGraph, START, END, add_messages


class AgentState(TypedDict):
    messages: Annotated[list, add_messages]


graph = StateGraph(AgentState)
graph.add_node("agent", agent)
graph.add_edge(START, "agent")
graph.add_edge("agent", END)

app = graph.compile(checkpointer=InMemorySaver())
```

A few pieces are worth calling out.
State definition
The graph state is a structured object:
```python
class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
```

For a simple conversational agent, message history is enough.
Graph structure
The flow is intentionally small:
`START → agent → END`

The agent node runs the model and handles tool calls. If a tool raises an interrupt, the graph pauses there.
Checkpointing
`checkpointer=InMemorySaver()`

This stores graph state so execution can resume later.
`InMemorySaver()` is fine for a demo. For anything real, use a persistent checkpointer. Otherwise a restart wipes out paused runs.
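As a sketch of the persistent option, the compile line could be swapped for a SQLite-backed checkpointer. This assumes the separate `langgraph-checkpoint-sqlite` package is installed; check the API against your installed version:

```python
# Persistent checkpointing: paused runs survive a process restart.
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver

conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
app = graph.compile(checkpointer=SqliteSaver(conn))
```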
Running, Pausing, and Inspecting the Interrupt
With the graph compiled, you can invoke it.
LangGraph uses a thread ID to tie execution state to a session. That is how it knows which paused run you want to resume later.
Here is a simple invocation:
```python
from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "demo-thread-123"}}

app.invoke(
    {"messages": [HumanMessage(content="What's the weather in Tokyo?")]},
    config=config,
)
```

Now the agent runs normally.
If the model decides to call `get_weather`, the interrupt inside that tool fires and the graph pauses.
You can inspect the pending interrupt like this:
```python
state = app.get_state(config)
pending = state.interrupts[0]
print(pending.value)
```

The output will look something like this:
```python
{
    "type": "tool-call",
    "tool_call_id": "...",
    "tool_call_name": "get_weather",
    "tool_call_args": {"location": "Tokyo"}
}
```

This is the handoff point.
Your application now has a structured tool request that can be:
- logged
- approved
- routed to another service
- executed asynchronously
The graph is paused and waiting.
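At this point you can put any policy you like between the request and its execution. A minimal sketch of a routing decision, with an invented allowlist and invented return labels purely for illustration:

```python
# Tools that only read data can be executed without review; anything
# else is parked for a human. Both the allowlist and the labels are
# made up for this sketch.
READ_ONLY_TOOLS = {"get_weather"}


def route_request(payload: dict) -> str:
    """Decide what happens to a paused tool call."""
    if payload["tool_call_name"] in READ_ONLY_TOOLS:
        return "auto-approve"
    return "needs-human-approval"
```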
Resume with an external tool result
Once an outside system has decided the result, you resume the graph with `Command(resume=...)`.
```python
from langgraph.types import Command

tool_output = {
    "type": "tool-call",
    "tool_call_id": pending.value["tool_call_id"],
    "content": "Current weather in Tokyo: 18 C\nNext 5 days: Sunny",
}

app.invoke(
    Command(resume={pending.id: tool_output}),
    config=config,
)
```

When the graph resumes, the interrupt returns the payload you passed in.
Inside the tool, this line:
```python
resume_value = interrupt(...)
```

now evaluates to:
```python
{
    "type": "tool-call",
    "tool_call_id": "...",
    "content": "Current weather in Tokyo: 18 C\nNext 5 days: Sunny"
}
```

So this return statement:
```python
return resume_value["content"]
```

works exactly as if the tool had produced the result itself.
That is the important part: from the agent’s perspective, the tool returned normally. Under the hood, execution paused, the request moved through another system, and the run resumed later.
That is the architecture shift.
Why This Pattern Works in Production
The standard tool-calling demo is good at showing what an LLM can do.
It hides what production systems need: control.
Interrupt-driven agents give you that control in a few ways.
You can inspect every requested action.
Each tool call becomes a structured request that can be logged, audited, or reviewed before execution.
You get a natural approval point.
Human-in-the-loop workflows become simple because the graph can pause until someone approves the action.
You separate intent from execution.
The model decides what should happen. Your infrastructure decides whether and how it actually happens.
You get durable workflows.
With a persistent checkpointer, the agent can pause for minutes or hours and resume where it left off.
You can integrate external systems cleanly.
This works well with:
- worker queues
- compliance systems
- approval dashboards
- event-driven pipelines
That is a much better fit for enterprise AI than letting the model fire off actions on its own.
Takeaway
The biggest shift in production agent design is this: tool calls should not always execute immediately.
Sometimes the right move is to pause.
LangGraph interrupts make that easy. By raising an interrupt inside a tool, you turn a direct function call into a structured request that the rest of your system can handle.
