Build a Chat Agent with LangChain.js + Ollama (Local, 2026)
LangChain v1 + LangGraph v1 in JS, paired with Ollama, gives you a fully local chat agent with tools, memory, and structured output. No OpenAI key required.
TL;DR — LangChain v1 (released February 2026) cleaned up the JS API. Combined with the @langchain/ollama package and a local Llama 3.1 8B, you get a tool-using chat agent in a Node.js process — no cloud, no API key, no per-token bill.
What you'll build
A Node.js CLI chat agent: a terminal REPL, a persistent in-memory thread, two tools (web search via Tavily or a stub, and a SQLite "notes" store), running on Ollama via LangGraph's prebuilt createReactAgent.
Prerequisites
- Node.js 22+; npm i langchain @langchain/core @langchain/ollama @langchain/langgraph zod better-sqlite3
- Ollama running: ollama pull llama3.1:8b
- (Optional) Tavily API key for web search.
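Before going further, it's worth confirming the Ollama server is actually reachable and your model is pulled; a quick check, assuming the default port 11434:

```shell
# List the models the local Ollama server has pulled; warn if it isn't running.
curl -s --max-time 2 http://127.0.0.1:11434/api/tags \
  || echo "Ollama is not reachable on 127.0.0.1:11434 -- run 'ollama serve' first"
```

If llama3.1:8b is missing from the response, run the pull command above before starting the agent.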
Architecture
flowchart LR
TTY[Terminal] --> AGT[createReactAgent]
AGT --> LLM[ChatOllama llama3.1:8b]
AGT --> T1[searchTool]
AGT --> T2[saveNoteTool SQLite]
AGT --> MEM[MemorySaver]
Step 1 — Set up the model
```js
// model.js
import { ChatOllama } from "@langchain/ollama";

export const llm = new ChatOllama({
  model: "llama3.1:8b",
  baseUrl: "http://127.0.0.1:11434",
  temperature: 0.4,
});
```
@langchain/ollama 0.2+ supports tool calling natively for Llama 3.1.
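Concretely, when tool calling works, the model's reply is an AIMessage whose tool_calls field carries structured arguments instead of prose. A minimal sketch of that shape in plain objects (field names follow LangChain's ToolCall type; the values here are made up):

```javascript
// Shape of a successful tool call as LangChain surfaces it (illustrative values).
const aiMessage = {
  content: "",                     // often empty when the model opts to call a tool
  tool_calls: [
    {
      name: "save_note",                      // must match a registered tool name
      args: { body: "Ollama runs locally" },  // parsed against the tool's zod schema
      id: "call_0",                           // correlates the eventual ToolMessage
    },
  ],
};

// The agent loop executes each call and feeds results back as tool messages.
for (const call of aiMessage.tool_calls) {
  console.log(`model wants ${call.name} with`, call.args);
}
```

Smaller models that "support" tools often get the name right but mangle args, which is why schemas matter (see the pitfalls below).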
Step 2 — Define tools
```js
// tools.js
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import Database from "better-sqlite3";

const db = new Database("notes.db");
db.exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)");

export const saveNote = tool(async ({ body }) => {
  const r = db.prepare("INSERT INTO notes (body) VALUES (?)").run(body);
  return `Saved note #${r.lastInsertRowid}`;
}, {
  name: "save_note",
  description: "Save a short note to the local notes database.",
  schema: z.object({ body: z.string().describe("Note content") }),
});
export const search = tool(async ({ q }) => {
  // Replace with Tavily / SerpAPI / fetch as you like
  return `Stub search result for: ${q}`;
}, {
  name: "search",
  description: "Search the web for fresh information.",
  schema: z.object({ q: z.string() }),
});
```
Step 3 — Build the React-style agent with LangGraph
```js
// agent.js
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { MemorySaver } from "@langchain/langgraph";
import { llm } from "./model.js";
import { saveNote, search } from "./tools.js";

export const agent = createReactAgent({
  llm,
  tools: [saveNote, search],
  checkpointSaver: new MemorySaver(), // per-thread memory
  prompt: "You are a concise assistant. Use tools when relevant. Keep replies under 3 sentences.",
});
```
createReactAgent from LangGraph 1.x is the recommended way to build tool-using agents in 2026 — it replaces the older AgentExecutor pattern.
Step 4 — REPL with persistent thread
```js
// repl.js
import readline from "node:readline/promises";
import { agent } from "./agent.js";

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const threadId = "main";

while (true) {
  const userInput = await rl.question("you> ");
  if (!userInput) continue;
  if (userInput === "/exit") break;
  const out = await agent.invoke(
    { messages: [{ role: "user", content: userInput }] },
    { configurable: { thread_id: threadId } }
  );
  const last = out.messages[out.messages.length - 1];
  console.log("bot>", last.content);
}
rl.close();
```
thread_id keeps memory separate per conversation; swap MemorySaver for PostgresSaver in production.
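Under the hood, a checkpointer is essentially a store of conversation state keyed by thread_id, which is why two thread ids never see each other's history. A stripped-down sketch of the concept (a toy, not the real MemorySaver):

```javascript
// Toy checkpointer: message history per thread_id, as MemorySaver keeps in-process.
class ToyMemorySaver {
  constructor() { this.threads = new Map(); }
  append(threadId, message) {
    if (!this.threads.has(threadId)) this.threads.set(threadId, []);
    this.threads.get(threadId).push(message);
  }
  history(threadId) { return this.threads.get(threadId) ?? []; }
}

const saver = new ToyMemorySaver();
saver.append("main", { role: "user", content: "hi" });
saver.append("support-42", { role: "user", content: "refund?" });

console.log(saver.history("main").length); // 1 (threads are isolated)
```

Swapping in a Postgres-backed saver changes where the map lives, not how the agent uses it, so the REPL code above stays untouched.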
Step 5 — Stream tokens (better UX)
```js
const stream = await agent.stream(
  { messages: [{ role: "user", content: userInput }] },
  { configurable: { thread_id: threadId }, streamMode: "messages" }
);
for await (const [chunk] of stream) {
  if (chunk?.content) process.stdout.write(chunk.content);
}
console.log();
```
streamMode: "messages" yields token-level events; streamMode: "updates" yields per-node state deltas.
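The consumption pattern is the same regardless of backend: the stream is an async iterable of [chunk, metadata] tuples, so you can prototype the loop against a fake generator before wiring in Ollama. The generator below is purely illustrative:

```javascript
// Fake token stream with the same [chunk, metadata] tuple shape as streamMode: "messages".
async function* fakeMessageStream() {
  for (const token of ["Local ", "agents ", "stream ", "too."]) {
    yield [{ content: token }, { langgraph_node: "agent" }];
  }
}

async function consume() {
  let output = "";
  for await (const [chunk] of fakeMessageStream()) {
    if (chunk?.content) output += chunk.content; // same guard as the real loop
  }
  return output;
}

consume().then((text) => console.log(text)); // prints: Local agents stream too.
```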
Step 6 — Structured output
```js
import { z } from "zod";

const Lead = z.object({ name: z.string(), email: z.string().email() });
const structured = llm.withStructuredOutput(Lead);

const lead = await structured.invoke("My name is Sagar and email is [email protected]");
console.log(lead); // { name: 'Sagar', email: '[email protected]' }
```
Common pitfalls
- Ollama tool support varies by model. Llama 3.1, Mistral, and Qwen 2.5 are reliable; smaller 1B models often hallucinate tool args.
- baseUrl, not localhost. If Ollama is in Docker, localhost inside the container can't reach the host; use the host gateway IP.
- Memory growth. MemorySaver is in-process and unbounded; long-running services need a real checkpoint saver.
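One cheap mitigation for hallucinated tool arguments is validating before executing, which is exactly what the zod schema passed to tool() does for you. A dependency-free sketch of the same guard (hypothetical helper, not a LangChain API):

```javascript
// Reject tool calls whose args don't match the declared shape before running them.
function validateArgs(args, shape) {
  const errors = [];
  for (const [key, type] of Object.entries(shape)) {
    if (typeof args[key] !== type) errors.push(`${key}: expected ${type}`);
  }
  return { ok: errors.length === 0, errors };
}

const shape = { body: "string" };
console.log(validateArgs({ body: "note text" }, shape).ok); // true
console.log(validateArgs({ body: 42 }, shape).ok);          // false (hallucinated arg)
```

When validation fails, return the error text to the model instead of throwing; most models self-correct on the next turn.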
How CallSphere does this in production
CallSphere's chat agents share a memory architecture with LangGraph's checkpointers but back it with Postgres + Redis: 37 specialists across 6 verticals, 90+ tools, 115+ DB tables. Healthcare runs 14 HIPAA tools on FastAPI (port 8084); OneRoof's 10 specialists handle property workflows. Flat pricing at $149 / $499 / $1499, with a 14-day trial.
FAQ
LangChain.js vs Python? Same APIs; choose by team language.
Best Ollama model for tools? llama3.1:8b-instruct-q4_K_M for speed, qwen2.5:14b-instruct for quality.
Production memory store? @langchain/langgraph-checkpoint-postgres.
Streaming + tools? Yes — tool events come through the stream too.
Multi-agent? LangGraph supports supervisor, swarm, and hierarchical patterns.