
Build a Chat Agent with LangChain.js + Ollama (Local, 2026)

LangChain v1 + LangGraph v1 in JS, paired with Ollama, gives you a fully local chat agent with tools, memory, and structured output. No OpenAI key required.

TL;DR — LangChain v1 (released February 2026) cleaned up the JS API. Combined with the @langchain/ollama package and a local Llama 3.1 8B, you get a tool-using chat agent in a Node.js process — no cloud, no API key, no per-token bill.

What you'll build

A Node.js CLI chat agent: a terminal REPL, a persistent in-memory conversation thread, and two tools (web search via Tavily or a stub, plus a SQLite "notes" store), running on Ollama via LangGraph's prebuilt createReactAgent.

Prerequisites

  1. Node.js 22+, npm i langchain @langchain/core @langchain/ollama @langchain/langgraph zod better-sqlite3.
  2. Ollama running: ollama pull llama3.1:8b.
  3. (Optional) Tavily API key for web search.

Architecture

```mermaid
flowchart LR
  TTY[Terminal] --> AGT[createReactAgent]
  AGT --> LLM[ChatOllama llama3.1:8b]
  AGT --> T1[searchTool]
  AGT --> T2[saveNoteTool SQLite]
  AGT --> MEM[MemorySaver]
```

Step 1 — Set up the model

```js
// model.js
import { ChatOllama } from "@langchain/ollama";

export const llm = new ChatOllama({
  model: "llama3.1:8b",
  baseUrl: "http://127.0.0.1:11434",
  temperature: 0.4,
});
```

@langchain/ollama 0.2+ supports tool calling natively for Llama 3.1.
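"Tool calling" concretely means the model replies with a structured tool_calls array instead of prose. The object below is an illustrative literal showing that shape (simplified field names in the spirit of LangChain's AIMessage, not the real class):

```javascript
// Simplified shape of a tool-calling response (illustrative literal,
// not the actual AIMessage class from @langchain/core).
const aiMessage = {
  content: "",
  tool_calls: [
    { name: "save_note", args: { body: "Buy milk" }, id: "call_1" },
  ],
};

// The agent loop dispatches on this array: an empty tool_calls
// array means the model produced a final answer instead.
const isFinalAnswer = aiMessage.tool_calls.length === 0;
console.log(isFinalAnswer); // false — the model wants to call save_note
```

If a model can't emit this structure reliably, tool use degrades into hallucinated arguments — which is why model choice matters (see Common pitfalls).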

Step 2 — Define tools

```js
// tools.js
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import Database from "better-sqlite3";

const db = new Database("notes.db");
db.exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)");

export const saveNote = tool(
  async ({ body }) => {
    const r = db.prepare("INSERT INTO notes (body) VALUES (?)").run(body);
    return `Saved note #${r.lastInsertRowid}`;
  },
  {
    name: "save_note",
    description: "Save a short note to the local notes database.",
    schema: z.object({ body: z.string().describe("Note content") }),
  }
);

export const search = tool(
  async ({ q }) => {
    // Replace with Tavily / SerpAPI / fetch as you like
    return `Stub search result for: ${q}`;
  },
  {
    name: "search",
    description: "Search the web for fresh information.",
    schema: z.object({ q: z.string() }),
  }
);
```

Step 3 — Build the React-style agent with LangGraph

```js
// agent.js
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { MemorySaver } from "@langchain/langgraph";
import { llm } from "./model.js";
import { saveNote, search } from "./tools.js";

export const agent = createReactAgent({
  llm,
  tools: [saveNote, search],
  checkpointSaver: new MemorySaver(), // per-thread memory
  prompt:
    "You are a concise assistant. Use tools when relevant. Keep replies under 3 sentences.",
});
```

createReactAgent from LangGraph 1.x is the recommended way to build tool-using agents in 2026 — it replaces the older AgentExecutor pattern.
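The control flow createReactAgent runs can be sketched in plain JS: call the model, execute any requested tools, feed the results back, and repeat until the model answers without tool calls. Everything below is a toy stand-in (fakeModel instead of ChatOllama, a plain object instead of real tools), not the actual LangGraph internals:

```javascript
// Toy ReAct loop — fakeModel and tools are stand-ins, purely illustrative.
const tools = { search: async ({ q }) => `Stub search result for: ${q}` };

// Fake model: first turn requests a tool, every later turn answers.
let turn = 0;
const fakeModel = async (_messages) => {
  turn += 1;
  if (turn === 1) {
    return { content: "", tool_calls: [{ name: "search", args: { q: "LangChain v1" } }] };
  }
  return { content: "LangChain v1 cleaned up the JS API.", tool_calls: [] };
};

async function reactLoop(userInput) {
  const messages = [{ role: "user", content: userInput }];
  while (true) {
    const ai = await fakeModel(messages);
    messages.push(ai);
    if (ai.tool_calls.length === 0) return ai.content; // final answer — stop
    for (const call of ai.tool_calls) {
      const result = await tools[call.name](call.args); // run the requested tool
      messages.push({ role: "tool", content: result }); // feed the result back
    }
  }
}

reactLoop("What changed in LangChain v1?").then((a) => console.log(a));
// → LangChain v1 cleaned up the JS API.
```

createReactAgent adds the parts this toy omits: checkpointing, streaming, error handling, and parallel tool calls.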

Step 4 — REPL with persistent thread

```js
// repl.js
import readline from "node:readline/promises";
import { agent } from "./agent.js";

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const threadId = "main";

while (true) {
  const userInput = await rl.question("you> ");
  if (!userInput) continue;
  if (userInput === "/exit") break;

  const out = await agent.invoke(
    { messages: [{ role: "user", content: userInput }] },
    { configurable: { thread_id: threadId } }
  );
  const last = out.messages[out.messages.length - 1];
  console.log("bot>", last.content);
}
rl.close();
```

thread_id keeps memory separate per conversation; swap MemorySaver for PostgresSaver in production.
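Conceptually, a checkpointer is a store keyed by thread_id. This toy version (a Map standing in for MemorySaver's per-thread storage, purely illustrative) shows why two threads never see each other's history:

```javascript
// Toy per-thread store: each thread_id gets its own message history,
// which is the essence of what MemorySaver does in-process (simplified).
const store = new Map();

function appendMessage(threadId, message) {
  if (!store.has(threadId)) store.set(threadId, []);
  store.get(threadId).push(message);
}

appendMessage("main", { role: "user", content: "remember: blue" });
appendMessage("other", { role: "user", content: "remember: red" });

// Threads are isolated: each sees only its own messages.
console.log(store.get("main").length);  // 1
console.log(store.get("other").length); // 1
```

Swapping in PostgresSaver keeps the same thread_id keying but persists it across process restarts.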

Step 5 — Stream tokens (better UX)

```js
const stream = await agent.stream(
  { messages: [{ role: "user", content: userInput }] },
  { configurable: { thread_id: threadId }, streamMode: "messages" }
);

for await (const [chunk] of stream) {
  if (chunk?.content) process.stdout.write(chunk.content);
}
console.log();
```

streamMode: "messages" yields token-level message events; streamMode: "updates" yields per-node state deltas instead.
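The consumption loop above works against any async iterable of [chunk, metadata] tuples, so you can exercise it without a model running. Here's a stub stream (hypothetical, standing in for agent.stream) demonstrating the shape:

```javascript
// Stub stream: yields [chunk, metadata] tuples the way
// streamMode: "messages" does (metadata kept minimal here).
async function* stubStream() {
  for (const token of ["Hel", "lo", "!"]) {
    yield [{ content: token }, { node: "agent" }];
  }
}

// Collect tokens with the same destructuring and guard as the real loop.
async function collect(stream) {
  let output = "";
  for (const promise of []) {} // (no-op; keeps function body plain for older runtimes)
  for await (const [chunk] of stream) {
    if (chunk?.content) output += chunk.content;
  }
  return output;
}

collect(stubStream()).then((text) => console.log(text)); // Hello!
```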

Step 6 — Structured output

```js
import { z } from "zod";

const Lead = z.object({ name: z.string(), email: z.string().email() });
const structured = llm.withStructuredOutput(Lead);

const lead = await structured.invoke("My name is Sagar and email is [email protected]");
console.log(lead); // { name: 'Sagar', email: '[email protected]' }
```

Common pitfalls

  • Ollama tool support varies by model. Llama 3.1, Mistral, and Qwen 2.5 are reliable; smaller 1B models often hallucinate tool arguments.
  • baseUrl isn't always localhost. If Ollama runs in Docker, point baseUrl at the host gateway IP instead.
  • Memory growth. MemorySaver lives in-process and grows with every turn; long-running services need a persistent checkpointer.
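Until you move to a real saver, you can at least bound what each turn sends to the model. A minimal sketch (trimHistory is a hypothetical helper written for this post, not a LangChain API) that keeps the system prompt plus the last N messages:

```javascript
// Hypothetical helper: keep any system messages plus the most recent
// `max` other messages, so long threads don't grow unbounded.
function trimHistory(messages, max = 20) {
  const system = messages.filter((m) => m.role === "system");
  const rest = messages.filter((m) => m.role !== "system");
  return [...system, ...rest.slice(-max)];
}

const history = [{ role: "system", content: "Be concise." }];
for (let i = 0; i < 50; i++) history.push({ role: "user", content: `msg ${i}` });

const trimmed = trimHistory(history, 20);
console.log(trimmed.length);     // 21: system prompt + last 20 messages
console.log(trimmed[1].content); // "msg 30" — oldest surviving turn
```

Naive truncation can drop context mid-tool-call, so treat this as a stopgap before a persistent checkpointer, not a replacement for one.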

How CallSphere does this in production

CallSphere's chat agents use the same checkpointer-style memory architecture as LangGraph, backed by Postgres + Redis: 37 specialists across 6 verticals, 90+ tools, and 115+ DB tables. Healthcare runs 14 HIPAA tools on FastAPI (port 8084); OneRoof's 10 specialists handle property workflows. Pricing is a flat $149 / $499 / $1499, with a 14-day trial, a 22% affiliate program, and a live /demo.

FAQ

LangChain.js vs Python? Same APIs; choose by team language.

Best Ollama model for tools? llama3.1:8b-instruct-q4_K_M for speed, qwen2.5:14b-instruct for quality.

Production memory store? @langchain/langgraph-checkpoint-postgres.

Streaming + tools? Yes — tool events come through the stream too.

Multi-agent? LangGraph supports supervisor, swarm, and hierarchical patterns.

