By Sagar Shankaran, Founder of CallSphere
LangChain v1 + LangGraph v1 in JS, paired with Ollama, gives you a fully local chat agent with tools, memory, and structured output. No OpenAI key required.
Key takeaways
TL;DR: LangChain v1 (released February 2026) cleaned up the JS API. Combined with the @langchain/ollama package and a local Llama 3.1 8B, you get a tool-using chat agent in a Node.js process: no cloud, no API key, no per-token bill.
What you'll build: a Node.js CLI chat agent with a terminal REPL, a per-thread in-memory conversation history, and two tools (web search via Tavily or a stub, and a SQLite "notes" store), running on Ollama via LangGraph's prebuilt createReactAgent.
Install the dependencies: npm i langchain @langchain/core @langchain/ollama @langchain/langgraph zod better-sqlite3
Pull the model: ollama pull llama3.1:8b
Architecture (Mermaid flowchart):
flowchart LR
TTY[Terminal] --> AGT[createReactAgent]
AGT --> LLM[ChatOllama llama3.1:8b]
AGT --> T1[searchTool]
AGT --> T2[saveNoteTool SQLite]
AGT --> MEM[MemorySaver]
```js
// model.js
import { ChatOllama } from "@langchain/ollama";

export const llm = new ChatOllama({
  model: "llama3.1:8b",
  baseUrl: "http://127.0.0.1:11434",
  temperature: 0.4,
});
```
@langchain/ollama 0.2+ supports tool calling natively for Llama 3.1.
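Before wiring the model into an agent, it can help to confirm that the Ollama server is reachable and that the model has been pulled. The sketch below assumes Ollama's GET /api/tags endpoint, which lists local models. The helper name hasLocalModel and its injectable fetchImpl parameter are illustrative and not part of @langchain/ollama.

```js
// Hypothetical helper: checks whether a model tag exists on a local Ollama server.
// Assumes the GET /api/tags response shape: { models: [{ name: "llama3.1:8b", ... }] }.
async function hasLocalModel(model, baseUrl = "http://127.0.0.1:11434", fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  const { models = [] } = await res.json();
  // Exact match on the tag, e.g. "llama3.1:8b".
  return models.some((m) => m.name === model);
}
```

Calling `await hasLocalModel("llama3.1:8b")` at startup lets the CLI fail fast with a clear message instead of hitting a connection error mid-conversation.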
```js
// tools.js
import { tool } from "@langchain/core/tools";
import { z } from "zod";
import Database from "better-sqlite3";

// Open (or create) the local notes database and make sure the table exists.
const db = new Database("notes.db");
db.exec("CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT)");

export const saveNote = tool(async ({ body }) => {
  // Parameterized insert; returns the new row id to the model as confirmation.
  const r = db.prepare("INSERT INTO notes (body) VALUES (?)").run(body);
  return `Saved note #${r.lastInsertRowid}`;
}, {
  name: "save_note",
  description: "Save a short note to the local notes database.",
  schema: z.object({ body: z.string().describe("Note content") }),
});

export const search = tool(async ({ q }) => {
  // Stub: replace with Tavily / SerpAPI / fetch as you like
  return `Stub search result for: ${q}`;
}, {
  name: "search",
  description: "Search the web for fresh information.",
  schema: z.object({ q: z.string() }),
});
```
```js
// agent.js
import { createReactAgent } from "@langchain/langgraph/prebuilt";
import { MemorySaver } from "@langchain/langgraph";
import { llm } from "./model.js";
import { saveNote, search } from "./tools.js";

export const agent = createReactAgent({
  llm,
  tools: [saveNote, search],
  checkpointSaver: new MemorySaver(), // per-thread memory
  prompt: "You are a concise assistant. Use tools when relevant. Keep replies under 3 sentences.",
});
```
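createReactAgent from LangGraph's prebuilt module replaces the older AgentExecutor pattern for tool-using agents. LangChain v1 also exports createAgent from the langchain package, which the v1 release notes position as its successor, so check the current docs before starting a new project.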
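Conceptually, a ReAct-style agent alternates between calling the model and running whatever tools the model requested, until the model answers without asking for a tool. The loop below is a simplified illustration of that pattern, not LangGraph's actual implementation; runReactLoop, the message shapes, and the model signature are all hypothetical.

```js
// Simplified ReAct loop (illustrative only; not LangGraph internals).
// `model` is any async function: messages -> { content, toolCalls: [{ name, args }] }.
// `tools` maps a tool name to an async function of its args.
async function runReactLoop(model, tools, messages, maxSteps = 5) {
  const history = [...messages];
  for (let step = 0; step < maxSteps; step++) {
    const reply = await model(history);
    const toolCalls = reply.toolCalls ?? [];
    history.push({ role: "assistant", content: reply.content, toolCalls });
    // No tool calls means the model has produced its final answer.
    if (toolCalls.length === 0) return history;
    for (const call of toolCalls) {
      const fn = tools[call.name];
      // Report unknown tools back to the model instead of throwing.
      const result = fn ? await fn(call.args) : `Unknown tool: ${call.name}`;
      history.push({ role: "tool", name: call.name, content: String(result) });
    }
  }
  throw new Error(`No final answer after ${maxSteps} steps`);
}
```

The step cap matters with small local models, which can get stuck requesting the same tool repeatedly.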
```js
// repl.js
import readline from "node:readline/promises";
import { agent } from "./agent.js";

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const threadId = "main";

while (true) {
  const userInput = await rl.question("you> ");
  if (!userInput) continue;
  if (userInput === "/exit") break;
  const out = await agent.invoke(
    { messages: [{ role: "user", content: userInput }] },
    { configurable: { thread_id: threadId } }
  );
  // The last message in the returned state is the agent's final reply.
  const last = out.messages[out.messages.length - 1];
  console.log("bot>", last.content);
}
rl.close();
```
thread_id keeps memory separate per conversation; swap MemorySaver for PostgresSaver in production.
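To make the thread_id behaviour concrete, the sketch below mimics the simplest version of what an in-process checkpointer does: it keeps one message history per thread ID in a Map. ThreadMemory is a hypothetical class for illustration only; MemorySaver stores full graph checkpoints rather than a bare message array.

```js
// Illustrative stand-in for per-thread memory (not the MemorySaver API).
class ThreadMemory {
  constructor() {
    this.threads = new Map(); // thread_id -> array of messages
  }

  append(threadId, message) {
    const history = this.threads.get(threadId) ?? [];
    history.push(message);
    this.threads.set(threadId, history);
  }

  // Return a copy so callers cannot mutate stored history by accident.
  history(threadId) {
    return [...(this.threads.get(threadId) ?? [])];
  }
}

const memory = new ThreadMemory();
memory.append("main", { role: "user", content: "My name is Sagar." });
memory.append("other", { role: "user", content: "Hello." });
console.log(memory.history("main").length);  // 1
console.log(memory.history("other").length); // 1
```

Because the Map lives in the Node.js process, everything is lost on restart, which is the reason to move to a database-backed saver for long-running services.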
```js
const stream = await agent.stream(
  { messages: [{ role: "user", content: userInput }] },
  { configurable: { thread_id: threadId }, streamMode: "messages" }
);

// Each item is a [messageChunk, metadata] pair; print tokens as they arrive.
for await (const [chunk] of stream) {
  if (chunk?.content) process.stdout.write(chunk.content);
}
console.log();
```
streamMode: "messages" yields token-level events; streamMode: "updates" yields per-node state deltas.
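In a REPL you often want both behaviours at once: echo tokens as they arrive and keep the full reply for logging. The sketch below uses a fake async generator in place of agent.stream so it runs without Ollama. It assumes the "messages" mode yields [chunk, metadata] pairs, as the loop above destructures; the metadata field name `node` and the helper collectText are hypothetical.

```js
// Fake stream standing in for agent.stream(..., { streamMode: "messages" }).
// Each item is a [chunk, metadata] pair; the metadata shape here is hypothetical.
async function* fakeMessageStream() {
  yield [{ content: "Hel" }, { node: "agent" }];
  yield [{ content: "lo" }, { node: "agent" }];
  yield [{ content: "Stub search result" }, { node: "tools" }];
  yield [{ content: "!" }, { node: "agent" }];
}

// Collect streamed text from one node, optionally echoing tokens as they arrive.
async function collectText(stream, node = "agent", onToken = () => {}) {
  let text = "";
  for await (const [chunk, meta] of stream) {
    // Skip chunks from other nodes (e.g. tool output) and non-string content.
    if (meta?.node !== node || typeof chunk?.content !== "string") continue;
    onToken(chunk.content);
    text += chunk.content;
  }
  return text;
}
```

With the fake stream, `await collectText(fakeMessageStream(), "agent", (t) => process.stdout.write(t))` resolves to "Hello!" while skipping the tool output.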
```js
import { z } from "zod";
import { llm } from "./model.js";

const Lead = z.object({ name: z.string(), email: z.string().email() });
const structured = llm.withStructuredOutput(Lead);
const lead = await structured.invoke("My name is Sagar and email is sagar@callsphere.ai");
console.log(lead); // { name: 'Sagar', email: 'sagar@callsphere.ai' }
```
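Smaller local models can occasionally return output that fails schema validation, in which case the structured call may throw. A minimal retry wrapper, assuming the call rejects on invalid output, might look like this; withRetries is a hypothetical helper, not a LangChain API.

```js
// Hypothetical helper: retry an async call a few times before giving up.
// `fn` receives the 1-based attempt number, which is handy for logging.
async function withRetries(fn, attempts = 3) {
  let lastError;
  for (let i = 1; i <= attempts; i++) {
    try {
      return await fn(i);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw new Error(`Failed after ${attempts} attempts: ${lastError?.message ?? lastError}`);
}
```

Usage would be along the lines of `const lead = await withRetries(() => structured.invoke(text));`, keeping the retry count low so a persistently failing prompt surfaces quickly.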
baseUrl is not always localhost. If Ollama runs in Docker, use the host gateway IP.
MemorySaver is in-process; long-running services need a real saver.
CallSphere's chat agents share a memory architecture with LangGraph's checkpointers but back it with Postgres + Redis. 37 specialists across 6 verticals, 90+ tools, 115+ DB tables. Healthcare runs 14 HIPAA tools on FastAPI :8084; OneRoof's 10 specialists handle property workflows. Pricing is flat: $149 / $499 / $1499. 14-day trial · 22% affiliate · /demo.
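One way to handle the baseUrl pitfall noted above is to read the address from the environment, so the same code runs on the host and inside a container. OLLAMA_BASE_URL is an assumed variable name chosen for this sketch, and resolveOllamaBaseUrl is a hypothetical helper. Inside Docker Desktop the value would typically be http://host.docker.internal:11434; on plain Linux Docker that hostname usually needs an explicit host-gateway mapping.

```js
// Resolve the Ollama base URL from the environment (variable name is an assumption).
function resolveOllamaBaseUrl(env = process.env) {
  const raw = env.OLLAMA_BASE_URL || "http://127.0.0.1:11434";
  const url = new URL(raw); // throws a TypeError on malformed input
  // Drop any trailing slash so callers can append paths like "/api/tags".
  return url.origin + url.pathname.replace(/\/+$/, "");
}
```

The result can be passed straight to the ChatOllama constructor as baseUrl.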
LangChain.js vs Python? Same APIs; choose by team language.
Best Ollama model for tools? llama3.1:8b-instruct-q4_K_M for speed, qwen2.5:14b-instruct for quality.
Production memory store? @langchain/langgraph-checkpoint-postgres.
Streaming + tools? Yes — tool events come through the stream too.
Multi-agent? LangGraph supports supervisor, swarm, and hierarchical patterns.
© 2026 CallSphere LLC. All rights reserved.