Claude Cowork walkthrough: from problem to shipped
A realistic end-to-end Claude Cowork use case: a quarterly vendor-spend review from vague ask to shipped deliverable, with every agentic step shown.
Most explanations of agentic AI stop at the demo: a clean prompt, a tidy answer, applause. Real work is messier. The data is spread across several systems, the ask is ambiguous, and half the value lies in catching the thing nobody asked about. To show what Claude Cowork actually does on real knowledge work, this post walks a single task end to end: a quarterly vendor-spend review for a mid-sized operations team. We start where the work really starts, with a one-line request from a manager, and follow it to a reviewed, shipped deliverable, naming every decision and guardrail along the way.
The starting point: a vague, real ask
The request lands as a message: "Can you pull together our vendor spend for the quarter and flag anything weird before the budget meeting Thursday?" That is how real work arrives: underspecified, with an implicit definition of done buried in "anything weird." A junior analyst would spend a day pulling data and another half-day formatting it. The first job is not to hand that sentence to Claude Cowork as-is; it is to turn it into a checkable spec.
So the analyst writes a short brief: pull spend from the accounting system and the procurement tool for the last quarter; compare each vendor against the prior quarter and the same quarter last year; flag any vendor up more than 25 percent, any new vendor over a threshold, and any duplicate-looking line items; produce a two-page summary plus a backing spreadsheet. That spec is the actual skilled work. It names the canonical sources, the comparison baselines, and the definition of "weird." Everything downstream depends on it.
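One way to keep the brief checkable is to write it down as data instead of prose, so "weird" becomes a rule you can test. The sketch below is a hypothetical Python encoding. The class, field, and method names are illustrative assumptions, not part of Claude Cowork. The new-vendor threshold is a required field because the brief does not name a number. "New vendor" is assumed to mean no spend in the baseline period.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReviewSpec:
    """Hypothetical, machine-checkable version of the analyst's brief."""

    # Spend above which a brand-new vendor is flagged; the brief leaves the number open.
    new_vendor_threshold: float
    # Systems to pull from, canonical first (it wins when the two disagree).
    sources: Tuple[str, ...] = ("accounting", "procurement")
    # Baselines every vendor is compared against.
    baselines: Tuple[str, ...] = ("prior_quarter", "same_quarter_last_year")
    # Flag any vendor whose spend rose by more than this percentage.
    max_increase_pct: float = 25.0
    # Also scan for duplicate-looking line items.
    check_duplicates: bool = True
    # What "done" means for this task.
    deliverables: Tuple[str, ...] = ("two_page_summary", "backing_spreadsheet")

    def is_large_increase(self, current: float, baseline: float) -> bool:
        """True when spend rose by more than max_increase_pct over a nonzero baseline."""
        if baseline <= 0:
            return False  # no meaningful percentage; the new-vendor rule covers this case
        return (current - baseline) / baseline * 100 > self.max_increase_pct

    def is_large_new_vendor(self, current: float, baseline: float) -> bool:
        """True for a vendor with no baseline spend whose current spend exceeds the threshold."""
        return baseline == 0 and current > self.new_vendor_threshold
```

Freezing the dataclass means a run cannot quietly change its own definition of done. Changing a rule becomes a deliberate edit to the spec.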
Wiring up context and connectors
Claude Cowork reaches external systems through connectors built on the Model Context Protocol — the open standard that lets Claude call external tools and pull structured data. For this task the analyst attaches a read-only connector to the accounting system and another to the procurement tool, plus the team's "vendor review" Agent Skill, which encodes how this company formats the summary and which categories matter. Read-only is deliberate: this workflow never needs to write anything back, so it is never granted the ability to.
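The read-only guarantee is easiest to reason about as an allowlist. The workflow is granted a fixed set of read tools per connector, and any other call is refused before it reaches the external system. This is a hypothetical Python sketch of that idea only. It is not the MCP SDK or Claude Cowork's permission model, and the tool names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class ConnectorGrant:
    """Hypothetical grant: the only tools this workflow may call on one connector."""

    connector: str
    allowed_tools: FrozenSet[str]

    def authorize(self, tool: str) -> None:
        # Deny by default: anything not listed, including every write tool, is refused.
        if tool not in self.allowed_tools:
            raise PermissionError(f"{self.connector}: {tool!r} is not granted to this workflow")


# Illustrative read-only grants for the vendor-spend review (tool names are made up).
GRANTS: Dict[str, ConnectorGrant] = {
    "accounting": ConnectorGrant("accounting", frozenset({"list_invoices", "get_vendor"})),
    "procurement": ConnectorGrant("procurement", frozenset({"list_purchase_orders", "get_vendor"})),
}


def call_tool(connector: str, tool: str, **args: Any) -> Dict[str, Any]:
    """Check the grant before anything leaves the workflow."""
    GRANTS[connector].authorize(tool)
    # A real client would forward the request to the connector here; the sketch echoes it.
    return {"connector": connector, "tool": tool, "args": args}
```

The design choice is deny-by-default. A write tool never has to be explicitly blocked, because it was never granted in the first place.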
With context attached, the agent has what it needs to stop guessing. It knows which system is canonical for spend (accounting, not procurement, when they disagree), it knows the company's fiscal calendar, and it knows the house style for the deliverable. This is the difference between a generic answer and one that looks like your team produced it. The skill is doing real work here: without it, the agent would invent a reasonable-but-wrong format every run.
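The "accounting wins when they disagree" rule is small enough to state precisely. The sketch below is a minimal, hypothetical reconciliation in Python. It assumes per-vendor totals have already been pulled from each system. It also assumes a vendor that appears only in procurement (for example, an open purchase order not yet invoiced) should be surfaced for review rather than counted as spend. The post does not say how that case is handled.

```python
from typing import Dict, List, Tuple


def reconcile_sources(
    accounting: Dict[str, float],
    procurement: Dict[str, float],
    tolerance: float = 0.005,  # assumed: differences under half a cent are not conflicts
) -> Tuple[Dict[str, float], List[str], List[str]]:
    """Hypothetical reconciliation in which accounting is canonical for spend.

    Returns (spend, conflicts, procurement_only):
      spend            -- amounts taken from accounting only
      conflicts        -- vendors in both systems whose amounts differ
      procurement_only -- vendors procurement knows about but accounting does not
    """
    spend = dict(accounting)  # the canonical figures, unchanged
    shared = accounting.keys() & procurement.keys()
    conflicts = sorted(v for v in shared if abs(accounting[v] - procurement[v]) > tolerance)
    procurement_only = sorted(procurement.keys() - accounting.keys())
    return spend, conflicts, procurement_only
```

Returning the conflicts instead of silently discarding them keeps the precedence rule auditable. The analyst can see every place procurement was overruled.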
flowchart TD
A["Manager's one-line ask"] --> B["Analyst writes checkable spec"]
B --> C["Attach read-only connectors & skill"]
C --> D["Claude Cowork pulls & reconciles spend"]
D --> E["Sub-agent computes deltas & flags anomalies"]
E --> F{"Anomalies need judgment?"}
F -->|Yes| G["Surface to analyst for review"]
F -->|No| H["Draft 2-page summary & spreadsheet"]
G --> H
H --> I["Analyst verifies & ships to manager"]
The agentic run: decomposition in action
When the task runs, Claude Cowork does not treat it as one monolithic prompt. It decomposes: first pull and normalize the data from both sources, reconciling vendor names that are spelled differently across systems; then compute the quarter-over-quarter and year-over-year deltas; then apply the flagging rules; then assemble the narrative. Where the work is parallelizable, sub-agents handle independent slices — one reconciling the data, another scanning for duplicate line items — and the orchestrator stitches the results together.
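The mechanical core of those steps (normalize vendor names, total spend per vendor, compute quarter-over-quarter and year-over-year deltas) can be sketched in a few functions. This is a hypothetical Python illustration, not how Claude Cowork or its sub-agents are implemented. The suffix list is an assumption, and real name matching usually needs more than stripping punctuation and legal suffixes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical list of legal-entity suffixes to ignore when matching vendor names.
_SUFFIXES = {"inc", "llc", "ltd", "limited", "corp", "corporation", "co"}


def normalize_vendor(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def spend_by_vendor(items: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum (vendor_name, amount) line items per normalized vendor."""
    totals: Dict[str, float] = defaultdict(float)
    for name, amount in items:
        totals[normalize_vendor(name)] += amount
    return dict(totals)


def pct_change(current: float, baseline: float) -> Optional[float]:
    """Percentage change vs. a baseline; None when there was no baseline spend."""
    if baseline == 0:
        return None
    return (current - baseline) / baseline * 100


def deltas(
    current: Dict[str, float], prior_q: Dict[str, float], last_year_q: Dict[str, float]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Quarter-over-quarter and year-over-year change for every current vendor."""
    return {
        v: {
            "qoq": pct_change(spend, prior_q.get(v, 0.0)),
            "yoy": pct_change(spend, last_year_q.get(v, 0.0)),
        }
        for v, spend in current.items()
    }
```

For example, "Acme, Inc." and "ACME Inc" both normalize to "acme", so their line items land in one total before any percentages are computed. Reconciling names first matters because a split vendor would otherwise show up as one shrinking vendor plus one "new" vendor.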
The interesting moments are the anomalies that need judgment. The agent flags a vendor whose spend jumped 40 percent, but it also notices that the jump comes from a single annual software renewal, not runaway spending, and says so in the draft rather than raising a false alarm. It flags two line items that look like duplicate payments and, crucially, marks them as "needs human confirmation" rather than asserting that a double payment occurred. This is the right behavior: surface the signal and defer the consequential judgment to a person.
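Both behaviors can be expressed as flags that carry a status instead of a verdict. The sketch below is a hypothetical Python illustration with two heuristics that are assumptions, not the agent's actual reasoning. First, an increase is labeled "contextual" when removing its single largest line item would bring it back under the threshold. The code cannot know the item is a renewal; it only says the jump is concentrated. Second, same-vendor, same-amount invoices within an assumed seven-day window become "needs human confirmation". Comparing amounts with `==` assumes they come from the same system at the same precision.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class LineItem:
    vendor: str
    amount: float
    invoice_date: date


@dataclass(frozen=True)
class Flag:
    vendor: str
    reason: str
    status: str  # "alert", "contextual", or "needs human confirmation"


def flag_increase(vendor: str, items: List[LineItem], baseline: float,
                  threshold_pct: float = 25.0) -> Optional[Flag]:
    """Flag a rise above threshold_pct; label it contextual when removing the single
    largest line item would bring the rise back under the threshold."""
    if baseline <= 0 or not items:
        return None
    current = sum(i.amount for i in items)
    change = (current - baseline) / baseline * 100
    if change <= threshold_pct:
        return None
    largest = max(i.amount for i in items)
    change_without_largest = (current - largest - baseline) / baseline * 100
    if change_without_largest <= threshold_pct:
        return Flag(vendor, f"up {change:.0f}%, concentrated in one line item of {largest:,.2f}",
                    "contextual")
    return Flag(vendor, f"up {change:.0f}% across several line items", "alert")


def duplicate_candidates(items: List[LineItem], window_days: int = 7) -> List[Flag]:
    """Same vendor, same amount, invoiced within window_days: ask a human, don't conclude."""
    # Sorting puts matching vendor/amount pairs next to each other, ordered by date.
    ordered = sorted(items, key=lambda i: (i.vendor, i.amount, i.invoice_date))
    flags = []
    for a, b in zip(ordered, ordered[1:]):
        if (a.vendor == b.vendor and a.amount == b.amount
                and (b.invoice_date - a.invoice_date).days <= window_days):
            flags.append(Flag(a.vendor,
                              f"possible duplicate: {a.amount:,.2f} on {a.invoice_date} "
                              f"and {b.invoice_date}",
                              "needs human confirmation"))
    return flags
```

Neither function ever sets a status meaning "confirmed". Only a person can move an item to that state.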
Verification: where the human earns their keep
The agent produces a draft summary and a backing spreadsheet in minutes. The analyst's job now is not to admire it but to verify it. They spot-check the three largest flagged vendors directly against the source systems, confirm that the reconciliation merged the right name variants, and resolve the two items marked "needs human confirmation": one was a genuine duplicate worth catching, the other a legitimate split invoice. This verification step is non-negotiable; shipping unverified agentic output is how teams get burned by a confidently wrong number in front of leadership.
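The verification pass has two mechanical pieces worth making explicit. The first is a queue of the largest flagged vendors to check against the source systems. The second is a resolution step that only a person calls. The Python sketch below is hypothetical. Its field names and status strings are assumptions, though "confirmed" and "contextual" mirror the annotations the shipped summary uses.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class FlaggedVendor:
    vendor: str
    spend: float
    status: str  # e.g. "alert", "contextual", "needs human confirmation"
    note: str = ""


def spot_check_queue(flags: List[FlaggedVendor], n: int = 3) -> List[str]:
    """The n largest flagged vendors by spend: verify these against the source systems first."""
    return [f.vendor for f in sorted(flags, key=lambda f: f.spend, reverse=True)[:n]]


def resolve(flag: FlaggedVendor, confirmed: bool, note: str) -> FlaggedVendor:
    """Record the analyst's decision on an item awaiting confirmation."""
    if flag.status != "needs human confirmation":
        raise ValueError(f"{flag.vendor} is not awaiting confirmation")
    # A genuine duplicate becomes "confirmed"; a legitimate split invoice becomes "contextual".
    return replace(flag, status="confirmed" if confirmed else "contextual", note=note)
```

Checking the largest items first is a deliberate choice. A wrong number on a big vendor is the one most likely to mislead the budget meeting.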
The analyst also catches something the spec did not ask for: a vendor that should have been consolidated under a parent account is showing up twice, inflating the apparent vendor count. They add a line to the summary about it. This is the human-and-agent division of labor at its best — the agent did the exhaustive mechanical pass that no human would do thoroughly under time pressure, and the human supplied the contextual judgment the agent could not have.
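The parent-account check the analyst added amounts to a roll-up before counting vendors. Here is a minimal Python sketch. It assumes a hypothetical `parent_of` mapping from child account to parent; the post does not say where that mapping lives.

```python
from collections import defaultdict
from typing import Dict


def consolidate(spend: Dict[str, float], parent_of: Dict[str, str]) -> Dict[str, float]:
    """Roll child vendor accounts up to their parent; unmapped vendors stand alone."""
    rolled: Dict[str, float] = defaultdict(float)
    for vendor, amount in spend.items():
        rolled[parent_of.get(vendor, vendor)] += amount
    return dict(rolled)
```

With `{"globex": 100.0, "globex emea": 40.0, "initech": 10.0}` and a mapping of `"globex emea"` to `"globex"`, the result has two vendors instead of three. Total spend is unchanged, but the apparent vendor count is no longer inflated.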
Shipping and capturing the work
The deliverable ships Thursday morning: a tight two-page summary with the flagged anomalies, each annotated as confirmed or contextual, plus the spreadsheet for anyone who wants to dig. What took an analyst a day and a half now takes a couple of focused hours, most of it verification rather than mechanical assembly. But the real compounding benefit comes from the last step: the analyst updates the "vendor review" skill with the two refinements this run surfaced — the parent-account consolidation check and a better duplicate-detection rule.
Next quarter, the agent runs the improved process automatically. This is the flywheel that makes agentic knowledge work pay off over time: each run is an opportunity to encode a little more of the team's judgment into a reusable skill, so the agent gets steadily better at your work, not just work in general. The deliverable is the visible output; the upgraded skill is the durable asset.
Frequently asked questions
How long does a task like this actually take?
The agentic run itself is minutes. The human time is mostly the upfront spec and the downstream verification, which together might be a couple of hours versus a day and a half of fully manual work. The savings come from removing mechanical assembly, not from skipping the thinking.
What stops the agent from acting on a wrong conclusion?
Two things: read-only connectors mean it cannot write back to any system, and the workflow surfaces consequential judgments — like a suspected duplicate payment — as items needing human confirmation rather than acting on them. The agent flags; the human decides.
Why bother writing a detailed spec instead of just asking?
Because "flag anything weird" is underspecified, and an underspecified ask produces a plausible-but-wrong answer. The spec names canonical sources, comparison baselines, and the definition of done, which is exactly the context the agent cannot infer on its own.
How does the work compound over time?
By capturing each run's refinements back into the Agent Skill. Every quarter the team encodes a little more of its judgment — new checks, better rules — so the agent gets progressively better at that specific task rather than staying generic.
Bringing agentic AI to your phone lines
The same problem-to-shipped arc plays out in real time on a phone call. CallSphere brings these agentic-AI patterns to voice and chat — assistants that gather context, use tools mid-conversation, surface what needs a human, and complete the booking. See a live walkthrough at callsphere.ai.