Hackathon hiring shifts: skills for agentic Claude Code
A Built-with-Opus Claude Code hackathon revealed which skills, roles, and hiring shifts make agentic AI work. Here's what engineering teams should learn.
At a recent Built-with-Opus hackathon, the most interesting data point was not which team shipped the slickest demo but who got unblocked fastest. The engineers who moved quickest with Claude Code were rarely the ones with the deepest framework knowledge. They were the ones who could write a crisp spec, decompose a problem into verifiable steps, and read an agent's transcript the way a senior reviewer reads a pull request. That gap between traditional coding fluency and agentic fluency is the clearest signal of how hiring and skill development are about to change.
This post is a field report on the human side of agentic AI. Not the model, not the SDK, but the skills people actually need to learn, the roles that emerge, and how engineering leaders should rethink hiring when a single developer can now orchestrate several Opus 4.8 subagents at once.
Why agentic work rewards different skills than coding
Writing code by hand rewards local mastery: you know the language, the standard library, the gotchas of your runtime. Agentic work rewards something closer to engineering management. When Claude Code is doing the typing, your leverage comes from how well you frame the task, how clearly you define done, and how quickly you can spot when the agent has drifted off course. Three teams at the hackathon built nearly identical ideas; the winner was simply better at specifying.
We watched one engineer who struggled to write a regex from memory absolutely fly once the work became directing rather than typing. They described the desired output, gave two examples and one counter-example, told the agent which files were off-limits, and asked it to write tests first. That is not a coding skill in the classic sense. It is a specification-and-review skill, and it is teachable.
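The brief described above can be captured as a small structure, so that nothing gets forgotten before it is handed to an agent. This is a minimal sketch, not a Claude Code API: `TaskSpec`, its field names, and the sample goal are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    """Hypothetical container for the parts of a brief: goal, examples, limits."""
    goal: str
    examples: list[str] = field(default_factory=list)
    counter_examples: list[str] = field(default_factory=list)
    off_limits: list[str] = field(default_factory=list)
    tests_first: bool = True

    def missing_parts(self) -> list[str]:
        """Return the names of brief sections that are still empty."""
        gaps = []
        if not self.goal.strip():
            gaps.append("goal")
        if not self.examples:
            gaps.append("examples")
        if not self.counter_examples:
            gaps.append("counter_examples")
        return gaps

    def render(self) -> str:
        """Format the spec as plain text to paste into an agent session."""
        lines = [f"Goal: {self.goal}"]
        lines += [f"Example: {e}" for e in self.examples]
        lines += [f"Counter-example: {c}" for c in self.counter_examples]
        lines += [f"Do not modify: {p}" for p in self.off_limits]
        if self.tests_first:
            lines.append("Write failing tests first, then the implementation.")
        return "\n".join(lines)


# Illustrative values only: two examples, one counter-example, one off-limits path.
spec = TaskSpec(
    goal="Extract ISO dates (YYYY-MM-DD) from log lines",
    examples=[
        "'ts=2024-01-05 ok' -> '2024-01-05'",
        "'2023-12-31T23:59' -> '2023-12-31'",
    ],
    counter_examples=["'build 2024-1-5' -> no match"],
    off_limits=["migrations/"],
)
print(spec.render())
```

Calling `missing_parts()` before starting a session is one cheap way to notice that a brief has no counter-example yet.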
The five capabilities that separated fast teams
Across roughly two dozen teams, the same five capabilities kept correlating with shipped, working software rather than impressive-but-broken demos.
flowchart TD
A["Raw problem"] --> B["Specification skill: define done & constraints"]
B --> C["Decomposition: split into verifiable steps"]
C --> D["Orchestration: assign subagents & tools"]
D --> E["Transcript review: read agent reasoning"]
E --> F{"Output correct?"}
F -->|No| B
F -->|Yes| G["Shipped & verified outcome"]
First, specification: stating the goal, the constraints, and the acceptance criteria precisely enough that a capable but literal collaborator can act on them. Second, decomposition: breaking a fuzzy problem into steps each of which has a checkable result, so failures localize. Third, orchestration: deciding when to run a single agent versus spawning parallel subagents, and which tools or MCP servers each needs. Fourth, transcript review: reading the agent's chain of actions to catch a wrong assumption three steps before it becomes a wrong file. Fifth, evaluation thinking: defining how you will know the result is correct before you start.
Notice how few of these are about syntax. They are about judgment, communication, and verification — the same skills that distinguish a strong tech lead from a fast typist. That is the heart of the hiring shift.
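To make the decomposition idea concrete, here is a minimal sketch of steps that each carry their own check, so a failure points at one step rather than at the whole task. The `Step` type, the `first_failure` helper, and the step names are hypothetical; the lambdas stand in for real checks such as running a test file.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    check: Callable[[], bool]  # returns True once the step's result is verified


def first_failure(steps: list[Step]) -> Optional[str]:
    """Run checks in order; return the first failing step's name, or None."""
    for step in steps:
        if not step.check():
            return step.name
    return None


# Stand-in results; in practice each check would run a test or a linter.
state = {"schema": True, "parser": False, "cli": True}
steps = [
    Step("schema", lambda: state["schema"]),
    Step("parser", lambda: state["parser"]),
    Step("cli", lambda: state["cli"]),
]
print(first_failure(steps))  # prints "parser"
```

Stopping at the first failing check is a deliberate choice here: later steps usually depend on earlier ones, so their results are not trustworthy until the earlier step is fixed.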
What this means for how you hire
If the highest-leverage skill is directing and verifying agents, your interview loop should test for it. A whiteboard syntax quiz tells you little about how someone will perform when Claude Code writes the syntax. A better signal: hand the candidate a vague feature request and a terminal with an agent, and watch how they narrow it. Do they ask clarifying questions? Do they define acceptance tests up front? Do they catch the agent confidently doing the wrong thing?
Agentic fluency is the skill of getting reliable, verifiable results from an AI agent by specifying tasks precisely, decomposing them into checkable steps, and reviewing the agent's work as rigorously as you would a colleague's. It is the load-bearing competency of agentic engineering. We watched it matter more than years of experience: a second-year engineer with strong review instincts out-shipped several veterans who treated the agent like an autocomplete and never read its reasoning.
This does not mean deep technical knowledge becomes worthless — far from it. You still need people who can tell whether the agent's database migration is safe, whether its concurrency model is sound, whether its security assumptions hold. The shift is in where that knowledge gets applied: from production to review, from writing the first draft to judging it.
New roles that emerge on the team
Two roles showed up organically at the hackathon even though no one assigned them. The first was an agent harness owner — the person who set up shared skills, hooks, and MCP connections so the whole team's agents had the same tools and guardrails. The second was an eval owner — the person responsible for the test suite and acceptance criteria that every agent output had to pass before it counted.
In a permanent team, these become real responsibilities. The harness owner curates the Agent Skills library, maintains the MCP server connections, and tunes the hooks that enforce conventions. The eval owner builds the offline test sets and the gating checks. Neither role existed five years ago; both are now central to whether agentic work stays reliable as it scales. Smart teams will start growing these specialists internally rather than waiting to hire them.
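A gating check of the kind an eval owner maintains can be very small. The sketch below assumes the agent's output is a plain function and the acceptance criteria are input/expected pairs; `gate`, `naive_slug`, `fixed_slug`, and the cases are hypothetical examples, not part of any Anthropic tooling.

```python
from typing import Callable, Iterable


def gate(candidate: Callable[[str], str],
         cases: Iterable[tuple[str, str]]) -> list[str]:
    """Return one message per failing case; an empty list means the gate passes."""
    failures = []
    for given, expected in cases:
        try:
            actual = candidate(given)
        except Exception as exc:  # a crash counts as a failed case
            failures.append(f"{given!r}: raised {type(exc).__name__}")
            continue
        if actual != expected:
            failures.append(f"{given!r}: expected {expected!r}, got {actual!r}")
    return failures


# Acceptance cases written before any implementation exists.
CASES = [("Hello World", "hello-world"), ("  Trim me ", "trim-me")]


def naive_slug(text: str) -> str:
    # Plausible first draft: ignores leading and trailing whitespace.
    return text.lower().replace(" ", "-")


def fixed_slug(text: str) -> str:
    # split() with no arguments drops surrounding and repeated whitespace.
    return "-".join(text.lower().split())


print(gate(naive_slug, CASES))  # one failure, for the padded input
print(gate(fixed_slug, CASES))  # prints []
```

The gate reports every failing case instead of stopping at the first, since the cases are independent and a full list is more useful to whoever reviews the agent's next attempt.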
How to build these skills in your existing team
You almost certainly do not need to fire and rehire. The capabilities above are learnable, and the fastest way to teach them is deliberate practice with feedback. Pair a strong reviewer with someone newer and have them work a real task through Claude Code together, narrating every spec decision and every transcript catch. Run internal mini-hackathons where the rule is that humans cannot type code — only direct and review. The constraint forces the right muscles to develop.
Make transcript review a normal team ritual. Just as code review spread good practices person to person, agent-transcript review spreads good specification and good skepticism. When a teammate shows how they noticed the agent had silently assumed the wrong API version, everyone watching gets a little better at reading reasoning. At the hackathon, teams that did this produced second-day output that was visibly cleaner than their first-day output.
Pitfalls to avoid in the transition
The biggest failure mode we saw was treating the agent as either an oracle or a toy. Teams that trusted every output blindly shipped confident bugs. Teams that distrusted it so much they re-typed everything got no leverage at all. The skill is calibrated trust: lean on the agent for breadth and speed, verify the parts that matter, and know which parts matter.
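One way to make calibrated trust explicit is to write down which paths always get a deep review. This is a rough sketch under stated assumptions: the glob patterns and the `review_depth` helper are hypothetical, and a real team would choose its own list of high-risk areas.

```python
from fnmatch import fnmatchcase

# Hypothetical high-risk areas: schema changes, auth code, raw SQL.
HIGH_RISK_PATTERNS = ["migrations/*", "*/auth/*", "*.sql"]


def review_depth(path: str) -> str:
    """Return "deep" for paths matching a high-risk pattern, else "skim"."""
    if any(fnmatchcase(path, pattern) for pattern in HIGH_RISK_PATTERNS):
        return "deep"
    return "skim"


changed = ["migrations/0002_add_index.py", "src/auth/tokens.py", "docs/readme.md"]
for path in changed:
    print(path, review_depth(path))
```

`fnmatchcase` is used rather than `fnmatch` so that matching is case-sensitive and behaves the same on every operating system.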
A second pitfall is neglecting the foundations. Agentic fluency multiplies whatever judgment you already have; if your team cannot tell a safe migration from a dangerous one, the agent will let them ship the dangerous one faster. Keep investing in real systems knowledge. The agent is a force multiplier, and a multiplier applied to weak fundamentals just produces bigger mistakes.
Frequently asked questions
Do I need to hire prompt engineers for agentic work?
Usually not as a separate title. The valuable skill is full-task specification and verification, which lives best inside engineers who understand the system. A standalone prompt specialist with no domain context tends to produce prompts that read well but fail acceptance tests. Grow the capability inside your existing engineers instead.
Will agentic tools make junior engineers unnecessary?
No, but the junior path changes. Juniors who learn to specify, decompose, and review early become productive faster than ever, because the agent handles the boilerplate while they build judgment. The risk is juniors who never learn to read what the agent does — they plateau. Mentorship around transcript review matters more than ever.
What is the single best interview signal for agentic ability?
Give a vague task and watch whether the candidate defines acceptance criteria before doing anything else. The strongest agentic engineers reflexively answer the question "how will we know this is correct?" up front. That instinct predicts almost everything else about how they will work with Claude Code.
How long does it take to retrain a team?
The basics of directing and reviewing an agent come within days of deliberate practice. The deeper calibrated trust — knowing exactly which outputs to scrutinize — takes weeks of real work. A few structured internal hackathons compress that timeline dramatically.
Bringing agentic AI to your phone lines
CallSphere puts these same agentic-AI skills to work on voice and chat — assistants that specify, act, and verify their way through every customer conversation, calling tools mid-call and booking work around the clock. See it live at callsphere.ai.
Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.