AI Engineering

WebRTC + AI Interview Transcription with Calendly + Zoom: A 2026 Pipeline

Calendly schedules, Zoom hosts, WebRTC fans out, AI transcribes. Here is the 2026 reference architecture for interview transcription with speaker diarization, sentiment, and CRM hand-off.

The most common interview stack in 2026 is Calendly to schedule, Zoom to host, and an AI notetaker (Otter and its Meeting Agent, Fireflies, Read) to transcribe and summarize. Behind the scenes, each of those notetakers is a bot that joins the meeting and pulls raw audio over WebRTC. Here is how to build it yourself when SaaS pricing stops scaling.

Why this matters

Interview-heavy teams — sales, recruiting, research, podcast production — burn $30 to $50 per seat per month on AI notetakers. Once you cross 200 seats, building your own WebRTC bot to join Zoom (or Meet, or Teams) and pipe audio to a transcription service pays back in three months. Otter.ai, Fireflies, Read, and Tactiq are all variations on the same architecture: a headless Chromium with WebRTC, a Calendly/calendar listener, and an AI summarization layer.

For CallSphere, this matters because every voice agent demo, every prospect intake call, and every research interview goes through a similar pipeline. Owning the bot means owning the transcripts, the diarization quality, and the CRM hand-off — all of which are differentiators in a B2B sales motion.

Architecture

```mermaid
flowchart LR
  Calendly[Calendly Webhook] --> Sched[Scheduler Service]
  Sched --> Bot[Headless Chromium Bot]
  Bot -- WebRTC join --> Zoom[Zoom Meeting SDK]
  Bot -- raw audio --> Gateway[Pion Go gateway 1.23]
  Gateway -- NATS --> ASR[Whisper Diarized]
  ASR -- transcript --> CRM[(115+ table CRM)]
  ASR --> Summary[GPT-5 Summary]
  Summary --> Slack[Slack / Email]
```
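To make the fan-out in the diagram concrete, here is a minimal in-memory sketch of the gateway-to-ASR and ASR-to-consumer hops. It is not the production design: the `Subject` class is a hypothetical stand-in for a NATS subject, the `AudioChunk` and `TranscriptSegment` shapes are assumptions, and the ASR stage is a placeholder that only reports the chunk size.

```typescript
// Hypothetical message shapes for two hops in the diagram.
interface AudioChunk { meetingId: string; seq: number; pcm: Uint8Array; }
interface TranscriptSegment { meetingId: string; speaker: string; text: string; ts: number; }

// In-memory stand-in for a message subject: publish fans out to every subscriber.
class Subject<T> {
  private handlers: Array<(msg: T) => void> = [];
  subscribe(handler: (msg: T) => void): void { this.handlers.push(handler); }
  publish(msg: T): void { for (const h of this.handlers) h(msg); }
}

const audio = new Subject<AudioChunk>();
const transcripts = new Subject<TranscriptSegment>();

// Placeholder ASR stage: a real one would run Whisper plus diarization.
audio.subscribe((chunk) => {
  transcripts.publish({
    meetingId: chunk.meetingId,
    speaker: "speaker_0",
    text: `[${chunk.pcm.length} bytes of audio]`,
    ts: chunk.seq,
  });
});

// Two independent consumers of the same transcript stream (CRM write, summary input).
const crmRows: TranscriptSegment[] = [];
const summaryLines: string[] = [];
transcripts.subscribe((seg) => { crmRows.push(seg); });
transcripts.subscribe((seg) => { summaryLines.push(`${seg.speaker}: ${seg.text}`); });

// Usage: one audio chunk from the gateway reaches both consumers.
audio.publish({ meetingId: "m1", seq: 0, pcm: new Uint8Array(320) });
```

Keeping the CRM writer and the summarizer as separate subscribers means either one can fail or be redeployed without touching the other, which is the main reason to put a message bus between the gateway and the ASR consumers.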

CallSphere implementation

CallSphere uses this same pattern for two business-critical workflows, both deployed on the same container pod:

  • Real Estate (OneRoof) showings — Buyers schedule property tours via Calendly; the bot dials in to the Zoom (or native WebRTC) call, transcribes both sides, and writes structured intent ("3-bed, $400k, 90-day timeline") to one of the 115+ tables. Agents see the digest in their CRM before the buyer hangs up. See /industries/real-estate.
  • /demo recorded sessions — Every prospect demo runs through the same WebRTC pipeline; transcripts feed back into the GTM CRM for follow-up. Try it at /demo.
  • 6-container pod — CRM, MLS, calendar, SMS, audit, and transcript are the same six containers across all 6 verticals; only the agent personality changes.

Pricing remains $149/$499/$1499 with the 14-day /trial; 22% affiliate at /affiliate.

Build steps with code

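```typescript
import express from "express";
import puppeteer from "puppeteer";

// scheduleBot, forwardToGateway, db, crm, and extractIntent are
// application-specific helpers that are not shown here.

// 1. Calendly webhook → schedule the bot
const app = express();
app.use(express.json()); // without a JSON body parser, req.body is undefined

app.post("/webhooks/calendly", async (req, res) => {
  const { event, payload } = req.body;
  if (event === "invitee.created") {
    // Calendly's v2 payload nests location under scheduled_event;
    // confirm the path against the payload your account actually sends.
    const meetingUrl = payload.scheduled_event.location.join_url;
    const startTime = payload.scheduled_event.start_time; // ISO 8601, UTC
    await scheduleBot({ meetingUrl, startTime });
  }
  res.sendStatus(200);
});

// 2. Spin a headless Chromium that joins Zoom 60s before start
async function joinMeeting(url: string) {
  const browser = await puppeteer.launch({
    args: ["--use-fake-ui-for-media-stream", "--use-fake-device-for-media-stream"],
  });
  const page = await browser.newPage();
  // Install the tap before any page script runs, then navigate.
  await page.evaluateOnNewDocument(() => {
    // Remote audio arrives through the "track" event; addTrack only sees local tracks.
    const NativePC = window.RTCPeerConnection;
    window.RTCPeerConnection = class extends NativePC {
      constructor(config?: RTCConfiguration) {
        super(config);
        this.addEventListener("track", (e) => {
          // forwardToGateway must be defined inside the page context as well.
          if (e.track.kind === "audio") (window as any).forwardToGateway(e.track);
        });
      }
    };
  });
  await page.goto(url);
  return browser; // the caller closes it when the meeting ends
}

// 3. Diarized ASR + CRM write
interface TranscriptSegment {
  speaker: string;
  text: string;
  ts: number;
  meetingId: string;
}

async function onTranscript({ speaker, text, ts, meetingId }: TranscriptSegment) {
  await db.transcripts.insert({ speaker, text, ts, meetingId });
  if (/(price|budget|timeline|move-in)/i.test(text)) {
    await crm.tagOpportunity(meetingId, extractIntent(text));
  }
}
```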
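Step 3 calls `extractIntent` without defining it. As an illustration only, here is one hypothetical regex-based version that turns a phrase like "3-bed, $400k, 90-day timeline" into structured fields. The `Intent` shape, the field names, and the 30-days-per-month approximation are all assumptions; a production pipeline might use an LLM extraction step instead.

```typescript
// Hypothetical structured intent; field names are assumptions, not a CallSphere schema.
interface Intent {
  bedrooms?: number;
  budgetUsd?: number;
  timelineDays?: number;
}

// Rough conversion factors; a month is approximated as 30 days.
const DAYS_PER_UNIT: Record<string, number> = { day: 1, week: 7, month: 30 };

function extractIntent(text: string): Intent {
  const intent: Intent = {};

  // "3-bed", "3 bedrooms", "3br"
  const beds = text.match(/(\d+)\s*-?\s*(?:bed(?:room)?s?|br)\b/i);
  if (beds) intent.bedrooms = Number(beds[1]);

  // "$400k", "$400,000", "$1.2m"
  const budget = text.match(/\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)\s*([km])?\b/i);
  if (budget) {
    const base = parseFloat(budget[1].replace(/,/g, ""));
    const suffix = (budget[2] ?? "").toLowerCase();
    const scale = suffix === "k" ? 1e3 : suffix === "m" ? 1e6 : 1;
    intent.budgetUsd = Math.round(base * scale);
  }

  // "90-day", "6 weeks", "2 months"
  const timeline = text.match(/(\d+)\s*-?\s*(day|week|month)s?\b/i);
  if (timeline) {
    intent.timelineDays = Number(timeline[1]) * DAYS_PER_UNIT[timeline[2].toLowerCase()];
  }

  return intent;
}

// Usage with the example phrase from the real-estate workflow.
const example = extractIntent("3-bed, $400k, 90-day timeline");
```

Regexes keep this cheap enough to run on every transcript segment, at the cost of missing paraphrases such as "around four hundred thousand".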

Pitfalls

  • Joining Zoom without a license: Zoom blocks bots in waiting rooms; use the Zoom Meeting SDK with a real account and request the co-host role.
  • Recording without consent: consent rules vary by jurisdiction. Several US states require all-party consent, and GDPR requires notice and a lawful basis in the EU, so treat disclosure as mandatory on every call; play a TTS notice at join and write consent to the audit log.
  • Captioning from Zoom's caption API: it works but lags 4-6 seconds and degrades over 90-minute calls. Tap raw audio instead.
  • Diarization on noisy audio: pyannote and NVIDIA NeMo both need clean input; run noise suppression before ASR.
  • Forgetting time zones in Calendly webhooks: `start_time` is an ISO 8601 timestamp in UTC; parse it as UTC, not as server-local time, before scheduling the bot.
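The time-zone pitfall is easiest to avoid by never constructing a local date at all. The sketch below computes how long to wait before the bot joins, using the 60-second lead from the build steps. The function name `msUntilJoin` and the strict check for an explicit zone designator are assumptions made for illustration.

```typescript
// Join 60 seconds before the scheduled start, matching step 2 of the build.
const JOIN_LEAD_MS = 60 * 1000;

// Returns the delay in milliseconds until the bot should join (never negative).
function msUntilJoin(startTimeIso: string, nowMs: number = Date.now()): number {
  // Reject timestamps with no "Z" or numeric offset: JavaScript would
  // otherwise interpret them in the server's local time zone.
  if (!/(?:Z|[+-]\d{2}:\d{2})$/.test(startTimeIso)) {
    throw new Error(`start_time has no zone designator: ${startTimeIso}`);
  }
  const startMs = Date.parse(startTimeIso);
  if (Number.isNaN(startMs)) {
    throw new Error(`start_time is not a valid timestamp: ${startTimeIso}`);
  }
  return Math.max(0, startMs - JOIN_LEAD_MS - nowMs);
}

// Usage: one hour before a 15:00 UTC meeting, the bot should wait 59 minutes.
const oneHourBefore = Date.UTC(2026, 2, 1, 14, 0, 0);
const delayMs = msUntilJoin("2026-03-01T15:00:00Z", oneHourBefore);
```

Because all arithmetic happens on epoch milliseconds, the result is the same regardless of the server's `TZ` setting. For meetings booked weeks ahead, a persistent job queue is a safer place for this delay than an in-process `setTimeout`, which does not survive a restart.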

FAQ

Will Zoom ban my bot? No, if it joins via the Meeting SDK with a real account. Yes, if it scrapes the web client without disclosure.

Can I do the same on Google Meet? Yes, but Meet has tighter bot detection; prefer Google's official Meet APIs over an unauthenticated web-client join, and confirm which of them expose live media for your account.

Does this work with native WebRTC (no Zoom)? Yes, and it is simpler — your gateway is already in the call path.

How accurate is diarization? With clean audio and 2-3 speakers, Whisper + pyannote 3.1 hits ~95% turn accuracy. Cross-talk drops it to 80%.
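A common way to combine an ASR model with a separate diarizer is to label each transcript segment with the speaker whose turn overlaps it most in time. The sketch below shows that overlap rule only; whether this pipeline merges Whisper and pyannote output exactly this way is not stated in the article, and the `AsrSegment` and `SpeakerTurn` shapes are assumptions.

```typescript
interface Span { start: number; end: number; } // seconds from meeting start
interface AsrSegment extends Span { text: string; }
interface SpeakerTurn extends Span { speaker: string; }
interface LabeledSegment extends AsrSegment { speaker: string; }

// Length of the time interval shared by two spans (0 if they do not intersect).
function overlapSeconds(a: Span, b: Span): number {
  return Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
}

// Label each ASR segment with the speaker whose turn overlaps it the most.
function assignSpeakers(segments: AsrSegment[], turns: SpeakerTurn[]): LabeledSegment[] {
  return segments.map((seg) => {
    let speaker = "unknown";
    let best = 0;
    for (const turn of turns) {
      const shared = overlapSeconds(seg, turn);
      if (shared > best) {
        best = shared;
        speaker = turn.speaker;
      }
    }
    return { ...seg, speaker };
  });
}

// Usage: two ASR segments, two diarizer turns with a slightly offset boundary.
const labeled = assignSpeakers(
  [
    { start: 0, end: 2, text: "Thanks for joining." },
    { start: 2, end: 5, text: "Our budget is around 400k." },
  ],
  [
    { start: 0, end: 2.2, speaker: "SPEAKER_00" },
    { start: 2.2, end: 5, speaker: "SPEAKER_01" },
  ],
);
```

Cross-talk is where this rule breaks down: when two turns overlap the same segment almost equally, the label is effectively a coin flip, which is consistent with the accuracy drop described above.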

Does Otter's Meeting Agent compete with this? Yes — it is the SaaS version. Build your own when seat costs exceed $4k/month.

Trial it at /trial, see /pricing, or take the /demo.
