By Sagar Shankaran, Founder of CallSphere
Wire Pion WebRTC in pure Go to OpenAI Realtime over a single PeerConnection. Real working code for SDP exchange, Opus tracks, data channel events, and barge-in.
Key takeaways
TL;DR: OpenAI's own infra runs on Pion, so you can hit the Realtime WebRTC endpoint directly from Go, and Pion itself is pure Go with no CGo. One PeerConnection, one Opus track, one data channel: that's the entire agent.
The goal is a standalone Go binary that opens a WebRTC PeerConnection to https://api.openai.com/v1/realtime, attaches a microphone Opus track, and prints model events from the data channel. Total latency on a US east-coast box lands around 480ms; Go + Pion is the lowest-overhead client you can ship.
Install pion/webrtc/v4 with go get github.com/pion/webrtc/v4, plus github.com/pion/mediadevices for mic capture. The full call flow:

```mermaid
sequenceDiagram
    participant G as Go binary
    participant K as Your /session endpoint
    participant O as OpenAI Realtime
    G->>K: POST /session (mint ephemeral key)
    K-->>G: { client_secret.value }
    G->>G: pc.CreateOffer() -> SDP
    G->>O: POST /v1/realtime?model=... (SDP, Bearer eph)
    O-->>G: SDP answer
    G->>O: ICE + DTLS handshake
    G->>O: Opus mic track
    O-->>G: Opus TTS track + DC events
```
Never ship a long-lived OpenAI key inside a desktop binary. Stand up a tiny HTTPS endpoint that calls /v1/realtime/sessions with your real key and returns the 60-second client_secret.
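```go
import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
	"strings"
)

// session mirrors the part of the /v1/realtime/sessions response we need.
type session struct {
	ClientSecret struct {
		Value string `json:"value"`
	} `json:"client_secret"`
}

// mintEphemeral runs inside your /session endpoint: the real key never leaves the server.
func mintEphemeral(ctx context.Context) (string, error) {
	body := `{"model":"gpt-4o-realtime-preview-2025-06-03","voice":"alloy"}`
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"https://api.openai.com/v1/realtime/sessions", strings.NewReader(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode/100 != 2 {
		return "", fmt.Errorf("mint session: %s", resp.Status)
	}

	var s session
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		return "", err
	}
	if s.ClientSecret.Value == "" {
		return "", fmt.Errorf("mint session: empty client_secret")
	}
	return s.ClientSecret.Value, nil
}
```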
```go
import (
	"fmt"
	"log"

	"github.com/pion/webrtc/v4"
)

// One PeerConnection per call; Google's public STUN server covers most NATs.
config := webrtc.Configuration{
	ICEServers: []webrtc.ICEServer{{URLs: []string{"stun:stun.l.google.com:19302"}}},
}
pc, err := webrtc.NewPeerConnection(config)
if err != nil {
	log.Fatal(err)
}
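
// Audio transceiver for the model's TTS, created recv-only to match its purpose.
// If you later add the mic with pc.AddTrack, Pion reuses this idle transceiver and
// upgrades it to sendrecv, so the offer carries a single audio m-line.
_, err = pc.AddTransceiverFromKind(webrtc.RTPCodecTypeAudio,
	webrtc.RTPTransceiverInit{Direction: webrtc.RTPTransceiverDirectionRecvonly})
if err != nil {
	log.Fatal(err)
}

// Data channel for JSON events (session.update, transcripts, VAD, errors).
dc, err := pc.CreateDataChannel("oai-events", nil)
if err != nil {
	log.Fatal(err)
}
dc.OnMessage(func(m webrtc.DataChannelMessage) {
	fmt.Println("event:", string(m.Data))
})
```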
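Printing raw events is fine for a first run, but barge-in and transcripts need you to route events by their "type" field. Below is a minimal stdlib sketch of such a router. The event names are the ones used by the preview Realtime models referenced in this post (other model versions may rename them, so check OpenAI's server-event reference), and eventHandlers/dispatch are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// realtimeEvent holds only the fields this sketch reads from Realtime server events.
type realtimeEvent struct {
	Type       string `json:"type"`
	Delta      string `json:"delta"`      // response.audio_transcript.delta
	Transcript string `json:"transcript"` // conversation.item.input_audio_transcription.completed
	Error      *struct {
		Message string `json:"message"`
	} `json:"error"` // error
}

// eventHandlers are hypothetical callbacks your app supplies; nil ones are skipped.
type eventHandlers struct {
	OnBargeIn       func()       // caller started talking: stop local playback
	OnAssistantText func(string) // streaming transcript of the model's speech
	OnCallerText    func(string) // Whisper transcript of the caller's turn
	OnError         func(string)
}

// dispatch decodes one data-channel message and routes it by its "type" field.
// Unknown event types are ignored.
func dispatch(raw []byte, h eventHandlers) error {
	var ev realtimeEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return fmt.Errorf("decode event: %w", err)
	}
	switch ev.Type {
	case "input_audio_buffer.speech_started":
		if h.OnBargeIn != nil {
			h.OnBargeIn()
		}
	case "response.audio_transcript.delta":
		if h.OnAssistantText != nil {
			h.OnAssistantText(ev.Delta)
		}
	case "conversation.item.input_audio_transcription.completed":
		if h.OnCallerText != nil {
			h.OnCallerText(ev.Transcript)
		}
	case "error":
		if h.OnError != nil && ev.Error != nil {
			h.OnError(ev.Error.Message)
		}
	}
	return nil
}
```

In the Pion code you would call dispatch(m.Data, handlers) from dc.OnMessage. With server VAD the server handles turn-taking itself; OnBargeIn is where the client drops any TTS audio it has already buffered locally.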
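Use mediadevices to wrap a host mic into an Opus-encoded track; Pion negotiates the codec automatically. This is the one step that does need CGo: the mediadevices Opus encoder binds libopus and its microphone driver binds miniaudio (via malgo), so the build needs a C toolchain and libopus installed.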
```go
import (
	"log"

	"github.com/pion/mediadevices"
	"github.com/pion/mediadevices/pkg/codec/opus"
	_ "github.com/pion/mediadevices/pkg/driver/microphone" // registers the host mic driver
	"github.com/pion/webrtc/v4"
)

// Encode the default microphone as Opus.
opusParams, err := opus.NewParams()
if err != nil {
	log.Fatal(err)
}
codecSelector := mediadevices.NewCodecSelector(
	mediadevices.WithAudioEncoders(&opusParams))

ms, err := mediadevices.GetUserMedia(mediadevices.MediaStreamConstraints{
	Audio: func(c *mediadevices.MediaTrackConstraints) {}, // default device and settings
	Codec: codecSelector,
})
if err != nil {
	log.Fatal(err)
}
for _, t := range ms.GetAudioTracks() {
	// AddTrack reuses an idle audio transceiver if one exists (such as a recv-only
	// one), otherwise it creates a new sendrecv transceiver.
	if _, err := pc.AddTrack(t.(webrtc.TrackLocal)); err != nil {
		log.Fatal(err)
	}
}
```
This is the only OpenAI-specific bit: POST your SDP offer as application/sdp and you get the answer back as plain text.
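```go
offer, err := pc.CreateOffer(nil)
if err != nil {
	log.Fatal(err)
}
// Register the promise before SetLocalDescription so the completion event can't be
// missed; the single POST below must carry every gathered ICE candidate.
gatherComplete := webrtc.GatheringCompletePromise(pc)
if err := pc.SetLocalDescription(offer); err != nil {
	log.Fatal(err)
}
<-gatherComplete

// Minted directly for a local test; in production fetch it from your /session endpoint.
eph, err := mintEphemeral(ctx)
if err != nil {
	log.Fatal(err)
}
url := "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url,
	strings.NewReader(pc.LocalDescription().SDP))
if err != nil {
	log.Fatal(err)
}
req.Header.Set("Authorization", "Bearer "+eph)
req.Header.Set("Content-Type", "application/sdp")
resp, err := http.DefaultClient.Do(req)
if err != nil {
	log.Fatal(err)
}
defer resp.Body.Close()
ans, err := io.ReadAll(resp.Body)
if err != nil || resp.StatusCode/100 != 2 {
	log.Fatalf("SDP exchange failed (%s): %v %s", resp.Status, err, ans)
}
if err := pc.SetRemoteDescription(webrtc.SessionDescription{
	Type: webrtc.SDPTypeAnswer,
	SDP:  string(ans),
}); err != nil {
	log.Fatal(err)
}
```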
Once the channel is open, push your system prompt and turn-detection config:
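```go
dc.OnOpen(func() {
	// session.update: system prompt, voice, server-side VAD for turn-taking,
	// and Whisper transcription of the caller's audio.
	payload, err := json.Marshal(map[string]any{
		"type": "session.update",
		"session": map[string]any{
			"instructions":              "You are CallSphere, a friendly receptionist.",
			"voice":                     "alloy",
			"turn_detection":            map[string]any{"type": "server_vad", "threshold": 0.5},
			"input_audio_transcription": map[string]any{"model": "whisper-1"},
		},
	})
	if err != nil {
		log.Println("marshal session.update:", err)
		return
	}
	if err := dc.SendText(string(payload)); err != nil {
		log.Println("send session.update:", err)
	}
})
```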
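The map[string]any payload works, but a misspelled key is silently sent to the server. One alternative, sketched here under the assumption that you only need the fields used in the previous snippet, is a typed mirror of the same JSON; the struct names are illustrative, not an official SDK.

```go
package main

import "encoding/json"

// Typed mirror of the session.update payload. JSON keys match the map version.
type turnDetection struct {
	Type      string  `json:"type"`
	Threshold float64 `json:"threshold,omitempty"`
}

type transcription struct {
	Model string `json:"model"`
}

type sessionConfig struct {
	Instructions            string         `json:"instructions,omitempty"`
	Voice                   string         `json:"voice,omitempty"`
	TurnDetection           *turnDetection `json:"turn_detection,omitempty"`
	InputAudioTranscription *transcription `json:"input_audio_transcription,omitempty"`
}

type sessionUpdate struct {
	Type    string        `json:"type"`
	Session sessionConfig `json:"session"`
}

// newSessionUpdate returns the JSON text to pass to dc.SendText.
func newSessionUpdate(cfg sessionConfig) (string, error) {
	b, err := json.Marshal(sessionUpdate{Type: "session.update", Session: cfg})
	return string(b), err
}
```

The omitempty tags let you send partial updates later in the call. One caveat of that choice: a Threshold of exactly 0 is dropped from the JSON, so make it a pointer if you ever need to send zero explicitly.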
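Subscribe to the inbound track and pipe it to your speaker. Each incoming RTP packet carries one Opus packet, which still has to be decoded to PCM and written to an audio device; sink below is a placeholder for that decoder and player: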
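```go
pc.OnTrack(func(t *webrtc.TrackRemote, _ *webrtc.RTPReceiver) {
	log.Printf("got remote %s track (%s)", t.Kind(), t.Codec().MimeType)
	for {
		// ReadRTP parses the RTP header; pkt.Payload is the Opus packet itself.
		pkt, _, err := t.ReadRTP()
		if err != nil {
			return // track ended or PeerConnection closed
		}
		// sink stands in for your Opus decoder + speaker output.
		sink.Write(pkt.Payload)
	}
})
```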
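The sink in the OnTrack loop is left abstract. One possible shape, sketched here as an assumption rather than anything Pion provides, is a bounded queue between the RTP read loop and your decoder: Write never blocks the read loop, and Flush discards queued audio so a barge-in cuts stale TTS immediately. frameQueue is a hypothetical type; a real player would add a jitter buffer and the Opus decoder on the Pop side.

```go
package main

import "sync"

// frameQueue is a bounded FIFO of Opus packets. When full, the oldest packet is
// dropped so the RTP read loop never stalls.
type frameQueue struct {
	mu     sync.Mutex
	frames [][]byte
	limit  int
}

func newFrameQueue(capacity int) *frameQueue {
	if capacity < 1 {
		capacity = 1
	}
	return &frameQueue{limit: capacity}
}

// Write copies p (the caller may reuse its buffer) and enqueues it.
func (q *frameQueue) Write(p []byte) (int, error) {
	frame := append([]byte(nil), p...)
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.frames) == q.limit {
		q.frames = q.frames[1:] // drop the oldest packet
	}
	q.frames = append(q.frames, frame)
	return len(p), nil
}

// Pop returns the next packet for the decoder, or false if the queue is empty.
func (q *frameQueue) Pop() ([]byte, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.frames) == 0 {
		return nil, false
	}
	f := q.frames[0]
	q.frames = q.frames[1:]
	return f, true
}

// Flush drops everything queued; call it when the caller barges in.
func (q *frameQueue) Flush() {
	q.mu.Lock()
	q.frames = nil
	q.mu.Unlock()
}
```

Dropping the oldest packet rather than blocking trades a small audio glitch for never backing up the network read, which is usually the right call for live speech.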
Skipping an OnICEConnectionStateChange handler means you'll fly blind on transient drops.

CallSphere's real-estate agent OneRoof runs a Pion-based Go gateway at the edge. Each call gets its own PeerConnection, NATS hands the audio frames off to a transcription worker, and Postgres stores the run. Across 6 verticals and 37 agents we see 480–620ms p50 voice latency. Try it on the 14-day trial or book a live demo.
Why Pion over Janus or LiveKit for this? Single binary, no media server, no Docker — perfect for a per-call sidecar.
Does it work behind NAT? Yes, with Google STUN. Add a TURN server for symmetric NAT users.
Can I run this on Fly.io? Yes, but pin to one region per call — WebRTC sessions are stateful.
What about Whisper transcription? Add input_audio_transcription in session.update; deltas arrive on the data channel.
How do I scale? One pod per N concurrent calls; OpenAI's quota dominates, not Pion.
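For the "one pod per N concurrent calls" model, each pod needs a way to refuse call N+1 so the load balancer can route it elsewhere. A minimal stdlib sketch using a buffered channel as a counting semaphore; callLimiter is a hypothetical name and N is whatever your own load tests support.

```go
package main

// callLimiter caps concurrent calls in one process; each call holds one slot.
type callLimiter struct{ slots chan struct{} }

func newCallLimiter(n int) *callLimiter {
	return &callLimiter{slots: make(chan struct{}, n)}
}

// TryAcquire reserves a slot without blocking; it reports false when the pod is full.
func (l *callLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot. Call it exactly once per successful TryAcquire, when the
// call's PeerConnection closes.
func (l *callLimiter) Release() { <-l.slots }
```

In a call-start handler you might reply 503 when TryAcquire returns false and tie Release to the ICE watchdog or PeerConnection close, so a slot is never leaked by a call that died silently.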
© 2026 CallSphere LLC. All rights reserved.