---
title: "Build a Voice Agent in Go with Pion WebRTC and OpenAI Realtime"
description: "Wire Pion WebRTC in pure Go to OpenAI Realtime over a single PeerConnection. Real working code for SDP exchange, Opus tracks, data channel events, and barge-in."
canonical: https://callsphere.ai/blog/vw2h-build-voice-agent-go-pion-webrtc-openai-realtime
category: "AI Voice Agents"
tags: ["Tutorial", "Build", "Go", "Pion", "WebRTC", "OpenAI Realtime"]
author: "CallSphere Team"
published: 2026-03-15T00:00:00.000Z
updated: 2026-05-07T09:27:39.196Z
---

# Build a Voice Agent in Go with Pion WebRTC and OpenAI Realtime

> Wire Pion WebRTC in pure Go to OpenAI Realtime over a single PeerConnection. Real working code for SDP exchange, Opus tracks, data channel events, and barge-in.

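> **TL;DR**: OpenAI's own infra runs on Pion, so you can hit the WebRTC endpoint directly from Go. Pion itself needs no CGo; only the `mediadevices` mic capture and Opus encoder used in Step 3 link against C libraries. One `PeerConnection`, one Opus track, one data channel: that's the entire agent.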

## What you'll build

A standalone Go binary that opens a WebRTC `PeerConnection` to `https://api.openai.com/v1/realtime`, attaches an Opus-encoded microphone track, and prints model events from the data channel. Total latency on a US east-coast box lands around 480ms, and the Go + Pion client adds very little overhead of its own.

## Prerequisites

1. Go 1.23+.
2. OpenAI API key with Realtime access.
3. An ephemeral key endpoint (do not ship the raw key in your binary).
4. `go get github.com/pion/webrtc/v4 github.com/pion/mediadevices`. The `mediadevices` module provides mic capture; its microphone driver and Opus encoder use CGo, so you also need a C toolchain and the libopus development headers.
5. Familiarity with SDP offer/answer and ICE.

## Architecture

```mermaid
sequenceDiagram
  participant G as Go binary
  participant K as Your /session endpoint
  participant O as OpenAI Realtime
  G->>K: POST /session (mint ephemeral key)
  K-->>G: { client_secret.value }
  G->>G: pc.CreateOffer() -> SDP
  G->>O: POST /v1/realtime?model=... (SDP, Bearer eph)
  O-->>G: SDP answer
  G->>O: ICE + DTLS handshake
  G->>O: Opus mic track
  O-->>G: Opus TTS track + DC events
```

## Step 1 — Mint an ephemeral key

Never ship a long-lived OpenAI key inside a desktop binary. Stand up a tiny HTTPS endpoint that calls `/v1/realtime/sessions` with your real key and returns the 60-second `client_secret`.

```go
import (
    "context"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "strings"
)

type session struct {
    ClientSecret struct{ Value string `json:"value"` } `json:"client_secret"`
}

// mintEphemeral runs server-side: it trades the long-lived key for a short-lived client_secret.
func mintEphemeral(ctx context.Context) (string, error) {
    body := `{"model":"gpt-4o-realtime-preview-2025-06-03","voice":"alloy"}`
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://api.openai.com/v1/realtime/sessions", strings.NewReader(body))
    if err != nil { return "", err }
    req.Header.Set("Authorization", "Bearer "+os.Getenv("OPENAI_API_KEY"))
    req.Header.Set("Content-Type", "application/json")
    resp, err := http.DefaultClient.Do(req)
    if err != nil { return "", err }
    defer resp.Body.Close()
    // A non-200 body would not decode into a usable secret, so fail early.
    if resp.StatusCode != http.StatusOK {
        return "", fmt.Errorf("mint ephemeral key: %s", resp.Status)
    }
    var s session
    if err := json.NewDecoder(resp.Body).Decode(&s); err != nil { return "", err }
    return s.ClientSecret.Value, nil
}
```

## Step 2 — Build the PeerConnection

```go
import "github.com/pion/webrtc/v4"

config := webrtc.Configuration{
    ICEServers: []webrtc.ICEServer{{URLs: []string{"stun:stun.l.google.com:19302"}}},
}
pc, err := webrtc.NewPeerConnection(config)
if err != nil { log.Fatal(err) }

// Send/recv audio transceiver: OpenAI sends back TTS on this.
if _, err := pc.AddTransceiverFromKind(webrtc.RTPCodecTypeAudio,
    webrtc.RTPTransceiverInit{Direction: webrtc.RTPTransceiverDirectionSendrecv}); err != nil {
    log.Fatal(err)
}

dc, err := pc.CreateDataChannel("oai-events", nil)
if err != nil { log.Fatal(err) }
dc.OnMessage(func(m webrtc.DataChannelMessage) {
    fmt.Println("event:", string(m.Data))
})
```

## Step 3 — Capture mic and add the track

Use `mediadevices` to wrap a host mic into an Opus-encoded track. Pion will negotiate the codec automatically:

```go
import (
    "github.com/pion/mediadevices"
    "github.com/pion/mediadevices/pkg/codec/opus"
    _ "github.com/pion/mediadevices/pkg/driver/microphone"
)

opusParams, err := opus.NewParams()
if err != nil { log.Fatal(err) }
codecSelector := mediadevices.NewCodecSelector(
    mediadevices.WithAudioEncoders(&opusParams))
ms, err := mediadevices.GetUserMedia(mediadevices.MediaStreamConstraints{
    Audio: func(c *mediadevices.MediaTrackConstraints) {},
    Codec: codecSelector,
})
if err != nil { log.Fatal(err) }
for _, t := range ms.GetAudioTracks() {
    // Send-only: this transceiver carries the mic audio to OpenAI.
    if _, err := pc.AddTransceiverFromTrack(t.(webrtc.TrackLocal),
        webrtc.RTPTransceiverInit{Direction: webrtc.RTPTransceiverDirectionSendonly}); err != nil {
        log.Fatal(err)
    }
}
```

## Step 4 — Trade SDP with OpenAI

This is the only OpenAI-specific bit: POST your SDP offer as `application/sdp` and you get the answer back as plain text.

```go
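offer, err := pc.CreateOffer(nil)
if err != nil { log.Fatal(err) }
// Create the promise before SetLocalDescription, then wait so the SDP carries ICE candidates.
gatherDone := webrtc.GatheringCompletePromise(pc)
if err := pc.SetLocalDescription(offer); err != nil { log.Fatal(err) }
<-gatherDone

eph, err := mintEphemeral(ctx)
if err != nil { log.Fatal(err) }
url := "https://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"
req, err := http.NewRequestWithContext(ctx, http.MethodPost, url,
    strings.NewReader(pc.LocalDescription().SDP))
if err != nil { log.Fatal(err) }
req.Header.Set("Authorization", "Bearer "+eph)
req.Header.Set("Content-Type", "application/sdp")
resp, err := http.DefaultClient.Do(req)
if err != nil { log.Fatal(err) }
defer resp.Body.Close()
ans, err := io.ReadAll(resp.Body)
if err != nil { log.Fatal(err) }
// On failure the body is an error message, not SDP, so surface it.
if resp.StatusCode >= 300 { log.Fatalf("SDP exchange failed: %s: %s", resp.Status, ans) }
if err := pc.SetRemoteDescription(webrtc.SessionDescription{
    Type: webrtc.SDPTypeAnswer, SDP: string(ans),
}); err != nil { log.Fatal(err) }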
```

## Step 5 — Send a session.update over the data channel

Once the channel is open, push your system prompt and turn-detection config:

```go
dc.OnOpen(func() {
    payload, err := json.Marshal(map[string]any{
        "type": "session.update",
        "session": map[string]any{
            "instructions":              "You are CallSphere, a friendly receptionist.",
            "voice":                     "alloy",
            "turn_detection":            map[string]any{"type": "server_vad", "threshold": 0.5},
            "input_audio_transcription": map[string]any{"model": "whisper-1"},
        },
    })
    if err != nil { log.Printf("marshal session.update: %v", err); return }
    if err := dc.SendText(string(payload)); err != nil { log.Printf("send session.update: %v", err) }
})
```
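
Messages on the data channel are JSON objects with a `type` field, just like the `session.update` payload above. The following is a minimal sketch of decoding them instead of printing raw bytes. `serverEvent` and `describeEvent` are hypothetical names, and only the `type` and `error.message` fields are assumed here; real events carry more fields than this sketch reads.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverEvent holds only the fields this sketch inspects; everything else
// in the message is ignored by the decoder.
type serverEvent struct {
	Type  string `json:"type"`
	Error *struct {
		Message string `json:"message"`
	} `json:"error,omitempty"`
}

// describeEvent decodes one data channel message into a one-line summary.
func describeEvent(data []byte) (string, error) {
	var ev serverEvent
	if err := json.Unmarshal(data, &ev); err != nil {
		return "", fmt.Errorf("decode event: %w", err)
	}
	switch {
	case ev.Type == "":
		return "", fmt.Errorf("event has no type field")
	case ev.Type == "error" && ev.Error != nil:
		return "server error: " + ev.Error.Message, nil
	default:
		return "event: " + ev.Type, nil
	}
}

func main() {
	msgs := []string{
		`{"type":"session.updated","session":{}}`,
		`{"type":"error","error":{"message":"bad request"}}`,
	}
	for _, m := range msgs {
		summary, err := describeEvent([]byte(m))
		fmt.Println(summary, err)
	}
}
```

Inside the `OnMessage` callback from Step 2 this would be called as `describeEvent(m.Data)`. Surfacing `error` events matters most right after `session.update`, since a rejected field is reported there rather than as a transport failure.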

## Step 6 — Play remote audio

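Subscribe to the inbound track and pipe it to your speaker. Pion hands you RTP packets carrying Opus, so `sink` below stands for your own playback path, which has to decode Opus before writing to the audio device. Register this handler before the SDP exchange in Step 4 so the track is not missed: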

```go
pc.OnTrack(func(t *webrtc.TrackRemote, _ *webrtc.RTPReceiver) {
    log.Printf("got remote %s track", t.Kind())
    for {
        // ReadRTP parses the packet so the header can be dropped.
        pkt, _, err := t.ReadRTP()
        if err != nil { return }
        // pkt.Payload holds the Opus data; forward it to your audio sink.
        sink.Write(pkt.Payload)
    }
    }
})
```
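
Barge-in on the client side mostly means discarding audio you have buffered but not yet played once the caller starts talking. The sketch below assumes that, with `server_vad` enabled, the server announces detected speech with an `input_audio_buffer.speech_started` event on the data channel. `playbackQueue` and `handleBargeIn` are hypothetical names, and the queue stands in for whatever buffer sits in front of your audio device.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// playbackQueue buffers audio frames that are waiting to be played.
type playbackQueue struct {
	mu     sync.Mutex
	frames [][]byte
}

// Push appends a copy of frame, since callers often reuse their read buffer.
func (q *playbackQueue) Push(frame []byte) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.frames = append(q.frames, append([]byte(nil), frame...))
}

// Pop returns the oldest frame, or false when the queue is empty.
func (q *playbackQueue) Pop() ([]byte, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.frames) == 0 {
		return nil, false
	}
	f := q.frames[0]
	q.frames = q.frames[1:]
	return f, true
}

// Flush drops all pending frames and reports how many were discarded.
func (q *playbackQueue) Flush() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	n := len(q.frames)
	q.frames = nil
	return n
}

// handleBargeIn flushes the queue when the message says the caller started
// speaking. It returns true only when a flush was triggered.
func handleBargeIn(q *playbackQueue, data []byte) bool {
	var ev struct {
		Type string `json:"type"`
	}
	if err := json.Unmarshal(data, &ev); err != nil {
		return false
	}
	if ev.Type != "input_audio_buffer.speech_started" {
		return false
	}
	q.Flush()
	return true
}

func main() {
	q := &playbackQueue{}
	q.Push([]byte{1})
	q.Push([]byte{2})

	fmt.Println(handleBargeIn(q, []byte(`{"type":"response.created"}`)))
	fmt.Println(handleBargeIn(q, []byte(`{"type":"input_audio_buffer.speech_started"}`)))
	_, ok := q.Pop()
	fmt.Println(ok) // false: the queued frames were dropped
}
```

The `OnTrack` loop would `Push` decoded audio, a playback goroutine would `Pop`, and the `OnMessage` callback would pass each message to `handleBargeIn`. Keeping the queue short also limits how much stale speech can leak out before the flush lands.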

## Common pitfalls

- **Forgetting GatheringCompletePromise.** OpenAI rejects half-baked SDP. Wait for ICE gathering before POSTing.
- **Long-lived API key in the binary.** Always mint ephemeral keys server-side.
- **Wrong codec.** Force Opus on both sides; Pion will fall back to PCMU otherwise.
- **No `OnICEConnectionStateChange` handler.** You'll fly blind on transient drops.

## How CallSphere does this in production

CallSphere's real-estate agent **OneRoof** runs a Pion-based Go gateway at the edge. Each call gets its own `PeerConnection`, NATS hands the audio frames off to a transcription worker, and Postgres stores the run. Across 6 verticals and 37 agents we see **480–620ms p50 voice latency**. Try it on the [14-day trial](/trial) or [book a live demo](/demo).

## FAQ

**Why Pion over Janus or LiveKit for this?** Single binary, no media server, no Docker — perfect for a per-call sidecar.

**Does it work behind NAT?** Yes, with Google STUN. Add a TURN server for symmetric NAT users.

**Can I run this on Fly.io?** Yes, but pin to one region per call — WebRTC sessions are stateful.

**What about Whisper transcription?** Add `input_audio_transcription` in session.update; deltas arrive on the data channel.

**How do I scale?** One pod per N concurrent calls; OpenAI's quota dominates, not Pion.

## Sources

- [OpenAI: How we deliver low-latency voice AI at scale](https://openai.com/index/delivering-low-latency-voice-ai-at-scale/)
- [Pion WebRTC v4 docs](https://pkg.go.dev/github.com/pion/webrtc/v4)
- [OpenAI Realtime WebRTC guide](https://developers.openai.com/api/docs/guides/realtime-webrtc)
- [webrtcHacks — Unofficial Realtime WebRTC guide](https://webrtchacks.com/the-unofficial-guide-to-openai-realtime-webrtc-api/)

