Push-to-Talk Voice
Walkie-talkie style voice communications with real-time streaming.
How It Works
GroundWave voice uses a push-to-talk (PTT) model familiar from walkie-talkies. Pressing and holding the PTT button captures audio from the device microphone and streams it to the server. The server relays the audio to all other clients in the same voice channel. Releasing the button ends the transmission.
Key design decisions:
- Server-relayed, not peer-to-peer. All audio flows through the GroundWave server over the existing Socket.IO connection. There is no WebRTC signaling, no STUN/TURN infrastructure, and no direct device-to-device connectivity requirement. This makes the system work identically on any network topology the server runs on.
- Channel-aligned. Voice channels correspond directly to chat channels. Joining a voice channel with voice:join subscribes the client to audio relayed on that channel. Users can be in one voice channel at a time.
- Single transmitter per channel. Only one client may transmit at a time per channel. If a second client attempts to transmit while another is active, the server rejects the transmission with a voice:busy event and the client UI indicates the channel is occupied.
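The single-transmitter rule can be pictured as a small piece of server-side state that tracks who holds the floor on each channel. The sketch below is illustrative only: ChannelFloor, its method names, and the use of socket ids as holder identity are assumptions, not GroundWave's actual implementation.

```typescript
// Hypothetical in-memory floor control: at most one transmitter per channel.
type FloorResult = "granted" | "busy";

class ChannelFloor {
  // channel id -> socket id of the client currently holding the floor
  private holders = new Map<string, string>();

  // Called when a client starts transmitting. A repeat request by the
  // current holder is granted again.
  request(channelId: string, socketId: string): FloorResult {
    const holder = this.holders.get(channelId);
    if (holder !== undefined && holder !== socketId) {
      return "busy"; // the server would answer with voice:busy here
    }
    this.holders.set(channelId, socketId);
    return "granted";
  }

  // Called on PTT release or disconnect. Only the holder can free the floor.
  release(channelId: string, socketId: string): boolean {
    if (this.holders.get(channelId) !== socketId) return false;
    this.holders.delete(channelId);
    return true;
  }

  // Used to verify that an incoming audio chunk comes from the active transmitter.
  isHolder(channelId: string, socketId: string): boolean {
    return this.holders.get(channelId) === socketId;
  }
}
```

Releasing the floor on disconnect as well as on PTT release matters in a design like this, since a client that drops mid-transmission would otherwise leave the channel permanently busy.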
Voice recording is optional and off by default. A toggle switch in the
VoicePanel enables recording. When enabled, transmissions are saved as regular files
(WebM/Opus) via the standard file upload API, appearing in the Files panel with a
timestamp-based filename like Voice_Recording_2026-02-27_14-30-00.webm.
These files can be renamed, played back inline, or downloaded like any other file.
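A timestamp-based filename of the form shown above could be generated as follows. This is a minimal sketch: recordingFilename is a hypothetical helper, and the use of local time rather than UTC is an assumption, since the documentation does not say which is used.

```typescript
// Builds a name like Voice_Recording_2026-02-27_14-30-00.webm.
// Assumption: local time is used; the source does not specify local vs. UTC.
function recordingFilename(when: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  const datePart =
    when.getFullYear() + "-" + pad(when.getMonth() + 1) + "-" + pad(when.getDate());
  const timePart =
    pad(when.getHours()) + "-" + pad(when.getMinutes()) + "-" + pad(when.getSeconds());
  return "Voice_Recording_" + datePart + "_" + timePart + ".webm";
}
```

Hyphens instead of colons in the time portion keep the name valid on filesystems that reject colons.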
Audio Pipeline
Capture
When PTT is engaged, the client requests microphone access via the Web Audio API and routes the input stream through an AudioWorklet processor. The worklet runs in a dedicated audio rendering thread separate from the main JavaScript thread, which avoids audio glitches caused by UI work.
The worklet captures raw PCM samples at 16 kHz mono, accumulating them into 20 ms frames (320 samples each) before posting each frame to the main thread. Twenty milliseconds is the frame duration Opus uses by default and balances latency against per-packet overhead.
When recording is enabled, a MediaRecorder instance records the same microphone stream in parallel in WebM/Opus format. This produces the full recording of the transmission, which is uploaded to the server when the PTT button is released.
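The frame accumulation step in the capture path can be sketched as a small buffer that collects incoming sample blocks until 20 ms of audio is available. An AudioWorklet processor receives audio in fixed-size blocks of floating-point samples (128 per block in current browsers), so blocks do not line up with frame boundaries. FrameAccumulator and floatToInt16 are hypothetical names, and the sketch assumes the audio context already runs at 16 kHz.

```typescript
const SAMPLE_RATE_HZ = 16000;
const FRAME_MS = 20;
const FRAME_SAMPLES = (SAMPLE_RATE_HZ * FRAME_MS) / 1000; // 320 samples

// Convert one float sample in [-1, 1] to a 16-bit signed integer, with clipping.
function floatToInt16(sample: number): number {
  const clamped = Math.max(-1, Math.min(1, sample));
  return clamped < 0 ? Math.round(clamped * 32768) : Math.round(clamped * 32767);
}

class FrameAccumulator {
  private pending = new Int16Array(FRAME_SAMPLES);
  private filled = 0;

  // Feed one input block. Returns every complete 20 ms frame that this
  // block finished; leftover samples stay buffered for the next call.
  push(block: Float32Array): Int16Array[] {
    const frames: Int16Array[] = [];
    for (let i = 0; i < block.length; i++) {
      this.pending[this.filled] = floatToInt16(block[i]);
      this.filled++;
      if (this.filled === FRAME_SAMPLES) {
        frames.push(this.pending);
        this.pending = new Int16Array(FRAME_SAMPLES);
        this.filled = 0;
      }
    }
    return frames;
  }
}
```

With 128-sample blocks, a frame completes on every second or third call, and each completed frame is 640 bytes of 16-bit PCM.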
Transport
Each 20 ms PCM frame is emitted over the Socket.IO connection as a binary event (voice:audio-chunk). The payload is an ArrayBuffer containing the raw 16-bit signed integer PCM samples with no additional framing overhead.
The server receives each chunk, verifies the sender is the active transmitter on the channel, and relays the buffer to all other subscribers using Socket.IO's room broadcast mechanism. End-to-end latency from microphone capture to speaker playback is approximately 60 ms on a local Wi-Fi network, which is comparable to analog walkie-talkie performance.
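The transport format implies a fixed payload size and bit rate, which the following arithmetic makes explicit. The function names are illustrative, and the figures cover PCM payload only; Socket.IO and WebSocket framing overhead is not included.

```typescript
const PCM_SAMPLE_RATE_HZ = 16000;
const PCM_BYTES_PER_SAMPLE = 2; // 16-bit signed
const PCM_FRAME_MS = 20;

// Bytes of raw PCM in one voice:audio-chunk payload.
function frameBytes(): number {
  return ((PCM_SAMPLE_RATE_HZ * PCM_FRAME_MS) / 1000) * PCM_BYTES_PER_SAMPLE;
}

// Payload bit rate of a single transmission, in kbit/s.
function streamKbps(): number {
  const framesPerSecond = 1000 / PCM_FRAME_MS;
  return (frameBytes() * framesPerSecond * 8) / 1000;
}

// Server egress while one user transmits to everyone else in the channel.
function relayEgressKbps(participants: number): number {
  return Math.max(0, participants - 1) * streamKbps();
}
```

Each chunk carries 640 bytes, one stream is 256 kbit/s of payload, and a 10-participant channel needs about 2.3 Mbit/s of server egress while someone is transmitting.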
Playback
Receiving clients decode incoming voice:audio-chunk events using the Web
Audio API. Each chunk is written into a jitter buffer — a small FIFO
queue of PCM frames that decouples the arrival rate from the playback rate and absorbs
brief network irregularities without audible gaps.
The AudioContext scheduler reads frames from the jitter buffer at precisely scheduled
intervals using AudioBufferSourceNode. This produces smooth, glitch-free
playback even when individual frames arrive slightly late or in bursts.
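A jitter buffer of the kind described above can be sketched as a FIFO with a prebuffer threshold. JitterBuffer, int16ToFloat32, and the default depths are assumptions chosen for illustration; scheduling each popped frame on an AudioBufferSourceNode is left out.

```typescript
// Convert a received chunk of 16-bit PCM to the float samples Web Audio uses.
function int16ToFloat32(chunk: ArrayBufferLike): Float32Array {
  const pcm = new Int16Array(chunk);
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 32768;
  return out;
}

class JitterBuffer {
  private queue: Float32Array[] = [];
  private primed = false;
  private prebufferFrames: number;
  private maxFrames: number;

  // prebufferFrames: frames to collect before playback starts (hypothetical default)
  // maxFrames: cap on queued frames so latency cannot grow without bound
  constructor(prebufferFrames = 3, maxFrames = 10) {
    this.prebufferFrames = prebufferFrames;
    this.maxFrames = maxFrames;
  }

  push(frame: Float32Array): void {
    this.queue.push(frame);
    if (this.queue.length > this.maxFrames) this.queue.shift(); // drop oldest
    if (this.queue.length >= this.prebufferFrames) this.primed = true;
  }

  // Called once per playback slot. Returns null while prebuffering or on
  // underrun; the caller plays silence for that slot.
  pop(): Float32Array | null {
    if (!this.primed) return null;
    const frame = this.queue.shift();
    if (frame === undefined) {
      this.primed = false; // underrun: refill before resuming
      return null;
    }
    return frame;
  }
}
```

The prebuffer depth is the trade-off knob in this design: each buffered 20 ms frame adds 20 ms of latency but lets playback ride out that much arrival delay.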
Two playback controls are available and their values are persisted to
localStorage so preferences survive page reloads:
- Volume slider: adjusts the GainNode applied to incoming audio, from 0 to 100.
- Mute toggle: disconnects the output gain node entirely, silencing all incoming voice without stopping the jitter buffer.
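Persisting the two playback controls could look like the sketch below. The storage key, the default values, the JSON layout, and the linear slider-to-gain mapping are all assumptions. The store is passed in as a parameter so that window.localStorage can be supplied in the browser.

```typescript
// Minimal subset of the Web Storage API, so window.localStorage can be passed in.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

interface PlaybackPrefs {
  volume: number; // slider position, 0 to 100
  muted: boolean;
}

const PREFS_KEY = "voice.playback"; // hypothetical storage key
const DEFAULT_PREFS: PlaybackPrefs = { volume: 100, muted: false }; // assumed defaults

function savePrefs(store: KeyValueStore, prefs: PlaybackPrefs): void {
  store.setItem(PREFS_KEY, JSON.stringify(prefs));
}

function loadPrefs(store: KeyValueStore): PlaybackPrefs {
  const raw = store.getItem(PREFS_KEY);
  if (raw === null) return { ...DEFAULT_PREFS };
  try {
    const parsed = JSON.parse(raw);
    const volume = Number(parsed.volume);
    return {
      volume: isFinite(volume) ? Math.min(100, Math.max(0, volume)) : DEFAULT_PREFS.volume,
      muted: parsed.muted === true,
    };
  } catch {
    return { ...DEFAULT_PREFS }; // corrupted entry: fall back to defaults
  }
}

// Linear mapping from slider position to a GainNode gain value (assumption).
function sliderToGain(volume: number): number {
  return Math.min(100, Math.max(0, volume)) / 100;
}
```

Validating on load rather than trusting the stored value keeps a hand-edited or stale entry from producing an out-of-range gain.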
User Interface
The VoicePanel is a slide-out panel accessible from the main toolbar. It displays the active channel, connected voice participants, and the PTT control.
PTT Button
The PTT button is a large, touch-friendly control designed for use with gloves or in low-visibility conditions. It has three visual states:
- Idle — standard appearance, ready to transmit
- Transmitting — highlighted with a pulsing ring, indicating active audio capture
- Busy — dimmed with a "Channel busy" label when another user is transmitting
Keyboard Shortcut
The Spacebar key activates PTT when the VoicePanel is open and no text input is focused. Holding the spacebar begins transmission; releasing it ends it. This mirrors the ergonomics of traditional radio software and is suitable for laptop or desktop operation.
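The conditions for the spacebar shortcut can be expressed as one pure check, sketched here under stated assumptions. KeyLike and FocusLike are hypothetical stand-ins for the relevant fields of a DOM KeyboardEvent and of document.activeElement. Ignoring auto-repeat events is an added assumption that keeps a held key from restarting the transmission.

```typescript
// The fields of a DOM KeyboardEvent this check needs.
interface KeyLike {
  code: string;    // "Space" for the spacebar
  repeat: boolean; // true for auto-repeat keydown events while the key is held
}

// Describes the currently focused element, if any.
interface FocusLike {
  tagName: string;
  isContentEditable: boolean;
}

function isTextInput(focused: FocusLike | null): boolean {
  if (focused === null) return false;
  const tag = focused.tagName.toUpperCase();
  return tag === "INPUT" || tag === "TEXTAREA" || focused.isContentEditable;
}

// True when a keydown event should start a transmission.
function shouldStartPtt(ev: KeyLike, panelOpen: boolean, focused: FocusLike | null): boolean {
  return ev.code === "Space" && !ev.repeat && panelOpen && !isTextInput(focused);
}
```

A matching keyup handler would end the transmission whenever the released key is the spacebar and a transmission is active.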
Audio Level Meter
During transmission, an audio level meter visualizes the microphone input amplitude
in real time using a Web Audio AnalyserNode. The meter bar grows and
glows dynamically in proportion to the detected level — providing immediate visual
feedback that the microphone is active and capturing audio. A flat meter while
holding PTT signals a potential microphone permission problem.
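One way to turn analyser output into a meter value is to compute the RMS amplitude of the time-domain samples and map it onto a decibel scale. This is a sketch of that approach, not necessarily the calculation GroundWave uses. The -60 dB display floor is a hypothetical choice, and the input is assumed to be the array filled by AnalyserNode.getFloatTimeDomainData().

```typescript
// Root-mean-square amplitude of time-domain samples in [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) sumSquares += samples[i] * samples[i];
  return Math.sqrt(sumSquares / samples.length);
}

// Map RMS to a 0..1 bar length on a decibel scale.
// floorDb is a hypothetical display floor: anything quieter shows an empty bar.
function meterLevel(samples: Float32Array, floorDb = -60): number {
  const level = rms(samples);
  if (level <= 0) return 0;
  const db = (20 * Math.log(level)) / Math.LN10;
  return Math.min(1, Math.max(0, 1 - db / floorDb));
}
```

A result that stays at 0 while PTT is held corresponds to the flat meter described above.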
Speaker Indicators
The participants list shows a speaker icon next to the callsign of the user currently
transmitting. The icon is driven by the voice:transmit-start and
voice:transmit-end Socket.IO events, which carry the transmitting user's
callsign. All clients receive these events regardless of whether they are in the voice
channel, allowing the main roster panel to also indicate who is speaking.
Mobile UX
On mobile devices, two additional affordances activate when PTT is pressed:
- Haptic feedback — the device vibrates briefly at the start and end of each transmission using the Vibration API, confirming the button state change without the user needing to look at the screen.
- Screen wake lock — the Wake Lock API is requested when PTT is engaged. This prevents the screen from dimming or locking during an active transmission, which would interrupt audio capture on some browsers.
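Both mobile affordances rely on browser APIs that are not available everywhere, so feature detection is advisable. The sketch below takes a navigator-like object as a parameter; NavigatorLike, WakeLockLike, and the 30 ms pulse length are assumptions. In a browser the global navigator would be passed in.

```typescript
interface WakeLockLike {
  release(): Promise<void>;
}

// The parts of the browser's navigator object used here; both are optional
// because support varies across browsers.
interface NavigatorLike {
  vibrate?: (pattern: number | number[]) => boolean;
  wakeLock?: { request(type: "screen"): Promise<WakeLockLike> };
}

// Short pulse on a PTT state change; returns false where vibration is unsupported.
function pulse(nav: NavigatorLike, ms = 30): boolean {
  return typeof nav.vibrate === "function" ? nav.vibrate(ms) : false;
}

// Try to keep the screen awake; resolves to null if unsupported or refused.
async function acquireScreenLock(nav: NavigatorLike): Promise<WakeLockLike | null> {
  if (!nav.wakeLock) return null;
  try {
    return await nav.wakeLock.request("screen");
  } catch {
    return null; // the request can be rejected, e.g. when the page is not visible
  }
}
```

The returned lock would be released with release() when PTT is let go, so the device can dim and lock normally between transmissions.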
Permissions
Voice access is governed by the same RBAC system used across all GroundWave features.
| Capability | Observer | Operator | Admin |
|---|---|---|---|
| Join voice channel (listen) | Yes | Yes | Yes |
| Transmit (push to talk) | No | Yes | Yes |
| Leave voice channel | Yes | Yes | Yes |
The PTT button is hidden entirely in the UI for users with the observer role. Attempts
to emit voice:audio-chunk events from an observer socket are rejected
server-side with a permission error, regardless of UI state.
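The permissions table maps naturally onto a lookup that the server consults before relaying audio. The following is a minimal sketch: the role and capability identifiers and the error message are illustrative names, not taken from the GroundWave codebase.

```typescript
type Role = "observer" | "operator" | "admin";
type VoiceCapability = "join" | "transmit" | "leave";

// Mirrors the permissions table: observers may listen but not transmit.
const VOICE_PERMISSIONS: Record<Role, Record<VoiceCapability, boolean>> = {
  observer: { join: true, transmit: false, leave: true },
  operator: { join: true, transmit: true, leave: true },
  admin: { join: true, transmit: true, leave: true },
};

function can(role: Role, capability: VoiceCapability): boolean {
  return VOICE_PERMISSIONS[role][capability];
}

// Server-side gate for an incoming voice:audio-chunk, independent of UI state.
function authorizeChunk(role: Role): { ok: true } | { ok: false; error: string } {
  return can(role, "transmit")
    ? { ok: true }
    : { ok: false, error: "permission denied: role cannot transmit" };
}
```

Hiding the PTT button is a convenience only; the server-side check is what enforces the rule, since a modified client can emit any event it likes.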
Socket.IO Events
| Event | Direction | Description |
|---|---|---|
| voice:join | Client → Server | Subscribe to a voice channel. Payload: { channel_id } |
| voice:leave | Client → Server | Unsubscribe from the current voice channel. |
| voice:audio-chunk | Client → Server → others | Binary PCM frame (20 ms, 16 kHz, 16-bit signed). Server relays to channel subscribers. |
| voice:transmit-start | Server → all clients | A user began transmitting. Payload: { callsign, channel_id } |
| voice:transmit-end | Server → all clients | A user stopped transmitting. Payload: { callsign, channel_id } |
| voice:busy | Server → requester | Transmit rejected: another user is already transmitting on this channel. |
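The event table can be written down as TypeScript event maps, the shape that Socket.IO v4 accepts as type parameters on its Server and Socket classes. Several details here are assumptions: that channel_id is a string, that voice:leave and voice:busy carry no payload, and that binary frames are typed as ArrayBuffer (the browser's view of the payload).

```typescript
// Assumption: channel ids are strings; the source only names the field.
interface JoinPayload {
  channel_id: string;
}

interface TransmitPayload {
  callsign: string;
  channel_id: string;
}

interface ClientToServerEvents {
  "voice:join": (payload: JoinPayload) => void;
  "voice:leave": () => void;
  "voice:audio-chunk": (frame: ArrayBuffer) => void;
}

interface ServerToClientEvents {
  "voice:audio-chunk": (frame: ArrayBuffer) => void;
  "voice:transmit-start": (payload: TransmitPayload) => void;
  "voice:transmit-end": (payload: TransmitPayload) => void;
  "voice:busy": () => void;
}

// Runtime guard for untrusted payloads arriving from a client.
function isJoinPayload(value: unknown): value is JoinPayload {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { channel_id?: unknown }).channel_id === "string"
  );
}
```

Type parameters only help at compile time, so a runtime guard such as isJoinPayload is still worth having on the server for data sent by clients.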
Feature Toggle
Voice is an opt-in feature controlled by the FEATURES_ENABLED environment
variable. It is not active by default.
# docker-compose.yml environment section
FEATURES_ENABLED=chat,markers,files,overlays,voice
When voice is not listed in FEATURES_ENABLED:
- The VoicePanel is not rendered in the client UI.
- All voice:* Socket.IO event handlers are unregistered on the server.
- No microphone permission request is issued to the browser.
This allows operators to disable voice for deployments where radio communications are handled externally or where the additional server CPU load from audio relay is undesirable on very constrained hardware.
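A feature check against FEATURES_ENABLED could be implemented along these lines. parseFeatures and isVoiceEnabled are hypothetical helpers, and trimming whitespace and lowercasing entries are assumptions about how lenient the real parser is.

```typescript
// Parse a comma-separated FEATURES_ENABLED value into a set of feature names.
// Assumption: entries are trimmed and compared case-insensitively.
function parseFeatures(raw: string | undefined): Set<string> {
  const features = new Set<string>();
  const parts = (raw === undefined ? "" : raw).split(",");
  for (let i = 0; i < parts.length; i++) {
    const feature = parts[i].trim().toLowerCase();
    if (feature !== "") features.add(feature);
  }
  return features;
}

// Voice is opt-in: an unset or empty variable leaves it disabled.
function isVoiceEnabled(raw: string | undefined): boolean {
  return parseFeatures(raw).has("voice");
}
```

On a Node server the argument would typically be process.env.FEATURES_ENABLED, and the result would decide whether the voice:* handlers are registered at startup.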
On a Raspberry Pi 4, enabling voice for a 10-user session adds approximately 15–20%
CPU utilization due to audio relay. Benchmark your specific hardware with the
resource benchmarking suite (scripts/benchmark/) before enabling in
high-participant deployments.