Push-to-Talk Voice

Walkie-talkie style voice communications with real-time streaming.


How It Works

GroundWave voice uses a push-to-talk (PTT) model familiar from walkie-talkies. Pressing and holding the PTT button captures audio from the device microphone and streams it to the server. The server relays the audio to all other clients in the same voice channel. Releasing the button ends the transmission.

Key design decisions:

Server-relayed, not peer-to-peer. All audio flows through the GroundWave server over the existing Socket.IO connection. There is no WebRTC signaling, no STUN/TURN infrastructure, and no direct device-to-device connectivity requirement. Because each client only needs its existing connection to the server, the system works identically on any network topology where clients can reach the server.
Channel-aligned. Voice channels correspond directly to chat channels. Joining a voice channel with voice:join subscribes the client to audio relayed on that channel. A user can be in only one voice channel at a time.
Single transmitter per channel. Only one client may transmit at a time per channel. If a second client attempts to transmit while another is active, the server rejects the transmission with a voice:busy event and the client UI indicates the channel is occupied.
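The single-transmitter rule amounts to a per-channel lock held on the server. The sketch below is a minimal illustration, assuming the lock is keyed by Socket.IO socket ID; `FloorControl` and its methods are hypothetical names, not GroundWave's actual code.

```typescript
// Hypothetical per-channel "floor" lock: at most one transmitter per channel.
class FloorControl {
  // channel_id -> socket ID of the client currently transmitting
  private active = new Map<string, string>();

  // Returns true if the caller now holds the channel (or already held it),
  // false if someone else is transmitting (the server would emit voice:busy).
  tryAcquire(channelId: string, socketId: string): boolean {
    const holder = this.active.get(channelId);
    if (holder !== undefined && holder !== socketId) return false;
    this.active.set(channelId, socketId);
    return true;
  }

  // Only the current holder can release; release attempts from others are ignored.
  release(channelId: string, socketId: string): void {
    if (this.active.get(channelId) === socketId) this.active.delete(channelId);
  }

  holder(channelId: string): string | undefined {
    return this.active.get(channelId);
  }
}
```

A real server would also need to release the lock when the holding socket disconnects; otherwise a client that drops mid-transmission could leave the channel busy indefinitely.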

Voice recording is optional and off by default. A toggle switch in the VoicePanel enables recording. When enabled, transmissions are saved as regular files (WebM/Opus) via the standard file upload API, appearing in the Files panel with a timestamp-based filename like Voice_Recording_2026-02-27_14-30-00.webm. These files can be renamed, played back inline, or downloaded like any other file.
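The filename pattern can be reproduced with a small formatting helper. This is a sketch: `recordingFilename` is a hypothetical name, and using the client's local time (rather than UTC) is an assumption.

```typescript
// Builds a name like Voice_Recording_2026-02-27_14-30-00.webm.
// Assumption: local time components; GroundWave may format in UTC instead.
function recordingFilename(d: Date): string {
  const pad = (n: number): string => String(n).padStart(2, "0");
  const date = `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
  const time = `${pad(d.getHours())}-${pad(d.getMinutes())}-${pad(d.getSeconds())}`;
  return `Voice_Recording_${date}_${time}.webm`;
}
```

Hyphens rather than colons in the time portion keep the name valid on filesystems such as Windows, where `:` is not allowed in filenames.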

Audio Pipeline

Capture

When PTT is engaged, the client requests microphone access with navigator.mediaDevices.getUserMedia() and routes the input stream through a Web Audio AudioWorklet processor. The worklet runs on a dedicated audio rendering thread separate from the main JavaScript thread, which keeps UI work from causing audio glitches.

The worklet captures raw PCM samples at 16kHz mono, accumulating them into 20ms frames (320 samples each) before posting each frame to the main thread. Twenty milliseconds is the standard Opus packet duration and balances latency against per-packet overhead.
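An AudioWorklet's `process()` method receives audio in render quanta of 128 samples (the Web Audio default), so 20ms frames (16,000 × 0.020 = 320 samples) do not line up with quantum boundaries and leftover samples must carry over from one call to the next. The sketch below isolates that accumulation logic in a plain class so it runs outside the browser; `FrameAccumulator` is a hypothetical name, and creating the context with `new AudioContext({ sampleRate: 16000 })` is one assumed way to obtain 16kHz input.

```typescript
// Hypothetical helper: collects 128-sample render quanta into fixed 20 ms frames.
// At 16 kHz, 20 ms = 320 samples = 2.5 quanta, so a frame boundary can fall in
// the middle of a quantum and the remainder must carry over to the next frame.
const SAMPLE_RATE = 16000;
const FRAME_SAMPLES = (SAMPLE_RATE * 20) / 1000; // 320

class FrameAccumulator {
  private buf = new Float32Array(FRAME_SAMPLES);
  private filled = 0;

  // Appends samples and returns every complete 320-sample frame produced.
  push(input: Float32Array): Float32Array[] {
    const frames: Float32Array[] = [];
    let offset = 0;
    while (offset < input.length) {
      const n = Math.min(FRAME_SAMPLES - this.filled, input.length - offset);
      this.buf.set(input.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === FRAME_SAMPLES) {
        frames.push(this.buf);
        this.buf = new Float32Array(FRAME_SAMPLES); // fresh buffer; the full one is handed off
        this.filled = 0;
      }
    }
    return frames;
  }
}

// Inside an AudioWorkletProcessor, process(inputs) would call
// accumulator.push(inputs[0][0]) and post each returned frame via this.port.
```

Allocating a fresh array per frame means each completed frame can be transferred with `this.port.postMessage(frame, [frame.buffer])` instead of copied, without the accumulator later writing into detached memory.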

In parallel, a MediaRecorder instance records the same microphone stream in WebM/Opus format. When recording is enabled, this full transmission recording is uploaded to the server on PTT release.

Transport

Each 20ms PCM frame is emitted over the Socket.IO connection as a binary event (voice:audio-chunk). The payload is an ArrayBuffer containing the raw 16-bit signed integer PCM samples with no additional framing overhead.
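Web Audio delivers samples as 32-bit floats nominally in the range -1 to 1, so each frame has to be converted to 16-bit integers before it is sent. A minimal sketch, assuming little-endian byte order (the byte order is not specified above) and a hypothetical `encodePcm16` helper:

```typescript
// Converts one frame of Float32 samples (-1..1) into a raw 16-bit signed PCM payload.
// Assumption: little-endian byte order.
function encodePcm16(frame: Float32Array): ArrayBuffer {
  const out = new ArrayBuffer(frame.length * 2);
  const view = new DataView(out);
  for (let i = 0; i < frame.length; i++) {
    const s = Math.max(-1, Math.min(1, frame[i])); // clip so out-of-range input cannot wrap around
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return out;
}
```

At 320 samples × 2 bytes, each frame is 640 bytes; at 50 frames per second that is 32,000 bytes/s, or 256 kbit/s per active transmitter before Socket.IO framing overhead.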

The server receives each chunk, verifies the sender is the active transmitter on the channel, and relays the buffer to all other subscribers using Socket.IO's room broadcast mechanism. End-to-end latency from microphone capture to speaker playback is approximately 60ms on a local Wi-Fi network — comparable to analog walkie-talkie performance.
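The relay step can be sketched as a Socket.IO handler. To keep the example self-contained, it is written against a small structural interface that mirrors the parts of a server-side Socket.IO socket it uses (`id`, `on`, and `to(room).emit`, which broadcasts to a room excluding the sender). One room per `channel_id` and the two lookup maps are assumptions, and how a client becomes the active transmitter is out of scope here.

```typescript
// Minimal structural stand-in for the server-side Socket.IO socket members used below.
interface RelaySocket {
  id: string;
  on(event: string, handler: (chunk: ArrayBuffer) => void): void;
  to(room: string): { emit(event: string, data: ArrayBuffer): void };
}

// Hypothetical registration function; maps are owned by the surrounding server code.
function registerAudioRelay(
  socket: RelaySocket,
  voiceChannelOf: Map<string, string>, // socket ID -> joined voice channel_id
  activeTransmitter: Map<string, string>, // channel_id -> transmitting socket ID
): void {
  socket.on("voice:audio-chunk", (chunk) => {
    const channelId = voiceChannelOf.get(socket.id);
    // Drop chunks from clients that are not in a channel or do not hold it.
    if (channelId === undefined || activeTransmitter.get(channelId) !== socket.id) return;
    // Relay the untouched buffer to every other subscriber in the channel's room.
    socket.to(channelId).emit("voice:audio-chunk", chunk);
  });
}
```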

Playback

Receiving clients convert the 16-bit samples in each incoming voice:audio-chunk event back into the floating-point format the Web Audio API uses. Each frame is written into a jitter buffer, a small FIFO queue of PCM frames that decouples the arrival rate from the playback rate and absorbs brief network irregularities without audible gaps.
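Before a frame can go into an AudioBuffer, its 16-bit samples have to be scaled back to floats. A sketch of that conversion, assuming the same little-endian layout as the sender; `decodePcm16` is a hypothetical helper, while `createBuffer()` and `copyToChannel()` in the trailing comment are standard Web Audio calls:

```typescript
// Converts a received 16-bit PCM payload back to Float32 samples in -1..1.
// Assumption: little-endian, matching the sender.
function decodePcm16(payload: ArrayBuffer): Float32Array {
  const view = new DataView(payload);
  const out = new Float32Array(payload.byteLength >> 1); // 2 bytes per sample
  for (let i = 0; i < out.length; i++) {
    const s = view.getInt16(i * 2, true);
    out[i] = s < 0 ? s / 0x8000 : s / 0x7fff;
  }
  return out;
}

// In the browser:
//   const buf = ctx.createBuffer(1, samples.length, 16000); // mono, 16 kHz
//   buf.copyToChannel(samples, 0);
```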

Frames are taken from the jitter buffer and scheduled on the AudioContext clock using AudioBufferSourceNode, with each frame set to start exactly where the previous one ends. This keeps playback smooth and glitch-free even when individual frames arrive slightly late.
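One common way to implement this scheduling is to keep a running playback clock: each frame starts at the time the previous one ends, and the clock is reset slightly ahead of `currentTime` if the buffer ran dry. The sketch below is illustrative only; the `JitterBuffer` class and its 2-frame (40ms) prebuffer default are assumptions, not documented GroundWave values. A deeper prebuffer trades latency for smoothness.

```typescript
const FRAME_SEC = 0.02; // 20 ms per frame

// Hypothetical jitter buffer: a FIFO of decoded frames plus a playback clock.
class JitterBuffer {
  private queue: Float32Array[] = [];
  private nextStart = 0; // AudioContext time at which the next frame should begin
  private started = false;
  private readonly prebufferFrames: number;

  constructor(prebufferFrames = 2) {
    this.prebufferFrames = prebufferFrames;
  }

  push(frame: Float32Array): void {
    this.queue.push(frame);
  }

  // Called periodically with ctx.currentTime; returns [startTime, frame] pairs to
  // schedule with AudioBufferSourceNode.start(startTime).
  drain(now: number): Array<[number, Float32Array]> {
    if (!this.started) {
      if (this.queue.length < this.prebufferFrames) return []; // still filling
      this.started = true;
    }
    // If playback fell behind (buffer ran dry), restart the clock just ahead of now.
    if (this.nextStart < now) this.nextStart = now + FRAME_SEC;
    const out: Array<[number, Float32Array]> = [];
    let frame: Float32Array | undefined;
    while ((frame = this.queue.shift()) !== undefined) {
      out.push([this.nextStart, frame]);
      this.nextStart += FRAME_SEC; // back-to-back: each frame begins where the last ends
    }
    return out;
  }
}
```

In the browser, each returned pair would become an AudioBufferSourceNode: create it with `ctx.createBufferSource()`, set its `buffer`, connect it to the volume GainNode, and call `start(startTime)`.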

Two playback controls are available and their values are persisted to localStorage so preferences survive page reloads:

Volume slider — adjusts the GainNode applied to incoming audio, from 0 to 100.
Mute toggle — disconnects the output gain node entirely, silencing all incoming voice without stopping the jitter buffer.
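Persisting the two controls needs only a pair of storage keys and a mapping from the 0 to 100 slider to a gain value. In this sketch the key names, the default volume of 100, and the linear 0-100 to 0.0-1.0 mapping are all assumptions; the `store` parameter accepts `localStorage` in the browser and any stand-in with the same two methods elsewhere.

```typescript
// Minimal key-value interface; the browser's localStorage provides both methods.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical storage keys; GroundWave's actual key names are not documented.
const VOLUME_KEY = "voice.volume";
const MUTED_KEY = "voice.muted";

function loadVoicePrefs(store: KeyValueStore): { volume: number; muted: boolean } {
  const rawVolume = store.getItem(VOLUME_KEY);
  const parsed = rawVolume === null ? NaN : Number(rawVolume);
  return {
    volume: Number.isFinite(parsed) ? Math.max(0, Math.min(100, parsed)) : 100, // assumed default
    muted: store.getItem(MUTED_KEY) === "true",
  };
}

function saveVoicePrefs(store: KeyValueStore, volume: number, muted: boolean): void {
  store.setItem(VOLUME_KEY, String(volume));
  store.setItem(MUTED_KEY, String(muted));
}

// Assumption: linear slider-to-gain mapping. Mute is handled separately by
// disconnecting the gain node, as described above.
function sliderToGain(volume: number): number {
  return Math.max(0, Math.min(100, volume)) / 100;
}
```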

User Interface

The VoicePanel is a slide-out panel accessible from the main toolbar. It displays the active channel, connected voice participants, and the PTT control.

PTT Button

The PTT button is a large, touch-friendly control designed for use with gloves or in low-visibility conditions. It has three visual states:

Idle — standard appearance, ready to transmit
Transmitting — highlighted with a pulsing ring, indicating active audio capture
Busy — dimmed with a "Channel busy" label when another user is transmitting
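The three states can be derived from two inputs: whether the local PTT control is held, and which callsign (if any) the latest voice:transmit-start event reported for the channel. A hypothetical helper showing one way to do this:

```typescript
type PttState = "idle" | "transmitting" | "busy";

// `transmitter` is the callsign from the latest voice:transmit-start on this
// channel, or null after voice:transmit-end. Names are illustrative.
function pttState(ownCallsign: string, pttHeld: boolean, transmitter: string | null): PttState {
  // Someone else holds the channel: show the dimmed "Channel busy" state.
  if (transmitter !== null && transmitter !== ownCallsign) return "busy";
  return pttHeld ? "transmitting" : "idle";
}
```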

Keyboard Shortcut

The Spacebar activates PTT when the VoicePanel is open and no text input is focused. Holding the Spacebar begins transmission; releasing it ends it. This mirrors the ergonomics of traditional radio software and suits laptop or desktop operation.
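The focus check and key-repeat handling are easy to get wrong, since a held key fires repeated keydown events. A sketch of the decision, using `KeyboardEvent.code` and `KeyboardEvent.repeat` from the DOM; the structural interfaces and the `shouldStartPtt` name are hypothetical so the logic can run outside a browser (in practice, pass the keydown event and `document.activeElement`).

```typescript
interface KeyLike { code: string; repeat: boolean; }
interface FocusLike { tagName: string; isContentEditable: boolean; }

// Returns true if a keydown should start a PTT transmission.
function shouldStartPtt(e: KeyLike, panelOpen: boolean, focused: FocusLike | null): boolean {
  // Ignore other keys, auto-repeat while held, and a closed panel.
  if (e.code !== "Space" || e.repeat || !panelOpen) return false;
  if (focused === null) return true;
  const tag = focused.tagName.toUpperCase();
  // Do not hijack the Spacebar while the user is typing.
  const typing = tag === "INPUT" || tag === "TEXTAREA" || focused.isContentEditable;
  return !typing;
}
```

A real handler would also call `preventDefault()` on the Space keydown so the page does not scroll, and end the transmission on the matching keyup.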

Audio Level Meter

During transmission, an audio level meter visualizes the microphone input amplitude in real time using a Web Audio AnalyserNode. The meter bar grows and glows in proportion to the detected level, giving immediate visual feedback that the microphone is active and capturing audio. A flat meter while PTT is held points to a possible microphone permission problem.
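One way to turn analyser data into a meter value is to take the RMS of a block of samples and map it onto a decibel scale. In this sketch the -60 dBFS floor and the 0 to 100 output range are assumptions; the input is the `Float32Array` that `AnalyserNode.getFloatTimeDomainData()` fills.

```typescript
// Returns a 0-100 meter level from one block of time-domain samples.
function meterLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSq = 0;
  for (let i = 0; i < samples.length; i++) sumSq += samples[i] * samples[i];
  const rms = Math.sqrt(sumSq / samples.length);
  if (rms === 0) return 0; // digital silence: flat meter
  const db = 20 * Math.log10(rms); // 0 dBFS = full scale
  // Map -60..0 dBFS onto 0..100, clamping anything outside that window.
  return Math.max(0, Math.min(100, ((db + 60) / 60) * 100));
}
```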

Speaker Indicators

The participants list shows a speaker icon next to the callsign of the user currently transmitting. The icon is driven by the voice:transmit-start and voice:transmit-end Socket.IO events, which carry the transmitting user's callsign. All clients receive these events regardless of whether they are in the voice channel, allowing the main roster panel to also indicate who is speaking.
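On the client, the indicator reduces to a map from channel to current speaker, updated by the two events. A hypothetical sketch using the `{ callsign, channel_id }` payload these events carry:

```typescript
interface TransmitEvent { callsign: string; channel_id: string; }

// Hypothetical client-side state: channel_id -> callsign currently transmitting.
class SpeakerTracker {
  private speaking = new Map<string, string>();

  onTransmitStart(e: TransmitEvent): void {
    this.speaking.set(e.channel_id, e.callsign);
  }

  onTransmitEnd(e: TransmitEvent): void {
    // Clear only if the event matches the current speaker, so a stale end
    // event cannot hide a newer transmission on the same channel.
    if (this.speaking.get(e.channel_id) === e.callsign) this.speaking.delete(e.channel_id);
  }

  isSpeaking(callsign: string): boolean {
    return Array.from(this.speaking.values()).includes(callsign);
  }
}
```

With socket.io-client, the methods would be wired up as `socket.on("voice:transmit-start", (e) => tracker.onTransmitStart(e))`, and likewise for voice:transmit-end.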

Mobile UX

On mobile devices, two additional affordances activate when PTT is pressed:

Haptic feedback — the device vibrates briefly at the start and end of each transmission using the Vibration API, confirming the button state change without the user needing to look at the screen.
Screen wake lock — the Wake Lock API is requested when PTT is engaged. This prevents the screen from dimming or locking during an active transmission, which would interrupt audio capture on some browsers.
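Not every browser implements both APIs, so each call should be feature-detected. A sketch, assuming a hypothetical 40ms vibration pulse; `navigator.vibrate()` and `navigator.wakeLock.request("screen")` are the standard entry points, and the structural `NavigatorLike` type exists only so the code runs outside a browser (pass `navigator` in practice).

```typescript
interface WakeLockSentinelLike { release(): Promise<void>; }
interface NavigatorLike {
  vibrate?: (pattern: number | number[]) => boolean;
  wakeLock?: { request(type: "screen"): Promise<WakeLockSentinelLike> };
}

const PTT_BUZZ_MS = 40; // hypothetical pulse length

// On PTT press: buzz, then try to hold a screen wake lock for the transmission.
async function onPttStart(nav: NavigatorLike): Promise<WakeLockSentinelLike | null> {
  nav.vibrate?.(PTT_BUZZ_MS);
  if (!nav.wakeLock) return null;
  try {
    return await nav.wakeLock.request("screen");
  } catch {
    return null; // e.g. the page is hidden; transmit without the lock
  }
}

// On PTT release: buzz again and release the lock if one was acquired.
async function onPttEnd(nav: NavigatorLike, lock: WakeLockSentinelLike | null): Promise<void> {
  nav.vibrate?.(PTT_BUZZ_MS);
  if (lock) await lock.release();
}
```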

Permissions

Voice access is governed by the same RBAC system used across all GroundWave features.

Capability	Observer	Operator	Admin
Join voice channel (listen)	Yes	Yes	Yes
Transmit (push to talk)	No	Yes	Yes
Leave voice channel	Yes	Yes	Yes

The PTT button is hidden entirely in the UI for users with the observer role. Attempts to emit voice:audio-chunk events from an observer socket are rejected server-side with a permission error, regardless of UI state.
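The permissions table translates directly into a lookup the server can consult before relaying audio. This is a hypothetical sketch; GroundWave's RBAC implementation is not shown here, so the type and function names are illustrative.

```typescript
type Role = "observer" | "operator" | "admin";
type VoiceAction = "listen" | "transmit" | "leave";

// Mirrors the permissions table above.
const VOICE_PERMISSIONS: Record<Role, ReadonlySet<VoiceAction>> = {
  observer: new Set<VoiceAction>(["listen", "leave"]),
  operator: new Set<VoiceAction>(["listen", "transmit", "leave"]),
  admin: new Set<VoiceAction>(["listen", "transmit", "leave"]),
};

function canPerform(role: Role, action: VoiceAction): boolean {
  return VOICE_PERMISSIONS[role].has(action);
}

// Server side, before relaying a chunk (the socket's role lookup is assumed):
//   if (!canPerform(role, "transmit")) { /* reject with a permission error */ return; }
```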

Socket.IO Events

Event	Direction	Description
`voice:join`	Client → Server	Subscribe to a voice channel. Payload: `{ channel_id }`
`voice:leave`	Client → Server	Unsubscribe from the current voice channel.
`voice:audio-chunk`	Client → Server → others	Binary PCM frame (20ms, 16kHz, 16-bit signed). Server relays to channel subscribers.
`voice:transmit-start`	Server → all clients	A user began transmitting. Payload: `{ callsign, channel_id }`
`voice:transmit-end`	Server → all clients	A user stopped transmitting. Payload: `{ callsign, channel_id }`
`voice:busy`	Server → requester	Transmit rejected — another user is already transmitting on this channel.
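Socket.IO v4 lets a TypeScript server declare its events as interfaces and pass them as type parameters, e.g. `new Server<ClientToServerEvents, ServerToClientEvents>()`. A sketch of the table in that form follows; payloads for voice:leave and voice:busy are not documented, so modeling them with no arguments is an assumption, and `isJoinPayload` is a hypothetical runtime check.

```typescript
interface TransmitPayload { callsign: string; channel_id: string; }

interface ClientToServerEvents {
  "voice:join": (payload: { channel_id: string }) => void;
  "voice:leave": () => void;
  "voice:audio-chunk": (pcm: ArrayBuffer) => void; // 20 ms, 16 kHz, 16-bit signed
}

interface ServerToClientEvents {
  "voice:audio-chunk": (pcm: ArrayBuffer) => void;
  "voice:transmit-start": (payload: TransmitPayload) => void;
  "voice:transmit-end": (payload: TransmitPayload) => void;
  "voice:busy": () => void;
}

// Types are erased at runtime, so untrusted client payloads still need checking.
function isJoinPayload(x: unknown): x is { channel_id: string } {
  return (
    typeof x === "object" && x !== null &&
    typeof (x as { channel_id?: unknown }).channel_id === "string"
  );
}
```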

Feature Toggle

Voice is an opt-in feature controlled by the FEATURES_ENABLED environment variable. It is not active by default.

```yaml
# docker-compose.yml, environment section of the GroundWave service
environment:
  - FEATURES_ENABLED=chat,markers,files,overlays,voice
```

When voice is not listed in FEATURES_ENABLED:

The VoicePanel is not rendered in the client UI.
No voice:* Socket.IO event handlers are registered on the server.
No microphone permission request is issued to the browser.

This allows operators to disable voice for deployments where radio communications are handled externally or where the additional server CPU load from audio relay is undesirable on very constrained hardware.
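Checking the toggle is a matter of splitting the comma-separated list. A sketch with hypothetical helper names; trimming whitespace and ignoring case are assumptions about how tolerant the real parser is.

```typescript
// Parses FEATURES_ENABLED, e.g. "chat,markers,files,overlays,voice".
function parseFeatures(raw: string | undefined): Set<string> {
  return new Set(
    (raw ?? "")
      .split(",")
      .map((f) => f.trim().toLowerCase())
      .filter((f) => f.length > 0), // ignore empty entries such as "a,,b"
  );
}

function voiceEnabled(raw: string | undefined): boolean {
  return parseFeatures(raw).has("voice");
}
```

On the server this might gate handler registration, e.g. calling a hypothetical `registerVoiceHandlers(io)` only when `voiceEnabled(process.env.FEATURES_ENABLED)` is true.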

On a Raspberry Pi 4, enabling voice for a 10-user session adds approximately 15–20% CPU utilization due to audio relay. Benchmark your specific hardware with the resource benchmarking suite (scripts/benchmark/) before enabling in high-participant deployments.
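The CPU figure itself has to be measured, but the amount of relay work behind it can be estimated from the audio format described earlier. This back-of-envelope sketch assumes all participants share one voice channel and ignores Socket.IO and WebSocket framing overhead; `relayLoad` is a hypothetical name.

```typescript
// Relay load for one channel with a single active transmitter:
// 50 frames/s x 640 bytes, relayed to every subscriber except the sender.
function relayLoad(subscribers: number): { emitsPerSec: number; outboundKbps: number } {
  const FRAMES_PER_SEC = 1000 / 20; // 20 ms frames -> 50 per second
  const BYTES_PER_FRAME = 320 * 2; // 320 samples x 2 bytes
  const listeners = Math.max(0, subscribers - 1); // the transmitter is not sent its own audio
  return {
    emitsPerSec: FRAMES_PER_SEC * listeners,
    outboundKbps: (FRAMES_PER_SEC * BYTES_PER_FRAME * 8 * listeners) / 1000,
  };
}
```

For a 10-user channel this gives 450 relayed frames per second and about 2.3 Mbit/s of outbound audio while someone is transmitting.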