Skip to main content

Voice Activity Detection (VAD)

Micdrop uses a VAD (Voice Activity Detection) to detect speech and silence and send chunks of audio to the server only when speech is detected.

Supported VAD Types

Micdrop supports the following VADs by name:

  • 'volume': Volume-based VAD (default)
  • 'silero': AI-based VAD using Silero

You can also pass instances of these VADs, or combine them in an array. See below for details.

Note: Only 'volume' and 'silero' are supported as string names. Custom VADs must be passed as instances.

Quick Start

Configure VAD when starting a call:

import { Micdrop } from '@micdrop/client'

// Use volume-based detection (default)
await Micdrop.start({
url: 'ws://localhost:8081',
vad: 'volume',
})

// Use AI-based detection for better accuracy
await Micdrop.start({
url: 'ws://localhost:8081',
vad: 'silero',
})

// Combine multiple VADs for best results
await Micdrop.start({
url: 'ws://localhost:8081',
vad: ['volume', 'silero'],
})

Or when starting the microphone (before starting the call):

Micdrop.startMic({ vad: 'volume' })

Volume VAD: Speech detection based on volume

By default, MicdropClient uses VolumeVAD for speech detection. You can use it explicitly when starting Micdrop:

Micdrop.start({ vad: 'volume' })

or when starting the microphone (before starting the call):

Micdrop.startMic({ vad: 'volume' })

It is inspired by hark and triggers speech detection events based on volume changes.

You can also pass an instance of VolumeVAD to MicdropClient:

const vad = new VolumeVAD({
history: 5, // Number of frames to consider for volume calculation
threshold: -55, // Threshold in decibels for speech detection
})
Micdrop.start({ vad })
  • Default options: { history: 5, threshold: -55 }
  • Persistence: Options are saved to localStorage and restored automatically.

When to use Volume VAD:

  • ✅ Low latency requirements
  • ✅ Quiet environments
  • ✅ Clear speech patterns
  • ❌ Noisy environments
  • ❌ Soft-spoken users

Silero VAD: Human speech detection with AI

To use SileroVAD for speech detection:

Micdrop.start({ vad: 'silero' })

It is based on @ricky0123/vad-web which runs a Silero VAD model in the browser using ONNX Runtime Web.

It is more accurate than VolumeVAD and works better with low voice.

You can also pass an instance of SileroVAD to MicdropClient:

const vad = new SileroVAD({
positiveSpeechThreshold: 0.18, // Threshold for positive speech detection
negativeSpeechThreshold: 0.11, // Threshold for negative speech detection
minSpeechFrames: 8, // Minimum number of frames to consider for speech detection
redemptionFrames: 20, // Number of frames to consider for silence detection
})
Micdrop.start({ vad })
  • Default options: { positiveSpeechThreshold: 0.18, negativeSpeechThreshold: 0.11, minSpeechFrames: 8, redemptionFrames: 20 }
  • Persistence: Options are saved to localStorage and restored automatically.

When to use Silero VAD:

  • ✅ Noisy environments
  • ✅ Soft-spoken users
  • ✅ Multiple speakers
  • ✅ Background music/TV
  • ❌ Extremely low latency needs (adds ~50ms processing)

Multiple VAD: Combine multiple VADs

Combining multiple VADs is useful to get more accurate speech detection:

  • Volume to ignore low voice
  • Silero to detect human speech

You can combine multiple VADs by passing an array of VAD names:

Micdrop.start({ vad: ['volume', 'silero'] })

Or with instances:

const vad = [new VolumeVAD(), new SileroVAD()]
Micdrop.start({ vad })

Or mix names and instances:

await Micdrop.start({
vad: ['volume', new SileroVAD({ positiveSpeechThreshold: 0.15 })],
})

How it works:

  • StartSpeaking is emitted when any VAD detects possible speech.
  • ConfirmSpeaking is emitted only when all VADs confirm speech.
  • StopSpeaking is emitted when all VADs detect silence.
  • CancelSpeaking is emitted if all VADs agree speech was a false positive.

This approach reduces false positives while maintaining quick response times.

VAD Events

VADs emit the following events:

  • StartSpeaking: Possible speech detected (not yet confirmed)
  • ConfirmSpeaking: Speech confirmed
  • CancelSpeaking: Speech start was a false positive (noise, etc.)
  • StopSpeaking: Speech ended
  • ChangeStatus: Status changed (Silence, MaybeSpeaking, Speaking)

Monitor VAD activity in your application:

Micdrop.vad.on('StartSpeaking', () => {
console.log('🎤 Possible speech detected...')
showListeningIndicator()
})

Micdrop.vad.on('ConfirmSpeaking', () => {
console.log('✅ Speech confirmed - recording')
highlightMicrophoneButton()
})

Micdrop.vad.on('StopSpeaking', () => {
console.log('🔇 Speech ended')
resetMicrophoneButton()
})

Micdrop.vad.on('CancelSpeaking', () => {
console.log('❌ False positive - not speech')
hideListeningIndicator()
})

Micdrop.vad.on('ChangeStatus', (status) => {
console.log('VAD status:', status) // 'Silence', 'MaybeSpeaking', 'Speaking'
})

Custom VAD

You can also pass your own VAD implementation:

Micdrop.start({ vad: new MyVAD() })

Your VAD implementation should extend the VAD class:

import { VAD } from '@micdrop/client'

class MyVAD extends VAD {
private started = false

get isStarted(): boolean {
return this.started
}

async start(stream: MediaStream) {
this.started = true

// When speech is detected, emit StartSpeaking event
this.emit('StartSpeaking')

// When speech is confirmed, emit ConfirmSpeaking event
this.emit('ConfirmSpeaking')

// When speech is cancelled, emit CancelSpeaking event
this.emit('CancelSpeaking')

// When speech stops, emit StopSpeaking event
this.emit('StopSpeaking')
}

async stop() {
this.started = false
}
}

Tip: See the codebase for HarkVAD as another example of a custom VAD. It is not exported by default, as VolumeVAD is the recommended and supported option inspired by Hark.

For a more complex example with audio analysis:

import { VAD } from '@micdrop/client'

class CustomVAD extends VAD {
private started = false
private audioContext: AudioContext
private analyzer: AnalyserNode

get isStarted(): boolean {
return this.started
}

async start(stream: MediaStream) {
this.started = true

// Set up audio analysis
this.audioContext = new AudioContext()
this.analyzer = this.audioContext.createAnalyser()

const source = this.audioContext.createMediaStreamSource(stream)
source.connect(this.analyzer)

// Start your detection logic
this.detectSpeech()
}

async stop() {
this.started = false
this.audioContext?.close()
}

private detectSpeech() {
// Implement your speech detection logic
const dataArray = new Uint8Array(this.analyzer.frequencyBinCount)

const analyze = () => {
if (!this.started) return

this.analyzer.getByteFrequencyData(dataArray)

// Your custom detection algorithm
const isSpeaking = this.customDetection(dataArray)

if (isSpeaking) {
this.emit('StartSpeaking')
this.emit('ConfirmSpeaking')
} else {
this.emit('StopSpeaking')
}

requestAnimationFrame(analyze)
}

analyze()
}

private customDetection(audioData: Uint8Array): boolean {
// Your custom speech detection logic here
// Return true if speech is detected, false otherwise
return false
}
}

// Use your custom VAD
await Micdrop.start({
vad: new CustomVAD(),
})

VAD Delay

All VADs have a delay property (default: 100ms) that controls the interval for speech detection checks. You can adjust this in custom VADs if needed.

Tuning VAD Performance

Volume VAD Tuning

Adjust sensitivity based on environment:

// Quiet environment - more sensitive
const quietVad = new VolumeVAD({
threshold: -65, // Lower threshold for quiet voices
history: 3, // Faster response
})

// Noisy environment - less sensitive
const noisyVad = new VolumeVAD({
threshold: -45, // Higher threshold to ignore noise
history: 8, // More frames for stability
})

Silero VAD Tuning

Fine-tune AI detection:

// More sensitive - catches quiet speech
const sensitiveVad = new SileroVAD({
positiveSpeechThreshold: 0.15, // Lower threshold
minSpeechFrames: 6, // Faster confirmation
})

// More conservative - reduces false positives
const conservativeVad = new SileroVAD({
positiveSpeechThreshold: 0.22, // Higher threshold
minSpeechFrames: 12, // More confirmation needed
redemptionFrames: 30, // Longer silence confirmation
})

Dynamic VAD Configuration

You can update VAD settings in real-time without restarting:

// Update Volume VAD settings
const volumeVad = Micdrop.vad as VolumeVAD
volumeVad.setOptions({ threshold: -45 })

// Update Silero VAD settings
const sileroVad = Micdrop.vad as SileroVAD
sileroVad.setOptions({ positiveSpeechThreshold: 0.15 })

// Reset to default options
volumeVad.resetOptions()
sileroVad.resetOptions()

Persistent Settings

Both VolumeVAD and SileroVAD settings are automatically saved to localStorage and restored when loading with their names ('volume' or 'silero') and not instances.

React VAD Settings UI

See a complete React component for VAD configuration based on the demo client: VADSettings