Micdrop - Real-time voice conversations with AI in the browser

How It Works

Micdrop orchestrates a complete voice conversation pipeline. Voice data flows through each component in real time, from your microphone to the AI and back to your speakers.

Diagram legend: Audio, Message chunks, Messages

🎤 Voice Input

The client captures microphone input, voice activity detection (VAD) detects speech, and audio chunks are sent to the server for processing.
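
To make the idea of speech gating concrete, here is a minimal sketch of an energy-based detector that decides which audio frames are worth sending. It is an illustration only, not Micdrop's actual VAD: the function names (rms, isSpeech, selectSpeechChunks) and the 0.02 threshold are assumptions made for this example.

```typescript
// Illustrative energy-based VAD. All names and the default threshold are
// hypothetical; Micdrop's real detector may work differently.

/** Root-mean-square energy of one audio frame (samples in the range [-1, 1]). */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

/** A frame counts as speech when its energy exceeds the threshold. */
function isSpeech(frame: Float32Array, threshold: number = 0.02): boolean {
  return rms(frame) > threshold;
}

/** Keep only the frames that look like speech, preserving their order. */
function selectSpeechChunks(
  frames: Float32Array[],
  threshold: number = 0.02
): Float32Array[] {
  return frames.filter((frame) => isSpeech(frame, threshold));
}
```

A production detector would typically add smoothing across frames so that short pauses inside a sentence do not cut the utterance; this sketch judges each frame independently.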

🤖 AI Processing

The server orchestrates speech-to-text (STT) transcription, AI agent reasoning, tool calls, and text-to-speech (TTS) generation for natural responses.
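
The server-side flow can be sketched as three stages chained together. The interfaces below (SpeechToText, Agent, TextToSpeech) and the respond function are hypothetical shapes invented for this example, not the types exported by @micdrop/server, and tool calls are left out to keep it short.

```typescript
// Hypothetical interfaces for illustration; Micdrop's real types may differ.
interface SpeechToText {
  transcribe(audio: Uint8Array[]): Promise<string>;
}

interface Agent {
  // Streams the answer as text fragments.
  answer(userText: string): AsyncIterable<string>;
}

interface TextToSpeech {
  // Turns a stream of text fragments into a stream of audio chunks.
  speak(text: AsyncIterable<string>): AsyncIterable<Uint8Array>;
}

/**
 * One conversational turn: audio in, audio out.
 * Audio chunks are yielded as soon as the TTS stage produces them,
 * so playback can begin before the agent has finished answering.
 */
async function* respond(
  audio: Uint8Array[],
  stt: SpeechToText,
  agent: Agent,
  tts: TextToSpeech
): AsyncGenerator<Uint8Array> {
  const transcript = await stt.transcribe(audio);
  if (transcript.trim() === "") return; // nothing was said, so no reply

  for await (const chunk of tts.speak(agent.answer(transcript))) {
    yield chunk;
  }
}
```

Passing the agent's text stream straight into the TTS stage, rather than waiting for the full answer, is the design choice that keeps the time to first audio short.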

🔊 Voice Output

Generated audio streams back to the client for playback, with full support for interruptions and real-time interaction.
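
One way to support interruptions on the playback side is to tag each reply with a generation number and drop anything that belongs to a reply the user has already talked over. The PlaybackQueue class below is a hypothetical sketch of that idea, not Micdrop's implementation.

```typescript
// Hypothetical playback queue; names and design are assumptions for illustration.
class PlaybackQueue {
  private chunks: Uint8Array[] = [];
  private generation = 0;

  /** Generation to attach to audio chunks of the reply currently being spoken. */
  get currentGeneration(): number {
    return this.generation;
  }

  /** Number of chunks waiting to be played. */
  get size(): number {
    return this.chunks.length;
  }

  /** Queue a chunk, unless it belongs to a reply that was interrupted. */
  enqueue(chunk: Uint8Array, generation: number): boolean {
    if (generation !== this.generation) return false; // stale audio, drop it
    this.chunks.push(chunk);
    return true;
  }

  /** Next chunk to play, or undefined when the queue is empty. */
  next(): Uint8Array | undefined {
    return this.chunks.shift();
  }

  /** The user started speaking: discard pending audio and start a new generation. */
  interrupt(): number {
    this.chunks.length = 0;
    this.generation += 1;
    return this.generation;
  }
}
```

The generation check matters because audio for the old reply may still be in flight on the network when the interruption happens; clearing the queue alone would let those late chunks play.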

📦 Packages

Modular architecture with specialized packages for different use cases

Core Packages

Essential packages for browser and server-side implementation

@micdrop/client, @micdrop/server

AI Integrations

Ready-to-use integrations with popular AI providers

@micdrop/openai, @micdrop/ai-sdk, @micdrop/elevenlabs, @micdrop/cartesia, @micdrop/mistral, @micdrop/gladia

Utilities

React hooks and utilities for seamless frontend integration

@micdrop/react

Just call start() and your app has complete voice AI.

Micdrop handles all the complexity for you.

Low Latency

Optimized for minimal delay with streaming audio processing, voice activity detection (VAD), and efficient WebSocket communication for near real-time voice interactions.

AI Provider Agnostic

Choose the best AI providers for your use case. Mix and match OpenAI, ElevenLabs, Cartesia, Mistral, Gladia, and more. Build custom integrations using our abstract interfaces.
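
As a rough picture of what a custom integration could look like, the sketch below defines an abstract base class with one method for providers to implement and shared streaming logic on top. TextToSpeechProvider, synthesize, stream, and SilentTts are all hypothetical names chosen for this example; the real abstract interfaces in the Micdrop packages may have a different shape.

```typescript
// Hypothetical base class; not the actual abstract interface shipped by Micdrop.
abstract class TextToSpeechProvider {
  /** Provider-specific part: turn one sentence into audio bytes. */
  abstract synthesize(sentence: string): Promise<Uint8Array>;

  /**
   * Shared part: split the text into sentences and synthesize them one by one,
   * so the first audio chunk is available before the whole text is processed.
   */
  async *stream(text: string): AsyncGenerator<Uint8Array> {
    const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
    for (const raw of sentences) {
      const sentence = raw.trim();
      if (sentence !== "") {
        yield await this.synthesize(sentence);
      }
    }
  }
}

/** Toy provider for local testing: returns one zero byte per character. */
class SilentTts extends TextToSpeechProvider {
  async synthesize(sentence: string): Promise<Uint8Array> {
    return new Uint8Array(sentence.length);
  }
}
```

A real provider would replace the body of synthesize with a call to its vendor's API, while callers keep using stream without knowing which provider is behind it.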

Developer Experience

Built with TypeScript for excellent DX, with comprehensive documentation, React hooks, demo applications, and detailed examples to get you started quickly.

🎥 See it in Action

Watch the creator explain Micdrop and voice AI technology
