How It Works
Micdrop orchestrates a complete voice conversation pipeline. Voice data flows through each component in real time, from your microphone to the AI response and back to your speakers.
🎤 Voice Input
The client captures microphone input, voice activity detection (VAD) detects speech, and audio chunks are sent to the server for processing.
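To make the VAD step concrete, here is a minimal sketch of an energy-based speech detector. It is only an illustration of the general idea: the names `rms` and `selectSpeechFrames` and the `0.02` threshold are assumptions, not part of Micdrop's API, and the detector Micdrop actually ships may work differently.

```typescript
// Hypothetical helpers for illustration; not Micdrop's actual VAD.

// Root-mean-square energy of one audio frame (samples expected in [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumOfSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / frame.length);
}

// Keeps only the frames whose energy exceeds the threshold, i.e. the
// chunks a client would forward to the server. The threshold is an
// assumed value and would need tuning for a real microphone.
function selectSpeechFrames(
  frames: Float32Array[],
  threshold: number = 0.02
): Float32Array[] {
  return frames.filter((frame) => rms(frame) > threshold);
}
```

A real detector would typically add smoothing over several frames so that short pauses inside a sentence do not cut the utterance.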
🤖 AI Processing
The server orchestrates speech-to-text (STT) transcription, AI agent reasoning, tool calls, and text-to-speech (TTS) generation to produce natural responses.
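The server-side flow described above can be sketched as a chain of three stages. The interfaces `SpeechToText`, `Agent`, and `TextToSpeech`, the function `handleUtterance`, and the stub providers are all hypothetical names chosen for this example; they are not Micdrop's actual types, and tool calls and streaming are left out for brevity.

```typescript
// Hypothetical interfaces for illustration; not Micdrop's actual types.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface Agent {
  answer(text: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

// One conversational turn: audio in, transcript, reply text, audio out.
async function handleUtterance(
  audio: Uint8Array,
  stt: SpeechToText,
  agent: Agent,
  tts: TextToSpeech
): Promise<Uint8Array> {
  const transcript = await stt.transcribe(audio);
  const reply = await agent.answer(transcript);
  return tts.synthesize(reply);
}

// Stub providers standing in for real STT, LLM, and TTS services.
const stubStt: SpeechToText = {
  transcribe: async (audio) => `heard ${audio.length} bytes`,
};
const stubAgent: Agent = {
  answer: async (text) => `You said: ${text}`,
};
const stubTts: TextToSpeech = {
  // Produces one silent byte per character of the reply.
  synthesize: async (text) => new Uint8Array(text.length),
};
```

Because each stage only depends on an interface, any stub can be swapped for a real provider without touching `handleUtterance`.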
🔊 Voice Output
Generated audio streams back to the client for playback, with full support for interruptions and real-time interaction.
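One simple way to picture interruption support is a playback queue that drops any audio not yet played as soon as the user starts speaking. The `PlaybackQueue` class below is a hypothetical sketch of that idea, not Micdrop's implementation.

```typescript
// Hypothetical class for illustration; not part of Micdrop's API.
class PlaybackQueue {
  private chunks: Uint8Array[] = [];

  // Adds a chunk of generated audio received from the server.
  enqueue(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  // Returns the next chunk to play, or undefined when the queue is empty.
  next(): Uint8Array | undefined {
    return this.chunks.shift();
  }

  // Called when the user starts speaking: discards everything that has
  // not been played yet and reports how many chunks were dropped.
  interrupt(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  // Number of chunks still waiting to be played.
  pending(): number {
    return this.chunks.length;
  }
}
```

In a full system the interruption would presumably also be signalled to the server so that it stops generating audio for the abandoned reply.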
📦 Packages
Modular architecture with specialized packages for different use cases
Core Packages
Essential packages for browser and server-side implementation
AI Integrations
Ready-to-use integrations with popular AI providers
start() and your app has complete voice AI.
Low Latency
Optimized for minimal delay with streaming audio processing, voice activity detection (VAD), and efficient WebSocket communication for near real-time voice interactions.
AI Provider Agnostic
Choose the best AI providers for your use case. Mix and match OpenAI, ElevenLabs, Cartesia, Mistral, Gladia, and more. Build custom integrations using our abstract interfaces.
Developer Experience
Built with TypeScript for excellent DX. Comprehensive documentation, React hooks, demo applications, and detailed examples to get you started quickly.