How It Works
Micdrop orchestrates a complete voice conversation pipeline. Voice data flows through each component in real time, from your microphone to the AI's responses and back to your speakers.
🎤 Voice Input
Client captures microphone input, VAD detects speech, and audio chunks are sent to the server for processing.
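As a rough illustration of the voice activity detection step, here is a minimal energy-threshold sketch. It is not Micdrop's actual VAD: the helper names `rms` and `selectSpeechFrames` and the `0.02` threshold are assumptions made for this example, and a production VAD is typically more sophisticated.

```typescript
// Hypothetical helper: root-mean-square energy of one audio frame.
// Samples are assumed to be floats in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

// Hypothetical helper: keep only the frames whose energy exceeds the threshold.
// In a voice client, these would be the chunks forwarded to the server.
function selectSpeechFrames(
  frames: Float32Array[],
  threshold: number = 0.02 // assumed value, tune per microphone and environment
): Float32Array[] {
  return frames.filter((frame) => rms(frame) > threshold);
}

// Usage: one silent frame and one loud frame.
const silence = new Float32Array(160); // all zeros
const speech = new Float32Array(160).fill(0.5);
const forwarded = selectSpeechFrames([silence, speech]);
```

Filtering on the client keeps silent audio off the network, which is one reason VAD sits before the upload step in the flow described above.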
🤖 AI Processing
Server orchestrates STT transcription, AI agent reasoning, tool calls, and TTS generation for natural responses.
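The server-side flow can be sketched as three stages chained together. The interfaces below (`SpeechToText`, `Agent`, `TextToSpeech`) and the `handleTurn` function are hypothetical names chosen for this example, not Micdrop's real API, and the stubs stand in for real providers. Tool calls and streaming are left out to keep the sketch short.

```typescript
// Hypothetical stage interfaces; real provider integrations would implement these.
interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface Agent {
  answer(transcript: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

// One conversational turn: audio in, transcript, agent reply, audio out.
async function handleTurn(
  audio: Uint8Array,
  stt: SpeechToText,
  agent: Agent,
  tts: TextToSpeech
): Promise<Uint8Array> {
  const transcript = await stt.transcribe(audio);
  const reply = await agent.answer(transcript);
  return tts.synthesize(reply);
}

// Stub providers so the sketch runs without any network access.
const stubStt: SpeechToText = {
  transcribe: async (audio) => `${audio.length} bytes of audio`,
};
const stubAgent: Agent = {
  answer: async (transcript) => `You said: ${transcript}`,
};
const stubTts: TextToSpeech = {
  // Pretend each character of text becomes one byte of audio.
  synthesize: async (text) => new Uint8Array(text.length),
};
```

Because each stage sits behind its own interface, any one of them can be swapped for a different provider without touching the other two.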
🔊 Voice Output
Generated audio streams back to the client for playback, with full support for interruptions and real-time interaction.
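One common way to support interruptions is to buffer incoming audio in a queue and discard whatever has not been played yet as soon as the user starts speaking. The `PlaybackQueue` class below is a hypothetical sketch of that idea, not Micdrop's implementation.

```typescript
// Hypothetical playback buffer that can be flushed when the user interrupts.
class PlaybackQueue {
  private chunks: Uint8Array[] = [];

  // Called for each audio chunk received from the server.
  enqueue(chunk: Uint8Array): void {
    this.chunks.push(chunk);
  }

  // Called by the audio output when it is ready for more data.
  next(): Uint8Array | undefined {
    return this.chunks.shift();
  }

  // Called when the user starts speaking: drop all unplayed audio
  // and report how many chunks were discarded.
  interrupt(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  // Number of chunks still waiting to be played.
  pending(): number {
    return this.chunks.length;
  }
}

// Usage: three chunks arrive, one is played, then the user interrupts.
const queue = new PlaybackQueue();
queue.enqueue(new Uint8Array([1]));
queue.enqueue(new Uint8Array([2]));
queue.enqueue(new Uint8Array([3]));
const played = queue.next();
const discarded = queue.interrupt();
```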
📦 Packages
Modular architecture with specialized packages for different use cases
Core Packages
Essential packages for browser and server-side implementation
AI Integrations
Ready-to-use integrations with popular AI providers
start() and your app has complete voice AI.
Low Latency
Optimized for minimal delay with streaming audio processing, voice activity detection (VAD), and efficient WebSocket communication for near real-time voice interactions.
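To make the streaming idea concrete, the sketch below splits a PCM buffer into small fixed-duration frames that could be sent over a WebSocket one at a time instead of waiting for a full recording. The 16 kHz sample rate and 20 ms frame length are assumptions for this example, and `samplesPerFrame` and `splitIntoFrames` are hypothetical helpers, not part of Micdrop.

```typescript
// Number of samples in one frame of the given duration.
function samplesPerFrame(sampleRateHz: number, frameMs: number): number {
  return Math.floor((sampleRateHz * frameMs) / 1000);
}

// Split 16-bit PCM audio into frames of at most `frameSize` samples.
// The last frame may be shorter. Frames are views over the input, not copies.
function splitIntoFrames(pcm: Int16Array, frameSize: number): Int16Array[] {
  if (frameSize <= 0) throw new RangeError("frameSize must be positive");
  const frames: Int16Array[] = [];
  for (let start = 0; start < pcm.length; start += frameSize) {
    frames.push(pcm.subarray(start, start + frameSize));
  }
  return frames;
}

// Usage under the assumed settings: 16 kHz audio, 20 ms frames.
const frameSize = samplesPerFrame(16000, 20);
const frames = splitIntoFrames(new Int16Array(1000), frameSize);
```

Smaller frames reduce the delay before the server can start transcribing, at the cost of more messages on the wire.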
AI Provider Agnostic
Choose the best AI providers for your use case. Mix and match OpenAI, ElevenLabs, Cartesia, Mistral, Gladia, and more. Build custom integrations using our abstract interfaces.
Developer Experience
Built with TypeScript for excellent DX. Comprehensive documentation, React hooks, demo applications, and detailed examples to get you started quickly.
Build Sovereign Voice AI
Micdrop is well suited to building sovereign voice AI applications, especially in French. Combine European AI providers to keep your data and infrastructure fully within EU borders.
Mistral
Agent (LLM)
French LLM for conversational AI
Gladia
Speech-to-Text
French STT with 90+ languages
Gradium
Text-to-Speech
French TTS with natural voices
100% open-source stack. No data leaves the EU. Full GDPR compliance.