AI Integration

Integrate speech-to-text, text-to-speech, and AI agents from multiple providers, or build your own custom implementations.

Overview

Micdrop provides a modular AI architecture allowing you to:

  • Mix and match providers for optimal cost and quality
  • Build custom integrations using abstract base classes

Provider Categories

Provided Integrations

Ready-to-use implementations for popular AI services:

Speech-to-Text (STT):

  • Gladia - Fast, accurate multilingual transcription
  • OpenAI Whisper - High-quality speech recognition

Text-to-Speech (TTS):

  • ElevenLabs - High-quality voices
  • Cartesia - Low-latency streaming

AI Agents (LLM):

  • OpenAI - GPT models for conversation
  • Mistral - Open-source and commercial LLMs

Custom Integrations

Build your own STT, TTS, or agent integrations by extending the abstract base classes.
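To illustrate the abstract-base-class pattern, here is a minimal, self-contained sketch. The names `SketchAgent`, `EchoAgent`, `generate`, `answer`, and `turnCount` are hypothetical and chosen for this example only; Micdrop's actual base classes and method signatures may differ, so check the package reference before extending them. The sketch is synchronous for brevity, whereas a real provider call would be asynchronous.

```typescript
// Hypothetical base class for illustration only; not Micdrop's real API.
type Turn = { role: 'user' | 'assistant'; content: string }

abstract class SketchAgent {
  protected history: Turn[] = []

  // Subclasses implement only the provider-specific part.
  protected abstract generate(prompt: string): string

  // Shared bookkeeping lives in the base class.
  answer(userText: string): string {
    this.history.push({ role: 'user', content: userText })
    const reply = this.generate(userText)
    this.history.push({ role: 'assistant', content: reply })
    return reply
  }

  turnCount(): number {
    return this.history.length
  }
}

// A trivial custom implementation: echoes the user's message back.
class EchoAgent extends SketchAgent {
  protected generate(prompt: string): string {
    return `You said: ${prompt}`
  }
}
```

The design point is that the base class owns the conversation state, so a custom integration only has to supply the provider-specific `generate` step.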

Quick Start

Basic Setup

import { MicdropServer } from '@micdrop/server'
import { OpenaiAgent } from '@micdrop/openai'
import { ElevenLabsTTS } from '@micdrop/elevenlabs'
import { GladiaSTT } from '@micdrop/gladia'

new MicdropServer(socket, {
  // AI Agent for conversation
  agent: new OpenaiAgent({
    apiKey: process.env.OPENAI_API_KEY,
    model: 'gpt-4-turbo-preview',
  }),

  // Speech-to-Text
  stt: new GladiaSTT({
    apiKey: process.env.GLADIA_API_KEY,
    language: 'en',
  }),

  // Text-to-Speech
  tts: new ElevenLabsTTS({
    apiKey: process.env.ELEVENLABS_API_KEY,
    voiceId: 'voice-id-here',
  }),
})

Cost-Optimized Setup

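// Package names below are assumed to follow the @micdrop/<provider>
// pattern used in Basic Setup
import { MicdropServer } from '@micdrop/server'
import { MistralAgent } from '@micdrop/mistral'
import { GladiaSTT } from '@micdrop/gladia'
import { CartesiaTTS } from '@micdrop/cartesia'

// Use different providers for optimal cost/quality balance
new MicdropServer(socket, {
  // Cost-effective LLM
  agent: new MistralAgent({
    apiKey: process.env.MISTRAL_API_KEY,
    model: 'mistral-large-latest',
  }),

  // Fast, affordable STT
  stt: new GladiaSTT({
    apiKey: process.env.GLADIA_API_KEY,
  }),

  // Low-latency TTS
  tts: new CartesiaTTS({
    apiKey: process.env.CARTESIA_API_KEY,
    voiceId: 'cartesia-voice-id',
  }),
})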

Models Comparison

Speech-to-Text Comparison

| Provider | Latency | Languages | Cost | Best For |
|---|---|---|---|---|
| Gladia 🇫🇷 | ~200ms | 99+ | $ | General use, multilingual |
| OpenAI Whisper | ~300ms | 57 | $$ | High accuracy, multilingual |

Text-to-Speech Comparison

| Provider | Latency | Quality | Voices | Best For |
|---|---|---|---|---|
| ElevenLabs | ~400ms | Excellent | 1000+ | High-quality voices |
| Cartesia | ~150ms | Good | 50+ | Low-latency streaming |
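The latency figures in the speech-to-text and text-to-speech comparisons above can be compared in code when latency is the main concern. The following is a rough sketch: the `LatencyEntry` type and the `fastest` helper are hypothetical (not part of Micdrop), and the numbers are the approximate values quoted in the tables, not measurements.

```typescript
// Approximate figures copied from the comparison tables; not measured values.
interface LatencyEntry {
  provider: string
  latencyMs: number
}

const sttOptions: LatencyEntry[] = [
  { provider: 'Gladia', latencyMs: 200 },
  { provider: 'OpenAI Whisper', latencyMs: 300 },
]

const ttsOptions: LatencyEntry[] = [
  { provider: 'ElevenLabs', latencyMs: 400 },
  { provider: 'Cartesia', latencyMs: 150 },
]

// Hypothetical helper: returns the entry with the smallest latency.
function fastest(options: LatencyEntry[]): LatencyEntry {
  if (options.length === 0) throw new Error('no options given')
  return options.reduce((best, o) => (o.latencyMs < best.latencyMs ? o : best))
}

// Rough lower bound for the STT and TTS stages combined,
// ignoring LLM response time and network overhead.
const speechStagesMs = fastest(sttOptions).latencyMs + fastest(ttsOptions).latencyMs
```

With these figures the helper selects Gladia and Cartesia, the same pairing used in the Cost-Optimized Setup.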

AI Agent Comparison

| Provider | Speed | Quality | Cost | Best For |
|---|---|---|---|---|
| OpenAI GPT-4 | Medium | Excellent | $$$ | Complex reasoning |
| OpenAI GPT-3.5 | Fast | Good | $ | Simple conversations |
| Mistral Large 🇫🇷 | Fast | Excellent | $$ | Cost-effective quality |
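One way to read the agent comparison is as a constraint problem: pick the cheapest model that meets a minimum quality bar. The sketch below encodes the table's ratings under that assumption. `AgentOption`, `qualityRank`, and `cheapestAgent` are hypothetical names for this example, and `costTier` simply counts the "$" signs in the table.

```typescript
type Quality = 'Good' | 'Excellent'

interface AgentOption {
  name: string
  quality: Quality
  costTier: number // number of "$" signs in the comparison table
}

const agentOptions: AgentOption[] = [
  { name: 'OpenAI GPT-4', quality: 'Excellent', costTier: 3 },
  { name: 'OpenAI GPT-3.5', quality: 'Good', costTier: 1 },
  { name: 'Mistral Large', quality: 'Excellent', costTier: 2 },
]

// Ordinal ranking so quality labels can be compared numerically.
const qualityRank: Record<Quality, number> = { Good: 1, Excellent: 2 }

// Hypothetical helper: cheapest option whose quality is at least `minQuality`.
function cheapestAgent(
  minQuality: Quality,
  options: AgentOption[] = agentOptions
): AgentOption | undefined {
  // filter() returns a new array, so sorting it leaves `options` untouched.
  const eligible = options.filter(
    (o) => qualityRank[o.quality] >= qualityRank[minQuality]
  )
  return eligible.sort((a, b) => a.costTier - b.costTier)[0]
}
```

Under these ratings, requiring `'Excellent'` quality selects Mistral Large, while accepting `'Good'` selects OpenAI GPT-3.5.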