
OpenAI

OpenAI implementation for @micdrop/server.

This package provides AI agent and speech-to-text implementations using OpenAI's API.

Installation

npm install @micdrop/openai

OpenAI Agent

Usage

import { OpenaiAgent } from '@micdrop/openai'
import { MicdropServer } from '@micdrop/server'

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  model: 'gpt-4o', // Default model
  systemPrompt: 'You are a helpful assistant',

  // Custom OpenAI Responses API settings (optional)
  settings: {
    temperature: 0.7,
    max_output_tokens: 150,
  },
})

// Use with MicdropServer
new MicdropServer(socket, {
  agent,
  // ... other options
})

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | Required* | Your OpenAI API key (required if openai not provided) |
| openai | OpenAI | Optional | OpenAI instance (alternative to apiKey) |
| model | string | 'gpt-4o' | OpenAI model to use |
| systemPrompt | string | Required | System prompt for the agent |
| maxRetry | number | 3 | Maximum number of retries on API failures |
| maxSteps | number | 5 | Maximum number of steps (for tool calls) |
| autoEndCall | boolean \| string | false | Auto-detect when user wants to end call |
| autoSemanticTurn | boolean \| string | false | Handle incomplete user sentences |
| autoIgnoreUserNoise | boolean \| string | false | Filter meaningless user sounds |
| extract | ExtractJsonOptions \| ExtractTagOptions | undefined | Extract structured data from responses |
| onBeforeAnswer | function | undefined | Hook called before answer generation; return true to skip generation |
| settings | object | {} | Additional OpenAI Responses API parameters |
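
To make the maxRetry option more concrete, here is a minimal sketch of a bounded retry loop. It is not the package's implementation: withRetry and baseDelayMs are hypothetical names, the exponential backoff is an assumption, and so is the reading that maxRetry counts retries after one initial attempt.

```typescript
// Hypothetical helper for illustration; not exported by @micdrop/openai.
async function withRetry<T>(
  call: () => Promise<T>,
  maxRetry = 3, // mirrors the documented default
  baseDelayMs = 100 // assumed backoff base, not a documented option
): Promise<T> {
  let lastError: unknown
  // Assumption: one initial attempt plus up to maxRetry retries.
  for (let attempt = 0; attempt <= maxRetry; attempt++) {
    try {
      return await call()
    } catch (error) {
      lastError = error
      if (attempt < maxRetry) {
        // Exponential backoff: baseDelayMs, then 2x, 4x, ...
        const delay = baseDelayMs * 2 ** attempt
        await new Promise((resolve) => setTimeout(resolve, delay))
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError
}
```

Under these assumptions, a call that fails on every attempt runs maxRetry + 1 times before the last error is rethrown.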

The OpenAI Agent supports adding and removing custom tools to extend its capabilities. For detailed information about tool management, see the Tools documentation.

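Advanced Features

The OpenAI Agent supports advanced features for improved conversation handling. Each is enabled through an option listed in the table above: autoEndCall detects when the user wants to end the call, autoSemanticTurn handles incomplete user sentences, autoIgnoreUserNoise filters meaningless user sounds, and extract pulls structured data out of responses.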
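
As an illustration of what extracting structured data from a response can look like, the sketch below separates a tagged block from the text around it. It is only a conceptual example: extractTag and TagExtraction are hypothetical names, and the actual shapes of ExtractJsonOptions and ExtractTagOptions are not shown here.

```typescript
// Hypothetical result shape for illustration; not the package's types.
interface TagExtraction {
  message: string // response text with the tagged block removed
  data: string | undefined // raw content found between the tags, if any
}

// Hypothetical helper: splits `<tag>...</tag>` out of a response string.
function extractTag(response: string, tag: string): TagExtraction {
  const open = `<${tag}>`
  const close = `</${tag}>`
  const start = response.indexOf(open)
  const end = start === -1 ? -1 : response.indexOf(close, start + open.length)

  // No complete tagged block: return the text unchanged.
  if (start === -1 || end === -1) {
    return { message: response.trim(), data: undefined }
  }

  const data = response.slice(start + open.length, end).trim()
  const message = (
    response.slice(0, start) + response.slice(end + close.length)
  ).trim()
  return { message, data }
}

const extracted = extractTag(
  'Your order is confirmed. <order>{"id": 42}</order>',
  'order'
)
console.log(extracted.message) // "Your order is confirmed."
console.log(extracted.data) // {"id": 42}
```

Splitting this way keeps the text intended for the user separate from the payload, which can then be parsed (for example with JSON.parse) on its own.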

Langfuse Integration

You can integrate Langfuse for observability by using the openai option with a Langfuse-wrapped OpenAI client:

import { OpenaiAgent } from '@micdrop/openai'
import { Langfuse, observeOpenAI } from 'langfuse'
import OpenAI from 'openai'

// Initialize Langfuse
const langfuse = new Langfuse({
  secretKey: process.env.LANGFUSE_SECRET_KEY,
  publicKey: process.env.LANGFUSE_PUBLIC_KEY,
  baseUrl: process.env.LANGFUSE_BASE_URL, // Optional, defaults to https://cloud.langfuse.com
})

// Get system prompt from Langfuse
const systemPrompt = await langfuse.getPrompt('voice-assistant-system-prompt')

// Create OpenAI client and wrap with Langfuse observability
const openai = observeOpenAI(
  new OpenAI({ apiKey: process.env.OPENAI_API_KEY }),
  {
    sessionId: 'session-123',
    userId: 'user-456',
  }
)

// Create agent with Langfuse-wrapped OpenAI client
const agent = new OpenaiAgent({
  openai,
  model: 'gpt-4o',
  systemPrompt: systemPrompt.prompt,
})

With this setup, Langfuse automatically tracks all OpenAI API calls, token usage, and conversation flows, and shows them in your Langfuse dashboard with the session and user context passed to observeOpenAI.

OpenAI STT (Speech-to-Text)

Real-time speech-to-text implementation using OpenAI's WebSocket-based real-time transcription API.

Usage

import { OpenaiSTT } from '@micdrop/openai'
import { MicdropServer } from '@micdrop/server'

const stt = new OpenaiSTT({
  apiKey: process.env.OPENAI_API_KEY || '',
  model: 'gpt-4o-transcribe', // Default real-time transcription model
  language: 'en', // Optional: specify language for better accuracy
  prompt: 'Transcribe the incoming audio in real time.', // Optional: custom prompt
  transcriptionTimeout: 4000, // Optional: timeout in ms for transcription
})

// Use with MicdropServer
new MicdropServer(socket, {
  stt,
  // ... other options
})

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | Required | Your OpenAI API key |
| model | string | 'gpt-4o-transcribe' | Real-time transcription model to use |
| language | string | 'en' | Language code for transcription |
| prompt | string | 'Transcribe the incoming audio in real time.' | Custom prompt to guide transcription behavior |
| transcriptionTimeout | number | 4000 | Timeout in milliseconds to wait for transcription result |
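
The idea behind transcriptionTimeout can be sketched as a race between the pending transcription and a timer. This is not how OpenaiSTT is implemented: withTimeout is a hypothetical helper, and resolving to a fallback value such as an empty string on timeout is an assumption about one reasonable behavior.

```typescript
// Hypothetical helper for illustration; not exported by @micdrop/openai.
async function withTimeout<T>(
  pending: Promise<T>,
  fallback: T, // assumed behavior: resolve to this value on timeout
  timeoutMs = 4000 // mirrors the documented default
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), timeoutMs)
  })
  try {
    // Whichever settles first wins: the real result or the timer.
    return await Promise.race([pending, timeout])
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer)
  }
}
```

A caller could wrap a pending transcript as `await withTimeout(transcriptPromise, '', 4000)`, where transcriptPromise is a placeholder for whatever promise yields the final text.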