OpenAI

OpenAI implementation for @micdrop/server.

This package provides AI agent and speech-to-text implementations using OpenAI's API.

Installation

npm install @micdrop/openai

OpenAI Agent

Usage

import { OpenaiAgent } from '@micdrop/openai'
import { MicdropServer } from '@micdrop/server'

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  model: 'gpt-4o', // Default model
  systemPrompt: 'You are a helpful assistant',

  // Advanced features (optional)
  autoEndCall: true, // Automatically end call when user requests
  autoSemanticTurn: true, // Handle incomplete sentences
  autoIgnoreUserNoise: true, // Filter out meaningless sounds

  // Custom OpenAI settings (optional)
  settings: {
    temperature: 0.7,
    max_output_tokens: 150,
  },
})

// Use with MicdropServer
new MicdropServer(socket, {
  agent,
  // ... other options
})
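
The socket argument is a server-side WebSocket connection. As a minimal, hypothetical sketch of where it might come from, using the ws package (this setup is illustrative and not prescribed by @micdrop/server):

import { WebSocketServer } from 'ws'

// Illustrative: create one MicdropServer per incoming connection
const wss = new WebSocketServer({ port: 8080 })

wss.on('connection', (socket) => {
  new MicdropServer(socket, { agent })
})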

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | Required | Your OpenAI API key |
| model | string | 'gpt-4o' | OpenAI model to use |
| systemPrompt | string | Required | System prompt for the agent |
| autoEndCall | boolean \| string | false | Auto-detect when user wants to end call |
| autoSemanticTurn | boolean \| string | false | Handle incomplete user sentences |
| autoIgnoreUserNoise | boolean \| string | false | Filter meaningless user sounds |
| extract | ExtractJsonOptions \| ExtractTagOptions | undefined | Extract structured data from responses |
| settings | object | {} | Additional OpenAI API parameters |

The OpenAI Agent supports adding and removing custom tools to extend its capabilities.

Adding Tools

Use addTool(tool: Tool) to add custom functions that the agent can call during conversations:

import { OpenaiAgent } from '@micdrop/openai'
import { z } from 'zod'

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: 'You are a helpful assistant that can manage user information.',
})

// Add a simple tool without parameters
agent.addTool({
  name: 'get_time',
  description: 'Get the current time',
  callback: () => new Date().toLocaleTimeString(),
})

// Add a tool with typed parameters using Zod schema
agent.addTool({
  name: 'set_user_info',
  description: 'Save user information to the database',
  parameters: z.object({
    city: z.string().describe('City'),
    jobTitle: z.string().describe('Job title').nullable(),
    experience: z
      .number()
      .describe('Number of years of experience of the user')
      .nullable(),
  }),
  callback: ({ city, jobTitle, experience }) => {
    // Your implementation here
    console.log('Saving user:', { city, jobTitle, experience })
    return { success: true, message: 'User information saved' }
  },
})

// Add a tool that returns data for the conversation
agent.addTool({
  name: 'search_database',
  description: 'Search for items in the database',
  parameters: z.object({
    query: z.string().describe('Search query'),
    limit: z.number().default(10).describe('Maximum number of results'),
  }),
  callback: async ({ query, limit }) => {
    // Your search implementation
    const results = await searchDatabase(query, limit)
    return { results, total: results.length }
  },
  emitOutput: true, // Enable tool call events
})
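
The searchDatabase call above is a placeholder for your own data access layer. A minimal, hypothetical in-memory stand-in might look like:

interface Item {
  id: number
  title: string
}

const items: Item[] = [
  { id: 1, title: 'Getting started guide' },
  { id: 2, title: 'Billing FAQ' },
]

// Hypothetical search over an in-memory list
async function searchDatabase(query: string, limit: number): Promise<Item[]> {
  const q = query.toLowerCase()
  return items
    .filter((item) => item.title.toLowerCase().includes(q))
    .slice(0, limit)
}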

Tool Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | Required | Unique name for the tool |
| description | string | Required | Description of what the tool does |
| parameters | z.ZodObject | Optional | Zod schema for parameter validation |
| callback | (input) => any \| Promise<any> | Required | Function to execute when tool is called |
| skipAnswer | boolean | false | Skip assistant response after tool call |
| emitOutput | boolean | false | Emit ToolCall events for monitoring |

Tip: If emitOutput is true, the tool call output is also sent to the client and is available via the ToolCall event.

Removing Tools

Use removeTool(name: string) to remove tools by name:

// Remove a specific tool
agent.removeTool('get_time')

Getting Tools

Use getTool(name: string) to retrieve a tool by name:

// Get a specific tool (undefined if not found)
const tool = agent.getTool('get_time')
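
For example, getTool can guard against registering the same tool twice:

// Only register the tool if it isn't already present
if (!agent.getTool('get_time')) {
  agent.addTool({
    name: 'get_time',
    description: 'Get the current time',
    callback: () => new Date().toLocaleTimeString(),
  })
}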

Tool Call Events

Monitor tool executions in real-time by enabling the emitOutput option and listening for ToolCall events:

import { OpenaiAgent } from '@micdrop/openai'
import { z } from 'zod'

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: 'You are a helpful assistant with access to tools.',
})

// Add tool with emitOutput enabled
agent.addTool({
  name: 'save_user_data',
  description: 'Save user data to the system',
  parameters: z.object({
    name: z.string(),
    email: z.string().email(),
  }),
  callback: ({ name, email }) => {
    // Your implementation here
    return { success: true, name, email }
  },
  emitOutput: true, // Enable events for this tool
})

// Listen for tool call event
agent.on('ToolCall', (toolCall) => {
  console.log(`Tool called: ${toolCall.name}`)
  console.log('Parameters:', toolCall.parameters)
  console.log('Output:', toolCall.output)

  if (toolCall.name === 'save_user_data') {
    // Sync data to external systems
    syncToDatabase(toolCall.output)
  }
})

Tip: It may be easier to use the tool's callback option if you don't need to emit the output to the client.

Tool Call Event Structure

The ToolCall event provides complete information about tool execution:

interface ToolCall {
  name: string // Tool name that was called
  parameters: any // Parameters passed to the tool
  output: any // Result returned by the tool callback
}

Auto End Call

When enabled, the agent automatically detects when the user wants to end the conversation and triggers the call termination:

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: 'You are a helpful assistant',
  autoEndCall: true, // Use default detection
  // or provide custom prompt:
  // autoEndCall: 'User is saying goodbye or wants to hang up',
})

Semantic Turn Detection

Handles cases where users speak incomplete sentences, allowing for more natural conversation flow:

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: 'You are a helpful assistant',
  autoSemanticTurn: true, // Wait for complete thoughts
  // or provide custom prompt:
  // autoSemanticTurn: 'Last user message is an incomplete sentence',
})

User Noise Filtering

Filters out meaningless sounds like "uh", "hmm", "ahem" that don't carry conversational meaning:

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: 'You are a helpful assistant',
  autoIgnoreUserNoise: true, // Ignore filler sounds
  // or provide custom prompt:
  // autoIgnoreUserNoise: 'Last user message is just an interjection',
})

Extract Options

The OpenAI Agent can extract structured data from assistant responses, such as JSON objects or content between custom tags. The extracted data can be processed via callbacks and saved to message metadata.

By outputting the message first and then the data to extract, you can maintain low latency: Micdrop streams the answer immediately and stops streaming when the data to extract begins.

JSON Extraction

Extract JSON objects from the end of assistant responses:

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: `You are a helpful assistant that extracts user information.
When collecting user details, append them as JSON at the end of your response.`,
  extract: {
    json: true,
    callback: (data) => {
      console.log('Extracted data:', data)
    },
    saveInMetadata: true,
  },
})

Example conversation:

  • Input: "I'm John, 25 years old, living in Paris"
  • Received output: "Nice to meet you John! I've noted your information. {"name": "John", "age": 25, "city": "Paris"}"
  • Message: {"role": "assistant", "content": "Nice to meet you John! I've noted your information.", "metadata": {"extracted": {"name": "John", "age": 25, "city": "Paris"}}}
  • Callback: {"name": "John", "age": 25, "city": "Paris"}
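
Because the model generates the JSON itself, it can be worth validating the extracted payload in the callback before using it. A minimal sketch with Zod (the UserInfo schema and saveUserInfo helper are hypothetical):

import { OpenaiAgent } from '@micdrop/openai'
import { z } from 'zod'

// Hypothetical schema for the extracted payload
const UserInfo = z.object({
  name: z.string(),
  age: z.number(),
  city: z.string(),
})

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: `You are a helpful assistant that extracts user information.
When collecting user details, append them as JSON at the end of your response.`,
  extract: {
    json: true,
    callback: (data) => {
      const parsed = UserInfo.safeParse(data)
      if (parsed.success) {
        saveUserInfo(parsed.data) // Hypothetical persistence helper
      } else {
        console.warn('Extracted JSON failed validation:', parsed.error)
      }
    },
  },
})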

Custom Tag Extraction

Extract content between custom start and end tags:

const agent = new OpenaiAgent({
  apiKey: process.env.OPENAI_API_KEY || '',
  systemPrompt: `You are a task management assistant.
When creating tasks, wrap the task details in <TASK></TASK> tags at the end.`,
  extract: {
    startTag: '<TASK>',
    endTag: '</TASK>',
    callback: (taskData) => {
      console.log('New task created:', taskData)
      createTask(taskData)
    },
    saveInMetadata: true,
  },
})

Example conversation:

  • Input: "Remind me to call mom tomorrow at 3pm"
  • Received output: "I'll create that reminder for you! <TASK>Call mom tomorrow at 3pm - priority: normal</TASK>"
  • Message: {"role": "assistant", "content": "I'll create that reminder for you!", "metadata": {"extracted": "Call mom tomorrow at 3pm - priority: normal"}}
  • Callback: "Call mom tomorrow at 3pm - priority: normal"
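
The createTask function used above is your own code. A minimal, hypothetical version that splits off the priority suffix could look like:

// Hypothetical task handler; splits an optional "- priority: ..." suffix
function createTask(taskData: string) {
  const [description, priority = 'normal'] = taskData.split(/\s*-\s*priority:\s*/)
  console.log('Creating task:', { description: description.trim(), priority })
}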

Extract Options

| Option | Type | Description |
| --- | --- | --- |
| json | boolean | Extract JSON objects (uses { and } as tags) |
| startTag | string | Custom start tag for extraction |
| endTag | string | Custom end tag for extraction |
| callback | (value: any) => void | Function called with extracted data |
| saveInMetadata | boolean | Save extracted data to message metadata |

Best Practices

  • Provide clear system prompts about where and how to include extractable data
  • Extracted content must be at the end of responses
  • Keep in mind that metadata is also passed to the client, so the user can access it

OpenAI STT (Speech-to-Text)

Usage

import { OpenaiSTT } from '@micdrop/openai'
import { MicdropServer } from '@micdrop/server'

const stt = new OpenaiSTT({
  apiKey: process.env.OPENAI_API_KEY || '',
  model: 'whisper-1', // Default Whisper model
  language: 'en', // Optional: specify language for better accuracy
})

// Use with MicdropServer
new MicdropServer(socket, {
  stt,
  // ... other options
})

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | Required | Your OpenAI API key |
| model | string | 'whisper-1' | Whisper model to use |
| language | string | Optional | Language code for transcription |
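
The agent and STT implementations are typically combined in a single MicdropServer. A minimal sketch putting both together (the socket comes from your WebSocket server, as sketched earlier):

import { OpenaiAgent, OpenaiSTT } from '@micdrop/openai'
import { MicdropServer } from '@micdrop/server'

// One MicdropServer per call, using OpenAI for both the agent and transcription
// (socket: a connected server-side WebSocket)
new MicdropServer(socket, {
  agent: new OpenaiAgent({
    apiKey: process.env.OPENAI_API_KEY || '',
    systemPrompt: 'You are a helpful assistant',
  }),
  stt: new OpenaiSTT({
    apiKey: process.env.OPENAI_API_KEY || '',
    language: 'en',
  }),
})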