Features
The project supports multiple model providers (OpenAI, xAI, Anthropic, Ollama) and multiple TTS engines (XTTS, OpenAI TTS, ElevenLabs, Kokoro). Real-time WebRTC integration with OpenAI Realtime enables continuous, low-latency, interruptible voice conversations. OpenAI Enhanced mode uses the newer gpt-4o-mini-tts and gpt-4o-mini-transcribe models for speech synthesis and transcription. Transcription can run in the cloud or locally with Faster Whisper. Sentiment analysis detects the user's mood and selects a matching response style via mood prompts. Both a Web UI and a terminal CLI are provided. The project includes a large set of ready-made characters, tools for adding custom characters, and interactive game modes and story adventures (15+ game types). Docker is supported through CPU and CUDA images and prebuilt containers. Audio commands and screen analysis, using llava or fallback providers, are also included, along with extensive troubleshooting guidance and an example .env configuration.
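As an illustration of the sentiment feature, here is a minimal sketch of how a detected mood could select a mood prompt. It is not the project's actual implementation: the word lists, the prompt texts, the threshold and the function names (polarity, classify_mood, build_system_prompt) are all hypothetical, and a real setup would likely use a proper sentiment library or model instead of a toy lexicon.

```python
# Hypothetical word lists; a toy stand-in for a real sentiment analyzer.
POSITIVE = {"great", "love", "happy", "awesome", "thanks"}
NEGATIVE = {"sad", "hate", "angry", "terrible", "awful"}

# Hypothetical mood prompts that get appended to the character's prompt.
MOOD_PROMPTS = {
    "positive": "The user seems upbeat. Match their energy and be playful.",
    "negative": "The user seems upset. Respond gently and with empathy.",
    "neutral": "The user seems neutral. Respond in the character's normal tone.",
}


def polarity(text: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / word count."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score / len(words)


def classify_mood(text: str, threshold: float = 0.1) -> str:
    """Bucket the polarity score into one of three mood labels."""
    p = polarity(text)
    if p > threshold:
        return "positive"
    if p < -threshold:
        return "negative"
    return "neutral"


def build_system_prompt(character_prompt: str, user_text: str) -> str:
    """Combine the character prompt with the mood prompt for this message."""
    mood = classify_mood(user_text)
    return f"{character_prompt}\n\n{MOOD_PROMPTS[mood]}"


if __name__ == "__main__":
    print(build_system_prompt("You are a cheerful pirate.", "I love this, thanks!"))
```

Keeping the mood prompts in a plain dictionary means response styles can be adjusted without touching the classification logic.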
Use Cases
For end users and hobbyists, the project provides an out-of-the-box, voice-first conversational experience with configurable voices, models and character personalities. Real-time WebRTC support enables natural back-and-forth speech and interruptible responses for more human-like interactions. Multiple TTS and transcription options let users prioritize quality, latency, cost or privacy, including fully local modes with XTTS or Kokoro for speech and local Faster Whisper for transcription. Built-in games and story modes turn conversations into interactive entertainment or role-play. Docker images and clear installation instructions simplify deployment on Windows, Linux, macOS or WSL, and the terminal CLI offers an accessible local alternative to the Web UI. The project also includes troubleshooting tips for audio, CUDA and dependency issues to help users get running reliably.