Basic Information

Voice Chat AI is an application for interacting with AI characters using natural speech. It is designed to run locally or in Docker and can connect to multiple chat model providers including OpenAI, xAI, Anthropic and Ollama. Users can choose from a large library of built-in characters, add new characters by providing a prompt and mood prompts, and switch TTS and transcription providers on the fly via a Web UI or a feature-rich terminal CLI. The project supports real-time WebRTC conversations with OpenAI Realtime for continuous, interruptible voice exchanges and also offers local options such as XTTS, Kokoro TTS and Faster Whisper transcription for offline or GPU-accelerated setups. Configuration is handled through a .env file and the app is served via uvicorn on port 8000.
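As a rough illustration of the .env-plus-uvicorn setup described above, the sketch below parses KEY=VALUE settings with the standard library and builds a uvicorn launch command for port 8000. The key names (MODEL_PROVIDER, TTS_PROVIDER, TRANSCRIPTION_PROVIDER), their values, and the `app.main:app` module path are assumptions made for illustration, not taken from the project; consult the project's example .env for the real names.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


# Hypothetical keys and values, for illustration only.
SAMPLE_ENV = """
# Which providers to use
MODEL_PROVIDER=ollama
TTS_PROVIDER=kokoro
TRANSCRIPTION_PROVIDER="faster-whisper"
"""


def uvicorn_command(app_path: str, port: int = 8000) -> list[str]:
    """Build the argument list for launching an ASGI app with uvicorn."""
    return ["uvicorn", app_path, "--host", "0.0.0.0", "--port", str(port)]


settings = parse_env(SAMPLE_ENV)
command = uvicorn_command("app.main:app")  # module path is an assumption
```

Keeping provider choices in a flat key/value file means a fully local setup and a cloud setup can differ only in the .env contents, with the same launch command.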

App Details

Features
Supports multiple chat model providers (OpenAI, xAI, Anthropic, Ollama) and multiple TTS engines (XTTS, OpenAI TTS, ElevenLabs, Kokoro). Real-time WebRTC integration with OpenAI Realtime enables continuous, low-latency, interruptible voice conversations. OpenAI Enhanced mode uses newer TTS and transcription models such as gpt-4o-mini-tts and gpt-4o-mini-transcribe. Transcription can run in the cloud or locally with Faster Whisper. Sentiment analysis detects the user's mood and selects a matching response style through mood prompts. Both a Web UI and a terminal CLI are provided. The app includes a large set of ready-made characters, tools for adding custom characters, and interactive game modes and story adventures (15+ game types). Docker support includes CPU and CUDA images and prebuilt containers. Audio commands are included, along with screen analysis using llava or fallback providers. Extensive troubleshooting guidance and an example .env configuration are provided.
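The sentiment-to-mood mapping mentioned above can be pictured with a minimal sketch: a sentiment score selects a mood label, and that label selects a mood prompt that is appended to the character prompt. The thresholds, mood names, prompt texts, and function names here are all hypothetical and are not taken from the project's code.

```python
# Hypothetical mood prompts; a real character would define its own.
MOOD_PROMPTS = {
    "happy": "The user sounds upbeat. Match their energy.",
    "sad": "The user sounds down. Respond gently and with empathy.",
    "neutral": "Respond in your normal tone.",
}


def mood_from_score(score: float) -> str:
    """Map a sentiment score in [-1, 1] to a mood label (assumed thresholds)."""
    if score > 0.3:
        return "happy"
    if score < -0.3:
        return "sad"
    return "neutral"


def build_system_prompt(character_prompt: str, score: float) -> str:
    """Combine the character prompt with the mood prompt chosen by the score."""
    mood = mood_from_score(score)
    return f"{character_prompt}\n\n{MOOD_PROMPTS[mood]}"


prompt = build_system_prompt("You are a cheerful pirate captain.", -0.6)
```

In this sketch the character's core personality stays fixed while only the appended mood prompt changes from turn to turn.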
Use Cases
For end users and hobbyists, it provides an out-of-the-box, voice-first conversational experience with configurable voices, models and character personalities. Real-time WebRTC support enables natural back-and-forth speech and interruptible responses for more human-like interactions. Multiple TTS and transcription options let users prioritize quality, latency, cost or privacy, including fully local modes that use XTTS or Kokoro for speech and Faster Whisper for transcription. Built-in games and story modes turn conversations into interactive entertainment or role play. Docker images and clear installation instructions simplify deployment on Windows, Linux, macOS or WSL, and the terminal CLI offers an accessible local alternative to the Web UI. The project also includes troubleshooting tips for audio, CUDA and dependency issues to help users get running reliably.