Basic Information

This repository is a demo playground for building and experimenting with voice-enabled AI agents that combine large language models with real-time speech-to-speech processing. It demonstrates an end-to-end stack: a React + TypeScript frontend and a Python FastAPI backend. The demo uses Google Gemini Live for the AI model and VideoSDK for real-time audio/video transport and conferencing. It includes examples of agent lifecycle management, real-time audio pipelines, and Gemini Realtime API integration for conversational voice agents. The README covers prerequisites, server and client setup, environment configuration, and sample API usage for running and testing agents locally or through a tunnel. The project is intended for prototyping interactive assistants, automated calling flows, and voice interfaces, and it showcases the available configuration options and core components.

App Details

Features
The project offers real-time voice communication through VideoSDK and Gemini Realtime, along with configurable agent personalities (voice selection, temperature, top-p and top-k settings, and system prompts). It exposes meeting and agent APIs, including POST /join-agent and POST /leave-agent, and a MeetingConfig structure for runtime parameters. Built-in features include AI outbound calling, cold calling workflows, voicemail generation, voice creation from samples, goal-based agents, and a range of voice profiles. Key backend components are MyVoiceAgent, AgentSession, RealTimePipeline, and GeminiRealtime. Frontend components include an AgentMeeting UI, toast notifications, and a mobile-responsive layout. Security and operational features cover environment variable management, CORS, token-based authentication, input validation, connection pooling, background tasks, and session management.
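As a rough illustration of how a client might call the POST /join-agent endpoint with a MeetingConfig, here is a minimal Python sketch using only the standard library. The listing does not give the actual request schema, so every field name (`meeting_id`, `voice`, `temperature`, `top_p`, `top_k`, `system_prompt`), the default values, the validation ranges, and the base URL are assumptions for illustration, not the repository's real API.

```python
import json
from dataclasses import asdict, dataclass
from urllib.request import Request


@dataclass
class MeetingConfig:
    """Hypothetical runtime parameters; field names and defaults are assumed."""
    meeting_id: str
    voice: str = "example-voice"          # placeholder, not a real voice name
    temperature: float = 0.7
    top_p: float = 0.9
    top_k: int = 40
    system_prompt: str = "You are a helpful voice assistant."

    def validate(self) -> None:
        # Assumed sanity ranges; the real backend may enforce different limits.
        if not self.meeting_id:
            raise ValueError("meeting_id must not be empty")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be between 0.0 and 2.0")
        if not 0.0 <= self.top_p <= 1.0:
            raise ValueError("top_p must be between 0.0 and 1.0")
        if self.top_k < 1:
            raise ValueError("top_k must be at least 1")


def build_join_request(base_url: str, config: MeetingConfig) -> Request:
    """Build (but do not send) a JSON POST request for /join-agent."""
    config.validate()
    body = json.dumps(asdict(config)).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/join-agent",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The host and port are assumptions for a locally running server.
    request = build_join_request(
        "http://localhost:8000",
        MeetingConfig(meeting_id="demo-meeting", temperature=0.5),
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

With a server running, the request object could be passed to `urllib.request.urlopen`. Validating on the client before sending is a design choice that surfaces bad sampling parameters early; the backend would still need its own input validation.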
Use Cases
This demo helps developers and teams prototype and deploy voice-first conversational agents for customer service, sales automation, appointment scheduling, and market research. It supplies end-to-end setup instructions, dependency lists, and environment examples for running the server and client locally or through ngrok for testing. The configurable personality and voice settings let teams tailor agent tone, modality, and behavior to different scenarios. Prebuilt components and API endpoints speed up integration into applications and make it easier to test outbound calling, IVR-like flows, voicemail automation, and executive assistant tasks. Security recommendations and performance optimizations help teams move from a prototype to a more robust deployment. The project also outlines future extensions such as multi-language support and CRM integration.
