Basic Information

VideoSDK AI Agents is an open-source Python SDK, built on the VideoSDK platform, for creating real-time, multimodal conversational AI agents that join VideoSDK meetings as participants. It is aimed at developers who want to connect LLMs and speech models to live audio/video sessions so that agents can listen, speak, and interact with human participants or phone systems. The repository provides core agent classes, session management, pipeline primitives for realtime and cascading model flows, and support for registering both external and internal function tools. Its documented prerequisites include a VideoSDK auth token, a meeting ID, Python 3.12 or later, and API keys for the chosen third-party STT, LLM, and TTS providers. The codebase also includes examples and a plugin architecture, so teams can assemble pipelines from different STT, LLM, and TTS providers and extend agent capabilities for production use.
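
A cascading flow chains three separate models: speech-to-text, a language model, and text-to-speech. The sketch below illustrates that idea in a provider-agnostic way, assuming each stage is just an object with one method. The `SpeechToText`, `LanguageModel`, and `TextToSpeech` protocols, the `CascadingTurn` class, and the toy providers are hypothetical names for illustration only; they are not the VideoSDK API, which works on streaming audio rather than single byte strings.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical stage interfaces: any object with the right method can plug in.
class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class CascadingTurn:
    """Runs one user turn through STT -> LLM -> TTS (hypothetical class)."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def run(self, user_audio: bytes) -> bytes:
        transcript = self.stt.transcribe(user_audio)  # audio in, text out
        answer = self.llm.reply(transcript)           # text in, text out
        return self.tts.synthesize(answer)            # text in, audio out


# Toy providers so the sketch runs without any vendor SDK or API key.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the "audio" is already text


class UpperLLM:
    def reply(self, text: str) -> str:
        return f"You said: {text.upper()}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # stand-in for real audio frames


turn = CascadingTurn(stt=EchoSTT(), llm=UpperLLM(), tts=BytesTTS())
print(turn.run(b"book a table"))  # b'You said: BOOK A TABLE'
```

Swapping a vendor here means passing a different object with the same method; `CascadingTurn` itself does not change. A realtime pipeline, by contrast, would hand audio to a single speech-to-speech model instead of three stages.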

App Details

Features

The project exposes a plugin-driven architecture and ready-made components for realtime audio/video interaction, SIP and telephony integration, and virtual avatars. It supports multiple realtime model providers, with separate plugins for STT, LLM, TTS, VAD, and turn detection. Developers get two pipeline types, RealTime and Cascading, along with built-in turn detection, voice activity handling, and a function tool system for external and internal callable tools. Supported providers listed in the repo include OpenAI, Google Gemini, AWS NovaSonic, and many STT/TTS vendors. Additional features include Model Context Protocol (MCP) integration for external data, Agent-to-Agent (A2A) protocol support for multi-agent workflows, denoise plugins, example demos (telephony, avatar, conversational flow), and documentation with a guide to building custom plugins.
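
Voice activity detection and turn detection decide when a user has finished speaking so the agent can respond. The SDK provides these as plugins; the sketch below swaps in a much simpler technique, an energy-threshold VAD with a silence-timeout end-of-turn rule, purely to illustrate the idea. The `detect_turns` function, its default threshold, and the frame format (lists of float samples in [-1, 1]) are assumptions, not the SDK's implementation.

```python
import math
from typing import Iterable, Iterator


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of one audio frame (samples in [-1, 1])."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def detect_turns(
    frames: Iterable[list[float]],
    energy_threshold: float = 0.02,   # assumed value; tune per microphone
    end_silence_frames: int = 3,      # silent frames that end a turn
) -> Iterator[tuple[int, int]]:
    """Yield (start_frame, last_speech_frame) for each detected utterance.

    A frame counts as speech if its RMS exceeds energy_threshold; a turn
    ends once end_silence_frames consecutive silent frames follow speech.
    """
    start = None
    silent_run = 0
    last_speech = 0
    for i, frame in enumerate(frames):
        if rms(frame) > energy_threshold:
            if start is None:
                start = i          # speech begins
            silent_run = 0
            last_speech = i
        elif start is not None:
            silent_run += 1
            if silent_run >= end_silence_frames:
                yield (start, last_speech)  # end of turn detected
                start = None
                silent_run = 0
    if start is not None:  # stream ended mid-utterance
        yield (start, last_speech)


loud = [0.5, -0.5] * 80   # 160 samples, RMS 0.5
quiet = [0.0] * 160       # silence
stream = [quiet, loud, loud, quiet, quiet, quiet, loud, quiet]
print(list(detect_turns(stream)))  # [(1, 2), (6, 6)]
```

Production turn detectors typically use trained models rather than a fixed energy threshold, since background noise and short pauses mid-sentence easily fool a rule like this one.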

Use Cases

This SDK simplifies building live voice and multimodal assistants by combining meeting connectivity, speech input/output, model inference, and application logic in a single framework. It reduces integration work through session orchestration, sample agent implementations, and prebuilt pipelines that handle turn-taking, VAD, and streaming responses. The plugin model lets teams swap or add STT, LLM, and TTS providers without changing agent logic, and function tools let agents perform external actions, for example scheduling or calling APIs. SIP support allows agents to be deployed on phone systems and the PSTN. Examples and documentation speed up prototyping for use cases such as telephony appointment booking, avatar-based Q&A, and e-commerce voice flows, and the custom-plugin guide lets teams adapt the system to their preferred providers.
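
Function tools let the model trigger application code, such as booking an appointment. The sketch below shows the general register, describe, and dispatch pattern using a plain decorator and a JSON tool call. `register_tool`, `describe_tools`, `dispatch`, and the `{"name": ..., "arguments": {...}}` call format are assumptions for illustration, not the SDK's interface, and `book_appointment` is a hypothetical stub.

```python
import inspect
import json
from typing import Any, Callable

# Registry mapping tool name -> Python callable (hypothetical design).
TOOLS: dict[str, Callable[..., Any]] = {}


def register_tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn so a model can call it by name (hypothetical decorator)."""
    TOOLS[fn.__name__] = fn
    return fn


def describe_tools() -> list[dict[str, Any]]:
    """Build a minimal tool description list to send to the model."""
    return [
        {
            "name": name,
            "description": inspect.getdoc(fn) or "",
            "parameters": list(inspect.signature(fn).parameters),
        }
        for name, fn in TOOLS.items()
    ]


def dispatch(call_json: str) -> Any:
    """Run a tool call emitted by the model as {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise KeyError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))


@register_tool
def book_appointment(date: str, time: str) -> str:
    """Book an appointment slot (stub: a real tool would call a calendar API)."""
    return f"Booked for {date} at {time}"


print(describe_tools()[0]["parameters"])  # ['date', 'time']
print(dispatch('{"name": "book_appointment", '
               '"arguments": {"date": "2025-01-15", "time": "10:00"}}'))
# Booked for 2025-01-15 at 10:00
```

Deriving parameter names from the function signature keeps the description sent to the model in sync with the code; a fuller version would also export types from the annotations.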
