agents
Basic Information
VideoSDK AI Agents is an open-source Python SDK, built on the VideoSDK platform, for creating real-time, multimodal conversational AI agents that join VideoSDK meetings as participants. It targets developers who want to connect LLMs and speech models to live audio/video sessions so that agents can listen, speak, and interact with human participants or phone systems. The repository provides core agent classes, session management, pipeline primitives for both real-time and cascading model flows, and support for registering external and internal function tools. Documented prerequisites include a VideoSDK auth token, a meeting ID, Python 3.12+, and third-party API keys for STT/LLM/TTS providers. The codebase also includes examples and a plugin architecture, so teams can assemble pipelines from different STT, LLM, and TTS providers and extend agent capabilities for production use.
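To make the idea of a cascading flow with swappable providers concrete, here is a minimal sketch in plain Python. It does not use the SDK's own classes or API. Every name in it (`ToyCascade`, `EchoSTT`, `UpperLLM`, `BytesTTS`, `handle_turn`) is hypothetical, and it assumes that "cascading" means running speech-to-text, then an LLM, then text-to-speech in sequence for each turn.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical provider interfaces; these are NOT the SDK's real plugin types.
class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LLM(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class ToyCascade:
    """Illustrative cascade: audio -> text -> reply text -> audio."""

    stt: STT
    llm: LLM
    tts: TTS

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)   # speech to text
        answer = self.llm.reply(text)       # text to reply text
        return self.tts.synthesize(answer)  # reply text to speech


# Stand-in providers so the sketch runs without any API keys.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = ToyCascade(stt=EchoSTT(), llm=UpperLLM(), tts=BytesTTS())
result = pipeline.handle_turn(b"hello")
```

Because each stage depends only on a small interface, any one provider can be replaced without touching the other two, which is the property the plugin architecture described above is meant to offer. For the SDK's actual class names and constructor arguments, see the repository's own examples.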