Features
joinly provides live interaction capabilities that let agents respond by voice or chat and execute tasks during a meeting:

- Conversational-flow handling for interruptions and multi-speaker scenarios.
- Cross-platform browser support, letting agents join Google Meet, Zoom, and Teams.
- A modular, bring-your-own-provider architecture: LLM providers include OpenAI, Anthropic, and local Ollama; speech providers for transcription and synthesis include Whisper, Deepgram, Kokoro, and ElevenLabs.
- MCP server tools covering join meeting, leave meeting, speak text, send chat message, mute and unmute, get chat history, get participants, and get transcript, plus a subscribable live-transcript resource (see the client sketch after this list).
- Docker-based deployment, a CUDA image for GPU acceleration, a joinly client package, and example configurations for integrating additional MCP servers.
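Any MCP client can drive these tools. The sketch below uses the official `mcp` Python SDK; the server URL, the snake_case tool names (`join_meeting`, `speak_text`), and the transcript resource URI are assumptions based on the tool list above, so check `list_tools()` on a running server for the real identifiers.

```python
# Minimal sketch of calling joinly's MCP tools with the `mcp` Python SDK.
# The URL, tool names, and resource URI below are assumptions, not
# identifiers verified against the joinly source.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main() -> None:
    # Assumed address of a locally running joinly MCP server (SSE transport).
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the tools the server actually exposes.
            tools = await session.list_tools()
            print("tools:", [tool.name for tool in tools.tools])

            # Join a meeting by URL, then speak into it (tool names assumed).
            await session.call_tool(
                "join_meeting",
                {"meeting_url": "https://meet.google.com/abc-defg-hij"},
            )
            await session.call_tool(
                "speak_text",
                {"text": "Hi, I'm taking notes for this meeting."},
            )

            # Read the subscribable live-transcript resource (URI assumed).
            transcript = await session.read_resource("transcript://live")
            print(transcript)


asyncio.run(main())
```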
Use Cases
joinly simplifies building meeting-aware AI agents by providing the infrastructure and toolset required for real-time participation. Developers can quickly deploy a self-hosted MCP server or run a client in Docker, configuring LLM, STT, and TTS providers via environment variables or command-line options (a hedged deployment sketch follows below). The exposed tools and live-transcript resource cover common meeting-automation tasks such as real-time note taking, spoken responses, chat interactions, and participant inspection, as well as the integrations shown in the demos, like editing a Notion page or creating a GitHub issue. GPU-enabled images improve transcription and TTS performance for production use. The project also ships example clients, an MCP configuration syntax for adding external tool servers (see the example configuration below), dev-container support, and debugging options, which together speed up extension, testing, and safe, privacy-conscious deployments.
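As a deployment sketch, assuming the image is published on GHCR under the project's name and that provider keys are passed as the providers' standard environment variables; the real image tag, exposed port, and CUDA tag may differ:

```bash
# Sketch of a self-hosted run. The image name, port, and CUDA tag are
# assumptions; consult the joinly documentation for the actual values.
# .env holds provider keys such as OPENAI_API_KEY or DEEPGRAM_API_KEY.
docker run --rm -it --env-file .env -p 8000:8000 \
  ghcr.io/joinly-ai/joinly:latest

# GPU-accelerated variant using the CUDA image (tag assumed):
docker run --rm -it --gpus all --env-file .env -p 8000:8000 \
  ghcr.io/joinly-ai/joinly:latest-cuda
```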
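External tool servers are added through an MCP configuration. The snippet below is a hypothetical example in the widely used `mcpServers` JSON shape, wiring in Notion's MCP server as in the demo; whether joinly uses exactly this shape, the file's location, and the token variable name are all assumptions:

```json
{
  "mcpServers": {
    "notion": {
      "command": "npx",
      "args": ["-y", "@notionhq/notion-mcp-server"],
      "env": { "NOTION_TOKEN": "ntn_your_token_here" }
    }
  }
}
```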