Basic Information

Multi-Agent-GPT is a multimodal expert-assistant platform built from agent patterns and RAG-inspired components. It currently supports text- and image-based conversational agents, with additional modalities such as audio and video planned as development progresses. The project bundles agent definitions; tools for web search, image generation, and image captioning; and model interfaces to services such as ChatGPT, DALL·E, and BLIP. It targets local deployment workflows and includes instructions for running a Gradio-based UI by launching web.py. The README and repository structure emphasize a developer-facing codebase that demonstrates how to build, run, and extend multimodal agents: model files are hosted locally under Models/BLIP, API keys are configured via an .env file, and single- and multi-turn chat scenarios can be exercised while agent logs are captured for debugging.
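
For local setup, the repository expects API keys in an .env file. A minimal sketch of how such keys are typically loaded in Python follows; the specific variable names (OPENAI_API_KEY, GOOGLE_API_KEY) are assumptions for illustration, not names documented by the project:

    # Hypothetical sketch of .env-based configuration, assuming the
    # python-dotenv package is installed and the key names below match
    # what the repo expects.
    import os
    from dotenv import load_dotenv

    load_dotenv()  # reads key=value pairs from a local .env file

    openai_key = os.getenv("OPENAI_API_KEY")  # assumed variable name
    google_key = os.getenv("GOOGLE_API_KEY")  # assumed variable name
    if openai_key is None:
        raise RuntimeError("OPENAI_API_KEY is not set in .env")

With the keys in place, the UI is started with python ./web.py as described in the README.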

App Details

Features
The repository offers a compact feature set for building and running multimodal agents. It supports single- and multi-turn chat, multimodal display and interaction for text and images, and agent abstractions under Agents/openai_agents.py. Tools in the Tools directory include web search, DALL·E-based image generation, and BLIP-based image captioning. The project exposes model interfaces to ChatGPT, DALL·E, Google Search, and BLIP, and uses a local BLIP model directory for image understanding. A simple Gradio UI is provided and launched via python ./web.py. Utility modules handle logging, image processing, and JSON I/O. The stack is Python, torch, LangChain, and Gradio; dependencies are installed from requirements.txt.
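
As an illustration of the local-BLIP path, the sketch below shows how a caption could be produced from a checkpoint stored under Models/BLIP using the Hugging Face transformers API. The repo's actual loading code in Tools may differ, and the assumption that the directory holds a Hugging Face-format BLIP checkpoint is mine:

    # Hypothetical sketch: BLIP image captioning from a local model
    # directory, assuming Models/BLIP holds a Hugging Face-format
    # BLIP checkpoint and example.jpg exists.
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Models/BLIP")
    model = BlipForConditionalGeneration.from_pretrained("Models/BLIP")

    image = Image.open("example.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(out[0], skip_special_tokens=True))

Loading from a local directory keeps image understanding offline, which matches the project's local-deployment focus.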
Use Cases
This repo is useful for developers who want a runnable example of a multimodal agent system and a starting point for prototyping RAG-style assistants. It supplies ready-made agent definitions, integration points for external model APIs, and local hooks for using a downloaded BLIP model for image understanding. The included web.py launches a local UI for interacting with agents, making it easy to test single- and multi-turn conversations and multimodal queries. Utilities for logging and data I/O help in observing agent behavior. The structure and roadmap clarify which features are implemented and which are planned, enabling incremental extension toward audio, video, private-database retrieval, and offline deployment.
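
For orientation, a minimal Gradio chat loop in the spirit of web.py might look like the following. The respond handler and its echo logic are placeholders, not the repo's actual implementation, which routes messages through its agents and tools:

    # Minimal sketch of a Gradio chat UI, analogous in spirit to web.py.
    # The respond() body is a placeholder; a real agent call goes there.
    import gradio as gr

    def respond(message, history):
        # Placeholder: echo the user message back.
        return f"Agent received: {message}"

    demo = gr.ChatInterface(fn=respond)
    demo.launch()  # serves the UI locally, as python ./web.py does

A sketch like this is a convenient harness for testing single- and multi-turn behavior before wiring in the full agent stack.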
