
Basic Information

This repository provides TARS, a multimodal AI Agent stack that includes Agent TARS and the UI-TARS Desktop application. It is designed to let developers and advanced users run and extend agents that combine vision, GUI automation, and large language models. It ships a CLI, a Web UI, a native desktop GUI agent, an SDK and documentation for building GUI automation agents and operators. The stack supports local and remote computer and browser operators, integration with MCP servers to connect to real-world tools, and deployment guides for cloud and local model hosting. The project targets workflows that aim for human-like task completion across terminal, desktop and browser environments using multimodal LLMs and visual grounding.

App Details

Features
A one-click, out-of-the-box CLI supports both headful Web UI and headless server execution. A hybrid browser agent can use GUI, DOM, or hybrid control strategies. An Event Stream protocol drives context engineering and the Agent UI. MCP integration supports mounting MCP Servers to connect external tools. UI-TARS Desktop features include screenshot and visual recognition, precise mouse and keyboard control, cross-platform support, local and remote operators, real-time feedback, and secure local processing. The project also provides an SDK for building GUI automation agents, along with references to Seed/UI-TARS and Vision-Language models. Quick starts via npx and npm and example showcases are included.
Use Cases
The stack enables automation and orchestration of complex multimodal tasks by combining vision and language with GUI control. Users can automate browser workflows, control local or remote computers and browsers, perform visual grounding using screenshots, and build custom operators that integrate with external services through MCP. The CLI and Web UI let teams prototype agents quickly while the SDK and documentation support developer extension and deployment to cloud or local model hosts. The desktop app offers private local processing and remote operator options for sandboxed or cloud-based control. Example scenarios include booking hotels, generating charts, and programmatically adjusting application settings.
