
Basic Information

PC Agent is a research and developer-focused framework designed to create autonomous digital agents by transferring human cognitive patterns into agent behavior. The repository bundles tools and reference code to collect large-scale human-computer interaction traces, transform raw interaction logs into structured cognitive trajectories, and run a multi-agent system that separates planning from visual grounding. It targets researchers and engineers who want to study or build agents that can autonomously control a Windows PC to complete complex multi-step tasks. The project includes a data collection client called PC Tracker, post-processing pipelines to refine and complete cognition signals, and a reference agent implementation with deployment scripts. The README also points to a paper, demo materials, and a released companion model, and documents environment setup steps and required API credentials for certain post-processing stages.


App Details

Features
The repository provides three core components. PC Tracker is a lightweight, customizable tool for collecting human-computer interaction data, with packaging instructions for Windows and a configurable tasks.json file. The post-processing pipeline uses refinement.py and completion.py to convert raw events into cognitive trajectories; the completion stage requires an OpenAI API key. The agent/ directory contains a reference multi-agent system that combines a planning agent with a grounding agent. Additional materials include example data in postprocess/data, build and deployment helper scripts in agent/server, README documentation, demo media showing autonomous task execution, and pointers to the academic paper, project website, and associated model and dataset releases.
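The separation between a planning agent and a grounding agent can be illustrated with a minimal, hypothetical Python sketch. Every name here (UIElement, Action, PlanningAgent, GroundingAgent, run_episode) and the exact-label matching rule are assumptions made for illustration, not the code in agent/. The repository describes its grounding as visual, whereas this sketch simply matches text labels against a pre-extracted list of on-screen elements.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    # Hypothetical on-screen element: a text label plus a pixel bounding box.
    label: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom)


@dataclass
class Action:
    kind: str    # e.g. "click"
    target: str  # natural-language description of what to act on


class PlanningAgent:
    """Hypothetical planner: decides WHAT to do next, never WHERE on screen."""

    def __init__(self, steps: List[Action]):
        self._steps = list(steps)

    def next_action(self) -> Optional[Action]:
        # Return the next planned step, or None when the plan is finished.
        return self._steps.pop(0) if self._steps else None


class GroundingAgent:
    """Hypothetical grounder: maps a textual target to pixel coordinates."""

    def locate(self, target: str, elements: List[UIElement]) -> Optional[Tuple[int, int]]:
        # Assumption: case-insensitive exact label match stands in for visual grounding.
        for el in elements:
            if el.label.lower() == target.lower():
                left, top, right, bottom = el.box
                return ((left + right) // 2, (top + bottom) // 2)  # box center
        return None


def run_episode(planner: PlanningAgent, grounder: GroundingAgent,
                elements: List[UIElement]) -> List[Tuple[str, str, Optional[Tuple[int, int]]]]:
    """Ask the planner for steps and the grounder for coordinates until done or stuck."""
    executed = []
    while (action := planner.next_action()) is not None:
        point = grounder.locate(action.target, elements)
        executed.append((action.kind, action.target, point))
        if point is None:
            break  # grounding failed; a real agent would re-plan here
    return executed
```

For example, with a "File" element at (0, 0, 40, 20) and a "Save" element at (0, 20, 40, 40), a plan to click "File" and then "Save" yields the centers (20, 10) and (20, 30). Because the planner only emits textual targets, the grounding step could be replaced (for instance with a vision model working on screenshots) without changing the planner.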
Use Cases
This repository helps researchers and developers reproduce experiments and accelerate development of autonomous desktop agents by providing end-to-end tools, from data collection to agent execution. PC Tracker enables scalable annotation of realistic user interactions so teams can gather training data. The post-processing pipeline standardizes and enriches event logs into cognitive trajectories suitable for model training or analysis. The reference multi-agent system demonstrates how to separate strategic planning from robust visual grounding when automating complex GUI tasks, and the server scripts illustrate deployment patterns. Environment configuration via environment.yml, together with step-by-step instructions, lowers the barrier to running experiments. The included paper, demo, and links to the model and dataset releases support validation, benchmarking, and further research.
