Basic Information

This repository presents Ghost in the Minecraft (GITM), a research framework that integrates large language models with text-based knowledge and memory to build generally capable agents for the open-world game Minecraft. The project demonstrates a hierarchical LLM-based agent paradigm that decomposes high-level goals into sub-goals, plans structured actions, and translates those into low-level keyboard/mouse operations. It is intended to explore how LLMs can handle long-horizon, complex tasks and adapt to environment uncertainty in a Minecraft setting. The README documents experimental results, architecture components, demo videos, and quantitative comparisons against prior RL-based methods, emphasizing broad task coverage across the Minecraft Overworld technology tree and improved training efficiency on modest hardware.
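
The goal-to-sub-goal decomposition can be pictured as walking the technology tree from a target item down to its prerequisites. The sketch below is a minimal illustration under stated assumptions, not GITM's decomposer. A hand-written dependency table (`RECIPES`, hypothetical) stands in for the LLM plus text-based knowledge, and a depth-first post-order traversal produces sub-goals with prerequisites first. Quantities and crafting yields are deliberately ignored.

```python
from typing import Dict, List, Optional

# Hypothetical recipe knowledge: item -> items it directly requires.
# Items with no entry are treated as raw resources that are mined.
RECIPES: Dict[str, List[str]] = {
    "wooden_pickaxe": ["planks", "stick", "crafting_table"],
    "crafting_table": ["planks"],
    "stick": ["planks"],
    "planks": ["log"],
}


def decompose(goal: str, recipes: Dict[str, List[str]],
              order: Optional[List[str]] = None) -> List[str]:
    """Return sub-goals in dependency order (prerequisites first)."""
    if order is None:
        order = []
    label = f"craft {goal}" if goal in recipes else f"mine {goal}"
    if label in order:  # already scheduled via another branch
        return order
    for dep in recipes.get(goal, []):
        decompose(dep, recipes, order)
    order.append(label)
    return order


print(decompose("wooden_pickaxe", RECIPES))
```

For `wooden_pickaxe` this yields `mine log`, `craft planks`, `craft stick`, `craft crafting_table`, `craft wooden_pickaxe`; shared prerequisites such as `planks` appear only once.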

App Details

Features
GITM implements a three-part LLM agent architecture: an LLM Decomposer that splits goals into sub-goals, an LLM Planner that generates structured action sequences, and an LLM Interface that maps structured actions to keyboard and mouse operations. The framework incorporates text-based external knowledge and a summarized text memory to improve planning and to reuse successful action lists. Empirical highlights include unlocking 100% of the items in the Overworld technology tree, a 67.5% success rate on the ObtainDiamond challenge, and a nonzero success rate on every item, whereas prior methods achieved nonzero success on only 30% of items. The project also emphasizes training efficiency, requiring only a single 32-core CPU node and no GPUs for training.
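
The planner, interface, and memory roles described above can be sketched as a small loop. This is a minimal illustration under explicit assumptions, not GITM's implementation. `plan` is a hypothetical rule-based stand-in for the LLM Planner, `KEYMAP` and the op strings are invented placeholders for keyboard/mouse operations, and `TextMemory` is a simplification that stores the most recent successful action list per sub-goal instead of summarizing several. Execution is stubbed by a caller-supplied `execute` function.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    name: str    # hypothetical structured action type, e.g. "mine", "craft"
    target: str  # item the action applies to


def plan(sub_goal: str) -> List[Action]:
    """Hypothetical stand-in for the LLM Planner: sub-goal -> structured actions."""
    verb, item = sub_goal.split(" ", 1)
    if verb == "mine":
        return [Action("find", item), Action("mine", item)]
    return [Action("craft", item)]


# Hypothetical mapping from structured actions to low-level input ops.
KEYMAP: Dict[str, List[str]] = {
    "find": ["key:W", "mouse:look"],
    "mine": ["mouse:attack_hold"],
    "craft": ["key:E", "mouse:click_recipe"],
}


def to_low_level(action: Action) -> List[str]:
    """Stand-in for the LLM Interface: structured action -> keyboard/mouse ops."""
    return [f"{op}({action.target})" for op in KEYMAP[action.name]]


@dataclass
class TextMemory:
    """Simplified memory: keeps the last successful action list per sub-goal."""
    entries: Dict[str, List[Action]] = field(default_factory=dict)

    def recall(self, sub_goal: str) -> Optional[List[Action]]:
        return self.entries.get(sub_goal)

    def record(self, sub_goal: str, actions: List[Action]) -> None:
        self.entries[sub_goal] = list(actions)


def run(sub_goals: List[str], memory: TextMemory,
        execute: Callable[[List[str]], bool]) -> List[str]:
    """Plan each sub-goal (reusing memory when possible) and execute its ops."""
    ops_log: List[str] = []
    for sub_goal in sub_goals:
        actions = memory.recall(sub_goal) or plan(sub_goal)
        ops = [op for action in actions for op in to_low_level(action)]
        if execute(ops):  # only successful action lists are remembered
            memory.record(sub_goal, actions)
        ops_log.extend(ops)
    return ops_log


memory = TextMemory()
log = run(["mine log", "craft planks"], memory, execute=lambda ops: True)
print(log)
```

Keeping the action list structured until the last step means the memory stores reusable plans rather than raw input streams; in this sketch a failed sub-goal is simply not recorded, and no replanning is attempted.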
Use Cases
GITM provides a concrete demonstration that LLM-driven hierarchical agents can solve long-horizon, multi-step tasks in a complex open-world environment. For researchers, it supplies an architecture and empirical benchmarks showing higher task coverage and better sample and compute efficiency than prior RL approaches. The text-based knowledge and memory design offers a reproducible way to incorporate internet knowledge and to record effective action sequences for future planning. Practitioners can study its decomposition, planning, and interface modules to prototype interactive agents that operate across diverse biomes, lighting conditions, and adversarial encounters in Minecraft, and to explore LLM capabilities in similar open-world challenges.
