
Basic Information

DynaSaur is a research-oriented framework for building dynamic, LLM-driven agents that use a programming language as the universal representation of actions. Instead of relying on a fixed set of declarative or prewired actions, the agent generates Python snippets at each step to invoke existing actions or to create new ones when needed. New actions can be authored from scratch or composed from existing primitives, letting the system expand a reusable action library over time. The repository provides the code, an example entry point, and instructions to run the system, including the environment variables and API keys needed for Azure-based LLM and embedding usage, and steps to download the GAIA benchmark data. It is intended as a platform for experimentation and reproducible evaluation of the dynamic action generation methods described in the linked paper.
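As a hedged illustration of the setup step, a preflight check for the Azure credentials might look like the sketch below. The variable names follow common Azure OpenAI conventions and are assumptions for illustration, not confirmed names from the repository's README.

```python
import os

# Hypothetical variable names (common Azure OpenAI conventions);
# consult the repo's README for the actual keys it expects.
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",       # key for the Azure-hosted LLM
    "AZURE_OPENAI_ENDPOINT",      # endpoint URL of the Azure resource
    "AZURE_EMBEDDING_DEPLOYMENT", # deployment used for action embeddings
]

missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
print("Environment looks complete; ready to run the entry point.")
```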


Features
- Generative action representation: the agent emits Python code to call or synthesize actions rather than selecting from a fixed action set (see the sketch after this list).
- Action creation and composition: when existing actions are insufficient, the agent can create new functions or compose primitives to solve tasks, growing a reusable library.
- Recovery and robustness: the framework is designed to recover when no relevant actions exist or when actions fail on edge cases.
- Embedding-based action retrieval: supports embeddings for action lookup and retrieval.
- Benchmarked: the top performer on the GAIA benchmark and the leading non-ensemble method at the time of writing.
- Research-ready setup: includes environment and data-download instructions, a recommended conda environment, and a runnable entry-point script.
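The following is a minimal sketch of the generative-action idea under stated assumptions: the `register_action` helper, the `action_library` dict, and the generated snippet are hypothetical stand-ins, not DynaSaur's actual API.

```python
import types

action_library = {}  # grows as the agent synthesizes new actions

def register_action(source: str) -> None:
    """Execute LLM-emitted Python source and store any callables it defines."""
    namespace = {}
    exec(source, namespace)  # the agent's snippet defines one or more functions
    for name, obj in namespace.items():
        if isinstance(obj, types.FunctionType):
            action_library[name] = obj

# Example: a snippet the LLM might emit to create a new action.
generated_source = '''
def word_count(text):
    """Count whitespace-separated words in a string."""
    return len(text.split())
'''
register_action(generated_source)
print(action_library["word_count"]("dynamic actions as code"))  # -> 4
```

Because each action is ordinary Python, a registered function can itself call other entries in the library, which is what makes composition of primitives possible.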
Use Cases
DynaSaur helps researchers and developers explore agents that dynamically generate executable actions, reducing the need to predefine exhaustive toolsets. It demonstrates how an LLM can synthesize new callable behaviors and compose existing ones to handle unforeseen tasks or edge cases, which can accelerate prototyping of more flexible agent behaviors. The repository includes setup instructions, the required environment keys, and dataset download steps so users can reproduce experiments and evaluate performance on the GAIA benchmark. Because the agent produces Python code as actions, developers can inspect, reuse, and refine generated behaviors; the sketch below illustrates how reuse via retrieval might work. The README notes current limitations and a TODO to add OpenAI API support, indicating active research and extension paths.
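Here is a hedged sketch of embedding-based retrieval over such an action library: stored actions are ranked by cosine similarity between a task description and each action's docstring. `embed` is a toy stand-in (a real system would call an embedding model, e.g. an Azure embedding deployment), and none of these names come from the DynaSaur codebase.

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in: a character-frequency vector instead of a real
    # embedding model, just to keep the sketch self-contained.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, library: dict, k: int = 3) -> list[str]:
    """Return the names of the k actions whose docstrings best match the query."""
    q = embed(query)
    scored = [(cosine(q, embed(fn.__doc__ or name)), name)
              for name, fn in library.items()]
    return [name for _, name in sorted(scored, reverse=True)[:k]]

if __name__ == "__main__":
    def word_count(text):
        """Count whitespace-separated words in a string."""
        return len(text.split())

    print(retrieve("count the words in some text", {"word_count": word_count}, k=1))
```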
