LLM Tool Survey

Basic Information

This repository collects and organizes the literature on tool learning with large language models (LLMs). It accompanies the authors' survey paper and serves as a curated bibliography for research on how LLMs learn to use external tools. The README motivates tool learning, presents a taxonomy of methods, and includes a workflow diagram that breaks the problem into four stages: task planning, tool selection, tool calling, and response generation. It aggregates representative papers on the benefits of tools and on methods for each stage, and it summarizes benchmarks, evaluation metrics, challenges, future directions, and related resources, along with citation information for the accepted survey paper. The collection aims to reduce fragmentation in the literature and to provide a structured entry point for researchers and practitioners interested in tool-augmented LLMs.
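To make the four stages concrete, here is a minimal Python sketch of such a pipeline. Everything in it (the Tool dataclass, the answer_with_tools function, and the prompts) is a hypothetical illustration of the workflow the survey describes, not code from the repository or from any surveyed system.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Tool:
        name: str
        description: str
        run: Callable[[str], str]  # executes the tool on a string argument

    def answer_with_tools(query: str, tools: list[Tool],
                          llm: Callable[[str], str]) -> str:
        # Stage 1: task planning -- decompose the query into sub-tasks.
        subtasks = llm(
            f"Break this query into sub-tasks, one per line:\n{query}"
        ).splitlines()

        observations = []
        for task in subtasks:
            # Stage 2: tool selection -- have the LLM pick a tool by name.
            menu = "\n".join(f"{t.name}: {t.description}" for t in tools)
            choice = llm(
                f"Task: {task}\nTools:\n{menu}\nReply with one tool name."
            ).strip()
            tool = next((t for t in tools if t.name == choice), tools[0])

            # Stage 3: tool calling -- generate the argument, execute the tool.
            arg = llm(f"Give the input string for tool '{tool.name}' to solve: {task}")
            observations.append(f"{task} -> {tool.run(arg)}")

        # Stage 4: response generation -- synthesize observations into an answer.
        return llm(f"Query: {query}\nObservations:\n" + "\n".join(observations))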

Features
- Comprehensive paper list grouped by themes such as the benefits of tools, how tool learning is implemented, and subtopics like planning, selection, calling, and response generation.
- A taxonomy and a workflow diagram that clarify the four key stages of tool learning.
- A detailed benchmarks table listing evaluation suites with their scope, number of tools and instances, and intended use cases.
- Organized coverage of tuning-free and tuning-based methods, retriever-based and LLM-based tool selection (see the sketch after this list), and benchmarks and evaluation metrics mapped to stages.
- Sections on challenges, future directions, and auxiliary resources, including awesome lists and other surveys.
- A citation entry and contribution guidance to encourage community updates.
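As a concrete example of the retriever-based family, the sketch below ranks tools by embedding similarity to the query. The embed() helper is a stand-in for a real dense retriever, and all names are hypothetical rather than taken from any specific paper in the list.

    import math

    def embed(text: str) -> list[float]:
        # Placeholder: in practice, use a sentence-embedding model
        # (e.g. a dense retriever trained on tool descriptions).
        return [float(ord(c)) for c in text[:8].ljust(8)]

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve_tools(query: str, tool_descriptions: dict[str, str],
                       k: int = 3) -> list[str]:
        # Rank tools by similarity between the query and each tool
        # description, then pass the top-k candidates to the LLM for
        # the final choice.
        q = embed(query)
        ranked = sorted(tool_descriptions,
                        key=lambda name: cosine(q, embed(tool_descriptions[name])),
                        reverse=True)
        return ranked[:k]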
Use Cases
For researchers, the repo provides a single curated source to survey prior work, compare approaches, and identify benchmarks and evaluation protocols suited to specific subproblems in tool learning. For practitioners, it helps locate datasets and benchmarks for testing tool-augmented agents and suggests metrics for measuring planning, selection, calling, and generation performance (a sketch of such stage-wise metrics follows below). For newcomers, it offers a guided taxonomy, examples of representative papers across domains, and pointers to additional resources and multilingual summaries. The organized presentation of challenges and future directions supports idea generation and helps align experimental design with open problems in robustness, latency, safety, and multimodal tool learning.
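As an illustration of stage-wise measurement, the following sketch computes two simple metrics, assuming gold tool choices and gold call arguments are available per instance. The function names and exact definitions are generic stand-ins, not the official protocol of any benchmark in the survey's table.

    def tool_selection_accuracy(predicted: list[str], gold: list[str]) -> float:
        # Fraction of instances where the selected tool matches the gold tool.
        assert len(predicted) == len(gold)
        return sum(p == g for p, g in zip(predicted, gold)) / len(gold) if gold else 0.0

    def calling_exact_match(pred_args: list[dict], gold_args: list[dict]) -> float:
        # Fraction of calls whose full argument dict matches the gold arguments.
        assert len(pred_args) == len(gold_args)
        return sum(p == g for p, g in zip(pred_args, gold_args)) / len(gold_args) if gold_args else 0.0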