LLM Agents Papers

Basic Information

This repository is a curated, continuously updated bibliography of academic and technical papers on agents built with large language models (LLMs). It aggregates surveys, methodological studies, application papers, evaluation benchmarks, datasets, and infrastructure reports covering agent architectures, planning, memory, reflection and feedback, retrieval-augmented generation (RAG), tool use, web and GUI agents, simulation, and multi-agent systems. The README organizes entries by topic and subtopic (for example, planning, memory, RAG, interaction, automation, training, scaling, stability, and infrastructure) and records each paper's publication date. Many entries also indicate associated code artifacts where available. The list serves as a one-stop reference to both recent and foundational work in the LLM-agent research area, and its timestamps show that it is maintained with frequent updates.


Features
The repository provides an organized topical index covering surveys, enhancement techniques, interaction modes, applications, automation, training regimes, scaling, and safety. Each entry lists its publication date and paper title and flags whether code is available. Subsections give deep coverage of techniques such as planning, memory mechanisms, feedback and reflection, and RAG and search strategies. Application sections span domains including math, chemistry, biology, medicine, finance, and software engineering, while infrastructure coverage lists benchmarks, evaluation suites, datasets, and platforms. The README also highlights recommended complementary paper lists and contains many recent entries through mid-2025. It is formatted as a human-readable, navigable document with permalinks to each subsection and consistent per-paper metadata for quick scanning and follow-up.
Use Cases
Researchers, practitioners, and students can use the list to discover relevant literature, identify state-of-the-art techniques, and find implementations or code for reproducibility. By grouping works by theme and application, it supports literature reviews, syllabus preparation, benchmark selection, and gap analysis. The index of benchmarks and datasets helps in selecting evaluation resources for experiments, and the organization by method (planning, memory, RAG, tool usage, training, RL, DPO) and by domain (medicine, finance, software engineering, chemistry, biology, simulation) aids domain-specific exploration. Frequent updates and date annotations make the list useful for tracking emerging trends and recent advances, and the repository also points readers to related curated lists for broader coverage.
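Because each entry carries a publication date and an optional code indicator, the README lends itself to programmatic filtering, for example to surface only recent papers with released code. The sketch below is a minimal, hypothetical Python example; the per-entry format it parses (`- (YYYY.MM) Title (code)`) is an assumption for illustration, not the repository's actual layout.

```python
import re

# Assumed entry format (hypothetical): "- (2024.05) Paper Title (code)".
# The real README's metadata layout may differ; adjust the pattern to match it.
ENTRY = re.compile(
    r"-\s*\((?P<date>\d{4}\.\d{2})\)\s*(?P<title>.+?)(?P<code>\s*\(code\))?$"
)

def scan(readme_text, since="2024.01", code_only=False):
    """Return (date, title) pairs for entries dated at or after `since`.

    Dates in YYYY.MM form compare correctly as plain strings, so no
    date parsing is needed. Set code_only=True to keep only entries
    that advertise an associated code release.
    """
    hits = []
    for line in readme_text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip headings, blank lines, and non-entry text
        if m.group("date") < since:
            continue
        if code_only and not m.group("code"):
            continue
        hits.append((m.group("date"), m.group("title").strip()))
    return hits
```

A filter like this supports the trend-tracking use case above: re-running it after each update highlights what was added since a given month.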
