
Basic Information

This repository accompanies the Agent-FLAN research project and provides data, model checkpoints, and methods for fine-tuning large language models to act as agents. It documents key observations about agent training data and learning dynamics, and presents a data generation pipeline and training protocol designed to improve agent reasoning, tool use, and format following while reducing hallucinations. The repository releases the Agent-FLAN dataset and a fine-tuned Agent-FLAN-7B model, both of which follow the Llama2-chat conversation format. It also links to the paper, model and dataset hubs, and reports comparative evaluations on held-in and held-out agent tasks. The scope is research and reproducible development for improving LLM agent capabilities rather than an end-user chatbot product.

App Details

Features
Provides a fine-tuned Agent-FLAN-7B model and the Agent-FLAN dataset, produced by mixed training on AgentInstruct, ToolBench, and ShareGPT. Implements a data generation pipeline and a conversation template compatible with Llama2-chat. Introduces carefully constructed negative samples to mitigate hallucinations, and documents a decomposition and redesign of the training corpora that separates format following from agent reasoning. Includes evaluation results on held-in and held-out tasks showing improved performance relative to prior agent-tuning approaches. Built with, and compatible with, tooling such as Lagent and T-Eval. Releases are published under the Apache 2.0 license and accompanied by a citation to the arXiv paper.
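Since the released model follows the Llama2-chat conversation format, prompts must be assembled with the standard Llama2-chat markers. The sketch below shows that convention; the tag strings are the standard Llama2-chat ones, while the helper function and example messages are illustrative and not taken from the Agent-FLAN codebase.

```python
# Standard Llama2-chat conversation markers.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def build_llama2_chat_prompt(system: str, turns: list) -> str:
    """Render (user, assistant) turns into a single Llama2-chat prompt string.

    Per the Llama2-chat convention, the system prompt is folded into the
    first user turn; an assistant slot of None leaves the final turn open
    for the model to generate.
    """
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        if i == 0 and system:
            user = f"{B_SYS}{system}{E_SYS}{user}"
        prompt += f"{B_INST} {user} {E_INST}"
        if assistant is not None:
            prompt += f" {assistant} "
    return prompt

# Example: one completed turn plus an open user query awaiting generation.
prompt = build_llama2_chat_prompt(
    "You are a helpful agent that can call tools.",
    [("What is 2 + 2?", "4"), ("Now call the calculator tool.", None)],
)
```

Any Llama2-chat-compatible tokenizer applied to a string built this way should reproduce the formatting the fine-tuned checkpoint was trained on.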
Use Cases
This project is useful to researchers and engineers who want to improve or reproduce agent behaviors in open LLMs, since it provides models, datasets, and methodological guidance. The released fine-tuned checkpoint and dataset let practitioners apply the same fine-tuning recipe to Llama2-chat-style models and test tool utilization and multi-step agent reasoning. The paper and artifacts explain the data decomposition strategies and negative-sample construction that reduce hallucinations, and the evaluation benchmarks show where improvements are gained across tasks and model scales. Shared training protocols, templates, and comparative results make it easier to adopt, evaluate, and extend agent-tuning techniques in research or development workflows.
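To make the negative-sample idea concrete, here is one minimal sketch of how such a record could be constructed: pair a query with a tool list that deliberately lacks the needed tool, and label the correct behavior as declining to call a tool. The record layout, field names, and helper function are assumptions for illustration only, not the actual Agent-FLAN data schema.

```python
def make_negative_sample(query: str, available_tools: list) -> dict:
    """Build a training record whose target behavior is 'no suitable tool'.

    Hypothetical schema for illustration: a chat-style conversation plus a
    tool list and a negative label, so the model learns to decline rather
    than hallucinate a call to a tool that does not exist.
    """
    return {
        "conversation": [
            {"role": "user", "content": query},
            {
                "role": "assistant",
                # Target response: refuse the tool call instead of inventing one.
                "content": "None of the available tools can answer this; "
                           "I will respond directly instead of calling a tool.",
            },
        ],
        "tools": available_tools,
        "label": "negative",
    }

sample = make_negative_sample(
    "What's the weather in Paris right now?",
    ["calculator", "code_interpreter"],  # deliberately missing a weather tool
)
```

Mixing records like this into the training data gives the model explicit supervision for the "no applicable tool" case, which is the failure mode behind many tool-call hallucinations.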