LLaMA Factory

Basic Information

LLaMA-Factory is an open-source framework for researchers and engineers to fine-tune, train, evaluate and deploy large language and vision-language models. The project unifies many training modes (pre-training, supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO and other preference/RL approaches) and supports over a hundred model families and sizes. It provides zero-code and scriptable workflows via a CLI and an all-in-one Web UI (LLaMA Board), so users can run LoRA/QLoRA/freeze/full tuning and export merged models. The README documents supported models, datasets, hardware and backend options (including CUDA, ROCm and Ascend NPU) and shows quickstart commands, examples and Docker images. The repo also integrates experiment logging and inference backends to simplify moving from training to an OpenAI-style API deployment.
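As a rough illustration of the scriptable workflow, the sketch below composes a LoRA SFT config in Python and hands it to the `llamafactory-cli train` entry point the README documents. The exact config keys, model id and dataset name are assumptions modeled on the project's example YAML files, so check them against the repo's examples directory before running.

```python
# Sketch: programmatically compose a LoRA SFT config and launch it through the
# llamafactory-cli entry point. Key names follow the project's example YAML
# configs; treat the exact keys and values here as assumptions to verify.
import subprocess
import yaml  # pip install pyyaml

config = {
    "model_name_or_path": "meta-llama/Meta-Llama-3-8B-Instruct",  # assumed base model id
    "stage": "sft",                 # supervised fine-tuning
    "do_train": True,
    "finetuning_type": "lora",      # LoRA adapters instead of full tuning
    "dataset": "alpaca_en_demo",    # assumed bundled demo dataset name
    "template": "llama3",
    "output_dir": "saves/llama3-8b/lora/sft",
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "learning_rate": 1.0e-4,
    "num_train_epochs": 3.0,
    "bf16": True,
}

with open("llama3_lora_sft.yaml", "w") as f:
    yaml.safe_dump(config, f)

# Equivalent to running: llamafactory-cli train llama3_lora_sft.yaml
subprocess.run(["llamafactory-cli", "train", "llama3_lora_sft.yaml"], check=True)
```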

App Details

Features
The repository lists comprehensive feature support: broad model compatibility (LLaMA, LLaVA, Mistral, Qwen, Gemma, ChatGLM, Phi and many others), multiple tuning strategies (full, freeze, LoRA, QLoRA), and quantized training/inference options. It integrates advanced optimizers and algorithms such as GaLore, BAdam, APOLLO, Muon, DoRA, LongLoRA and LoRA+, and practical speed/memory tricks including FlashAttention-2, Unsloth and Liger Kernel. Backends and tooling include vLLM and SGLang for fast inference, bitsandbytes/AQLM/AWQ/GPTQ/LLM.int8 for quantization, experiment monitors (LlamaBoard, TensorBoard, Wandb, MLflow, SwanLab), Docker images and prebuilt Colab/cloud examples. The repo provides many curated datasets, example configs, export/merge utilities and an OpenAI-style demo API for straightforward integration into applications.
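To show what the LoRA/QLoRA and bitsandbytes support amounts to under the hood, here is a minimal sketch of loading a 4-bit base model and attaching LoRA adapters directly with Hugging Face transformers and peft. LLaMA-Factory wires this up from its configs; the model id, rank and target modules below are illustrative assumptions, not values taken from the repo.

```python
# Sketch of the QLoRA-style setup that LLaMA-Factory automates: load a base
# model in 4-bit with bitsandbytes, then attach LoRA adapters with peft.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed base model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantized weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```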
Use Cases
LLaMA-Factory helps practitioners accelerate and standardize model fine-tuning and deployment. It reduces friction by bundling installable extras, prebuilt Docker images, quickstart commands and a GUI for zero-code runs, enabling teams to tune many architectures and sizes without implementing training pipelines from scratch. Hardware and backend guidance (CUDA, ROCm, Ascend NPU) and resource estimates let users plan GPU and quantization strategies such as QLoRA or full fp16 training. Built-in support for RL and preference-learning methods, logging to W&B or SwanLab, example datasets, and export paths to merged model files and an OpenAI-style API make it easier to evaluate, serve and integrate custom models in production or research workflows. The README also documents compatibility notes, a changelog and community projects built on the framework.
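Because the demo server exposes an OpenAI-style API, a fine-tuned model can be queried with the standard openai Python client once it is deployed. In the sketch below, the base URL, API key and served model name are placeholders and assumptions, not values prescribed by the project.

```python
# Sketch: querying a model served behind LLaMA-Factory's OpenAI-style demo API
# with the official openai client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="not-needed-for-local-demo",  # placeholder; a local demo may not check it
)

response = client.chat.completions.create(
    model="llama3-8b-lora-sft",  # assumed served model name
    messages=[{"role": "user", "content": "Summarize what LoRA fine-tuning changes."}],
    temperature=0.7,
)
print(response.choices[0].message.content)
```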
