Basic Information

This repository provides the GLM-4.5 family of open-source foundation language models and associated tooling for building intelligent agents. It hosts model artifacts and documentation for GLM-4.5 and GLM-4.5-Air, including base and FP8 variants, and describes their total and active parameter counts. Both are hybrid reasoning models with two inference modes: a thinking mode for complex reasoning and tool use, and a non-thinking mode for immediate responses. The README includes evaluation results, system requirements for inference and fine-tuning, and guidance for running the models with frameworks such as transformers, vLLM, and SGLang; the code and models are released under the MIT license. The repo also provides quick-start commands, model download references, and example settings for speculative inference and long-context usage.
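The system requirements mentioned above follow directly from the parameter counts and precisions: a rough lower bound on GPU memory is just parameters times bytes per parameter. A minimal sketch of that arithmetic (weights only; real deployments also need KV-cache and activation memory, and the counts are the ones quoted in this description):

```python
# Back-of-the-envelope weight-memory estimate from total parameter counts.
# This is a lower bound: KV-cache, activations, and framework overhead
# are not included.

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

def weight_memory_gib(total_params_billion: float, precision: str) -> float:
    """Approximate GiB needed just to hold the model weights."""
    bytes_total = total_params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return bytes_total / 2**30

glm45_bf16 = weight_memory_gib(355, "bf16")     # ~661 GiB
glm45_air_fp8 = weight_memory_gib(106, "fp8")   # ~99 GiB
```

Numbers on this scale are why the README's deployment guidance targets multi-GPU H100/H200-class nodes even for the smaller Air variant.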

App Details

Features
- Two large-scale hybrid reasoning models: GLM-4.5 (355 billion total, 32 billion active parameters) and GLM-4.5-Air (106 billion total, 12 billion active).
- Dual-mode inference: a thinking mode optimized for reasoning and tool use, and a non-thinking mode for immediate responses.
- BF16 and FP8 precisions, with base and FP8 checkpoints published.
- Integrations and example usage for transformers, vLLM, and SGLang, including tool-call and reasoning parsers.
- Long-context support (up to 128K on the recommended hardware), documented speculative-inference settings, and deployment guidance for H100/H200-class GPUs.
- Fine-tuning recipes and hardware profiles for LoRA, SFT, and RL with LLaMA-Factory and Swift.
- Quick-start scripts, requirements, and example API request templates.
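The dual-mode inference described above is typically driven per request when the model is served behind an OpenAI-compatible endpoint (as with vLLM or SGLang). A minimal sketch of building such a request body; the `chat_template_kwargs`/`enable_thinking` field names follow the pattern the repo documents for toggling modes, but check them against the README for your serving stack, and the served model name depends on your launch flags:

```python
# Sketch of an OpenAI-compatible /v1/chat/completions request body for a
# GLM-4.5 server. Field names for the thinking-mode toggle are assumptions
# to verify against the repo's serving instructions.

def build_chat_request(prompt: str, thinking: bool = True) -> dict:
    """Build a chat-completions request body, optionally disabling thinking mode."""
    body = {
        "model": "GLM-4.5",  # served model name; set by your vLLM/SGLang launch flags
        "messages": [{"role": "user", "content": prompt}],
    }
    if not thinking:
        # Non-thinking mode: skip the reasoning phase for immediate responses.
        body["chat_template_kwargs"] = {"enable_thinking": False}
    return body

request = build_chat_request("Summarize this repo in one sentence.", thinking=False)
```

The same body can be posted with any HTTP client or passed through an OpenAI SDK's `extra_body` parameter, so client code stays identical across both inference modes.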
Use Cases
This repository helps developers and researchers build, evaluate, and deploy agent-capable language models by providing open-source model weights, inference instructions, and integration examples. Hybrid reasoning modes and built-in parsers for tool and reasoning calls support agents that need complex reasoning, tool calling, and coding abilities. The BF16 and FP8 formats and detailed GPU configuration guidance enable efficient large-scale inference and full-length context workloads. Fine-tuning guidance and supported workflows make customization possible for domain adaptation or SFT/RL experiments. Example commands for transformers, vLLM, and SGLang, together with model artifacts on public model hubs, facilitate testing, deployment, and downstream development under the MIT license.
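For the tool-calling use case above, tools are normally declared in the standard OpenAI function-calling schema and sent alongside the chat request; the server-side tool-call parser then returns structured calls instead of free text. A minimal sketch, where the weather tool is a hypothetical example and not part of the repo:

```python
# Sketch of an OpenAI-style tool declaration for a tool-calling agent.
# The schema is the standard function-calling format; the specific tool
# (get_weather) is a hypothetical example.

def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Wrap a JSON-schema parameter spec in the OpenAI tools format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

tools = [
    make_tool(
        "get_weather",
        "Look up current weather for a city.",
        {"city": {"type": "string", "description": "City name"}},
        ["city"],
    )
]
# `tools` would be sent as the "tools" field of the chat request.
```

With a reasoning/tool-call parser enabled on the server, the model's response arrives as structured `tool_calls` entries that agent frameworks can dispatch directly.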
