optillm
Basic Information
optillm is an OpenAI API–compatible inference proxy designed to improve the accuracy and performance of large language models at inference time. It is aimed at developers and researchers who want to apply state-of-the-art inference-time techniques to boost reasoning on coding, logical, and mathematical queries without changing client code. The proxy can run locally or forward requests to remote providers, exposes the same chat completions endpoint as OpenAI, and acts as a drop-in replacement: clients only need to change their base_url. It supports running a built-in local inference server with HuggingFace models and LoRAs, wrapping other providers via LiteLLM, and connecting to external model servers. By centralizing many optimization strategies and plugins, the project lets engineers experiment with and deploy inference-time enhancements within existing tools and workflows.
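Because the proxy mirrors the OpenAI chat completions endpoint, pointing an existing client at it is a one-line change. Below is a minimal sketch using the official openai Python client; the localhost:8000 address and the "moa-" model-name prefix (selecting a mixture-of-agents approach) are assumptions based on the project's documented conventions and should be checked against your deployment.

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at the optillm proxy instead of
# api.openai.com; request and response shapes are unchanged.
# The address below assumes a locally running proxy (adjust as needed).
client = OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url="http://localhost:8000/v1",
)

# Selecting an inference-time technique by prefixing the model name
# (here "moa-" for mixture of agents) follows the convention described
# in the project's README; other approaches use other prefixes.
response = client.chat.completions.create(
    model="moa-gpt-4o-mini",
    messages=[
        {"role": "user", "content": "How many r's are in the word strawberry?"}
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```

No other client changes are required, which is what makes the proxy a drop-in replacement: the same code talks to OpenAI directly if base_url is reverted.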