Basic Information

optillm is an OpenAI API-compatible inference proxy designed to improve the accuracy and performance of large language models at inference time. It is intended for developers and researchers who want to apply state-of-the-art inference-time techniques to boost reasoning on coding, logical, and mathematical queries without changing client code. The proxy can run locally or forward requests to remote providers, exposes the same chat completions endpoint as OpenAI, and acts as a drop-in replacement that requires only a change to the client's base_url. It supports running a built-in local inference server with HuggingFace models and LoRAs, wrapping other providers via LiteLLM, and connecting to external model servers. The project centralizes many optimization strategies and plugins so engineers can experiment with and deploy inference-time enhancements inside existing tools and workflows.
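As a sketch of the drop-in usage described above, the snippet below points the standard OpenAI Python client at a locally running proxy. The localhost address, port 8000, and the /v1 path are assumptions for a typical local deployment and may differ from your configuration.

```python
from openai import OpenAI

# Point the standard OpenAI client at the optillm proxy instead of api.openai.com.
# The host, port, and /v1 path are assumptions for a typical local setup;
# adjust them to match how the proxy was started.
client = OpenAI(
    api_key="sk-...",                        # forwarded to the upstream provider
    base_url="http://localhost:8000/v1",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",                     # any model the configured provider serves
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],
)
print(response.choices[0].message.content)
```

Because the proxy speaks the same chat completions protocol, existing tools that accept a custom base_url should work without further changes.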

App Details

Features
The repository implements many inference-time optimization approaches, including CePO, CoT with reflection, PlanSearch, ReRead, Self-Consistency, Z3 solver integration, the R* algorithm, LEAP, Round-Trip Optimization, Best-of-N sampling, Mixture-of-Agents, MCTS, the Prover-Verifier Game (PVG), CoT decoding, Entropy decoding, and AutoThink. It includes plugins for system prompt learning, deep think, long-context processing, majority voting, MCP client integration, routing, chain-of-code, memory, privacy/anonymization, URL reading, code execution, structured JSON outputs, generative selection, web search, and deep research. optillm also supports provider flexibility (OpenAI, Azure, Cerebras, LiteLLM, local HuggingFace), LoRA stacking, configurable parameters, CLI and Docker deployment, per-request approach control via model-name slugs or request fields (see the sketch below), and an automated test suite with CI.
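The per-request approach control works in two ways: by prefixing the model name with a technique's slug, or by naming the technique in the request body. The sketch below shows both patterns; the "moa-" prefix, the "bon" slug, and the optillm_approach field are drawn from the project's documented conventions but should be checked against the current README.

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="http://localhost:8000/v1")

# Option 1: select the technique by prefixing the model name with its slug.
# The "moa-" prefix (Mixture-of-Agents) is an example slug; consult the README
# for the exact slug of each approach.
response = client.chat.completions.create(
    model="moa-gpt-4o-mini",
    messages=[{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
)

# Option 2: keep the plain model name and select the approach through a request
# field instead; the "optillm_approach" key is assumed from the project's
# documentation and is passed via the SDK's extra_body escape hatch.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Prove that the sum of two even numbers is even."}],
    extra_body={"optillm_approach": "bon"},
)
print(response.choices[0].message.content)
```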
Use Cases
optillm helps teams and researchers get better results from existing models by applying additional inference-time compute and strategies that improve reasoning and coding performance. As a transparent OpenAI-compatible proxy, it integrates with existing clients and tools with minimal changes, enabling experimentation with individual techniques or with combinations run as a pipeline or in parallel. The MCP plugin lets models securely access filesystem, search, and database tools to enrich context. Built-in local inference, LoRA support, and provider wrapping let users run private or custom models and apply specialized decoding methods. The README documents benchmark results and state-of-the-art improvements on public evaluations, and provides configuration, Docker, and testing guidance to support deployment, evaluation, and reproducible experimentation.
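To illustrate the combined-technique experimentation mentioned above, the sketch below requests a pipelined and a parallel combination through the same request field. The "&" (pipeline) and "|" (parallel) separators are assumptions based on the project's documented combination syntax and should be verified against the README.

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...", base_url="http://localhost:8000/v1")

# Assumed combination syntax: "&" chains approaches as a pipeline (each step's
# output feeds the next), while "|" runs them in parallel and returns one
# response per approach. Verify both separators against the current README.
pipelined = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Plan, then write a binary search in Python."}],
    extra_body={"optillm_approach": "plansearch&moa"},
)

parallel = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Plan, then write a binary search in Python."}],
    extra_body={"optillm_approach": "cot_reflection|bon"},
)
```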
