Basic Information

LlamaDeploy is an async-first framework for deploying, scaling, and productionizing agentic multi-service systems built on llama_index workflows. It lets developers turn workflows prototyped in notebooks into services running in the cloud with minimal or no changes to the original code. Workflows are exposed as HTTP-accessible services so user interfaces or other services can call them. The project provides both a Python SDK and an interactive CLI, llamactl, to scaffold projects, deploy workflow definitions, and run deployments. It also includes a control-plane API server component to host and manage deployed workflows. The repository ships examples and templates demonstrating end-to-end deployments, and the package is installable via pip.
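
As context for what gets deployed, the sketch below shows a minimal llama_index workflow of the kind LlamaDeploy serves. It uses the public llama_index.core.workflow API; the EchoWorkflow class, the message argument, and the module-level echo_workflow instance are illustrative placeholders, not part of LlamaDeploy itself.

```python
# Minimal llama_index workflow of the kind LlamaDeploy can serve.
# EchoWorkflow and the `message` argument are illustrative placeholders.
import asyncio

from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step


class EchoWorkflow(Workflow):
    """Echoes back whatever message it receives."""

    @step
    async def echo(self, ev: StartEvent) -> StopEvent:
        # Keyword arguments passed to run() are readable off the StartEvent.
        return StopEvent(result=f"Message received: {ev.get('message')}")


# A module-level instance is what a deployment definition typically points at.
echo_workflow = EchoWorkflow()


async def main() -> None:
    # Local smoke test before deploying.
    print(await echo_workflow.run(message="Hello!"))


if __name__ == "__main__":
    asyncio.run(main())
```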

App Details

Features
- Async-first runtime and design, optimized for high-concurrency and real-time scenarios.
- Runs llama_index workflows as standalone services accessible over HTTP.
- llamactl CLI for scaffolding projects (interactively or non-interactively), deploying YAML deployment files (see the sketch after this list), and running deployments.
- Python SDK for programmatic control and integration.
- Hub-and-spoke architecture, so components can be swapped or added without system-wide changes.
- Built-in production concerns such as retry mechanisms and failure handling for fault tolerance.
- Control-plane API server module for hosting and managing services.
- Example templates and demos showing integrations such as message queues and web UIs.
- Packaged for easy installation with pip.
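
The sketch below illustrates what such a YAML deployment file can look like, following the pattern in the project's quickstart. The deployment name, service name, port, and workflow path are placeholders, and exact field names may vary between releases, so treat this as a sketch rather than a definitive schema.

```yaml
# Hypothetical deployment file (e.g. deployment.yml) for llamactl.
# Field names follow the project's quickstart pattern and may differ
# across versions; all values here are placeholders.
name: QuickStart

control-plane:
  port: 8000

default-service: echo_workflow

services:
  echo_workflow:
    name: Echo Workflow
    # Where the service code comes from (here, the local directory).
    source:
      type: local
      name: .
    # "<module>:<workflow instance>" within the source.
    path: workflow:echo_workflow
```

With a file like this and a running API server, something along the lines of llamactl deploy deployment.yml registers the deployment and llamactl run invokes it; consult the current documentation for the exact commands.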
Use Cases
LlamaDeploy reduces the friction of moving from local experimentation to cloud-hosted agentic services: workflow code stays largely unchanged while deployment, scaling, and runtime management are layered around it. The CLI and SDK let developers scaffold, deploy, and invoke workflows quickly (the SDK invocation is sketched below), enabling fast iteration and testing in production-like environments. The async-first model and fault-tolerance features make it suitable for high-throughput, concurrent workloads. The hub-and-spoke design lets architectures evolve through component swaps and incremental changes, and the included examples and templates accelerate real-world integrations such as message queues and web frontends. Overall, it helps teams operationalize llama_index workflows into manageable, observable, and resilient services.
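
As a sketch of that programmatic invocation, the snippet below follows the client pattern from the project's documentation at the time of writing; the class and method names (LlamaDeployClient, ControlPlaneConfig, create_session, run) may have changed between releases, and the service name and argument are placeholders.

```python
# Sketch of invoking a deployed workflow through the Python SDK.
# Names follow the project's documented quickstart and may differ across
# versions; "echo_workflow" and the message argument are placeholders.
from llama_deploy import ControlPlaneConfig, LlamaDeployClient

# Connects to a control plane on its default local address.
client = LlamaDeployClient(ControlPlaneConfig())

# Sessions group related workflow runs.
session = client.create_session()

# Run the deployed service and print its result.
result = session.run("echo_workflow", message="Hello from the SDK")
print(result)
```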
