Basic Information

Icepick is a TypeScript library for building fault-tolerant, durable AI agents that scale. It provides a lightweight, code-first abstraction for writing agents and tools as plain functions while offloading durable execution, queuing, scheduling, and checkpointing to the infrastructure layer. The project integrates with Hatchet's durable task queue so agent executions can be replayed, recovered, and resumed after failures or long waits for external events. Icepick is explicitly not a prescriptive framework for prompt design, memory, or LLM usage; instead it focuses on the execution model and orchestration so teams can integrate their existing business logic, data access patterns, and LLM calls. The README includes example agent and toolbox code, a CLI-based project bootstrapping command, and links to documentation and patterns for common agent workflows.
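The code-first idea described above can be sketched in plain TypeScript. Note this is an illustrative model only, not Icepick's actual API: the `Tool` type, the `addNumbers` tool, and `calculatorAgent` are hypothetical names showing how a tool can be an ordinary typed function plus metadata, with the agent calling it directly.

```typescript
// Illustrative sketch: these names are NOT Icepick's real API.
// A tool is a plain function with declared input/output types and metadata.
type Tool<I, O> = {
  name: string;
  description: string;
  run: (input: I) => O;
};

const addNumbers: Tool<{ a: number; b: number }, { sum: number }> = {
  name: "add_numbers",
  description: "Adds two numbers",
  run: ({ a, b }) => ({ sum: a + b }),
};

// An "agent" here is just an ordinary function that invokes tools;
// durability and orchestration would live in the infrastructure layer.
function calculatorAgent(a: number, b: number): number {
  return addNumbers.run({ a, b }).sum;
}
```

Because agents and tools are plain functions like this, they remain unit-testable and reusable outside any orchestration runtime.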

App Details

Features
- Durable execution and automatic checkpointing via a durable task queue, enabling recovery and long-running waits.
- Code-first agent and tool APIs: agents and tools are plain TypeScript functions with input/output schemas.
- A toolbox abstraction for registering and picking tools programmatically.
- Scheduling: cron jobs, one-time scheduling, event listeners, and durable sleep.
- Scalability primitives: distributed execution, global rate limits, concurrency control, priority queues, DAG support, event streaming, and sticky assignment/routing.
- Configurable retries, rate limiting, and flow control.
- Lightweight runtime that runs on container-based platforms and integrates with Hatchet for durable task storage.
- Included patterns for prompt chaining, routing, parallelization, multi-agent setups, and human-in-the-loop workflows.
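The toolbox abstraction mentioned above can be sketched as a small registry. This is a hypothetical sketch, not Icepick's implementation: the `Toolbox` class and the `search`/`summarize` tools are invented here to show the register-and-pick pattern.

```typescript
// Hypothetical toolbox sketch (not Icepick's real API): tools are registered
// by name and picked programmatically at runtime.
type AnyTool = { name: string; run: (input: unknown) => unknown };

class Toolbox {
  private tools = new Map<string, AnyTool>();

  register(tool: AnyTool): this {
    this.tools.set(tool.name, tool);
    return this;
  }

  // Look up a tool by name; fail loudly on unknown names.
  pick(name: string): AnyTool {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool;
  }

  names(): string[] {
    return [...this.tools.keys()];
  }
}

const toolbox = new Toolbox()
  .register({ name: "search", run: (q) => `results for ${String(q)}` })
  .register({ name: "summarize", run: (t) => String(t).slice(0, 10) });
```

A registry like this lets an agent (or an LLM routing step) choose tools dynamically while the set of available tools stays declared in code.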
Use Cases
Icepick helps teams build resilient agentic systems by handling the hard infrastructure problems so developers can focus on business logic and LLM integration. Its durable execution model reduces risk from machine failures by replaying execution history and resuming from checkpoints, which is useful for long-running workflows and for waits on humans or external events. The library also simplifies scaling: scheduling and task distribution are handled across a fleet of workers, while rate limiting, concurrency control, and priority queues help manage third-party API limits and throughput. Because agents are defined as plain functions, teams can reuse existing code, validate inputs and outputs, and choose their own memory and knowledge layers. The repo additionally provides CLI scaffolding, documentation, examples, and recommended best practices for safe agent design.
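The replay-and-resume behavior described above can be illustrated with a minimal checkpoint log. This is a conceptual sketch, not Icepick's mechanism: `runStep` and `workflow` are invented names showing how persisted step results let a restarted workflow skip completed work and resume at the first unfinished step.

```typescript
// Conceptual sketch of durable replay (not Icepick's implementation):
// completed step results are stored in a log keyed by step name, so a
// re-run replays results from the log instead of re-executing side effects.
type StepLog = Record<string, unknown>;

function runStep<T>(log: StepLog, key: string, fn: () => T): T {
  if (key in log) return log[key] as T; // replay from checkpoint
  const result = fn();
  log[key] = result; // checkpoint the result
  return result;
}

let sideEffects = 0; // counts real executions, to show replay skips them
const log: StepLog = {};

function workflow(): number {
  const a = runStep(log, "fetch", () => { sideEffects++; return 40; });
  const b = runStep(log, "enrich", () => { sideEffects++; return a + 2; });
  return b;
}

workflow();                  // first run executes both steps
const replayed = workflow(); // second run replays; no new side effects
```

In a real durable task queue the log lives in persistent storage rather than memory, so the same replay works across process crashes and machine failures.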
