tarsier
Basic Information
Tarsier is a developer-focused Python library that provides visual perception utilities for web interaction agents. It converts web pages and screenshots into structured, LLM-friendly representations and tags interactable page elements with stable IDs so that an LLM can reference and act on them. The project addresses common problems that arise when LLMs are used to automate browser tasks, including how to represent page structure, how to map natural-language actions back to DOM elements, and how to convey visual layout to text-only models. The README includes usage examples that integrate with Playwright and show how to obtain a text representation of a page together with a mapping from tags to XPaths. The project is distributed on PyPI and is intended for use inside agent stacks such as LangChain and LlamaIndex.
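To make the idea of tagging elements and mapping tags back to XPaths concrete, here is a minimal sketch that uses only the Python standard library. It is not Tarsier's implementation or API: the names `TagMapper` and `tag_html`, the `[N]` tag format, and the set of element types treated as interactable are all assumptions made for illustration. It also works on a static HTML string rather than a live Playwright page.

```python
from html.parser import HTMLParser

# Hypothetical choice of which elements count as "interactable".
INTERACTABLE = {"a", "button", "input", "select", "textarea"}
# Void elements never receive a closing tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagMapper(HTMLParser):
    """Illustrative only: assigns numeric tags to interactable elements."""

    def __init__(self):
        super().__init__()
        self.stack = []          # XPath segments of the currently open elements
        self.counts = [{}]       # per-depth counters of same-name siblings
        self.tag_to_xpath = {}   # numeric tag -> XPath of the tagged element
        self.parts = []          # pieces of the text representation

    def handle_starttag(self, tag, attrs):
        siblings = self.counts[-1]
        siblings[tag] = siblings.get(tag, 0) + 1
        segment = f"{tag}[{siblings[tag]}]"
        if tag in INTERACTABLE:
            tag_id = len(self.tag_to_xpath)
            self.tag_to_xpath[tag_id] = "/" + "/".join(self.stack + [segment])
            self.parts.append(f"[{tag_id}]")  # assumed tag format
        if tag not in VOID:
            self.stack.append(segment)
            self.counts.append({})

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open element.
        if tag not in VOID and self.stack and self.stack[-1].startswith(tag + "["):
            self.stack.pop()
            self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def tag_html(html):
    """Return (text representation, tag -> XPath mapping) for an HTML string."""
    mapper = TagMapper()
    mapper.feed(html)
    mapper.close()
    return " ".join(mapper.parts), mapper.tag_to_xpath


if __name__ == "__main__":
    sample = ('<html><body><h1>Login</h1><input name="user">'
              '<a href="/help">Help</a><button>Sign in</button></body></html>')
    text, mapping = tag_html(sample)
    print(text)     # Login [0] [1] Help [2] Sign in
    print(mapping)
```

An LLM shown the text `Login [0] [1] Help [2] Sign in` could answer "click 2", and the calling code would look up tag 2 in the mapping to find the element to act on. A real tool working against a live browser page would also need to account for visibility, layout, and dynamic content, none of which this sketch attempts.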