ai.robots.txt

Basic Information

This repository is a curated, community-maintained list of AI-related web crawlers, together with configuration snippets that help website operators identify, block, or otherwise manage them. It collects crawler names and metrics, documents how to implement exclusions using standard robots.txt semantics, and supplies ready-to-use configuration fragments for common web servers and proxies. Crawler metadata is centralized in a single source file, robots.json, from which derivative artifacts such as robots.txt and the server snippets are generated by a GitHub Action. The README points to additional documentation, including a table of bot metrics and a FAQ, notes that some entries were sourced from a third-party tracker, and explains the contribution workflow and testing steps so maintainers can keep the list up to date.
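
As a rough illustration of how the generation pipeline fits together (the JSON field names below are hypothetical, not the repository's actual robots.json schema), a single crawler entry and the robots.txt rule derived from it might look like this:

    robots.json (hypothetical entry):
    {
      "GPTBot": {
        "operator": "OpenAI",
        "function": "Training data collection",
        "description": "Crawler used to gather content for AI model training"
      }
    }

    Generated robots.txt fragment:
    User-agent: GPTBot
    Disallow: /

The key point is that only robots.json is edited by hand; robots.txt and the server snippets are regenerated from it, so every output stays consistent with the same crawler list.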

App Details

Features
Provides a canonical robots.txt tailored to block or identify AI crawlers, plus multiple server-side blocking options: a .htaccess example for Apache, an nginx include snippet, a Caddyfile matcher group, and a HAProxy list file with a usage example. Maintains a machine-readable robots.json as the authoritative source, with automated generation of the human-facing files and a table-of-bot-metrics document. Offers guidance on an additional meta tag for opting out of certain crawls, instructions for reporting abusive crawlers when a site sits behind Cloudflare, and a short test harness runnable with Python. The repository also includes contribution guidelines, links to external resources on bot-blocking best practices, and a releases feed for update notifications.
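
As a minimal sketch of the nginx approach (not the repository's actual include file; the user agents shown are examples), an include of this kind typically matches the User-Agent header and refuses the request:

    # Placed inside a server block; returns 403 for matching AI crawlers.
    if ($http_user_agent ~* "(GPTBot|ClaudeBot|CCBot)") {
        return 403;
    }

In practice the generated include carries the full, current list of agents, so operators only need to reload nginx after pulling an update.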
Use Cases
The repository helps website owners and administrators quickly adopt consistent rules for controlling AI-driven crawling across different hosting stacks, without having to build their own lists from scratch. It lowers the operational burden by supplying ready-made configuration snippets for common server technologies and a single source of truth that can be updated and regenerated automatically. It also documents extra steps, such as a specific meta tag for one major search provider, and provides a path for reporting misbehaving crawlers. Contributors can add new bot entries and run the included tests, enabling the community to keep blocking rules current as new crawlers appear. The materials are practical for webmasters, DevOps engineers, and site security teams.
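
For comparison across hosting stacks, a minimal Apache sketch of the same idea (again using example agents rather than the repository's generated list) could rely on mod_rewrite in an .htaccess file:

    # Deny requests whose User-Agent matches known AI crawlers (case-insensitive).
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|ClaudeBot|CCBot) [NC]
    RewriteRule .* - [F,L]

Because robots.txt is purely advisory, server-level rules like these are what actually stop crawlers that ignore the exclusion protocol.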
