Basic Information

Agent-K is an agentic knowledge base system that uses large language model (LLM) agents to automate question answering over hybrid data that mixes tables with documents such as reports and PDFs. The repository focuses on automating the end-to-end workflow. That workflow covers extracting and enriching structured data from sources such as MRDS and 43-101 reports, then loading and persisting the data in DuckDB. It then answers complex queries without manual entity resolution, database construction or explicit text-to-SQL engineering. It targets use cases that require reasoning across multiple documents and tables, such as mineral resource queries, and names healthcare, financial analysis and academic research as further potential applications. The project includes a Database Agent for SQL-style interaction and a PDF Agent with fast and slow extractors. It requires Python 3.12+ and an OpenAI API key, and provides scripts for data setup along with development tooling.
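The listing states that Python 3.12+ and an OpenAI API key are required. It also says configuration comes from environment variables and an example .env file. A minimal preflight check under those assumptions could look like the following. `OPENAI_API_KEY` is the variable the official OpenAI Python SDK reads by default. The helper names `load_dotenv_file` and `preflight` and the simplified .env parser are hypothetical and not taken from Agent-K. The project itself may rely on a library such as python-dotenv.

```python
import os
import sys
from pathlib import Path


def load_dotenv_file(path: Path) -> dict[str, str]:
    """Hypothetical, simplified .env parser: reads KEY=VALUE lines.

    Skips blank lines and '#' comments and strips one pair of matching
    surrounding quotes. Real .env tooling handles more edge cases.
    """
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def preflight(env: dict[str, str], version: tuple[int, ...]) -> list[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    problems: list[str] = []
    if tuple(version[:2]) < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    # Values already exported in the shell take precedence over .env entries.
    env = {**load_dotenv_file(Path(".env")), **os.environ}
    for problem in preflight(env, tuple(sys.version_info[:2])):
        print("setup problem:", problem)
```

Letting exported shell variables override .env entries mirrors the common convention of treating .env as a fallback for local development.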

App Details

Features
The repository provides an LLM-powered Database Agent that offers SQL assistance, automatic SQL error correction, schema introspection and validation, persistent storage via DuckDB and CSV export. A data setup script orchestrates the full pipeline:

- downloading and filtering MRDS data
- loading it into DuckDB
- enriching it with MinMod Hyper
- downloading 43-101 PDF reports concurrently
- building a match-based evaluation dataset in JSONL format

The PDF Agent contains a Fast Extractor for batch structured entity extraction and a Slow Extractor that uses dynamic tool calling and map-reduce style parallelization to handle complex entities. Development tooling includes ruff for linting, mypy for type checking and pre-commit hooks. Configuration is driven by environment variables, with an example .env file provided.
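The listing describes the Database Agent's automatic SQL error correction only at the feature level. One common way to build such a loop is to execute the query and, on failure, hand the database error back to the model for a revised query. Schema introspection supplies the table and column names the model needs. The sketch below assumes that design. It uses Python's built-in sqlite3 as a stand-in for DuckDB so it runs without extra dependencies. `run_with_correction`, `schema_summary` and the `fix` callback, where an LLM call would go, are hypothetical names, not Agent-K's API.

```python
import sqlite3
from collections.abc import Callable

# Hypothetical fixer signature: (failing_sql, error_message) -> revised_sql.
# In an agent this would be an LLM call; here it is any plain function.
Fixer = Callable[[str, str], str]


def schema_summary(conn: sqlite3.Connection) -> str:
    """Introspect tables and columns so they can be shown to the model."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        lines.append(f"{table}({', '.join(cols)})")
    return "\n".join(lines)


def run_with_correction(
    conn: sqlite3.Connection, sql: str, fix: Fixer, max_attempts: int = 3
) -> list[tuple]:
    """Execute sql; on a database error, ask `fix` for a new query and retry."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            last_error = str(exc)
            if attempt < max_attempts:  # only request a fix if we will retry
                sql = fix(sql, last_error)
    raise RuntimeError(f"query still failing after {max_attempts} attempts: {last_error}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sites (name TEXT, commodity TEXT)")
    # Toy rows for illustration only.
    conn.executemany("INSERT INTO sites VALUES (?, ?)", [("Site A", "copper"), ("Site B", "zinc")])
    print(schema_summary(conn))  # sites(name, commodity)

    def toy_fix(sql: str, error: str) -> str:
        # Stand-in for an LLM: repair one known typo in a column name.
        return sql.replace("comodity", "commodity")

    print(run_with_correction(conn, "SELECT name FROM sites WHERE comodity = 'zinc'", toy_fix))
    # [('Site B',)]
```

Capping the number of attempts keeps a model that cannot repair the query from looping indefinitely. The final error is surfaced to the caller instead.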
Use Cases
Agent-K reduces the manual work needed to build unified, queryable datasets from heterogeneous reports and tables. It automates data download, extraction, enrichment and ingestion into DuckDB, and lets LLM agents inspect schemas and run SQL that is corrected automatically when it fails. This speeds up research and analysis in domains that combine many PDFs and tables, such as mining, healthcare, finance and academia. Because the PDF Agent offers both quick batch extraction and slower, deeper extraction for complex entities, users can trade speed against accuracy as needed. CSV export and DuckDB persistence make it straightforward to feed outputs into downstream analytics. The setup scripts and example commands support reproducible initialization and experimentation.
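The listing says the Slow Extractor parallelizes in a map-reduce style but does not say how work is split or merged. Assume the document is split into pages and each page is sent to an extractor independently. The pattern then reduces to a parallel map followed by a merge. In this sketch, `map_reduce_extract`, `toy_extract` and the field-to-values result shape are hypothetical. A regex stands in for the LLM call.

```python
import re
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-page extractor: page text -> {field: [values]}.
Extractor = Callable[[str], dict[str, list[str]]]


def map_reduce_extract(
    pages: list[str], extract: Extractor, max_workers: int = 4
) -> dict[str, list[str]]:
    """Map `extract` over pages in parallel, then merge results per field."""
    # Map step: pool.map returns results in input (page) order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(extract, pages))
    # Reduce step: order-preserving union of values for each field.
    merged: dict[str, list[str]] = {}
    for partial in partials:
        for field, values in partial.items():
            bucket = merged.setdefault(field, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged


def toy_extract(page: str) -> dict[str, list[str]]:
    # Stand-in for an LLM call: find "Commodity: <word>" mentions.
    return {"commodity": re.findall(r"Commodity:\s*(\w+)", page)}


if __name__ == "__main__":
    pages = ["Commodity: copper. Grade 1.2%", "Commodity: gold", "Commodity: copper"]
    print(map_reduce_extract(pages, toy_extract))  # {'commodity': ['copper', 'gold']}
```

Threads rather than processes fit here because per-page LLM requests are network-bound. The reduce step is a simple deduplicating union. An extractor for complex entities would likely also need to reconcile conflicting values found on different pages.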

Please fill the required fields*