Basic Information

Open Data QnA is a Python-based solution accelerator and library that enables conversational access to SQL databases using LLM agents on Google Cloud. Users ask natural language questions against PostgreSQL (Cloud SQL) and BigQuery datasets and receive generated SQL queries, validated results, and natural language responses, with no SQL knowledge required. The repository provides modular components: database connectors, vector stores, agent implementations, notebooks, a CLI, Terraform deployment scripts, and optional frontend and backend APIs. It targets developers and teams who want to build or deploy chat-driven data exploration tools on GCP using Python, Vertex AI embeddings, Firestore for session logs, and an optional PGVector or BigQuery vector store. The README documents setup options, prerequisites such as Python >= 3.10 and the Google Cloud CLI, and deployment workflows for notebooks, CLI runs, or Terraform.

App Details

Features
The project offers conversational querying with multi-turn follow-ups, automatic SQL generation, query validation and debugging, and natural language summaries of results. It supports table grouping and multi-schema/dataset grouping to focus retrieval and prompt context. Prompts are customizable via YAML, and the system can ingest known-good SQL examples for few-shot prompting. Vector store support includes PGVector on Cloud SQL and a BigQuery vector store, and embeddings can be created with Vertex AI or LangChain VertexAIEmbeddings. Agent implementations include BuildSQLAgent, ValidateSQLAgent, DebugSQLAgent, DescriptionAgent, EmbedderAgent, ResponseAgent, and VisualizeAgent, which can produce JavaScript for charts. The repo includes Jupyter notebooks, a CLI (opendataqna.py), Terraform for end-to-end deployment, backend APIs, and an Angular frontend scaffold. It can be extended and integrated into custom UIs and workflows.
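The generate, validate, and debug flow described above can be pictured with a minimal sketch. This is not the repository's API: `validate_sql`, `answer_question`, `fake_build`, and `fake_debug` are hypothetical names, the two stub functions stand in for LLM-backed agents, and an in-memory SQLite database stands in for Cloud SQL or BigQuery.

```python
import sqlite3
from typing import Callable, Optional


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can plan the query, otherwise the error text."""
    try:
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def answer_question(
    conn: sqlite3.Connection,
    question: str,
    build_sql: Callable[[str], str],
    debug_sql: Callable[[str, str, str], str],
    max_retries: int = 2,
):
    """Build a query, validate it, and ask the debugger to repair it on failure."""
    sql = build_sql(question)
    error = None
    for attempt in range(max_retries + 1):
        error = validate_sql(conn, sql)
        if error is None:
            return sql, conn.execute(sql).fetchall()
        if attempt < max_retries:
            sql = debug_sql(question, sql, error)
    raise RuntimeError(f"no valid SQL after {max_retries} retries: {error}")


# Hypothetical stand-ins for the LLM-backed build and debug steps.
def fake_build(question: str) -> str:
    return "SELECT SUM(amout) FROM orders"  # deliberate typo in the column name


def fake_debug(question: str, sql: str, error: str) -> str:
    return sql.replace("amout", "amount")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 5.5)])

final_sql, rows = answer_question(conn, "What is the total order amount?", fake_build, fake_debug)
```

In this sketch the first query fails validation because of the misspelled column, the debug step repairs it, and the second attempt runs. Bounding the retries keeps a persistently failing query from looping forever.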
Use Cases
Open Data QnA accelerates building conversational data assistants that let analysts, product owners, or non-SQL users query data in natural language and iterate with follow-up questions. Developers benefit from reusable agents that generate, validate, and debug SQL, and from components for embeddings, vector stores, and session logging to improve retrieval and context. The toolkit reduces time to prototype by providing notebooks, CLI commands, sample data scripts, and Terraform for one-click infrastructure deployment on GCP. It can produce human-friendly explanations of query results and optional visualizations, improving decision-making and communication of insights. The modular design and prompt customization make it practical to integrate into existing analytics pipelines, add domain-specific context, and deploy a production-facing frontend and backend on Google Cloud.
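As a rough illustration of how embeddings and a vector store can supply few-shot context, the sketch below ranks known-good SQL examples by cosine similarity to a question and places the closest ones in a prompt. Everything here is an assumption for illustration: the `Example` class, `top_k`, and `build_prompt` are hypothetical, the three-dimensional vectors are toy values rather than Vertex AI embeddings, and a Python list stands in for PGVector or the BigQuery vector store.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    sql: str
    embedding: list[float]  # toy vector; a real system would use model embeddings


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_embedding: list[float], examples: list[Example], k: int = 2) -> list[Example]:
    """Return the k examples most similar to the query, best first."""
    ranked = sorted(examples, key=lambda e: cosine(query_embedding, e.embedding), reverse=True)
    return ranked[:k]


def build_prompt(question: str, shots: list[Example]) -> str:
    """Assemble a few-shot prompt from the retrieved examples."""
    parts = [f"Q: {s.question}\nSQL: {s.sql}" for s in shots]
    parts.append(f"Q: {question}\nSQL:")
    return "\n\n".join(parts)


examples = [
    Example("Total sales by region", "SELECT region, SUM(sales) FROM t GROUP BY region", [1.0, 0.0, 0.0]),
    Example("List all customers", "SELECT * FROM customers", [0.0, 1.0, 0.0]),
    Example("Total sales by month", "SELECT month, SUM(sales) FROM t GROUP BY month", [0.9, 0.1, 0.0]),
]

question = "Total sales by country"
shots = top_k([1.0, 0.05, 0.0], examples, k=2)
prompt = build_prompt(question, shots)
```

With these toy vectors the two sales examples rank above the customer listing, so only they appear in the prompt ahead of the new question.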
