gpt4V-scraper

Basic Information

This repository provides a working web agent that automates full-page web captures, scraping, and image-based content extraction using GPT-4V. It combines Puppeteer (with a stealth plugin), which drives a real desktop Chrome/Chrome Canary browser session for navigation and screenshots, with Python code that sends the captured images to the GPT-4V API for image-to-text extraction. The project includes Node scripts for taking snapshots and a real-time interactive web agent that can perform guided browsing and Bing searches. Configuration relies on a .env file holding an OpenAI API key, local browser paths, and user data directories so that authenticated pages can be captured. Example commands and usage scenarios show how to capture snapshot.jpg and receive structured text output in the console.
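The image-to-text step described above can be sketched in Python. This is a minimal illustration, not the repository's actual gpt4v_scraper.py: the model name, prompt, and helper function below are assumptions, and the screenshot is embedded as a base64 data URI in an OpenAI chat-completions payload.

```python
import base64

# Assumed vision-capable model identifier; the repo may pin a different one.
GPT4V_MODEL = "gpt-4-vision-preview"

def build_vision_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a chat-completions request body embedding a JPEG screenshot.

    `build_vision_payload` is a hypothetical helper for illustration only.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": GPT4V_MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 1024,
    }

# POSTing this payload to the OpenAI chat completions endpoint with the
# OPENAI_API_KEY from .env would return the extracted text (request omitted).
```

In this sketch the network call is left out so the payload construction stands on its own; the actual script presumably reads snapshot.jpg from disk and prints the model's response to the console.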

Features
- Full-page screenshot capture via Puppeteer with a stealth plugin to reduce anti-bot detection.
- Configurable executablePath and userDataDir to reuse logged-in browser sessions and handle sites that require authentication.
- snapshot.js for one-off snapshots and web_agent.js for an interactive, AutoGPT-style browsing agent that can navigate search results.
- Python script gpt4v_scraper.py for converting screenshots to extracted text using the GPT-4V API.
- Instructions for environment setup, including npm install, .env configuration, and a Python virtualenv with requirements.txt.
- Customizable timeout settings and examples showing console output and the generated snapshot.jpg.
- Dependencies: Puppeteer and the OpenAI GPT-4V integration.
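The .env-based setup might look like the fragment below. The variable names here are illustrative, not taken from the repository; the README defines the exact keys it expects.

```
# Illustrative .env sketch -- exact variable names may differ in the repo
OPENAI_API_KEY=your-openai-api-key     # used by gpt4v_scraper.py
EXECUTABLE_PATH=/path/to/chrome        # desktop Chrome/Chrome Canary binary
USER_DATA_DIR=/path/to/chrome-profile  # reused profile for authenticated pages
```

Pointing USER_DATA_DIR at an existing, logged-in Chrome profile is what lets the agent capture pages behind logins without scripting the authentication flow itself.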
Use Cases
The project automates manual web capture and data extraction workflows by combining browser automation with vision-capable LLM processing. It lets users capture complex or authenticated pages as images and obtain readable, contextual text and answers that traditional OCR might miss. The interactive web agent supports conversational guidance and automated Bing searches to follow up on results, enabling exploratory scraping and quick data pulls. Configurable browser profiles let you reuse sessions to scrape content behind logins. Overall, it speeds up tasks such as archiving pages, extracting structured information from visual layouts, and prototyping vision-enabled web automation without building a custom pipeline from scratch.
