gpt4V-scraper

Basic Information

This repository provides a working web agent that automates full-page web captures, scraping, and image-based content extraction using GPT-4V. It combines Puppeteer (with a stealth plugin), which drives a real desktop Chrome/Chrome Canary browser session for navigation and screenshots, with Python code that sends the captured images to the GPT-4V API for image-to-text extraction. The project includes Node scripts for taking snapshots and a real-time interactive web agent that can perform guided browsing and Bing searches. Configuration relies on a .env file holding an OpenAI API key, local browser paths, and user data directories so that authenticated pages can be captured. Example commands and usage scenarios show how to capture snapshot.jpg and receive structured text output in the console.
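The image-to-text step described above can be sketched in Python. This is a minimal illustration, not the repository's actual gpt4v_scraper.py: the model name, prompt, and helper function below are assumptions, and the screenshot is embedded as a base64 data URI in an OpenAI chat-completions payload.

```python
import base64

# Assumed vision-capable model identifier; the repo may pin a different one.
GPT4V_MODEL = "gpt-4-vision-preview"

def build_vision_payload(image_bytes: bytes, prompt: str) -> dict:
    """Build a chat-completions request body embedding a JPEG screenshot.

    `build_vision_payload` is a hypothetical helper for illustration only.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": GPT4V_MODEL,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                    },
                ],
            }
        ],
        "max_tokens": 1024,
    }

# POSTing this payload to the OpenAI chat completions endpoint with the
# OPENAI_API_KEY from .env would return the extracted text (request omitted).
```

In this sketch the network call is left out so the payload construction stands on its own; the actual script presumably reads snapshot.jpg from disk and prints the model's response to the console.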

Features
- Full-page screenshot capture via Puppeteer with a stealth plugin to reduce anti-bot detection.
- Configurable executablePath and userDataDir to reuse logged-in browser sessions and handle sites that require authentication.
- snapshot.js for one-off snapshots and web_agent.js for an interactive, AutoGPT-style browsing agent that can navigate search results.
- Python script gpt4v_scraper.py for converting screenshots to extracted text using the GPT-4V API.
- Instructions for environment setup, including npm install, .env configuration, and a Python virtualenv with requirements.txt.
- Customizable timeout settings and examples showing console output and the generated snapshot.jpg.
- Dependencies: Puppeteer and the OpenAI GPT-4V integration.
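The .env-based setup might look like the fragment below. The variable names here are illustrative, not taken from the repository; the README defines the exact keys it expects.

```
# Illustrative .env sketch -- exact variable names may differ in the repo
OPENAI_API_KEY=your-openai-api-key     # used by gpt4v_scraper.py
EXECUTABLE_PATH=/path/to/chrome        # desktop Chrome/Chrome Canary binary
USER_DATA_DIR=/path/to/chrome-profile  # reused profile for authenticated pages
```

Pointing USER_DATA_DIR at an existing, logged-in Chrome profile is what lets the agent capture pages behind logins without scripting the authentication flow itself.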
Use Cases
The project automates manual web capture and data extraction workflows by combining browser automation with vision-capable LLM processing. It lets users capture complex or authenticated pages as images and obtain readable, contextual text and answers that traditional OCR might miss. The interactive web agent supports conversational guidance and automated Bing searches to follow up on results, enabling exploratory scraping and quick data pulls. Configurable browser profiles let you reuse sessions to scrape content behind logins. Overall, it speeds up tasks such as archiving pages, extracting structured information from visual layouts, and prototyping vision-enabled web automation without building a custom pipeline from scratch.
