- Comprehensive scraper for Dollar General Pokemon TCG products
- Professional PDF catalog generator with UPC-A barcodes
- Robust anti-bot handling with requests + Selenium fallback
- Automatic image downloading and barcode generation
- Unix-friendly timestamped filenames
- Virtual environment support and dependency management
- Complete documentation and usage guides
Quick Start Guide
Simple Usage (Recommended)
- Make sure you're in the project directory:

      cd pokemon-disco

- Run the complete scraper and PDF generator:

      ./run.sh

This single command will:
- Set up the Python virtual environment
- Install all required packages
- Scrape Pokemon TCG products from Dollar General
- Generate a professional PDF catalog with barcodes
- Create timestamped files for easy organization
What You'll Get
Generated Files:
- pokemon_tcg_products_YYYYMMDD_HHMMSS.json - Raw data in JSON format
- catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf - Professional PDF catalog
PDF Catalog Contents:
- Product images (downloaded automatically)
- Product details (title, price, stock, SKU)
- UPC-A barcodes for each product (generated from SKU)
- Table of contents for easy navigation
- Professional formatting suitable for printing
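For readers curious how a UPC-A code can be derived from a SKU, here is a minimal sketch of the standard UPC-A check-digit calculation. How this project actually maps a SKU to the 11-digit barcode body is not documented here, so the zero-padding in the hypothetical `sku_to_upc_a` helper is an assumption, as are both function names.

```python
def upc_a_check_digit(body: str) -> int:
    """Return the UPC-A check digit for an 11-digit body."""
    if len(body) != 11 or not (body.isascii() and body.isdigit()):
        raise ValueError("UPC-A body must be exactly 11 ASCII digits")
    digits = [int(c) for c in body]
    # Positions 1, 3, 5, ... (index 0, 2, 4, ...) are weighted by 3.
    odd_sum = sum(digits[0::2])
    even_sum = sum(digits[1::2])
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def sku_to_upc_a(sku: str) -> str:
    """Hypothetical helper: build a 12-digit UPC-A string from a numeric SKU.

    Assumption: the SKU is numeric and is left-padded with zeros to 11 digits.
    """
    body = sku.zfill(11)
    return body + str(upc_a_check_digit(body))
```

As a sanity check, the body `03600029145` yields check digit 2, giving the full code `036000291452`.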
Alternative Commands
If you prefer more control:
# Activate virtual environment first
source venv/bin/activate
# Run only the scraper
python scraper.py
# Run only the PDF generator (after scraping)
python pdf_generator.py pokemon_tcg_products_YYYYMMDD_HHMMSS.json
# Run everything (installs requirements automatically)
python run_scraper.py
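When running the PDF generator on its own, you need the name of the most recent JSON file. A small sketch of one way to find it, relying only on the file naming pattern shown in this guide; the helper name `latest_products_file` is hypothetical and not part of the project:

```python
from pathlib import Path
from typing import Optional


def latest_products_file(directory: str = ".") -> Optional[Path]:
    """Return the newest pokemon_tcg_products_*.json file, or None if absent.

    Because the timestamp is YYYYMMDD_HHMMSS, sorting by file name
    is the same as sorting chronologically.
    """
    candidates = sorted(
        Path(directory).glob("pokemon_tcg_products_*.json"),
        key=lambda p: p.name,
    )
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    newest = latest_products_file()
    print(newest if newest else "No scraped JSON file found; run the scraper first.")
```

The printed path can then be passed to `python pdf_generator.py`.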
Output Location
All generated files will be in:
- JSON data: Current directory
- PDF catalog: catalog_output/ directory
- Product images: catalog_output/images/
- Barcode images: catalog_output/barcodes/
Requirements
- Python 3.7+
- pandoc (for PDF generation)
- Internet connection (for scraping)
The script automatically installs the Python dependencies into a virtual environment.
Troubleshooting
If you encounter issues:
- Permission denied: Make sure the script is executable:

      chmod +x run.sh

- Pandoc not found: Install pandoc for your system:

      # Ubuntu/Debian
      sudo apt install pandoc
      # Arch Linux
      sudo pacman -S pandoc
      # macOS
      brew install pandoc

- No products found: The website may have anti-bot protection or changed structure. The script includes fallback mechanisms.
- PDF generation fails: The markdown file will still be generated, which you can manually convert or view.
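To catch a missing pandoc before a long scrape, a preflight check along these lines may help. This is a sketch using only the standard library; the `missing_tools` function is hypothetical and is not something the project is known to ship.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the subset of command names that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["pandoc"])
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

If the PDF step still fails, the generated markdown file can usually be converted by hand with `pandoc input.md -o output.pdf`, substituting the actual file names.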
File Naming Convention
All output files include Unix-friendly timestamps:
- Format: YYYYMMDD_HHMMSS (e.g., 20241221_143025)
- This ensures chronological sorting with the ls command
- No spaces or special characters for script-friendly handling
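The timestamp format above corresponds to the `strftime` pattern `%Y%m%d_%H%M%S`. A minimal sketch of building such a file name follows; the `timestamped_name` helper is illustrative and may differ from what the scraper does internally.

```python
from datetime import datetime
from typing import Optional


def timestamped_name(prefix: str, extension: str,
                     when: Optional[datetime] = None) -> str:
    """Build a name like prefix_YYYYMMDD_HHMMSS.extension."""
    when = when or datetime.now()  # default to the current local time
    return f"{prefix}_{when.strftime('%Y%m%d_%H%M%S')}.{extension}"


if __name__ == "__main__":
    print(timestamped_name("pokemon_tcg_products", "json"))
```

Because every field is zero-padded and ordered from year down to second, plain alphabetical sorting of these names matches chronological order.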
Example Output
pokemon-disco/
├── pokemon_tcg_products_20241221_143025.json    # Scraped data
├── catalog_output/
│   ├── pokemon_tcg_catalog_20241221_143025.pdf  # Final catalog
│   ├── pokemon_tcg_catalog_20241221_143025.md   # Markdown source
│   ├── images/
│   │   ├── product_1_SKU123456.jpg              # Product images
│   │   └── product_2_SKU789012.jpg
│   └── barcodes/
│       ├── barcode_SKU123456.png                # UPC-A barcodes
│       └── barcode_SKU789012.png