pokemon-disco/USAGE.md
pi-bot-01 e6dd999aeb Initial commit: Pokemon Discovery - TCG product scraper and PDF catalog generator
- Comprehensive scraper for Dollar General Pokemon TCG products
- Professional PDF catalog generator with UPC-A barcodes
- Robust anti-bot handling with requests + Selenium fallback
- Automatic image downloading and barcode generation
- Unix-friendly timestamped filenames
- Virtual environment support and dependency management
- Complete documentation and usage guides
2026-03-21 14:41:17 -07:00

Quick Start Guide

  1. Make sure you're in the project directory:

    cd pokemon-disco
    
  2. Run the complete scraper and PDF generator:

    ./run.sh
    

    This single command will:

    • Set up the Python virtual environment
    • Install all required packages
    • Scrape Pokemon TCG products from Dollar General
    • Generate a professional PDF catalog with barcodes
    • Create timestamped files for easy organization

What You'll Get

Generated Files:

  • pokemon_tcg_products_YYYYMMDD_HHMMSS.json - Raw data in JSON format
  • catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf - Professional PDF catalog

PDF Catalog Contents:

  • Product images (downloaded automatically)
  • Product details (title, price, stock, SKU)
  • UPC-A barcodes for each product (generated from SKU)
  • Table of contents for easy navigation
  • Professional formatting suitable for printing
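
The UPC-A barcodes mentioned above carry 12 digits: an 11-digit payload plus one check digit. The sketch below shows the standard check-digit calculation. It assumes the SKU has already been reduced to exactly 11 digits; how the project maps a Dollar General SKU to that payload is not documented here, and the function names are illustrative rather than taken from pdf_generator.py.

```python
def upc_a_check_digit(payload: str) -> int:
    """Return the UPC-A check digit for an 11-digit payload."""
    if len(payload) != 11 or not (payload.isascii() and payload.isdigit()):
        raise ValueError("UPC-A payload must be exactly 11 ASCII digits")
    digits = [int(ch) for ch in payload]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, ... (1-based)
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, ... (1-based)
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def to_upc_a(payload: str) -> str:
    """Append the check digit to form the full 12-digit UPC-A code."""
    return payload + str(upc_a_check_digit(payload))


print(to_upc_a("03600029145"))  # prints 036000291452
```

A scanner rejects a code whose final digit does not match this calculation, so it is worth validating SKUs before printing the catalog.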

Alternative Commands

If you prefer more control:

# Activate virtual environment first
source venv/bin/activate

# Run only the scraper
python scraper.py

# Run only the PDF generator (after scraping)
python pdf_generator.py pokemon_tcg_products_YYYYMMDD_HHMMSS.json

# Run everything (installs requirements automatically)
python run_scraper.py
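
If you run the two steps separately, you need the name of the most recent JSON file. A minimal sketch for finding it, assuming the files keep the pokemon_tcg_products_YYYYMMDD_HHMMSS.json pattern shown above; the helper names are hypothetical and not part of the project:

```python
from pathlib import Path
from typing import Iterable, Optional

PATTERN = "pokemon_tcg_products_*.json"


def pick_latest(names: Iterable[str]) -> Optional[str]:
    """Fixed-width timestamps sort lexicographically, so max() is the newest."""
    return max(names, default=None)


def latest_products_file(directory: Path = Path(".")) -> Optional[Path]:
    """Return the newest scraped JSON file in `directory`, or None."""
    name = pick_latest(p.name for p in directory.glob(PATTERN))
    return directory / name if name else None


latest = latest_products_file()
if latest is None:
    print("No scraped data found; run scraper.py first.")
else:
    print(f"python pdf_generator.py {latest}")
```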

Output Location

Generated files are written to:

  • JSON data: Current directory
  • PDF catalog: catalog_output/ directory
  • Product images: catalog_output/images/
  • Barcode images: catalog_output/barcodes/

Requirements

  • Python 3.7+
  • pandoc (for PDF generation)
  • Internet connection (for scraping)

The script installs the Python dependencies automatically into a virtual environment. Pandoc is a system package and must be installed separately.
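
To check the non-Python requirements before a run, something like the following could be used. This is only a sketch: `missing_requirements` is a hypothetical helper, not a function shipped with the project, and it checks only that pandoc is on PATH, not that it works.

```python
import shutil
import sys

MIN_PYTHON = (3, 7)


def missing_requirements(min_python=MIN_PYTHON, tools=("pandoc",)):
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


for problem in missing_requirements():
    print("Missing:", problem)
```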

Troubleshooting

If you encounter issues:

  1. Permission denied: Make sure the script is executable:

    chmod +x run.sh
    
  2. Pandoc not found: Install pandoc for your system:

    # Ubuntu/Debian
    sudo apt install pandoc
    
    # Arch Linux  
    sudo pacman -S pandoc
    
    # macOS
    brew install pandoc
    
  3. No products found: The site may be blocking automated requests, or its page structure may have changed. The scraper tries requests first and falls back to Selenium.

  4. PDF generation fails: The Markdown source is still written to catalog_output/, so you can view it directly or convert it manually.
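
For the manual conversion mentioned in item 4, a plain pandoc call is usually enough. The sketch below assumes the Markdown file sits in catalog_output/ as in the example layout; the options the project itself passes to pandoc are not documented here, so this uses only the basic input and `-o` output form. Note that pandoc needs a PDF engine (pdflatex by default) to write PDF.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def pandoc_command(markdown_path: Path) -> List[str]:
    """Build a basic pandoc command that writes a PDF next to the source."""
    pdf_path = markdown_path.with_suffix(".pdf")
    return ["pandoc", str(markdown_path), "-o", str(pdf_path)]


def convert(markdown_path: Path) -> bool:
    """Run pandoc if it is installed and the source exists; report success."""
    if shutil.which("pandoc") is None or not markdown_path.exists():
        return False
    result = subprocess.run(pandoc_command(markdown_path), check=False)
    return result.returncode == 0


source = Path("catalog_output/pokemon_tcg_catalog_20241221_143025.md")
print("converted" if convert(source) else "not converted")
```

If the catalog's image links are relative, running the conversion from the directory the links are relative to may be necessary.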

File Naming Convention

All output files include Unix-friendly timestamps:

  • Format: YYYYMMDD_HHMMSS (e.g., 20241221_143025)
  • Names sort chronologically under a plain ls, because every field is fixed-width and ordered from year down to second
  • No spaces or special characters, so the names are safe to use in scripts without quoting
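
The naming scheme maps directly onto a `strftime` format string. A minimal sketch, with `timestamped_name` and `parse_stamp` as illustrative helpers rather than the project's own functions:

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d_%H%M%S"  # YYYYMMDD_HHMMSS


def timestamped_name(prefix: str, ext: str, when: datetime) -> str:
    """Build a name such as pokemon_tcg_products_20241221_143025.json."""
    return f"{prefix}_{when.strftime(STAMP_FORMAT)}.{ext}"


def parse_stamp(filename: str, prefix: str) -> datetime:
    """Recover the datetime from a name built by timestamped_name."""
    stem = filename.rsplit(".", 1)[0]       # drop the extension
    stamp = stem[len(prefix) + 1:]          # drop "<prefix>_"
    return datetime.strptime(stamp, STAMP_FORMAT)


name = timestamped_name("pokemon_tcg_products", "json",
                        datetime(2024, 12, 21, 14, 30, 25))
print(name)  # prints pokemon_tcg_products_20241221_143025.json
```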

Example Output

pokemon-disco/
├── pokemon_tcg_products_20241221_143025.json     # Scraped data
├── catalog_output/
│   ├── pokemon_tcg_catalog_20241221_143025.pdf   # Final catalog
│   ├── pokemon_tcg_catalog_20241221_143025.md    # Markdown source
│   ├── images/
│   │   ├── product_1_SKU123456.jpg               # Product images
│   │   └── product_2_SKU789012.jpg
│   └── barcodes/
│       ├── barcode_SKU123456.png                 # UPC-A barcodes
│       └── barcode_SKU789012.png