pi-bot-01 e6dd999aeb Initial commit: Pokemon Discovery - TCG product scraper and PDF catalog generator
- Comprehensive scraper for Dollar General Pokemon TCG products
- Professional PDF catalog generator with UPC-A barcodes
- Robust anti-bot handling with requests + Selenium fallback
- Automatic image downloading and barcode generation
- Unix-friendly timestamped filenames
- Virtual environment support and dependency management
- Complete documentation and usage guides
2026-03-21 14:41:17 -07:00


# Quick Start Guide
## Simple Usage (Recommended)
1. **Make sure you're in the project directory:**
   ```bash
   cd pokemon-disco
   ```
2. **Run the complete scraper and PDF generator:**
   ```bash
   ./run.sh
   ```
   This single command will:
   - Set up the Python virtual environment
   - Install all required packages
   - Scrape Pokemon TCG products from Dollar General
   - Generate a professional PDF catalog with barcodes
   - Create timestamped files for easy organization
## What You'll Get
### Generated Files:
- **`pokemon_tcg_products_YYYYMMDD_HHMMSS.json`** - Raw data in JSON format
- **`catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`** - Professional PDF catalog
### PDF Catalog Contents:
- Product images (downloaded automatically)
- Product details (title, price, stock, SKU)
- UPC-A barcodes for each product (generated from SKU)
- Table of contents for easy navigation
- Professional formatting suitable for printing
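Each barcode is a UPC-A code: 11 data digits followed by one check digit. This guide doesn't say how a SKU is mapped to the 11 data digits, so the sketch below covers only the standard check-digit step. That step is useful for sanity-checking a generated barcode. `upc_check_digit` is a hypothetical helper and is not part of the project:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this project): print the UPC-A check digit
# for an 11-digit string, using the standard weight of 3 on odd positions.
upc_check_digit() {
  local digits=$1
  if [[ ! $digits =~ ^[0-9]{11}$ ]]; then
    echo "expected exactly 11 digits, got: '$digits'" >&2
    return 1
  fi
  local sum=0 i d
  for (( i = 0; i < 11; i++ )); do
    d=${digits:i:1}
    if (( i % 2 == 0 )); then
      # 1st, 3rd, ..., 11th digit (even 0-based index) is weighted by 3
      sum=$(( sum + 3 * d ))
    else
      sum=$(( sum + d ))
    fi
  done
  # The check digit raises the weighted sum to the next multiple of 10
  echo $(( (10 - sum % 10) % 10 ))
}

# Example: 03600029145 has check digit 2 (full UPC-A: 036000291452)
upc_check_digit 03600029145
```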
## Alternative Commands
If you prefer more control:
```bash
# Activate the virtual environment first (created by ./run.sh)
source venv/bin/activate
# Run only the scraper
python scraper.py
# Run only the PDF generator on an existing scrape (use the real timestamp)
python pdf_generator.py pokemon_tcg_products_YYYYMMDD_HHMMSS.json
# Run everything (installs requirements automatically)
python run_scraper.py
```
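Typing the timestamped filename by hand is error-prone. Because the timestamps sort chronologically, a small shell function can pick the newest scrape for you. `latest_products_json` is a hypothetical helper that the project does not ship. It assumes the JSON files sit in the current directory, as described under Output Location:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest pokemon_tcg_products_*.json in a directory.
latest_products_json() {
  local dir=${1:-.}
  local files=( "$dir"/pokemon_tcg_products_*.json )
  # If nothing matches, the pattern is left unexpanded, so check it really exists.
  if [[ ! -e ${files[0]} ]]; then
    echo "no pokemon_tcg_products_*.json found in $dir" >&2
    return 1
  fi
  # Glob results come back sorted, and YYYYMMDD_HHMMSS sorts chronologically,
  # so the last element is the most recent scrape.
  printf '%s\n' "${files[${#files[@]}-1]}"
}
```

With the virtual environment active, `python pdf_generator.py "$(latest_products_json)"` builds a catalog from the most recent scrape.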
## Output Location
All generated files will be in:
- JSON data: Current directory
- PDF catalog: `catalog_output/` directory
- Product images: `catalog_output/images/`
- Barcode images: `catalog_output/barcodes/`
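## Requirements
- Python 3.7+
- pandoc (for PDF generation), plus a PDF engine for pandoc to use (by default `pdflatex`, from a LaTeX distribution)
- Internet connection (for scraping)

`run.sh` installs the Python dependencies into the virtual environment automatically. pandoc must be installed separately.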
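Before the first run, you can check the system-level prerequisites from the shell. This is a sketch that uses hypothetical helper names. It assumes the interpreter is called `python3`, which may differ on your system:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the requirements listed above.

# Succeed if a command is available on PATH.
have_command() {
  command -v "$1" >/dev/null 2>&1
}

# Succeed if the given interpreter (default: python3) is Python 3.7 or newer.
python_is_recent_enough() {
  local py=${1:-python3}
  have_command "$py" || return 1
  "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)'
}

# Print each missing prerequisite to stderr; return non-zero if any are missing.
check_requirements() {
  local status=0
  if ! python_is_recent_enough; then
    echo "missing: Python 3.7+ (python3)" >&2
    status=1
  fi
  if ! have_command pandoc; then
    echo "missing: pandoc" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_requirements && ./run.sh` starts the pipeline only if both are found. The check does not look for a PDF engine.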
## Troubleshooting
If you encounter issues:
1. **Permission denied:** Make sure the script is executable:
   ```bash
   chmod +x run.sh
   ```
2. **Pandoc not found:** Install pandoc for your system:
   ```bash
   # Ubuntu/Debian
   sudo apt install pandoc
   # Arch Linux
   sudo pacman -S pandoc
   # macOS
   brew install pandoc
   ```
3. **No products found:** The site may be blocking automated requests, or its page structure may have changed. The scraper uses `requests` with a Selenium fallback to cope with anti-bot measures. A change in the site's layout can still produce an empty result.
4. **PDF generation fails:** The Markdown source (`catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.md`) is still generated, so you can view it directly or convert it to PDF manually.
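If the automatic conversion fails, you can convert the Markdown catalog by hand with pandoc. The function below is a hypothetical sketch and is not part of the project. It assumes the image paths in the Markdown are relative to either the project root or `catalog_output/`, so it adds both to pandoc's `--resource-path`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild a catalog PDF from its Markdown source with pandoc.
# Usage: rebuild_catalog_pdf catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.md
rebuild_catalog_pdf() {
  local md=$1
  if [[ ! -f $md ]]; then
    echo "no such file: $md" >&2
    return 1
  fi
  local pdf=${md%.md}.pdf
  # Search the current directory and the .md file's directory for images
  # (entries in --resource-path are separated by ':' on Unix).
  pandoc "$md" -o "$pdf" --resource-path=".:$(dirname "$md")" && echo "$pdf"
}
```

Producing a PDF still requires a PDF engine. pandoc uses `pdflatex` by default, and `--pdf-engine` selects a different one.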
## File Naming Convention
The JSON, PDF, and Markdown outputs include Unix-friendly timestamps:
- Format: `YYYYMMDD_HHMMSS` (e.g., `20241221_143025`)
- The fields run from year down to second and are zero-padded, so alphabetical order matches chronological order and `ls` lists runs oldest first
- No spaces or special characters, which keeps the names script-friendly
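You can also produce and parse this pattern from the shell, for example when writing your own wrapper scripts. Both functions below are hypothetical helpers, and the project's Python code may build its names differently:

```bash
#!/usr/bin/env bash
# Hypothetical helper: build "<prefix>_YYYYMMDD_HHMMSS.<ext>" from the current local time.
timestamped_name() {
  local prefix=$1 ext=$2
  printf '%s_%s.%s\n' "$prefix" "$(date +%Y%m%d_%H%M%S)" "$ext"
}

# Hypothetical helper: print the YYYYMMDD_HHMMSS part of such a name (nothing if absent).
extract_timestamp() {
  local name=${1##*/}                      # drop any leading directories
  local re='_([0-9]{8}_[0-9]{6})\.[A-Za-z]+$'
  if [[ $name =~ $re ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  fi
}

# Example: extract_timestamp catalog_output/pokemon_tcg_catalog_20241221_143025.pdf
#          prints 20241221_143025
```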
## Example Output
```
pokemon-disco/
├── pokemon_tcg_products_20241221_143025.json    # Scraped data
├── catalog_output/
│   ├── pokemon_tcg_catalog_20241221_143025.pdf  # Final catalog
│   ├── pokemon_tcg_catalog_20241221_143025.md   # Markdown source
│   ├── images/
│   │   ├── product_1_SKU123456.jpg              # Product images
│   │   └── product_2_SKU789012.jpg
│   └── barcodes/
│       ├── barcode_SKU123456.png                # UPC-A barcodes
│       └── barcode_SKU789012.png
```