Initial commit: Pokemon Discovery - TCG product scraper and PDF catalog generator
- Comprehensive scraper for Dollar General Pokemon TCG products
- Professional PDF catalog generator with UPC-A barcodes
- Robust anti-bot handling with requests + Selenium fallback
- Automatic image downloading and barcode generation
- Unix-friendly timestamped filenames
- Virtual environment support and dependency management
- Complete documentation and usage guides
115
USAGE.md
Normal file
@@ -0,0 +1,115 @@
# Quick Start Guide

## Simple Usage (Recommended)

1. **Make sure you're in the project directory:**
   ```bash
   cd pokemon-disco
   ```

2. **Run the complete scraper and PDF generator:**
   ```bash
   ./run.sh
   ```

This single command will:
- Set up the Python virtual environment
- Install all required packages
- Scrape Pokemon TCG products from Dollar General
- Generate a professional PDF catalog with barcodes
- Create timestamped files for easy organization

## What You'll Get

### Generated Files:
- **`pokemon_tcg_products_YYYYMMDD_HHMMSS.json`** - Raw data in JSON format
- **`catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`** - Professional PDF catalog

### PDF Catalog Contents:
- Product images (downloaded automatically)
- Product details (title, price, stock, SKU)
- UPC-A barcodes for each product (generated from SKU)
- Table of contents for easy navigation
- Professional formatting suitable for printing
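
This guide does not say how a SKU is turned into a UPC-A barcode. The sketch below shows one plausible approach. It assumes the digits of the SKU are zero-padded to the 11 data digits of UPC-A and that the standard check digit is appended. `upc_a_check_digit` and `sku_to_upc_a` are hypothetical names for illustration, not functions from this project.

```python
def upc_a_check_digit(data11: str) -> int:
    """Compute the UPC-A check digit for 11 data digits."""
    if len(data11) != 11 or not (data11.isascii() and data11.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    # Positions are counted from 1: odd positions are weighted 3, even ones 1.
    odd_sum = sum(int(d) for d in data11[0::2])
    even_sum = sum(int(d) for d in data11[1::2])
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def sku_to_upc_a(sku: str) -> str:
    """Build a 12-digit UPC-A string from a SKU.

    ASSUMPTION: non-digit characters are dropped and the remaining digits
    are left-padded with zeros to 11 places. The real project may map
    SKUs to barcodes differently.
    """
    digits = "".join(ch for ch in sku if ch.isascii() and ch.isdigit())
    if not digits or len(digits) > 11:
        raise ValueError("SKU must contain between 1 and 11 digits")
    data = digits.zfill(11)
    return data + str(upc_a_check_digit(data))
```

For example, `sku_to_upc_a("3600029145")` pads the input to `03600029145` and appends the check digit `2`, giving `036000291452`.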

## Alternative Commands

If you prefer more control:

```bash
# Activate virtual environment first
source venv/bin/activate

# Run only the scraper
python scraper.py

# Run only the PDF generator (after scraping)
python pdf_generator.py pokemon_tcg_products_YYYYMMDD_HHMMSS.json

# Run everything (installs requirements automatically)
python run_scraper.py
```
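
When running the PDF generator on its own, the newest JSON file can be picked by name instead of typing the timestamp by hand. A small sketch, assuming the scraper writes `pokemon_tcg_products_*.json` files into the directory you run it from (`latest_products_json` is a hypothetical helper, not part of the project):

```python
from pathlib import Path
from typing import Optional


def latest_products_json(directory: str = ".") -> Optional[Path]:
    """Return the newest pokemon_tcg_products_*.json in `directory`, or None."""
    # YYYYMMDD_HHMMSS sorts chronologically as plain text, so the
    # lexicographically largest file name is also the most recent one.
    candidates = sorted(
        Path(directory).glob("pokemon_tcg_products_*.json"),
        key=lambda p: p.name,
    )
    return candidates[-1] if candidates else None
```

The returned path could then be passed to `pdf_generator.py` as its argument.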

## Output Location

All generated files will be in:
- JSON data: Current directory
- PDF catalog: `catalog_output/` directory
- Product images: `catalog_output/images/`
- Barcode images: `catalog_output/barcodes/`

## Requirements

- Python 3.7+
- pandoc (for PDF generation)
- Internet connection (for scraping)

The script automatically installs Python dependencies into a virtual environment.
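
The two local prerequisites can be checked before a run. The following is a minimal sketch, not something the project is known to ship: `missing_prerequisites` is a hypothetical helper, and the `which` parameter exists only so the lookup can be replaced in tests.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def missing_prerequisites(
    min_python: Tuple[int, int] = (3, 7),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    # Compare only (major, minor) against the documented minimum.
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    # shutil.which returns None when the executable is not on PATH.
    if which("pandoc") is None:
        problems.append("pandoc was not found on PATH")
    return problems
```

An empty list means both checks passed. It does not verify the internet connection.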

## Troubleshooting

If you encounter issues:

1. **Permission denied:** Make sure the script is executable:
   ```bash
   chmod +x run.sh
   ```

2. **Pandoc not found:** Install pandoc for your system:
   ```bash
   # Ubuntu/Debian
   sudo apt install pandoc

   # Arch Linux
   sudo pacman -S pandoc

   # macOS
   brew install pandoc
   ```

3. **No products found:** The site may be blocking automated requests, or its page structure may have changed. The scraper includes fallback mechanisms (plain `requests` first, then Selenium).

4. **PDF generation fails:** The Markdown source is still written to `catalog_output/`, so you can view it directly or convert it manually.
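
The fallback idea from item 3 can be illustrated in general terms: try a cheap fetcher first and move to a heavier one only if it fails or returns nothing. The sketch below is library-agnostic and is not the project's actual code. `fetch_with_fallback` and the fetcher callables are hypothetical; in practice the first might wrap `requests` and the second a Selenium browser session.

```python
from typing import Callable, List, Sequence, Tuple

# A fetcher takes a URL and returns the page HTML as a string.
Fetcher = Callable[[str], str]


def fetch_with_fallback(url: str, fetchers: Sequence[Tuple[str, Fetcher]]) -> str:
    """Try each (name, fetcher) pair in order and return the first usable HTML."""
    errors: List[str] = []
    for name, fetch in fetchers:
        try:
            html = fetch(url)
        except Exception as exc:  # any failure just moves on to the next fetcher
            errors.append(f"{name}: {exc}")
            continue
        if html and html.strip():
            return html
        errors.append(f"{name}: empty response")
    raise RuntimeError("all fetchers failed: " + "; ".join(errors))
```

Collecting the per-fetcher errors makes a "no products found" result easier to diagnose, since the final exception shows which step failed and why.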

## File Naming Convention

All output files include Unix-friendly timestamps:
- Format: `YYYYMMDD_HHMMSS` (e.g., `20241221_143025`)
- Files sort chronologically in plain `ls` output
- No spaces or special characters, so the names are safe to use in scripts
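
A name in this format can be produced with `datetime.strftime`. This is a minimal sketch of the convention, not the project's own code; `timestamped_name` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional


def timestamped_name(prefix: str, extension: str, now: Optional[datetime] = None) -> str:
    """Build '<prefix>_YYYYMMDD_HHMMSS.<extension>' from the given or current time."""
    # %Y%m%d_%H%M%S is fixed-width and zero-padded, which is what makes
    # alphabetical order match chronological order.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}.{extension}"
```

For example, `timestamped_name("pokemon_tcg_products", "json", datetime(2024, 12, 21, 14, 30, 25))` returns `pokemon_tcg_products_20241221_143025.json`.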

## Example Output

```
pokemon-disco/
├── pokemon_tcg_products_20241221_143025.json    # Scraped data
├── catalog_output/
│   ├── pokemon_tcg_catalog_20241221_143025.pdf  # Final catalog
│   ├── pokemon_tcg_catalog_20241221_143025.md   # Markdown source
│   ├── images/
│   │   ├── product_1_SKU123456.jpg              # Product images
│   │   └── product_2_SKU789012.jpg
│   └── barcodes/
│       ├── barcode_SKU123456.png                # UPC-A barcodes
│       └── barcode_SKU789012.png
```