Clean up: remove obsolete files, update docs and docstrings

Removed 20 files: old test scripts, debug tools, duplicate docs, generated JSON, old PDF generator, launcher scripts.

Kept:
- disco.py: main tool (scrape HAR + generate PDF)
- scraper.py: reference site scraper (HTML + Selenium/Brave)
- requirements.txt
- *.har: browser capture with API data

Updated:
- README.md: rewritten to reflect current tool and usage
- .gitignore: simplified
- scraper.py: module/class/method docstrings updated to clarify that this is a reference implementation and disco.py is the primary tool
Move all text above image: title, stock/price, SKU/UPC then picture then barcode
2026-03-21 23:28:52 -07:00
20 changed files with 629 additions and 2434 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,37 +1,11 @@
-# Virtual environment
 venv/
-env/
-.env
-
-# Python cache
 __pycache__/
 *.pyc
-*.pyo
-*.pyd
-.Python
-*.so
-.pytest_cache/

-# Output files
-pokemon_tcg_products_*.json
+# Generated output
 catalog_output/
-test_output/
+pokemon_tcg_products_*.json

-# Logs
-*.log
-
-# OS files
+# OS / editor
 .DS_Store
-Thumbs.db
-.directory
-
-# IDE files
-.vscode/
-.idea/
 *.swp
-*.swo
-
-# Temporary files
-*.tmp
-*.temp
-.cache/
--- a/DISCOVERY_SUCCESS.md
+++ b/DISCOVERY_SUCCESS.md
@@ -1,169 +0,0 @@
-# Pokemon Discovery - URL Discovery SUCCESS! 🎉
-
-## ✅ **API Endpoint Successfully Discovered**
-
-**Your HAR file revealed the exact API endpoint used by Dollar General!**
-
-### 🔍 **Discovered API Details**
-
-**Endpoint**: `https://dggo.dollargeneral.com/omni/api/v2/category/search/provider`
-**Method**: POST  
-**Content-Type**: application/json  
-**Authentication**: Bearer token required  
-
-### 📋 **Exact Request Format**
-```json
-{
-  "StoreNbr": 17506,
-  "SearchTerm": null,
-  "PageSize": 24,
-  "PageStartRecordIndex": 0,
-  "Filters": {
-    "category": [],
-    "brand": [],
-    "dgDelivery": false,
-    "dgPickUp": false,
-    "dgShipTohome": false,
-    "soldAtStore": true,
-    "inStock": false,
-    "onlyActivatedDeals": false
-  },
-  "IncludeSponsored": true,
-  "IncludeShipToHome": true,
-  "IncludeDeals": true,
-  "offerSourceType": 0,
-  "Id": 723960,
-  "IncludeProducts": false,
-  "DoNotSave": false,
-  "OptOut": false,
-  "SearchType": 1
-}
-```
-
-### 🎯 **Key Findings from HAR Analysis**
-
-1. **✅ Contains Your Test Product**: SKU `41936301` and UPC `728192558375` found!
-2. **✅ Multiple Pokemon Products**: API returns 4-12 Pokemon items per request
-3. **✅ Proper Filtering**: `soldAtStore: true` shows in-store products
-4. **✅ Stock Control**: `inStock: false` includes out-of-stock items
-5. **✅ Category ID**: `723960` is the Pokemon category identifier
-6. **✅ Store Location**: `17506` is the store number used
-
-### 📊 **API Response Contains**
-```json
-{
-  "ItemList": {
-    "Items": [
-      {
-        "Title": "Pokémon Trading Card Game, 15 Card Pack, 1 ct",
-        "ItemNbr": "41936301",
-        "UPC": "728192558375", 
-        "Price": {"Amount": 4.25},
-        "ProductUrl": "/p/pok-mon-trading-card-game-card-pack-ct/728192558375",
-        "Inventory": {"InStock": false},
-        "ImageURL": "...",
-        "Description": "...",
-        "Brand": "..."
-      }
-    ]
-  }
-}
-```
-
-## 🔧 **Implementation Status**
-
-### ✅ **Completed**
- [x] API endpoint discovery via HAR analysis
- [x] Request format extraction and documentation  
- [x] Response structure mapping
- [x] Pokemon product filtering logic
- [x] Integration into Pokemon Discovery scraper
- [x] Individual product extraction (100% working)
-
-### ⚠️ **Authentication Challenge**
- **Issue**: API requires Bearer token from authenticated session
- **Status**: Token extraction attempted but expires quickly
- **Solutions Available**:
-  1. **Browser Automation**: Use Selenium with proper session management
-  2. **Session Replication**: Implement full authentication flow
-  3. **Individual Products**: Current working approach (proven successful)
-
-## 🚀 **Current Capabilities**
-
-### 1. **Individual Product Extraction** (✅ WORKING)
-```bash
-# Test with your specific product
-python test_real_products.py
-# Result: Successfully extracts SKU 41936301 with all details
-```
-
-### 2. **API Framework** (✅ READY)
-```python
-# API call implementation ready in scraper.py
-# Just needs authentication token to activate
-```
-
-### 3. **Complete Pipeline** (✅ WORKING)
-```bash
-# Generate PDF from any product data
-python pdf_generator.py test_data.json
-# Result: 153KB professional PDF with UPC-A barcodes
-```
-
-## 📈 **Performance Comparison**
-
-| Method | Speed | Product Count | Authentication | Status |
-|--------|-------|---------------|----------------|--------|
-| **API Endpoint** | Very Fast | 24+ per request | Required | Discovered ✅ |
-| **Individual Products** | Moderate | 1 per request | None | Working ✅ |
-| **Browser Automation** | Slower | Variable | Session-based | Possible |
-
-## 🎯 **Next Steps**
-
-### **Option A: Full API Implementation**
-1. Implement proper browser session management
-2. Extract Bearer token during session
-3. Use API for bulk product discovery
-4. **Result**: Very fast, bulk product scraping
-
-### **Option B: Enhanced Individual Scraping**
-1. Create list of known Pokemon product URLs
-2. Process each URL individually (current working method)  
-3. Scale up with concurrent requests
-4. **Result**: Reliable, no authentication needed
-
-### **Option C: Hybrid Approach**
-1. Use individual scraping for reliable operation
-2. Add API capability when authentication is solved
-3. Provide both options to users
-4. **Result**: Best of both worlds
-
-## 🏆 **SUCCESS METRICS**
-
- ✅ **URL Discovery**: SOLVED via HAR analysis
- ✅ **API Endpoint**: Found and documented
- ✅ **Request Format**: Complete specification extracted  
- ✅ **Product Extraction**: Working with real products
- ✅ **PDF Generation**: Professional catalogs with barcodes
- ✅ **Repository**: Public and ready for use
-
-## 💡 **Practical Usage Right Now**
-
-**Pokemon Discovery is fully functional for product catalog generation:**
-
-```bash
-# Clone and use immediately
-git clone https://git.dominat.us/pi-bot-01/pokemon-disco.git
-cd pokemon-disco
-./run.sh
-
-# Add more product URLs to test_real_products.py
-# Generate professional PDF catalogs with barcodes
-```
-
-**The API endpoint discovery is a major breakthrough that makes bulk scraping possible once authentication is properly implemented!** 🎉
-
---
-
-**Repository**: https://git.dominat.us/pi-bot-01/pokemon-disco  
-**Status**: Production-ready with API framework for future enhancement
--- a/README.md
+++ b/README.md
@@ -1,232 +1,129 @@
 # Pokemon Discovery (pokemon-disco)

-A comprehensive tool for discovering Pokemon Trading Card Game products from Dollar General's website and generating a professional PDF catalog with product images, details, and UPC-A barcodes.
+Scrapes Pokemon TCG card pack and tin products from Dollar General and generates a PDF product catalog with images and UPC-A barcodes.

-## Features
+## How It Works

- **🔍 API Discovery**: Discovered Dollar General's internal product API via HAR analysis
- **📱 Product Extraction**: Successfully extracts Pokemon TCG product details (title, SKU, price, stock)
- **🏷️ Barcode Generation**: Creates scannable UPC-A barcodes for inventory management  
- **📄 PDF Catalogs**: Professional PDF catalogs with images, details, and barcodes
- **🕰️ Unix-Friendly**: Timestamped filenames (`YYYYMMDD_HHMMSS`) for easy scripting
- **🌐 Brave Browser Support**: Configured for dynamic content scraping
- **🛡️ Anti-Bot Handling**: Multiple fallback strategies (requests → Selenium → individual products)
+Dollar General's Pokemon category page loads products dynamically via an internal API. A browser HAR capture contains the API responses with all product data. `disco.py` extracts products from the HAR file, filters for card packs and tins, downloads product images, generates UPC-A barcodes, and produces a LaTeX-based PDF catalog.
+
+### Pipeline
+
+```
+HAR file → Extract API responses → Filter packs/tins → Download images
+         → Generate UPC-A barcodes → Compile PDF catalog (pdflatex)
+```

 ## Requirements

-### System Requirements
- Python 3.7+
- pandoc (for PDF generation)
- Chrome/Chromium browser (for Selenium fallback)
+- Python 3.10+
+- pdflatex (via `texlive-basic`, `texlive-latex`, `texlive-latexextra`, `texlive-fontsrecommended`)
+- Python packages: `requests`, `beautifulsoup4`, `python-barcode`, `Pillow`

-### Python Dependencies
-All dependencies are automatically installed via `requirements.txt`:
- requests
- beautifulsoup4
- selenium
- webdriver-manager
- python-barcode
- Pillow
- pandas
- lxml
+### Install (Arch / CachyOS)

-## Installation
-
-1. **Clone/Download** this directory to your system
-
-2. **Install pandoc** (required for PDF generation):
-   ```bash
-   # Ubuntu/Debian
-   sudo apt install pandoc
-   
-   # macOS
-   brew install pandoc
-   
-   # Arch Linux
-   sudo pacman -S pandoc
-   ```
-
-3. **Install Python dependencies** (automatically done by the script):
-   ```bash
-   cd pokemon-disco
-   pip3 install -r requirements.txt
-   ```
+```bash
+sudo pacman -S texlive-basic texlive-latex texlive-latexextra texlive-fontsrecommended
+python -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+```

 ## Usage

-### Quick Start (Recommended)
-
-Run the complete pipeline with one command:
+### Full run (scrape + PDF)

 ```bash
-cd pokemon-disco
-python3 run_scraper.py
+source venv/bin/activate
+python disco.py
 ```

-This will:
-1. Check and install Python requirements
-2. Scrape Pokemon TCG products from Dollar General
-3. Generate a PDF catalog with images and barcodes
-4. Create timestamped files for easy organization
+### Scrape only (output JSON)

-### Manual Usage
-
-If you prefer to run components separately:
-
-#### 1. Scrape Products
 ```bash
-python3 scraper.py
+python disco.py --scrape-only
 ```
-This creates a JSON file like `pokemon_tcg_products_20241221_143025.json`

-#### 2. Generate PDF Catalog
+### PDF only (from existing JSON)
+
 ```bash
-python3 pdf_generator.py pokemon_tcg_products_20241221_143025.json
+python disco.py --pdf-only pokemon_tcg_products_YYYYMMDD_HHMMSS.json
 ```

-## Output Files
+## Output

-### Generated Files
- **JSON Data**: `pokemon_tcg_products_YYYYMMDD_HHMMSS.json`
-  - Raw scraped data in JSON format
-  - Contains all product information
-
- **PDF Catalog**: `catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`
-  - Professional PDF catalog
-  - Includes product images, details, and UPC-A barcodes
-
-### Output Directory Structure
 ```
-pokemon-disco/
-├── pokemon_tcg_products_YYYYMMDD_HHMMSS.json
-├── catalog_output/
-│   ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf
-│   ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.md
-│   ├── images/
-│   │   ├── product_1_SKU123.jpg
-│   │   ├── product_2_SKU456.jpg
-│   │   └── placeholder.png
-│   └── barcodes/
-│       ├── barcode_SKU123.png
-│       ├── barcode_SKU456.png
-│       └── ...
+pokemon_tcg_products_YYYYMMDD_HHMMSS.json    Product data
+catalog_output/
+├── pokemon_catalog_YYYYMMDD_HHMMSS.pdf      PDF catalog
+├── pokemon_catalog_YYYYMMDD_HHMMSS.tex      LaTeX source
+├── images/                                   Product images (PNG)
+└── barcodes/                                 UPC-A barcodes (PNG)
 ```

-## PDF Catalog Features
+### PDF Layout

-Each product in the PDF includes:
- **Product Image**: Downloaded from Dollar General or placeholder
- **Product Details Table**:
-  - Title
-  - Price
-  - Stock Status
-  - SKU (formatted as code)
-  - Product URL
- **UPC-A Barcode**: Generated from SKU for inventory management
+**Page 1 — Manifest:** table of all products with SKU, price, and stock count.

-## Data Fields Extracted
+**Product pages:**

-For each Pokemon TCG product:
- `title`: Product name
- `price`: Current price
- `stock`: Availability status
- `sku`: Product SKU/item number
- `image_url`: Direct link to product image
- `url`: Link to product page
+```
+Product Name
+Stock status                              Price
+SKU: XXXXXXXX                   UPC: XXXXXXXXXXXX

-## Troubleshooting
+┌─────────────────────────────┐
+│                             │
+│       Product Image         │
+│                             │
+└─────────────────────────────┘

-### Common Issues
+┌─────────────────────────────┐
+│      UPC-A Barcode          │
+└─────────────────────────────┘
+```

-1. **No products found**
-   - Dollar General may have anti-bot protection
-   - The script will automatically retry with Selenium
-   - Website structure may have changed
+## Capturing a HAR File

-2. **PDF generation fails**
-   - Ensure pandoc is installed: `pandoc --version`
-   - Try alternative LaTeX engines if available
-   - Markdown file is still generated for manual conversion
+The HAR file provides product data from Dollar General's internal API. To capture one:

-3. **Image download failures**
-   - Network connectivity issues
-   - Placeholder images will be used automatically
+1. Open your browser (Brave, Chrome, Firefox)
+2. Open DevTools → **Network** tab
+3. Visit `https://www.dollargeneral.com/c/toys/pokemon?q=`
+4. Wait for products to load, toggle any filters you want
+5. Right-click in the Network tab → **Save all as HAR**
+6. Place the `.har` file in the project root

-4. **Browser/Selenium issues**
-   - **Brave browser supported**: Configured to use Brave at `/usr/bin/brave`
-   - **ChromeDriver compatibility**: May require version matching (Brave 146 vs ChromeDriver 114)
-   - **Alternative browsers**: Chrome, Chromium, or Firefox with geckodriver
-   - Script falls back to requests-only mode if Selenium fails
-   
-   **For Brave users**: If you see ChromeDriver version mismatch:
-   ```bash
-   # Test browser integration
-   python test_brave.py
-   
-   # Solutions for version mismatch:
-   pip install --upgrade webdriver-manager
-   # or manually install compatible ChromeDriver
-   ```
+`disco.py` looks for any `.har` file matching the default name pattern. Edit the `HAR_FILE` constant at the top of `disco.py` if your filename differs.

-### Debug Mode
+## Files

-To see more detailed output, check the console output during scraping. The scripts provide detailed logging of:
- Which products are found and filtered
- Network request status
- File generation progress
+| File | Purpose |
+|------|---------|
+| `disco.py` | Main tool — scrape, filter, generate PDF |
+| `scraper.py` | Reference site scraper (HTML + Selenium/Brave) |
+| `requirements.txt` | Python dependencies |
+| `*.har` | Browser HAR capture with API data |

-## API Discovery Success 🎉
+## API Details (Reference)

-**Pokemon Discovery has successfully discovered Dollar General's internal API endpoint!**
+The product data comes from this internal API:

- **Endpoint Found**: `https://dggo.dollargeneral.com/omni/api/v2/category/search/provider`
- **Method**: POST with JSON payload
- **Category ID**: `723960` (Pokemon products)
- **Response Format**: Complete product details including your test product (SKU: `41936301`)
- **Status**: Documented and integrated, requires authentication token
+```
+POST https://dggo.dollargeneral.com/omni/api/v2/category/search/provider
+Content-Type: application/json
+Authorization: Bearer <session-token>

-**Current Status**: Individual product extraction works perfectly. API bulk scraping available once authentication is implemented.
+{
+  "StoreNbr": 17506,
+  "Id": 723960,          // Pokemon category
+  "PageSize": 24,
+  "Filters": {
+    "soldAtStore": true,
+    "inStock": false      // false = include out of stock
+  }
+}
+```

-## Technical Details
+Response contains `ItemList.Items[]` with fields: `Description`, `UPC`, `Price`, `Image`, `AvailableQty`, `rootSV` (internal ID → SKU).

-### Scraping Strategy
-1. **Primary Method**: Uses requests with browser-like headers
-2. **Fallback Method**: Selenium with headless Chrome for dynamic content
-3. **Product Filtering**: Only includes products matching Pokemon TCG keywords
-4. **Rate Limiting**: 1-second delay between requests to be respectful
-
-### Barcode Generation
- Converts SKUs to 11-digit numeric format
- Generates UPC-A barcodes with check digits
- High-quality PNG images suitable for printing
-
-### PDF Generation
- Uses pandoc with LaTeX for professional formatting
- Includes table of contents
- Optimized for printing and digital viewing
- Images scaled appropriately for page layout
-
-## Customization
-
-### Modifying Product Filters
-Edit the `is_pokemon_tcg_product()` method in `scraper.py` to change which products are included.
-
-### Changing PDF Layout
-Modify the markdown generation in `pdf_generator.py` or add custom pandoc templates.
-
-### Adding New Data Fields
-Extend the `extract_product_info()` method in `scraper.py` to capture additional product information.
-
-## License
-
-This tool is for educational and personal use. Please respect Dollar General's terms of service and robots.txt when using this scraper.
-
-## Support
-
-If you encounter issues:
-1. Check the console output for error messages
-2. Ensure all system requirements are installed
-3. Verify internet connectivity
-4. Check if the Dollar General website structure has changed
-
-Generated files include timestamps for easy organization and version tracking.
+The bearer token is session-scoped and short-lived. `disco.py` sidesteps this by reading the API responses directly from a HAR capture.
--- a/TEST_RESULTS.md
+++ b/TEST_RESULTS.md
@@ -1,114 +0,0 @@
-# Pokemon Discovery - Test Results
-
-## Testing Overview
-Date: 2026-03-21  
-System: CachyOS (Arch Linux)  
-
-## ✅ Successfully Tested Components
-
-### 1. Virtual Environment Setup
- ✅ Virtual environment creation works
- ✅ All Python dependencies install correctly
- ✅ Requirements.txt includes all necessary packages
-
-### 2. Barcode Generation
- ✅ UPC-A barcode generation from SKUs works perfectly
- ✅ High-quality PNG images generated (3-6KB each)
- ✅ Proper barcode formatting with check digits
- ✅ File naming fixed (no double .png extension)
-
-### 3. PDF Generation
- ✅ Markdown catalog generation works
- ✅ Professional table formatting for product details
- ✅ PDF generation works with pdflatex (fallback from xelatex)
- ✅ Unix-friendly timestamped filenames
- ✅ Proper directory structure creation
-
-### 4. Core Functionality
- ✅ JSON data parsing and processing
- ✅ Product filtering logic
- ✅ Image placeholder generation
- ✅ Error handling and graceful fallbacks
-
-### 5. Brave Browser Integration
- ✅ Brave browser detected and configured
- ✅ Selenium WebDriver setup for Brave
- ⚠️ ChromeDriver version compatibility issue (expected)
- ✅ Graceful fallback when browser automation fails
- ✅ Test script provided (`test_brave.py`) for troubleshooting
-
-## ⚠️ Current Limitations
-
-### 1. Web Scraping
- **Issue**: Dollar General uses dynamic JavaScript loading
- **Status**: Basic HTML parsing works, but product links require JavaScript execution
- **Solution**: Selenium fallback is implemented but requires Chrome/Chromium browser
- **Workaround**: Test data demonstrates full pipeline functionality
-
-### 2. External Dependencies & Browser Integration
- **LaTeX**: Requires texlive packages for PDF generation (✅ installed)
- **Brave Browser**: Configured and detected (✅ available at /usr/bin/brave)
- **ChromeDriver Compatibility**: Version mismatch (Brave 146 vs ChromeDriver 114)
-  - ⚠️ Requires compatible ChromeDriver version for web scraping
-  - 💡 Main functionality (PDF generation) works without browser
- **Network**: External image downloads require internet connectivity
-
-## 📋 Test Results Summary
-
-### Working Pipeline Test
-Using test data (`test_data.json`) with 3 Pokemon TCG products:
-
-**Input**: 3 sample Pokemon products  
-**Generated**:
- ✅ Professional PDF catalog (161KB)
- ✅ 3 UPC-A barcode images (3-6KB each)
- ✅ Structured markdown source
- ✅ Proper file organization
-
-**PDF Contents**:
- Table of contents
- Product details tables (title, price, stock, SKU, URL)
- Barcode images for each product
- Professional formatting suitable for printing
-
-### File Structure Generated
-```
-catalog_output/
-├── pokemon_tcg_catalog_20260321_144548.pdf  # Final catalog
-├── pokemon_tcg_catalog_20260321_144548.md   # Markdown source
-├── barcodes/
-│   ├── barcode_DG12345678.png              # UPC-A barcodes
-│   ├── barcode_DG87654321.png
-│   └── barcode_DG11223344.png
-└── images/
-    └── placeholder.png                      # Image placeholders
-```
-
-## 🚀 Deployment Status
-
- **Repository**: Successfully pushed to public Git repository
- **Documentation**: Complete with README.md and USAGE.md
- **Dependencies**: All Python packages working in virtual environment
- **Core Features**: PDF generation and barcode creation fully functional
-
-## 💡 Recommendations
-
-1. **For Production Use**: Install Chrome/Chromium for better web scraping
-   ```bash
-   sudo pacman -S chromium
-   ```
-
-2. **For Complete Testing**: Test with live website when network allows
-3. **Alternative Approach**: The tool can be easily adapted for other product sites
-4. **Data Integration**: JSON output format allows easy integration with other systems
-
-## ✅ Conclusion
-
-**Pokemon Discovery is fully functional** for the core use case:
- ✅ Processes product data (from any source)
- ✅ Generates professional PDF catalogs
- ✅ Creates scannable UPC-A barcodes
- ✅ Handles Unix-friendly file management
- ✅ Ready for production deployment
-
-The web scraping component requires additional browser setup for full dynamic content handling, but the complete data processing and catalog generation pipeline works perfectly.
--- a/USAGE.md
+++ b/USAGE.md
@@ -1,115 +0,0 @@
-# Quick Start Guide
-
-## Simple Usage (Recommended)
-
-1. **Make sure you're in the project directory:**
-   ```bash
-   cd pokemon-disco
-   ```
-
-2. **Run the complete scraper and PDF generator:**
-   ```bash
-   ./run.sh
-   ```
-   
-   This single command will:
-   - Set up the Python virtual environment
-   - Install all required packages
-   - Scrape Pokemon TCG products from Dollar General
-   - Generate a professional PDF catalog with barcodes
-   - Create timestamped files for easy organization
-
-## What You'll Get
-
-### Generated Files:
- **`pokemon_tcg_products_YYYYMMDD_HHMMSS.json`** - Raw data in JSON format
- **`catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`** - Professional PDF catalog
-
-### PDF Catalog Contents:
- Product images (downloaded automatically)
- Product details (title, price, stock, SKU)
- UPC-A barcodes for each product (generated from SKU)
- Table of contents for easy navigation
- Professional formatting suitable for printing
-
-## Alternative Commands
-
-If you prefer more control:
-
-```bash
-# Activate virtual environment first
-source venv/bin/activate
-
-# Run only the scraper
-python scraper.py
-
-# Run only the PDF generator (after scraping)
-python pdf_generator.py pokemon_tcg_products_YYYYMMDD_HHMMSS.json
-
-# Run everything (installs requirements automatically)
-python run_scraper.py
-```
-
-## Output Location
-
-All generated files will be in:
- JSON data: Current directory
- PDF catalog: `catalog_output/` directory
- Product images: `catalog_output/images/`
- Barcode images: `catalog_output/barcodes/`
-
-## Requirements
-
- Python 3.7+
- pandoc (for PDF generation)
- Internet connection (for scraping)
-
-The script will automatically handle Python dependencies via virtual environment.
-
-## Troubleshooting
-
-If you encounter issues:
-
-1. **Permission denied:** Make sure the script is executable:
-   ```bash
-   chmod +x run.sh
-   ```
-
-2. **Pandoc not found:** Install pandoc for your system:
-   ```bash
-   # Ubuntu/Debian
-   sudo apt install pandoc
-   
-   # Arch Linux  
-   sudo pacman -S pandoc
-   
-   # macOS
-   brew install pandoc
-   ```
-
-3. **No products found:** The website may have anti-bot protection or changed structure. The script includes fallback mechanisms.
-
-4. **PDF generation fails:** The markdown file will still be generated, which you can manually convert or view.
-
-## File Naming Convention
-
-All output files include Unix-friendly timestamps:
- Format: `YYYYMMDD_HHMMSS` (e.g., `20241221_143025`)
- This ensures chronological sorting with `ls` command
- No spaces or special characters for script-friendly handling
-
-## Example Output
-
-```
-pokemon-disco/
-├── pokemon_tcg_products_20241221_143025.json     # Scraped data
-├── catalog_output/
-│   ├── pokemon_tcg_catalog_20241221_143025.pdf   # Final catalog
-│   ├── pokemon_tcg_catalog_20241221_143025.md    # Markdown source
-│   ├── images/
-│   │   ├── product_1_SKU123456.jpg               # Product images
-│   │   └── product_2_SKU789012.jpg
-│   └── barcodes/
-│       ├── barcode_SKU123456.png                 # UPC-A barcodes
-│       └── barcode_SKU789012.png
-```
--- a/analyze_har.py
+++ b/analyze_har.py
@@ -1,181 +0,0 @@
-#!/usr/bin/env python3
-"""
-Analyze HAR file to find product loading endpoints
-"""
-
-import json
-import sys
-from urllib.parse import urlparse, parse_qs
-
-def analyze_har_file(har_file):
-    """Analyze HAR file to find product-related API calls"""
-    
-    print(f"Analyzing HAR file: {har_file}")
-    
-    try:
-        with open(har_file, 'r', encoding='utf-8') as f:
-            har_data = json.load(f)
-        
-        entries = har_data.get('log', {}).get('entries', [])
-        print(f"Found {len(entries)} network requests")
-        print()
-        
-        # Filter for API calls that might contain product data
-        api_calls = []
-        product_calls = []
-        
-        for entry in entries:
-            request = entry.get('request', {})
-            response = entry.get('response', {})
-            url = request.get('url', '')
-            method = request.get('method', '')
-            status = response.get('status', 0)
-            
-            # Look for API calls
-            parsed_url = urlparse(url)
-            path = parsed_url.path.lower()
-            query = parsed_url.query.lower()
-            
-            # Check if this might be a product-related API call
-            is_api = any(keyword in path for keyword in ['/api/', '/search', '/products', '/inventory', '/catalog'])
-            contains_pokemon = 'pokemon' in query or 'pokemon' in path
-            is_json_response = any(h.get('name', '').lower() == 'content-type' and 'json' in h.get('value', '') 
-                                 for h in response.get('headers', []))
-            
-            if is_api or is_json_response:
-                api_calls.append({
-                    'url': url,
-                    'method': method,
-                    'status': status,
-                    'is_pokemon': contains_pokemon,
-                    'response_size': response.get('bodySize', 0)
-                })
-                
-                if contains_pokemon or 'product' in path or 'search' in path:
-                    product_calls.append(entry)
-        
-        print(f"Found {len(api_calls)} potential API calls")
-        print(f"Found {len(product_calls)} product-related calls")
-        print()
-        
-        # Show interesting API calls
-        print("=== API CALLS ===")
-        for call in api_calls[:20]:  # Show first 20
-            url = call['url']
-            pokemon_flag = "🎯" if call['is_pokemon'] else "  "
-            print(f"{pokemon_flag} {call['method']} {call['status']} - {url}")
-            if call['response_size'] > 1000:
-                print(f"   📦 Response size: {call['response_size']} bytes")
-        
-        print()
-        
-        # Analyze product-specific calls in detail
-        if product_calls:
-            print("=== DETAILED PRODUCT CALL ANALYSIS ===")
-            
-            for i, entry in enumerate(product_calls[:5]):  # Analyze first 5 product calls
-                request = entry.get('request', {})
-                response = entry.get('response', {})
-                
-                print(f"\n--- Product Call {i+1} ---")
-                print(f"URL: {request.get('url', '')}")
-                print(f"Method: {request.get('method', '')}")
-                print(f"Status: {response.get('status', 0)}")
-                
-                # Show headers
-                headers = request.get('headers', [])
-                important_headers = [h for h in headers if h.get('name', '').lower() in 
-                                   ['accept', 'content-type', 'authorization', 'x-api-key', 'referer']]
-                if important_headers:
-                    print("Important Headers:")
-                    for header in important_headers:
-                        print(f"  {header.get('name')}: {header.get('value', '')[:100]}")
-                
-                # Show query parameters
-                parsed = urlparse(request.get('url', ''))
-                if parsed.query:
-                    params = parse_qs(parsed.query)
-                    print("Query Parameters:")
-                    for key, values in params.items():
-                        print(f"  {key}: {values}")
-                
-                # Show POST data if any
-                post_data = request.get('postData', {})
-                if post_data.get('text'):
-                    print(f"POST Data: {post_data.get('text')[:200]}...")
-                
-                # Check response content
-                response_content = response.get('content', {})
-                response_text = response_content.get('text', '')
-                
-                if response_text:
-                    print(f"Response size: {len(response_text)} characters")
-                    
-                    # Try to parse as JSON
-                    try:
-                        response_json = json.loads(response_text)
-                        print("✓ Valid JSON response")
-                        
-                        # Look for product-like structures
-                        def find_products_in_json(obj, path=""):
-                            products = []
-                            if isinstance(obj, dict):
-                                for key, value in obj.items():
-                                    new_path = f"{path}.{key}" if path else key
-                                    if key.lower() in ['products', 'items', 'results', 'data']:
-                                        if isinstance(value, list):
-                                            products.append((new_path, len(value)))
-                                    products.extend(find_products_in_json(value, new_path))
-                            elif isinstance(obj, list):
-                                for idx, item in enumerate(obj):
-                                    products.extend(find_products_in_json(item, f"{path}[{idx}]"))
-                            return products
-                        
-                        product_arrays = find_products_in_json(response_json)
-                        if product_arrays:
-                            print("Potential product arrays found:")
-                            for path, count in product_arrays:
-                                print(f"  {path}: {count} items")
-                                
-                        # Check for our specific product
-                        response_str = str(response_json).lower()
-                        if '41936301' in response_str:
-                            print("🎯 CONTAINS OUR TEST PRODUCT SKU!")
-                        if '728192558375' in response_str:
-                            print("🎯 CONTAINS OUR TEST PRODUCT UPC!")
-                        if 'pokemon' in response_str:
-                            print("🎯 CONTAINS POKEMON REFERENCES!")
-                            
-                    except json.JSONDecodeError:
-                        print("Response is not JSON")
-                        # Check if it contains our product anyway
-                        if '41936301' in response_text:
-                            print("🎯 CONTAINS OUR TEST PRODUCT SKU!")
-        
-        # Return the most promising API calls
-        return api_calls, product_calls
-        
-    except Exception as e:
-        print(f"Error analyzing HAR file: {e}")
-        return [], []
-
-if __name__ == "__main__":
-    har_files = ['www.dollargeneral.com_Archive [26-03-21 15-14-28].har']
-    
-    for har_file in har_files:
-        try:
-            api_calls, product_calls = analyze_har_file(har_file)
-            print(f"\n🎯 SUMMARY:")
-            print(f"   Total API calls: {len(api_calls)}")
-            print(f"   Product-related calls: {len(product_calls)}")
-            
-            if product_calls:
-                print(f"\n💡 NEXT STEPS:")
-                print(f"   1. Test the identified API endpoints")
-                print(f"   2. Replicate the headers and parameters")
-                print(f"   3. Integrate successful calls into Pokemon Discovery")
-            
-        except FileNotFoundError:
-            print(f"HAR file not found: {har_file}")
-        except Exception as e:
-            print(f"Error processing {har_file}: {e}")
--- a/api_request_template.json
+++ b/api_request_template.json
@@ -1,41 +0,0 @@
-{
-  "endpoint": "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider",
-  "method": "POST",
-  "headers": {
-    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0",
-    "Accept": "application/json, text/plain, */*",
-    "Content-Type": "application/json",
-    "Authorization": "Bearer eyJ0eXAiOiJhdCtKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik5qRTJNemczTXpSRVFrUXpNak5GUmprMU1FUkNNRUZDTVRBek1FWTFRa0pCTXpRM1EwTkNNZyJ9.eyJzY29wZSI6bnVsbCwiaWF0IjoxNzc0MTI3Nzc5LCJleHAiOjE3NzQxMzEzNzksImF1ZCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsImlzcyI6Imh0dHBzOi8vcHJvZC1kZ2dvLyIsInN1YiI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsInNpZCI6IlNrWk9makF5TURRMU1EVXpOVFEwWWpBM016SXpNak14TXpFek9ETTNNekV3TWpreFl6VitUVUZXYVhwbk56SXpVRGg2VWxkcmEySkRkMk5EZUdVNFlUWm5XVXBHVDBveVExTlRNVWxXWlhSalQzRnFWazVWZGtGWlIwOWtZV2x0WVVwRVRucG5SVlZvUTE5SE5VcHVObGhuTURSb2JuUkVhVlF3UTBzelNIND0iLCJqdGkiOiJzdDIucy5BdEx0VlphRHFnLnZrdW5OV2RWNjN2ZlJTTG00Y3VUd2d5bmc2X0pJNmxKRjA5a2lXTXVQeGZkVDRvT0NhMXhwa1VoRlRkM2tocHZUaFhsRUVwLWw0QzJrZnoycjkzVlYzeldBaUw5Y2x6Snl0amFJamJ4TEJnLkJOZy1CeUdpZnV0WnppQWhhMV8xRDBXTUFWR3JpNVVCX0pKbTRCNVRNYVhTWkZneXpxeUZERjJxZ3B3UTgyajZ2eGVtcnA5RERFTHZnM3hvdlZmZzBnLnNjMyIsImNsaWVudF9pZCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsImF6cCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiJ9.I6ou9atkJ8ndkr2m2Trpg53fMIL3hpofCLUHoHYgZkOJnLnbmL0CQu7_pIChQ6nIDK03GagK6aqxd97E8B8vv9nweSmb7zXhrt43dKLEIdhxIGFkJ4xYgNNg-3cVjSlThBQ_AwCx924lOGjEfikEw4NrvGvrlNvrg1lnNz4hf629hUH-5ccVSdgo1w_LQzsLOeMCjuC_bmAoRxT5KLI9oESd4tPJZU5Nlt2ICbWJD9h-zNrt-ijwYCvb7j8amGbpMGhJZqtzu9f3wN0JUFxDg5rAN-WOtLjwEmR_NxDKq0NEeuU16uhaB8AJzy217XAgJ87bKZldZowsWs-Q9oAH3g",
-    "Referer": "https://www.dollargeneral.com/"
-  },
-  "post_data": {
-    "StoreNbr": 17506,
-    "SearchTerm": null,
-    "PageSize": 24,
-    "PageStartRecordIndex": 0,
-    "Filters": {
-      "category": [],
-      "brand": [],
-      "dgDelivery": false,
-      "dgPickUp": false,
-      "dgShipTohome": false,
-      "soldAtStore": true,
-      "inStock": true,
-      "onlyActivatedDeals": false
-    },
-    "IncludeSponsored": true,
-    "IncludeShipToHome": true,
-    "IncludeDeals": true,
-    "offerSourceType": 0,
-    "Id": 723960,
-    "IncludeProducts": false,
-    "DoNotSave": false,
-    "OptOut": false,
-    "SearchType": 1
-  },
-  "example_response": {
-    "total_items": 4,
-    "pokemon_items": 0,
-    "sample_pokemon_product": null
-  }
-}
--- a/disco.py
+++ b/disco.py
@@ -0,0 +1,514 @@
+#!/usr/bin/env python3
+"""
+Pokemon Discovery (disco.py)
+Extracts Pokemon TCG pack & tin products from a Dollar General HAR capture and generates a PDF catalog.
+
+Usage:
+    python disco.py                          # Full run: scrape + generate PDF
+    python disco.py --scrape-only            # Just scrape, output JSON
+    python disco.py --pdf-only FILE.json     # Just generate PDF from existing JSON
+"""
+
+import json
+import os
+import re
+import subprocess
+import sys
+import time
+import requests
+from datetime import datetime
+from pathlib import Path
+from urllib.parse import urljoin, quote
+
+import barcode
+from barcode.writer import ImageWriter
+from bs4 import BeautifulSoup
+from PIL import Image, ImageDraw, ImageFont
+
+# ---------------------------------------------------------------------------
+# Configuration
+# ---------------------------------------------------------------------------
+
+HAR_FILE = "www.dollargeneral.com_Archive [26-03-21 15-14-28].har"
+BASE_URL = "https://www.dollargeneral.com"
+OUTPUT_DIR = Path("catalog_output")
+IMAGES_DIR = OUTPUT_DIR / "images"
+BARCODES_DIR = OUTPUT_DIR / "barcodes"
+
+HEADERS = {
+    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0",
+    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
+    "Accept-Language": "en-US,en;q=0.9",
+}
+
+# Keywords that identify card packs and tins (case-insensitive)
+CARD_TIN_KEYWORDS = ["pack", "tin", "booster", "card game", "tcg"]
+
+# ---------------------------------------------------------------------------
+# Step 1 — Product Discovery (from HAR file API responses)
+# ---------------------------------------------------------------------------
+
+def extract_products_from_har(har_path: str) -> list[dict]:
+    """Parse HAR file and extract all Pokemon products from API responses."""
+    print(f"📦 Reading HAR file: {har_path}")
+
+    with open(har_path, "r", encoding="utf-8") as f:
+        har = json.load(f)
+
+    api_url = "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider"
+    unique: dict[str, dict] = {}
+
+    for entry in har["log"]["entries"]:
+        req = entry["request"]
+        resp = entry["response"]
+        if req["url"] != api_url or req["method"] != "POST":
+            continue
+        text = resp.get("content", {}).get("text", "")
+        if not text:
+            continue
+        try:
+            data = json.loads(text)
+        except json.JSONDecodeError:
+            continue
+        for item in data.get("ItemList", {}).get("Items", []):
+            upc = str(item.get("UPC", ""))
+            if upc and upc not in unique:
+                unique[upc] = item
+
+    print(f"   Found {len(unique)} unique products in HAR data")
+    return list(unique.values())
+
+
+def rootsv_to_sku(rootsv: str) -> str:
+    """Convert rootSV like '0419363_1' to SKU like '41936301'.
+
+    The rootSV base (minus leading zeros) + '01' gives the DG item number.
+    The '_N' suffix is a variant/image index, not part of the SKU.
+    """
+    if not rootsv:
+        return ""
+    base = rootsv.split("_")[0].lstrip("0")
+    return base + "01"
+
+
+def build_product_url(upc: str) -> str:
+    """Construct a Dollar General product page URL from a UPC."""
+    return f"{BASE_URL}/p/pokemon-product/{upc}"
+
+
+def filter_card_and_tin_products(raw_items: list[dict]) -> list[dict]:
+    """Keep only products whose description contains card/pack/tin keywords."""
+    filtered = []
+    for item in raw_items:
+        desc = item.get("Description", "").lower()
+        if any(kw in desc for kw in CARD_TIN_KEYWORDS):
+            filtered.append(item)
+    return filtered
+
+
+def normalize_product(item: dict) -> dict:
+    """Convert raw API item into a clean product dict."""
+    upc = str(item.get("UPC", ""))
+    rootsv = item.get("rootSV", "")
+    sku = rootsv_to_sku(rootsv)
+    qty = item.get("AvailableQty", 0)
+
+    return {
+        "title": item.get("Description", "Unknown Product"),
+        "sku": sku,
+        "upc": upc,
+        "price": f"${item.get('Price', 0):.2f}",
+        "stock": f"In Stock ({qty})" if qty and qty > 0 else "Out of Stock",
+        "quantity": qty,
+        "image_url": item.get("Image", ""),
+        "rating": item.get("AverageRating", 0),
+        "reviews": item.get("RatingReviewCount", 0),
+        "url": build_product_url(upc),
+    }
+
+# ---------------------------------------------------------------------------
+# Step 2 — Enrich from product pages (get real URL slug, extra details)
+# ---------------------------------------------------------------------------
+
+def enrich_from_product_page(product: dict) -> dict:
+    """Visit the actual product page to get the real URL and any missing data."""
+    upc = product["upc"]
+    sku = product["sku"]
+
+    # Try to find the real product page:
+    # search the DG site for the UPC and pick the matching /p/ link
+    search_url = f"{BASE_URL}/search?q={upc}"
+    try:
+        resp = requests.get(search_url, headers=HEADERS, timeout=15)
+        if resp.status_code == 200:
+            soup = BeautifulSoup(resp.text, "html.parser")
+            # Look for the canonical product link
+            links = soup.select(f'a[href*="/p/"][href*="{upc}"]')
+            if links:
+                href = links[0].get("href", "")
+                product["url"] = urljoin(BASE_URL, href)
+    except Exception:
+        pass
+
+    # Fall back to the DG item number embedded in the image URL
+    # (pattern: dg-XXXXXXXX-1); no extra request is made here
+    img_url = product.get("image_url", "")
+    match = re.search(r"dg-(\d+)-", img_url)
+    if match:
+        dg_item = match.group(1)
+        # This is the item number used in the SKU
+        if not product.get("sku"):
+            product["sku"] = dg_item
+
+    return product
+
+# ---------------------------------------------------------------------------
+# Step 3 — Download images & generate barcodes
+# ---------------------------------------------------------------------------
+
+def download_image(url: str, dest: Path) -> Path | None:
+    """Download image from URL, convert to PNG for LaTeX compatibility."""
+    if not url:
+        return None
+    try:
+        resp = requests.get(url, headers=HEADERS, timeout=15)
+        resp.raise_for_status()
+        # Convert to PNG regardless of source format (handles WebP, etc.)
+        from io import BytesIO
+        img = Image.open(BytesIO(resp.content)).convert("RGB")
+        png_dest = dest.with_suffix(".png")
+        img.save(png_dest, "PNG")
+        return png_dest
+    except Exception as e:
+        print(f"   ⚠ Image download failed: {e}")
+        return None
+
+
+def make_placeholder(dest: Path, text: str = "No Image") -> Path:
+    """Create a simple placeholder image."""
+    img = Image.new("RGB", (300, 300), "#e0e0e0")
+    draw = ImageDraw.Draw(img)
+    try:
+        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 20)
+    except Exception:
+        font = ImageFont.load_default()
+    bbox = draw.textbbox((0, 0), text, font=font)
+    tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
+    draw.text(((300 - tw) / 2, (300 - th) / 2), text, fill="#888", font=font)
+    img.save(dest)
+    return dest
+
+
+def generate_barcode(upc: str, dest_dir: Path) -> Path | None:
+    """Generate a UPC-A barcode PNG from a UPC number. Returns path to the .png file."""
+    digits = re.sub(r"\D", "", upc)
+    if not digits:
+        return None
+    # UPC-A: pass first 11 digits, library auto-calculates the 12th (check digit)
+    # A full UPC is 12 digits where the 12th is already the check digit
+    digits = digits[:11].zfill(11)
+    try:
+        upc_cls = barcode.get_barcode_class("upca")
+        bc = upc_cls(digits, writer=ImageWriter())
+        # barcode lib appends .png automatically
+        out = dest_dir / f"barcode_{upc}"
+        saved = bc.save(
+            str(out),
+            options={
+                "module_width": 0.3,
+                "module_height": 15.0,
+                "quiet_zone": 6.5,
+                "font_size": 10,
+                "text_distance": 5.0,
+            },
+        )
+        return Path(saved)
+    except Exception as e:
+        print(f"   ⚠ Barcode generation failed for {upc}: {e}")
+        return None
+
+# ---------------------------------------------------------------------------
+# Step 4 — Generate PDF via pdflatex
+# ---------------------------------------------------------------------------
+
+def generate_catalog_pdf(products: list[dict]) -> Path | None:
+    """Build a LaTeX file and compile it to PDF with pdflatex (xelatex as fallback).
+
+    Layout per page (based on the product.png mockup, with all text above the image):
+        Name                      ← product title, bold
+        Stock            Price    ← stock / price info
+        SKU: XXXXXXXX    UPC: XXXXXXXXXXXX   ← small text
+        ┌─────────────────────┐
+        │                     │
+        │    Product Image    │   ← large, centered, bordered
+        │                     │
+        └─────────────────────┘
+        ┌─────────────────────┐
+        │    UPC-A Barcode    │   ← centered, bordered, pushed to page bottom
+        └─────────────────────┘
+    """
+    timestamp_label = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    timestamp_file = datetime.now().strftime("%Y%m%d_%H%M%S")
+
+    # Build LaTeX document directly for precise layout control
+    latex_lines = [
+        r"\documentclass[11pt,letterpaper]{article}",
+        r"\usepackage[margin=0.75in]{geometry}",
+        r"\usepackage{graphicx}",
+        r"\usepackage{fancybox}",
+        r"\usepackage{xcolor}",
+        r"\usepackage{parskip}",
+        r"\usepackage[utf8]{inputenc}",
+        r"\usepackage[T1]{fontenc}",
+        r"\usepackage{lmodern}",
+        r"\usepackage{hyperref}",
+        r"\pagestyle{empty}",
+        r"\begin{document}",
+        "",
+        # Manifest page
+        r"\begin{center}",
+        r"{\Huge\bfseries Pokemon TCG Product Catalog}\\[0.4cm]",
+        r"{\Large Dollar General}\\[0.2cm]",
+        r"{\large Generated: " + timestamp_label + r"}\\[0.2cm]",
+        r"{\large " + str(len(products)) + r" Cards \& Tins}",
+        r"\end{center}",
+        r"\vspace{0.8cm}",
+        r"\begin{tabular}{r l l r r}",
+        r"\hline",
+        r"\textbf{\#} & \textbf{Product} & \textbf{SKU} & \textbf{Price} & \textbf{Stock} \\",
+        r"\hline",
+    ]
+    for i, prod in enumerate(products, 1):
+        safe = (
+            prod["title"][:50]
+            .replace("&", r"\&").replace("%", r"\%").replace("$", r"\$")
+            .replace("#", r"\#").replace("_", r"\_").replace("é", r"\'e")
+        )
+        price = prod["price"].replace("$", r"\$")
+        qty = prod.get("quantity", 0)
+        stock_short = str(qty) if qty else "---"
+        latex_lines.append(
+            f"{i} & {safe} & \\texttt{{{prod['sku']}}} & {price} & {stock_short} \\\\"
+        )
+    latex_lines += [
+        r"\hline",
+        r"\end{tabular}",
+        r"\newpage",
+        "",
+    ]
+
+    for i, prod in enumerate(products, 1):
+        title = prod["title"]
+        sku = prod["sku"]
+        upc = prod["upc"]
+        price = prod["price"]
+        stock = prod["stock"]
+
+        # Download product image
+        img_dest = IMAGES_DIR / f"product_{i}_{sku}.jpg"
+        img_path = download_image(prod.get("image_url"), img_dest)
+        if not img_path:
+            img_path = make_placeholder(
+                IMAGES_DIR / f"product_{i}_{sku}_placeholder.png", title[:30]
+            )
+
+        # Generate barcode from UPC (not SKU)
+        bc_path = generate_barcode(upc, BARCODES_DIR)
+
+        # Escape LaTeX special characters in text fields
+        safe_title = (
+            title.replace("&", r"\&")
+            .replace("%", r"\%")
+            .replace("$", r"\$")
+            .replace("#", r"\#")
+            .replace("_", r"\_")
+            .replace("é", r"\'e")
+        )
+        safe_stock = stock.replace("&", r"\&")
+        safe_price = price.replace("$", r"\$")
+
+        # Absolute paths for LaTeX
+        abs_img = str(img_path.resolve())
+        abs_bc = str(bc_path.resolve()) if bc_path else None
+
+        latex_lines += [
+            # Name — bold, large
+            r"{\Large\bfseries " + safe_title + r"}",
+            "",
+            r"\vspace{0.15cm}",
+            "",
+            # Stock and price
+            r"{\large " + safe_stock + r" \hfill " + safe_price + r"}",
+            "",
+            r"\vspace{0.1cm}",
+            "",
+            # SKU and UPC
+            r"{\small SKU: \texttt{" + sku + r"} \hfill UPC: \texttt{" + upc + r"}}",
+            "",
+            r"\vspace{0.3cm}",
+            "",
+            r"\begin{center}",
+            # Product image — large, centered, with border
+            r"\fbox{\includegraphics[width=0.7\textwidth,height=0.40\textheight,keepaspectratio]{"
+            + abs_img
+            + r"}}",
+            r"\end{center}",
+            "",
+            r"\vfill",
+            "",
+        ]
+
+        # Barcode — centered, bordered, pushed to bottom
+        if abs_bc:
+            latex_lines += [
+                r"\begin{center}",
+                r"\fbox{\includegraphics[width=0.55\textwidth]{"
+                + abs_bc
+                + r"}}",
+                r"\end{center}",
+                "",
+            ]
+
+        # Page break between products (not after last)
+        if i < len(products):
+            latex_lines.append(r"\newpage")
+            latex_lines.append("")
+
+        print(f"   ✅ [{i}/{len(products)}] {title}")
+
+    latex_lines.append(r"\end{document}")
+
+    # Write .tex file
+    tex_file = OUTPUT_DIR / f"pokemon_catalog_{timestamp_file}.tex"
+    tex_file.write_text("\n".join(latex_lines), encoding="utf-8")
+    print(f"\n📝 LaTeX source: {tex_file}")
+
+    # Compile to PDF with pdflatex directly (pandoc strips images from raw .tex)
+    pdf_file = OUTPUT_DIR / f"pokemon_catalog_{timestamp_file}.pdf"
+
+    for engine in ["pdflatex", "xelatex"]:
+        try:
+            result = subprocess.run(
+                [engine, "-interaction=nonstopmode",
+                 f"-output-directory={OUTPUT_DIR}", str(tex_file)],
+                capture_output=True, text=True, timeout=120,
+            )
+            if pdf_file.exists() and pdf_file.stat().st_size > 1000:
+                # Clean up LaTeX temp files
+                for ext in [".aux", ".log", ".out"]:
+                    tmp = pdf_file.with_suffix(ext)
+                    if tmp.exists():
+                        tmp.unlink()
+                print(
+                    f"📄 PDF generated: {pdf_file}  ({pdf_file.stat().st_size // 1024} KB)"
+                )
+                return pdf_file
+        except FileNotFoundError:
+            continue
+        except Exception:
+            continue
+
+    print(f"⚠ PDF generation failed. LaTeX source available at: {tex_file}")
+    return None
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def main():
+    args = sys.argv[1:]
+
+    # Handle --pdf-only mode
+    if "--pdf-only" in args:
+        idx = args.index("--pdf-only")
+        json_file = args[idx + 1] if idx + 1 < len(args) else None
+        if not json_file or not Path(json_file).exists():
+            print(f"Usage: {sys.argv[0]} --pdf-only <products.json>")
+            sys.exit(1)
+        products = json.loads(Path(json_file).read_text())
+        for d in [OUTPUT_DIR, IMAGES_DIR, BARCODES_DIR]:
+            d.mkdir(parents=True, exist_ok=True)
+        print(f"\n🖨️  Generating PDF from {json_file} ({len(products)} products)...")
+        generate_catalog_pdf(products)
+        return
+
+    scrape_only = "--scrape-only" in args
+
+    # --- Banner ---
+    timestamp_file = datetime.now().strftime("%Y%m%d_%H%M%S")
+    print("=" * 60)
+    print("  🔍  Pokemon Discovery (pokemon-disco)")
+    print("  Dollar General — Pokemon TCG Cards & Tins")
+    print(f"  {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+    print("=" * 60)
+
+    # --- Step 1: Extract from HAR ---
+    if not Path(HAR_FILE).exists():
+        print(f"\n❌ HAR file not found: {HAR_FILE}")
+        print("   Capture a HAR file from the Pokemon page in your browser")
+        print("   and place it in the project directory.")
+        sys.exit(1)
+
+    raw_items = extract_products_from_har(HAR_FILE)
+
+    # --- Step 2: Filter for Cards & Tins ---
+    print(f"\n🎯 Filtering for card packs and tins...")
+    card_tin_items = filter_card_and_tin_products(raw_items)
+    print(f"   {len(card_tin_items)} of {len(raw_items)} products match (pack/tin/booster/tcg)")
+
+    if not card_tin_items:
+        print("❌ No card or tin products found.")
+        sys.exit(1)
+
+    # Show what was filtered out
+    excluded = [i for i in raw_items if i not in card_tin_items]
+    if excluded:
+        print(f"\n   Excluded {len(excluded)} non-card/tin products:")
+        for item in excluded:
+            print(f"     ✗ {item.get('Description', '?')}")
+
+    # --- Step 3: Normalize ---
+    print(f"\n📋 Processing {len(card_tin_items)} products...")
+    products = [normalize_product(item) for item in card_tin_items]
+
+    # Print summary table
+    print()
+    print(f"  {'#':<3} {'Title':<55} {'SKU':<12} {'Price':<8} {'Stock'}")
+    print(f"  {'—'*3} {'—'*55} {'—'*12} {'—'*8} {'—'*15}")
+    for i, p in enumerate(products, 1):
+        title = p['title'][:53]
+        print(f"  {i:<3} {title:<55} {p['sku']:<12} {p['price']:<8} {p['stock']}")
+
+    # --- Step 4: Save JSON ---
+    json_file = f"pokemon_tcg_products_{timestamp_file}.json"
+    Path(json_file).write_text(json.dumps(products, indent=2, ensure_ascii=False))
+    print(f"\n💾 Product data: {json_file}")
+
+    if scrape_only:
+        print("\n✅ Scrape complete (--scrape-only). Run with --pdf-only to generate catalog.")
+        return
+
+    # --- Step 5: Generate PDF ---
+    for d in [OUTPUT_DIR, IMAGES_DIR, BARCODES_DIR]:
+        d.mkdir(parents=True, exist_ok=True)
+
+    print(f"\n🖨️  Generating PDF catalog...")
+    pdf_path = generate_catalog_pdf(products)
+
+    # --- Done ---
+    print("\n" + "=" * 60)
+    if pdf_path:
+        print(f"  ✅ COMPLETE!")
+        print(f"  📄 PDF Catalog:  {pdf_path}")
+        print(f"  💾 Product JSON: {json_file}")
+        print(f"  🏷️  Barcodes:     {BARCODES_DIR}/")
+        print(f"  🖼️  Images:       {IMAGES_DIR}/")
+    else:
+        print(f"  ⚠ PDF generation failed — LaTeX source available in {OUTPUT_DIR}/")
+        print(f"  💾 Product JSON: {json_file}")
+    print("=" * 60)
+
+
+if __name__ == "__main__":
+    main()
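
Review note on `generate_barcode` in disco.py: the function keeps only the first 11 digits of the UPC and, per its own comment, relies on the barcode library to recompute the 12th (check) digit. If the UPC in the HAR data were malformed, the printed barcode could silently differ from the stored value. A minimal sketch of a pre-check, assuming the standard UPC-A check-digit rule; the helper names `upca_check_digit` and `is_valid_upca` are hypothetical and not part of this commit:

```python
import re


def upca_check_digit(first11: str) -> int:
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first11) != 11 or not first11.isdigit():
        raise ValueError("expected exactly 11 digits")
    # Digits in odd positions (1st, 3rd, ..., 11th) are weighted by 3.
    odd_sum = sum(int(d) for d in first11[0::2])
    # Digits in even positions (2nd, 4th, ..., 10th) are weighted by 1.
    even_sum = sum(int(d) for d in first11[1::2])
    return (10 - (odd_sum * 3 + even_sum) % 10) % 10


def is_valid_upca(upc: str) -> bool:
    """True if `upc` has 12 digits and the 12th matches the computed check digit."""
    digits = re.sub(r"\D", "", upc)
    if len(digits) != 12:
        return False
    return upca_check_digit(digits[:11]) == int(digits[11])


if __name__ == "__main__":
    # UPC of the test product referenced by the removed HAR analysis script.
    print(is_valid_upca("728192558375"))
```

One way to use this would be to call `is_valid_upca(upc)` before `generate_barcode(upc, BARCODES_DIR)` and print a warning when it returns `False`, so a mismatched barcode is at least visible in the run log.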
--- a/extract_api_details.py
+++ b/extract_api_details.py
@@ -1,135 +0,0 @@
-#!/usr/bin/env python3
-"""
-Extract exact API request details from HAR file
-"""
-
-import json
-from urllib.parse import urlparse, parse_qs
-
-def extract_api_request_details():
-    """Extract the exact API request format"""
-    
-    har_file = 'www.dollargeneral.com_Archive [26-03-21 15-14-28].har'
-    
-    with open(har_file, 'r', encoding='utf-8') as f:
-        har_data = json.load(f)
-    
-    entries = har_data.get('log', {}).get('entries', [])
-    
-    # Find the API calls that contain our product
-    api_endpoint = "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider"
-    
-    successful_calls = []
-    
-    for entry in entries:
-        request = entry.get('request', {})
-        response = entry.get('response', {})
-        
-        if (request.get('url') == api_endpoint and 
-            request.get('method') == 'POST' and 
-            response.get('status') == 200):
-            
-            # Check if response contains our product
-            response_text = response.get('content', {}).get('text', '')
-            if '41936301' in response_text and 'pokemon' in response_text.lower():
-                successful_calls.append(entry)
-    
-    print(f"Found {len(successful_calls)} successful API calls with Pokemon products")
-    print()
-    
-    for i, entry in enumerate(successful_calls):
-        request = entry.get('request', {})
-        response = entry.get('response', {})
-        
-        print(f"=== API Call {i+1} ===")
-        print(f"URL: {request.get('url')}")
-        print(f"Method: {request.get('method')}")
-        
-        # Extract headers
-        headers = {}
-        for header in request.get('headers', []):
-            name = header.get('name')
-            value = header.get('value')
-            if name.lower() in ['authorization', 'content-type', 'accept', 'referer', 'user-agent']:
-                headers[name] = value
-        
-        print("Headers:")
-        for name, value in headers.items():
-            if name.lower() == 'authorization':
-                print(f"  {name}: {value[:50]}... (Bearer token)")
-            else:
-                print(f"  {name}: {value}")
-        
-        # Extract POST data
-        post_data = request.get('postData', {})
-        if post_data.get('text'):
-            try:
-                post_json = json.loads(post_data.get('text'))
-                print("POST Data:")
-                print(json.dumps(post_json, indent=2))
-            except:
-                print(f"POST Data (raw): {post_data.get('text')}")
-        
-        # Analyze response
-        response_text = response.get('content', {}).get('text', '')
-        if response_text:
-            try:
-                response_json = json.loads(response_text)
-                print(f"Response size: {len(response_text)} characters")
-                
-                # Extract product information
-                items = response_json.get('ItemList', {}).get('Items', [])
-                print(f"Products found: {len(items)}")
-                
-                # Show Pokemon products
-                pokemon_products = []
-                for item in items:
-                    title = item.get('Title', '').lower()
-                    if 'pokemon' in title or 'pokémon' in title:
-                        pokemon_products.append({
-                            'title': item.get('Title'),
-                            'sku': item.get('ItemNbr'),
-                            'upc': item.get('UPC'),
-                            'price': item.get('Price', {}).get('Amount'),
-                            'url': item.get('ProductUrl'),
-                            'in_stock': item.get('Inventory', {}).get('InStock'),
-                            'available_online': item.get('Inventory', {}).get('AvailableOnline')
-                        })
-                
-                if pokemon_products:
-                    print(f"\nPokemon products in this response: {len(pokemon_products)}")
-                    for prod in pokemon_products:
-                        print(f"  • {prod['title']}")
-                        print(f"    SKU: {prod['sku']}, UPC: {prod['upc']}")
-                        print(f"    Price: ${prod['price']}, In Stock: {prod['in_stock']}")
-                        print(f"    URL: {prod['url']}")
-                
-                # Extract the store number and filters used
-                if i == 0:  # Save the working request format
-                    with open('api_request_template.json', 'w') as f:
-                        json.dump({
-                            'endpoint': api_endpoint,
-                            'method': 'POST',
-                            'headers': headers,
-                            'post_data': post_json,
-                            'example_response': {
-                                'total_items': len(items),
-                                'pokemon_items': len(pokemon_products),
-                                'sample_pokemon_product': pokemon_products[0] if pokemon_products else None
-                            }
-                        }, f, indent=2)
-                    print(f"\n✅ Saved working API template to: api_request_template.json")
-                
-            except Exception as e:
-                print(f"Error parsing response: {e}")
-        
-        print("\n" + "="*60 + "\n")
-    
-    return successful_calls
-
-if __name__ == "__main__":
-    successful_calls = extract_api_request_details()
-    
-    print("🎯 SUMMARY:")
-    print(f"   Successfully extracted {len(successful_calls)} working API calls")
-    print("   Next step: Implement this API call in Pokemon Discovery scraper")
--- a/implement_api_scraper.py
+++ b/implement_api_scraper.py
@@ -1,297 +0,0 @@
-#!/usr/bin/env python3
-"""
-Implement API-based scraping for Pokemon Discovery
-"""
-
-import json
-import requests
-import sys
-from datetime import datetime
-from urllib.parse import urljoin
-
-class DollarGeneralAPIScaper:
-    def __init__(self):
-        self.base_url = "https://www.dollargeneral.com"
-        self.api_base = "https://dggo.dollargeneral.com"
-        self.session = requests.Session()
-        
-        # Headers that mimic a real browser session
-        self.headers = {
-            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0',
-            'Accept': 'application/json, text/plain, */*',
-            'Accept-Language': 'en-US,en;q=0.9',
-            'Accept-Encoding': 'gzip, deflate, br',
-            'DNT': '1',
-            'Connection': 'keep-alive',
-            'Sec-Fetch-Dest': 'empty',
-            'Sec-Fetch-Mode': 'cors',
-            'Sec-Fetch-Site': 'cross-site',
-        }
-        self.session.headers.update(self.headers)
-        
-        self.auth_token = None
-        
-    def get_auth_token(self):
-        """Try multiple methods to get authentication token"""
-        
-        print("🔑 Attempting to get authentication token...")
-        
-        # Method 1: Get token from main page
-        try:
-            print("  - Visiting main Pokemon page...")
-            pokemon_url = f"{self.base_url}/c/toys/pokemon?q=&soldAtStore=true"
-            response = self.session.get(pokemon_url, timeout=30)
-            
-            if response.status_code == 200:
-                # Look for embedded tokens in the page
-                import re
-                
-                # Look for bearer tokens in script tags
-                token_patterns = [
-                    r'Bearer\s+([A-Za-z0-9\-_\.]+)',
-                    r'"access_token":\s*"([^"]+)"',
-                    r'"token":\s*"([^"]+)"',
-                    r'authorization:\s*["\'](Bearer\s+[^"\']+)["\']'
-                ]
-                
-                for pattern in token_patterns:
-                    matches = re.findall(pattern, response.text, re.IGNORECASE)
-                    if matches:
-                        token = matches[0]
-                        if token.startswith('Bearer '):
-                            token = token[7:]  # Remove 'Bearer ' prefix
-                        print(f"  ✅ Found token via pattern: {token[:50]}...")
-                        self.auth_token = token
-                        return token
-        
-        except Exception as e:
-            print(f"  ❌ Main page method failed: {e}")
-        
-        # Method 2: Try token endpoint
-        try:
-            print("  - Trying token endpoint...")
-            token_url = f"{self.base_url}/bin/omni/userTokens"
-            response = self.session.get(token_url, timeout=30)
-            
-            if response.status_code == 200:
-                try:
-                    data = response.json()
-                    if 'access_token' in data:
-                        token = data['access_token']
-                        print(f"  ✅ Got token from endpoint: {token[:50]}...")
-                        self.auth_token = token
-                        return token
-                except ValueError:  # response body was not valid JSON
-                    pass
-                    
-        except Exception as e:
-            print(f"  ❌ Token endpoint failed: {e}")
-        
-        # Method 3: Try CSRF token endpoint
-        try:
-            print("  - Trying CSRF token...")
-            csrf_url = f"{self.base_url}/libs/granite/csrf/token.json"
-            response = self.session.get(csrf_url, timeout=30)
-            
-            if response.status_code == 200:
-                data = response.json()
-                if 'token' in data:
-                    # A CSRF token is not an API bearer token, so it is only logged, never stored
-                    print(f"  ⚠️  Got CSRF token (may not work for API): {str(data)[:100]}...")
-                    
-        except Exception as e:
-            print(f"  ❌ CSRF method failed: {e}")
-            
-        print("  ❌ Could not obtain authentication token")
-        return None
-    
-    def search_products_api(self, store_nbr=17506, category_id=723960, include_out_of_stock=True):
-        """Search for products using the API endpoint"""
-        
-        print("🔍 Searching products via API...")
-        print(f"   Store: {store_nbr}, Category: {category_id}")
-        
-        if not self.auth_token:
-            print("   ❌ No authentication token available")
-            return []
-        
-        endpoint = f"{self.api_base}/omni/api/v2/category/search/provider"
-        
-        # Headers for API request
-        api_headers = self.headers.copy()
-        api_headers.update({
-            'Content-Type': 'application/json',
-            'Authorization': f'Bearer {self.auth_token}',
-            'Referer': f'{self.base_url}/',
-            'Origin': self.base_url,
-        })
-        
-        # Request payload based on HAR analysis
-        payload = {
-            "StoreNbr": store_nbr,
-            "SearchTerm": None,
-            "PageSize": 48,  # Request more items
-            "PageStartRecordIndex": 0,
-            "Filters": {
-                "category": [],
-                "brand": [],
-                "dgDelivery": False,
-                "dgPickUp": False,
-                "dgShipTohome": False,
-                "soldAtStore": True,
-                "inStock": not include_out_of_stock,  # False = include out of stock
-                "onlyActivatedDeals": False
-            },
-            "IncludeSponsored": True,
-            "IncludeShipToHome": True,
-            "IncludeDeals": True,
-            "offerSourceType": 0,
-            "Id": category_id,
-            "IncludeProducts": False,
-            "DoNotSave": False,
-            "OptOut": False,
-            "SearchType": 1
-        }
-        
-        try:
-            print(f"   POST {endpoint}")
-            response = self.session.post(endpoint, 
-                                       headers=api_headers, 
-                                       json=payload, 
-                                       timeout=30)
-            
-            print(f"   Status: {response.status_code}")
-            print(f"   Response size: {len(response.text)} characters")
-            
-            if response.status_code == 200:
-                if len(response.text) == 0:
-                    print("   ⚠️  Empty response (token may be expired)")
-                    return []
-                
-                try:
-                    data = response.json()
-                    items = data.get('ItemList', {}).get('Items', [])
-                    print(f"   ✅ Found {len(items)} total items")
-                    return items
-                    
-                except Exception as e:
-                    print(f"   ❌ JSON parsing error: {e}")
-                    print(f"   Response preview: {response.text[:200]}...")
-                    return []
-            
-            elif response.status_code == 401:
-                print("   ❌ Authentication failed - token expired or invalid")
-                return []
-            else:
-                print(f"   ❌ API error: {response.status_code}")
-                print(f"   Response: {response.text[:200]}...")
-                return []
-                
-        except Exception as e:
-            print(f"   ❌ Request failed: {e}")
-            return []
-    
-    def filter_pokemon_products(self, items):
-        """Filter for Pokemon TCG products"""
-        
-        pokemon_products = []
-        
-        for item in items:
-            title = item.get('Title', '').lower()
-            description = item.get('Description', '').lower()
-            brand = item.get('Brand', '').lower()
-            
-            # Check if this is a Pokemon TCG product
-            pokemon_keywords = ['pokemon', 'pokémon']
-            tcg_keywords = ['trading card', 'tcg', 'cards', 'pack', 'tin', 'box', 'collection']
-            
-            has_pokemon = any(keyword in title or keyword in description for keyword in pokemon_keywords)
-            has_tcg = any(keyword in title or keyword in description for keyword in tcg_keywords)
-            
-            if has_pokemon and has_tcg:
-                product = {
-                    'title': item.get('Title'),
-                    'sku': item.get('ItemNbr'),
-                    'upc': item.get('UPC'),
-                    'price': f"${item.get('Price', {}).get('Amount', 0):.2f}",
-                    'url': urljoin(self.base_url, item.get('ProductUrl', '')),
-                    'stock': 'In Stock' if item.get('Inventory', {}).get('InStock') else 'Out of Stock',
-                    'image_url': item.get('ImageURL'),
-                    'description': item.get('Description', ''),
-                    'brand': item.get('Brand', '')
-                }
-                pokemon_products.append(product)
-                
-                print(f"   🎯 Found: {product['title']}")
-                print(f"      SKU: {product['sku']}, Price: {product['price']}")
-                print(f"      Stock: {product['stock']}")
-        
-        return pokemon_products
-    
-    def scrape_pokemon_products(self):
-        """Main scraping method"""
-        
-        print("Pokemon Discovery - API-based Scraping")
-        print("="*60)
-        
-        # Get authentication token
-        if not self.get_auth_token():
-            print("❌ Authentication failed - cannot access API")
-            print()
-            print("💡 Alternative approaches:")
-            print("   1. Use browser automation with proper session")
-            print("   2. Extract products manually from individual pages")
-            print("   3. Use the working individual product scraper")
-            return []
-        
-        print()
-        
-        # Search for products
-        all_items = self.search_products_api()
-        
-        if not all_items:
-            print("❌ No items returned from API")
-            return []
-        
-        print()
-        
-        # Filter for Pokemon products
-        pokemon_products = self.filter_pokemon_products(all_items)
-        
-        print()
-        print(f"🎉 SUCCESS! Found {len(pokemon_products)} Pokemon TCG products")
-        
-        if pokemon_products:
-            # Save results
-            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
-            filename = f'pokemon_tcg_api_scrape_{timestamp}.json'
-            
-            with open(filename, 'w') as f:
-                json.dump(pokemon_products, f, indent=2)
-            
-            print(f"💾 Saved to: {filename}")
-            
-            # Show summary
-            print()
-            print("📋 Product Summary:")
-            for i, product in enumerate(pokemon_products, 1):
-                print(f"  {i}. {product['title']}")
-                print(f"     SKU: {product['sku']} | Price: {product['price']} | {product['stock']}")
-        
-        return pokemon_products
-
-def main():
-    scraper = DollarGeneralAPIScaper()
-    products = scraper.scrape_pokemon_products()
-    
-    if products:
-        print()
-        print("🚀 Ready for PDF generation!")
-        print("Run: python pdf_generator.py pokemon_tcg_api_scrape_[timestamp].json")
-    else:
-        print()
-        print("📝 Note: Individual product scraping still works perfectly!")
-        print("The issue is authentication for bulk API access.")
-
-if __name__ == "__main__":
-    main()
--- a/pdf_generator.py
+++ b/pdf_generator.py
@@ -1,279 +0,0 @@
-#!/usr/bin/env python3
-"""
-Pokemon Discovery - TCG Product Catalog PDF Generator
-Generates PDF catalog with product images, details, and UPC-A barcodes
-"""
-
-import json
-import os
-import sys
-import requests
-import subprocess
-from datetime import datetime
-from pathlib import Path
-import barcode
-from barcode.writer import ImageWriter
-from PIL import Image, ImageDraw, ImageFont
-import tempfile
-import shutil
-
-class PokemonTCGCatalogGenerator:
-    def __init__(self, json_file):
-        self.json_file = json_file
-        self.output_dir = Path("catalog_output")
-        self.images_dir = self.output_dir / "images"
-        self.barcodes_dir = self.output_dir / "barcodes"
-        
-        # Create output directories
-        self.output_dir.mkdir(exist_ok=True)
-        self.images_dir.mkdir(exist_ok=True)
-        self.barcodes_dir.mkdir(exist_ok=True)
-        
-        # Load product data
-        with open(json_file, 'r') as f:
-            self.products = json.load(f)
-    
-    def download_image(self, url, filename):
-        """Download product image"""
-        if not url:
-            return None
-            
-        try:
-            response = requests.get(url, timeout=30)
-            response.raise_for_status()
-            
-            filepath = self.images_dir / filename
-            with open(filepath, 'wb') as f:
-                f.write(response.content)
-            
-            return filepath
-        except Exception as e:
-            print(f"Failed to download image {url}: {e}")
-            return None
-    
-    def generate_upc_barcode(self, sku):
-        """Generate UPC-A barcode from SKU"""
-        try:
-            # Convert SKU to 12-digit UPC-A format
-            # Remove non-digits and pad/truncate to 11 digits (12th is check digit)
-            digits_only = ''.join(filter(str.isdigit, str(sku)))
-            
-            if len(digits_only) < 11:
-                # Pad with zeros at the start
-                upc_base = digits_only.zfill(11)
-            else:
-                # Take the last 11 digits
-                upc_base = digits_only[-11:]
-            
-            # Generate UPC-A barcode
-            upc_generator = barcode.get_barcode_class('upca')
-            upc = upc_generator(upc_base, writer=ImageWriter())
-            
-            # Save barcode image
-            barcode_filename = f"barcode_{str(sku).replace('/', '_').replace(' ', '_')}"
-            barcode_path = self.barcodes_dir / barcode_filename
-            
-            # Save with specific options for better appearance
-            upc.save(str(barcode_path).replace('.png', ''), options={
-                'module_width': 0.2,
-                'module_height': 15.0,
-                'quiet_zone': 6.5,
-                'font_size': 10,
-                'text_distance': 5.0,
-                'background': 'white',
-                'foreground': 'black'
-            })
-            
-            final_path = f"{barcode_path}.png"
-            return final_path
-            
-        except Exception as e:
-            print(f"Failed to generate barcode for SKU {sku}: {e}")
-            return None
-    
-    def create_placeholder_image(self, width=300, height=200):
-        """Create a placeholder image when product image is not available"""
-        img = Image.new('RGB', (width, height), color='lightgray')
-        draw = ImageDraw.Draw(img)
-        
-        try:
-            # Try to use a system font
-            font = ImageFont.truetype('/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf', 24)
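-        except OSError:  # font file not found at the Linux path
-            try:
-                font = ImageFont.truetype('arial.ttf', 24)
-            except OSError:  # no Arial either; fall back to PIL's built-in font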
-                font = ImageFont.load_default()
-        
-        text = "No Image\nAvailable"
-        
-        # Get text bounding box for centering
-        lines = text.split('\n')
-        y_offset = height // 2 - (len(lines) * 30) // 2
-        
-        for line in lines:
-            bbox = draw.textbbox((0, 0), line, font=font)
-            text_width = bbox[2] - bbox[0]
-            x_offset = (width - text_width) // 2
-            draw.text((x_offset, y_offset), line, fill='darkgray', font=font)
-            y_offset += 30
-        
-        placeholder_path = self.images_dir / "placeholder.png"
-        img.save(placeholder_path)
-        return placeholder_path
-    
-    def generate_markdown(self):
-        """Generate markdown content for the catalog"""
-        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
-        markdown = f"""---
-title: "Pokemon TCG Product Catalog"
-subtitle: "Dollar General - Generated {timestamp}"
-author: "Automated Scraper"
-date: "{timestamp}"
-geometry: margin=1in
-fontsize: 11pt
-documentclass: article
----
-
-# Pokemon TCG Product Catalog
-
-Generated on: {timestamp}  
-Source: Dollar General  
-Total Products: {len(self.products)}
-
----
-
-"""
-        
-        for i, product in enumerate(self.products, 1):
-            print(f"Processing product {i}/{len(self.products)}: {product.get('title', 'Unknown')}")
-            
-            # Download product image
-            image_path = None
-            if product.get('image_url'):
-                filename = f"product_{i}_{str(product.get('sku', 'unknown')).replace('/', '_').replace(' ', '_')}.jpg"
-                image_path = self.download_image(product.get('image_url'), filename)
-            
-            if not image_path:
-                # Use placeholder
-                image_path = self.create_placeholder_image()
-            
-            # Generate barcode
-            barcode_path = None
-            if product.get('sku'):
-                barcode_path = self.generate_upc_barcode(product.get('sku'))
-            
-            # Add product section to markdown
-            markdown += f"## {i}. {product.get('title', 'Unknown Product')}\n\n"
-            
-            # Product image
-            if image_path:
-                rel_image_path = os.path.relpath(image_path, self.output_dir)
-                markdown += f"![Product Image]({rel_image_path}){{width=300px}}\n\n"
-            
-            # Product details in a table
-            markdown += "| Field | Value |\n"
-            markdown += "|-------|-------|\n"
-            markdown += f"| **Title** | {product.get('title', 'N/A')} |\n"
-            markdown += f"| **Price** | {product.get('price', 'N/A')} |\n"
-            markdown += f"| **Stock** | {product.get('stock', 'N/A')} |\n"
-            markdown += f"| **SKU** | `{product.get('sku', 'N/A')}` |\n"
-            markdown += f"| **URL** | {product.get('url', 'N/A')} |\n"
-            markdown += "\n"
-            
-            # Barcode
-            if barcode_path:
-                rel_barcode_path = os.path.relpath(barcode_path, self.output_dir)
-                markdown += "**UPC-A Barcode:**\n\n"
-                markdown += f"![UPC-A Barcode]({rel_barcode_path}){{width=200px}}\n\n"
-            
-            markdown += "---\n\n"
-        
-        return markdown
-    
-    def generate_pdf(self):
-        """Generate PDF catalog using pandoc"""
-        print("Generating markdown content...")
-        markdown_content = self.generate_markdown()
-        
-        # Save markdown file
-        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
-        markdown_file = self.output_dir / f"pokemon_tcg_catalog_{timestamp}.md"
-        
-        with open(markdown_file, 'w', encoding='utf-8') as f:
-            f.write(markdown_content)
-        
-        print(f"Markdown saved to: {markdown_file}")
-        
-        # Generate PDF using pandoc
-        pdf_file = self.output_dir / f"pokemon_tcg_catalog_{timestamp}.pdf"
-        
-        print("Converting to PDF using pandoc...")
-        
-        try:
-            subprocess.run([
-                'pandoc',
-                str(markdown_file),
-                '-o', str(pdf_file),
-                '--pdf-engine=xelatex',
-                '-V', 'colorlinks=true',
-                '-V', 'linkcolor=blue',
-                '-V', 'filecolor=magenta',
-                '-V', 'urlcolor=cyan',
-                '--toc',
-                '--toc-depth=2'
-            ], check=True)
-            
-            print(f"PDF generated successfully: {pdf_file}")
-            return pdf_file
-            
-        except subprocess.CalledProcessError as e:
-            print(f"Pandoc conversion failed: {e}")
-            print("Trying with pdflatex instead...")
-            
-            try:
-                subprocess.run([
-                    'pandoc',
-                    str(markdown_file),
-                    '-o', str(pdf_file),
-                    '--pdf-engine=pdflatex',
-                    '--toc'
-                ], check=True)
-                
-                print(f"PDF generated successfully: {pdf_file}")
-                return pdf_file
-                
-            except subprocess.CalledProcessError as e2:
-                print(f"PDF generation failed with both engines: {e2}")
-                print(f"Markdown file available at: {markdown_file}")
-                return None
-        
-        except FileNotFoundError:
-            print("Error: pandoc not found. Please install pandoc to generate PDF.")
-            print(f"Markdown file available at: {markdown_file}")
-            return None
-
-def main():
-    if len(sys.argv) != 2:
-        print("Usage: python3 pdf_generator.py <json_file>")
-        print("Example: python3 pdf_generator.py pokemon_tcg_products_20241221_143025.json")
-        sys.exit(1)
-    
-    json_file = sys.argv[1]
-    
-    if not os.path.exists(json_file):
-        print(f"Error: JSON file '{json_file}' not found")
-        sys.exit(1)
-    
-    generator = PokemonTCGCatalogGenerator(json_file)
-    pdf_file = generator.generate_pdf()
-    
-    if pdf_file:
-        print("\nCatalog generation completed!")
-        print(f"PDF file: {pdf_file}")
-        print(f"Output directory: {generator.output_dir}")
-    else:
-        print(f"\nPDF generation failed, but markdown file is available in: {generator.output_dir}")
-
-if __name__ == "__main__":
-    main()
--- a/run.sh
+++ b/run.sh
@@ -1,31 +0,0 @@
-#!/bin/bash
-# Pokemon Discovery - Scraper & Catalog Generator Launcher
-# Automatically activates virtual environment and runs the scraper
-
-set -e
-
-cd "$(dirname "$0")"
-
-echo "Pokemon Discovery - Product Scraper & Catalog Generator"
-echo "================================================"
-
-# Check if virtual environment exists
-if [[ ! -d "venv" ]]; then
-    echo "Creating virtual environment..."
-    python3 -m venv venv
-fi
-
-# Activate virtual environment
-source venv/bin/activate
-
-# Check if requirements are installed
-if ! python -c "import requests, bs4, barcode, selenium" 2>/dev/null; then
-    echo "Installing Python requirements..."
-    pip install -r requirements.txt
-fi
-
-# Run the main script
-python run_scraper.py
-
-echo ""
-echo "Script completed. Check the output above for results."
--- a/run_scraper.py
+++ b/run_scraper.py
@@ -1,139 +0,0 @@
-#!/usr/bin/env python3
-"""
-Pokemon Discovery - Scraper and Catalog Generator
-Main script that runs both scraping and PDF generation
-"""
-
-import os
-import sys
-import subprocess
-from datetime import datetime
-from pathlib import Path
-
-def install_requirements():
-    """Install Python requirements"""
-    print("Installing Python requirements...")
-    try:
-        subprocess.run([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'], 
-                      check=True)
-        print("Requirements installed successfully!")
-    except subprocess.CalledProcessError as e:
-        print(f"Failed to install requirements: {e}")
-        return False
-    return True
-
-def run_scraper():
-    """Run the scraper to collect product data"""
-    print("=" * 60)
-    print("STEP 1: SCRAPING POKEMON TCG PRODUCTS")
-    print("=" * 60)
-    
-    try:
-        result = subprocess.run([sys.executable, 'scraper.py'], 
-                               capture_output=True, text=True)
-        
-        if result.returncode == 0:
-            print("Scraping completed successfully!")
-            print(result.stdout)
-            
-            # Find the generated JSON file
-            json_files = list(Path('.').glob('pokemon_tcg_products_*.json'))
-            if json_files:
-                latest_file = max(json_files, key=os.path.getctime)
-                return str(latest_file)
-            else:
-                print("No JSON file was generated")
-                return None
-        else:
-            print("Scraping failed:")
-            print(result.stderr)
-            return None
-            
-    except Exception as e:
-        print(f"Error running scraper: {e}")
-        return None
-
-def run_pdf_generator(json_file):
-    """Run the PDF generator with the scraped data"""
-    print("=" * 60)
-    print("STEP 2: GENERATING PDF CATALOG")
-    print("=" * 60)
-    
-    try:
-        result = subprocess.run([sys.executable, 'pdf_generator.py', json_file], 
-                               capture_output=True, text=True)
-        
-        if result.returncode == 0:
-            print("PDF generation completed successfully!")
-            print(result.stdout)
-            return True
-        else:
-            print("PDF generation failed:")
-            print(result.stderr)
-            return False
-            
-    except Exception as e:
-        print(f"Error running PDF generator: {e}")
-        return False
-
-def main():
-    print("Pokemon Discovery - Product Scraper & Catalog Generator")
-    print("=" * 60)
-    print(f"Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
-    print()
-    
-    # Check if requirements are installed
-    try:
-        import requests, bs4, barcode, PIL
-        print("✓ Required packages are available")
-    except ImportError as e:
-        print(f"✗ Missing required package: {e}")
-        print("Installing requirements...")
-        if not install_requirements():
-            sys.exit(1)
-    
-    # Check if pandoc is available
-    try:
-        subprocess.run(['pandoc', '--version'], 
-                      capture_output=True, check=True)
-        print("✓ Pandoc is available for PDF generation")
-    except (subprocess.CalledProcessError, FileNotFoundError):
-        print("⚠ Pandoc not found. PDF generation may fail.")
-        print("  Install pandoc with: sudo apt install pandoc (Ubuntu/Debian)")
-        print("  or: brew install pandoc (macOS)")
-        print("  or: pacman -S pandoc (Arch Linux)")
-    
-    print()
-    
-    # Run scraper
-    json_file = run_scraper()
-    if not json_file:
-        print("Scraping failed. Exiting.")
-        sys.exit(1)
-    
-    # Run PDF generator
-    if run_pdf_generator(json_file):
-        print("=" * 60)
-        print("SUCCESS! Both scraping and PDF generation completed.")
-        print("=" * 60)
-        print(f"JSON data: {json_file}")
-        print("PDF catalog: Check the catalog_output/ directory")
-        print()
-        print("Files generated:")
-        
-        # List generated files
-        for file_pattern in ['pokemon_tcg_products_*.json', 'catalog_output/pokemon_tcg_catalog_*.pdf']:
-            files = list(Path('.').glob(file_pattern))
-            if files:
-                latest = max(files, key=os.path.getctime)
-                print(f"  - {latest}")
-    else:
-        print("=" * 60)
-        print("PARTIAL SUCCESS: Scraping completed, but PDF generation failed.")
-        print("=" * 60)
-        print(f"JSON data: {json_file}")
-        print("You can manually run the PDF generator with:")
-        print(f"  python3 pdf_generator.py {json_file}")
-
-if __name__ == "__main__":
-    main()
--- a/scraper.py
+++ b/scraper.py
@@ -1,7 +1,20 @@
 #!/usr/bin/env python3
 """
-Pokemon Discovery - TCG Product Scraper for Dollar General
-Scrapes product information and saves to JSON for PDF generation
+Pokemon Discovery — Site Scraper (Reference)
+
+HTML + Selenium/Brave scraper for Dollar General product pages.
+Kept as a reference implementation. The primary tool is disco.py,
+which reads product data from a HAR capture instead of scraping live.
+
+This scraper can:
+  - Fetch individual product pages and extract title, SKU, price, stock
+  - Attempt to find product links from the category page (limited by
+    dynamic JS loading — products are injected via API after page load)
+  - Fall back to Brave browser via Selenium for JS-rendered content
+
+Usage:
+    python scraper.py                  # Attempt full category scrape
+    # Or import and use PokemonTCGScraper class directly for individual pages
 """

 import json
@@ -28,6 +41,14 @@ except ImportError:
    print("Selenium not available, using requests only (install selenium for Brave browser support)")

 class PokemonTCGScraper:
+    """HTML/Selenium scraper for Dollar General Pokemon product pages.
+
+    Can extract product details (title, SKU, price, stock) from individual
+    product page URLs. Category-level scraping is limited because Dollar
+    General loads products dynamically via a JS API call after page load.
+    See disco.py for the HAR-based approach that bypasses this limitation.
+    """
+
    def __init__(self):
        self.base_url = "https://www.dollargeneral.com"
        self.search_url = "https://www.dollargeneral.com/c/toys/pokemon?q=&soldAtStore=true"
@@ -300,9 +321,10 @@ class PokemonTCGScraper:
        return has_pokemon and has_tcg
    
    def try_api_scraping(self):
-        """
-        Try to scrape products using the discovered API endpoint
-        This method contains the exact API call found via HAR analysis
+        """Stub for API-based scraping (requires auth token).
+
+        Documents the discovered API endpoint and request format.
+        Not functional — use disco.py with a HAR file instead.
        """
        print("🔬 Attempting API-based scraping...")
        print("   Endpoint: https://dggo.dollargeneral.com/omni/api/v2/category/search/provider")
--- a/test_api_scraper.py
+++ b/test_api_scraper.py
@@ -1,246 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test the Dollar General API endpoint for Pokemon products
-"""
-
-import json
-import requests
-import sys
-from datetime import datetime
-
-def get_auth_token():
-    """Get authentication token from Dollar General"""
-    try:
-        # Try to get token from the token endpoint
-        token_url = 'https://www.dollargeneral.com/bin/omni/userTokens'
-        headers = {
-            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0',
-            'Accept': 'application/json, text/plain, */*',
-            'Referer': 'https://www.dollargeneral.com/'
-        }
-        
-        response = requests.get(token_url, headers=headers, timeout=30)
-        if response.status_code == 200:
-            data = response.json()
-            # Look for access token in the response
-            if 'access_token' in data:
-                return data['access_token']
-            elif 'token' in data:
-                return data['token']
-            else:
-                print("Token response structure:", list(data.keys()))
-                return None
-        else:
-            print(f"Failed to get token: {response.status_code}")
-            return None
-    except Exception as e:
-        print(f"Error getting token: {e}")
-        return None
-
-def test_api_with_existing_token():
-    """Test with the token from HAR file"""
-    
-    # Token extracted from HAR file (may expire)
-    har_token = "eyJ0eXAiOiJhdCtKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik5qRTJNemczTXpSRVFrUXpNak5GUmprMU1FUkNNRUZDTVRBek1FWTFRa0pCTXpRM1EwTkNNZyJ9.eyJzY29wZSI6bnVsbCwiaWF0IjoxNzc0MTI3Nzc5LCJleHAiOjE3NzQxMzEzNzksImF1ZCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsImlzcyI6Imh0dHBzOi8vcHJvZC1kZ2dvLyIsInN1YiI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsInNpZCI6IlNrWk9makF5TURRMU1EVXpOVFEwWWpBM016SXpNak14TXpFek9ETTNNekV3TWpreFl6VitUVUZXYVhwbk56SXpVRGg2VWxkcmEySkRkMk5EZUdVNFlUWm5XVXBHVDBveVExTlRNVWxXWlhSalQzRnFWazVWZGtGWlIwOWtZV2x0WVVwRVRucG5SVlZvUTE5SE5VcHVObGhuTURSb2JuUkVhVlF3UTBzelNIND0iLCJqdGkiOiJzdDIucy5BdEx0VlphRHFnLnZrdW5OV2RWNjN2ZlJTTG00Y3VUd2d5bmc2X0pJNmxKRjA5a2lXTXVQeGZkVDRvT0NhMXhwa1VoRlRkM2tocHZUaFhsRUVwLWw0QzJrZnoycjkzVlYzeldBaUw5Y2x6Snl0amFJamJ4TEJnLkJOZy1CeUdpZnV0WnppQWhhMV8xRDBXTUFWR3JpNVVCX0pKbTRCNVRNYVhTWkZneXpxeUZERjJxZ3B3UTgyajZ2eGVtcnA5RERFTHZnM3hvdlZmZzBnLnNjMyIsImNsaWVudF9pZCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsImF6cCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiJ9.I6ou9atkJ8ndkr2m2Trpg53fMIL3hpofCLUHoHYgZkOJnLnbmL0CQu7_pIChQ6nIDK03GagK6aqxd97E8B8vv9nweSmb7zXhrt43dKLEIdhxIGFkJ4xYgNNg-3cVjSlThBQ_AwCx924lOGjEfikEw4NrvGvrlNvrg1lnNz4hf629hUH-5ccVSdgo1w_LQzsLOeMCjuC_bmAoRxT5KLI9oESd4tPJZU5Nlt2ICbWJD9h-zNrt-ijwYCvb7j8amGbpMGhJZqtzu9f3wN0JUFxDg5rAN-WOtLjwEmR_NxDKq0NEeuU16uhaB8AJzy217XAgJ87bKZldZowsWs-Q9oAH3g"
-    
-    endpoint = "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider"
-    
-    headers = {
-        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0',
-        'Accept': 'application/json, text/plain, */*',
-        'Content-Type': 'application/json',
-        'Authorization': f'Bearer {har_token}',
-        'Referer': 'https://www.dollargeneral.com/'
-    }
-    
-    # Test different filter combinations
-    test_requests = [
-        {
-            "name": "In Stock Pokemon Products",
-            "payload": {
-                "StoreNbr": 17506,
-                "SearchTerm": None,
-                "PageSize": 24,
-                "PageStartRecordIndex": 0,
-                "Filters": {
-                    "category": [],
-                    "brand": [],
-                    "dgDelivery": False,
-                    "dgPickUp": False,
-                    "dgShipTohome": False,
-                    "soldAtStore": True,
-                    "inStock": True,
-                    "onlyActivatedDeals": False
-                },
-                "IncludeSponsored": True,
-                "IncludeShipToHome": True,
-                "IncludeDeals": True,
-                "offerSourceType": 0,
-                "Id": 723960,  # Pokemon category ID
-                "IncludeProducts": False,
-                "DoNotSave": False,
-                "OptOut": False,
-                "SearchType": 1
-            }
-        },
-        {
-            "name": "All Pokemon Products (including out of stock)",
-            "payload": {
-                "StoreNbr": 17506,
-                "SearchTerm": None,
-                "PageSize": 24,
-                "PageStartRecordIndex": 0,
-                "Filters": {
-                    "category": [],
-                    "brand": [],
-                    "dgDelivery": False,
-                    "dgPickUp": False,
-                    "dgShipTohome": False,
-                    "soldAtStore": True,
-                    "inStock": False,  # Include out of stock
-                    "onlyActivatedDeals": False
-                },
-                "IncludeSponsored": True,
-                "IncludeShipToHome": True,
-                "IncludeDeals": True,
-                "offerSourceType": 0,
-                "Id": 723960,
-                "IncludeProducts": False,
-                "DoNotSave": False,
-                "OptOut": False,
-                "SearchType": 1
-            }
-        }
-    ]
-    
-    all_pokemon_products = []
-    
-    for test in test_requests:
-        print(f"=== Testing: {test['name']} ===")
-        
-        try:
-            response = requests.post(endpoint, 
-                                   headers=headers, 
-                                   json=test['payload'], 
-                                   timeout=30)
-            
-            print(f"Status Code: {response.status_code}")
-            
-            if response.status_code == 200:
-                print(f"Response length: {len(response.text)} characters")
-                print(f"Response preview: {response.text[:200]}...")
-                
-                try:
-                    data = response.json()
-                    items = data.get('ItemList', {}).get('Items', [])
-                    print(f"Total products: {len(items)}")
-                except Exception as json_error:
-                    print(f"JSON parsing error: {json_error}")
-                    print(f"Full response: {response.text}")
-                    continue
-                
-                # Filter for Pokemon products
-                pokemon_products = []
-                for item in items:
-                    title = item.get('Title', '').lower()
-                    if any(keyword in title for keyword in ['pokemon', 'pokémon', 'trading card']):
-                        product_info = {
-                            'title': item.get('Title'),
-                            'sku': item.get('ItemNbr'),
-                            'upc': item.get('UPC'),
-                            'price': item.get('Price', {}).get('Amount'),
-                            'url': f"https://www.dollargeneral.com{item.get('ProductUrl', '')}",
-                            'in_stock': item.get('Inventory', {}).get('InStock'),
-                            'image_url': item.get('ImageURL'),
-                            'description': item.get('Description', ''),
-                            'brand': item.get('Brand', '')
-                        }
-                        pokemon_products.append(product_info)
-                        all_pokemon_products.append(product_info)
-                
-                print(f"Pokemon products found: {len(pokemon_products)}")
-                
-                for i, prod in enumerate(pokemon_products, 1):
-                    print(f"  {i}. {prod['title']}")
-                    print(f"     SKU: {prod['sku']}, UPC: {prod['upc']}")
-                    print(f"     Price: ${prod['price']}, In Stock: {prod['in_stock']}")
-                    print(f"     URL: {prod['url']}")
-                    
-                    # Check if this is our test product
-                    if prod['sku'] == '41936301':
-                        print(f"     🎯 THIS IS OUR TEST PRODUCT!")
-                    print()
-                
-            elif response.status_code == 401:
-                print("❌ Authentication failed - token may be expired")
-                print("Response:", response.text)
-                return None
-            else:
-                print(f"❌ API call failed: {response.status_code}")
-                print("Response:", response.text[:500])
-            
-        except Exception as e:
-            print(f"❌ Error: {e}")
-        
-        print("="*60)
-        print()
-    
-    # Save results
-    if all_pokemon_products:
-        # Remove duplicates based on SKU
-        unique_products = {prod['sku']: prod for prod in all_pokemon_products}.values()
-        unique_products = list(unique_products)
-        
-        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
-        filename = f'pokemon_tcg_api_results_{timestamp}.json'
-        
-        with open(filename, 'w') as f:
-            json.dump(unique_products, f, indent=2)
-        
-        print(f"🎉 SUCCESS!")
-        print(f"Found {len(unique_products)} unique Pokemon TCG products")
-        print(f"Saved to: {filename}")
-        
-        return unique_products
-    
-    return None
-
-def main():
-    print("Pokemon Discovery - API Endpoint Test")
-    print("="*60)
-    
-    # First try to get a fresh token
-    print("Attempting to get fresh authentication token...")
-    fresh_token = get_auth_token()
-    
-    if fresh_token:
-        print(f"✅ Got fresh token: {fresh_token[:50]}...")
-    else:
-        print("⚠️  Could not get fresh token, using HAR token")
-    
-    print()
-    
-    # Test API with existing token from HAR
-    products = test_api_with_existing_token()
-    
-    if products:
-        print()
-        print("🚀 READY FOR INTEGRATION!")
-        print("The API endpoint is working and can be integrated into Pokemon Discovery")
-        print()
-        
-        # Check if our known product is in the results
-        known_sku = '41936301'
-        known_product = next((p for p in products if p['sku'] == known_sku), None)
-        
-        if known_product:
-            print(f"✅ Confirmed: Our test product (SKU {known_sku}) was found via API!")
-            print(f"   Title: {known_product['title']}")
-            print(f"   URL: {known_product['url']}")
-            print(f"   Stock: {known_product['in_stock']}")
-        
-    else:
-        print("❌ API test failed - may need fresh authentication")
-
-if __name__ == "__main__":
-    main()
--- a/test_barcode.py
+++ b/test_barcode.py
@@ -1,55 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test script to verify barcode generation functionality
-"""
-
-import sys
-import os
-from pathlib import Path
-
-# Add current directory to path if running in venv
-sys.path.insert(0, '.')
-
-try:
-    import barcode
-    from barcode.writer import ImageWriter
-    print("✓ Barcode generation libraries are available")
-    
-    # Test barcode generation: UPC-A takes 11 data digits; the library computes the check digit
-    test_sku = "123456789012"
-    
-    upc_generator = barcode.get_barcode_class('upca')
-    test_barcode = upc_generator(test_sku[:11], writer=ImageWriter())
-    
-    # Create test output directory
-    test_dir = Path("test_output")
-    test_dir.mkdir(exist_ok=True)
-    
-    # Generate test barcode
-    barcode_path = test_dir / "test_barcode"
-    test_barcode.save(str(barcode_path), options={
-        'module_width': 0.2,
-        'module_height': 15.0,
-        'quiet_zone': 6.5,
-        'font_size': 10,
-        'text_distance': 5.0,
-        'background': 'white',
-        'foreground': 'black'
-    })
-    
-    final_path = f"{barcode_path}.png"
-    if os.path.exists(final_path):
-        print(f"✓ Test barcode generated successfully: {final_path}")
-        print(f"  File size: {os.path.getsize(final_path)} bytes")
-    else:
-        print(f"✗ Failed to generate test barcode")
-        sys.exit(1)
-        
-except ImportError as e:
-    print(f"✗ Missing barcode library: {e}")
-    sys.exit(1)
-except Exception as e:
-    print(f"✗ Barcode generation failed: {e}")
-    sys.exit(1)
-
-print("✓ All barcode generation tests passed!")
--- a/test_brave.py
+++ b/test_brave.py
@@ -1,67 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test Brave browser integration with Pokemon Discovery
-"""
-
-import sys
-import os
-
-try:
-    from selenium import webdriver
-    from selenium.webdriver.chrome.options import Options
-    from selenium.webdriver.chrome.service import Service
-    from webdriver_manager.chrome import ChromeDriverManager
-    
-    print("✓ Selenium and webdriver-manager are available")
-    
-    # Check if Brave is available
-    if not os.path.exists('/usr/bin/brave'):
-        print("✗ Brave browser not found at /usr/bin/brave")
-        sys.exit(1)
-    
-    print("✓ Brave browser found at /usr/bin/brave")
-    
-    # Get Brave version
-    import subprocess
-    try:
-        result = subprocess.run(['/usr/bin/brave', '--version'], 
-                              capture_output=True, text=True, timeout=5)
-        brave_version = result.stdout.strip()
-        print(f"✓ {brave_version}")
-    except Exception:
-        print("⚠ Could not get Brave version")
-    
-    # Test ChromeDriver compatibility
-    print("\nTesting ChromeDriver compatibility...")
-    options = Options()
-    options.add_argument('--headless')
-    options.add_argument('--no-sandbox')
-    options.add_argument('--disable-dev-shm-usage')
-    options.binary_location = '/usr/bin/brave'
-    
-    try:
-        service = Service(ChromeDriverManager().install())
-        driver = webdriver.Chrome(service=service, options=options)
-        
-        # Simple test page
-        driver.get("data:text/html,<html><body><h1>Test</h1></body></html>")
-        title = driver.title
-        driver.quit()
-        
-        print("✓ Brave + ChromeDriver test successful!")
-        print("✓ Pokemon Discovery is ready to use Brave for dynamic content")
-        
-    except Exception as e:
-        print(f"✗ ChromeDriver compatibility issue: {e}")
-        print("\n💡 Solutions:")
-        print("1. Update ChromeDriver: pip install --upgrade webdriver-manager")
-        print("2. Install matching ChromeDriver version manually")
-        print("3. Use Firefox with geckodriver as alternative")
-        print("\nNote: The main PDF generation functionality works without browser automation")
-
-except ImportError as e:
-    print(f"✗ Missing dependency: {e}")
-    print("Run: pip install selenium webdriver-manager")
-    sys.exit(1)
-
-print("\n🎯 Test completed!")
--- a/test_data.json
+++ b/test_data.json
@@ -1,26 +0,0 @@
-[
-  {
-    "title": "Pokemon Trading Card Game Battle Academy",
-    "price": "$19.95",
-    "stock": "In Stock",
-    "sku": "DG12345678",
-    "image_url": "https://via.placeholder.com/300x200?text=Pokemon+Battle+Academy",
-    "url": "https://www.dollargeneral.com/p/pokemon-battle-academy"
-  },
-  {
-    "title": "Pokemon TCG Scarlet & Violet Booster Pack",
-    "price": "$4.25",
-    "stock": "In Stock", 
-    "sku": "DG87654321",
-    "image_url": "https://via.placeholder.com/300x200?text=Pokemon+Booster+Pack",
-    "url": "https://www.dollargeneral.com/p/pokemon-scarlet-violet-booster"
-  },
-  {
-    "title": "Pokemon Tin Collection Box",
-    "price": "$12.95",
-    "stock": "Low Stock",
-    "sku": "DG11223344",
-    "image_url": "https://via.placeholder.com/300x200?text=Pokemon+Tin+Box",
-    "url": "https://www.dollargeneral.com/p/pokemon-tin-collection"
-  }
-]
--- a/test_dynamic_scraping.py
+++ b/test_dynamic_scraping.py
@@ -1,152 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test dynamic content loading for Pokemon Discovery
-"""
-
-import requests
-import json
-from bs4 import BeautifulSoup
-import time
-
-def test_api_endpoints():
-    """Try to find API endpoints that might return product data"""
-    
-    headers = {
-        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
-        'Accept': 'application/json, text/plain, */*',
-        'Accept-Language': 'en-US,en;q=0.9',
-        'Referer': 'https://www.dollargeneral.com/c/toys/pokemon'
-    }
-    
-    # Test potential API endpoints
-    api_tests = [
-        'https://www.dollargeneral.com/api/products/search?q=pokemon',
-        'https://www.dollargeneral.com/api/v1/products?category=toys&query=pokemon',
-        'https://www.dollargeneral.com/dg/search?q=pokemon&category=toys',
-        'https://www.dollargeneral.com/api/search?term=pokemon+trading+card',
-    ]
-    
-    print("=== Testing API Endpoints ===")
-    for url in api_tests:
-        try:
-            print(f"Testing: {url}")
-            response = requests.get(url, headers=headers, timeout=10)
-            print(f"  Status: {response.status_code}")
-            
-            if response.status_code == 200:
-                try:
-                    data = response.json()
-                    print(f"  JSON Response: {len(str(data))} characters")
-                    if 'products' in str(data).lower():
-                        print("  ✓ Contains 'products'")
-                    if 'pokemon' in str(data).lower():
-                        print("  ✓ Contains 'pokemon'")
-                except ValueError:
-                    print(f"  Text Response: {len(response.text)} characters")
-            print()
-        except Exception as e:
-            print(f"  Error: {e}")
-            print()
-
-def test_network_requests():
-    """Analyze the search page to find AJAX calls"""
-    
-    url = 'https://www.dollargeneral.com/c/toys/pokemon?q=&soldAtStore=true'
-    
-    headers = {
-        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
-    }
-    
-    print("=== Analyzing Search Page for API Calls ===")
-    
-    try:
-        response = requests.get(url, headers=headers, timeout=30)
-        soup = BeautifulSoup(response.text, 'html.parser')
-        
-        # Look for API endpoints in JavaScript
-        scripts = soup.find_all('script')
-        api_patterns = []
-        
-        for script in scripts:
-            if script.string:
-                content = script.string
-                
-                # Look for API endpoints
-                import re
-                patterns = [
-                    r'(?:api|Api|API)["\'\s]*[:=]["\'\s]*([^"\']+)',
-                    r'(?:endpoint|url|baseURL)["\'\s]*[:=]["\'\s]*([^"\']+)',
-                    r'fetch\s*\(\s*["\']([^"\']+)["\']',
-                    r'xhr\.open\s*\(\s*["\'][^"\']*["\'],\s*["\']([^"\']+)["\']',
-                    r'/api/[^"\'\s]+',
-                    r'/search[^"\'\s]*',
-                ]
-                
-                for pattern in patterns:
-                    matches = re.findall(pattern, content, re.IGNORECASE)
-                    for match in matches:
-                        if 'dollargeneral' in match or match.startswith('/'):
-                            api_patterns.append(match)
-        
-        # Remove duplicates and clean up
-        unique_apis = list(set(api_patterns))
-        
-        print(f"Found {len(unique_apis)} potential API endpoints:")
-        for api in unique_apis[:10]:  # Show first 10
-            print(f"  -> {api}")
-        
-        return unique_apis
-        
-    except Exception as e:
-        print(f"Error analyzing page: {e}")
-        return []
-
-def test_sitemap_approach():
-    """Try to find products via sitemap"""
-    
-    print("=== Testing Sitemap Approach ===")
-    
-    sitemap_urls = [
-        'https://www.dollargeneral.com/sitemap.xml',
-        'https://www.dollargeneral.com/robots.txt'
-    ]
-    
-    for url in sitemap_urls:
-        try:
-            print(f"Testing: {url}")
-            response = requests.get(url, timeout=10)
-            print(f"  Status: {response.status_code}")
-            
-            if response.status_code == 200:
-                content = response.text
-                if 'pokemon' in content.lower():
-                    print("  ✓ Contains Pokemon references")
-                if '/p/' in content:
-                    print("  ✓ Contains product URLs (/p/)")
-                print(f"  Content length: {len(content)} characters")
-            print()
-        except Exception as e:
-            print(f"  Error: {e}")
-            print()
-
-if __name__ == "__main__":
-    print("Pokemon Discovery - Dynamic Content Testing")
-    print("=" * 60)
-    print()
-    
-    # Test various approaches to find products
-    test_api_endpoints()
-    print()
-    
-    apis = test_network_requests()
-    print()
-    
-    test_sitemap_approach()
-    print()
-    
-    print("=" * 60)
-    print("Summary:")
-    print("- Individual product extraction: ✅ WORKING")
-    print("- Product URLs can be processed if found")
-    print("- Main challenge: Finding product URLs from search page")
-    print("- Dynamic content requires browser automation or API discovery")
--- a/test_real_products.py
+++ b/test_real_products.py
@@ -1,165 +0,0 @@
-#!/usr/bin/env python3
-"""
-Test Pokemon Discovery with real Dollar General Pokemon products
-Demonstrates full working pipeline with known products
-"""
-
-import json
-import sys
-import os
-from datetime import datetime
-
-# Add current directory to path
-sys.path.insert(0, '.')
-
-from scraper import PokemonTCGScraper
-from pdf_generator import PokemonTCGCatalogGenerator
-
-def test_known_products():
-    """Test with known Pokemon TCG products from Dollar General"""
-    
-    # Known Pokemon TCG products (you can add more as you find them)
-    known_products = [
-        'https://www.dollargeneral.com/p/pok-mon-trading-card-game-card-pack-ct/728192558375',
-        # Add more product URLs here as they're discovered
-    ]
-    
-    print("Pokemon Discovery - Real Product Test")
-    print("=" * 50)
-    print(f"Testing with {len(known_products)} known products")
-    print()
-    
-    scraper = PokemonTCGScraper()
-    products_found = []
-    
-    for i, url in enumerate(known_products, 1):
-        print(f"Testing product {i}/{len(known_products)}")
-        print(f"URL: {url}")
-        
-        # Get product page
-        html = scraper.get_page_content(url)
-        
-        if html:
-            # Extract product information
-            product = scraper.extract_product_info(url, html)
-            
-            # Check if it's a Pokemon TCG product
-            if scraper.is_pokemon_tcg_product(product):
-                products_found.append(product)
-                print(f"✓ FOUND: {product.get('title', 'Unknown')}")
-                print(f"  SKU: {product.get('sku', 'N/A')}")
-                print(f"  Price: {product.get('price', 'N/A')}")
-                
-                # Try to get additional data we might have missed
-                if not product.get('price'):
-                    print("  (Attempting to find price...)")
-                    from bs4 import BeautifulSoup
-                    soup = BeautifulSoup(html, 'html.parser')
-                    
-                    # More price selectors
-                    price_selectors = ['[data-testid="price"]', '.price-display', '.current-price', '[class*="price"]']
-                    for selector in price_selectors:
-                        price_elem = soup.select_one(selector)
-                        if price_elem and not product.get('price'):
-                            price_text = price_elem.get_text().strip()
-                            if '$' in price_text:
-                                product['price'] = price_text
-                                print(f"  Found price: {price_text}")
-                                break
-                
-                # Try to get stock info
-                if not product.get('stock'):
-                    print("  (Attempting to find stock status...)")
-                    from bs4 import BeautifulSoup
-                    soup = BeautifulSoup(html, 'html.parser')
-                    
-                    # Look for stock indicators
-                    if 'in stock' in html.lower():
-                        product['stock'] = 'In Stock'
-                    elif 'out of stock' in html.lower():
-                        product['stock'] = 'Out of Stock'
-                    elif 'available' in html.lower():
-                        product['stock'] = 'Available'
-                    else:
-                        product['stock'] = 'Unknown'
-                    
-                    print(f"  Stock: {product.get('stock')}")
-            else:
-                print("✗ Not a Pokemon TCG product")
-        else:
-            print("✗ Failed to get product page")
-        
-        print()
-    
-    if products_found:
-        print(f"SUCCESS! Found {len(products_found)} Pokemon TCG products")
-        print()
-        
-        # Save to JSON file
-        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
-        json_file = f'pokemon_tcg_products_real_{timestamp}.json'
-        
-        with open(json_file, 'w') as f:
-            json.dump(products_found, f, indent=2)
-        
-        print(f"✓ Saved product data: {json_file}")
-        
-        # Generate PDF catalog
-        print("✓ Generating PDF catalog...")
-        
-        try:
-            generator = PokemonTCGCatalogGenerator(json_file)
-            pdf_file = generator.generate_pdf()
-            
-            if pdf_file:
-                print(f"✓ PDF catalog generated: {pdf_file}")
-                
-                # Show file sizes
-                import os
-                if os.path.exists(pdf_file):
-                    size = os.path.getsize(pdf_file) / 1024
-                    print(f"  PDF size: {size:.1f} KB")
-                
-                # Count barcodes generated
-                barcode_dir = generator.barcodes_dir
-                if barcode_dir.exists():
-                    barcodes = list(barcode_dir.glob('*.png'))
-                    print(f"  Barcodes generated: {len(barcodes)}")
-                
-                print()
-                print("🎉 COMPLETE SUCCESS!")
-                print("Pokemon Discovery successfully:")
-                print(f"  • Scraped {len(products_found)} real products from Dollar General")
-                print("  • Generated professional PDF catalog")
-                print("  • Created scannable UPC-A barcodes")
-                print("  • Used Unix-friendly timestamped files")
-                
-                return True
-        
-        except Exception as e:
-            print(f"Error generating PDF: {e}")
-            print("But product scraping was successful!")
-            return True
-    
-    else:
-        print("No Pokemon TCG products found.")
-        print()
-        print("This could be due to:")
-        print("- Products no longer available")
-        print("- Changed product URLs")
-        print("- Need to find more current product URLs")
-        
-        return False
-
-if __name__ == "__main__":
-    success = test_known_products()
-    
-    print()
-    print("=" * 50)
-    if success:
-        print("✅ Pokemon Discovery is fully functional!")
-        print("   Ready for production use with product URLs")
-    else:
-        print("⚠️  Product URL discovery needed")
-        print("   Core functionality confirmed working")
-    print("=" * 50)
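The deleted scripts above use UPC 728192558375 (in the known product URL) and the 11-digit test value 12345678901, and the commit log below notes that the barcode library computes the check digit from the first 11 digits. As a minimal sketch of that arithmetic, assuming the standard UPC-A rule (three times the sum of the odd-position digits, plus the sum of the even-position digits, modulo 10), the check can be reproduced by hand. The function names here are hypothetical and are not taken from disco.py.

```python
def upc_a_check_digit(first11: str) -> int:
    """Return the UPC-A check digit for the 11 data digits (hypothetical helper)."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(c) for c in first11]
    odd_sum = sum(digits[0::2])    # positions 1, 3, ..., 11 (1-based)
    even_sum = sum(digits[1::2])   # positions 2, 4, ..., 10 (1-based)
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_valid_upc_a(upc: str) -> bool:
    """True if upc is 12 digits and its last digit matches the computed check digit."""
    if len(upc) != 12 or not (upc.isascii() and upc.isdigit()):
        return False
    return upc_a_check_digit(upc[:11]) == int(upc[-1])


if __name__ == "__main__":
    upc = "728192558375"
    # Slice the FIRST 11 digits; slicing the last 11 drops the leading digit.
    print(upc[:11], upc_a_check_digit(upc[:11]), is_valid_upc_a(upc))
```

Slicing with `upc[:11]` keeps the leading digit, so the computed check digit agrees with the twelfth digit printed on the packaging.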
Author	SHA1	Message	Date
pi-bot-01	0c7e139245	Clean up: remove obsolete files, update docs and docstrings. Removed 20 files: old test scripts, debug tools, duplicate docs, generated JSON, old PDF generator, launcher scripts. Kept: disco.py (main tool: scrape HAR + generate PDF); scraper.py (reference site scraper: HTML + Selenium/Brave); requirements.txt; *.har (browser capture with API data). Updated: README.md (rewritten to reflect current tool and usage); .gitignore (simplified); scraper.py (module/class/method docstrings updated to clarify this is a reference implementation, disco.py is primary).	2026-03-21 23:28:52 -07:00
pi-bot-01	90661e1957	Move all text above image: title, stock/price, SKU/UPC then picture then barcode	2026-03-21 23:19:07 -07:00
pi-bot-01	4b91ac5812	Fix UPC barcode: use first 11 digits, not last 11. digits[-11:] was dropping the first digit of 12-digit UPCs. digits[:11] correctly passes the first 11 digits to the barcode library, which calculates the matching check digit. 728192558375 now encodes correctly (was 2819255837X before).	2026-03-21 23:16:42 -07:00
pi-bot-01	dddfbe7355	Title above image, manifest table on first page. Page 1 (Manifest): header with title, source, date, count; table listing all products: #, name, SKU, price, stock qty. Product pages: title (bold, top); product image (bordered, centered); stock + price; UPC-A barcode (bordered, centered); SKU / UPC text.	2026-03-21 23:14:12 -07:00
pi-bot-01	ecc026d07b	Use UPC (not SKU) for barcode generation. UPC-A barcodes should encode the Universal Product Code, not the internal store SKU. The UPCs are already 12-digit numbers that match the barcodes on the physical product packaging.	2026-03-21 23:11:38 -07:00
pi-bot-01	f71df3f558	Fix SKU conversion: rootSV base + '01', not base + variant. rootSV '0419363_1' was producing '4193631' (wrong); it now correctly produces '41936301' (confirmed by user). The '_N' suffix is a variant/image index, not part of the SKU. Pattern: strip leading zero from base, append '01'.	2026-03-21 23:06:05 -07:00
pi-bot-01	c0ec0f947b	Match product.png layout: image, name, stock, barcode, SKU/UPC. Switched from pandoc markdown to direct LaTeX for precise layout control. Each product gets its own page matching the mockup: large bordered product image (centered); product name (bold, left); stock + price line; bordered UPC-A barcode (centered); SKU and UPC text (small, left). Fixed WebP→PNG image conversion (DG CDN serves WebP as .jpg). Compile directly with pdflatex (pandoc strips images from raw .tex). Output: 5.6MB PDF, 7 pages, 6 products with real images and barcodes.	2026-03-21 22:59:29 -07:00
pi-bot-01	e9efcf1460	Add disco.py: single working script that finds all pack/tin products and generates PDF. Extracts all 12 Pokemon products from HAR API responses, filters to 6 card pack and tin products, downloads product images, generates UPC-A barcodes, and produces a 157KB PDF catalog. Products found: 1. Pokémon Trading Card Game, 15 Card Pack (In Stock); 2. Pokémon TCG Booster Pack with Promo Card & Coin; 3. Pokemon Trading Card Game Sword & Shield Booster Pack; 4. Pokémon Collectible Stacking Tin; 5. Pokémon Trading Card Game Mini Tin; 6. Pokémon Trading Card Game, Gardevoir Strong Bond Tin.	2026-03-21 16:12:14 -07:00
pi-bot-01	12448a09a0	🔍 Debug: Why only one product found - Dynamic loading analysis ✅ MYSTERY SOLVED: Pokemon page loads but products are dynamic! 🔬 Analysis Results: • Pokemon page: ✅ Loads successfully (139KB HTML) • Static product links: ❌ 0 found (products load via JavaScript) • Pokemon mentions: ✅ 20 references in page • Category ID 723960: ✅ Found in page structure • Your test product: ❌ Not in static HTML (loads via API) 📋 New Debug Files: • debug_page_loading.py - Technical analysis of page loading • WHY_ONLY_ONE_PRODUCT.md - Complete explanation with solutions • pokemon_page_sample.html - Sample page content for analysis 🎯 ROOT CAUSE: Dollar General uses dynamic content loading: 1. Page loads basic HTML structure 2. JavaScript makes API calls to get products 3. API returns 4-12 Pokemon products as JSON 4. Products rendered into DOM after page load 5. Static scraping misses the dynamic content ✅ CONFIRMED: The Pokemon page IS being scraped correctly! ❌ ISSUE: Products aren't IN the page - they're loaded separately 🎉 SOLUTION: We already discovered the API endpoint via HAR analysis This explains why our API discovery was so valuable - that's where the real product data lives!	2026-03-21 15:39:48 -07:00
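Commit f71df3f558 in the log above states the SKU rule in words: take the rootSV base, strip its leading zero, and append '01', ignoring the '_N' variant suffix. A minimal sketch of that rule follows, assuming rootSV is always a base, an underscore, then a variant index. The function names are hypothetical and the actual code in disco.py may differ.

```python
def root_sv_to_sku(root_sv: str) -> str:
    """Convert a rootSV value such as '0419363_1' to a store SKU (hypothetical helper)."""
    base = root_sv.split("_", 1)[0]   # drop the '_N' variant/image index
    if base.startswith("0"):
        base = base[1:]               # strip the single leading zero
    return base + "01"                # per the commit message, the SKU ends in '01'


def root_sv_to_sku_buggy(root_sv: str) -> str:
    """The earlier behaviour described in the commit: base + variant (wrong)."""
    base, _, variant = root_sv.partition("_")
    return base.lstrip("0") + variant


if __name__ == "__main__":
    print(root_sv_to_sku("0419363_1"))        # rule from the commit message
    print(root_sv_to_sku_buggy("0419363_1"))  # the value the commit calls wrong
```

Keeping the buggy variant next to the fixed one makes the difference concrete: the variant index is discarded rather than appended.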