Clean up: remove obsolete files, update docs and docstrings

Removed 20 files: old test scripts, debug tools, duplicate docs, generated JSON, old PDF generator, launcher scripts.

Kept:
- disco.py: main tool (scrape HAR + generate PDF)
- scraper.py: reference site scraper (HTML + Selenium/Brave)
- requirements.txt
- *.har: browser capture with API data

Updated:
- README.md: rewritten to reflect current tool and usage
- .gitignore: simplified
- scraper.py: module/class/method docstrings updated to clarify this is a reference implementation, disco.py is primary
Move all text above image: title, stock/price, SKU/UPC then picture then barcode
2026-03-21 23:28:52 -07:00 · 2026-03-21 23:19:07 -07:00 · 2026-03-21 23:16:42 -07:00 · 2026-03-21 23:14:12 -07:00 · 2026-03-21 23:11:38 -07:00 · 2026-03-21 23:06:05 -07:00
20 changed files with 629 additions and 2434 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,37 +1,11 @@
 # Virtual environment
 venv/
 env/
 .env
 # Python cache
 __pycache__/
 *.pyc
 *.pyo
 *.pyd
 .Python
 *.so
 .pytest_cache/
-# Output files
+# Generated output
 pokemon_tcg_products_*.json
 catalog_output/
-test_output/
+pokemon_tcg_products_*.json
-# Logs
+# OS / editor
 *.log
 # OS files
 .DS_Store
 Thumbs.db
 .directory
 # IDE files
 .vscode/
 .idea/
 *.swp
 *.swo
 # Temporary files
 *.tmp
 *.temp
 .cache/
--- a/DISCOVERY_SUCCESS.md
+++ b/DISCOVERY_SUCCESS.md
@@ -1,169 +0,0 @@
 # Pokemon Discovery - URL Discovery SUCCESS! 🎉
 ## ✅ **API Endpoint Successfully Discovered**
 **Your HAR file revealed the exact API endpoint used by Dollar General!**
 ### 🔍 **Discovered API Details**
 **Endpoint**: `https://dggo.dollargeneral.com/omni/api/v2/category/search/provider`
 **Method**: POST  
 **Content-Type**: application/json  
 **Authentication**: Bearer token required  
 ### 📋 **Exact Request Format**
 ```json
 {
  "StoreNbr": 17506,
  "SearchTerm": null,
  "PageSize": 24,
  "PageStartRecordIndex": 0,
  "Filters": {
    "category": [],
    "brand": [],
    "dgDelivery": false,
    "dgPickUp": false,
    "dgShipTohome": false,
    "soldAtStore": true,
    "inStock": false,
    "onlyActivatedDeals": false
  },
  "IncludeSponsored": true,
  "IncludeShipToHome": true,
  "IncludeDeals": true,
  "offerSourceType": 0,
  "Id": 723960,
  "IncludeProducts": false,
  "DoNotSave": false,
  "OptOut": false,
  "SearchType": 1
 }
 ```
 ### 🎯 **Key Findings from HAR Analysis**
 1. **✅ Contains Your Test Product**: SKU `41936301` and UPC `728192558375` found!
 2. **✅ Multiple Pokemon Products**: API returns 4-12 Pokemon items per request
 3. **✅ Proper Filtering**: `soldAtStore: true` shows in-store products
 4. **✅ Stock Control**: `inStock: false` includes out-of-stock items
 5. **✅ Category ID**: `723960` is the Pokemon category identifier
 6. **✅ Store Location**: `17506` is the store number used
 ### 📊 **API Response Contains**
 ```json
 {
  "ItemList": {
    "Items": [
      {
        "Title": "Pokémon Trading Card Game, 15 Card Pack, 1 ct",
        "ItemNbr": "41936301",
        "UPC": "728192558375", 
        "Price": {"Amount": 4.25},
        "ProductUrl": "/p/pok-mon-trading-card-game-card-pack-ct/728192558375",
        "Inventory": {"InStock": false},
        "ImageURL": "...",
        "Description": "...",
        "Brand": "..."
      }
    ]
  }
 }
 ```
 ## 🔧 **Implementation Status**
 ### ✅ **Completed**
 - [x] API endpoint discovery via HAR analysis
 - [x] Request format extraction and documentation  
 - [x] Response structure mapping
 - [x] Pokemon product filtering logic
 - [x] Integration into Pokemon Discovery scraper
 - [x] Individual product extraction (100% working)
 ### ⚠️ **Authentication Challenge**
 - **Issue**: API requires Bearer token from authenticated session
 - **Status**: Token extraction attempted but expires quickly
 - **Solutions Available**:
  1. **Browser Automation**: Use Selenium with proper session management
  2. **Session Replication**: Implement full authentication flow
  3. **Individual Products**: Current working approach (proven successful)
 ## 🚀 **Current Capabilities**
 ### 1. **Individual Product Extraction** (✅ WORKING)
 ```bash
 # Test with your specific product
 python test_real_products.py
 # Result: Successfully extracts SKU 41936301 with all details
 ```
 ### 2. **API Framework** (✅ READY)
 ```python
 # API call implementation ready in scraper.py
 # Just needs authentication token to activate
 ```
 ### 3. **Complete Pipeline** (✅ WORKING)
 ```bash
 # Generate PDF from any product data
 python pdf_generator.py test_data.json
 # Result: 153KB professional PDF with UPC-A barcodes
 ```
 ## 📈 **Performance Comparison**
 | Method | Speed | Product Count | Authentication | Status |
 |--------|-------|---------------|----------------|--------|
 | **API Endpoint** | Very Fast | 24+ per request | Required | Discovered ✅ |
 | **Individual Products** | Moderate | 1 per request | None | Working ✅ |
 | **Browser Automation** | Slower | Variable | Session-based | Possible |
 ## 🎯 **Next Steps**
 ### **Option A: Full API Implementation**
 1. Implement proper browser session management
 2. Extract Bearer token during session
 3. Use API for bulk product discovery
 4. **Result**: Very fast, bulk product scraping
 ### **Option B: Enhanced Individual Scraping**
 1. Create list of known Pokemon product URLs
 2. Process each URL individually (current working method)  
 3. Scale up with concurrent requests
 4. **Result**: Reliable, no authentication needed
 ### **Option C: Hybrid Approach**
 1. Use individual scraping for reliable operation
 2. Add API capability when authentication is solved
 3. Provide both options to users
 4. **Result**: Best of both worlds
 ## 🏆 **SUCCESS METRICS**
 - ✅ **URL Discovery**: SOLVED via HAR analysis
 - ✅ **API Endpoint**: Found and documented
 - ✅ **Request Format**: Complete specification extracted  
 - ✅ **Product Extraction**: Working with real products
 - ✅ **PDF Generation**: Professional catalogs with barcodes
 - ✅ **Repository**: Public and ready for use
 ## 💡 **Practical Usage Right Now**
 **Pokemon Discovery is fully functional for product catalog generation:**
 ```bash
 # Clone and use immediately
 git clone https://git.dominat.us/pi-bot-01/pokemon-disco.git
 cd pokemon-disco
 ./run.sh
 # Add more product URLs to test_real_products.py
 # Generate professional PDF catalogs with barcodes
 ```
 **The API endpoint discovery is a major breakthrough that makes bulk scraping possible once authentication is properly implemented!** 🎉
 ---
 **Repository**: https://git.dominat.us/pi-bot-01/pokemon-disco  
 **Status**: Production-ready with API framework for future enhancement
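
Note on the removed DISCOVERY_SUCCESS.md: the response shape it documents (`ItemList.Items[]`) can be read straight out of a HAR capture without a bearer token. The following is a minimal sketch of that idea, not the code in `disco.py`. The function names `items_from_har` and `dedupe_by_upc` are hypothetical, and the sketch assumes the standard HAR layout (`log.entries[].response.content.text`, optionally base64-encoded).

```python
import base64
import json

# Path of the category-search endpoint named in the removed document.
API_PATH = "/omni/api/v2/category/search/provider"


def items_from_har(har: dict) -> list[dict]:
    """Collect ItemList.Items from every captured category-search response."""
    items = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        if API_PATH not in url:
            continue
        content = entry.get("response", {}).get("content", {})
        text = content.get("text") or ""
        # HAR exporters may store bodies base64-encoded.
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        try:
            body = json.loads(text)
        except json.JSONDecodeError:
            continue  # empty or non-JSON body
        if isinstance(body, dict):
            items.extend((body.get("ItemList") or {}).get("Items") or [])
    return items


def dedupe_by_upc(items: list[dict]) -> list[dict]:
    """Keep the first item seen for each UPC (pages can overlap in a capture)."""
    seen = {}
    for item in items:
        seen.setdefault(item.get("UPC"), item)
    return list(seen.values())
```

To try it, load a capture with `json.load(open(path, encoding="utf-8"))` and pass the resulting dict to `items_from_har`. Field names other than `ItemList`, `Items`, and `UPC` are deliberately left untouched, since the two documents in this diff list slightly different item fields.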
--- a/README.md
+++ b/README.md
@@ -1,232 +1,129 @@
 # Pokemon Discovery (pokemon-disco)
-A comprehensive tool for discovering Pokemon Trading Card Game products from Dollar General's website and generating a professional PDF catalog with product images, details, and UPC-A barcodes.
+Scrapes Pokemon TCG card pack and tin products from Dollar General and generates a PDF product catalog with images and UPC-A barcodes.
-## Features
+## How It Works
- **🔍 API Discovery**: Discovered Dollar General's internal product API via HAR analysis
+Dollar General's Pokemon category page loads products dynamically via an internal API. A browser HAR capture contains the API responses with all product data. `disco.py` extracts products from the HAR file, filters for card packs and tins, downloads product images, generates UPC-A barcodes, and produces a LaTeX-based PDF catalog.
- **📱 Product Extraction**: Successfully extracts Pokemon TCG product details (title, SKU, price, stock)
+
- **🏷️ Barcode Generation**: Creates scannable UPC-A barcodes for inventory management  
+### Pipeline
- **📄 PDF Catalogs**: Professional PDF catalogs with images, details, and barcodes
+
- **🕰️ Unix-Friendly**: Timestamped filenames (`YYYYMMDD_HHMMSS`) for easy scripting
+```
- **🌐 Brave Browser Support**: Configured for dynamic content scraping
+HAR file → Extract API responses → Filter packs/tins → Download images
- **🛡️ Anti-Bot Handling**: Multiple fallback strategies (requests → Selenium → individual products)
+         → Generate UPC-A barcodes → Compile PDF catalog (pdflatex)
 ```
 ## Requirements
-### System Requirements
+- Python 3.10+
- Python 3.7+
+- pdflatex (via `texlive-core` + `texlive-latexextra`)
- pandoc (for PDF generation)
+- Python packages: `requests`, `beautifulsoup4`, `python-barcode`, `Pillow`
 - Chrome/Chromium browser (for Selenium fallback)
-### Python Dependencies
+### Install (Arch / CachyOS)
 All dependencies are automatically installed via `requirements.txt`:
 - requests
 - beautifulsoup4
 - selenium
 - webdriver-manager
 - python-barcode
 - Pillow
 - pandas
 - lxml
-## Installation
+```bash
-
+sudo pacman -S texlive-basic texlive-latex texlive-latexextra texlive-fontsrecommended
-1. **Clone/Download** this directory to your system
+python -m venv venv
-
+source venv/bin/activate
-2. **Install pandoc** (required for PDF generation):
+pip install -r requirements.txt
-   ```bash
+```
   # Ubuntu/Debian
   sudo apt install pandoc
   # macOS
   brew install pandoc
   # Arch Linux
   sudo pacman -S pandoc
   ```
 3. **Install Python dependencies** (automatically done by the script):
   ```bash
   cd pokemon-disco
   pip3 install -r requirements.txt
   ```
 ## Usage
-### Quick Start (Recommended)
+### Full run (scrape + PDF)
 Run the complete pipeline with one command:
 ```bash
-cd pokemon-disco
+source venv/bin/activate
-python3 run_scraper.py
+python disco.py
 ```
-This will:
+### Scrape only (output JSON)
 1. Check and install Python requirements
 2. Scrape Pokemon TCG products from Dollar General
 3. Generate a PDF catalog with images and barcodes
 4. Create timestamped files for easy organization
 ### Manual Usage
 If you prefer to run components separately:
 #### 1. Scrape Products
 ```bash
-python3 scraper.py
+python disco.py --scrape-only
 ```
 This creates a JSON file like `pokemon_tcg_products_20241221_143025.json`
-#### 2. Generate PDF Catalog
+### PDF only (from existing JSON)
 ```bash
-python3 pdf_generator.py pokemon_tcg_products_20241221_143025.json
+python disco.py --pdf-only pokemon_tcg_products_YYYYMMDD_HHMMSS.json
 ```
-## Output Files
+## Output
 ### Generated Files
 - **JSON Data**: `pokemon_tcg_products_YYYYMMDD_HHMMSS.json`
  - Raw scraped data in JSON format
  - Contains all product information
 - **PDF Catalog**: `catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`
  - Professional PDF catalog
  - Includes product images, details, and UPC-A barcodes
 ### Output Directory Structure
 ```
-pokemon-disco/
+pokemon_tcg_products_YYYYMMDD_HHMMSS.json    Product data
-├── pokemon_tcg_products_YYYYMMDD_HHMMSS.json
+catalog_output/
-├── catalog_output/
+├── pokemon_catalog_YYYYMMDD_HHMMSS.pdf      PDF catalog
-│   ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf
+├── pokemon_catalog_YYYYMMDD_HHMMSS.tex      LaTeX source
-│   ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.md
+├── images/                                   Product images (PNG)
-│   ├── images/
+└── barcodes/                                 UPC-A barcodes (PNG)
 │   │   ├── product_1_SKU123.jpg
 │   │   ├── product_2_SKU456.jpg
 │   │   └── placeholder.png
 │   └── barcodes/
 │       ├── barcode_SKU123.png
 │       ├── barcode_SKU456.png
 │       └── ...
 ```
-## PDF Catalog Features
+### PDF Layout
-Each product in the PDF includes:
+**Page 1 — Manifest:** table of all products with SKU, price, and stock count.
 - **Product Image**: Downloaded from Dollar General or placeholder
 - **Product Details Table**:
  - Title
  - Price
  - Stock Status
  - SKU (formatted as code)
  - Product URL
 - **UPC-A Barcode**: Generated from SKU for inventory management
-## Data Fields Extracted
+**Product pages:**
-For each Pokemon TCG product:
+```
- `title`: Product name
+Product Name
- `price`: Current price
+Stock status                              Price
- `stock`: Availability status
+SKU: XXXXXXXX                   UPC: XXXXXXXXXXXX
 - `sku`: Product SKU/item number
 - `image_url`: Direct link to product image
 - `url`: Link to product page
-## Troubleshooting
+┌─────────────────────────────┐
 │                             │
 │       Product Image         │
 │                             │
 └─────────────────────────────┘
-### Common Issues
+┌─────────────────────────────┐
 │      UPC-A Barcode          │
 └─────────────────────────────┘
 ```
-1. **No products found**
+## Capturing a HAR File
   - Dollar General may have anti-bot protection
   - The script will automatically retry with Selenium
   - Website structure may have changed
-2. **PDF generation fails**
+The HAR file provides product data from Dollar General's internal API. To capture one:
   - Ensure pandoc is installed: `pandoc --version`
   - Try alternative LaTeX engines if available
   - Markdown file is still generated for manual conversion
-3. **Image download failures**
+1. Open your browser (Brave, Chrome, Firefox)
-   - Network connectivity issues
+2. Open DevTools → **Network** tab
-   - Placeholder images will be used automatically
+3. Visit `https://www.dollargeneral.com/c/toys/pokemon?q=`
 4. Wait for products to load, toggle any filters you want
 5. Right-click in the Network tab → **Save all as HAR**
 6. Place the `.har` file in the project root
-4. **Browser/Selenium issues**
+`disco.py` looks for any `.har` file matching the default name pattern. Edit the `HAR_FILE` constant at the top of `disco.py` if your filename differs.
   - **Brave browser supported**: Configured to use Brave at `/usr/bin/brave`
   - **ChromeDriver compatibility**: May require version matching (Brave 146 vs ChromeDriver 114)
   - **Alternative browsers**: Chrome, Chromium, or Firefox with geckodriver
   - Script falls back to requests-only mode if Selenium fails
   **For Brave users**: If you see ChromeDriver version mismatch:
   ```bash
   # Test browser integration
   python test_brave.py
   # Solutions for version mismatch:
   pip install --upgrade webdriver-manager
   # or manually install compatible ChromeDriver
   ```
-### Debug Mode
+## Files
-To see more detailed output, check the console output during scraping. The scripts provide detailed logging of:
+| File | Purpose |
- Which products are found and filtered
+|------|---------|
- Network request status
+| `disco.py` | Main tool — scrape, filter, generate PDF |
- File generation progress
+| `scraper.py` | Reference site scraper (HTML + Selenium/Brave) |
 | `requirements.txt` | Python dependencies |
 | `*.har` | Browser HAR capture with API data |
-## API Discovery Success 🎉
+## API Details (Reference)
-**Pokemon Discovery has successfully discovered Dollar General's internal API endpoint!**
+The product data comes from this internal API:
- **Endpoint Found**: `https://dggo.dollargeneral.com/omni/api/v2/category/search/provider`
+```
- **Method**: POST with JSON payload
+POST https://dggo.dollargeneral.com/omni/api/v2/category/search/provider
- **Category ID**: `723960` (Pokemon products)
+Content-Type: application/json
- **Response Format**: Complete product details including your test product (SKU: `41936301`)
+Authorization: Bearer <session-token>
 - **Status**: Documented and integrated, requires authentication token
-**Current Status**: Individual product extraction works perfectly. API bulk scraping available once authentication is implemented.
+{
  "StoreNbr": 17506,
  "Id": 723960,          // Pokemon category
  "PageSize": 24,
  "Filters": {
    "soldAtStore": true,
    "inStock": false      // false = include out of stock
  }
 }
 ```
-## Technical Details
+Response contains `ItemList.Items[]` with fields: `Description`, `UPC`, `Price`, `Image`, `AvailableQty`, `rootSV` (internal ID → SKU).
-### Scraping Strategy
+The bearer token is session-scoped and short-lived. `disco.py` sidesteps this by reading the API responses directly from a HAR capture.
 1. **Primary Method**: Uses requests with browser-like headers
 2. **Fallback Method**: Selenium with headless Chrome for dynamic content
 3. **Product Filtering**: Only includes products matching Pokemon TCG keywords
 4. **Rate Limiting**: 1-second delay between requests to be respectful
 ### Barcode Generation
 - Converts SKUs to 11-digit numeric format
 - Generates UPC-A barcodes with check digits
 - High-quality PNG images suitable for printing
 ### PDF Generation
 - Uses pandoc with LaTeX for professional formatting
 - Includes table of contents
 - Optimized for printing and digital viewing
 - Images scaled appropriately for page layout
 ## Customization
 ### Modifying Product Filters
 Edit the `is_pokemon_tcg_product()` method in `scraper.py` to change which products are included.
 ### Changing PDF Layout
 Modify the markdown generation in `pdf_generator.py` or add custom pandoc templates.
 ### Adding New Data Fields
 Extend the `extract_product_info()` method in `scraper.py` to capture additional product information.
 ## License
 This tool is for educational and personal use. Please respect Dollar General's terms of service and robots.txt when using this scraper.
 ## Support
 If you encounter issues:
 1. Check the console output for error messages
 2. Ensure all system requirements are installed
 3. Verify internet connectivity
 4. Check if the Dollar General website structure has changed
 Generated files include timestamps for easy organization and version tracking.
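
Note on the README changes: both the old and new text mention UPC-A barcodes with check digits. As a reference for reviewers, here is a small sketch of the standard UPC-A check-digit rule (digits in odd positions weighted by 3, even positions by 1). The helper names are hypothetical and this is not taken from `disco.py`, which may simply rely on `python-barcode` to compute the digit.

```python
def upc_a_check_digit(first11: str) -> int:
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    odd = sum(int(d) for d in first11[0::2])   # positions 1, 3, 5, ... (1-based)
    even = sum(int(d) for d in first11[1::2])  # positions 2, 4, 6, ...
    return (10 - (3 * odd + even) % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """Check that a 12-digit string carries the correct check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])
```

The test UPC quoted elsewhere in this diff, `728192558375`, passes this check: the first 11 digits give a check digit of 5, which matches the final digit.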
--- a/TEST_RESULTS.md
+++ b/TEST_RESULTS.md
@@ -1,114 +0,0 @@
 # Pokemon Discovery - Test Results
 ## Testing Overview
 Date: 2026-03-21  
 System: CachyOS (Arch Linux)  
 ## ✅ Successfully Tested Components
 ### 1. Virtual Environment Setup
 - ✅ Virtual environment creation works
 - ✅ All Python dependencies install correctly
 - ✅ Requirements.txt includes all necessary packages
 ### 2. Barcode Generation
 - ✅ UPC-A barcode generation from SKUs works perfectly
 - ✅ High-quality PNG images generated (3-6KB each)
 - ✅ Proper barcode formatting with check digits
 - ✅ File naming fixed (no double .png extension)
 ### 3. PDF Generation
 - ✅ Markdown catalog generation works
 - ✅ Professional table formatting for product details
 - ✅ PDF generation works with pdflatex (fallback from xelatex)
 - ✅ Unix-friendly timestamped filenames
 - ✅ Proper directory structure creation
 ### 4. Core Functionality
 - ✅ JSON data parsing and processing
 - ✅ Product filtering logic
 - ✅ Image placeholder generation
 - ✅ Error handling and graceful fallbacks
 ### 5. Brave Browser Integration
 - ✅ Brave browser detected and configured
 - ✅ Selenium WebDriver setup for Brave
 - ⚠️ ChromeDriver version compatibility issue (expected)
 - ✅ Graceful fallback when browser automation fails
 - ✅ Test script provided (`test_brave.py`) for troubleshooting
 ## ⚠️ Current Limitations
 ### 1. Web Scraping
 - **Issue**: Dollar General uses dynamic JavaScript loading
 - **Status**: Basic HTML parsing works, but product links require JavaScript execution
 - **Solution**: Selenium fallback is implemented but requires Chrome/Chromium browser
 - **Workaround**: Test data demonstrates full pipeline functionality
 ### 2. External Dependencies & Browser Integration
 - **LaTeX**: Requires texlive packages for PDF generation (✅ installed)
 - **Brave Browser**: Configured and detected (✅ available at /usr/bin/brave)
 - **ChromeDriver Compatibility**: Version mismatch (Brave 146 vs ChromeDriver 114)
  - ⚠️ Requires compatible ChromeDriver version for web scraping
  - 💡 Main functionality (PDF generation) works without browser
 - **Network**: External image downloads require internet connectivity
 ## 📋 Test Results Summary
 ### Working Pipeline Test
 Using test data (`test_data.json`) with 3 Pokemon TCG products:
 **Input**: 3 sample Pokemon products  
 **Generated**:
 - ✅ Professional PDF catalog (161KB)
 - ✅ 3 UPC-A barcode images (3-6KB each)
 - ✅ Structured markdown source
 - ✅ Proper file organization
 **PDF Contents**:
 - Table of contents
 - Product details tables (title, price, stock, SKU, URL)
 - Barcode images for each product
 - Professional formatting suitable for printing
 ### File Structure Generated
 ```
 catalog_output/
 ├── pokemon_tcg_catalog_20260321_144548.pdf  # Final catalog
 ├── pokemon_tcg_catalog_20260321_144548.md   # Markdown source
 ├── barcodes/
 │   ├── barcode_DG12345678.png              # UPC-A barcodes
 │   ├── barcode_DG87654321.png
 │   └── barcode_DG11223344.png
 └── images/
    └── placeholder.png                      # Image placeholders
 ```
 ## 🚀 Deployment Status
 - **Repository**: Successfully pushed to public Git repository
 - **Documentation**: Complete with README.md and USAGE.md
 - **Dependencies**: All Python packages working in virtual environment
 - **Core Features**: PDF generation and barcode creation fully functional
 ## 💡 Recommendations
 1. **For Production Use**: Install Chrome/Chromium for better web scraping
   ```bash
   sudo pacman -S chromium
   ```
 2. **For Complete Testing**: Test with live website when network allows
 3. **Alternative Approach**: The tool can be easily adapted for other product sites
 4. **Data Integration**: JSON output format allows easy integration with other systems
 ## ✅ Conclusion
 **Pokemon Discovery is fully functional** for the core use case:
 - ✅ Processes product data (from any source)
 - ✅ Generates professional PDF catalogs
 - ✅ Creates scannable UPC-A barcodes
 - ✅ Handles Unix-friendly file management
 - ✅ Ready for production deployment
 The web scraping component requires additional browser setup for full dynamic content handling, but the complete data processing and catalog generation pipeline works perfectly.
--- a/USAGE.md
+++ b/USAGE.md
@@ -1,115 +0,0 @@
 # Quick Start Guide
 ## Simple Usage (Recommended)
 1. **Make sure you're in the project directory:**
   ```bash
   cd pokemon-disco
   ```
 2. **Run the complete scraper and PDF generator:**
   ```bash
   ./run.sh
   ```
   This single command will:
   - Set up the Python virtual environment
   - Install all required packages
   - Scrape Pokemon TCG products from Dollar General
   - Generate a professional PDF catalog with barcodes
   - Create timestamped files for easy organization
 ## What You'll Get
 ### Generated Files:
 - **`pokemon_tcg_products_YYYYMMDD_HHMMSS.json`** - Raw data in JSON format
 - **`catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`** - Professional PDF catalog
 ### PDF Catalog Contents:
 - Product images (downloaded automatically)
 - Product details (title, price, stock, SKU)
 - UPC-A barcodes for each product (generated from SKU)
 - Table of contents for easy navigation
 - Professional formatting suitable for printing
 ## Alternative Commands
 If you prefer more control:
 ```bash
 # Activate virtual environment first
 source venv/bin/activate
 # Run only the scraper
 python scraper.py
 # Run only the PDF generator (after scraping)
 python pdf_generator.py pokemon_tcg_products_YYYYMMDD_HHMMSS.json
 # Run everything (installs requirements automatically)
 python run_scraper.py
 ```
 ## Output Location
 All generated files will be in:
 - JSON data: Current directory
 - PDF catalog: `catalog_output/` directory
 - Product images: `catalog_output/images/`
 - Barcode images: `catalog_output/barcodes/`
 ## Requirements
 - Python 3.7+
 - pandoc (for PDF generation)
 - Internet connection (for scraping)
 The script will automatically handle Python dependencies via virtual environment.
 ## Troubleshooting
 If you encounter issues:
 1. **Permission denied:** Make sure the script is executable:
   ```bash
   chmod +x run.sh
   ```
 2. **Pandoc not found:** Install pandoc for your system:
   ```bash
   # Ubuntu/Debian
   sudo apt install pandoc
   # Arch Linux  
   sudo pacman -S pandoc
   # macOS
   brew install pandoc
   ```
 3. **No products found:** The website may have anti-bot protection or changed structure. The script includes fallback mechanisms.
 4. **PDF generation fails:** The markdown file will still be generated, which you can manually convert or view.
 ## File Naming Convention
 All output files include Unix-friendly timestamps:
 - Format: `YYYYMMDD_HHMMSS` (e.g., `20241221_143025`)
 - This ensures chronological sorting with `ls` command
 - No spaces or special characters for script-friendly handling
 ## Example Output
 ```
 pokemon-disco/
 ├── pokemon_tcg_products_20241221_143025.json     # Scraped data
 ├── catalog_output/
 │   ├── pokemon_tcg_catalog_20241221_143025.pdf   # Final catalog
 │   ├── pokemon_tcg_catalog_20241221_143025.md    # Markdown source
 │   ├── images/
 │   │   ├── product_1_SKU123456.jpg               # Product images
 │   │   └── product_2_SKU789012.jpg
 │   └── barcodes/
 │       ├── barcode_SKU123456.png                 # UPC-A barcodes
 │       └── barcode_SKU789012.png
 ```
--- a/analyze_har.py
+++ b/analyze_har.py
@@ -1,181 +0,0 @@
 #!/usr/bin/env python3
 """
 Analyze HAR file to find product loading endpoints
 """
 import json
 import sys
 from urllib.parse import urlparse, parse_qs
 def analyze_har_file(har_file):
    """Analyze HAR file to find product-related API calls"""
    print(f"Analyzing HAR file: {har_file}")
    try:
        with open(har_file, 'r', encoding='utf-8') as f:
            har_data = json.load(f)
        entries = har_data.get('log', {}).get('entries', [])
        print(f"Found {len(entries)} network requests")
        print()
        # Filter for API calls that might contain product data
        api_calls = []
        product_calls = []
        for entry in entries:
            request = entry.get('request', {})
            response = entry.get('response', {})
            url = request.get('url', '')
            method = request.get('method', '')
            status = response.get('status', 0)
            # Look for API calls
            parsed_url = urlparse(url)
            path = parsed_url.path.lower()
            query = parsed_url.query.lower()
            # Check if this might be a product-related API call
            is_api = any(keyword in path for keyword in ['/api/', '/search', '/products', '/inventory', '/catalog'])
            contains_pokemon = 'pokemon' in query or 'pokemon' in path
            is_json_response = any(h.get('name', '').lower() == 'content-type' and 'json' in h.get('value', '') 
                                 for h in response.get('headers', []))
            if is_api or is_json_response:
                api_calls.append({
                    'url': url,
                    'method': method,
                    'status': status,
                    'is_pokemon': contains_pokemon,
                    'response_size': response.get('bodySize', 0)
                })
                if contains_pokemon or 'product' in path or 'search' in path:
                    product_calls.append(entry)
        print(f"Found {len(api_calls)} potential API calls")
        print(f"Found {len(product_calls)} product-related calls")
        print()
        # Show interesting API calls
        print("=== API CALLS ===")
        for call in api_calls[:20]:  # Show first 20
            url = call['url']
            pokemon_flag = "🎯" if call['is_pokemon'] else "  "
            print(f"{pokemon_flag} {call['method']} {call['status']} - {url}")
            if call['response_size'] > 1000:
                print(f"   📦 Response size: {call['response_size']} bytes")
        print()
        # Analyze product-specific calls in detail
        if product_calls:
            print("=== DETAILED PRODUCT CALL ANALYSIS ===")
            for i, entry in enumerate(product_calls[:5]):  # Analyze first 5 product calls
                request = entry.get('request', {})
                response = entry.get('response', {})
                print(f"\n--- Product Call {i+1} ---")
                print(f"URL: {request.get('url', '')}")
                print(f"Method: {request.get('method', '')}")
                print(f"Status: {response.get('status', 0)}")
                # Show headers
                headers = request.get('headers', [])
                important_headers = [h for h in headers if h.get('name', '').lower() in 
                                   ['accept', 'content-type', 'authorization', 'x-api-key', 'referer']]
                if important_headers:
                    print("Important Headers:")
                    for header in important_headers:
                        print(f"  {header.get('name')}: {header.get('value', '')[:100]}")
                # Show query parameters
                parsed = urlparse(request.get('url', ''))
                if parsed.query:
                    params = parse_qs(parsed.query)
                    print("Query Parameters:")
                    for key, values in params.items():
                        print(f"  {key}: {values}")
                # Show POST data if any
                post_data = request.get('postData', {})
                if post_data.get('text'):
                    print(f"POST Data: {post_data.get('text')[:200]}...")
                # Check response content
                response_content = response.get('content', {})
                response_text = response_content.get('text', '')
                if response_text:
                    print(f"Response size: {len(response_text)} characters")
                    # Try to parse as JSON
                    try:
                        response_json = json.loads(response_text)
                        print("✓ Valid JSON response")
                        # Look for product-like structures
                        def find_products_in_json(obj, path=""):
                            products = []
                            if isinstance(obj, dict):
                                for key, value in obj.items():
                                    new_path = f"{path}.{key}" if path else key
                                    if key.lower() in ['products', 'items', 'results', 'data']:
                                        if isinstance(value, list):
                                            products.append((new_path, len(value)))
                                    products.extend(find_products_in_json(value, new_path))
                            elif isinstance(obj, list):
                                for idx, item in enumerate(obj):
                                    products.extend(find_products_in_json(item, f"{path}[{idx}]"))
                            return products
                        product_arrays = find_products_in_json(response_json)
                        if product_arrays:
                            print("Potential product arrays found:")
                            for path, count in product_arrays:
                                print(f"  {path}: {count} items")
                        # Check for our specific product
                        response_str = str(response_json).lower()
                        if '41936301' in response_str:
                            print("🎯 CONTAINS OUR TEST PRODUCT SKU!")
                        if '728192558375' in response_str:
                            print("🎯 CONTAINS OUR TEST PRODUCT UPC!")
                        if 'pokemon' in response_str:
                            print("🎯 CONTAINS POKEMON REFERENCES!")
                    except json.JSONDecodeError:
                        print("Response is not JSON")
                        # Check if it contains our product anyway
                        if '41936301' in response_text:
                            print("🎯 CONTAINS OUR TEST PRODUCT SKU!")
        # Return the most promising API calls
        return api_calls, product_calls
    except Exception as e:
        print(f"Error analyzing HAR file: {e}")
        return [], []
 if __name__ == "__main__":
    har_files = ['www.dollargeneral.com_Archive [26-03-21 15-14-28].har']
    for har_file in har_files:
        try:
            api_calls, product_calls = analyze_har_file(har_file)
            print(f"\n🎯 SUMMARY:")
            print(f"   Total API calls: {len(api_calls)}")
            print(f"   Product-related calls: {len(product_calls)}")
            if product_calls:
                print(f"\n💡 NEXT STEPS:")
                print(f"   1. Test the identified API endpoints")
                print(f"   2. Replicate the headers and parameters")
                print(f"   3. Integrate successful calls into Pokemon Discovery")
        except FileNotFoundError:
            print(f"HAR file not found: {har_file}")
        except Exception as e:
            print(f"Error processing {har_file}: {e}")
--- a/api_request_template.json
+++ b/api_request_template.json
@@ -1,41 +0,0 @@
 {
  "endpoint": "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider",
  "method": "POST",
  "headers": {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0",
    "Accept": "application/json, text/plain, */*",
    "Content-Type": "application/json",
    "Authorization": "Bearer eyJ0eXAiOiJhdCtKV1QiLCJhbGciOiJSUzI1NiIsImtpZCI6Ik5qRTJNemczTXpSRVFrUXpNak5GUmprMU1FUkNNRUZDTVRBek1FWTFRa0pCTXpRM1EwTkNNZyJ9.eyJzY29wZSI6bnVsbCwiaWF0IjoxNzc0MTI3Nzc5LCJleHAiOjE3NzQxMzEzNzksImF1ZCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsImlzcyI6Imh0dHBzOi8vcHJvZC1kZ2dvLyIsInN1YiI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsInNpZCI6IlNrWk9makF5TURRMU1EVXpOVFEwWWpBM016SXpNak14TXpFek9ETTNNekV3TWpreFl6VitUVUZXYVhwbk56SXpVRGg2VWxkcmEySkRkMk5EZUdVNFlUWm5XVXBHVDBveVExTlRNVWxXWlhSalQzRnFWazVWZGtGWlIwOWtZV2x0WVVwRVRucG5SVlZvUTE5SE5VcHVObGhuTURSb2JuUkVhVlF3UTBzelNIND0iLCJqdGkiOiJzdDIucy5BdEx0VlphRHFnLnZrdW5OV2RWNjN2ZlJTTG00Y3VUd2d5bmc2X0pJNmxKRjA5a2lXTXVQeGZkVDRvT0NhMXhwa1VoRlRkM2tocHZUaFhsRUVwLWw0QzJrZnoycjkzVlYzeldBaUw5Y2x6Snl0amFJamJ4TEJnLkJOZy1CeUdpZnV0WnppQWhhMV8xRDBXTUFWR3JpNVVCX0pKbTRCNVRNYVhTWkZneXpxeUZERjJxZ3B3UTgyajZ2eGVtcnA5RERFTHZnM3hvdlZmZzBnLnNjMyIsImNsaWVudF9pZCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiIsImF6cCI6IldLOTlLc2VCYnUybmFoNC1ibFE3ZmsyUiJ9.I6ou9atkJ8ndkr2m2Trpg53fMIL3hpofCLUHoHYgZkOJnLnbmL0CQu7_pIChQ6nIDK03GagK6aqxd97E8B8vv9nweSmb7zXhrt43dKLEIdhxIGFkJ4xYgNNg-3cVjSlThBQ_AwCx924lOGjEfikEw4NrvGvrlNvrg1lnNz4hf629hUH-5ccVSdgo1w_LQzsLOeMCjuC_bmAoRxT5KLI9oESd4tPJZU5Nlt2ICbWJD9h-zNrt-ijwYCvb7j8amGbpMGhJZqtzu9f3wN0JUFxDg5rAN-WOtLjwEmR_NxDKq0NEeuU16uhaB8AJzy217XAgJ87bKZldZowsWs-Q9oAH3g",
    "Referer": "https://www.dollargeneral.com/"
  },
  "post_data": {
    "StoreNbr": 17506,
    "SearchTerm": null,
    "PageSize": 24,
    "PageStartRecordIndex": 0,
    "Filters": {
      "category": [],
      "brand": [],
      "dgDelivery": false,
      "dgPickUp": false,
      "dgShipTohome": false,
      "soldAtStore": true,
      "inStock": true,
      "onlyActivatedDeals": false
    },
    "IncludeSponsored": true,
    "IncludeShipToHome": true,
    "IncludeDeals": true,
    "offerSourceType": 0,
    "Id": 723960,
    "IncludeProducts": false,
    "DoNotSave": false,
    "OptOut": false,
    "SearchType": 1
  },
  "example_response": {
    "total_items": 4,
    "pokemon_items": 0,
    "sample_pokemon_product": null
  }
 }
--- a/disco.py
+++ b/disco.py
@@ -0,0 +1,514 @@
 #!/usr/bin/env python3
 """
 Pokemon Discovery (disco.py)
 Scrapes Pokemon TCG pack & tin products from Dollar General and generates a PDF catalog.
 Usage:
    python disco.py                          # Full run: scrape + generate PDF
    python disco.py --scrape-only            # Just scrape, output JSON
    python disco.py --pdf-only FILE.json     # Just generate PDF from existing JSON
 """
 import json
 import re
 import subprocess
 import sys
 from datetime import datetime
 from pathlib import Path
 from urllib.parse import urljoin
 import requests
 import barcode
 from barcode.writer import ImageWriter
 from bs4 import BeautifulSoup
 from PIL import Image, ImageDraw, ImageFont
 # ---------------------------------------------------------------------------
 # Configuration
 # ---------------------------------------------------------------------------
 HAR_FILE = "www.dollargeneral.com_Archive [26-03-21 15-14-28].har"
 BASE_URL = "https://www.dollargeneral.com"
 OUTPUT_DIR = Path("catalog_output")
 IMAGES_DIR = OUTPUT_DIR / "images"
 BARCODES_DIR = OUTPUT_DIR / "barcodes"
 HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
 }
 # Keywords that identify card packs and tins (case-insensitive)
 CARD_TIN_KEYWORDS = ["pack", "tin", "booster", "card game", "tcg"]
 # ---------------------------------------------------------------------------
 # Step 1 — Product Discovery (from HAR file API responses)
 # ---------------------------------------------------------------------------
 def extract_products_from_har(har_path: str) -> list[dict]:
    """Parse HAR file and extract all Pokemon products from API responses."""
    print(f"📦 Reading HAR file: {har_path}")
    with open(har_path, "r", encoding="utf-8") as f:
        har = json.load(f)
    api_url = "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider"
    unique: dict[str, dict] = {}
    for entry in har["log"]["entries"]:
        req = entry["request"]
        resp = entry["response"]
        if req["url"] != api_url or req["method"] != "POST":
            continue
        text = resp.get("content", {}).get("text", "")
        if not text:
            continue
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            continue
        for item in data.get("ItemList", {}).get("Items", []):
            upc = str(item.get("UPC", ""))
            if upc and upc not in unique:
                unique[upc] = item
    print(f"   Found {len(unique)} unique products in HAR data")
    return list(unique.values())
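Review note: `extract_products_from_har` assumes every response body is stored as plain text. HAR exporters may instead store a body base64-encoded and flag it with `content.encoding`. The helper below is a minimal sketch of a more tolerant reader; `har_response_json` is a hypothetical name, not part of this commit, and it has not been run against the real capture.

```python
import base64
import json


def har_response_json(entry: dict):
    """Return the parsed JSON body of one HAR entry, or None if unavailable."""
    content = entry.get("response", {}).get("content", {})
    text = content.get("text") or ""
    if not text:
        return None
    # Some exporters store the body base64-encoded and mark it with "encoding".
    if content.get("encoding") == "base64":
        try:
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        except ValueError:  # binascii.Error is a subclass of ValueError
            return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

Inside the loop, `data = har_response_json(entry)` followed by `if data is None: continue` would stand in for the manual `text` and `json.loads` handling.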
 def rootsv_to_sku(rootsv: str) -> str:
    """Convert rootSV like '0419363_1' to SKU like '41936301'.
    The rootSV base (minus leading zero) + '01' gives the DG item number.
    The '_N' suffix is a variant/image index, not part of the SKU.
    """
    if not rootsv:
        return ""
    base = rootsv.split("_")[0].lstrip("0")
    return base + "01"
 def build_product_url(upc: str) -> str:
    """Construct a Dollar General product page URL from a UPC."""
    return f"{BASE_URL}/p/pokemon-product/{upc}"
 def filter_card_and_tin_products(raw_items: list[dict]) -> list[dict]:
    """Keep only products whose description contains card/pack/tin keywords."""
    filtered = []
    for item in raw_items:
        # "Description" may be present but null in the API payload.
        desc = (item.get("Description") or "").lower()
        if any(kw in desc for kw in CARD_TIN_KEYWORDS):
            filtered.append(item)
    return filtered
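Review note: the substring test above also matches keywords inside longer words, so a description containing "Backpack" or "Painting" would pass the `pack` / `tin` check. A possible tightening, sketched under the assumption that whole-word matches (with an optional plural "s") are what is wanted; `matches_card_or_tin` is a hypothetical helper, not part of this commit.

```python
import re

CARD_TIN_KEYWORDS = ["pack", "tin", "booster", "card game", "tcg"]

# One alternation, anchored on word boundaries, allowing a trailing plural "s".
_KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(kw) for kw in CARD_TIN_KEYWORDS) + r")s?\b",
    re.IGNORECASE,
)


def matches_card_or_tin(description) -> bool:
    """True if the description contains a keyword as a whole word."""
    return bool(_KEYWORD_RE.search(description or ""))
```

Whether the stricter match drops products that the current filter keeps on purpose would need checking against the captured data.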
 def normalize_product(item: dict) -> dict:
    """Convert raw API item into a clean product dict."""
    upc = str(item.get("UPC", ""))
    rootsv = item.get("rootSV", "")
    sku = rootsv_to_sku(rootsv)
    qty = item.get("AvailableQty", 0)
    return {
        "title": item.get("Description", "Unknown Product"),
        "sku": sku,
        "upc": upc,
        "price": f"${item.get('Price', 0):.2f}",
        "stock": f"In Stock ({qty})" if qty and qty > 0 else "Out of Stock",
        "quantity": qty,
        "image_url": item.get("Image", ""),
        "rating": item.get("AverageRating", 0),
        "reviews": item.get("RatingReviewCount", 0),
        "url": build_product_url(upc),
    }
 # ---------------------------------------------------------------------------
 # Step 2 — Enrich from product pages (get real URL slug, extra details)
 # ---------------------------------------------------------------------------
 def enrich_from_product_page(product: dict) -> dict:
    """Visit the actual product page to get the real URL and any missing data."""
    upc = product["upc"]
    sku = product["sku"]
    # Try to find the real product page.
    # DG product pages can be located by searching the site for the UPC.
    search_url = f"{BASE_URL}/search?q={upc}"
    try:
        resp = requests.get(search_url, headers=HEADERS, timeout=15)
        if resp.status_code == 200:
            soup = BeautifulSoup(resp.text, "html.parser")
            # Look for the canonical product link
            links = soup.select(f'a[href*="/p/"][href*="{upc}"]')
            if links:
                href = links[0].get("href", "")
                product["url"] = urljoin(BASE_URL, href)
    except requests.RequestException:
        # Network errors are non-fatal: keep the URL built by build_product_url().
        pass
    # Also try visiting the product page directly by known pattern
    # The image URL contains the DG item number: dg-XXXXXXXX-1
    img_url = product.get("image_url", "")
    match = re.search(r"dg-(\d+)-", img_url)
    if match:
        dg_item = match.group(1)
        # This is the item number used in the SKU
        if not product.get("sku"):
            product["sku"] = dg_item
    return product
 # ---------------------------------------------------------------------------
 # Step 3 — Download images & generate barcodes
 # ---------------------------------------------------------------------------
 def download_image(url: str, dest: Path) -> Path | None:
    """Download image from URL, convert to PNG for LaTeX compatibility."""
    if not url:
        return None
    try:
        resp = requests.get(url, headers=HEADERS, timeout=15)
        resp.raise_for_status()
        # Convert to PNG regardless of source format (handles WebP, etc.)
        from io import BytesIO
        img = Image.open(BytesIO(resp.content)).convert("RGB")
        png_dest = dest.with_suffix(".png")
        img.save(png_dest, "PNG")
        return png_dest
    except Exception as e:
        print(f"   ⚠ Image download failed: {e}")
        return None
 def make_placeholder(dest: Path, text: str = "No Image") -> Path:
    """Create a simple placeholder image."""
    img = Image.new("RGB", (300, 300), "#e0e0e0")
    draw = ImageDraw.Draw(img)
    try:
        font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", 20)
    except Exception:
        font = ImageFont.load_default()
    bbox = draw.textbbox((0, 0), text, font=font)
    tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
    draw.text(((300 - tw) / 2, (300 - th) / 2), text, fill="#888", font=font)
    img.save(dest)
    return dest
 def generate_barcode(upc: str, dest_dir: Path) -> Path | None:
    """Generate a UPC-A barcode PNG from a UPC number. Returns path to the .png file."""
    digits = re.sub(r"\D", "", upc)
    if not digits:
        return None
    # UPC-A: pass the first 11 digits; the library computes the 12th (check digit).
    # Left-pad to 12 first so a UPC that lost its leading zeros (e.g. stored as a
    # number in the API response) still splits into payload + check digit correctly.
    digits = digits.zfill(12)[:11]
    try:
        upc_cls = barcode.get_barcode_class("upca")
        bc = upc_cls(digits, writer=ImageWriter())
        # barcode lib appends .png automatically
        out = dest_dir / f"barcode_{upc}"
        saved = bc.save(
            str(out),
            options={
                "module_width": 0.3,
                "module_height": 15.0,
                "quiet_zone": 6.5,
                "font_size": 10,
                "text_distance": 5.0,
            },
        )
        return Path(saved)
    except Exception as e:
        print(f"   ⚠ Barcode generation failed for {upc}: {e}")
        return None
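Review note: `generate_barcode` drops the 12th digit and lets the barcode library recompute it, so a mistyped or truncated UPC still yields a scannable but wrong barcode. A small sketch of the standard UPC-A check-digit rule that could validate the input first; `upca_check_digit` and `split_upca` are hypothetical helpers, not part of this commit.

```python
import re


def upca_check_digit(first11: str) -> int:
    """Compute the UPC-A check digit for the first 11 digits."""
    if len(first11) != 11 or not first11.isdigit():
        raise ValueError("expected exactly 11 digits")
    odd = sum(int(d) for d in first11[0::2])   # positions 1, 3, ..., 11
    even = sum(int(d) for d in first11[1::2])  # positions 2, 4, ..., 10
    return (10 - (odd * 3 + even) % 10) % 10


def split_upca(upc) -> tuple[str, int]:
    """Split a 12-digit UPC-A into (payload, check digit), validating the check."""
    digits = re.sub(r"\D", "", str(upc)).zfill(12)  # restore dropped leading zeros
    if len(digits) != 12:
        raise ValueError(f"not a UPC-A code: {upc!r}")
    payload, check = digits[:11], int(digits[11])
    if upca_check_digit(payload) != check:
        raise ValueError(f"check digit mismatch for {upc!r}")
    return payload, check
```

For the test UPC used elsewhere in this repository, `split_upca("728192558375")` returns `("72819255837", 5)`.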
 # ---------------------------------------------------------------------------
 # Step 4 — Generate PDF via pdflatex
 # ---------------------------------------------------------------------------
 def generate_catalog_pdf(products: list[dict]) -> Path | None:
    """Build a LaTeX file and compile it to PDF with pdflatex (xelatex as fallback).
    Layout per product page, top to bottom (all text sits above the image):
        Name                      ← product title, bold
        Stock          Price      ← stock on the left, price on the right
        SKU: XXXXXXXX  UPC: XXXXXXXXXXXX   ← small text
        ┌─────────────────────┐
        │                     │
        │    Product Image    │   ← large, centered, bordered
        │                     │
        └─────────────────────┘
        ┌─────────────────────┐
        │    UPC-A Barcode    │   ← centered, bordered, pushed to page bottom
        └─────────────────────┘
    """
    timestamp_label = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    timestamp_file = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Build LaTeX document directly for precise layout control
    latex_lines = [
        r"\documentclass[11pt,letterpaper]{article}",
        r"\usepackage[margin=0.75in]{geometry}",
        r"\usepackage{graphicx}",
        r"\usepackage{fancybox}",
        r"\usepackage{xcolor}",
        r"\usepackage{parskip}",
        r"\usepackage[utf8]{inputenc}",
        r"\usepackage[T1]{fontenc}",
        r"\usepackage{lmodern}",
        r"\usepackage{hyperref}",
        r"\pagestyle{empty}",
        r"\begin{document}",
        "",
        # Manifest page
        r"\begin{center}",
        r"{\Huge\bfseries Pokemon TCG Product Catalog}\\[0.4cm]",
        r"{\Large Dollar General}\\[0.2cm]",
        r"{\large Generated: " + timestamp_label + r"}\\[0.2cm]",
        r"{\large " + str(len(products)) + r" Cards \& Tins}",
        r"\end{center}",
        r"\vspace{0.8cm}",
        r"\begin{tabular}{r l l r r}",
        r"\hline",
        r"\textbf{\#} & \textbf{Product} & \textbf{SKU} & \textbf{Price} & \textbf{Stock} \\",
        r"\hline",
    ]
    for i, prod in enumerate(products, 1):
        safe = (
            prod["title"][:50]
            .replace("&", r"\&").replace("%", r"\%").replace("$", r"\$")
            .replace("#", r"\#").replace("_", r"\_").replace("é", r"\'e")
        )
        price = prod["price"].replace("$", r"\$")
        qty = prod.get("quantity", 0)
        stock_short = str(qty) if qty else "---"
        latex_lines.append(
            f"{i} & {safe} & \\texttt{{{prod['sku']}}} & {price} & {stock_short} \\\\"
        )
    latex_lines += [
        r"\hline",
        r"\end{tabular}",
        r"\newpage",
        "",
    ]
    for i, prod in enumerate(products, 1):
        title = prod["title"]
        sku = prod["sku"]
        upc = prod["upc"]
        price = prod["price"]
        stock = prod["stock"]
        # Download product image
        img_dest = IMAGES_DIR / f"product_{i}_{sku}.jpg"
        img_path = download_image(prod.get("image_url"), img_dest)
        if not img_path:
            img_path = make_placeholder(
                IMAGES_DIR / f"product_{i}_{sku}_placeholder.png", title[:30]
            )
        # Generate barcode from UPC (not SKU)
        bc_path = generate_barcode(upc, BARCODES_DIR)
        # Escape LaTeX special characters in text fields
        safe_title = (
            title.replace("&", r"\&")
            .replace("%", r"\%")
            .replace("$", r"\$")
            .replace("#", r"\#")
            .replace("_", r"\_")
            .replace("é", r"\'e")
        )
        safe_stock = stock.replace("&", r"\&")
        safe_price = price.replace("$", r"\$")
        # Absolute paths for LaTeX
        abs_img = str(img_path.resolve())
        abs_bc = str(bc_path.resolve()) if bc_path else None
        latex_lines += [
            # Name — bold, large
            r"{\Large\bfseries " + safe_title + r"}",
            "",
            r"\vspace{0.15cm}",
            "",
            # Stock and price
            r"{\large " + safe_stock + r" \hfill " + safe_price + r"}",
            "",
            r"\vspace{0.1cm}",
            "",
            # SKU and UPC
            r"{\small SKU: \texttt{" + sku + r"} \hfill UPC: \texttt{" + upc + r"}}",
            "",
            r"\vspace{0.3cm}",
            "",
            r"\begin{center}",
            # Product image — large, centered, with border
            r"\fbox{\includegraphics[width=0.7\textwidth,height=0.40\textheight,keepaspectratio]{"
            + abs_img
            + r"}}",
            r"\end{center}",
            "",
            r"\vfill",
            "",
        ]
        # Barcode — centered, bordered, pushed to bottom
        if abs_bc:
            latex_lines += [
                r"\begin{center}",
                r"\fbox{\includegraphics[width=0.55\textwidth]{"
                + abs_bc
                + r"}}",
                r"\end{center}",
                "",
            ]
        # Page break between products (not after last)
        if i < len(products):
            latex_lines.append(r"\newpage")
            latex_lines.append("")
        print(f"   ✅ [{i}/{len(products)}] {title}")
    latex_lines.append(r"\end{document}")
    # Write .tex file
    tex_file = OUTPUT_DIR / f"pokemon_catalog_{timestamp_file}.tex"
    tex_file.write_text("\n".join(latex_lines), encoding="utf-8")
    print(f"\n📝 LaTeX source: {tex_file}")
    # Compile to PDF with pdflatex directly (pandoc strips images from raw .tex)
    pdf_file = OUTPUT_DIR / f"pokemon_catalog_{timestamp_file}.pdf"
    for engine in ["pdflatex", "xelatex"]:
        try:
            result = subprocess.run(
                [engine, "-interaction=nonstopmode",
                 f"-output-directory={OUTPUT_DIR}", str(tex_file)],
                capture_output=True, text=True, timeout=120,
            )
            if pdf_file.exists() and pdf_file.stat().st_size > 1000:
                # Clean up LaTeX temp files
                for ext in [".aux", ".log", ".out"]:
                    tmp = pdf_file.with_suffix(ext)
                    if tmp.exists():
                        tmp.unlink()
                print(
                    f"📄 PDF generated: {pdf_file}  ({pdf_file.stat().st_size // 1024} KB)"
                )
                return pdf_file
        except FileNotFoundError:
            continue
        except Exception:
            continue
    print(f"⚠ PDF generation failed. LaTeX source available at: {tex_file}")
    return None
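Review note: `generate_catalog_pdf` escapes titles with two near-identical `.replace()` chains, and neither handles `{`, `}`, `~`, `^` or a backslash. A single-pass helper could replace both. This is a sketch only; `latex_escape` is a hypothetical name, and it leaves accented characters such as "é" untouched on the assumption that the `inputenc` package already loaded in the preamble handles them.

```python
import re

# Replacement text for each character that is special in LaTeX text mode.
_LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}
_LATEX_RE = re.compile("|".join(re.escape(ch) for ch in _LATEX_SPECIALS))


def latex_escape(text: str) -> str:
    """Escape LaTeX special characters in one pass (no double escaping)."""
    return _LATEX_RE.sub(lambda m: _LATEX_SPECIALS[m.group(0)], text)
```

Doing the substitution in one regex pass matters because chained `.replace()` calls would re-escape the braces and backslashes introduced by earlier replacements.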
 # ---------------------------------------------------------------------------
 # Main
 # ---------------------------------------------------------------------------
 def main():
    args = sys.argv[1:]
    # Handle --pdf-only mode
    if "--pdf-only" in args:
        idx = args.index("--pdf-only")
        json_file = args[idx + 1] if idx + 1 < len(args) else None
        if not json_file or not Path(json_file).exists():
            print(f"Usage: {sys.argv[0]} --pdf-only <products.json>")
            sys.exit(1)
        products = json.loads(Path(json_file).read_text(encoding="utf-8"))
        for d in [OUTPUT_DIR, IMAGES_DIR, BARCODES_DIR]:
            d.mkdir(parents=True, exist_ok=True)
        print(f"\n🖨️  Generating PDF from {json_file} ({len(products)} products)...")
        generate_catalog_pdf(products)
        return
    scrape_only = "--scrape-only" in args
    # --- Banner ---
    timestamp_file = datetime.now().strftime("%Y%m%d_%H%M%S")
    print("=" * 60)
    print("  🔍  Pokemon Discovery (pokemon-disco)")
    print("  Dollar General — Pokemon TCG Cards & Tins")
    print(f"  {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("=" * 60)
    # --- Step 1: Extract from HAR ---
    if not Path(HAR_FILE).exists():
        print(f"\n❌ HAR file not found: {HAR_FILE}")
        print("   Capture a HAR file from the Pokemon page in your browser")
        print("   and place it in the project directory.")
        sys.exit(1)
    raw_items = extract_products_from_har(HAR_FILE)
    # --- Step 2: Filter for Cards & Tins ---
    print(f"\n🎯 Filtering for card packs and tins...")
    card_tin_items = filter_card_and_tin_products(raw_items)
    print(f"   {len(card_tin_items)} of {len(raw_items)} products match ({'/'.join(CARD_TIN_KEYWORDS)})")
    if not card_tin_items:
        print("❌ No card or tin products found.")
        sys.exit(1)
    # Show what was filtered out
    excluded = [i for i in raw_items if i not in card_tin_items]
    if excluded:
        print(f"\n   Excluded {len(excluded)} non-card/tin products:")
        for item in excluded:
            print(f"     ✗ {item.get('Description', '?')}")
    # --- Step 3: Normalize ---
    print(f"\n📋 Processing {len(card_tin_items)} products...")
    products = [normalize_product(item) for item in card_tin_items]
    # Print summary table
    print()
    print(f"  {'#':<3} {'Title':<55} {'SKU':<12} {'Price':<8} {'Stock'}")
    print(f"  {'—'*3} {'—'*55} {'—'*12} {'—'*8} {'—'*15}")
    for i, p in enumerate(products, 1):
        title = p['title'][:53]
        print(f"  {i:<3} {title:<55} {p['sku']:<12} {p['price']:<8} {p['stock']}")
    # --- Step 4: Save JSON ---
    json_file = f"pokemon_tcg_products_{timestamp_file}.json"
    # ensure_ascii=False emits raw non-ASCII text, so pin the file encoding.
    Path(json_file).write_text(
        json.dumps(products, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    print(f"\n💾 Product data: {json_file}")
    if scrape_only:
        print("\n✅ Scrape complete (--scrape-only). Run with --pdf-only to generate catalog.")
        return
    # --- Step 5: Generate PDF ---
    for d in [OUTPUT_DIR, IMAGES_DIR, BARCODES_DIR]:
        d.mkdir(parents=True, exist_ok=True)
    print(f"\n🖨️  Generating PDF catalog...")
    pdf_path = generate_catalog_pdf(products)
    # --- Done ---
    print("\n" + "=" * 60)
    if pdf_path:
        print(f"  ✅ COMPLETE!")
        print(f"  📄 PDF Catalog:  {pdf_path}")
        print(f"  💾 Product JSON: {json_file}")
        print(f"  🏷️  Barcodes:     {BARCODES_DIR}/")
        print(f"  🖼️  Images:       {IMAGES_DIR}/")
    else:
        print(f"  ⚠ PDF generation failed; LaTeX source available in {OUTPUT_DIR}/")
        print(f"  💾 Product JSON: {json_file}")
    print("=" * 60)
 if __name__ == "__main__":
    main()
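Review note: `main()` scans `sys.argv` by hand, so passing both flags together or misspelling one is silently accepted. The two modes from the module docstring could be declared with `argparse` instead. This is a sketch of one way to do it, not what this commit ships; `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse disco.py's command line; the two modes are mutually exclusive."""
    parser = argparse.ArgumentParser(
        prog="disco.py",
        description="Scrape Pokemon TCG products from a HAR capture and build a PDF catalog.",
    )
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument(
        "--scrape-only", action="store_true",
        help="extract products and write JSON, skip the PDF",
    )
    mode.add_argument(
        "--pdf-only", metavar="FILE.json",
        help="generate the PDF from an existing product JSON file",
    )
    return parser.parse_args(argv)
```

With this in place, `main()` would read `args.scrape_only` and `args.pdf_only` (which is `None` when the flag is absent), and `--help` output comes for free.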
--- a/extract_api_details.py
+++ b/extract_api_details.py
@@ -1,135 +0,0 @@
 #!/usr/bin/env python3
 """
 Extract exact API request details from HAR file
 """
 import json
 from urllib.parse import urlparse, parse_qs
 def extract_api_request_details():
    """Extract the exact API request format"""
    har_file = 'www.dollargeneral.com_Archive [26-03-21 15-14-28].har'
    with open(har_file, 'r', encoding='utf-8') as f:
        har_data = json.load(f)
    entries = har_data.get('log', {}).get('entries', [])
    # Find the API calls that contain our product
    api_endpoint = "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider"
    successful_calls = []
    for entry in entries:
        request = entry.get('request', {})
        response = entry.get('response', {})
        if (request.get('url') == api_endpoint and 
            request.get('method') == 'POST' and 
            response.get('status') == 200):
            # Check if response contains our product
            response_text = response.get('content', {}).get('text', '')
            if '41936301' in response_text and 'pokemon' in response_text.lower():
                successful_calls.append(entry)
    print(f"Found {len(successful_calls)} successful API calls with Pokemon products")
    print()
    for i, entry in enumerate(successful_calls):
        request = entry.get('request', {})
        response = entry.get('response', {})
        print(f"=== API Call {i+1} ===")
        print(f"URL: {request.get('url')}")
        print(f"Method: {request.get('method')}")
        # Extract headers
        headers = {}
        for header in request.get('headers', []):
            name = header.get('name')
            value = header.get('value')
            if name.lower() in ['authorization', 'content-type', 'accept', 'referer', 'user-agent']:
                headers[name] = value
        print("Headers:")
        for name, value in headers.items():
            if name.lower() == 'authorization':
                print(f"  {name}: {value[:50]}... (Bearer token)")
            else:
                print(f"  {name}: {value}")
        # Extract POST data
        post_data = request.get('postData', {})
        if post_data.get('text'):
            try:
                post_json = json.loads(post_data.get('text'))
                print("POST Data:")
                print(json.dumps(post_json, indent=2))
            except json.JSONDecodeError:
                print(f"POST Data (raw): {post_data.get('text')}")
        # Analyze response
        response_text = response.get('content', {}).get('text', '')
        if response_text:
            try:
                response_json = json.loads(response_text)
                print(f"Response size: {len(response_text)} characters")
                # Extract product information
                items = response_json.get('ItemList', {}).get('Items', [])
                print(f"Products found: {len(items)}")
                # Show Pokemon products
                pokemon_products = []
                for item in items:
                    title = item.get('Title', '').lower()
                    if 'pokemon' in title or 'pokémon' in title:
                        pokemon_products.append({
                            'title': item.get('Title'),
                            'sku': item.get('ItemNbr'),
                            'upc': item.get('UPC'),
                            'price': item.get('Price', {}).get('Amount'),
                            'url': item.get('ProductUrl'),
                            'in_stock': item.get('Inventory', {}).get('InStock'),
                            'available_online': item.get('Inventory', {}).get('AvailableOnline')
                        })
                if pokemon_products:
                    print(f"\nPokemon products in this response: {len(pokemon_products)}")
                    for prod in pokemon_products:
                        print(f"  • {prod['title']}")
                        print(f"    SKU: {prod['sku']}, UPC: {prod['upc']}")
                        print(f"    Price: ${prod['price']}, In Stock: {prod['in_stock']}")
                        print(f"    URL: {prod['url']}")
                # Extract the store number and filters used
                if i == 0:  # Save the working request format
                    with open('api_request_template.json', 'w') as f:
                        json.dump({
                            'endpoint': api_endpoint,
                            'method': 'POST',
                            'headers': headers,
                            'post_data': post_json,
                            'example_response': {
                                'total_items': len(items),
                                'pokemon_items': len(pokemon_products),
                                'sample_pokemon_product': pokemon_products[0] if pokemon_products else None
                            }
                        }, f, indent=2)
                    print(f"\n✅ Saved working API template to: api_request_template.json")
            except Exception as e:
                print(f"Error parsing response: {e}")
        print("\n" + "="*60 + "\n")
    return successful_calls
 if __name__ == "__main__":
    successful_calls = extract_api_request_details()
    print("🎯 SUMMARY:")
    print(f"   Successfully extracted {len(successful_calls)} working API calls")
    print("   Next step: Implement this API call in Pokemon Discovery scraper")
--- a/implement_api_scraper.py
+++ b/implement_api_scraper.py
@@ -1,297 +0,0 @@
 #!/usr/bin/env python3
 """
 Implement API-based scraping for Pokemon Discovery
 """
 import json
 import requests
 import sys
 from datetime import datetime
 from urllib.parse import urljoin
 class DollarGeneralAPIScaper:
    def __init__(self):
        self.base_url = "https://www.dollargeneral.com"
        self.api_base = "https://dggo.dollargeneral.com"
        self.session = requests.Session()
        # Headers that mimic a real browser session
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0',
            'Accept': 'application/json, text/plain, */*',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'cross-site',
        }
        self.session.headers.update(self.headers)
        self.auth_token = None
    def get_auth_token(self):
        """Try multiple methods to get authentication token"""
        print("🔑 Attempting to get authentication token...")
        # Method 1: Get token from main page
        try:
            print("  - Visiting main Pokemon page...")
            pokemon_url = f"{self.base_url}/c/toys/pokemon?q=&soldAtStore=true"
            response = self.session.get(pokemon_url, timeout=30)
            if response.status_code == 200:
                # Look for embedded tokens in the page
                import re
                # Look for bearer tokens in script tags
                token_patterns = [
                    r'Bearer\s+([A-Za-z0-9\-_\.]+)',
                    r'"access_token":\s*"([^"]+)"',
                    r'"token":\s*"([^"]+)"',
                    r'authorization:\s*["\'](Bearer\s+[^"\']+)["\']'
                ]
                for pattern in token_patterns:
                    matches = re.findall(pattern, response.text, re.IGNORECASE)
                    if matches:
                        token = matches[0]
                        if token.startswith('Bearer '):
                            token = token[7:]  # Remove 'Bearer ' prefix
                        print(f"  ✅ Found token via pattern: {token[:50]}...")
                        self.auth_token = token
                        return token
        except Exception as e:
            print(f"  ❌ Main page method failed: {e}")
        # Method 2: Try token endpoint
        try:
            print("  - Trying token endpoint...")
            token_url = f"{self.base_url}/bin/omni/userTokens"
            response = self.session.get(token_url, timeout=30)
            if response.status_code == 200:
                try:
                    data = response.json()
                    if 'access_token' in data:
                        token = data['access_token']
                        print(f"  ✅ Got token from endpoint: {token[:50]}...")
                        self.auth_token = token
                        return token
                except:
                    pass
        except Exception as e:
            print(f"  ❌ Token endpoint failed: {e}")
        # Method 3: Try CSRF token endpoint
        try:
            print("  - Trying CSRF token...")
            csrf_url = f"{self.base_url}/libs/granite/csrf/token.json"
            response = self.session.get(csrf_url, timeout=30)
            if response.status_code == 200:
                data = response.json()
                if 'token' in data:
                    # This might not be the right token, but let's try
                    print(f"  ⚠️  Got CSRF token (may not work for API): {str(data)[:100]}...")
        except Exception as e:
            print(f"  ❌ CSRF method failed: {e}")
        print("  ❌ Could not obtain authentication token")
        return None
    def search_products_api(self, store_nbr=17506, category_id=723960, include_out_of_stock=True):
        """Search for products using the API endpoint"""
        print(f"🔍 Searching products via API...")
        print(f"   Store: {store_nbr}, Category: {category_id}")
        if not self.auth_token:
            print("   ❌ No authentication token available")
            return []
        endpoint = f"{self.api_base}/omni/api/v2/category/search/provider"
        # Headers for API request
        api_headers = self.headers.copy()
        api_headers.update({
            'Content-Type': 'application/json',
            'Authorization': f'Bearer {self.auth_token}',
            'Referer': f'{self.base_url}/',
            'Origin': self.base_url,
        })
        # Request payload based on HAR analysis
        payload = {
            "StoreNbr": store_nbr,
            "SearchTerm": None,
            "PageSize": 48,  # Request more items
            "PageStartRecordIndex": 0,
            "Filters": {
                "category": [],
                "brand": [],
                "dgDelivery": False,
                "dgPickUp": False,
                "dgShipTohome": False,
                "soldAtStore": True,
                "inStock": not include_out_of_stock,  # False = include out of stock
                "onlyActivatedDeals": False
            },
            "IncludeSponsored": True,
            "IncludeShipToHome": True,
            "IncludeDeals": True,
            "offerSourceType": 0,
            "Id": category_id,
            "IncludeProducts": False,
            "DoNotSave": False,
            "OptOut": False,
            "SearchType": 1
        }
        try:
            print(f"   POST {endpoint}")
            response = self.session.post(endpoint, 
                                       headers=api_headers, 
                                       json=payload, 
                                       timeout=30)
            print(f"   Status: {response.status_code}")
            print(f"   Response size: {len(response.text)} characters")
            if response.status_code == 200:
                if len(response.text) == 0:
                    print("   ⚠️  Empty response (token may be expired)")
                    return []
                try:
                    data = response.json()
                    items = data.get('ItemList', {}).get('Items', [])
                    print(f"   ✅ Found {len(items)} total items")
                    return items
                except Exception as e:
                    print(f"   ❌ JSON parsing error: {e}")
                    print(f"   Response preview: {response.text[:200]}...")
                    return []
            elif response.status_code == 401:
                print("   ❌ Authentication failed - token expired or invalid")
                return []
            else:
                print(f"   ❌ API error: {response.status_code}")
                print(f"   Response: {response.text[:200]}...")
                return []
        except Exception as e:
            print(f"   ❌ Request failed: {e}")
            return []
    def filter_pokemon_products(self, items):
        """Filter for Pokemon TCG products"""
        pokemon_products = []
        for item in items:
            # Use "or ''" so a JSON null in the response does not break .lower()
            title = (item.get('Title') or '').lower()
            description = (item.get('Description') or '').lower()
            brand = (item.get('Brand') or '').lower()
            # Check if this is a Pokemon TCG product
            pokemon_keywords = ['pokemon', 'pokémon']
            tcg_keywords = ['trading card', 'tcg', 'cards', 'pack', 'tin', 'box', 'collection']
            has_pokemon = any(keyword in title or keyword in description for keyword in pokemon_keywords)
            has_tcg = any(keyword in title or keyword in description for keyword in tcg_keywords)
            if has_pokemon and has_tcg:
                product = {
                    'title': item.get('Title'),
                    'sku': item.get('ItemNbr'),
                    'upc': item.get('UPC'),
                    'price': f"${float((item.get('Price') or {}).get('Amount') or 0):.2f}",
                    'url': urljoin(self.base_url, item.get('ProductUrl', '')),
                    'stock': 'In Stock' if (item.get('Inventory') or {}).get('InStock') else 'Out of Stock',
                    'image_url': item.get('ImageURL'),
                    'description': item.get('Description', ''),
                    'brand': item.get('Brand', '')
                }
                pokemon_products.append(product)
                print(f"   🎯 Found: {product['title']}")
                print(f"      SKU: {product['sku']}, Price: {product['price']}")
                print(f"      Stock: {product['stock']}")
        return pokemon_products
    def scrape_pokemon_products(self):
        """Main scraping method"""
        print("Pokemon Discovery - API-based Scraping")
        print("="*60)
        # Get authentication token
        if not self.get_auth_token():
            print("❌ Authentication failed - cannot access API")
            print()
            print("💡 Alternative approaches:")
            print("   1. Use browser automation with proper session")
            print("   2. Extract products manually from individual pages")
            print("   3. Use the working individual product scraper")
            return []
        print()
        # Search for products
        all_items = self.search_products_api()
        if not all_items:
            print("❌ No items returned from API")
            return []
        print()
        # Filter for Pokemon products
        pokemon_products = self.filter_pokemon_products(all_items)
        print()
        print(f"🎉 SUCCESS! Found {len(pokemon_products)} Pokemon TCG products")
        if pokemon_products:
            # Save results
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            filename = f'pokemon_tcg_api_scrape_{timestamp}.json'
            with open(filename, 'w') as f:
                json.dump(pokemon_products, f, indent=2)
            print(f"💾 Saved to: {filename}")
            # Show summary
            print()
            print("📋 Product Summary:")
            for i, product in enumerate(pokemon_products, 1):
                print(f"  {i}. {product['title']}")
                print(f"     SKU: {product['sku']} | Price: {product['price']} | {product['stock']}")
        return pokemon_products
 def main():
    scraper = DollarGeneralAPIScaper()
    products = scraper.scrape_pokemon_products()
    if products:
        print()
        print("🚀 Ready for PDF generation!")
        print("Run: python pdf_generator.py pokemon_tcg_api_scrape_[timestamp].json")
    else:
        print()
        print("📝 Note: Individual product scraping still works perfectly!")
        print("The issue is authentication for bulk API access.")
 if __name__ == "__main__":
    main()
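The payload in `search_products_api` requests a single page (`PageStartRecordIndex` 0, `PageSize` 48), so a category holding more items than one page would be truncated. A minimal sketch of paging through results follows. `fetch_page` is a hypothetical callable standing in for the POST request, and treating a short or empty page as the end of results is an assumption that the HAR capture does not confirm.

```python
from typing import Callable, Iterator, List


def iter_all_items(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 48,
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield items page by page until a short or empty page is returned.

    fetch_page(start_index, page_size) is a hypothetical helper that would
    POST the payload with PageStartRecordIndex=start_index and return the
    list found under ItemList -> Items.
    """
    for page in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        start_index = page * page_size
        items = fetch_page(start_index, page_size)
        yield from items
        if len(items) < page_size:  # assumed end-of-results signal
            break


if __name__ == "__main__":
    # Stand-in data source: 100 fake items served in slices.
    fake_catalog = [{"ItemNbr": n} for n in range(100)]

    def fake_fetch(start: int, size: int) -> List[dict]:
        return fake_catalog[start:start + size]

    print(len(list(iter_all_items(fake_fetch))))
```

Passing the fetch function in as a parameter keeps the paging logic testable without a valid bearer token.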
--- a/pdf_generator.py
+++ b/pdf_generator.py
@@ -1,279 +0,0 @@
 #!/usr/bin/env python3
 """
 Pokemon Discovery - TCG Product Catalog PDF Generator
 Generates PDF catalog with product images, details, and UPC-A barcodes
 """
 import json
 import os
 import sys
 import requests
 import subprocess
 from datetime import datetime
 from pathlib import Path
 import barcode
 from barcode.writer import ImageWriter
 from PIL import Image, ImageDraw, ImageFont
 import tempfile
 import shutil
 class PokemonTCGCatalogGenerator:
    def __init__(self, json_file):
        self.json_file = json_file
        self.output_dir = Path("catalog_output")
        self.images_dir = self.output_dir / "images"
        self.barcodes_dir = self.output_dir / "barcodes"
        # Create output directories
        self.output_dir.mkdir(exist_ok=True)
        self.images_dir.mkdir(exist_ok=True)
        self.barcodes_dir.mkdir(exist_ok=True)
        # Load product data
        with open(json_file, 'r') as f:
            self.products = json.load(f)
    def download_image(self, url, filename):
        """Download product image"""
        if not url:
            return None
        try:
            response = requests.get(url, timeout=30)
            response.raise_for_status()
            filepath = self.images_dir / filename
            with open(filepath, 'wb') as f:
                f.write(response.content)
            return filepath
        except Exception as e:
            print(f"Failed to download image {url}: {e}")
            return None
    def generate_upc_barcode(self, sku):
        """Generate UPC-A barcode from SKU"""
        try:
            # Convert SKU to 12-digit UPC-A format
            # Remove non-digits and pad/truncate to 11 digits (12th is check digit)
            digits_only = ''.join(filter(str.isdigit, str(sku)))
            if len(digits_only) < 11:
                # Pad with zeros at the start
                upc_base = digits_only.zfill(11)
            else:
                # Take the last 11 digits
                upc_base = digits_only[-11:]
            # Generate UPC-A barcode
            upc_generator = barcode.get_barcode_class('upca')
            upc = upc_generator(upc_base, writer=ImageWriter())
            # Save barcode image
            barcode_filename = f"barcode_{str(sku).replace('/', '_').replace(' ', '_')}"
            barcode_path = self.barcodes_dir / barcode_filename
            # Save with specific options for better appearance
            upc.save(str(barcode_path).replace('.png', ''), options={
                'module_width': 0.2,
                'module_height': 15.0,
                'quiet_zone': 6.5,
                'font_size': 10,
                'text_distance': 5.0,
                'background': 'white',
                'foreground': 'black'
            })
            final_path = f"{barcode_path}.png"
            return final_path
        except Exception as e:
            print(f"Failed to generate barcode for SKU {sku}: {e}")
            return None
    def create_placeholder_image(self, width=300, height=200):
        """Create a placeholder image when product image is not available"""
        img = Image.new('RGB', (width, height), color='lightgray')
        draw = ImageDraw.Draw(img)
        try:
            # Try to use a system font
            font = ImageFont.truetype('/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf', 24)
        except OSError:
            try:
                font = ImageFont.truetype('arial.ttf', 24)
            except OSError:
                font = ImageFont.load_default()
        text = "No Image\nAvailable"
        # Get text bounding box for centering
        lines = text.split('\n')
        y_offset = height // 2 - (len(lines) * 30) // 2
        for line in lines:
            bbox = draw.textbbox((0, 0), line, font=font)
            text_width = bbox[2] - bbox[0]
            x_offset = (width - text_width) // 2
            draw.text((x_offset, y_offset), line, fill='darkgray', font=font)
            y_offset += 30
        placeholder_path = self.images_dir / "placeholder.png"
        img.save(placeholder_path)
        return placeholder_path
    def generate_markdown(self):
        """Generate markdown content for the catalog"""
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        markdown = f"""---
 title: "Pokemon TCG Product Catalog"
 subtitle: "Dollar General - Generated {timestamp}"
 author: "Automated Scraper"
 date: "{timestamp}"
 geometry: margin=1in
 fontsize: 11pt
 documentclass: article
 ---
 # Pokemon TCG Product Catalog
 Generated on: {timestamp}  
 Source: Dollar General  
 Total Products: {len(self.products)}
 ---
 """
        for i, product in enumerate(self.products, 1):
            print(f"Processing product {i}/{len(self.products)}: {product.get('title', 'Unknown')}")
            # Download product image
            image_path = None
            if product.get('image_url'):
                filename = f"product_{i}_{str(product.get('sku', 'unknown')).replace('/', '_').replace(' ', '_')}.jpg"
                image_path = self.download_image(product.get('image_url'), filename)
            if not image_path:
                # Use placeholder
                image_path = self.create_placeholder_image()
            # Generate barcode
            barcode_path = None
            if product.get('sku'):
                barcode_path = self.generate_upc_barcode(product.get('sku'))
            # Add product section to markdown
            markdown += f"## {i}. {product.get('title', 'Unknown Product')}\n\n"
            # Product image
            if image_path:
                rel_image_path = os.path.relpath(image_path, self.output_dir)
                markdown += f"![Product Image]({rel_image_path}){{width=300px}}\n\n"
            # Product details in a table
            markdown += "| Field | Value |\n"
            markdown += "|-------|-------|\n"
            markdown += f"| **Title** | {product.get('title', 'N/A')} |\n"
            markdown += f"| **Price** | {product.get('price', 'N/A')} |\n"
            markdown += f"| **Stock** | {product.get('stock', 'N/A')} |\n"
            markdown += f"| **SKU** | `{product.get('sku', 'N/A')}` |\n"
            markdown += f"| **URL** | {product.get('url', 'N/A')} |\n"
            markdown += "\n"
            # Barcode
            if barcode_path:
                rel_barcode_path = os.path.relpath(barcode_path, self.output_dir)
                markdown += "**UPC-A Barcode:**\n\n"
                markdown += f"![UPC-A Barcode]({rel_barcode_path}){{width=200px}}\n\n"
            markdown += "---\n\n"
        return markdown
    def generate_pdf(self):
        """Generate PDF catalog using pandoc"""
        print("Generating markdown content...")
        markdown_content = self.generate_markdown()
        # Save markdown file
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        markdown_file = self.output_dir / f"pokemon_tcg_catalog_{timestamp}.md"
        with open(markdown_file, 'w', encoding='utf-8') as f:
            f.write(markdown_content)
        print(f"Markdown saved to: {markdown_file}")
        # Generate PDF using pandoc
        pdf_file = self.output_dir / f"pokemon_tcg_catalog_{timestamp}.pdf"
        print("Converting to PDF using pandoc...")
        try:
            subprocess.run([
                'pandoc',
                str(markdown_file),
                '-o', str(pdf_file),
                '--pdf-engine=xelatex',
                '-V', 'colorlinks=true',
                '-V', 'linkcolor=blue',
                '-V', 'filecolor=magenta',
                '-V', 'urlcolor=cyan',
                '--toc',
                '--toc-depth=2'
            ], check=True)
            print(f"PDF generated successfully: {pdf_file}")
            return pdf_file
        except subprocess.CalledProcessError as e:
            print(f"Pandoc conversion failed: {e}")
            print("Trying with pdflatex instead...")
            try:
                subprocess.run([
                    'pandoc',
                    str(markdown_file),
                    '-o', str(pdf_file),
                    '--pdf-engine=pdflatex',
                    '--toc'
                ], check=True)
                print(f"PDF generated successfully: {pdf_file}")
                return pdf_file
            except subprocess.CalledProcessError as e2:
                print(f"PDF generation failed with both engines: {e2}")
                print(f"Markdown file available at: {markdown_file}")
                return None
        except FileNotFoundError:
            print("Error: pandoc not found. Please install pandoc to generate PDF.")
            print(f"Markdown file available at: {markdown_file}")
            return None
 def main():
    if len(sys.argv) != 2:
        print("Usage: python3 pdf_generator.py <json_file>")
        print("Example: python3 pdf_generator.py pokemon_tcg_products_20241221_143025.json")
        sys.exit(1)
    json_file = sys.argv[1]
    if not os.path.exists(json_file):
        print(f"Error: JSON file '{json_file}' not found")
        sys.exit(1)
    generator = PokemonTCGCatalogGenerator(json_file)
    pdf_file = generator.generate_pdf()
    if pdf_file:
        print(f"\nCatalog generation completed!")
        print(f"PDF file: {pdf_file}")
        print(f"Output directory: {generator.output_dir}")
    else:
        print(f"\nPDF generation failed, but markdown file is available in: {generator.output_dir}")
 if __name__ == "__main__":
    main()
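`generate_upc_barcode` pads or truncates the SKU to 11 digits and leaves the 12th (check) digit to the barcode library, which means the resulting barcode encodes the SKU and not the product's real UPC. The check-digit arithmetic for UPC-A can be sketched as below. `upca_check_digit` is an illustrative helper name and is not part of the original file.

```python
def upca_check_digit(first_11: str) -> int:
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first_11) != 11 or not (first_11.isascii() and first_11.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(ch) for ch in first_11]
    odd_sum = sum(digits[0::2])   # positions 1, 3, 5, ... (1-based)
    even_sum = sum(digits[1::2])  # positions 2, 4, 6, ...
    total = odd_sum * 3 + even_sum
    # The check digit brings the total up to the next multiple of 10.
    return (10 - total % 10) % 10


if __name__ == "__main__":
    base = "12345678901"
    print(base + str(upca_check_digit(base)))
```

Comparing the computed digit against the last digit of the `upc` field from the API response would be one way to tell real UPCs apart from padded SKUs.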
--- a/run.sh
+++ b/run.sh
@@ -1,31 +0,0 @@
 #!/bin/bash
 # Pokemon Discovery - Scraper & Catalog Generator Launcher
 # Automatically activates virtual environment and runs the scraper
 set -e
 cd "$(dirname "$0")"
 echo "Pokemon Discovery - Product Scraper & Catalog Generator"
 echo "================================================"
 # Check if virtual environment exists
 if [[ ! -d "venv" ]]; then
    echo "Creating virtual environment..."
    python3 -m venv venv
 fi
 # Activate virtual environment
 source venv/bin/activate
 # Check if requirements are installed
 if ! python -c "import requests, bs4, barcode, selenium" 2>/dev/null; then
    echo "Installing Python requirements..."
    pip install -r requirements.txt
 fi
 # Run the main script
 python run_scraper.py
 echo ""
 echo "Script completed. Check the output above for results."
--- a/run_scraper.py
+++ b/run_scraper.py
@@ -1,139 +0,0 @@
 #!/usr/bin/env python3
 """
 Pokemon Discovery - Scraper and Catalog Generator
 Main script that runs both scraping and PDF generation
 """
 import os
 import sys
 import subprocess
 from datetime import datetime
 from pathlib import Path
 def install_requirements():
    """Install Python requirements"""
    print("Installing Python requirements...")
    try:
        subprocess.run([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'], 
                      check=True)
        print("Requirements installed successfully!")
    except subprocess.CalledProcessError as e:
        print(f"Failed to install requirements: {e}")
        return False
    return True
 def run_scraper():
    """Run the scraper to collect product data"""
    print("=" * 60)
    print("STEP 1: SCRAPING POKEMON TCG PRODUCTS")
    print("=" * 60)
    try:
        result = subprocess.run([sys.executable, 'scraper.py'], 
                               capture_output=True, text=True)
        if result.returncode == 0:
            print("Scraping completed successfully!")
            print(result.stdout)
            # Find the generated JSON file
            json_files = list(Path('.').glob('pokemon_tcg_products_*.json'))
            if json_files:
                latest_file = max(json_files, key=os.path.getmtime)
                return str(latest_file)
            else:
                print("No JSON file was generated")
                return None
        else:
            print("Scraping failed:")
            print(result.stderr)
            return None
    except Exception as e:
        print(f"Error running scraper: {e}")
        return None
 def run_pdf_generator(json_file):
    """Run the PDF generator with the scraped data"""
    print("=" * 60)
    print("STEP 2: GENERATING PDF CATALOG")
    print("=" * 60)
    try:
        result = subprocess.run([sys.executable, 'pdf_generator.py', json_file], 
                               capture_output=True, text=True)
        if result.returncode == 0:
            print("PDF generation completed successfully!")
            print(result.stdout)
            return True
        else:
            print("PDF generation failed:")
            print(result.stderr)
            return False
    except Exception as e:
        print(f"Error running PDF generator: {e}")
        return False
 def main():
    print("Pokemon Discovery - Product Scraper & Catalog Generator")
    print("=" * 60)
    print(f"Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print()
    # Check if requirements are installed
    try:
        import requests, bs4, barcode, PIL
        print("✓ Required packages are available")
    except ImportError as e:
        print(f"✗ Missing required package: {e}")
        print("Installing requirements...")
        if not install_requirements():
            sys.exit(1)
    # Check if pandoc is available
    try:
        subprocess.run(['pandoc', '--version'], 
                      capture_output=True, check=True)
        print("✓ Pandoc is available for PDF generation")
    except (subprocess.CalledProcessError, FileNotFoundError):
        print("⚠ Pandoc not found. PDF generation may fail.")
        print("  Install pandoc with: sudo apt install pandoc (Ubuntu/Debian)")
        print("  or: brew install pandoc (macOS)")
        print("  or: pacman -S pandoc (Arch Linux)")
    print()
    # Run scraper
    json_file = run_scraper()
    if not json_file:
        print("Scraping failed. Exiting.")
        sys.exit(1)
    # Run PDF generator
    if run_pdf_generator(json_file):
        print("=" * 60)
        print("SUCCESS! Both scraping and PDF generation completed.")
        print("=" * 60)
        print(f"JSON data: {json_file}")
        print("PDF catalog: Check the catalog_output/ directory")
        print()
        print("Files generated:")
        # List generated files
        for file_pattern in ['pokemon_tcg_products_*.json', 'catalog_output/pokemon_tcg_catalog_*.pdf']:
            files = list(Path('.').glob(file_pattern))
            if files:
                latest = max(files, key=os.path.getmtime)
                print(f"  - {latest}")
    else:
        print("=" * 60)
        print("PARTIAL SUCCESS: Scraping completed, but PDF generation failed.")
        print("=" * 60)
        print(f"JSON data: {json_file}")
        print("You can manually run the PDF generator with:")
        print(f"  python3 pdf_generator.py {json_file}")
 if __name__ == "__main__":
    main()
--- a/scraper.py
+++ b/scraper.py
@@ -1,7 +1,20 @@
 #!/usr/bin/env python3
 """
-Pokemon Discovery - TCG Product Scraper for Dollar General
+Pokemon Discovery — Site Scraper (Reference)
-Scrapes product information and saves to JSON for PDF generation
+
 HTML + Selenium/Brave scraper for Dollar General product pages.
 Kept as a reference implementation. The primary tool is disco.py,
 which reads product data from a HAR capture instead of scraping live.
 This scraper can:
  - Fetch individual product pages and extract title, SKU, price, stock
  - Attempt to find product links from the category page (limited by
    dynamic JS loading — products are injected via API after page load)
  - Fall back to Brave browser via Selenium for JS-rendered content
 Usage:
    python scraper.py                  # Attempt full category scrape
    # Or import and use PokemonTCGScraper class directly for individual pages
 """
 import json
@@ -28,6 +41,14 @@ except ImportError:
    print("Selenium not available, using requests only (install selenium for Brave browser support)")
 class PokemonTCGScraper:
    """HTML/Selenium scraper for Dollar General Pokemon product pages.
    Can extract product details (title, SKU, price, stock) from individual
    product page URLs. Category-level scraping is limited because Dollar
    General loads products dynamically via a JS API call after page load.
    See disco.py for the HAR-based approach that bypasses this limitation.
    """
    def __init__(self):
        self.base_url = "https://www.dollargeneral.com"
        self.search_url = "https://www.dollargeneral.com/c/toys/pokemon?q=&soldAtStore=true"
@@ -300,9 +321,10 @@ class PokemonTCGScraper:
        return has_pokemon and has_tcg
    def try_api_scraping(self):
-        """
+        """Stub for API-based scraping (requires auth token).
-        Try to scrape products using the discovered API endpoint
+
-        This method contains the exact API call found via HAR analysis
+        Documents the discovered API endpoint and request format.
        Not functional — use disco.py with a HAR file instead.
        """
        print("🔬 Attempting API-based scraping...")
        print("   Endpoint: https://dggo.dollargeneral.com/omni/api/v2/category/search/provider")
--- a/test_api_scraper.py
+++ b/test_api_scraper.py
@@ -1,246 +0,0 @@
 #!/usr/bin/env python3
 """
 Test the Dollar General API endpoint for Pokemon products
 """
 import json
 import requests
 import sys
 from datetime import datetime
 def get_auth_token():
    """Get authentication token from Dollar General"""
    try:
        # Try to get token from the token endpoint
        token_url = 'https://www.dollargeneral.com/bin/omni/userTokens'
        headers = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0',
            'Accept': 'application/json, text/plain, */*',
            'Referer': 'https://www.dollargeneral.com/'
        }
        response = requests.get(token_url, headers=headers, timeout=30)
        if response.status_code == 200:
            data = response.json()
            # Look for access token in the response
            if 'access_token' in data:
                return data['access_token']
            elif 'token' in data:
                return data['token']
            else:
                print("Token response structure:", list(data.keys()))
                return None
        else:
            print(f"Failed to get token: {response.status_code}")
            return None
    except Exception as e:
        print(f"Error getting token: {e}")
        return None
 def test_api_with_existing_token():
    """Test with the token from HAR file"""
    # Token extracted from HAR file (may expire)
    har_token = "<REDACTED>"  # Bearer token (JWT) copied from the HAR capture; credentials should not be committed
    endpoint = "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider"
    headers = {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:148.0) Gecko/20100101 Firefox/148.0',
        'Accept': 'application/json, text/plain, */*',
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {har_token}',
        'Referer': 'https://www.dollargeneral.com/'
    }
    # Test different filter combinations
    test_requests = [
        {
            "name": "In Stock Pokemon Products",
            "payload": {
                "StoreNbr": 17506,
                "SearchTerm": None,
                "PageSize": 24,
                "PageStartRecordIndex": 0,
                "Filters": {
                    "category": [],
                    "brand": [],
                    "dgDelivery": False,
                    "dgPickUp": False,
                    "dgShipTohome": False,
                    "soldAtStore": True,
                    "inStock": True,
                    "onlyActivatedDeals": False
                },
                "IncludeSponsored": True,
                "IncludeShipToHome": True,
                "IncludeDeals": True,
                "offerSourceType": 0,
                "Id": 723960,  # Pokemon category ID
                "IncludeProducts": False,
                "DoNotSave": False,
                "OptOut": False,
                "SearchType": 1
            }
        },
        {
            "name": "All Pokemon Products (including out of stock)",
            "payload": {
                "StoreNbr": 17506,
                "SearchTerm": None,
                "PageSize": 24,
                "PageStartRecordIndex": 0,
                "Filters": {
                    "category": [],
                    "brand": [],
                    "dgDelivery": False,
                    "dgPickUp": False,
                    "dgShipTohome": False,
                    "soldAtStore": True,
                    "inStock": False,  # Include out of stock
                    "onlyActivatedDeals": False
                },
                "IncludeSponsored": True,
                "IncludeShipToHome": True,
                "IncludeDeals": True,
                "offerSourceType": 0,
                "Id": 723960,
                "IncludeProducts": False,
                "DoNotSave": False,
                "OptOut": False,
                "SearchType": 1
            }
        }
    ]
    all_pokemon_products = []
    for test in test_requests:
        print(f"=== Testing: {test['name']} ===")
        try:
            response = requests.post(endpoint, 
                                   headers=headers, 
                                   json=test['payload'], 
                                   timeout=30)
            print(f"Status Code: {response.status_code}")
            if response.status_code == 200:
                print(f"Response length: {len(response.text)} characters")
                print(f"Response preview: {response.text[:200]}...")
                try:
                    data = response.json()
                    items = data.get('ItemList', {}).get('Items', [])
                    print(f"Total products: {len(items)}")
                except Exception as json_error:
                    print(f"JSON parsing error: {json_error}")
                    print(f"Full response: {response.text}")
                    continue
                # Filter for Pokemon products
                pokemon_products = []
                for item in items:
                    title = item.get('Title', '').lower()
                    if any(keyword in title for keyword in ['pokemon', 'pokémon', 'trading card']):
                        product_info = {
                            'title': item.get('Title'),
                            'sku': item.get('ItemNbr'),
                            'upc': item.get('UPC'),
                            'price': item.get('Price', {}).get('Amount'),
                            'url': f"https://www.dollargeneral.com{item.get('ProductUrl', '')}",
                            'in_stock': item.get('Inventory', {}).get('InStock'),
                            'image_url': item.get('ImageURL'),
                            'description': item.get('Description', ''),
                            'brand': item.get('Brand', '')
                        }
                        pokemon_products.append(product_info)
                        all_pokemon_products.append(product_info)
                print(f"Pokemon products found: {len(pokemon_products)}")
                for i, prod in enumerate(pokemon_products, 1):
                    print(f"  {i}. {prod['title']}")
                    print(f"     SKU: {prod['sku']}, UPC: {prod['upc']}")
                    print(f"     Price: ${prod['price']}, In Stock: {prod['in_stock']}")
                    print(f"     URL: {prod['url']}")
                    # Check if this is our test product
                    if str(prod['sku']) == '41936301':
                        print(f"     🎯 THIS IS OUR TEST PRODUCT!")
                    print()
            elif response.status_code == 401:
                print("❌ Authentication failed - token may be expired")
                print("Response:", response.text)
                return None
            else:
                print(f"❌ API call failed: {response.status_code}")
                print("Response:", response.text[:500])
        except Exception as e:
            print(f"❌ Error: {e}")
        print("="*60)
        print()
    # Save results
    if all_pokemon_products:
        # Remove duplicates based on SKU
        unique_products = {prod['sku']: prod for prod in all_pokemon_products}.values()
        unique_products = list(unique_products)
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        filename = f'pokemon_tcg_api_results_{timestamp}.json'
        with open(filename, 'w') as f:
            json.dump(unique_products, f, indent=2)
        print(f"🎉 SUCCESS!")
        print(f"Found {len(unique_products)} unique Pokemon TCG products")
        print(f"Saved to: {filename}")
        return unique_products
    return None
 def main():
    print("Pokemon Discovery - API Endpoint Test")
    print("="*60)
    # First try to get a fresh token
    print("Attempting to get fresh authentication token...")
    fresh_token = get_auth_token()
    if fresh_token:
        print(f"✅ Got fresh token: {fresh_token[:50]}...")
    else:
        print("⚠️  Could not get fresh token, using HAR token")
    print()
    # Test API with existing token from HAR
    products = test_api_with_existing_token()
    if products:
        print()
        print("🚀 READY FOR INTEGRATION!")
        print("The API endpoint is working and can be integrated into Pokemon Discovery")
        print()
        # Check if our known product is in the results
        known_sku = '41936301'
        known_product = next((p for p in products if p['sku'] == known_sku), None)
        if known_product:
            print(f"✅ Confirmed: Our test product (SKU {known_sku}) was found via API!")
            print(f"   Title: {known_product['title']}")
            print(f"   URL: {known_product['url']}")
            print(f"   Stock: {known_product['in_stock']}")
    else:
        print("❌ API test failed - may need fresh authentication")
 if __name__ == "__main__":
    main()
--- a/test_barcode.py
+++ b/test_barcode.py
@@ -1,55 +0,0 @@
 #!/usr/bin/env python3
 """
 Test script to verify barcode generation functionality
 """
 import sys
 import os
 from pathlib import Path
 # Add current directory to path if running in venv
 sys.path.insert(0, '.')
 try:
    import barcode
    from barcode.writer import ImageWriter
    print("✓ Barcode generation libraries are available")
    # Test barcode generation
    test_sku = "123456789012"
    upc_generator = barcode.get_barcode_class('upca')
    test_barcode = upc_generator("12345678901", writer=ImageWriter())
    # Create test output directory
    test_dir = Path("test_output")
    test_dir.mkdir(exist_ok=True)
    # Generate test barcode
    barcode_path = test_dir / "test_barcode"
    test_barcode.save(str(barcode_path), options={
        'module_width': 0.2,
        'module_height': 15.0,
        'quiet_zone': 6.5,
        'font_size': 10,
        'text_distance': 5.0,
        'background': 'white',
        'foreground': 'black'
    })
    final_path = f"{barcode_path}.png"
    if os.path.exists(final_path):
        print(f"✓ Test barcode generated successfully: {final_path}")
        print(f"  File size: {os.path.getsize(final_path)} bytes")
    else:
        print(f"✗ Failed to generate test barcode")
        sys.exit(1)
 except ImportError as e:
    print(f"✗ Missing barcode library: {e}")
    sys.exit(1)
 except Exception as e:
    print(f"✗ Barcode generation failed: {e}")
    sys.exit(1)
 print("✓ All barcode generation tests passed!")
--- a/test_brave.py
+++ b/test_brave.py
@@ -1,67 +0,0 @@
 #!/usr/bin/env python3
 """
 Test Brave browser integration with Pokemon Discovery
 """
 import sys
 import os
 try:
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager
    print("✓ Selenium and webdriver-manager are available")
    # Check if Brave is available
    if not os.path.exists('/usr/bin/brave'):
        print("✗ Brave browser not found at /usr/bin/brave")
        sys.exit(1)
    print("✓ Brave browser found at /usr/bin/brave")
    # Get Brave version
    import subprocess
    try:
        result = subprocess.run(['/usr/bin/brave', '--version'], 
                              capture_output=True, text=True, timeout=5)
        brave_version = result.stdout.strip()
        print(f"✓ {brave_version}")
    except (OSError, subprocess.SubprocessError):
        print("⚠ Could not get Brave version")
    # Test ChromeDriver compatibility
    print("\nTesting ChromeDriver compatibility...")
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.binary_location = '/usr/bin/brave'
    try:
        service = Service(ChromeDriverManager().install())
        driver = webdriver.Chrome(service=service, options=options)
        # Simple test page
        driver.get("data:text/html,<html><body><h1>Test</h1></body></html>")
        title = driver.title
        driver.quit()
        print("✓ Brave + ChromeDriver test successful!")
        print("✓ Pokemon Discovery is ready to use Brave for dynamic content")
    except Exception as e:
        print(f"✗ ChromeDriver compatibility issue: {e}")
        print("\n💡 Solutions:")
        print("1. Update ChromeDriver: pip install --upgrade webdriver-manager")
        print("2. Install matching ChromeDriver version manually")
        print("3. Use Firefox with geckodriver as alternative")
        print("\nNote: The main PDF generation functionality works without browser automation")
 except ImportError as e:
    print(f"✗ Missing dependency: {e}")
    print("Run: pip install selenium webdriver-manager")
    sys.exit(1)
 print("\n🎯 Test completed!")
--- a/test_data.json
+++ b/test_data.json
@@ -1,26 +0,0 @@
 [
  {
    "title": "Pokemon Trading Card Game Battle Academy",
    "price": "$19.95",
    "stock": "In Stock",
    "sku": "DG12345678",
    "image_url": "https://via.placeholder.com/300x200?text=Pokemon+Battle+Academy",
    "url": "https://www.dollargeneral.com/p/pokemon-battle-academy"
  },
  {
    "title": "Pokemon TCG Scarlet & Violet Booster Pack",
    "price": "$4.25",
    "stock": "In Stock", 
    "sku": "DG87654321",
    "image_url": "https://via.placeholder.com/300x200?text=Pokemon+Booster+Pack",
    "url": "https://www.dollargeneral.com/p/pokemon-scarlet-violet-booster"
  },
  {
    "title": "Pokemon Tin Collection Box",
    "price": "$12.95",
    "stock": "Low Stock",
    "sku": "DG11223344",
    "image_url": "https://via.placeholder.com/300x200?text=Pokemon+Tin+Box",
    "url": "https://www.dollargeneral.com/p/pokemon-tin-collection"
  }
 ]
--- a/test_dynamic_scraping.py
+++ b/test_dynamic_scraping.py
@@ -1,152 +0,0 @@
 #!/usr/bin/env python3
 """
 Test dynamic content loading for Pokemon Discovery
 """
 import requests
 import json
 from bs4 import BeautifulSoup
 import time
 def test_api_endpoints():
    """Try to find API endpoints that might return product data"""
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'en-US,en;q=0.9',
        'Referer': 'https://www.dollargeneral.com/c/toys/pokemon'
    }
    # Test potential API endpoints
    api_tests = [
        'https://www.dollargeneral.com/api/products/search?q=pokemon',
        'https://www.dollargeneral.com/api/v1/products?category=toys&query=pokemon',
        'https://www.dollargeneral.com/dg/search?q=pokemon&category=toys',
        'https://www.dollargeneral.com/api/search?term=pokemon+trading+card',
    ]
    print("=== Testing API Endpoints ===")
    for url in api_tests:
        try:
            print(f"Testing: {url}")
            response = requests.get(url, headers=headers, timeout=10)
            print(f"  Status: {response.status_code}")
            if response.status_code == 200:
                try:
                    data = response.json()
                    print(f"  JSON Response: {len(str(data))} characters")
                    if 'products' in str(data).lower():
                        print("  ✓ Contains 'products'")
                    if 'pokemon' in str(data).lower():
                        print("  ✓ Contains 'pokemon'")
                except ValueError:
                    print(f"  Text Response: {len(response.text)} characters")
            print()
        except Exception as e:
            print(f"  Error: {e}")
            print()
 def test_network_requests():
    """Analyze the search page to find AJAX calls"""
    url = 'https://www.dollargeneral.com/c/toys/pokemon?q=&soldAtStore=true'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }
    print("=== Analyzing Search Page for API Calls ===")
    try:
        response = requests.get(url, headers=headers, timeout=30)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Look for API endpoints in JavaScript
        scripts = soup.find_all('script')
        api_patterns = []
        for script in scripts:
            if script.string:
                content = script.string
                # Look for API endpoints
                import re
                patterns = [
                    r'(?:api|Api|API)["\'\s]*[:=]["\'\s]*([^"\']+)',
                    r'(?:endpoint|url|baseURL)["\'\s]*[:=]["\'\s]*([^"\']+)',
                    r'fetch\s*\(\s*["\']([^"\']+)["\']',
                    r'xhr\.open\s*\(\s*["\'][^"\']*["\'],\s*["\']([^"\']+)["\']',
                    r'/api/[^"\'\s]+',
                    r'/search[^"\'\s]*',
                ]
                for pattern in patterns:
                    matches = re.findall(pattern, content, re.IGNORECASE)
                    for match in matches:
                        if 'dollargeneral' in match or match.startswith('/'):
                            api_patterns.append(match)
        # Remove duplicates and clean up
        unique_apis = list(set(api_patterns))
        print(f"Found {len(unique_apis)} potential API endpoints:")
        for api in unique_apis[:10]:  # Show first 10
            print(f"  -> {api}")
        return unique_apis
    except Exception as e:
        print(f"Error analyzing page: {e}")
        return []
 def test_sitemap_approach():
    """Try to find products via sitemap"""
    print("=== Testing Sitemap Approach ===")
    sitemap_urls = [
        'https://www.dollargeneral.com/sitemap.xml',
        'https://www.dollargeneral.com/robots.txt'
    ]
    for url in sitemap_urls:
        try:
            print(f"Testing: {url}")
            response = requests.get(url, timeout=10)
            print(f"  Status: {response.status_code}")
            if response.status_code == 200:
                content = response.text
                if 'pokemon' in content.lower():
                    print("  ✓ Contains Pokemon references")
                if '/p/' in content:
                    print("  ✓ Contains product URLs (/p/)")
                print(f"  Content length: {len(content)} characters")
            print()
        except Exception as e:
            print(f"  Error: {e}")
            print()
 if __name__ == "__main__":
    print("Pokemon Discovery - Dynamic Content Testing")
    print("=" * 60)
    print()
    # Test various approaches to find products
    test_api_endpoints()
    print()
    apis = test_network_requests()
    print()
    test_sitemap_approach()
    print()
    print("=" * 60)
    print("Summary:")
    print("- Individual product extraction: ✅ WORKING")
    print("- Product URLs can be processed if found")
    print("- Main challenge: Finding product URLs from search page")
    print("- Dynamic content requires browser automation or API discovery")
--- a/test_real_products.py
+++ b/test_real_products.py
@@ -1,165 +0,0 @@
 #!/usr/bin/env python3
 """
 Test Pokemon Discovery with real Dollar General Pokemon products
 Demonstrates full working pipeline with known products
 """
 import json
 import sys
 import os
 from datetime import datetime
 # Add current directory to path
 sys.path.insert(0, '.')
 from scraper import PokemonTCGScraper
 from pdf_generator import PokemonTCGCatalogGenerator
 def test_known_products():
    """Test with known Pokemon TCG products from Dollar General"""
    # Known Pokemon TCG products (you can add more as you find them)
    known_products = [
        'https://www.dollargeneral.com/p/pok-mon-trading-card-game-card-pack-ct/728192558375',
        # Add more product URLs here as they're discovered
    ]
    print("Pokemon Discovery - Real Product Test")
    print("=" * 50)
    print(f"Testing with {len(known_products)} known products")
    print()
    scraper = PokemonTCGScraper()
    products_found = []
    for i, url in enumerate(known_products, 1):
        print(f"Testing product {i}/{len(known_products)}")
        print(f"URL: {url}")
        # Get product page
        html = scraper.get_page_content(url)
        if html:
            # Extract product information
            product = scraper.extract_product_info(url, html)
            # Check if it's a Pokemon TCG product
            if scraper.is_pokemon_tcg_product(product):
                products_found.append(product)
                print(f"✓ FOUND: {product.get('title', 'Unknown')}")
                print(f"  SKU: {product.get('sku', 'N/A')}")
                print(f"  Price: {product.get('price', 'N/A')}")
                # Try to get additional data we might have missed
                if not product.get('price'):
                    print("  (Attempting to find price...)")
                    from bs4 import BeautifulSoup
                    soup = BeautifulSoup(html, 'html.parser')
                    # More price selectors
                    price_selectors = ['[data-testid="price"]', '.price-display', '.current-price', '[class*="price"]']
                    for selector in price_selectors:
                        price_elem = soup.select_one(selector)
                        if price_elem and not product.get('price'):
                            price_text = price_elem.get_text().strip()
                            if '$' in price_text:
                                product['price'] = price_text
                                print(f"  Found price: {price_text}")
                                break
                # Try to get stock info
                if not product.get('stock'):
                    print("  (Attempting to find stock status...)")
                    from bs4 import BeautifulSoup
                    soup = BeautifulSoup(html, 'html.parser')
                    # Look for stock indicators
                    if 'in stock' in html.lower():
                        product['stock'] = 'In Stock'
                    elif 'out of stock' in html.lower():
                        product['stock'] = 'Out of Stock'
                    elif 'available' in html.lower():
                        product['stock'] = 'Available'
                    else:
                        product['stock'] = 'Unknown'
                    print(f"  Stock: {product.get('stock')}")
            else:
                print("✗ Not a Pokemon TCG product")
        else:
            print("✗ Failed to get product page")
        print()
    if products_found:
        print(f"SUCCESS! Found {len(products_found)} Pokemon TCG products")
        print()
        # Save to JSON file
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        json_file = f'pokemon_tcg_products_real_{timestamp}.json'
        with open(json_file, 'w') as f:
            json.dump(products_found, f, indent=2)
        print(f"✓ Saved product data: {json_file}")
        # Generate PDF catalog
        print("✓ Generating PDF catalog...")
        try:
            generator = PokemonTCGCatalogGenerator(json_file)
            pdf_file = generator.generate_pdf()
            if pdf_file:
                print(f"✓ PDF catalog generated: {pdf_file}")
                # Show file sizes
                import os
                if os.path.exists(pdf_file):
                    size = os.path.getsize(pdf_file) / 1024
                    print(f"  PDF size: {size:.1f} KB")
                # Count barcodes generated
                barcode_dir = generator.barcodes_dir
                if barcode_dir.exists():
                    barcodes = list(barcode_dir.glob('*.png'))
                    print(f"  Barcodes generated: {len(barcodes)}")
                print()
                print("🎉 COMPLETE SUCCESS!")
                print("Pokemon Discovery successfully:")
                print(f"  • Scraped {len(products_found)} real products from Dollar General")
                print("  • Generated professional PDF catalog")
                print("  • Created scannable UPC-A barcodes")
                print("  • Used Unix-friendly timestamped files")
                return True
        except Exception as e:
            print(f"Error generating PDF: {e}")
            print("But product scraping was successful!")
            return True
    else:
        print("No Pokemon TCG products found.")
        print()
        print("This could be due to:")
        print("- Products no longer available")
        print("- Changed product URLs")
        print("- Need to find more current product URLs")
        return False
 if __name__ == "__main__":
    success = test_known_products()
    print()
    print("=" * 50)
    if success:
        print("✅ Pokemon Discovery is fully functional!")
        print("   Ready for production use with product URLs")
    else:
        print("⚠️  Product URL discovery needed")
        print("   Core functionality confirmed working")
    print("=" * 50)
Author	SHA1	Message	Date
pi-bot-01	0c7e139245	Clean up: remove obsolete files, update docs and docstrings. Removed 20 files: old test scripts, debug tools, duplicate docs, generated JSON, old PDF generator, launcher scripts. Kept: disco.py (main tool: scrape HAR + generate PDF); scraper.py (reference site scraper: HTML + Selenium/Brave); requirements.txt; *.har (browser capture with API data). Updated: README.md (rewritten to reflect current tool and usage); .gitignore (simplified); scraper.py (module/class/method docstrings updated to clarify this is a reference implementation, disco.py is primary).	2026-03-21 23:28:52 -07:00
pi-bot-01	90661e1957	Move all text above image: title, stock/price, SKU/UPC then picture then barcode	2026-03-21 23:19:07 -07:00
pi-bot-01	4b91ac5812	Fix UPC barcode: use first 11 digits, not last 11. digits[-11:] was dropping the first digit of 12-digit UPCs. digits[:11] correctly passes the first 11 digits to the barcode library, which calculates the matching check digit. 728192558375 now encodes correctly (was 2819255837X before).	2026-03-21 23:16:42 -07:00
pi-bot-01	dddfbe7355	Title above image, manifest table on first page. Page 1 (Manifest): header with title, source, date, count; table listing all products (#, name, SKU, price, stock qty). Product pages: title (bold, top); product image (bordered, centered); stock + price; UPC-A barcode (bordered, centered); SKU / UPC text.	2026-03-21 23:14:12 -07:00
pi-bot-01	ecc026d07b	Use UPC (not SKU) for barcode generation. UPC-A barcodes should encode the Universal Product Code, not the internal store SKU. The UPCs are already 12-digit numbers that match the barcodes on the physical product packaging.	2026-03-21 23:11:38 -07:00
pi-bot-01	f71df3f558	Fix SKU conversion: rootSV base + '01', not base + variant. rootSV '0419363_1' was producing '4193631' (wrong). Now correctly produces '41936301' (confirmed by user). The '_N' suffix is a variant/image index, not part of the SKU. Pattern: strip leading zero from base, append '01'.	2026-03-21 23:06:05 -07:00
pi-bot-01	c0ec0f947b	Match product.png layout: image, name, stock, barcode, SKU/UPC. Switched from pandoc markdown to direct LaTeX for precise layout control. Each product gets its own page matching the mockup: large bordered product image (centered); product name (bold, left); stock + price line; bordered UPC-A barcode (centered); SKU and UPC text (small, left). Fixed WebP→PNG image conversion (DG CDN serves WebP as .jpg). Compile directly with pdflatex (pandoc strips images from raw .tex). Output: 5.6MB PDF, 7 pages, 6 products with real images and barcodes.	2026-03-21 22:59:29 -07:00
pi-bot-01	e9efcf1460	Add disco.py: single working script that finds all pack/tin products and generates PDF. Extracts all 12 Pokemon products from HAR API responses, filters to 6 card pack and tin products, downloads product images, generates UPC-A barcodes, and produces a 157KB PDF catalog. Products found: 1. Pokémon Trading Card Game, 15 Card Pack (In Stock) 2. Pokémon TCG Booster Pack with Promo Card & Coin 3. Pokemon Trading Card Game Sword & Shield Booster Pack 4. Pokémon Collectible Stacking Tin 5. Pokémon Trading Card Game Mini Tin 6. Pokémon Trading Card Game, Gardevoir Strong Bond Tin	2026-03-21 16:12:14 -07:00
pi-bot-01	12448a09a0	🔍 Debug: Why only one product found - Dynamic loading analysis. ✅ MYSTERY SOLVED: Pokemon page loads but products are dynamic! 🔬 Analysis Results: • Pokemon page: ✅ Loads successfully (139KB HTML) • Static product links: ❌ 0 found (products load via JavaScript) • Pokemon mentions: ✅ 20 references in page • Category ID 723960: ✅ Found in page structure • Your test product: ❌ Not in static HTML (loads via API). 📋 New Debug Files: • debug_page_loading.py - Technical analysis of page loading • WHY_ONLY_ONE_PRODUCT.md - Complete explanation with solutions • pokemon_page_sample.html - Sample page content for analysis. 🎯 ROOT CAUSE: Dollar General uses dynamic content loading: 1. Page loads basic HTML structure 2. JavaScript makes API calls to get products 3. API returns 4-12 Pokemon products as JSON 4. Products rendered into DOM after page load 5. Static scraping misses the dynamic content. ✅ CONFIRMED: The Pokemon page IS being scraped correctly! ❌ ISSUE: Products aren't IN the page - they're loaded separately. 🎉 SOLUTION: We already discovered the API endpoint via HAR analysis. This explains why our API discovery was so valuable - that's where the real product data lives!	2026-03-21 15:39:48 -07:00
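The two fixes described in commits 4b91ac5812 (UPC barcode) and f71df3f558 (SKU conversion) can be illustrated with a small sketch. This is not the code from disco.py, which is not shown here; the function names `upc_a_check_digit`, `barcode_payload`, and `sku_from_root_sv` are hypothetical, and the SKU rule is only the pattern stated in the commit message (strip one leading zero from the base, drop the `_N` suffix, append `01`). The check-digit arithmetic is the standard UPC-A rule.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Standard UPC-A check digit for the first 11 digits of a code."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # positions 1, 3, ..., 11 (weight 3)
    even_sum = sum(digits[1::2])  # positions 2, 4, ..., 10 (weight 1)
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def barcode_payload(upc: str) -> str:
    """Return the first 11 digits of a UPC (hypothetical helper).

    Taking the first 11 digits, as the commit describes, leaves the
    barcode library free to compute the 12th (check) digit itself.
    """
    digits = "".join(ch for ch in upc if ch.isascii() and ch.isdigit())
    return digits[:11]


def sku_from_root_sv(root_sv: str) -> str:
    """Convert a rootSV such as '0419363_1' to a SKU (assumed pattern).

    The '_N' suffix is treated as a variant/image index and discarded;
    one leading zero is stripped from the base and '01' is appended.
    """
    base = root_sv.split("_", 1)[0]
    if base.startswith("0"):
        base = base[1:]
    return base + "01"


if __name__ == "__main__":
    payload = barcode_payload("728192558375")
    print(payload)                         # 72819255837
    print(upc_a_check_digit(payload))      # 5
    print(sku_from_root_sv("0419363_1"))   # 41936301
```

With the values quoted in the commit messages, the computed check digit 5 matches the final digit of 728192558375, and the rootSV `0419363_1` maps to the SKU `41936301` that the commit reports as confirmed.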