Initial commit: Pokemon Discovery - TCG product scraper and PDF catalog generator

- Comprehensive scraper for Dollar General Pokemon TCG products
- Professional PDF catalog generator with UPC-A barcodes
- Robust anti-bot handling with requests + Selenium fallback
- Automatic image downloading and barcode generation
- Unix-friendly timestamped filenames
- Virtual environment support and dependency management
- Complete documentation and usage guides
2026-03-21 14:41:17 -07:00
commit e6dd999aeb
9 changed files with 1200 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,37 @@
+# Virtual environment
+venv/
+env/
+.env
+
+# Python cache
+__pycache__/
+*.pyc
+*.pyo
+*.pyd
+.Python
+*.so
+.pytest_cache/
+
+# Output files
+*.json
+catalog_output/
+test_output/
+
+# Logs
+*.log
+
+# OS files
+.DS_Store
+Thumbs.db
+.directory
+
+# IDE files
+.vscode/
+.idea/
+*.swp
+*.swo
+
+# Temporary files
+*.tmp
+*.temp
+.cache/
--- a/README.md
+++ b/README.md
@@ -0,0 +1,208 @@
+# Pokemon Discovery (pokemon-disco)
+
+A comprehensive tool for discovering Pokemon Trading Card Game products from Dollar General's website and generating a professional PDF catalog with product images, details, and UPC-A barcodes.
+
+## Features
+
+- **Web Scraping**: Automatically scrapes Pokemon TCG products from Dollar General
+- **Robust Data Extraction**: Extracts product name, price, stock status, SKU, and images
+- **Anti-Bot Handling**: Uses both requests and Selenium for dynamic content
+- **Barcode Generation**: Creates UPC-A barcodes for each product SKU
+- **PDF Catalog**: Professional PDF with images, details, and barcodes
+- **Unix-Friendly Naming**: Timestamped filenames for easy sorting
+
+## Requirements
+
+### System Requirements
+- Python 3.7+
+- pandoc with a LaTeX engine (`xelatex` or `pdflatex`) for PDF generation
+- Chrome/Chromium browser (for Selenium fallback)
+
+### Python Dependencies
+All dependencies are automatically installed via `requirements.txt`:
+- requests
+- beautifulsoup4
+- selenium
+- webdriver-manager
+- python-barcode
+- Pillow
+- pandas
+- lxml
+
+## Installation
+
+1. **Clone/Download** this directory to your system
+
+2. **Install pandoc** (required for PDF generation):
+   ```bash
+   # Ubuntu/Debian
+   sudo apt install pandoc
+   
+   # macOS
+   brew install pandoc
+   
+   # Arch Linux
+   sudo pacman -S pandoc
+   ```
+
+3. **Install Python dependencies** (`run.sh` and `run_scraper.py` also do this automatically when packages are missing):
+   ```bash
+   cd pokemon-disco
+   pip3 install -r requirements.txt
+   ```
+
+## Usage
+
+### Quick Start (Recommended)
+
+Run the complete pipeline with one command:
+
+```bash
+cd pokemon-disco
+python3 run_scraper.py
+```
+
+This will:
+1. Check and install Python requirements
+2. Scrape Pokemon TCG products from Dollar General
+3. Generate a PDF catalog with images and barcodes
+4. Create timestamped files for easy organization
+
+### Manual Usage
+
+If you prefer to run components separately:
+
+#### 1. Scrape Products
+```bash
+python3 scraper.py
+```
+This creates a JSON file like `pokemon_tcg_products_20241221_143025.json`
+
+#### 2. Generate PDF Catalog
+```bash
+python3 pdf_generator.py pokemon_tcg_products_20241221_143025.json
+```
+
+## Output Files
+
+### Generated Files
+- **JSON Data**: `pokemon_tcg_products_YYYYMMDD_HHMMSS.json`
+  - Raw scraped data in JSON format
+  - Contains all product information
+
+- **PDF Catalog**: `catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`
+  - Professional PDF catalog
+  - Includes product images, details, and UPC-A barcodes
+
+### Output Directory Structure
+```
+pokemon-disco/
+├── pokemon_tcg_products_YYYYMMDD_HHMMSS.json
+├── catalog_output/
+│   ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf
+│   ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.md
+│   ├── images/
+│   │   ├── product_1_SKU123.jpg
+│   │   ├── product_2_SKU456.jpg
+│   │   └── placeholder.png
+│   └── barcodes/
+│       ├── barcode_SKU123.png
+│       ├── barcode_SKU456.png
+│       └── ...
+```
+
+## PDF Catalog Features
+
+Each product in the PDF includes:
+- **Product Image**: Downloaded from Dollar General or placeholder
+- **Product Details Table**:
+  - Title
+  - Price
+  - Stock Status
+  - SKU (formatted as code)
+  - Product URL
+- **UPC-A Barcode**: Generated from SKU for inventory management
+
+## Data Fields Extracted
+
+For each Pokemon TCG product:
+- `title`: Product name
+- `price`: Current price
+- `stock`: Availability status
+- `sku`: Product SKU/item number
+- `image_url`: Direct link to product image
+- `url`: Link to product page
+
+## Troubleshooting
+
+### Common Issues
+
+1. **No products found**
+   - Dollar General may have anti-bot protection
+   - The script will automatically retry with Selenium
+   - Website structure may have changed
+
+2. **PDF generation fails**
+   - Ensure pandoc is installed: `pandoc --version`
+   - The script tries `xelatex` first and falls back to `pdflatex`; make sure at least one LaTeX engine is installed
+   - Markdown file is still generated for manual conversion
+
+3. **Image download failures**
+   - Network connectivity issues
+   - Placeholder images will be used automatically
+
+4. **Chrome/Selenium issues**
+   - Ensure Chrome or Chromium is installed
+   - webdriver-manager will automatically download ChromeDriver
+   - If Selenium cannot be imported, the scraper runs in requests-only mode
+
+### Debug Mode
+
+To see more detailed output, check the console output during scraping. The scripts provide detailed logging of:
+- Which products are found and filtered
+- Network request status
+- File generation progress
+
+## Technical Details
+
+### Scraping Strategy
+1. **Primary Method**: Uses requests with browser-like headers
+2. **Fallback Method**: Selenium with headless Chrome for dynamic content
+3. **Product Filtering**: Only includes products matching Pokemon TCG keywords
+4. **Rate Limiting**: 1-second delay between requests to be respectful
+
+### Barcode Generation
+- Strips non-digit characters from the SKU, then left-pads with zeros (or keeps the last 11 digits) to get an 11-digit base
+- Generates UPC-A barcodes with check digits
+- High-quality PNG images suitable for printing
+
+### PDF Generation
+- Uses pandoc with LaTeX for professional formatting
+- Includes table of contents
+- Optimized for printing and digital viewing
+- Images scaled appropriately for page layout
+
+## Customization
+
+### Modifying Product Filters
+Edit the `is_pokemon_tcg_product()` method in `scraper.py` to change which products are included.
+
+### Changing PDF Layout
+Modify the markdown generation in `pdf_generator.py` or add custom pandoc templates.
+
+### Adding New Data Fields
+Extend the `extract_product_info()` method in `scraper.py` to capture additional product information.
+
+## License
+
+This tool is for educational and personal use. Please respect Dollar General's terms of service and robots.txt when using this scraper.
+
+## Support
+
+If you encounter issues:
+1. Check the console output for error messages
+2. Ensure all system requirements are installed
+3. Verify internet connectivity
+4. Check if the Dollar General website structure has changed
+
+Generated files include timestamps for easy organization and version tracking.
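
The "Barcode Generation" notes in README.md above describe reducing a SKU to 11 digits and adding a check digit. The following is a minimal sketch of that arithmetic using only the standard library. The helper names `sku_to_upc_base` and `upc_check_digit` are hypothetical and are not part of this commit; in `pdf_generator.py` the check digit is left to the `python-barcode` library.

```python
def sku_to_upc_base(sku):
    """Reduce a SKU to the 11-digit base of a UPC-A code (hypothetical helper).

    Mirrors the normalization in generate_upc_barcode(): drop non-digits,
    left-pad with zeros, or keep the last 11 digits if there are more.
    """
    digits_only = "".join(ch for ch in str(sku) if ch.isdigit())
    if len(digits_only) < 11:
        return digits_only.zfill(11)
    return digits_only[-11:]


def upc_check_digit(base):
    """Compute the 12th (check) digit for an 11-digit UPC-A base."""
    if len(base) != 11 or not base.isdigit():
        raise ValueError("UPC-A base must be exactly 11 digits")
    # Positions are counted from 1: odd positions are weighted by 3.
    odd_sum = sum(int(d) for d in base[0::2])
    even_sum = sum(int(d) for d in base[1::2])
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


if __name__ == "__main__":
    base = sku_to_upc_base("SKU123")
    print(base + str(upc_check_digit(base)))
```

Note that a SKU with no digits at all normalizes to eleven zeros, so the resulting barcode identifies nothing; the scraped SKU is also not guaranteed to be the product's real UPC.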
--- a/USAGE.md
+++ b/USAGE.md
@@ -0,0 +1,115 @@
+# Quick Start Guide
+
+## Simple Usage (Recommended)
+
+1. **Make sure you're in the project directory:**
+   ```bash
+   cd pokemon-disco
+   ```
+
+2. **Run the complete scraper and PDF generator:**
+   ```bash
+   ./run.sh
+   ```
+   
+   This single command will:
+   - Set up the Python virtual environment
+   - Install all required packages
+   - Scrape Pokemon TCG products from Dollar General
+   - Generate a professional PDF catalog with barcodes
+   - Create timestamped files for easy organization
+
+## What You'll Get
+
+### Generated Files:
+- **`pokemon_tcg_products_YYYYMMDD_HHMMSS.json`** - Raw data in JSON format
+- **`catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf`** - Professional PDF catalog
+
+### PDF Catalog Contents:
+- Product images (downloaded automatically)
+- Product details (title, price, stock, SKU)
+- UPC-A barcodes for each product (generated from SKU)
+- Table of contents for easy navigation
+- Professional formatting suitable for printing
+
+## Alternative Commands
+
+If you prefer more control:
+
+```bash
+# Activate virtual environment first
+source venv/bin/activate
+
+# Run only the scraper
+python scraper.py
+
+# Run only the PDF generator (after scraping)
+python pdf_generator.py pokemon_tcg_products_YYYYMMDD_HHMMSS.json
+
+# Run everything (installs requirements automatically)
+python run_scraper.py
+```
+
+## Output Location
+
+All generated files will be in:
+- JSON data: Current directory
+- PDF catalog: `catalog_output/` directory
+- Product images: `catalog_output/images/`
+- Barcode images: `catalog_output/barcodes/`
+
+## Requirements
+
+- Python 3.7+
+- pandoc and a LaTeX engine (for PDF generation)
+- Internet connection (for scraping)
+
+`run.sh` creates the virtual environment and installs the Python dependencies automatically.
+
+## Troubleshooting
+
+If you encounter issues:
+
+1. **Permission denied:** Make sure the script is executable:
+   ```bash
+   chmod +x run.sh
+   ```
+
+2. **Pandoc not found:** Install pandoc for your system:
+   ```bash
+   # Ubuntu/Debian
+   sudo apt install pandoc
+   
+   # Arch Linux  
+   sudo pacman -S pandoc
+   
+   # macOS
+   brew install pandoc
+   ```
+
+3. **No products found:** The website may have anti-bot protection or changed structure. The script includes fallback mechanisms.
+
+4. **PDF generation fails:** The markdown file will still be generated, which you can manually convert or view.
+
+## File Naming Convention
+
+All output files include Unix-friendly timestamps:
+- Format: `YYYYMMDD_HHMMSS` (e.g., `20241221_143025`)
+- This ensures chronological sorting with the `ls` command
+- No spaces or special characters for script-friendly handling
+
+## Example Output
+
+```
+pokemon-disco/
+├── pokemon_tcg_products_20241221_143025.json     # Scraped data
+├── catalog_output/
+│   ├── pokemon_tcg_catalog_20241221_143025.pdf   # Final catalog
+│   ├── pokemon_tcg_catalog_20241221_143025.md    # Markdown source
+│   ├── images/
+│   │   ├── product_1_SKU123456.jpg               # Product images
+│   │   └── product_2_SKU789012.jpg
+│   └── barcodes/
+│       ├── barcode_SKU123456.png                 # UPC-A barcodes
+│       └── barcode_SKU789012.png
+```
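
The "File Naming Convention" section of USAGE.md above relies on `YYYYMMDD_HHMMSS` stamps sorting chronologically as plain strings. A small sketch of that property follows; `timestamped_name` and `latest_by_name` are hypothetical helpers for illustration, not functions from this commit (`run_scraper.py` picks the newest file by `os.path.getctime` instead).

```python
from datetime import datetime


def timestamped_name(prefix, ext, now=None):
    """Build a name like pokemon_tcg_products_20241221_143025.json."""
    now = now or datetime.now()
    return f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}.{ext}"


def latest_by_name(names):
    """Return the newest file name, or None for an empty list.

    Because the stamp is fixed-width and ordered year-to-second, plain
    string comparison matches chronological order for a shared prefix.
    """
    return max(names) if names else None


if __name__ == "__main__":
    names = [
        timestamped_name("pokemon_tcg_products", "json", datetime(2024, 12, 21, 14, 30, 25)),
        timestamped_name("pokemon_tcg_products", "json", datetime(2025, 1, 3, 9, 5, 0)),
    ]
    print(latest_by_name(names))
```

Selecting by name rather than by file creation time keeps the result stable when files are copied or restored from a backup.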
--- a/pdf_generator.py
+++ b/pdf_generator.py
@@ -0,0 +1,278 @@
+#!/usr/bin/env python3
+"""
+Pokemon Discovery - TCG Product Catalog PDF Generator
+Generates PDF catalog with product images, details, and UPC-A barcodes
+"""
+
+import json
+import os
+import sys
+import requests
+import subprocess
+from datetime import datetime
+from pathlib import Path
+import barcode
+from barcode.writer import ImageWriter
+from PIL import Image, ImageDraw, ImageFont
+import tempfile
+import shutil
+
+class PokemonTCGCatalogGenerator:
+    def __init__(self, json_file):
+        self.json_file = json_file
+        self.output_dir = Path("catalog_output")
+        self.images_dir = self.output_dir / "images"
+        self.barcodes_dir = self.output_dir / "barcodes"
+        
+        # Create output directories
+        self.output_dir.mkdir(exist_ok=True)
+        self.images_dir.mkdir(exist_ok=True)
+        self.barcodes_dir.mkdir(exist_ok=True)
+        
+        # Load product data
+        with open(json_file, 'r') as f:
+            self.products = json.load(f)
+    
+    def download_image(self, url, filename):
+        """Download product image"""
+        if not url:
+            return None
+            
+        try:
+            response = requests.get(url, timeout=30)
+            response.raise_for_status()
+            
+            filepath = self.images_dir / filename
+            with open(filepath, 'wb') as f:
+                f.write(response.content)
+            
+            return filepath
+        except Exception as e:
+            print(f"Failed to download image {url}: {e}")
+            return None
+    
+    def generate_upc_barcode(self, sku):
+        """Generate UPC-A barcode from SKU"""
+        try:
+            # Convert SKU to 12-digit UPC-A format
+            # Remove non-digits and pad/truncate to 11 digits (12th is check digit)
+            digits_only = ''.join(filter(str.isdigit, str(sku)))
+            
+            if len(digits_only) < 11:
+                # Pad with zeros at the start
+                upc_base = digits_only.zfill(11)
+            else:
+                # Take the last 11 digits
+                upc_base = digits_only[-11:]
+            
+            # Generate UPC-A barcode
+            upc_generator = barcode.get_barcode_class('upca')
+            upc = upc_generator(upc_base, writer=ImageWriter())
+            
+            # Save barcode image
+            barcode_filename = f"barcode_{str(sku).replace('/', '_').replace(' ', '_')}.png"
+            barcode_path = self.barcodes_dir / barcode_filename
+            
+            # Save with specific options for better appearance
+            upc.save(str(barcode_path).replace('.png', ''), options={
+                'module_width': 0.2,
+                'module_height': 15.0,
+                'quiet_zone': 6.5,
+                'font_size': 10,
+                'text_distance': 5.0,
+                'background': 'white',
+                'foreground': 'black'
+            })
+            
+            return str(barcode_path)  # save() appended ".png" to the stripped name, so this is the file on disk
+            
+        except Exception as e:
+            print(f"Failed to generate barcode for SKU {sku}: {e}")
+            return None
+    
+    def create_placeholder_image(self, width=300, height=200):
+        """Create a placeholder image when product image is not available"""
+        img = Image.new('RGB', (width, height), color='lightgray')
+        draw = ImageDraw.Draw(img)
+        
+        try:
+            # Try to use a system font
+            font = ImageFont.truetype('/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf', 24)
+        except OSError:
+            try:
+                font = ImageFont.truetype('arial.ttf', 24)
+            except OSError:
+                font = ImageFont.load_default()
+        
+        text = "No Image\nAvailable"
+        
+        # Get text bounding box for centering
+        lines = text.split('\n')
+        y_offset = height // 2 - (len(lines) * 30) // 2
+        
+        for line in lines:
+            bbox = draw.textbbox((0, 0), line, font=font)
+            text_width = bbox[2] - bbox[0]
+            x_offset = (width - text_width) // 2
+            draw.text((x_offset, y_offset), line, fill='darkgray', font=font)
+            y_offset += 30
+        
+        placeholder_path = self.images_dir / "placeholder.png"
+        img.save(placeholder_path)
+        return placeholder_path
+    
+    def generate_markdown(self):
+        """Generate markdown content for the catalog"""
+        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+        markdown = f"""---
+title: "Pokemon TCG Product Catalog"
+subtitle: "Dollar General - Generated {timestamp}"
+author: "Automated Scraper"
+date: "{timestamp}"
+geometry: margin=1in
+fontsize: 11pt
+documentclass: article
+---
+
+# Pokemon TCG Product Catalog
+
+Generated on: {timestamp}  
+Source: Dollar General  
+Total Products: {len(self.products)}
+
+---
+
+"""
+        
+        for i, product in enumerate(self.products, 1):
+            print(f"Processing product {i}/{len(self.products)}: {product.get('title', 'Unknown')}")
+            
+            # Download product image
+            image_path = None
+            if product.get('image_url'):
+                filename = f"product_{i}_{str(product.get('sku', 'unknown')).replace('/', '_').replace(' ', '_')}.jpg"
+                image_path = self.download_image(product.get('image_url'), filename)
+            
+            if not image_path:
+                # Use placeholder
+                image_path = self.create_placeholder_image()
+            
+            # Generate barcode
+            barcode_path = None
+            if product.get('sku'):
+                barcode_path = self.generate_upc_barcode(product.get('sku'))
+            
+            # Add product section to markdown
+            markdown += f"## {i}. {product.get('title', 'Unknown Product')}\n\n"
+            
+            # Product image
+            if image_path:
+                rel_image_path = os.path.relpath(image_path, self.output_dir)
+                markdown += f"![Product Image]({rel_image_path}){{width=300px}}\n\n"
+            
+            # Product details in a table
+            markdown += "| Field | Value |\n"
+            markdown += "|-------|-------|\n"
+            markdown += f"| **Title** | {product.get('title', 'N/A')} |\n"
+            markdown += f"| **Price** | {product.get('price', 'N/A')} |\n"
+            markdown += f"| **Stock** | {product.get('stock', 'N/A')} |\n"
+            markdown += f"| **SKU** | `{product.get('sku', 'N/A')}` |\n"
+            markdown += f"| **URL** | {product.get('url', 'N/A')} |\n"
+            markdown += "\n"
+            
+            # Barcode
+            if barcode_path:
+                rel_barcode_path = os.path.relpath(barcode_path, self.output_dir)
+                markdown += f"**UPC-A Barcode:**\n\n"
+                markdown += f"![UPC-A Barcode]({rel_barcode_path}){{width=200px}}\n\n"
+            
+            markdown += "---\n\n"
+        
+        return markdown
+    
+    def generate_pdf(self):
+        """Generate PDF catalog using pandoc"""
+        print("Generating markdown content...")
+        markdown_content = self.generate_markdown()
+        
+        # Save markdown file
+        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+        markdown_file = self.output_dir / f"pokemon_tcg_catalog_{timestamp}.md"
+        
+        with open(markdown_file, 'w', encoding='utf-8') as f:
+            f.write(markdown_content)
+        
+        print(f"Markdown saved to: {markdown_file}")
+        
+        # Generate PDF using pandoc
+        pdf_file = self.output_dir / f"pokemon_tcg_catalog_{timestamp}.pdf"
+        
+        print("Converting to PDF using pandoc...")
+        
+        try:
+            subprocess.run([
+                'pandoc',
+                str(markdown_file),
+                '-o', str(pdf_file),
+                '--pdf-engine=xelatex',
+                '-V', 'colorlinks=true',
+                '-V', 'linkcolor=blue',
+                '-V', 'filecolor=magenta',
+                '-V', 'urlcolor=cyan',
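+                '--toc',
+                '--toc-depth=2',
+                # Image links in the markdown are relative to the output directory, not the cwd
+                '--resource-path', str(self.output_dir)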
+            ], check=True)
+            
+            print(f"PDF generated successfully: {pdf_file}")
+            return pdf_file
+            
+        except subprocess.CalledProcessError as e:
+            print(f"Pandoc conversion failed: {e}")
+            print("Trying with pdflatex instead...")
+            
+            try:
+                subprocess.run([
+                    'pandoc',
+                    str(markdown_file),
+                    '-o', str(pdf_file),
+                    '--pdf-engine=pdflatex',
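+                    '--toc',
+                    # Image links in the markdown are relative to the output directory, not the cwd
+                    '--resource-path', str(self.output_dir)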
+                ], check=True)
+                
+                print(f"PDF generated successfully: {pdf_file}")
+                return pdf_file
+                
+            except subprocess.CalledProcessError as e2:
+                print(f"PDF generation failed with both engines: {e2}")
+                print(f"Markdown file available at: {markdown_file}")
+                return None
+        
+        except FileNotFoundError:
+            print("Error: pandoc not found. Please install pandoc to generate PDF.")
+            print(f"Markdown file available at: {markdown_file}")
+            return None
+
+def main():
+    if len(sys.argv) != 2:
+        print("Usage: python3 pdf_generator.py <json_file>")
+        print("Example: python3 pdf_generator.py pokemon_tcg_products_20241221_143025.json")
+        sys.exit(1)
+    
+    json_file = sys.argv[1]
+    
+    if not os.path.exists(json_file):
+        print(f"Error: JSON file '{json_file}' not found")
+        sys.exit(1)
+    
+    generator = PokemonTCGCatalogGenerator(json_file)
+    pdf_file = generator.generate_pdf()
+    
+    if pdf_file:
+        print(f"\nCatalog generation completed!")
+        print(f"PDF file: {pdf_file}")
+        print(f"Output directory: {generator.output_dir}")
+    else:
+        print(f"\nPDF generation failed, but markdown file is available in: {generator.output_dir}")
+
+if __name__ == "__main__":
+    main()
--- a/requirements.txt
+++ b/requirements.txt
@@ -0,0 +1,8 @@
+requests
+beautifulsoup4
+selenium
+webdriver-manager
+python-barcode
+Pillow
+pandas
+lxml
--- a/run.sh
+++ b/run.sh
@@ -0,0 +1,31 @@
+#!/bin/bash
+# Pokemon Discovery - Scraper & Catalog Generator Launcher
+# Automatically activates virtual environment and runs the scraper
+
+set -e
+
+cd "$(dirname "$0")"
+
+echo "Pokemon Discovery - Product Scraper & Catalog Generator"
+echo "================================================"
+
+# Check if virtual environment exists
+if [[ ! -d "venv" ]]; then
+    echo "Creating virtual environment..."
+    python3 -m venv venv
+fi
+
+# Activate virtual environment
+source venv/bin/activate
+
+# Check if requirements are installed
+if ! python -c "import requests, bs4, barcode, selenium" 2>/dev/null; then
+    echo "Installing Python requirements..."
+    pip install -r requirements.txt
+fi
+
+# Run the main script
+python run_scraper.py
+
+echo ""
+echo "Script completed. Check the output above for results."
--- a/run_scraper.py
+++ b/run_scraper.py
@@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+"""
+Pokemon Discovery - Scraper and Catalog Generator
+Main script that runs both scraping and PDF generation
+"""
+
+import os
+import sys
+import subprocess
+from datetime import datetime
+from pathlib import Path
+
+def install_requirements():
+    """Install Python requirements"""
+    print("Installing Python requirements...")
+    try:
+        subprocess.run([sys.executable, '-m', 'pip', 'install', '-r', 'requirements.txt'], 
+                      check=True)
+        print("Requirements installed successfully!")
+    except subprocess.CalledProcessError as e:
+        print(f"Failed to install requirements: {e}")
+        return False
+    return True
+
+def run_scraper():
+    """Run the scraper to collect product data"""
+    print("=" * 60)
+    print("STEP 1: SCRAPING POKEMON TCG PRODUCTS")
+    print("=" * 60)
+    
+    try:
+        result = subprocess.run([sys.executable, 'scraper.py'], 
+                               capture_output=True, text=True)
+        
+        if result.returncode == 0:
+            print("Scraping completed successfully!")
+            print(result.stdout)
+            
+            # Find the generated JSON file
+            json_files = list(Path('.').glob('pokemon_tcg_products_*.json'))
+            if json_files:
+                latest_file = max(json_files, key=os.path.getctime)
+                return str(latest_file)
+            else:
+                print("No JSON file was generated")
+                return None
+        else:
+            print("Scraping failed:")
+            print(result.stderr)
+            return None
+            
+    except Exception as e:
+        print(f"Error running scraper: {e}")
+        return None
+
+def run_pdf_generator(json_file):
+    """Run the PDF generator with the scraped data"""
+    print("=" * 60)
+    print("STEP 2: GENERATING PDF CATALOG")
+    print("=" * 60)
+    
+    try:
+        result = subprocess.run([sys.executable, 'pdf_generator.py', json_file], 
+                               capture_output=True, text=True)
+        
+        if result.returncode == 0:
+            print("PDF generation completed successfully!")
+            print(result.stdout)
+            return True
+        else:
+            print("PDF generation failed:")
+            print(result.stderr)
+            return False
+            
+    except Exception as e:
+        print(f"Error running PDF generator: {e}")
+        return False
+
+def main():
+    print("Pokemon Discovery - Product Scraper & Catalog Generator")
+    print("=" * 60)
+    print(f"Started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
+    print()
+    
+    # Check if requirements are installed
+    try:
+        import requests, bs4, barcode, PIL
+        print("✓ Required packages are available")
+    except ImportError as e:
+        print(f"✗ Missing required package: {e}")
+        print("Installing requirements...")
+        if not install_requirements():
+            sys.exit(1)
+    
+    # Check if pandoc is available
+    try:
+        subprocess.run(['pandoc', '--version'], 
+                      capture_output=True, check=True)
+        print("✓ Pandoc is available for PDF generation")
+    except (subprocess.CalledProcessError, FileNotFoundError):
+        print("⚠ Pandoc not found. PDF generation may fail.")
+        print("  Install pandoc with: sudo apt install pandoc (Ubuntu/Debian)")
+        print("  or: brew install pandoc (macOS)")
+        print("  or: sudo pacman -S pandoc (Arch Linux)")
+    
+    print()
+    
+    # Run scraper
+    json_file = run_scraper()
+    if not json_file:
+        print("Scraping failed. Exiting.")
+        sys.exit(1)
+    
+    # Run PDF generator
+    if run_pdf_generator(json_file):
+        print("=" * 60)
+        print("SUCCESS! Both scraping and PDF generation completed.")
+        print("=" * 60)
+        print(f"JSON data: {json_file}")
+        print("PDF catalog: Check the catalog_output/ directory")
+        print()
+        print("Files generated:")
+        
+        # List generated files
+        for file_pattern in ['pokemon_tcg_products_*.json', 'catalog_output/pokemon_tcg_catalog_*.pdf']:
+            files = list(Path('.').glob(file_pattern))
+            if files:
+                latest = max(files, key=os.path.getctime)
+                print(f"  - {latest}")
+    else:
+        print("=" * 60)
+        print("PARTIAL SUCCESS: Scraping completed, but PDF generation failed.")
+        print("=" * 60)
+        print(f"JSON data: {json_file}")
+        print("You can manually run the PDF generator with:")
+        print(f"  python3 pdf_generator.py {json_file}")
+
+if __name__ == "__main__":
+    main()
--- a/scraper.py
+++ b/scraper.py
@@ -0,0 +1,329 @@
+#!/usr/bin/env python3
+"""
+Pokemon Discovery - TCG Product Scraper for Dollar General
+Scrapes product information and saves to JSON for PDF generation
+"""
+
+import json
+import os
+import time
+import requests
+from datetime import datetime
+from urllib.parse import urljoin, urlparse
+import pandas as pd
+from bs4 import BeautifulSoup
+
+# Try selenium imports (fallback for dynamic content)
+try:
+    from selenium import webdriver
+    from selenium.webdriver.chrome.options import Options
+    from selenium.webdriver.common.by import By
+    from selenium.webdriver.support.ui import WebDriverWait
+    from selenium.webdriver.support import expected_conditions as EC
+    from selenium.common.exceptions import TimeoutException
+    from webdriver_manager.chrome import ChromeDriverManager
+    SELENIUM_AVAILABLE = True
+except ImportError:
+    SELENIUM_AVAILABLE = False
+    print("Selenium not available, using requests only")
+
+class PokemonTCGScraper:
+    def __init__(self):
+        self.base_url = "https://www.dollargeneral.com"
+        self.search_url = "https://www.dollargeneral.com/c/toys/pokemon?q=&soldAtStore=true"
+        self.session = requests.Session()
+        
+        # Headers to appear more like a real browser
+        self.headers = {
+            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
+            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
+            'Accept-Language': 'en-US,en;q=0.5',
+            'Accept-Encoding': 'gzip, deflate',
+            'DNT': '1',
+            'Connection': 'keep-alive',
+            'Upgrade-Insecure-Requests': '1',
+        }
+        self.session.headers.update(self.headers)
+        
+        self.products = []
+        
+    def get_page_with_requests(self, url):
+        """Try to get page content using requests"""
+        try:
+            response = self.session.get(url, timeout=30)
+            response.raise_for_status()
+            return response.text
+        except requests.RequestException as e:
+            print(f"Requests failed for {url}: {e}")
+            return None
+    
+    def get_page_with_selenium(self, url):
+        """Fallback to selenium for dynamic content"""
+        if not SELENIUM_AVAILABLE:
+            return None
+            
+        options = Options()
+        options.add_argument('--headless')
+        options.add_argument('--no-sandbox')
+        options.add_argument('--disable-dev-shm-usage')
+        options.add_argument('--disable-gpu')
+        options.add_argument(f'--user-agent={self.headers["User-Agent"]}')
+        
+        try:
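+            # Selenium 4 takes the driver path through a Service object, not as a positional argument
+            from selenium.webdriver.chrome.service import Service
+            driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)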
+            driver.get(url)
+            
+            # Wait for content to load
+            WebDriverWait(driver, 10).until(
+                EC.presence_of_element_located((By.TAG_NAME, "body"))
+            )
+            
+            # Additional wait for dynamic content
+            time.sleep(3)
+            
+            html = driver.page_source
+            driver.quit()
+            return html
+            
+        except Exception as e:
+            print(f"Selenium failed for {url}: {e}")
+            if 'driver' in locals():
+                driver.quit()
+            return None
+    
+    def get_page_content(self, url):
+        """Get page content, trying requests first, then selenium"""
+        print(f"Fetching: {url}")
+        
+        # Try requests first
+        content = self.get_page_with_requests(url)
+        if content and len(content) > 1000:  # Basic content check
+            return content
+            
+        # Fallback to selenium
+        print("Falling back to Selenium...")
+        return self.get_page_with_selenium(url)
+    
+    def extract_product_links(self, html):
+        """Extract product page links from search results"""
+        soup = BeautifulSoup(html, 'html.parser')
+        links = []
+        
+        # Common selectors for product links
+        selectors = [
+            'a[href*="/p/"]',
+            '.product-item a',
+            '.product-card a',
+            '.product-link',
+            '[data-testid*="product"] a'
+        ]
+        
+        for selector in selectors:
+            elements = soup.select(selector)
+            for element in elements:
+                href = element.get('href')
+                if href and '/p/' in href:
+                    full_url = urljoin(self.base_url, href)
+                    if full_url not in links:
+                        links.append(full_url)
+        
+        return links
+    
+    def extract_product_info(self, url, html):
+        """Extract product information from product page"""
+        soup = BeautifulSoup(html, 'html.parser')
+        product = {'url': url}
+        
+        # Extract title
+        title_selectors = [
+            'h1',
+            '.product-title',
+            '.product-name',
+            '[data-testid="product-title"]',
+            '.pdp-product-name'
+        ]
+        
+        for selector in title_selectors:
+            title_elem = soup.select_one(selector)
+            if title_elem:
+                product['title'] = title_elem.get_text().strip()
+                break
+        
+        # Extract price
+        price_selectors = [
+            '.price',
+            '.product-price',
+            '[data-testid="price"]',
+            '.price-current',
+            '.current-price'
+        ]
+        
+        for selector in price_selectors:
+            price_elem = soup.select_one(selector)
+            if price_elem:
+                price_text = price_elem.get_text().strip()
+                product['price'] = price_text
+                break
+        
+        # Extract SKU
+        sku_selectors = [
+            '[data-sku]',
+            '.sku',
+            '.product-sku',
+            '[itemprop="sku"]',
+            'script[type="application/ld+json"]'
+        ]
+        
+        # Try data attributes first
+        for selector in sku_selectors[:-1]:
+            elem = soup.select_one(selector)
+            if elem:
+                sku = elem.get('data-sku') or elem.get_text().strip()
+                if sku and sku.lower() != 'sku':
+                    product['sku'] = sku
+                    break
+        
+        # Try JSON-LD structured data
+        if 'sku' not in product:
+            scripts = soup.find_all('script', type='application/ld+json')
+            for script in scripts:
+                try:
+                    data = json.loads(script.string)
+                except (TypeError, ValueError):
+                    continue
+                # Treat a single object and a list of objects the same way
+                items = data if isinstance(data, list) else [data]
+                for item in items:
+                    if isinstance(item, dict) and 'sku' in item:
+                        product['sku'] = item['sku']
+                        break
+                if 'sku' in product:
+                    break
+        
+        # Extract stock information
+        stock_selectors = [
+            '.stock',
+            '.inventory',
+            '.availability',
+            '[data-testid="stock"]',
+            '.in-stock',
+            '.out-of-stock'
+        ]
+        
+        for selector in stock_selectors:
+            stock_elem = soup.select_one(selector)
+            if stock_elem:
+                stock_text = stock_elem.get_text().strip().lower()
+                if 'in stock' in stock_text:
+                    product['stock'] = 'In Stock'
+                elif 'out of stock' in stock_text:
+                    product['stock'] = 'Out of Stock'
+                else:
+                    product['stock'] = stock_text
+                break
+        
+        # Extract image URL
+        img_selectors = [
+            '.product-image img',
+            '.product-photo img',
+            '.pdp-image img',
+            '[data-testid="product-image"] img',
+            'img[alt*="Pokemon"]',
+            'img[alt*="TCG"]'
+        ]
+        
+        for selector in img_selectors:
+            img_elem = soup.select_one(selector)
+            if img_elem:
+                src = img_elem.get('src') or img_elem.get('data-src')
+                if src:
+                    product['image_url'] = urljoin(self.base_url, src)
+                    break
+        
+        return product
+    
+    def is_pokemon_tcg_product(self, product):
+        """Check if product is a Pokemon TCG card pack or tin"""
+        if not product.get('title'):
+            return False
+            
+        title = product['title'].lower()
+        pokemon_keywords = ['pokemon', 'tcg', 'trading card', 'cards']
+        tcg_keywords = ['pack', 'tin', 'box', 'booster', 'collection']
+        
+        has_pokemon = any(keyword in title for keyword in pokemon_keywords)
+        has_tcg = any(keyword in title for keyword in tcg_keywords)
+        
+        return has_pokemon and has_tcg
+    
+    def scrape_products(self):
+        """Main scraping method"""
+        print(f"Starting scrape of: {self.search_url}")
+        
+        # Get search results page
+        html = self.get_page_content(self.search_url)
+        if not html:
+            print("Failed to get search results page")
+            return []
+        
+        # Extract product links
+        product_links = self.extract_product_links(html)
+        print(f"Found {len(product_links)} potential product links")
+        
+        if not product_links:
+            print("No product links found. The page structure may have changed.")
+            print("First 1000 chars of page:")
+            print(html[:1000])
+            return []
+        
+        # Scrape each product page
+        for i, link in enumerate(product_links):
+            print(f"Scraping product {i+1}/{len(product_links)}: {link}")
+            
+            product_html = self.get_page_content(link)
+            if not product_html:
+                continue
+            
+            product = self.extract_product_info(link, product_html)
+            
+            # Filter for Pokemon TCG products
+            if self.is_pokemon_tcg_product(product):
+                print(f"Found Pokemon TCG product: {product.get('title', 'Unknown')}")
+                self.products.append(product)
+            else:
+                print(f"Skipping non-TCG product: {product.get('title', 'Unknown')}")
+            
+            # Be respectful to the server
+            time.sleep(1)
+        
+        return self.products
+    
+    def save_to_json(self, filename=None):
+        """Save scraped products to JSON file"""
+        if not filename:
+            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
+            filename = f"pokemon_tcg_products_{timestamp}.json"
+        
+        with open(filename, 'w') as f:
+            json.dump(self.products, f, indent=2)
+        
+        print(f"Saved {len(self.products)} products to {filename}")
+        return filename
+
+def main():
+    scraper = PokemonTCGScraper()
+    products = scraper.scrape_products()
+    
+    if products:
+        filename = scraper.save_to_json()
+        print("\nScraping completed successfully!")
+        print(f"Found {len(products)} Pokemon TCG products")
+        print(f"Data saved to: {filename}")
+    else:
+        print("\nNo products found. This could be due to:")
+        print("1. No Pokemon TCG products in stock")
+        print("2. Website structure changes")
+        print("3. Anti-bot protection")
+
+if __name__ == "__main__":
+    main()
--- a/test_barcode.py
+++ b/test_barcode.py
@@ -0,0 +1,55 @@
+#!/usr/bin/env python3
+"""
+Test script to verify barcode generation functionality
+"""
+
+import sys
+import os
+from pathlib import Path
+
+# Put the current directory first on the import path
+sys.path.insert(0, '.')
+
+try:
+    import barcode
+    from barcode.writer import ImageWriter
+    print("✓ Barcode generation libraries are available")
+    
+    # Test barcode generation
+    test_sku = "123456789012"  # 11 data digits followed by the check digit
+    
+    upc_generator = barcode.get_barcode_class('upca')
+    test_barcode = upc_generator(test_sku[:11], writer=ImageWriter())
+    
+    # Create test output directory
+    test_dir = Path("test_output")
+    test_dir.mkdir(exist_ok=True)
+    
+    # Generate test barcode
+    barcode_path = test_dir / "test_barcode"
+    test_barcode.save(str(barcode_path), options={
+        'module_width': 0.2,
+        'module_height': 15.0,
+        'quiet_zone': 6.5,
+        'font_size': 10,
+        'text_distance': 5.0,
+        'background': 'white',
+        'foreground': 'black'
+    })
+    
+    final_path = f"{barcode_path}.png"
+    if os.path.exists(final_path):
+        print(f"✓ Test barcode generated successfully: {final_path}")
+        print(f"  File size: {os.path.getsize(final_path)} bytes")
+    else:
+        print("✗ Failed to generate test barcode")
+        sys.exit(1)
+        
+except ImportError as e:
+    print(f"✗ Missing barcode library: {e}")
+    sys.exit(1)
+except Exception as e:
+    print(f"✗ Barcode generation failed: {e}")
+    sys.exit(1)
+
+print("✓ All barcode generation tests passed!")