UPC-A barcodes should encode the Universal Product Code, not the internal store SKU. The UPCs are already 12-digit numbers that match the barcodes on the physical product packaging.
Pokemon Discovery (pokemon-disco)
A comprehensive tool for discovering Pokemon Trading Card Game products from Dollar General's website and generating a professional PDF catalog with product images, details, and UPC-A barcodes.
Features
- 🔍 API Discovery: Discovered Dollar General's internal product API via HAR analysis
- 📱 Product Extraction: Successfully extracts Pokemon TCG product details (title, SKU, price, stock)
- 🏷️ Barcode Generation: Creates scannable UPC-A barcodes for inventory management
- 📄 PDF Catalogs: Professional PDF catalogs with images, details, and barcodes
- 🕰️ Unix-Friendly: Timestamped filenames (YYYYMMDD_HHMMSS) for easy scripting
- 🌐 Brave Browser Support: Configured for dynamic content scraping
- 🛡️ Anti-Bot Handling: Multiple fallback strategies (requests → Selenium → individual products)
Requirements
System Requirements
- Python 3.7+
- pandoc (for PDF generation), plus a LaTeX engine such as pdflatex or xelatex
- Chrome, Chromium, or Brave browser (for the Selenium fallback)
Python Dependencies
All dependencies are listed in requirements.txt and installed automatically by run_scraper.py:
- requests
- beautifulsoup4
- selenium
- webdriver-manager
- python-barcode
- Pillow
- pandas
- lxml
Installation
- Clone/Download this directory to your system
- Install pandoc (required for PDF generation):
    # Ubuntu/Debian
    sudo apt install pandoc
    # macOS
    brew install pandoc
    # Arch Linux
    sudo pacman -S pandoc
- Install Python dependencies (run_scraper.py also does this automatically):
    cd pokemon-disco
    pip3 install -r requirements.txt
Usage
Quick Start (Recommended)
Run the complete pipeline with one command:
cd pokemon-disco
python3 run_scraper.py
This will:
- Check and install Python requirements
- Scrape Pokemon TCG products from Dollar General
- Generate a PDF catalog with images and barcodes
- Create timestamped files for easy organization
Manual Usage
If you prefer to run components separately:
1. Scrape Products
python3 scraper.py
This creates a JSON file like pokemon_tcg_products_20241221_143025.json
2. Generate PDF Catalog
python3 pdf_generator.py pokemon_tcg_products_20241221_143025.json
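Because the YYYYMMDD_HHMMSS timestamp sorts lexicographically in chronological order, a script can pick the newest scrape without parsing dates. This is a minimal sketch, assuming the JSON files sit in one directory and follow the pokemon_tcg_products_YYYYMMDD_HHMMSS.json naming used above; the helper name latest_products_file is hypothetical and not part of the project.

```python
import re
from pathlib import Path
from typing import Optional

# Matches pokemon_tcg_products_YYYYMMDD_HHMMSS.json and captures the timestamp.
PRODUCTS_FILE = re.compile(r"^pokemon_tcg_products_(\d{8}_\d{6})\.json$")


def latest_products_file(directory: Path = Path(".")) -> Optional[Path]:
    """Return the most recent scraper output in `directory`, or None if there is none.

    Hypothetical helper: YYYYMMDD_HHMMSS strings sort in time order,
    so comparing the captured timestamps as plain strings is enough.
    """
    candidates = []
    for path in directory.iterdir():
        match = PRODUCTS_FILE.match(path.name)
        if match:
            candidates.append((match.group(1), path))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

A wrapper script could then pass str(latest_products_file()) to pdf_generator.py instead of typing the filename by hand.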
Output Files
Generated Files
- JSON Data: pokemon_tcg_products_YYYYMMDD_HHMMSS.json
  - Raw scraped data in JSON format
  - Contains all product information
- PDF Catalog: catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf
  - Professional PDF catalog
  - Includes product images, details, and UPC-A barcodes
Output Directory Structure
pokemon-disco/
├── pokemon_tcg_products_YYYYMMDD_HHMMSS.json
├── catalog_output/
│ ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.pdf
│ ├── pokemon_tcg_catalog_YYYYMMDD_HHMMSS.md
│ ├── images/
│ │ ├── product_1_SKU123.jpg
│ │ ├── product_2_SKU456.jpg
│ │ └── placeholder.png
│ └── barcodes/
│ ├── barcode_SKU123.png
│ ├── barcode_SKU456.png
│ └── ...
PDF Catalog Features
Each product in the PDF includes:
- Product Image: Downloaded from Dollar General or placeholder
- Product Details Table:
- Title
- Price
- Stock Status
- SKU (formatted as code)
- Product URL
- UPC-A Barcode: Generated from SKU for inventory management
Data Fields Extracted
For each Pokemon TCG product:
- title: Product name
- price: Current price
- stock: Availability status
- sku: Product SKU/item number
- image_url: Direct link to product image
- url: Link to product page
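To work with the scraped data outside the pipeline, you can load the JSON file and check that each record carries the six fields listed above. This sketch assumes the file holds a top-level JSON array of objects keyed by those field names; the exact layout written by scraper.py is not documented here, so treat that structure, and the helper name load_products, as assumptions.

```python
import json
from pathlib import Path
from typing import Dict, List

# Field names from the "Data Fields Extracted" list.
EXPECTED_FIELDS = ("title", "price", "stock", "sku", "image_url", "url")


def load_products(path: Path) -> List[Dict[str, object]]:
    """Load a scraper output file, assuming a top-level JSON array of product objects."""
    with open(path, encoding="utf-8") as handle:
        products = json.load(handle)
    if not isinstance(products, list):
        raise ValueError(f"{path}: expected a JSON array, got {type(products).__name__}")
    for index, product in enumerate(products):
        if not isinstance(product, dict):
            raise ValueError(f"{path}: product {index} is not a JSON object")
        missing = [field for field in EXPECTED_FIELDS if field not in product]
        if missing:
            raise ValueError(f"{path}: product {index} is missing {', '.join(missing)}")
    return products
```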
Troubleshooting
Common Issues
- No products found
  - Dollar General may have anti-bot protection
  - The script automatically retries with Selenium
  - The website structure may have changed
- PDF generation fails
  - Ensure pandoc is installed:
      pandoc --version
  - Try an alternative LaTeX engine if one is installed (for example, via pandoc's --pdf-engine=xelatex option)
  - The Markdown file is still generated, so it can be converted manually
- Image download failures
  - Network connectivity issues
  - Placeholder images are used automatically
- Browser/Selenium issues
  - Brave browser supported: Configured to use Brave at /usr/bin/brave
  - ChromeDriver compatibility: ChromeDriver may need to match the browser version (Brave 146 vs ChromeDriver 114)
  - Alternative browsers: Chrome, Chromium, or Firefox with geckodriver
  - The script falls back to requests-only mode if Selenium fails

For Brave users: If you see a ChromeDriver version mismatch:
    # Test browser integration
    python test_brave.py
    # Solutions for version mismatch:
    pip install --upgrade webdriver-manager
    # or manually install a compatible ChromeDriver
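If PDF generation fails but the Markdown file in catalog_output/ was written, you can retry the conversion yourself with different LaTeX engines. This is a minimal sketch, assuming the .md file is ordinary pandoc Markdown with image paths relative to its own directory; the engine order and the helper names pandoc_command and convert_markdown are assumptions, not what pdf_generator.py does. --toc and --pdf-engine are standard pandoc options.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional, Sequence

# Engines to try, in order (an assumption; adjust to your TeX installation).
DEFAULT_ENGINES = ("pdflatex", "xelatex", "lualatex")


def pandoc_command(markdown: Path, pdf: Path, engine: str) -> List[str]:
    """Build a pandoc invocation rendering `markdown` to `pdf` with a table of contents."""
    return ["pandoc", markdown.name, "-o", str(pdf.resolve()), "--toc", f"--pdf-engine={engine}"]


def convert_markdown(markdown: Path, engines: Sequence[str] = DEFAULT_ENGINES) -> Optional[Path]:
    """Try each installed engine until pandoc succeeds; return the PDF path or None."""
    if shutil.which("pandoc") is None:
        raise FileNotFoundError("pandoc is not on PATH; install it first")
    pdf = markdown.with_suffix(".pdf")
    for engine in engines:
        if shutil.which(engine) is None:
            continue  # engine not installed, skip it
        # Run from the Markdown file's directory so relative paths such as
        # images/... and barcodes/... resolve the same way they did originally.
        result = subprocess.run(pandoc_command(markdown, pdf, engine), cwd=markdown.parent)
        if result.returncode == 0:
            return pdf
    return None
```

For example, convert_markdown(Path("catalog_output/pokemon_tcg_catalog_YYYYMMDD_HHMMSS.md")) with the real timestamp filled in writes the PDF next to the Markdown file.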
Debug Mode
The scripts print detailed progress to the console during scraping, including:
- Which products are found and filtered
- Network request status
- File generation progress
API Discovery Success 🎉
Pokemon Discovery has successfully discovered Dollar General's internal API endpoint!
- Endpoint Found: https://dggo.dollargeneral.com/omni/api/v2/category/search/provider
- Method: POST with a JSON payload
- Category ID: 723960 (Pokemon products)
- Response Format: Complete product details, including the test product (SKU: 41936301)
- Status: Documented and integrated; requires an authentication token
Current Status: Individual product extraction works. Bulk scraping through the API will be available once authentication is implemented.
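Once a token is available, the endpoint can be called with a plain POST request. The sketch below only builds the request with Python's standard library and never sends it. The payload keys (categoryId, pageNumber), the string type of the category ID, and the Authorization: Bearer header are hypothetical placeholders, since the real payload schema and token mechanism are not documented here; copy the actual values from your own HAR capture.

```python
import json
import urllib.request
from typing import Optional

ENDPOINT = "https://dggo.dollargeneral.com/omni/api/v2/category/search/provider"
POKEMON_CATEGORY_ID = "723960"  # type (string vs number) is an assumption


def build_category_request(token: Optional[str] = None, page: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Pokemon category.

    Hypothetical payload: the key names are placeholders, not the confirmed schema.
    """
    payload = {
        "categoryId": POKEMON_CATEGORY_ID,  # hypothetical key name
        "pageNumber": page,                 # hypothetical key name
    }
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token is not None:
        # Hypothetical: the real auth header name and format are not documented.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending it would be a matter of urllib.request.urlopen(request, timeout=10), subject to the same terms-of-service caveats as the rest of the scraper.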
Technical Details
Scraping Strategy
- Primary Method: Uses requests with browser-like headers
- Fallback Method: Selenium with headless Chrome for dynamic content
- Product Filtering: Only includes products matching Pokemon TCG keywords
- Rate Limiting: 1-second delay between requests to be respectful
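The primary/fallback order and the 1-second delay can be pictured as a small fetch loop. This is a sketch under assumptions: the fetchers are passed in as plain callables (in the real scraper they would wrap requests and Selenium), a fetcher signals failure by returning None or an empty page or by raising, and the names RateLimiter and fetch_with_fallback are hypothetical.

```python
import time
from typing import Callable, Optional, Sequence

Fetcher = Callable[[str], Optional[str]]


class RateLimiter:
    """Enforce a minimum gap, in seconds, between consecutive requests."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


def fetch_with_fallback(url: str, fetchers: Sequence[Fetcher], limiter: RateLimiter) -> Optional[str]:
    """Try each fetcher in order (e.g. requests first, then Selenium).

    Every attempt goes through the limiter. A fetcher that raises, returns None,
    or returns an empty page counts as a failure and the next one is tried.
    """
    for fetch in fetchers:
        limiter.wait()
        try:
            html = fetch(url)
        except Exception:  # broad on purpose: any fetcher error means "try the next one"
            html = None
        if html:
            return html
    return None
```

In practice the first callable might wrap requests.get(url, headers=..., timeout=...) and return its text, and the second might load the page in a Selenium driver and return driver.page_source.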
Barcode Generation
- Converts SKUs to 11-digit numeric format
- Generates UPC-A barcodes with check digits
- High-quality PNG images suitable for printing
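The UPC-A check digit is computed from the first 11 digits: digits in odd positions (1st, 3rd, and so on up to the 11th) are weighted by 3, digits in even positions by 1, and the check digit brings the weighted total up to a multiple of 10. The sketch below implements that rule with the standard library. How scraper.py maps a SKU to 11 digits is not documented here, so the left zero-padding in sku_to_upca is an assumption for illustration; when a real 12-digit UPC is available, as the note at the top of this README recommends, is_valid_upca can confirm it instead.

```python
DIGITS = "0123456789"


def upca_check_digit(first_eleven: str) -> int:
    """Return the UPC-A check digit for an 11-digit string."""
    if len(first_eleven) != 11 or not all(c in DIGITS for c in first_eleven):
        raise ValueError("UPC-A needs exactly 11 digits before the check digit")
    digits = [int(c) for c in first_eleven]
    odd_sum = sum(digits[0::2])   # positions 1, 3, ..., 11 (weight 3)
    even_sum = sum(digits[1::2])  # positions 2, 4, ..., 10 (weight 1)
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_valid_upca(upc: str) -> bool:
    """Check a full 12-digit UPC-A, e.g. one copied from the product packaging."""
    if len(upc) != 12 or not all(c in DIGITS for c in upc):
        return False
    return upca_check_digit(upc[:11]) == int(upc[11])


def sku_to_upca(sku: str) -> str:
    """Hypothetical SKU mapping: keep the digits, left-pad to 11, append the check digit."""
    numeric = "".join(c for c in sku if c in DIGITS)
    if not numeric or len(numeric) > 11:
        raise ValueError(f"cannot fit SKU {sku!r} into 11 digits")
    body = numeric.zfill(11)
    return body + str(upca_check_digit(body))
```

python-barcode's UPC-A class derives the check digit from the first 11 digits on its own, so in practice these helpers are most useful for validating codes before rendering them.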
PDF Generation
- Uses pandoc with LaTeX for professional formatting
- Includes table of contents
- Optimized for printing and digital viewing
- Images scaled appropriately for page layout
Customization
Modifying Product Filters
Edit the is_pokemon_tcg_product() method in scraper.py to change which products are included.
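A keyword filter in the spirit of is_pokemon_tcg_product() might look like the following. The keyword lists are illustrative assumptions, not the ones scraper.py uses, and the function name is_pokemon_tcg_title is hypothetical. It requires a Pokemon match plus at least one trading-card term, and rejects titles containing excluded terms.

```python
import re
import unicodedata
from typing import Iterable

# Illustrative keyword lists (assumptions, not the lists in scraper.py).
REQUIRED_TERMS = ("pokemon", "pokémon")
TCG_TERMS = ("tcg", "trading card", "booster", "card game", "elite trainer box")
EXCLUDED_TERMS = ("plush", "costume", "keychain")


def _contains_any(text: str, terms: Iterable[str]) -> bool:
    return any(term in text for term in terms)


def is_pokemon_tcg_title(title: str) -> bool:
    """Hypothetical title-only filter: Pokemon + TCG term, minus excluded items."""
    # Normalize accented characters and case, then collapse runs of whitespace.
    text = unicodedata.normalize("NFC", title).casefold()
    text = re.sub(r"\s+", " ", text)
    return (
        _contains_any(text, REQUIRED_TERMS)
        and _contains_any(text, TCG_TERMS)
        and not _contains_any(text, EXCLUDED_TERMS)
    )
```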
Changing PDF Layout
Modify the markdown generation in pdf_generator.py or add custom pandoc templates.
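As a starting point for layout changes, here is a sketch of how one catalog entry could be rendered as pandoc Markdown: product image, details table, and barcode. The field names follow the data fields above and the image/barcode paths follow the catalog_output/ layout, but render_product itself is hypothetical; pdf_generator.py may structure its output differently (for example, it falls back to placeholder.png when an image download fails). The {width=...} suffix is pandoc's link_attributes syntax for scaling images.

```python
from typing import Mapping


def _cell(value: object) -> str:
    # A literal pipe would split a Markdown table cell, so escape it.
    return str(value).replace("|", "\\|")


def render_product(index: int, product: Mapping[str, str]) -> str:
    """Render one product as a pandoc Markdown section (hypothetical layout)."""
    sku = product["sku"]
    lines = [
        f"## {product['title']}",
        "",
        f"![Product image](images/product_{index}_{sku}.jpg){{width=40%}}",
        "",
        "| Field | Value |",
        "|-------|-------|",
        f"| Price | {_cell(product['price'])} |",
        f"| Stock Status | {_cell(product['stock'])} |",
        f"| SKU | `{sku}` |",
        f"| Product URL | <{product['url']}> |",
        "",
        f"![UPC-A barcode](barcodes/barcode_{sku}.png){{width=50%}}",
        "",
    ]
    return "\n".join(lines)
```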
Adding New Data Fields
Extend the extract_product_info() method in scraper.py to capture additional product information.
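One way to add fields without breaking pages or API items that lack them is to drive extraction from a field map and default missing values to None. This sketch assumes the raw data arrives as a dictionary (for example, one item from an API response); the raw key names, the added upc field, and the function name map_raw_product are all hypothetical and should be replaced with whatever your source actually provides.

```python
from typing import Any, Dict, Mapping

# Hypothetical mapping from output field names to raw source keys.
FIELD_MAP = {
    "title": "name",
    "price": "price",
    "stock": "availability",
    "sku": "itemNumber",
    "image_url": "imageUrl",
    "url": "productUrl",
    "upc": "upc",  # new field: stays None when the source does not provide it
}


def map_raw_product(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical extractor: copy each mapped field, defaulting to None when absent."""
    return {field: raw.get(source) for field, source in FIELD_MAP.items()}
```

Adding a new field then means adding one entry to FIELD_MAP rather than another branch of parsing code.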
License
This tool is for educational and personal use. Please respect Dollar General's terms of service and robots.txt when using this scraper.
Support
If you encounter issues:
- Check the console output for error messages
- Ensure all system requirements are installed
- Verify internet connectivity
- Check if the Dollar General website structure has changed
Generated files include timestamps for easy organization and version tracking.