# Pokemon Discovery (pokemon-disco)

Scrapes Pokemon TCG card pack and tin products from Dollar General and generates a PDF product catalog with images and UPC-A barcodes.

## How It Works

Dollar General's Pokemon category page loads products dynamically via an internal API. A browser HAR capture contains the API responses with all product data. `disco.py` extracts products from the HAR file, filters for card packs and tins, downloads product images, generates UPC-A barcodes, and produces a LaTeX-based PDF catalog.

### Pipeline

```
HAR file → Extract API responses → Filter packs/tins → Download images
         → Generate UPC-A barcodes → Compile PDF catalog (pdflatex)
```

## Requirements

- Python 3.10+
- pdflatex (on Arch: `texlive-basic`, `texlive-latex`, `texlive-latexextra`, `texlive-fontsrecommended`)
- Python packages: `requests`, `beautifulsoup4`, `python-barcode`, `Pillow`

### Install (Arch / CachyOS)

```bash
sudo pacman -S texlive-basic texlive-latex texlive-latexextra texlive-fontsrecommended
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

## Usage

### Full run (scrape + PDF)

```bash
source venv/bin/activate
python disco.py
```

### Scrape only (output JSON)

```bash
python disco.py --scrape-only
```

### PDF only (from existing JSON)

```bash
python disco.py --pdf-only pokemon_tcg_products_YYYYMMDD_HHMMSS.json
```

## Output

```
pokemon_tcg_products_YYYYMMDD_HHMMSS.json    Product data
catalog_output/
├── pokemon_catalog_YYYYMMDD_HHMMSS.pdf      PDF catalog
├── pokemon_catalog_YYYYMMDD_HHMMSS.tex      LaTeX source
├── images/                                  Product images (PNG)
└── barcodes/                                UPC-A barcodes (PNG)
```

### PDF Layout

**Page 1 (manifest):** a table of all products with SKU, price, and stock count.
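
As a rough illustration of how manifest rows like these might be produced, the sketch below escapes LaTeX special characters in product names and formats one table row per product. The dictionary keys (`name`, `sku`, `price`, `stock`) and the helper names `latex_escape` and `manifest_rows` are assumptions made for this example; they are not taken from `disco.py`.

```python
# Hypothetical sketch: format manifest table rows for a LaTeX tabular.
# Key names and function names are illustrative assumptions.

# Characters that LaTeX treats specially, mapped to safe replacements.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape LaTeX special characters one character at a time.

    Working per character avoids re-escaping the backslashes that
    earlier replacements would otherwise introduce.
    """
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in text)


def manifest_rows(products: list[dict]) -> list[str]:
    """Return one tabular row (name, SKU, price, stock) per product."""
    rows = []
    for product in products:
        cells = [
            latex_escape(str(product["name"])),
            latex_escape(str(product["sku"])),
            r"\$" + f"{float(product['price']):.2f}",
            str(int(product["stock"])),
        ]
        # Cells are joined with '&' and the row is ended with '\\'.
        rows.append(" & ".join(cells) + r" \\")
    return rows
```

Escaping matters here because product descriptions often contain characters such as `&` or `%`, which would otherwise break the `pdflatex` run.
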

**Product pages:**

```
Product Name
Stock status      Price
SKU: XXXXXXXX     UPC: XXXXXXXXXXXX

┌─────────────────────────────┐
│                             │
│        Product Image        │
│                             │
└─────────────────────────────┘
┌─────────────────────────────┐
│        UPC-A Barcode        │
└─────────────────────────────┘
```

## Capturing a HAR File

The HAR file provides product data from Dollar General's internal API. To capture one:

1. Open your browser (Brave, Chrome, Firefox)
2. Open DevTools → **Network** tab
3. Visit `https://www.dollargeneral.com/c/toys/pokemon?q=`
4. Wait for products to load, toggle any filters you want
5. Right-click in the Network tab → **Save all as HAR**
6. Place the `.har` file in the project root

`disco.py` looks for any `.har` file matching the default name pattern. Edit the `HAR_FILE` constant at the top of `disco.py` if your filename differs.

## Files

| File | Purpose |
|------|---------|
| `disco.py` | Main tool: scrape, filter, generate PDF |
| `scraper.py` | Reference site scraper (HTML + Selenium/Brave) |
| `requirements.txt` | Python dependencies |
| `*.har` | Browser HAR capture with API data |

## API Details (Reference)

The product data comes from this internal API:

```
POST https://dggo.dollargeneral.com/omni/api/v2/category/search/provider
Content-Type: application/json
Authorization: Bearer <token>

{
  "StoreNbr": 17506,
  "Id": 723960,            // Pokemon category
  "PageSize": 24,
  "Filters": {
    "soldAtStore": true,
    "inStock": false       // false = include out of stock
  }
}
```

Response contains `ItemList.Items[]` with fields: `Description`, `UPC`, `Price`, `Image`, `AvailableQty`, `rootSV` (internal ID → SKU).

The bearer token is session-scoped and short-lived. `disco.py` sidesteps this by reading the API responses directly from a HAR capture.
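
The HAR-based approach can be sketched as follows. This is a minimal illustration, not the code in `disco.py`. It relies on the standard HAR layout (`log.entries[]`, each with `request.url` and `response.content.text`, optionally base64-encoded) and on the response fields listed above. The function names, the output key names, the assumption that `ItemList` sits at the top level of the response body, and the keyword rule used to pick out packs and tins are all assumptions made for the example.

```python
import base64
import json
import re

# Path of the internal API endpoint, as given in this section.
API_PATH = "/omni/api/v2/category/search/provider"

# Assumed filter: keep descriptions containing "pack(s)" or "tin(s)" as whole words.
PACK_OR_TIN = re.compile(r"\b(packs?|tins?)\b", re.IGNORECASE)


def iter_api_items(har: dict):
    """Yield every item found in matching API responses inside a HAR dict."""
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        if API_PATH not in url:
            continue
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        if not text:
            continue
        # HAR bodies may be stored base64-encoded.
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            continue
        if not isinstance(payload, dict):
            continue
        item_list = payload.get("ItemList") or {}
        yield from item_list.get("Items") or []


def extract_products(har: dict) -> list[dict]:
    """Return pack/tin products, de-duplicated by UPC."""
    products = {}
    for item in iter_api_items(har):
        description = item.get("Description") or ""
        if not PACK_OR_TIN.search(description):
            continue
        upc = str(item.get("UPC") or "")
        if not upc:
            continue  # no barcode can be generated without a UPC
        # The same product can appear in several captured responses.
        products.setdefault(upc, {
            "name": description,
            "upc": upc,
            "sku": item.get("rootSV"),
            "price": item.get("Price"),
            "image": item.get("Image"),
            "stock": item.get("AvailableQty"),
        })
    return list(products.values())
```

To try it, load a capture with `json.load` and pass the resulting dict to `extract_products`. De-duplicating by UPC is useful because toggling filters while capturing typically records the same items in more than one response.
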