Removed 20 files: old test scripts, debug tools, duplicate docs,
generated JSON, old PDF generator, launcher scripts.
Kept:
disco.py — main tool (scrape HAR + generate PDF)
scraper.py — reference site scraper (HTML + Selenium/Brave)
requirements.txt
*.har — browser capture with API data
Updated:
README.md — rewritten to reflect current tool and usage
.gitignore — simplified
scraper.py — module/class/method docstrings updated to clarify
this is a reference implementation, disco.py is primary
Pokemon Discovery (pokemon-disco)
Scrapes Pokemon TCG card pack and tin products from Dollar General and generates a PDF product catalog with images and UPC-A barcodes.
How It Works
Dollar General's Pokemon category page loads products dynamically via an internal API. A browser HAR capture contains the API responses with all product data. disco.py extracts products from the HAR file, filters for card packs and tins, downloads product images, generates UPC-A barcodes, and produces a LaTeX-based PDF catalog.
Pipeline
HAR file → Extract API responses → Filter packs/tins → Download images
→ Generate UPC-A barcodes → Compile PDF catalog (pdflatex)
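The first two pipeline stages can be sketched as follows. This is a minimal illustration, not the code in disco.py: it assumes the standard HAR layout (log.entries[], each with request.url and response.content.text, optionally base64-encoded) and uses the endpoint path from the API reference in this README. The names API_PATH, load_har, and extract_items are hypothetical.

```python
import base64
import json

# Assumption: endpoint path taken from the "API Details" section of this README.
API_PATH = "/omni/api/v2/category/search/provider"


def load_har(path: str) -> dict:
    """Read a HAR capture from disk (HAR files are plain JSON)."""
    with open(path, encoding="utf-8-sig") as fh:
        return json.load(fh)


def extract_items(har: dict) -> list:
    """Collect ItemList.Items from every matching API response in a HAR dict."""
    items = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url", "")
        if API_PATH not in url:
            continue  # not a product search call
        content = entry.get("response", {}).get("content", {})
        text = content.get("text")
        if not text:
            continue  # the browser did not record a body for this entry
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8")
        try:
            body = json.loads(text)
        except json.JSONDecodeError:
            continue  # skip bodies that are not JSON
        if not isinstance(body, dict):
            continue
        item_list = body.get("ItemList") or {}
        items.extend(item_list.get("Items") or [])
    return items
```

A capture that contains several pages of results yields one combined list, so any de-duplication (for example by UPC) would need to happen in a later step.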
Requirements
- Python 3.10+
- pdflatex (via texlive-core + texlive-latexextra)
- Python packages: requests, beautifulsoup4, python-barcode, Pillow
Install (Arch / CachyOS)
sudo pacman -S texlive-basic texlive-latex texlive-latexextra texlive-fontsrecommended
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Usage
Full run (scrape + PDF)
source venv/bin/activate
python disco.py
Scrape only (output JSON)
python disco.py --scrape-only
PDF only (from existing JSON)
python disco.py --pdf-only pokemon_tcg_products_YYYYMMDD_HHMMSS.json
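For orientation, the three modes above could be parsed with argparse roughly like this. It is a sketch under assumptions: only the flag names --scrape-only and --pdf-only come from this README, while build_parser and the choice to make the two flags mutually exclusive are illustrative and may not match disco.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three run modes shown in the Usage section."""
    parser = argparse.ArgumentParser(prog="disco.py")
    # Assumption: the two flags cannot be combined; no flag means a full run.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument(
        "--scrape-only",
        action="store_true",
        help="extract products and write JSON, skip the PDF",
    )
    mode.add_argument(
        "--pdf-only",
        metavar="JSON",
        help="build the PDF from an existing product JSON file",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.pdf_only:
        print(f"PDF only, reading {args.pdf_only}")
    elif args.scrape_only:
        print("Scrape only")
    else:
        print("Full run")
```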
Output
pokemon_tcg_products_YYYYMMDD_HHMMSS.json Product data
catalog_output/
├── pokemon_catalog_YYYYMMDD_HHMMSS.pdf PDF catalog
├── pokemon_catalog_YYYYMMDD_HHMMSS.tex LaTeX source
├── images/ Product images (PNG)
└── barcodes/ UPC-A barcodes (PNG)
PDF Layout
Page 1 — Manifest: table of all products with SKU, price, and stock count.
Product pages:
Product Name
Stock status Price
SKU: XXXXXXXX UPC: XXXXXXXXXXXX
┌─────────────────────────────┐
│ │
│ Product Image │
│ │
└─────────────────────────────┘
┌─────────────────────────────┐
│ UPC-A Barcode │
└─────────────────────────────┘
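Because each product page prints a UPC-A barcode, it can help to validate the 12-digit code before rendering it. The sketch below implements the standard UPC-A check-digit rule in plain Python. It is an illustration only: the helper names are hypothetical, and whether disco.py validates codes this way (or leaves it to python-barcode) is not stated here.

```python
def upca_check_digit(first11: str) -> int:
    """Return the UPC-A check digit for the first 11 digits of a code."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    # Positions are counted from 1: odd positions are weighted 3, even positions 1.
    odd_sum = sum(int(d) for d in first11[0::2])
    even_sum = sum(int(d) for d in first11[1::2])
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_valid_upca(code: str) -> bool:
    """True if code is 12 ASCII digits and its last digit is the correct check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upca_check_digit(code[:11]) == int(code[11])
```

A code that fails this check would usually indicate a truncated or mis-typed UPC in the source data rather than a barcode rendering problem.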
Capturing a HAR File
The HAR file provides product data from Dollar General's internal API. To capture one:
- Open your browser (Brave, Chrome, Firefox)
- Open DevTools → Network tab
- Visit https://www.dollargeneral.com/c/toys/pokemon?q=
- Wait for products to load, toggle any filters you want
- Right-click in the Network tab → Save all as HAR
- Place the .har file in the project root
disco.py looks for any .har file matching the default name pattern. Edit the HAR_FILE constant at the top of disco.py if your filename differs.
Files
| File | Purpose |
|---|---|
| disco.py | Main tool: scrape, filter, generate PDF |
| scraper.py | Reference site scraper (HTML + Selenium/Brave) |
| requirements.txt | Python dependencies |
| *.har | Browser HAR capture with API data |
API Details (Reference)
The product data comes from this internal API:
POST https://dggo.dollargeneral.com/omni/api/v2/category/search/provider
Content-Type: application/json
Authorization: Bearer <session-token>
{
"StoreNbr": 17506,
"Id": 723960, // Pokemon category
"PageSize": 24,
"Filters": {
"soldAtStore": true,
"inStock": false // false = include out of stock
}
}
Response contains ItemList.Items[] with fields: Description, UPC, Price, Image, AvailableQty, rootSV (internal ID → SKU).
The bearer token is session-scoped and short-lived. disco.py sidesteps this by reading the API responses directly from a HAR capture.
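The response fields listed above map naturally onto a small record type. The following sketch shows one way to normalize an item from ItemList.Items[] and keep only packs and tins. Several parts are assumptions: the Product class and function names are hypothetical, rootSV is stored unchanged as the SKU, and the keyword-based filter is a stand-in, since this README does not describe how disco.py decides what counts as a pack or tin.

```python
import re
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    upc: str
    price: float
    image_url: str
    stock: int
    sku: str


# Assumption: packs and tins can be recognized by whole words in the description.
PACK_OR_TIN = re.compile(r"\b(packs?|tins?)\b", re.IGNORECASE)


def to_product(item: dict) -> Product:
    """Map one API item onto a Product, tolerating missing or null fields."""
    return Product(
        name=str(item.get("Description") or "").strip(),
        upc=str(item.get("UPC") or ""),
        price=float(item.get("Price") or 0),
        image_url=str(item.get("Image") or ""),
        stock=int(item.get("AvailableQty") or 0),
        sku=str(item.get("rootSV") or ""),  # assumption: used as the SKU as-is
    )


def filter_packs_and_tins(items: list) -> list:
    """Convert raw items and keep those whose name mentions a pack or tin."""
    products = [to_product(item) for item in items]
    return [p for p in products if PACK_OR_TIN.search(p.name)]
```

Matching on whole words keeps descriptions such as "Backpack" from being picked up by accident; a real filter would likely need tuning against the actual product names in the capture.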