- OpenAI-compatible Flask proxy (POST /audio/speech, GET /models)
- faster-qwen3-tts HIP graph acceleration: GPU LLM at 1.78x RTF
- CPU speech tokenizer decoder: bypasses MIOpen ConvDirectNaiveConvFwd, eliminates 4-40s per-request decode overhead
- attn_implementation=sdpa for transformer attention
- AOTRITON env var toggle (off=short sentences, on=long-form/novel chapters)
- HIP_GRAPHS env var toggle (default on)
- Startup warmup with HIP graph capture (~5s)
- CORS support for browser extension requests
- RTF: 0.9-1.5x on AMD RX 7900 XTX (gfx1100, ROCm 6.3)

Performance vs baseline (CPU-only, ~3 min/sentence):
12c: 3.2s | 44c: 2.7s | 115c: 6.6s
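A minimal client-side sketch of a request to the `POST /audio/speech` route listed above. Only the route comes from the commit message. The base URL (`http://localhost:5000`), the `model` and `voice` values, and the helper name `build_speech_request` are assumptions for illustration; the JSON field names (`model`, `input`, `voice`) follow the OpenAI speech API convention, which the proxy is described as compatible with.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice="default", model="qwen3-tts"):
    """Build (but do not send) a POST request for the /audio/speech route.

    The voice and model defaults are placeholders, not values from the repo.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and port; adjust to wherever the proxy is running.
    req = build_speech_request("http://localhost:5000", "Hello from the proxy.")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is expected to be audio bytes, though the exact format depends on how the proxy is configured.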
50 lines
499 B
Plaintext
# Python
__pycache__/
*.py[cod]
*.pyo
*.pyd
.Python
*.egg-info/
dist/
build/
*.egg
.eggs/

# Virtual envs
venv/
.venv/
env/
*.venv

# Model weights / audio output
*.wav
*.mp3
*.bin
*.safetensors
*.pt
*.pth

# HuggingFace cache
.cache/

# Test artifacts
test_output.*
test_simple.py

# OS
.DS_Store
Thumbs.db

# IDE
.vscode/
.idea/
*.swp
*.swo

# Submodule source trees (large, checked out separately)
Qwen3-TTS/
read-aloud/

# Systemd units are user-specific, generated by setup script
${HOME_DIR}/