feat: Qwen3-TTS proxy with HIP graph + CPU decoder optimisations
- OpenAI-compatible Flask proxy (POST /audio/speech, GET /models)
- faster-qwen3-tts HIP graph acceleration: GPU LLM at 1.78x RTF
- CPU speech tokenizer decoder: bypasses MIOpen ConvDirectNaiveConvFwd,
  eliminates 4-40s per-request decode overhead
- attn_implementation=sdpa for transformer attention
- AOTRITON env var toggle (off=short sentences, on=long-form/novel chapters)
- HIP_GRAPHS env var toggle (default on)
- Startup warmup with HIP graph capture (~5s)
- CORS support for browser extension requests
- RTF: 0.9-1.5x on AMD RX 7900 XTX (gfx1100, ROCm 6.3)

Performance vs baseline (CPU-only, ~3 min/sentence):
  12c: 3.2s | 44c: 2.7s | 115c: 6.6s
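
The two environment toggles named in the message (`AOTRITON` and `HIP_GRAPHS`) could be read at startup roughly as follows. This is a minimal sketch, not the proxy's actual code: the helper `env_flag`, the accepted spellings (`1/true/yes/on`, `0/false/no/off`), and the `AOTRITON` default of off are assumptions for illustration. Only "HIP_GRAPHS defaults to on" comes from the commit message.

```python
import os

# Accepted spellings are an assumption; the commit message does not list them.
_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def env_flag(name, default, environ=None):
    """Return a boolean for an environment toggle (hypothetical helper).

    Unset or unrecognised values fall back to `default`.
    """
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    return default


# HIP_GRAPHS defaults to on, per the commit message.
USE_HIP_GRAPHS = env_flag("HIP_GRAPHS", default=True)
# The AOTRITON default is not stated in the commit; off is assumed here.
USE_AOTRITON = env_flag("AOTRITON", default=False)
```

Falling back to the default on an unrecognised value keeps a typo in the service environment from silently disabling HIP graph capture.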
49
.gitignore
vendored
Normal file
@@ -0,0 +1,49 @@
# Python
__pycache__/
*.py[cod]
*.pyo
*.pyd
.Python
*.egg-info/
dist/
build/
*.egg
.eggs/

# Virtual envs
venv/
.venv/
env/
*.venv

# Model weights / audio output
*.wav
*.mp3
*.bin
*.safetensors
*.pt
*.pth

# HuggingFace cache
.cache/

# Test artifacts
test_output.*
test_simple.py

# OS
.DS_Store
Thumbs.db

# IDE
.vscode/
.idea/
*.swp
*.swo

# Submodule source trees (large, checked out separately)
Qwen3-TTS/
read-aloud/

# Systemd units are user-specific, generated by setup script
${HOME_DIR}/
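
Git does not expand shell variables in `.gitignore`, so the last entry, `${HOME_DIR}/`, matches only a directory whose name is that exact string. The sketch below approximates how a few of the patterns above behave. It is a simplified illustration, not Git's matcher: the function `is_ignored` is hypothetical, it handles only slash-free patterns with an optional trailing slash, and it ignores negation and anchoring. `git check-ignore -v <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase

# A subset of the patterns from the .gitignore above.
PATTERNS = [
    "__pycache__/",
    "*.py[cod]",
    "*.safetensors",
    "test_output.*",
    "${HOME_DIR}/",
]


def is_ignored(path, patterns=PATTERNS):
    """Approximate gitignore matching for slash-free patterns.

    A pattern matches if any path component matches it. A pattern with a
    trailing slash matches directories only, so it cannot match the final
    component unless `path` itself ends with a slash.
    """
    is_dir = path.endswith("/")
    parts = path.strip("/").split("/")
    for pattern in patterns:
        dir_only = pattern.endswith("/")
        name = pattern.rstrip("/")
        for index, part in enumerate(parts):
            last = index == len(parts) - 1
            if dir_only and last and not is_dir:
                continue  # final component is a file, not a directory
            if fnmatchcase(part, name):
                return True
    return False


if __name__ == "__main__":
    for candidate in ("src/__pycache__/app.pyc", "proxy.py", "${HOME_DIR}/.config/x"):
        print(candidate, is_ignored(candidate))
```

In this sketch an expanded path such as `home/user/.config/systemd/user/tts.service` is not matched by `${HOME_DIR}/`, and the same holds in Git itself because the pattern is taken literally.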