qwen3-tts-ra

pi-bot-01/qwen3-tts-ra

Commit Graph

Author

SHA1

Message

Date

pi-bot-01

fef6a1b74c

feat: add PCM streaming + Kokoro voice name support

- POST /audio/speech with response_format=pcm now streams raw 16-bit
  PCM (24kHz mono) via Flask generator — compatible with customtts
  extension streaming mode
- resolve_voice() handles:
    * Standard OpenAI names (alloy, echo, ...)
    * Kokoro blend syntax: 'af_bella+bf_emma+af_nicole' (picks first)
    * Kokoro prefix heuristic: af_/bf_/am_/bm_ → Ryan, zf_/zm_ → Vivian
    * Explicit Kokoro aliases for common voices (bella, emma, sky, etc.)
    * Graceful fallback to alloy for unknown voices
- app.run(threaded=True) to support concurrent streaming connections
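
The resolution order described for resolve_voice() can be sketched as below. This is a minimal illustration, not the code from the commit: the order of the checks, the pass-through of OpenAI names, and the alias targets are assumptions. The alias targets simply follow the af_/bf_ prefix rule, since bella, emma and sky correspond to af_/bf_ voices in Kokoro.

```python
# Standard OpenAI TTS voice names, passed through unchanged in this sketch.
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

# Hypothetical alias table: the commit only says such aliases exist.
KOKORO_ALIASES = {"bella": "Ryan", "emma": "Ryan", "sky": "Ryan"}

# Prefix heuristic taken from the commit message.
PREFIX_SPEAKERS = {
    "af_": "Ryan", "bf_": "Ryan", "am_": "Ryan", "bm_": "Ryan",
    "zf_": "Vivian", "zm_": "Vivian",
}

FALLBACK_VOICE = "alloy"


def resolve_voice(requested):
    """Map a requested voice name to a voice the backend understands."""
    name = (requested or "").strip().lower()
    # Kokoro blend syntax 'a+b+c': keep only the first component.
    name = name.split("+", 1)[0].strip()
    if name in OPENAI_VOICES:
        return name
    if name in KOKORO_ALIASES:
        return KOKORO_ALIASES[name]
    for prefix, speaker in PREFIX_SPEAKERS.items():
        if name.startswith(prefix):
            return speaker
    # Unknown or empty names fall back instead of raising an error.
    return FALLBACK_VOICE
```

With these tables, a blend such as 'af_bella+bf_emma+af_nicole' is reduced to 'af_bella' before the prefix rule is applied, so a client configured for Kokoro keeps working without changes.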

2026-03-25 21:39:56 -07:00

pi-bot-01

d3ca5ab0b2

feat: Qwen3-TTS proxy with HIP graph + CPU decoder optimisations

- OpenAI-compatible Flask proxy (POST /audio/speech, GET /models)
- faster-qwen3-tts HIP graph acceleration: GPU LLM at 1.78x RTF
- CPU speech tokenizer decoder: bypasses MIOpen ConvDirectNaiveConvFwd,
  eliminates 4-40s per-request decode overhead
- attn_implementation=sdpa for transformer attention
- AOTRITON env var toggle (off=short sentences, on=long-form/novel chapters)
- HIP_GRAPHS env var toggle (default on)
- Startup warmup with HIP graph capture (~5s)
- CORS support for browser extension requests
- RTF: 0.9-1.5x on AMD RX 7900 XTX (gfx1100, ROCm 6.3)
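
The two environment toggles could be read along these lines. This is only a sketch: the accepted spellings ("1", "on", and so on) and the helper name env_flag are hypothetical, and the default for AOTRITON is an assumption because the commit states a default only for HIP_GRAPHS.

```python
import os

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_flag(name, default, environ=None):
    """Read a boolean toggle from the environment (hypothetical helper)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Unrecognised values keep the default rather than failing at startup.
    return default


USE_HIP_GRAPHS = env_flag("HIP_GRAPHS", default=True)   # default on, per the commit
USE_AOTRITON = env_flag("AOTRITON", default=False)      # default is an assumption
```

Under these assumptions, a long-form run would be started with AOTRITON=1 in the environment, and HIP graph capture could be disabled with HIP_GRAPHS=0.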

Performance vs baseline (CPU-only, ~3 min/sentence):
  12c: 3.2s | 44c: 2.7s | 115c: 6.6s

2026-03-25 21:18:42 -07:00

2 Commits