e91f92fbb6
fix: PCM streaming — missing Response import + wrong tuple unpacking
- Add Response to flask imports (caused NameError on every PCM request)
- Unpack the (audio, sr, timing) tuple correctly from generate_custom_voice_streaming
  (previously iterated over the tuple itself, so a 3-element object reached np.clip)
- Move elapsed/chunk logging inside the generator so it fires after the stream ends
- PCM streaming now works: 12c test → 2.3s of audio in 1.8s, 3 chunks
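A minimal sketch of the two streaming fixes above, assuming the upstream generator yields `(audio, sr, timing)` tuples with float audio in roughly [-1, 1]. `fake_generate_streaming`, `pcm_stream`, the `stats` dict and the logger name are hypothetical stand-ins, not the project's code. Each tuple is unpacked before clipping, and the timing/logging sits in the generator's `finally`, so it runs only once the consumer has drained (or closed) the stream. In Flask such a generator is passed as the body of `flask.Response`, which is why `Response` must be imported; the view function itself returns before any audio is produced.

```python
import logging
import time

import numpy as np

log = logging.getLogger("tts_proxy")  # hypothetical logger name


def fake_generate_streaming(n_chunks=3, samples=8):
    """Hypothetical stand-in for generate_custom_voice_streaming.

    Yields (audio, sr, timing) tuples with float32 audio slightly outside [-1, 1]
    so that clipping is exercised.
    """
    for i in range(n_chunks):
        audio = np.linspace(-1.2, 1.2, samples, dtype=np.float32)
        yield audio, 24000, {"chunk": i}


def pcm_stream(source, stats):
    """Convert (audio, sr, timing) tuples into 16-bit little-endian PCM bytes."""
    start = time.perf_counter()
    chunks = 0
    try:
        # Unpack each tuple; clipping the tuple itself was the original bug.
        for audio, _sr, _timing in source:
            pcm = np.clip(audio, -1.0, 1.0)
            chunks += 1
            yield (pcm * 32767).astype("<i2").tobytes()
    finally:
        # Runs when the generator is exhausted or closed, i.e. after the
        # last chunk is sent, not when the view function returns.
        elapsed = time.perf_counter() - start
        stats.update(chunks=chunks, elapsed=elapsed)
        log.info("streamed %d chunks in %.2fs", chunks, elapsed)
```

Nothing is recorded in `stats` until iteration starts, which is exactly the behaviour the logging move relies on.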
2026-03-25 21:47:59 -07:00
fef6a1b74c
feat: add PCM streaming + Kokoro voice name support
- POST /audio/speech with response_format=pcm now streams raw 16-bit
  PCM (24kHz mono) via a Flask generator, compatible with the customtts
  extension's streaming mode
- resolve_voice() handles:
* Standard OpenAI names (alloy, echo, ...)
* Kokoro blend syntax: 'af_bella+bf_emma+af_nicole' (picks first)
* Kokoro prefix heuristic: af_/bf_/am_/bm_ → Ryan, zf_/zm_ → Vivian
* Explicit Kokoro aliases for common voices (bella, emma, sky, etc.)
* Graceful fallback to alloy for unknown voices
- app.run(threaded=True) to support concurrent streaming connections
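The resolve_voice() rules above can be sketched as an ordered chain of checks. This is an illustration, not the proxy's actual function: the set of OpenAI names (the classic six), the alias table contents, and the choice to return an OpenAI name unchanged when one is given are all assumptions. Only the blend handling, the prefix-to-speaker mapping (Ryan, Vivian) and the `alloy` fallback come from the commit message.

```python
# Assumed set of accepted OpenAI voice names (the original six).
OPENAI_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}

# Hypothetical alias table: short names expanded to full Kokoro voice IDs,
# which then go through the prefix heuristic below.
KOKORO_ALIASES = {
    "bella": "af_bella",
    "emma": "bf_emma",
    "sky": "af_sky",
    "nicole": "af_nicole",
}

# Prefix heuristic as described in the commit message.
PREFIX_MAP = {
    "af_": "Ryan", "bf_": "Ryan", "am_": "Ryan", "bm_": "Ryan",
    "zf_": "Vivian", "zm_": "Vivian",
}

FALLBACK = "alloy"


def resolve_voice(name):
    """Map an incoming voice string to a name the backend understands (sketch)."""
    voice = (name or "").strip().lower()
    # Kokoro blend syntax 'a+b+c': keep only the first component.
    if "+" in voice:
        voice = voice.split("+", 1)[0].strip()
    if voice in OPENAI_VOICES:
        return voice
    voice = KOKORO_ALIASES.get(voice, voice)
    for prefix, speaker in PREFIX_MAP.items():
        if voice.startswith(prefix):
            return speaker
    return FALLBACK
```

For example, `resolve_voice("af_bella+bf_emma+af_nicole")` keeps `af_bella` and maps it to `Ryan`, while an unrecognised string falls through to `alloy`. Keeping the checks in a fixed order makes the precedence explicit: exact OpenAI names win over aliases, and aliases win over the prefix guess.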
2026-03-25 21:39:56 -07:00
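The raw PCM format in fef6a1b74c (16-bit, 24kHz, mono) fixes the byte rate at 24000 × 1 × 2 = 48,000 bytes per second, so a client can convert byte counts to audio duration directly. The sketch below shows that arithmetic plus one way a client might reassemble samples when a network read splits a 2-byte sample across chunks. Little-endian byte order is an assumption (the commit only says "raw 16-bit"), and all names here are hypothetical.

```python
import array
import sys

SAMPLE_RATE = 24_000  # Hz, from the commit message
CHANNELS = 1          # mono
SAMPLE_WIDTH = 2      # bytes per sample (16-bit)

BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH  # 48,000


def pcm_duration_seconds(num_bytes):
    """Audio duration represented by num_bytes of raw PCM."""
    if num_bytes % (CHANNELS * SAMPLE_WIDTH):
        raise ValueError("byte count is not a whole number of frames")
    return num_bytes / BYTES_PER_SECOND


def pcm_bytes_for(seconds):
    """Number of PCM bytes needed for the given duration."""
    return round(seconds * SAMPLE_RATE) * CHANNELS * SAMPLE_WIDTH


def iter_pcm16(chunks):
    """Yield int16 sample arrays from byte chunks that may split a sample.

    Assumes the stream is little-endian signed 16-bit PCM.
    """
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        cut = len(data) - (len(data) % SAMPLE_WIDTH)
        leftover = data[cut:]  # carry a dangling byte into the next chunk
        samples = array.array("h", data[:cut])
        if sys.byteorder == "big":
            samples.byteswap()  # convert from little-endian wire order
        yield samples
```

By this arithmetic, the 2.3s clip from the follow-up fix corresponds to `pcm_bytes_for(2.3)`, i.e. 110,400 bytes on the wire.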
d3ca5ab0b2
feat: Qwen3-TTS proxy with HIP graph + CPU decoder optimisations
- OpenAI-compatible Flask proxy (POST /audio/speech, GET /models)
- faster-qwen3-tts HIP graph acceleration: GPU LLM at 1.78x RTF
- CPU speech tokenizer decoder: bypasses MIOpen ConvDirectNaiveConvFwd and
  eliminates 4-40s of per-request decode overhead
- attn_implementation=sdpa for transformer attention
- AOTRITON env var toggle (off for short sentences, on for long-form text such as novel chapters)
- HIP_GRAPHS env var toggle (default on)
- Startup warmup with HIP graph capture (~5s)
- CORS support for browser extension requests
- RTF: 0.9-1.5x on AMD RX 7900 XTX (gfx1100, ROCm 6.3)
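The AOTRITON and HIP_GRAPHS toggles above could be read with a small boolean-parsing helper like the one below. This is a sketch under assumptions: that the variables are literally named `AOTRITON` and `HIP_GRAPHS`, and that they accept common on/off spellings. The HIP_GRAPHS default of on comes from the commit; the AOTRITON default shown is a guess, since the commit does not state one.

```python
import os

_TRUTHY = {"1", "true", "yes", "on"}
_FALSY = {"0", "false", "no", "off"}


def env_flag(name, default):
    """Parse a boolean environment variable, falling back to default if unset."""
    raw = os.environ.get(name)
    if raw is None or not raw.strip():
        return default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    # Fail loudly rather than silently picking a mode on a typo.
    raise ValueError(f"{name}={raw!r} is not a recognised boolean")


USE_HIP_GRAPHS = env_flag("HIP_GRAPHS", default=True)  # default on, per the commit
USE_AOTRITON = env_flag("AOTRITON", default=False)     # default is an assumption
```

Raising on unrecognised values is a design choice: a misspelled `HIP_GRAPHS=ture` would otherwise silently change performance characteristics.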
Performance vs baseline (CPU-only, ~3 min/sentence):
12c: 3.2s | 44c: 2.7s | 115c: 6.6s
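Dividing the approximate CPU-only baseline (~3 min, taken here as 180s) by each measured time gives a rough speedup. Because the baseline is a single approximate per-sentence figure rather than a per-length measurement, these ratios are order-of-magnitude indications only; the function and variable names are illustrative.

```python
BASELINE_SECONDS = 3 * 60  # "~3 min/sentence" CPU-only baseline (approximate)

# Per-request times from the commit message, keyed by the labels it uses.
PROXY_SECONDS = {"12c": 3.2, "44c": 2.7, "115c": 6.6}


def rough_speedups(baseline, timings):
    """Baseline time divided by each measured time."""
    return {label: baseline / secs for label, secs in timings.items()}


for label, factor in rough_speedups(BASELINE_SECONDS, PROXY_SECONDS).items():
    print(f"{label}: ~{factor:.0f}x faster than the CPU-only baseline")
# 12c: ~56x, 44c: ~67x, 115c: ~27x
```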
2026-03-25 21:18:42 -07:00
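The RTF figures in these commits (1.78x, 0.9-1.5x) do not say which convention they use, and two are common: processing time divided by audio duration (below 1 is faster than real time) or audio duration divided by processing time (above 1 is faster than real time). The sketch computes both from the one measurement the log gives in full, 2.3s of audio in 1.8s from e91f92fbb6; which convention the project uses is not assumed here.

```python
def rtf_time_over_audio(processing_s, audio_s):
    """Processing time / audio duration; < 1 means faster than real time."""
    return processing_s / audio_s


def rtf_audio_over_time(processing_s, audio_s):
    """Audio duration / processing time; > 1 means faster than real time."""
    return audio_s / processing_s


# From the PCM streaming fix: 2.3 s of audio produced in 1.8 s.
print(f"time/audio: {rtf_time_over_audio(1.8, 2.3):.2f}")   # time/audio: 0.78
print(f"audio/time: {rtf_audio_over_time(1.8, 2.3):.2f}x")  # audio/time: 1.28x
```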