Speech AI Forge

Basic Information

Speech-AI-Forge is a text-to-speech (TTS) project that provides both an API server and a Gradio web UI for running and managing speech generation and recognition workflows. The repository packages integrations for many open TTS and automatic speech recognition (ASR) models and tools, plus scripts to download the required model assets. It can be deployed locally, in containers, on Colab, or on HuggingFace Spaces. The codebase exposes a lightweight HTTP API server script for higher-throughput programmatic use and a web UI for interactive configuration and testing. The project focuses on practical TTS features such as multi-model inference, voice building and testing, SSML script editing, long-text handling and post-processing, and it also offers ASR, forced alignment and audio enhancement. Model download utilities, Docker Compose files, environment templates and API documentation are included to help users install and run the system.

App Details

Features
The repository documents and implements a broad set of speech features and deployment tooling. Multi-model TTS support includes ChatTTS, FishSpeech, CosyVoice, FireRedTTS, F5-TTS, Index-TTS, Spark-TTS and GPT-SoVITS. Voice management features include speaker switching, custom voice upload, a voice builder, voice testing, random seed sampling and voice blending, plus a voice hub for shared voice packages. Advanced synthesis features include style controls, a refiner for long text, configurable text splitters, an adjustable batch size for long-text inference, pace, pitch and volume adjusters, loudness equalization and an enhancer model for audio quality. SSML tooling supports podcast and multi-role scripts, subtitle-to-SSML conversion and a script editor. ASR support uses Whisper and SenseVoice, with forced alignment. Operational features include a Gradio web UI, a standalone API server with documented endpoints, Docker Compose setups and model downloader scripts.
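To make the long-text features above more concrete, the following is a minimal sketch of what a text splitter combined with a batch size generally does: cut the input into sentence-sized chunks, then group the chunks so several can be synthesized per inference call. It is not taken from the Speech-AI-Forge codebase; the function names split_sentences and batched and the parameters max_chars and batch_size are hypothetical and chosen only for illustration.

```python
import re
from typing import Iterator, List


def split_sentences(text: str, max_chars: int = 120) -> List[str]:
    """Split text after sentence-ending punctuation, then hard-wrap overlong pieces.

    Illustrative only: a real splitter would also handle abbreviations,
    decimals such as "3.14", and word boundaries when wrapping.
    """
    # Split right after ., !, ? (and their CJK equivalents), keeping the punctuation.
    pieces = [p.strip() for p in re.split(r"(?<=[.!?。！？])\s*", text) if p.strip()]
    chunks: List[str] = []
    for piece in pieces:
        # Cut any piece longer than max_chars into fixed-size slices.
        for start in range(0, len(piece), max_chars):
            chunks.append(piece[start:start + max_chars])
    return chunks


def batched(chunks: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for i in range(0, len(chunks), batch_size):
        yield chunks[i:i + batch_size]


if __name__ == "__main__":
    text = "Hello world. How are you? Fine!"
    for batch in batched(split_sentences(text), batch_size=2):
        print(batch)
```

In a sketch like this, a larger batch size means fewer inference calls at the cost of more memory per call, and shorter chunks tend to keep each synthesized segment within a length the model handles reliably.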
Use Cases
Speech-AI-Forge helps users and teams deploy and evaluate TTS and ASR pipelines without building model integrations from scratch. It provides ready-to-run UI workflows for experimenting with voices, styles and SSML scripts, plus programmatic API endpoints for integrating TTS into applications or high-throughput services. The included model downloader scripts and Docker Compose templates reduce setup friction and make deployments reproducible across Colab, local machines and container platforms. Voice management and builder tools let creators construct and test custom voices and blend seeds to produce new ones. Long-text handling, the refiner and post-processing tools support podcast and narration use cases, while ASR and forced alignment support transcription and audio-to-text alignment tasks. Documentation, example commands and an API docs endpoint help users operationalize TTS and ASR in research or production contexts.
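For the podcast and multi-role script use case, the sketch below shows how a two-speaker script could be assembled as SSML using only the Python standard library. It follows the generic W3C SSML element names (speak, voice, break); the SSML dialect that Speech-AI-Forge actually accepts may use different attributes, so the attribute names, the voice names "host" and "guest", and the helper build_ssml are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_ssml(lines: List[Tuple[str, str]], pause_ms: int = 300) -> str:
    """Build a simplified multi-role SSML document from (voice_name, text) pairs.

    Hypothetical helper: uses generic W3C-style SSML elements, not a
    project-specific dialect, and omits the namespace declarations a
    strict SSML processor would require.
    """
    speak = ET.Element("speak", {"version": "1.1"})
    for index, (voice_name, text) in enumerate(lines):
        if index > 0:
            # Insert a short pause between consecutive speaker turns.
            ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        # ElementTree escapes &, < and > in text when serializing.
        voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    script = [
        ("host", "Welcome to the show."),
        ("guest", "Thanks for having me."),
    ]
    print(build_ssml(script))
```

Building the document through an XML library rather than string concatenation keeps special characters in the dialogue escaped correctly, which matters when scripts are converted from subtitles or other free-form text.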
