Speech AI Forge

Design

Speech-AI-Forge is a project built around text-to-speech (TTS) generation models. It provides an API server and a Gradio web UI for running and managing speech generation and recognition workflows. The repository packages integrations for many open TTS and ASR models and tools, along with scripts that download the required model assets. It can be deployed locally, in containers, on Colab, or on HuggingFace Spaces. The API server is a lightweight HTTP script suited to higher-throughput programmatic use, while the web UI is meant for interactive configuration and testing. The project focuses on practical TTS features such as multi-model inference, voice building and testing, SSML script editing, long-text handling and post-processing. It also offers ASR, forced alignment and audio enhancement. Model download utilities, Docker Compose files, environment templates and API documentation are included to help users install and run the system.

Stars

1328

App URL

https://github.com/lenML/Speech-AI-Forge

Github Repository

https://github.com/lenML/Speech-AI-Forge/blob/main/README.md

Features

The repository documents and implements a broad set of speech features and deployment tooling. Multi-model TTS support includes ChatTTS, FishSpeech, CosyVoice, FireRedTTS, F5-TTS, Index-TTS, Spark-TTS and GPT-SoVITS. Voice management features include speaker switching, custom voice upload, a voice builder, voice testing, random seed sampling and voice blending, plus a voice hub for shared voice packages. Advanced synthesis features include style controls, a refiner for long text, configurable text splitters, a batch size setting for long-text batching, pace, pitch and volume adjusters, loudness equalization and an enhancer model that improves audio quality. SSML tooling supports podcast and multi-role scripts, subtitle-to-SSML conversion and a script editor. ASR support uses Whisper and SenseVoice and includes forced alignment. Operational features include a Gradio web UI, a standalone API server with documented endpoints, Docker Compose setups and model downloader scripts.
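The text splitter and batch size settings mentioned above can be illustrated with a small, generic sketch. This is not the project's implementation: the function names `split_text` and `batched`, the `max_chars` parameter and the punctuation set are all hypothetical, chosen only to show how long text might be cut at sentence boundaries and then grouped into batches for inference.

```python
import re
from typing import Iterator, List

# Assumed sentence boundaries for this sketch: whitespace after Western
# end punctuation, or directly after CJK end punctuation.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+|(?<=[。！？])")


def split_text(text: str, max_chars: int = 100) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-wrapped.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # the sentence still fits in the open chunk
        else:
            chunks.append(current)  # close the open chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    chunks = split_text("Hello world. How are you? Fine.", max_chars=20)
    for batch in batched(chunks, batch_size=2):
        print(batch)
```

In a sketch like this, a smaller `max_chars` keeps each synthesis call short at the cost of more chunk boundaries, while a larger `batch_size` trades memory for throughput. The actual defaults and splitting rules used by Speech-AI-Forge should be taken from its own documentation.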

Use Cases

Speech-AI-Forge helps users and teams deploy and evaluate TTS and ASR pipelines without building model integrations from scratch. It provides ready-to-run UI workflows for experimenting with voices, styles and SSML scripts, plus programmatic API endpoints for integrating TTS into applications or high-throughput services. The included model downloader scripts and Docker Compose templates reduce setup friction and make deployments reproducible across Colab, local machines and container platforms. The voice management and builder tools let creators construct and test custom voices and blend seeds for creative results. Long-text handling, the refiner and post-processing tools support podcast and narration use cases, while ASR and forced alignment cover transcription and alignment tasks. Documentation, example commands and an API docs endpoint help operationalize TTS and ASR in research or production contexts.
