CleanS2S
Basic Information
CleanS2S is a prototype Speech-to-Speech (S2S) agent that demonstrates a high-quality, streaming, interactive Chinese voice interface implemented as a compact single-file pipeline. The repository gives researchers and developers a readable reference implementation of an end-to-end S2S pipeline that combines Automatic Speech Recognition (ASR), a Large Language Model (LLM) handler, and Text-to-Speech (TTS) into a real-time conversational agent. It emphasizes a Linguistic User Interface (LUI) style of interaction, with features for proactive action initiation and subjective action judgement. The project includes demo conversations, backend server scripts for running the streaming pipeline, optional retrieval-augmented generation (RAG) and web search extensions, and a frontend client for trying interactions in a browser. The design targets quick exploration, validation of ideas, and easy customization of models and components.
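The ASR, LLM, and TTS chain described above can be pictured as three stages connected by queues, each running in its own thread so that later stages start working before earlier ones finish. The following is a minimal sketch of that general pattern only, not the repository's actual code: `fake_asr`, `fake_llm`, `fake_tts`, `stage`, and `run_pipeline` are hypothetical names, and the placeholder handlers stand in for real models.

```python
import queue
import threading

STOP = object()  # sentinel that tells each stage to shut down


def stage(handler, inbox, outbox):
    """Run one pipeline stage: read items, transform them, pass them on."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            break
        outbox.put(handler(item))


# Hypothetical stand-ins for real models (assumptions, not CleanS2S APIs).
def fake_asr(audio_chunk):
    return f"text({audio_chunk})"


def fake_llm(text):
    return f"reply({text})"


def fake_tts(text):
    return f"audio({text})"


def run_pipeline(audio_chunks):
    """Feed chunks through ASR -> LLM -> TTS and collect the outputs in order."""
    handlers = [fake_asr, fake_llm, fake_tts]
    # One queue in front of each stage, plus one for the final output.
    queues = [queue.Queue() for _ in range(len(handlers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(h, queues[i], queues[i + 1]))
        for i, h in enumerate(handlers)
    ]
    for t in threads:
        t.start()
    for chunk in audio_chunks:
        queues[0].put(chunk)
    queues[0].put(STOP)

    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

Calling `run_pipeline(["chunk0", "chunk1"])` passes each chunk through all three placeholder stages in order. In a real system, each handler would wrap a model of the user's choice, which is where the customization mentioned above would plug in.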