WavCraft

Basic Information

WavCraft is an LLM-driven system for audio content creation and editing that connects large language models with expert audio models and digital signal processing (DSP) functions. The repository provides tools for text-guided editing of existing clips, generation of new audio from textual prompts, and audio-aware scriptwriting, in which the model writes scenes and generates the corresponding sounds. It includes command-line entry points and an interactive chat mode for iterative editing, and it watermarks outputs so they can be identified as produced or modified by WavCraft. The project supplies environment-setup and service-launch scripts for running the deep learning components locally and accepts OpenAI and Hugging Face credentials for model access. The codebase is provided for research purposes and includes a mandatory watermarking disclaimer.

App Details

Features
WavCraft implements several core capabilities: text-guided audio editing that modifies input WAV files via a one-line CLI command or an interactive chat flow, text-guided audio generation from prompts, and an audio scriptwriting mode that produces scene descriptions and matching sound designs. Utility scripts include scripts/setup_envs.sh for environment setup and scripts/start_services.sh for launching local model services. The entry points shown in the README are WavCraft.py for batch use, WavCraft-chat.py for interactive sessions, and check_watermark.py for verifying whether a piece of audio was generated or altered by WavCraft. The project has added watermarking functionality and supports open LLMs such as the Mistral family for generation. The architecture is an LLM orchestrator that glues together expert audio models and DSP tools.
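The orchestrator architecture described above can be sketched in a few lines: a router stands in for the LLM's decision and dispatches a natural-language request to one of several registered "expert" tools. The tool names, request fields, and routing rule below are illustrative placeholders, not WavCraft's actual interfaces.

```python
# Minimal sketch of an LLM-orchestrator pattern: a router inspects a
# request and dispatches it to a registered expert tool. Tool names and
# the routing rule are hypothetical, not WavCraft's real API.

def generate_audio(prompt: str) -> str:
    """Placeholder for a text-to-audio expert model."""
    return f"generated({prompt})"

def edit_audio(wav: str, instruction: str) -> str:
    """Placeholder for a text-guided audio-editing expert model."""
    return f"edited({wav}, {instruction})"

TOOLS = {
    "generate": lambda req: generate_audio(req["text"]),
    "edit": lambda req: edit_audio(req["wav"], req["text"]),
}

def route(request: dict) -> str:
    """Stand-in for the LLM's tool choice: edit if an input WAV is
    supplied, otherwise generate from scratch."""
    tool = "edit" if request.get("wav") else "generate"
    return TOOLS[tool](request)

result = route({"wav": "input.wav", "text": "add rain in the background"})
```

In the real system the routing step is performed by the LLM itself, which also composes multiple tool calls; this sketch shows only the single-dispatch core of that pattern.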
Use Cases
WavCraft helps researchers and creators prototype LLM-driven audio workflows by putting audio editing and generation under natural-language control. Users can apply high-level textual instructions to an existing recording to add or modify sounds, run interactive sessions to refine edits, or ask the system to draft audio-oriented scripts and generate the corresponding soundtracks. Built-in watermarking and a checker script aid provenance tracking and detection of synthesized content. The provided setup and service scripts make it easier to run the required deep learning components locally and to experiment with OpenAI or Hugging Face models. The README emphasizes research-only usage and advises against disabling the watermarking, so the project serves as a practical sandbox for experimenting with audio LLM orchestration.
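The embed-then-verify provenance workflow (watermark on generation, detect with a checker such as check_watermark.py) can be illustrated with a toy least-significant-bit scheme on 16-bit PCM samples. WavCraft's actual watermarking method is not documented here, so this is a conceptual sketch only; the bit pattern and functions are invented for illustration.

```python
# Toy watermarking sketch: embed a known bit pattern in the LSBs of the
# first few PCM samples, then check for it. This is NOT WavCraft's real
# scheme; it only illustrates the embed-then-verify workflow.

MARK = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical watermark pattern

def embed(samples: list[int]) -> list[int]:
    """Return a copy of the samples with the mark written into the LSBs."""
    out = list(samples)
    for i, bit in enumerate(MARK):
        out[i] = (out[i] & ~1) | bit  # overwrite the LSB with the mark bit
    return out

def is_watermarked(samples: list[int]) -> bool:
    """Check whether the leading samples carry the expected bit pattern."""
    return [s & 1 for s in samples[: len(MARK)]] == MARK

clean = [100, 200, 300, 400, 500, 600, 700, 800]
marked = embed(clean)
```

A production watermark would instead be spread imperceptibly across the signal and made robust to re-encoding; the point here is only the round trip from embedding to detection that the checker script performs.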