Basic Information

MITSUHA is a hobbyist project that creates a local, multilingual virtual assistant, or 'waifu', you can speak to via microphone and that replies aloud via text-to-speech (TTS). The system is designed to run on a PC or in VR/AR setups and supports a Gatebox-style hologram, VTube Studio avatar integration, and optional smart home control through Tuya. It stitches together voice input, speech-to-text, contextual memory lookup, local LLM inference, and TTS to produce spoken, context-aware responses. The README stresses that the project is a work in progress undergoing major changes and warns users not to attempt installation yet, as some model components may fail. The project targets enthusiasts who want an interactive, multimodal local assistant with avatar and home-automation features.
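The stitched-together pipeline described above can be sketched as a simple per-turn loop. All component functions below are hypothetical stand-ins for the real integrations (SpeechRecognition/Whisper, hyperDB retrieval, llama.cpp inference, VITS TTS); only the orchestration logic is the point.

```python
# Minimal sketch of one assistant turn: listen -> transcribe -> recall -> generate.
# Every function here is a toy stand-in, not the project's actual code.

def transcribe(audio: bytes) -> str:
    """Stand-in for Whisper speech-to-text (toy: audio is already text)."""
    return audio.decode("utf-8")

def recall(query: str, memory: list[str], k: int = 2) -> list[str]:
    """Stand-in for hyperDB vector retrieval: naive word-overlap ranking."""
    scored = sorted(memory, key=lambda m: -len(set(m.split()) & set(query.split())))
    return scored[:k]

def generate(prompt: str) -> str:
    """Stand-in for llama.cpp inference."""
    return f"[assistant reply to: {prompt}]"

def respond(audio: bytes, memory: list[str]) -> str:
    """One turn: transcribe, retrieve context, build a prompt, generate."""
    text = transcribe(audio)
    context = recall(text, memory)
    prompt = "\n".join(context) + "\n" + text
    reply = generate(prompt)
    memory.append(text)   # short-term memory grows with each turn
    return reply          # the real project would hand this to the TTS stage

memory = ["user likes coffee", "user lives in Tokyo"]
print(respond(b"what do I like to drink", memory))
```

The real pipeline wraps a loop like this around continuous microphone capture and feeds each reply to the TTS and lip-sync stages.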

App Details

Features
The README documents an end-to-end pipeline combining SpeechRecognition and Whisper for microphone input and transcription, hyperDB and sentence-transformers for vectorized short-term and long-term memory retrieval, llama.cpp for local language model inference, and VITS-based TTS to generate spoken replies. It supports multilingual voices in English, Japanese, Korean, and Chinese. Integrations include VTube Studio for lip-synced avatars, VB-Audio virtual cable setup instructions for routing audio, and optional Tuya Cloud IoT for Alexa-like smart home control. The project lists prerequisites and an automatic installation flow, plus planned roadmap items such as compiling into a single executable, mobile support, and improved lip-sync methods.
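The memory-retrieval step rests on embedding text as vectors and ranking stored entries by cosine similarity. The pure-Python toy below, with made-up 3-dimensional "embeddings", illustrates the idea behind the hyperDB + sentence-transformers combination; in the real project the vectors come from a sentence-transformers model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings"; sentence-transformers would produce real ones.
memory = {
    "the lights are in the bedroom": [0.9, 0.1, 0.0],
    "user's favourite song is X":    [0.1, 0.8, 0.3],
    "dinner is at seven":            [0.0, 0.2, 0.9],
}

def top_k(query_vec, store, k=1):
    """Return the k stored texts most similar to the query embedding."""
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]),
                    reverse=True)
    return ranked[:k]

print(top_k([0.85, 0.2, 0.1], memory))  # closest to the "lights" entry
```

Retrieved entries are then prepended to the LLM prompt, which is how the assistant keeps context across turns.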
Use Cases
MITSUHA provides a hands-free conversational assistant for desktop and VR use cases by combining speech recognition, local LLM responses, memory for context continuity, and natural-sounding TTS. For VTubers and hobbyists it offers avatar lip-sync via VTube Studio and an example workflow for routing audio with virtual cables. For home-automation users it can be configured for Alexa-style voice control of Tuya-compatible smart devices. Because it relies on local inference components such as llama.cpp and local TTS, it suits offline or privacy-conscious setups. The README also documents setup steps and hardware/software prerequisites so users can reproduce the environment once the project stabilizes.
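The Alexa-style control use case boils down to mapping a transcribed utterance to a device command. The sketch below uses a hypothetical in-memory dispatch table in place of the real Tuya Cloud calls; the phrase and device names are invented for illustration.

```python
# Hypothetical intent dispatcher: maps transcribed speech to device actions.
# In MITSUHA the action would be a Tuya Cloud request; here we just log it.

actions_log = []

def turn_on(device):  actions_log.append(f"ON:{device}")
def turn_off(device): actions_log.append(f"OFF:{device}")

INTENTS = [
    ("turn on",  turn_on),   # checked before "turn off" never matters here,
    ("turn off", turn_off),  # since neither phrase contains the other
]

DEVICES = ["lamp", "fan", "heater"]

def dispatch(utterance: str) -> bool:
    """Match a transcribed utterance against known intents and devices."""
    text = utterance.lower()
    for phrase, action in INTENTS:
        if phrase in text:
            for device in DEVICES:
                if device in text:
                    action(device)
                    return True
    return False

dispatch("please turn on the lamp")
dispatch("turn off the heater")
print(actions_log)  # → ['ON:lamp', 'OFF:heater']
```

A production version would need fuzzier matching than substring tests, but the shape — intent recognition on the STT output, then a device API call — is the same.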