A Clean Break with Alice: Building a Fully Local and Private Voice Assistant
How to replace Yandex Alice with a fully offline voice assistant built on Home Assistant, Whisper, Piper, and ESP32 hardware — three implementation options ranging from 1,400 to 20,000+ rubles, with complete architecture diagrams.
I decided to replace Yandex Alice after learning about new Russian legislation regarding internet service obligations. My smart home already ran fully autonomously on a local network — 17 Wi-Fi modules and 42 Zigbee devices managed by a Raspberry Pi 4 running Home Assistant OS. The voice assistant was the one remaining cloud dependency. That needed to change.
My Smart Home Configuration
Central controller: Raspberry Pi 4B (2 GB RAM) running Home Assistant OS, installed in 2022. All automation logic runs locally with no cloud dependencies.
Protocols:
- Wi-Fi: ESPHome (17 modules)
- Zigbee: 42 devices via a Sonoff Zigbee 3.0 USB Dongle Plus and Zigbee2MQTT
Controlled devices: multi-zone lighting per room, climate control (air conditioning, heating), smart appliances (washing machine, dishwasher, kettle), motorized curtains, security cameras, intercom automation, and Kodi media integration.
Voice Assistant Architecture: The Required Components
A local voice assistant requires six integrated elements working together:
- Microphone and Speaker Hardware: ESP32-S3-BOX (full-featured, ~6,000 rubles) or M5Stack ATOM Echo (compact, ~1,400 rubles)
- Wake Word Engine: OpenWakeWord, lightweight local keyword activation
- Speech-to-Text (STT): Whisper from OpenAI, the modern transcription standard, with excellent Russian language support and multiple model sizes (tiny, base, small, medium and large)
- Intent Recognition: Home Assistant's built-in Assist mechanism
- Text-to-Speech (TTS): Piper, fast voice synthesis with Russian language support
- Wyoming Protocol: the networking layer that connects all the components
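To make the Wyoming layer less abstract, here is a simplified Python sketch of its message framing as I understand it from the reference `wyoming` library: each event is one JSON header line carrying a `type`, optionally followed by `data_length` bytes of extra JSON data and `payload_length` bytes of binary payload (for example raw PCM audio). The `audio-chunk` event and its `rate`/`width`/`channels` fields follow that library's conventions; treat the exact field set as an assumption and check it against the protocol documentation before relying on it.

```python
import json
from typing import Optional, Tuple


def encode_event(event_type: str, data: Optional[dict] = None, payload: bytes = b"") -> bytes:
    """Serialize one event: JSON header line, then optional data bytes, then payload.

    Simplified sketch of the Wyoming framing (assumed, not the library's own code).
    """
    header: dict = {"type": event_type}
    data_bytes = json.dumps(data).encode("utf-8") if data else b""
    if data_bytes:
        header["data_length"] = len(data_bytes)
    if payload:
        header["payload_length"] = len(payload)
    return json.dumps(header).encode("utf-8") + b"\n" + data_bytes + payload


def decode_event(buf: bytes) -> Tuple[str, dict, bytes, bytes]:
    """Parse one event from the start of buf; return (type, data, payload, rest)."""
    line, _, rest = buf.partition(b"\n")
    header = json.loads(line)
    # Data may be inline in the header or sent as separate length-prefixed bytes.
    data = dict(header.get("data") or {})
    data_len = header.get("data_length", 0)
    if data_len:
        data.update(json.loads(rest[:data_len]))
        rest = rest[data_len:]
    payload_len = header.get("payload_length", 0)
    payload, rest = rest[:payload_len], rest[payload_len:]
    return header["type"], data, payload, rest
```

For example, `encode_event("audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, b"\x00" * 640)` frames 20 ms of 16 kHz, 16-bit mono silence. Because every piece after the header is length-prefixed, one TCP stream can interleave JSON control messages and audio without any extra machinery.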
Three Implementation Options
Option 1: Simple and Budget-Friendly
Cost: ~1,400 rubles (M5Stack ATOM Echo only)
All speech processing runs on the existing Raspberry Pi 4, paired with a single ATOM Echo as the microphone and speaker.
- Advantages: minimal investment, no additional server hardware, quick setup
- Disadvantages: noticeable response latency, additional load on the main controller, poor scalability to multiple rooms
Option 2: Proper Architecture (Author's Recommendation)
Cost: ~21,400 rubles and up
- Mini PC with Intel N100/N95 processor: ~14,000 rubles
- ESP32-S3-BOX for living room: ~6,000 rubles
- M5Stack ATOM Echo units for other rooms: ~1,400 rubles each
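The ~21,400-ruble figure is simply the sum of the items above with a single ATOM Echo; each further room adds roughly 1,400 rubles. A trivial helper makes the scaling explicit (the prices are the approximate ones quoted above, not current market prices):

```python
# Approximate prices taken from the list above (rubles).
MINI_PC_RUB = 14_000    # Intel N100/N95 mini PC
S3_BOX_RUB = 6_000      # ESP32-S3-BOX for the living room
ATOM_ECHO_RUB = 1_400   # one M5Stack ATOM Echo per additional room


def option2_cost(extra_rooms: int) -> int:
    """Total Option 2 cost for the living room plus `extra_rooms` ATOM Echo rooms."""
    if extra_rooms < 0:
        raise ValueError("extra_rooms must be non-negative")
    return MINI_PC_RUB + S3_BOX_RUB + extra_rooms * ATOM_ECHO_RUB
```

`option2_cost(1)` reproduces the 21,400 quoted above; a four-room setup (`option2_cost(3)`) comes to 24,200.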
The mini PC runs Whisper (STT), Piper (TTS), and OpenWakeWord in Docker containers. Home Assistant on the Raspberry Pi stays focused on automation only — it does not handle speech processing.
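Home Assistant's Wyoming integration is then pointed at the mini PC's host and port for each service. A quick way to confirm the containers are reachable over the network is a plain TCP connection test. This sketch assumes the ports commonly used by the Wyoming Whisper, Piper and OpenWakeWord images (10300, 10200 and 10400); verify them against your own container port mappings.

```python
import socket
from typing import Dict

# Assumed default Wyoming ports; adjust to match your Docker port mappings.
DEFAULT_PORTS: Dict[str, int] = {
    "whisper": 10300,
    "piper": 10200,
    "openwakeword": 10400,
}


def check_services(host: str, ports: Dict[str, int] = DEFAULT_PORTS,
                   timeout: float = 1.0) -> Dict[str, bool]:
    """Return {service: reachable}, where reachable means a TCP connect succeeded."""
    status: Dict[str, bool] = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                status[name] = True
        except OSError:  # refused, unreachable or timed out
            status[name] = False
    return status
```

Called as `check_services("192.168.1.50")` (a hypothetical mini PC address), it returns one boolean per service. This only proves the port is open, not that the service behind it answers Wyoming events correctly.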
- Advantages: fast response times, doesn't burden the main controller, scales easily to additional rooms
- Disadvantages: higher initial cost, requires Linux and Docker knowledge
Option 3: Consolidated Powerful Server
Replace the Raspberry Pi 4 entirely with a single powerful machine (16–32 GB RAM). All services (Home Assistant, Whisper, Piper, OpenWakeWord) run on one host, and Whisper can optionally be accelerated with an NVIDIA GPU.
- Advantages: maximum flexibility, single point of maintenance
- Disadvantages: significantly higher cost, more complex setup
Data Flow (Option 2)
[User speaks]
↓
[ESP32-S3-BOX / ATOM Echo]: microphone captures audio
↓ (Wi-Fi)
[Mini PC: OpenWakeWord + Whisper STT]: wake word detected, speech converted to text
↓
[Home Assistant on Raspberry Pi 4]: intent determined by Assist, command executed
↓ (optional voice response)
[Mini PC: Piper TTS]: text synthesized to speech
↓ (Wi-Fi)
[ESP32-S3-BOX / ATOM Echo]: speaker plays response
↓
[User receives answer]
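The flow above can be expressed as a chain of stages. In this Python sketch everything is a hypothetical stand-in: `stt` and `tts` are placeholders for the Whisper and Piper services, and `recognize_intent` is a toy regex matcher, not Home Assistant's actual sentence-template engine. Only `HassTurnOn` and `HassTurnOff` are real Assist intent names. Wake word detection is assumed to have already happened before `run_pipeline` is called.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Intent:
    name: str
    entity: str


# Hypothetical templates standing in for Assist's sentence matching.
TEMPLATES = [
    (re.compile(r"^turn on (?:the )?(?P<entity>.+)$"), "HassTurnOn"),
    (re.compile(r"^turn off (?:the )?(?P<entity>.+)$"), "HassTurnOff"),
]


def recognize_intent(text: str) -> Optional[Intent]:
    """Toy intent matcher: normalize the transcript, then try each template."""
    normalized = text.lower().strip().rstrip(".!")
    for pattern, intent_name in TEMPLATES:
        match = pattern.match(normalized)
        if match:
            return Intent(intent_name, match.group("entity"))
    return None


def run_pipeline(audio: bytes, stt: Callable[[bytes], str],
                 tts: Callable[[str], bytes]) -> bytes:
    text = stt(audio)                # mini PC: Whisper
    intent = recognize_intent(text)  # Raspberry Pi: Assist
    if intent is None:
        reply = "Sorry, I didn't understand"
    else:
        # A real system would call the Home Assistant service here.
        reply = f"Done: {intent.name} {intent.entity}"
    return tts(reply)                # mini PC: Piper
```

Keeping STT and TTS behind plain callables mirrors the Option 2 split: the heavy models can live on another host while the intent step stays on the controller.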
Why Whisper Instead of Alternatives?
Speech-to-Phrase: Very lightweight but inflexible — only matches against a predefined dictionary of phrases. Cannot handle natural freeform speech.
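The practical limitation shows up as soon as the user rephrases a command. The sketch below illustrates the consequence of a closed phrase list with a plain dictionary lookup; it is not how Speech-to-Phrase works internally (it constrains recognition itself to the known phrases), and the phrases and command IDs are hypothetical.

```python
from typing import Optional

# Hypothetical closed vocabulary: every accepted command is listed verbatim.
PHRASES = {
    "turn on the kitchen light": "kitchen_light_on",
    "turn off the kitchen light": "kitchen_light_off",
}


def match_phrase(transcript: str) -> Optional[str]:
    """Return a command ID only when the transcript is one of the known phrases."""
    return PHRASES.get(transcript.lower().strip())
```

"Turn on the kitchen light" maps to `kitchen_light_on`, but "could you switch the kitchen light on" returns `None`, even though a person would read both as the same request.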
Rhasspy: A powerful, mature solution, but its development has slowed compared to the pace of the Home Assistant ecosystem.
Whisper: The modern standard. It understands free-form natural speech, works well with Russian, comes in multiple model sizes for trading accuracy against speed, has optimized variants such as distil-whisper, and is backed by an active development community.
Conclusion
This transition is more than a technical experiment: it is a conscious step toward a truly private and independent smart home. Each option offers a different trade-off between cost, complexity, and capability. You can start with Option 1 for about 1,400 rubles and upgrade later, or go straight to the recommended architecture if you already have the hardware. Either way, the system works entirely offline, with no cloud dependency and no data leaving your local network.