
A Clean Break with Alice: Building a Fully Local and Private Voice Assistant

How to replace Yandex Alice with a fully offline voice assistant built on Home Assistant, Whisper, Piper, and ESP32 hardware — three implementation options ranging from 1,400 to 20,000+ rubles, with complete architecture diagrams.

I decided to replace Yandex Alice after learning about new Russian legislation regarding internet service obligations. My smart home already ran fully autonomously on a local network — 17 Wi-Fi modules and 42 Zigbee devices managed by a Raspberry Pi 4 running Home Assistant OS. The voice assistant was the one remaining cloud dependency. That needed to change.

My Smart Home Configuration

Central controller: Raspberry Pi 4B (2GB RAM), Home Assistant OS, installed in 2022. All automation logic runs locally with zero cloud dependencies.

Protocols:

Wi-Fi: ESPHome (17 modules)
Zigbee: 42 devices via a SONOFF Zigbee 3.0 USB Dongle Plus and Zigbee2MQTT

Controlled devices: multi-zone lighting per room, climate control (air conditioning, heating), smart appliances (washing machine, dishwasher, kettle), motorized curtains, security cameras, intercom automation, and Kodi media integration.

Voice Assistant Architecture: The Required Components

A local voice assistant requires six integrated elements working together:

Microphone and Speaker Hardware — ESP32-S3-BOX (full-featured, ~6,000 RUB) or M5Stack ATOM Echo (compact, ~1,400 RUB)
Wake Word Engine — OpenWakeWord: lightweight local keyword activation
Speech-to-Text (STT) — Whisper from OpenAI: modern transcription standard, excellent Russian language support, multiple model sizes (tiny through medium)
Intent Recognition — Home Assistant's built-in Assist mechanism
Text-to-Speech (TTS) — Piper: fast voice synthesis with Russian language support
Wyoming Protocol — the networking layer that connects all components
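To make the Wyoming layer less abstract: as I understand the `wyoming` Python package, each event travels as one JSON header line, optionally followed by a JSON data block and a binary payload (for example raw audio), with their sizes announced in the header as `data_length` and `payload_length`. The sketch below reimplements only that framing with the standard library. The field names, the `version` value and the event types shown are my reading of the protocol, not a verified spec, so check them against the package before relying on this.

```python
import json
from typing import Optional, Tuple

# Assumed protocol version string; the real package writes its own version here.
ASSUMED_VERSION = "1.0.0"


def encode_event(event_type: str, data: Optional[dict] = None,
                 payload: bytes = b"") -> bytes:
    """Frame one event: JSON header line + optional JSON data + optional payload."""
    header: dict = {"type": event_type, "version": ASSUMED_VERSION}
    data_bytes = b""
    if data:
        data_bytes = json.dumps(data, ensure_ascii=False).encode("utf-8")
        header["data_length"] = len(data_bytes)
    if payload:
        header["payload_length"] = len(payload)
    # json.dumps never emits a raw newline, so "\n" safely terminates the header.
    header_line = json.dumps(header, ensure_ascii=False).encode("utf-8") + b"\n"
    return header_line + data_bytes + payload


def decode_event(frame: bytes) -> Tuple[str, dict, bytes]:
    """Inverse of encode_event for a single, complete frame."""
    newline = frame.index(b"\n")
    header = json.loads(frame[:newline].decode("utf-8"))
    offset = newline + 1
    # Accept data inline in the header as well as in a separate block.
    data: dict = dict(header.get("data", {}))
    data_length = header.get("data_length", 0)
    if data_length:
        data.update(json.loads(frame[offset:offset + data_length].decode("utf-8")))
        offset += data_length
    payload_length = header.get("payload_length", 0)
    payload = frame[offset:offset + payload_length]
    return header["type"], data, payload


if __name__ == "__main__":
    # An audio chunk as a satellite might stream it: 16 kHz, 16-bit, mono PCM.
    pcm = b"\x00\x01" * 160
    frame = encode_event("audio-chunk", {"rate": 16000, "width": 2, "channels": 1}, pcm)
    print(decode_event(frame)[0])  # audio-chunk
```

In practice a satellite sends a stream of such `audio-chunk` frames to the STT server and gets back a single event carrying the recognized text.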

Three Implementation Options

Option 1: Simple and Budget-Friendly

Cost: ~1,400 rubles (M5Stack ATOM Echo only)

All speech processing runs on the existing Raspberry Pi 4. Single microphone unit, quick to set up.

Advantages: minimal investment, single device, no additional hardware, quick setup
Disadvantages: noticeable response latency, additional load on the main controller, poor scalability to multiple rooms

Option 2: Proper Architecture (Author's Recommendation)

Cost: ~21,400 rubles and up

Mini PC with Intel N100/N95 processor: ~14,000 rubles
ESP32-S3-BOX for living room: ~6,000 rubles
M5Stack ATOM Echo units for other rooms: ~1,400 rubles each
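The ~21,400 figure is simply the mini PC plus the ESP32-S3-BOX plus one ATOM Echo; every further room adds roughly another 1,400 rubles. A tiny sketch of that arithmetic, using the approximate prices listed above:

```python
# Approximate prices in rubles, as listed above.
MINI_PC = 14_000
S3_BOX = 6_000
ATOM_ECHO = 1_400


def option2_cost(extra_rooms: int) -> int:
    """Mini PC + one ESP32-S3-BOX (living room) + one ATOM Echo per other room."""
    if extra_rooms < 0:
        raise ValueError("extra_rooms must be non-negative")
    return MINI_PC + S3_BOX + ATOM_ECHO * extra_rooms


for rooms in range(1, 5):
    print(rooms, option2_cost(rooms))
```

With one extra room this gives 21,400; each additional satellite is a small increment compared with the mini PC, which is why this option scales cheaply once the server is in place.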

The mini PC runs Whisper (STT), Piper (TTS), and OpenWakeWord in Docker containers. Home Assistant on the Raspberry Pi stays focused on automation only — it does not handle speech processing.

Advantages: fast response times, doesn't burden the main controller, scales easily to additional rooms
Disadvantages: higher initial cost, requires Linux and Docker knowledge
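Once the three containers are running on the mini PC, it is worth confirming from the Raspberry Pi side that each Wyoming endpoint accepts TCP connections before adding it in Home Assistant. A minimal sketch: the host address is hypothetical, and the ports are the defaults I believe the Wyoming Whisper, Piper and openWakeWord servers use (10300, 10200, 10400); adjust both to match your own Docker setup.

```python
import socket
from typing import Dict

# Hypothetical LAN address of the mini PC; replace with yours.
MINI_PC_HOST = "192.168.1.50"

# Assumed default Wyoming ports; verify against your container configuration.
WYOMING_SERVICES: Dict[str, int] = {
    "whisper (STT)": 10300,
    "piper (TTS)": 10200,
    "openwakeword": 10400,
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_services(host: str, services: Dict[str, int]) -> Dict[str, bool]:
    """Probe every service and report which ones are listening."""
    return {name: is_reachable(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(MINI_PC_HOST, WYOMING_SERVICES).items():
        print(f"{name:15} {'OK' if ok else 'UNREACHABLE'}")
```

This only proves that something is listening on each port; it does not check that the service speaks Wyoming or has its models loaded.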

Option 3: Consolidated Powerful Server

Replace the Raspberry Pi 4 entirely with a single powerful machine (16–32 GB RAM). All services — Home Assistant, Whisper, Piper, OpenWakeWord — run on one host. Optional NVIDIA GPU acceleration available.

Advantages: maximum flexibility, single point of maintenance
Disadvantages: significantly higher cost, greater complexity

Data Flow (Option 2)

[User speaks]
    ↓
[ESP32-S3-BOX / ATOM Echo] — microphone + wake word detection
    ↓ (Wi-Fi)
[Mini PC: Whisper STT Server] — speech converted to text
    ↓
[Home Assistant on Raspberry Pi 4] — intent determined by Assist
    ↓ — commands executed
    ↓ (optional voice response)
[Mini PC: Piper TTS] — text synthesized to speech
    ↓ (Wi-Fi)
[ESP32-S3-BOX / ATOM Echo] — speaker plays response
    ↓
[User receives answer]
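The Assist step in the middle of this chain can be exercised on its own, without any audio: Home Assistant's REST API accepts plain text at `/api/conversation/process` and returns the Assist result. The sketch below builds and sends that request with the standard library. The host name and token are placeholders (a long-lived access token can be created in your Home Assistant user profile), and I am assuming the default port 8123.

```python
import json
import urllib.request

# Placeholders: your Home Assistant address and a long-lived access token.
HA_URL = "http://homeassistant.local:8123"
HA_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"


def build_assist_request(text: str, language: str = "ru",
                         base_url: str = HA_URL,
                         token: str = HA_TOKEN) -> urllib.request.Request:
    """Prepare a POST to the conversation API: the same Assist step a transcript goes through."""
    body = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/conversation/process",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask_assist(text: str, language: str = "ru") -> dict:
    """Send the text to Assist and return the decoded JSON response."""
    with urllib.request.urlopen(build_assist_request(text, language), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        # "включи свет на кухне" = "turn on the kitchen light"
        reply = ask_assist("включи свет на кухне")
        print(json.dumps(reply, ensure_ascii=False, indent=2))
    except OSError as err:  # URLError and socket timeouts are OSError subclasses
        print(f"Home Assistant not reachable: {err}")
```

Testing intents this way separates the two common failure points: if a typed phrase works here but the spoken one does not, the problem is in STT, not in Assist.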

Why Whisper Instead of Alternatives?

Speech-to-Phrase: Very lightweight but inflexible — only matches against a predefined dictionary of phrases. Cannot handle natural freeform speech.

Rhasspy: A powerful, veteran solution, but development has slowed compared to the Home Assistant ecosystem's pace.

Whisper: The modern standard. It handles free-form natural speech, works well with Russian, comes in multiple model sizes for trading speed against accuracy, has optimized variants such as distil-whisper, and is backed by an active development community.
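The core difference between Speech-to-Phrase and Whisper is a closed versus an open vocabulary. The toy below is not Speech-to-Phrase's actual implementation; it only illustrates the idea by expanding a few hypothetical sentence templates into a finite set of accepted phrases. Anything said outside that set cannot be recognized, whereas Whisper returns whatever text it hears and leaves interpretation to Assist.

```python
from itertools import product
from typing import Iterable, Optional, Set

# Hypothetical templates and area names, for illustration only.
TEMPLATES = ["turn on the {area} light", "turn off the {area} light"]
AREAS = ["kitchen", "bedroom", "living room"]


def expand(templates: Iterable[str], areas: Iterable[str]) -> Set[str]:
    """Generate every phrase the closed-vocabulary recognizer can ever output."""
    return {t.format(area=a) for t, a in product(templates, areas)}


def match(utterance: str, phrases: Set[str]) -> Optional[str]:
    """Return the normalized utterance if it is in the vocabulary, else None."""
    normalized = " ".join(utterance.lower().split())
    return normalized if normalized in phrases else None


PHRASES = expand(TEMPLATES, AREAS)
print(len(PHRASES))                                      # 6
print(match("Turn on the kitchen light", PHRASES))       # turn on the kitchen light
print(match("could you light up the kitchen", PHRASES))  # None
```

The upside of the closed set is speed and robustness on weak hardware; the downside is that every new way of phrasing a command has to be added to the templates by hand.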

Conclusion

This transition is not just technical experimentation; it is a conscious step toward a truly private and independent smart home. Each option offers different trade-offs between cost, complexity, and capability. You can start with Option 1 for about 1,400 rubles and upgrade later, or go straight to the proper architecture if you have the hardware available. The system works entirely offline, with no cloud dependency and no data leaving your local network.
