How-To Series · Episode 19 / 59 · Module 4: Eyes, Ears, Voice

Hermes · Voice Mode

Hold a key, talk to Hermes, hear it talk back. CLI, Telegram, Discord, and Discord VC.

After this video · You can now have a real spoken conversation with Hermes.

Voice mode lets you talk to Hermes in three places: on the CLI (push-to-talk with Ctrl+B, with voice-activity detection (VAD) ending the recording on silence), on Telegram and Discord (automatic voice replies alongside the text), and in Discord voice channels (the bot joins, listens, and speaks back). Three speech-to-text (STT) picks: faster-whisper (local, no key, the default), Groq Whisper (free tier, fast), OpenAI Whisper (paid). Four text-to-speech (TTS) picks: Tool Gateway (Portal, no keys), ElevenLabs (premium), Edge TTS (free), NeuTTS (local). One caveat: voice mode does not work on Android via Termux, because faster-whisper depends on ctranslate2 and there is no ctranslate2 wheel for Android.
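The Termux caveat above can be checked before installing anything. This is a minimal sketch, not part of Hermes: the function names `is_termux` and `voice_preflight` are hypothetical, and detecting Termux through its `com.termux` install prefix or the `TERMUX_VERSION` variable is an assumption about the Termux environment, not something the Voice Mode doc specifies.

```shell
#!/usr/bin/env bash
# Pre-flight sketch: warn before installing voice extras on Termux.
# Assumption: Termux is recognised by its install prefix (com.termux) or by
# a non-empty TERMUX_VERSION; neither check comes from the Hermes docs.

is_termux() {
  # $1 = value of PREFIX, $2 = value of TERMUX_VERSION (either may be empty).
  case "$1" in
    *com.termux*) return 0 ;;
  esac
  # Fall back to the version variable; succeeds only when it is non-empty.
  [ -n "$2" ]
}

voice_preflight() {
  if is_termux "${PREFIX:-}" "${TERMUX_VERSION:-}"; then
    echo "voice mode unsupported: no ctranslate2 wheel for Android"
    return 1
  fi
  echo "ok: platform is not Termux"
}
```

One way to use it is to gate the install on the check, for example `voice_preflight && pip install "hermes-agent[voice]"`.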

About these resources. Every command in this video comes from the Voice Mode user-guide doc; the practical guide is referenced for setup walkthrough patterns.

Sources · What this video distills

2 docs pages · every command below traces to one of them
Primary · CLI push-to-talk, auto-reply, Discord VC, STT/TTS providers, system deps
Voice Mode
Practical setup walkthrough
Use Voice Mode with Hermes · Guide

Commands shown · Copy and paste

each shows the source doc it came from
Install voice extras · source: Voice Mode
pip install "hermes-agent[voice]"
System deps (Mac) · source: Voice Mode
brew install portaudio ffmpeg opus
System deps (Linux) · source: Voice Mode
sudo apt install portaudio19-dev ffmpeg libopus0
Premium TTS · source: Voice Mode
pip install "hermes-agent[tts-premium]"
Enable inside a chat · source: Voice Mode
/voice on
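
The commands above can be strung into a per-platform install plan. This is a rough sketch under stated assumptions: the helper names `system_deps_cmd` and `voice_install_plan` are hypothetical, the Linux branch assumes a Debian/Ubuntu-style system with `apt`, and the script only prints the documented commands so you can review them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: print the documented install steps for a given OS.
# Nothing is executed here; review the output, then run it yourself.

system_deps_cmd() {
  # $1 = kernel name as reported by `uname -s`.
  case "$1" in
    Darwin) echo 'brew install portaudio ffmpeg opus' ;;
    # Assumption: a Debian/Ubuntu-style Linux where apt is available.
    Linux)  echo 'sudo apt install portaudio19-dev ffmpeg libopus0' ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

voice_install_plan() {
  # $1 = kernel name; $2 = "premium" to include the premium TTS extra.
  system_deps_cmd "$1" || return 1
  echo 'pip install "hermes-agent[voice]"'
  if [ "${2:-}" = "premium" ]; then
    echo 'pip install "hermes-agent[tts-premium]"'
  fi
}
```

Calling `voice_install_plan "$(uname -s)"` prints the system-dependency line for the current machine followed by the voice extras install; passing `premium` as a second argument adds the premium TTS extra. Once the printed commands have run, `/voice on` inside a chat enables voice mode.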

Next in the series · Episodes that build on this

E20 · Browser Automation
E24 · Hermes on Telegram
E25 · Hermes on Discord