hermes-TTS
pending, by Matthias
Convert any Obsidian Markdown note into lightweight speech audio, then prepend a timestamped metadata callout with an embedded audio link.
What changed
This plugin now uses an Aloud-style API link-up pattern:
- One Model Provider selector in settings.
- Provider-specific fields shown only for the selected provider.
- Voice selection is done via dropdowns for all major providers.
- New Voice prompt section for optional speaking-style instructions.
- Output is always normalized to MP3.
- Character limit is no longer user-configurable (notes are processed without a fixed UI cap).
- File name prefix and speech speed settings were removed to simplify configuration.
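As a rough illustration of the "one selector, provider-specific fields" idea, the sketch below maps each provider to the settings fields that would be rendered for it. All identifiers here (`ProviderId` values other than `openai`, the field names, and `visibleFields`) are hypothetical and chosen for illustration only; they are not taken from the plugin's source.

```typescript
// Hypothetical provider identifiers; only "openai" appears in the README's metadata example.
type ProviderId =
  | "openai"
  | "gemini"
  | "google-cloud"
  | "azure"
  | "elevenlabs"
  | "aws-polly"
  | "openai-compatible";

// Assumed field names per provider (illustrative, not the plugin's real settings keys).
const PROVIDER_FIELDS: Record<ProviderId, string[]> = {
  "openai": ["apiKey", "model", "voice"],
  "gemini": ["apiKey", "model", "voice"],
  "google-cloud": ["apiKey", "voice"],
  "azure": ["apiKey", "region", "voice"],
  "elevenlabs": ["apiKey", "model", "voice"],
  "aws-polly": ["accessKeyId", "secretAccessKey", "region", "voice"],
  "openai-compatible": ["baseUrl", "apiKey", "model", "voice"],
};

// Returns the fields a settings tab would render for the selected provider:
// the provider selector itself, that provider's fields, and the global voice prompt.
function visibleFields(provider: ProviderId): string[] {
  return ["provider", ...PROVIDER_FIELDS[provider], "voicePrompt"];
}
```

A settings tab built this way would re-render whenever the selector changes, so fields for unselected providers never appear.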
Supported providers
- OpenAI
- Google Gemini
- Google Cloud Text-to-Speech
- Azure Speech
- ElevenLabs
- AWS Polly
- OpenAI-compatible endpoints (custom base URL)
Policy disclosures
- Network access is required. The plugin sends note text to the selected external TTS provider.
- External accounts and API keys are required for provider usage (OpenAI, Google, Azure, ElevenLabs, AWS, or compatible API).
- The plugin does not include telemetry or ads.
Mobile compatibility
- Hermes TTS is configured to load on mobile (`isDesktopOnly: false`).
- The bundle is built for browser-compatible runtimes to support Obsidian mobile.
- The plugin avoids regex lookbehind and Node-only `Buffer` usage in runtime paths for broader mobile compatibility.
- Provider behavior may still vary by service/API/network conditions on mobile devices.
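To make the two avoided constructs concrete, here is a minimal sketch of browser-safe alternatives: decoding base64 audio without Node's `Buffer`, and splitting sentences without a lookbehind such as `(?<=[.!?])`. The helper names `base64ToBytes` and `splitSentences` are hypothetical and are not claimed to match the plugin's code.

```typescript
// Hypothetical helper: decode a base64 payload into bytes without Node's Buffer.
// atob is available in browsers, mobile webviews, and modern Node.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Hypothetical helper: sentence splitting without regex lookbehind.
// Instead of splitting *after* a terminator, match each sentence together
// with its trailing terminators, then trim the surrounding whitespace.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}
```

Both helpers rely only on standard web APIs, which is the property that matters for runtimes where Node built-ins or newer regex features may be missing.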
Voice dropdown behavior
- OpenAI/Gemini: curated built-in voice dropdowns.
- Google Cloud/Azure/ElevenLabs/AWS Polly: dropdowns with refresh buttons to fetch latest provider voices.
- OpenAI-compatible: OpenAI-style voice dropdown.
- Audio from all providers is normalized and saved as MP3.
Voice prompt behavior
- The Voice prompt setting is global and optional.
- OpenAI: sent as `instructions` only when using `gpt-4o-mini-tts` models (per API behavior).
- Gemini: prepended as style notes before the transcript in the prompt.
- Other providers currently ignore this field.
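The rules above could be sketched roughly as follows. The request fields `model`, `input`, `voice`, `instructions`, and `response_format` are those of OpenAI's speech endpoint; the function names and the exact "Style notes / Transcript" wording of the Gemini prompt are assumptions made for illustration.

```typescript
// Shape of an OpenAI speech request body (subset of the API's fields).
interface OpenAiSpeechBody {
  model: string;
  input: string;
  voice: string;
  response_format: "mp3";
  instructions?: string;
}

// Hypothetical builder: attach the voice prompt as `instructions`
// only when the model name starts with "gpt-4o-mini-tts".
function buildOpenAiBody(model: string, voice: string, text: string, voicePrompt: string): OpenAiSpeechBody {
  const body: OpenAiSpeechBody = { model, input: text, voice, response_format: "mp3" };
  const prompt = voicePrompt.trim();
  if (prompt.length > 0 && model.startsWith("gpt-4o-mini-tts")) {
    body.instructions = prompt;
  }
  return body;
}

// Hypothetical builder: prepend the voice prompt as style notes before the transcript.
// The label text is an assumption; the README only states the ordering.
function buildGeminiPrompt(text: string, voicePrompt: string): string {
  const prompt = voicePrompt.trim();
  return prompt.length > 0 ? `Style notes: ${prompt}\n\nTranscript:\n${text}` : text;
}
```

Keeping the prompt out of the request entirely for other models avoids sending a parameter the provider would ignore or reject.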
Gemini reliability fallback
- Gemini uses the official `@google/genai` SDK flow (matching Aloud plugin setup).
- On Gemini `400` "tried to generate text" errors, the plugin retries in segmented transcript mode with rolling previous-context continuity.
- If Gemini fails with transient errors and Google Cloud TTS is configured, generation automatically falls back to Google Cloud.
- Metadata uses the provider that actually generated the audio.
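One way to picture this fallback chain is the sketch below. It is not the plugin's implementation: the synthesizer functions are injected stand-ins, the numeric `status` field on errors, the 429/5xx definition of "transient", the 800-character segment size, and the byte-level concatenation of segment audio are all assumptions.

```typescript
type GeminiSynth = (text: string, previousContext?: string) => Promise<Uint8Array>;
type CloudSynth = (text: string) => Promise<Uint8Array>;
interface SpeechResult { provider: "gemini" | "google-cloud"; audio: Uint8Array; }

// Assumes provider errors expose a numeric `status` field.
function statusOf(err: unknown): number | undefined {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" ? status : undefined;
}

function isTextGenerationError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return statusOf(err) === 400 && /tried to generate text/i.test(message);
}

// Assumption: "transient" means rate limiting (429) or a server error (5xx).
function isTransient(err: unknown): boolean {
  const status = statusOf(err);
  return status !== undefined && (status === 429 || status >= 500);
}

// Greedy word-based splitting into segments of at most maxChars (single long words excepted).
function splitIntoSegments(text: string, maxChars: number): string[] {
  const segments: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    const candidate = current ? `${current} ${word}` : word;
    if (candidate.length > maxChars && current) {
      segments.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current) segments.push(current);
  return segments;
}

function concatBytes(chunks: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
  let offset = 0;
  for (const c of chunks) { out.set(c, offset); offset += c.length; }
  return out;
}

async function generateWithFallback(
  text: string, gemini: GeminiSynth, googleCloud?: CloudSynth, maxChars = 800,
): Promise<SpeechResult> {
  try {
    return { provider: "gemini", audio: await gemini(text) };
  } catch (err) {
    if (isTextGenerationError(err)) {
      // Segmented mode: each segment is sent with the previous one as context.
      const chunks: Uint8Array[] = [];
      let previous = "";
      for (const segment of splitIntoSegments(text, maxChars)) {
        chunks.push(await gemini(segment, previous));
        previous = segment;
      }
      return { provider: "gemini", audio: concatBytes(chunks) };
    }
    if (isTransient(err) && googleCloud) {
      return { provider: "google-cloud", audio: await googleCloud(text) };
    }
    throw err;
  }
}
```

The returned `provider` value reflects whichever service actually produced the audio, which is what the metadata block would then record.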
Commands
Generate Hermes-TTS audio (current note)
Provider documentation
The same docs are also available from buttons in the plugin settings tab.
Metadata block format
The plugin prepends a callout block near the top of the note (after frontmatter if present). Metadata lines can be toggled in settings. The title is a clean timestamp. For example:
> [!tts]+ 2026-02-17 15:42:10.321
> generated_at: 2026-02-17T14:42:10.321Z
> source_note: [[02 Projects/My Note]]
> provider: openai
> provider_name: OpenAI
> model: gpt-4o-mini-tts
> voice: shimmer
> format: mp3
> mime_type: audio/mpeg
> source_characters_sent: 2412
> provider_docs: https://platform.openai.com/docs/guides/text-to-speech
> voice_docs: https://platform.openai.com/docs/guides/text-to-speech#voice-options
> audio_file: ![[Attachments/TTS Audio/my-note-20260217-154210.mp3]]
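A minimal sketch of how such a block might be assembled and placed after frontmatter is shown below. The helper names `buildCallout` and `prependAfterFrontmatter` are hypothetical, and the frontmatter detection is a simplified assumption (a leading `---` fence closed by a later `---` line), not the plugin's actual logic.

```typescript
// Hypothetical helper: render a [!tts] callout from a title and ordered key/value pairs.
function buildCallout(title: string, fields: Array<[string, string]>): string {
  const lines = [`> [!tts]+ ${title}`];
  fields.forEach((pair) => lines.push(`> ${pair[0]}: ${pair[1]}`));
  return lines.join("\n");
}

// Hypothetical helper: insert the block after YAML frontmatter if present,
// otherwise at the very top of the note.
function prependAfterFrontmatter(note: string, block: string): string {
  const match = /^---\r?\n[\s\S]*?\r?\n---[ \t]*(?:\r?\n|$)/.exec(note);
  const head = match ? match[0] : "";
  const body = note.slice(head.length);
  // Ensure the frontmatter fence ends with a newline before the callout starts.
  const separator = head.length > 0 && !head.endsWith("\n") ? "\n" : "";
  return `${head}${separator}${block}\n\n${body}`;
}
```

For example, a note starting with frontmatter keeps its frontmatter first, followed by the callout, a blank line, and then the original body.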
Build
npm ci
npm run build
Release assets expected by Obsidian:
- manifest.json
- main.js
- styles.css