hermes-TTS

pending

by Matthias

Generate lightweight audio from a markdown note and prepend timestamped metadata with an embedded audio link.

Updated 1mo ago | MIT | Discovered via Obsidian Unofficial Plugins

Convert any Obsidian Markdown note into lightweight speech audio, then prepend a timestamped metadata callout with an embedded audio link.

What changed

This plugin now follows the provider-integration pattern used by the Aloud plugin:

  • One Model Provider selector in settings.
  • Provider-specific fields shown only for the selected provider.
  • Voice selection is done via dropdowns for all major providers.
  • New Voice prompt section for optional speaking-style instructions.
  • Output is always normalized to MP3.
  • Character limit is no longer user-configurable (notes are processed without a fixed UI cap).
  • File name prefix and speech speed settings were removed to simplify configuration.
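The single-selector layout described above can be sketched as a lookup from provider to the fields the settings tab should render. This is a minimal illustration, not the plugin's actual schema: `ProviderId`, `SettingField`, `PROVIDER_FIELDS`, `visibleFields`, and every field key are hypothetical names chosen for the example.

```typescript
// Hypothetical provider identifiers; the plugin's real ids may differ.
type ProviderId =
  | "openai"
  | "gemini"
  | "google-cloud"
  | "azure"
  | "elevenlabs"
  | "aws-polly"
  | "openai-compatible";

// One entry per input shown in the settings tab (illustrative shape).
interface SettingField {
  key: string;
  label: string;
  secret?: boolean; // render as a password-style input
}

// Assumed credential fields per provider, for illustration only.
const PROVIDER_FIELDS: Record<ProviderId, SettingField[]> = {
  "openai": [{ key: "apiKey", label: "API key", secret: true }],
  "gemini": [{ key: "apiKey", label: "API key", secret: true }],
  "google-cloud": [{ key: "apiKey", label: "API key", secret: true }],
  "azure": [
    { key: "apiKey", label: "API key", secret: true },
    { key: "region", label: "Region" },
  ],
  "elevenlabs": [{ key: "apiKey", label: "API key", secret: true }],
  "aws-polly": [
    { key: "accessKeyId", label: "Access key ID" },
    { key: "secretAccessKey", label: "Secret access key", secret: true },
    { key: "region", label: "Region" },
  ],
  "openai-compatible": [
    { key: "baseUrl", label: "Base URL" },
    { key: "apiKey", label: "API key", secret: true },
  ],
};

// Only the selected provider's fields are returned, so only those get rendered.
function visibleFields(selected: ProviderId): SettingField[] {
  return PROVIDER_FIELDS[selected];
}
```

Re-rendering the settings tab with `visibleFields(selected)` whenever the selector changes keeps unrelated credentials out of view.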

Supported providers

  • OpenAI
  • Google Gemini
  • Google Cloud Text-to-Speech
  • Azure Speech
  • ElevenLabs
  • AWS Polly
  • OpenAI-compatible endpoints (custom base URL)
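For OpenAI-compatible endpoints, the custom base URL usually replaces only the host and version prefix, while the `/audio/speech` path and JSON body stay the same as in OpenAI's speech API. The sketch below shows one way such a request could be assembled; `buildSpeechRequest` and `SpeechRequest` are hypothetical names, and whether a given compatible server accepts every field is an assumption to verify against that server's docs.

```typescript
// Plain description of an HTTP POST, independent of any HTTP client.
interface SpeechRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds a speech request against an OpenAI-style base URL
// such as "https://api.openai.com/v1" or a self-hosted equivalent.
function buildSpeechRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  voice: string,
  input: string
): SpeechRequest {
  // Drop trailing slashes so the joined URL has exactly one separator.
  const root = baseUrl.replace(/\/+$/, "");
  return {
    url: `${root}/audio/speech`,
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // Request MP3 directly, in line with the MP3-only output described above.
    body: JSON.stringify({ model, voice, input, response_format: "mp3" }),
  };
}
```

Returning a plain object rather than calling `fetch` keeps the helper easy to test and lets the caller pick whichever HTTP client works on both desktop and mobile.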

Policy disclosures

  • Network access is required. The plugin sends note text to the selected external TTS provider.
  • External accounts and API keys are required for provider usage (OpenAI, Google, Azure, ElevenLabs, AWS, or compatible API).
  • The plugin does not include telemetry or ads.

Mobile compatibility

  • Hermes TTS is configured to load on mobile (isDesktopOnly: false).
  • The bundle is built for browser-compatible runtimes to support Obsidian mobile.
  • The plugin avoids regex lookbehind and Node-only Buffer usage in runtime paths for broader mobile compatibility.
  • Provider behavior may still vary by service/API/network conditions on mobile devices.
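As an illustration of the Buffer-free approach, base64 audio payloads can be decoded with the standard `atob` function and a `Uint8Array` instead of Node's `Buffer.from(data, "base64")`. This is a generic sketch, not code taken from the plugin; `base64ToBytes` is a hypothetical helper name.

```typescript
// Decode a base64 string into raw bytes without relying on Node's Buffer.
// atob is available in browsers, mobile WebViews, and recent Node versions.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64); // each character holds one byte (0-255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}
```

The resulting `Uint8Array` (or its underlying `ArrayBuffer`) can then be handed to whatever binary file-writing call the host environment provides.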

Voice dropdown behavior

  • OpenAI/Gemini: curated built-in voice dropdowns.
  • Google Cloud/Azure/ElevenLabs/AWS Polly: dropdowns with refresh buttons to fetch latest provider voices.
  • OpenAI-compatible: OpenAI-style voice dropdown.
  • Audio from all providers is normalized and saved as MP3.

Voice prompt behavior

  • The Voice prompt setting is global and optional.
  • OpenAI: sent as the instructions parameter only when a gpt-4o-mini-tts model is selected, because the API does not apply instructions to other speech models.
  • Gemini: prepended as style notes before the transcript in the prompt.
  • Other providers currently ignore this field.
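The routing rules above can be summarized in a small pure function. This is a sketch under stated assumptions: `applyVoicePrompt`, `TtsPayload`, and the `"Style notes:"` label are hypothetical, and the plugin's exact Gemini prompt wording is not documented here.

```typescript
type PromptProvider = "openai" | "gemini" | "other";

interface TtsPayload {
  input: string;          // text sent for synthesis
  instructions?: string;  // only set where the provider supports it
}

// Hypothetical helper mirroring the per-provider behavior listed above.
function applyVoicePrompt(
  provider: PromptProvider,
  model: string,
  transcript: string,
  voicePrompt: string
): TtsPayload {
  const prompt = voicePrompt.trim();
  // The setting is optional: an empty prompt changes nothing.
  if (prompt === "") {
    return { input: transcript };
  }
  if (provider === "openai") {
    // Only gpt-4o-mini-tts models receive the prompt as instructions.
    return model.startsWith("gpt-4o-mini-tts")
      ? { input: transcript, instructions: prompt }
      : { input: transcript };
  }
  if (provider === "gemini") {
    // Assumed layout: style notes first, then the transcript.
    return { input: `Style notes: ${prompt}\n\n${transcript}` };
  }
  // All other providers currently ignore the voice prompt.
  return { input: transcript };
}
```

For example, `applyVoicePrompt("openai", "tts-1", "Hello.", "Speak calmly")` returns the transcript unchanged, while the same call with `"gpt-4o-mini-tts"` adds `instructions: "Speak calmly"`.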

Gemini reliability fallback

  • Gemini uses the official @google/genai SDK flow (matching the Aloud plugin's setup).
  • On Gemini 400 "tried to generate text" errors, the plugin retries in segmented mode: the transcript is split into segments, and each request carries a rolling window of the preceding text as context for continuity.
  • If Gemini fails with transient errors and Google Cloud TTS is configured, generation automatically falls back to Google Cloud.
  • Metadata uses the provider that actually generated the audio.
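One way to picture the segmented retry is a splitter that groups sentences into chunks and attaches the tail of the already-processed text to each chunk as context. This is a rough sketch, not the plugin's implementation: `segmentTranscript`, `Segment`, and the `maxChars` and `contextChars` parameters are hypothetical, and the real segment sizes and context handling may differ.

```typescript
interface Segment {
  context: string; // tail of the text already covered, for continuity only
  text: string;    // text to synthesize in this request
}

// Hypothetical splitter: groups whole sentences into chunks of roughly maxChars.
function segmentTranscript(
  transcript: string,
  maxChars: number,
  contextChars: number
): Segment[] {
  // Sentence matching without regex lookbehind: either text ending in
  // punctuation (plus trailing whitespace) or a final unpunctuated remainder.
  const sentences: string[] =
    transcript.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];

  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    // Start a new chunk when adding this sentence would exceed the limit.
    // A single sentence longer than maxChars is kept whole.
    if (current !== "" && current.length + sentence.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current += sentence;
  }
  if (current !== "") {
    chunks.push(current);
  }

  let spoken = "";
  return chunks.map((chunk) => {
    // Guard against slice(-0), which would return the whole string.
    const tail = contextChars > 0 ? spoken.slice(-contextChars) : "";
    const segment: Segment = { context: tail.trim(), text: chunk.trim() };
    spoken += chunk;
    return segment;
  });
}
```

Each segment would then be sent as its own request, with `context` included as reference text only, and the resulting audio parts concatenated in order.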

Commands

  • Generate Hermes-TTS audio (current note)

Provider documentation

| Provider | API docs | Voice docs |
| --- | --- | --- |
| OpenAI | https://platform.openai.com/docs/guides/text-to-speech | https://platform.openai.com/docs/guides/text-to-speech#voice-options |
| Google Gemini | https://ai.google.dev/gemini-api/docs/speech-generation | https://ai.google.dev/gemini-api/docs/speech-generation#voices |
| Google Cloud TTS | https://cloud.google.com/text-to-speech/docs/reference/rest | https://cloud.google.com/text-to-speech/docs/list-voices-and-types |
| Azure Speech | https://learn.microsoft.com/azure/ai-services/speech-service/rest-text-to-speech | https://learn.microsoft.com/azure/ai-services/speech-service/language-support?tabs=tts |
| ElevenLabs | https://elevenlabs.io/docs/api-reference/text-to-speech/convert | https://elevenlabs.io/docs/voices |
| AWS Polly | https://docs.aws.amazon.com/polly/latest/dg/API_SynthesizeSpeech.html | https://docs.aws.amazon.com/polly/latest/dg/voicelist.html |
| OpenAI-compatible | https://platform.openai.com/docs/api-reference/audio/createSpeech | https://platform.openai.com/docs/guides/text-to-speech#voice-options |

The same docs are also available from buttons in the plugin settings tab.

Metadata block format

The plugin prepends a callout block near the top of the note (after frontmatter if present). Metadata lines can be toggled in settings. The title is a clean timestamp. For example:

> [!tts]+ 2026-02-17 15:42:10.321
> generated_at: 2026-02-17T14:42:10.321Z
> source_note: [[02 Projects/My Note]]
> provider: openai
> provider_name: OpenAI
> model: gpt-4o-mini-tts
> voice: shimmer
> format: mp3
> mime_type: audio/mpeg
> source_characters_sent: 2412
> provider_docs: https://platform.openai.com/docs/guides/text-to-speech
> voice_docs: https://platform.openai.com/docs/guides/text-to-speech#voice-options
> audio_file: ![[Attachments/TTS Audio/my-note-20260217-154210.mp3]]
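Assembling a block like the one above and placing it after any frontmatter could look roughly like this. The sketch is illustrative only: `TtsMetadata`, `buildCallout`, and `prependAfterFrontmatter` are hypothetical names, and the frontmatter detection is a simplified assumption (a leading `---` line closed by another `---` line).

```typescript
interface TtsMetadata {
  title: string;                  // callout title, e.g. a timestamp
  lines: Array<[string, string]>; // enabled metadata lines, in display order
  audioPath: string;              // vault path of the generated MP3
}

// Render the callout: title line, one "> key: value" line each, then the embed.
function buildCallout(meta: TtsMetadata): string {
  const body = meta.lines.map(([key, value]) => `> ${key}: ${value}`);
  return [
    `> [!tts]+ ${meta.title}`,
    ...body,
    `> audio_file: ![[${meta.audioPath}]]`,
  ].join("\n");
}

// Insert the block at the top of the note, or right after YAML frontmatter.
function prependAfterFrontmatter(note: string, block: string): string {
  const match = note.match(/^---\r?\n[\s\S]*?\r?\n---[ \t]*(?:\r?\n|$)/);
  const head = match ? match[0] : "";
  const rest = note.slice(head.length);
  // If the frontmatter ended at end-of-file, add the missing line break.
  const separator = head !== "" && !head.endsWith("\n") ? "\n" : "";
  return `${head}${separator}${block}\n\n${rest}`;
}
```

Passing only the enabled lines in `lines` is one simple way to honor the per-line toggles from settings.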

Build

npm ci
npm run build

Release assets expected by Obsidian:

  • manifest.json
  • main.js
  • styles.css
