Transcription Audio
By cha-yh (approved)
Transcribe audio files into Markdown notes.
Transcription Audio (Beta) Plugin for Obsidian
Turn your audio into structured Markdown notes inside Obsidian. This plugin detects an audio file linked in your current note, sends it to Gemini for transcription and summarization, and inserts the result back into your note. A right-hand progress panel shows what’s happening step by step.
Features
- Smart audio detection from links or embeds in the active note
- Google Gemini transcription and summarization
- Progress panel (sidebar) with live status:
  - Detected audio filename and size
  - Audio preparation status
  - API request start/completion times
  - Gemini usage logs (prompt/output/total tokens)
  - Cancel button to stop an upload or API request in progress
  - Success/error result
- Writes the final output to the file and cursor position where you started the command
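The token usage line in the progress panel could be produced along these lines. This is a minimal sketch, not the plugin's actual code: `formatUsage` is a hypothetical helper, and the field names assume the `usageMetadata` block that the Gemini API returns with a response.

```typescript
// Subset of the usage block returned with a Gemini response.
// All fields are treated as optional here, as an assumption for safety.
interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  totalTokenCount?: number;
}

// Hypothetical helper: formats a usage block as a single progress-log line.
function formatUsage(usage: UsageMetadata | undefined): string {
  if (!usage) return "Gemini usage: not reported";
  const prompt = usage.promptTokenCount ?? 0;
  const output = usage.candidatesTokenCount ?? 0;
  // Fall back to prompt + output when the total is missing.
  const total = usage.totalTokenCount ?? prompt + output;
  return `Gemini usage: prompt=${prompt}, output=${output}, total=${total}`;
}
```

Treating a missing usage block as "not reported" keeps the log readable when a request fails or is cancelled before a response arrives.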
Requirements
- A Google AI API key for Gemini. You can obtain one at https://aistudio.google.com/api-keys
Getting started
- Open Obsidian Settings
- Navigate to "Community plugins" and click "Browse"
- Search for "Transcription Audio" and click Install
- Enable the plugin in Community plugins
- Set up your API key in plugin settings (SecretStorage recommended)
Configuration
Open Settings → Transcription Audio:
- API Key (SecretStorage, recommended): Select the secret name from Obsidian SecretStorage
- API Key (deprecated, not recommended): Legacy plain-text API key field kept for backward compatibility fallback
- SecretStorage requires Obsidian 1.11.4 or later. On older versions it is disabled and an update-required message is shown
- Transcription mode:
  - Basic mode (default): prompt only
  - Template mode: dedicated prompt + output template (both prefilled with defaults)
- Model: Select a Gemini-compatible model (gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview)
  - gemini-3-pro-preview is deprecated by Google and shuts down on March 9, 2026. Existing settings are automatically migrated to gemini-3.1-pro-preview.
- Prompt: Customize the instruction for the selected mode
- Output template: Available in template mode to enforce a consistent final Markdown structure
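The automatic model migration mentioned under the Model setting can be pictured as a small step that runs when settings are loaded. The sketch below is illustrative only: the `PluginSettings` shape and the `migrateModel` name are assumptions, not the plugin's real internals. Only the two model identifiers come from this README.

```typescript
// Hypothetical settings shape; the plugin's real settings object may differ.
interface PluginSettings {
  model: string;
}

const DEPRECATED_MODEL = "gemini-3-pro-preview";
const REPLACEMENT_MODEL = "gemini-3.1-pro-preview";

// Replaces the deprecated model with its successor.
// Settings that use any other model are returned unchanged.
function migrateModel(settings: PluginSettings): PluginSettings {
  if (settings.model === DEPRECATED_MODEL) {
    return { ...settings, model: REPLACEMENT_MODEL };
  }
  return settings;
}
```

For example, `migrateModel({ model: "gemini-3-pro-preview" })` yields settings with `model` set to `gemini-3.1-pro-preview`, while a saved `gemini-2.5-flash` selection is left as it is.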
Usage
- In a note, add a link or embed for the audio file before your cursor, for example:
  - Wiki link: ![[example_audio.wav]]
- Place the cursor after the link.
- Run the command: "Transcribe audio".
- A progress panel will automatically open in the right sidebar, showing real-time status updates including file upload progress, API request status, and transcription progress.
- When complete, the transcription and notes are inserted at your starting cursor position.
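To illustrate why the cursor position matters, here is one way the nearest audio link before the cursor could be located. It is a rough sketch under stated assumptions: the function name, the list of audio extensions, and the wiki-link pattern are all hypothetical, and the plugin's actual detection logic may differ.

```typescript
// Extensions treated as audio here are an assumption for illustration.
const AUDIO_EXTENSIONS = ["mp3", "wav", "m4a", "ogg", "flac", "webm"];

// Hypothetical helper: returns the target of the last audio wiki link or
// embed that appears before the cursor offset, or null if there is none.
function findAudioLinkBeforeCursor(text: string, cursorOffset: number): string | null {
  const before = text.slice(0, cursorOffset);
  // Matches [[file.wav]], ![[file.wav]], and forms with an alias or heading part.
  const pattern = /!?\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]/g;
  let last: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(before)) !== null) {
    const target = match[1].trim();
    const ext = target.split(".").pop()?.toLowerCase() ?? "";
    // Keep overwriting so the link closest to the cursor wins.
    if (AUDIO_EXTENSIONS.includes(ext)) last = target;
  }
  return last;
}
```

With the note text `![[example_audio.wav]]` and the cursor placed after the embed, the helper returns `example_audio.wav`. Links that appear after the cursor are ignored, which matches the instruction to place the cursor after the link.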
Privacy & Data
Audio content is sent to Google’s Gemini API for processing. The plugin does not store your audio or transcripts outside your vault. Keep your API key secure and review your organization’s data policies before use.
Changelog
Version 0.5.0
- Transcription mode enhancements
  - Added Template mode so prompt and output template can be configured separately
- Gemini 3 Pro Preview migration
  - Added automatic migration from gemini-3-pro-preview to gemini-3.1-pro-preview
  - Updated related settings and documentation for current model options
Version 0.4.1
- Gemini 3 Pro Preview migration
  - Replaced gemini-3-pro-preview with gemini-3.1-pro-preview in model selection
  - Automatically migrates a previously saved gemini-3-pro-preview setting to gemini-3.1-pro-preview
Version 0.4.0
- SecretStorage API key support
  - Added Obsidian SecretStorage-based API key selection (recommended)
  - Kept legacy plain-text API key as fallback for backward compatibility
- Cancelable transcription flow
  - Added cancel control in the progress panel
  - Improved cancellation handling for upload/request steps
- Progress panel navigation improvements
  - File and Target entries are clickable links
  - Target navigation moves to the exact line/character position
- Progress log improvements
  - Added localized timestamp to the initial "Log start" line
- Gemini usage visibility
  - Added token usage logs (prompt/output/total and related token fields) in progress detail
Version 0.3.0
- Added gemini-3-flash-preview (default) model to settings
- Enhanced Progress Tracking: Improved transcription process with detailed progress tracking and UI updates
  - Enhanced progress panel with more detailed status information
  - Better visual feedback during transcription process
  - Improved error handling and status reporting
- Updated Default Settings: Updated default settings with new model and refined prompt structure
  - Optimized default model selection
  - Improved prompt structure for better transcription quality
License
This project is licensed under the MIT License.