Audio Transcription

pending

by Panagiotis Tzamtzis

Transcribe audio files (m4a, mp3) and extract actionable insights using AI. Supports all languages with local or cloud processing.

★ 1 starsUpdated 2mo agoMITDiscovered via Obsidian Unofficial Plugins

View on GitHub

Audio Transcription & Analysis Plugin for Obsidian

Transform your audio recordings into structured, actionable notes automatically. This plugin transcribes audio files (m4a, mp3) and extracts key points, action items, and follow-ups using AI - all within your Obsidian vault.

Features

Automatic Transcription: Convert meeting recordings, lectures, and interviews into text
Multilingual Support: Full support for Greek and English (with automatic language detection)
AI-Powered Analysis: Extract summaries, key points, action items, and follow-up questions
Local or Cloud Processing: Choose between privacy-focused local processing or faster cloud APIs
Speaker Identification: Distinguish between different speakers in conversations
Long Audio Support: Handle recordings up to 2+ hours
Seamless Integration: Results saved as markdown files in your vault
Customizable: Add your own analysis instructions for personalized results

What It Looks Like: End-User Journey

Step 1: Installing the Plugin

After installing from Obsidian Community Plugins:

Open Obsidian Settings (gear icon)
Navigate to "Community Plugins"
Search for "Audio Transcription"
Click "Install" then "Enable"

You'll see a new microphone icon in your left ribbon bar.

Step 2: First-Time Setup

When you first use the plugin, you'll see a welcome screen:

============================================
  Welcome to Audio Transcription Plugin!
============================================

  Before you can transcribe audio files,
  you need to download a transcription
  model.

  Recommended: Medium Model (1.5 GB)
  - Best balance of speed and accuracy
  - Good Greek language support
  - Can process 1-hour audio in ~20 mins

  Other options:
  - Small (466 MB) - Faster, less accurate
  - Large (2.9 GB) - Best quality, slower

  [Download Medium Model]  [Choose Other]

  Or use cloud processing (no download):
  [Configure Cloud API]
============================================

If you click "Download Medium Model", you'll see:

============================================
  Downloading Whisper Medium Model...
============================================

  Progress: [##########....] 742 MB / 1.5 GB

  Estimated time remaining: 3 minutes

  This is a one-time download. The model
  will be saved to your plugin folder.

            [Cancel Download]
============================================

Step 3: Configuring Settings (Optional)

Open Settings > Audio Transcription to see:

============================================
        TRANSCRIPTION SETTINGS
============================================

Processing Mode
(*) Local (Whisper.cpp) - Private, no internet needed
( ) Cloud (OpenAI Whisper) - Faster, requires API key
( ) Cloud (OpenRouter) - Use custom models

Model Size (for local processing)
[  tiny  |  base  | small | *medium | large  ]

Default Language
(*) Auto-detect - Let the AI figure it out
( ) English only
( ) Greek only
( ) Multilingual (both)

[X] Enable Speaker Diarization (identify speakers)

--------------------------------------------
           ANALYSIS SETTINGS
--------------------------------------------

Analysis Provider
(*) Local (Ollama) - Requires Ollama installed
( ) Cloud (OpenRouter) - Requires API key

Custom Analysis Instructions (optional)
+---------------------------------------+
| Focus on technical decisions and      |
| deadlines. Tag people using @name     |
| format. Identify risks and blockers.  |
+---------------------------------------+

--------------------------------------------
      API KEYS (for cloud processing)
--------------------------------------------

OpenAI API Key (for Whisper API)
[sk-************************************************]

OpenRouter API Key (for analysis)
[sk-or-********************************************]

OpenRouter Model Name
[meta-llama/llama-3.2-3b-instruct              ]

--------------------------------------------
           MODEL MANAGEMENT
--------------------------------------------

Local Models Path: ./models/

Installed Models:
( ) tiny.bin    - Not downloaded
( ) base.bin    - Not downloaded
( ) small.bin   - Not downloaded
(*) medium.bin  - ✓ Installed (1.5 GB)
( ) large.bin   - Not downloaded

[Download Selected Model]  [Delete Model]

--------------------------------------------
           OUTPUT SETTINGS
--------------------------------------------

Output Folder
[/Transcriptions                           ] [Browse]

[X] Include timestamps in transcription
[X] Auto-create tags from analysis
[X] Skip files that are already analyzed

            [Save Settings]
============================================

Step 4: Transcribing Your First Audio File

There are two ways to start transcription:

Method 1: Right-click on an audio file

In your file explorer, find an audio file (meeting-2025-01-15.m4a)
Right-click on it
Select "Transcribe audio file" from the context menu

File: meeting-2025-01-15.m4a
-----------------------------
Rename
Delete
Copy path
Move file
> Transcribe audio file  <-- Click here!
Properties

Method 2: Use the ribbon icon

Click the microphone icon in the left sidebar
A file picker appears
Select your audio file

Step 5: Watching the Progress

You'll see a notification in the bottom-right corner:

======================================
  Transcribing: meeting-2025-01-15
======================================

  Step 1/3: Transcribing audio...
  Progress: [########..] 73%

  Estimated time: 4 minutes remaining

       [Cancel Transcription]
======================================

After transcription completes:

======================================
  Transcribing: meeting-2025-01-15
======================================

  Step 2/3: Analyzing content...
  Extracting key points and actions

  [#########.] 85%
======================================

Then:

======================================
  Transcribing: meeting-2025-01-15
======================================

  Step 3/3: Creating markdown file...

  [###########] 100%
======================================

Finally:

======================================
      ✓ Transcription Complete
======================================

  File created: meeting-2025-01-15.md

  Duration: 1:32:45
  Processing time: 18 minutes

       [Open File]  [Dismiss]
======================================

Step 6: Viewing the Results

Clicking "Open File" opens your new markdown note:

---
audio_file: "meeting-2025-01-15.m4a"
duration: "1:32:45"
transcribed_date: 2025-01-15T14:32:00
language: "en"
speakers: 3
tags: [meeting, transcription, project-alpha, budget-review]
---

# Meeting Transcription: Q1 Budget Review

**Audio File:** meeting-2025-01-15.m4a
**Date:** January 15, 2025
**Duration:** 1 hour 32 minutes
**Participants:** 3 speakers identified

---

## Summary

This meeting covered the Q1 budget review for Project Alpha. The team discussed resource allocation, timeline adjustments due to staffing changes, and identified three critical blockers that need immediate attention. A follow-up meeting was scheduled for next week to finalize the revised timeline.

---

## Key Points

- **Budget approved** for additional contractor support ($45K)
- **Timeline extended** by 2 weeks due to Sarah's onboarding delay
- **Marketing campaign** launch postponed to March 1st
- **New feature request** from client - needs feasibility assessment
- **Risk identified**: Current API rate limits may impact performance testing
- **Decision made**: Switch to microservices architecture for Phase 2

---

## Action Items

- [ ] @john Review and approve contractor agreements by Friday (Jan 19)
- [ ] @sarah Set up development environment and complete onboarding checklist
- [ ] @mike Research API rate limit solutions and present options (due: Jan 22)
- [ ] @team Update project timeline in Jira with new milestones
- [ ] @john Schedule client call to discuss new feature request
- [ ] @sarah Create technical specification for microservices migration

---

## Follow-up Questions

- What is the exact scope of the new client feature request?
- Do we have budget flexibility if API solution requires paid tier upgrade?
- Has legal reviewed the contractor agreements?
- When will the new designer start?

---

## Full Transcription

**Speaker 1 (John)** [00:00:15]
Good morning everyone. Thanks for joining today's Q1 budget review. I know we're all busy, so let's try to keep this focused. Sarah, welcome to the team - this is your first planning meeting with us.

**Speaker 2 (Sarah)** [00:00:28]
Thanks John! Happy to be here. Looking forward to diving in.

**Speaker 1 (John)** [00:00:32]
Great. So let me start with the budget overview. We've been tracking expenses closely, and I'm happy to report we're actually 8% under budget for Q4, which gives us some flexibility going forward.

**Speaker 3 (Mike)** [00:00:48]
That's great news. Does that mean we can move forward with the contractor support we discussed?

**Speaker 1 (John)** [00:00:53]
Yes, exactly. I'm proposing we allocate $45,000 for two contractors to help with the frontend work. This should accelerate our timeline significantly.

**Speaker 2 (Sarah)** [00:01:08]
Just to clarify - would these contractors be working on the React components or the new design system?

**Speaker 1 (John)** [00:01:15]
Both, actually. We need someone who can implement the designs and also help establish the component library patterns.

[Transcription continues for full 1:32:45...]

---

**Speaker 3 (Mike)** [01:31:52]
Alright, I think we've covered everything. I'll send out the meeting notes later today.

**Speaker 1 (John)** [01:32:02]
Perfect. Thanks everyone. Let's sync up again next Tuesday.

**Speaker 2 (Sarah)** [01:32:08]
Sounds good. Thanks all!

[End of transcription]

Step 7: What Happens If You Try Again?

If you right-click on the same audio file and select "Transcribe audio file" again:

======================================
     Analysis Already Exists
======================================

  This audio file has already been
  transcribed and analyzed.

  File: meeting-2025-01-15.md
  Created: 2025-01-15 at 14:32

  [Open Existing File]  [OK]
======================================

This prevents accidental duplicate processing and wasted time.

Step 8: Error Handling Example

If something goes wrong during transcription:

======================================
       Transcription Failed
======================================

  ⚠ The transcription process failed

  Error: Could not process audio file

  Possible causes:
  - Audio file may be corrupted
  - Unsupported audio codec
  - Insufficient disk space

  The plugin automatically retried
  once but encountered the same error.

  [View Detailed Logs]  [Close]
======================================

Installation

Requirements

Obsidian v1.4.0 or higher
For Local Processing:
- 4GB+ RAM (8GB recommended for large models)
- 2-3GB free disk space for models
- Windows 10/11 (64-bit)
For Cloud Processing:
- Internet connection
- API key from OpenAI or OpenRouter

Install from Community Plugins (Recommended)

Open Obsidian Settings
Go to "Community Plugins" and disable Safe Mode
Click "Browse" to open the community plugins browser
Search for "Audio Transcription"
Click "Install"
Once installed, enable the plugin
Follow the first-time setup wizard to download models

Manual Installation (Advanced)

Download the latest release from GitHub
Extract the files to <vault>/.obsidian/plugins/obsidian-transcription-plugin/
Reload Obsidian
Enable the plugin in Settings > Community Plugins

Setup Guide

Option 1: Local Processing (Recommended for Privacy)

Advantages:

Complete privacy - audio never leaves your device
No ongoing costs
Works offline
Full control over processing

Setup Steps:

Open plugin settings
Select "Local (Whisper.cpp)" as processing mode
Choose your model size:
- Small (466 MB): Fast, good for English-only, basic quality
- Medium (1.5 GB): Recommended - balanced speed/quality, good Greek support
- Large (2.9 GB): Best quality, excellent multilingual, slower
Click "Download Selected Model"
Wait for download to complete (one-time only)

Note: First transcription may take a few minutes as the system initializes. Subsequent transcriptions will be faster.

Option 2: Cloud Processing with OpenAI Whisper

Advantages:

Faster processing
No large downloads required
Works on any device
Excellent accuracy

Cost: ~~$0.006 per minute (~~$0.72 for 2-hour recording)

Setup Steps:

Get an OpenAI API key from https://platform.openai.com/api-keys
Open plugin settings
Select "Cloud (OpenAI Whisper)" as processing mode
Paste your API key in the "OpenAI API Key" field
Save settings

Option 3: Cloud Processing with OpenRouter

Advantages:

Access to multiple AI models
Often cheaper than OpenAI
Flexible model selection

Setup Steps:

Get an OpenRouter API key from https://openrouter.ai/keys
Open plugin settings
Select "Cloud (OpenRouter)" as processing mode
Paste your API key in the "OpenRouter API Key" field
Enter your preferred model name (e.g., openai/whisper-large-v3)
Save settings

Configuring Analysis (AI Insights)

The plugin can analyze your transcriptions to extract key information.

Option 1: Local Analysis with Ollama (Free)

Install Ollama from https://ollama.ai
Run ollama pull llama3.2:3b in your terminal
In plugin settings, select "Local (Ollama)" as analysis provider
The plugin will automatically connect to Ollama

Option 2: Cloud Analysis with OpenRouter

Get an OpenRouter API key (same as above)
In plugin settings, select "Cloud (OpenRouter)" as analysis provider
Paste your API key
Choose a model (recommended: meta-llama/llama-3.2-3b-instruct)

Adding Custom Analysis Instructions

Want the AI to focus on specific things? Add custom instructions:

Examples:

For project meetings:

- Tag all participants with @ symbol
- Identify technical decisions and mark with [DECISION]
- Flag any mentioned deadlines with [DEADLINE]
- Highlight budget discussions

For lecture notes:

- Extract key concepts and definitions
- Create a glossary of technical terms
- Identify examples and case studies
- Note any assigned homework or readings

For interviews:

- Identify main themes discussed
- Extract interesting quotes verbatim
- Note emotional reactions or emphasis
- Highlight follow-up topics

Usage Examples

Example 1: Team Meeting Notes

Scenario: You recorded a 45-minute team standup meeting with 4 participants.

Steps:

Save recording as team-standup-2025-01-15.m4a in your vault
Right-click → "Transcribe audio file"
Wait ~8-12 minutes (medium model, local processing)
Open the generated team-standup-2025-01-15.md file

Result: You get a complete transcription with:

Each person's comments identified
Action items automatically extracted as checkboxes
Key decisions highlighted
Tagged with relevant project names

Example 2: Client Call (Confidential)

Scenario: 1-hour client discussion with sensitive information. Privacy is critical.

Steps:

Ensure you're using local processing (no cloud APIs)
Record and save as client-call-acme-corp.m4a
Add custom instruction: "Identify all commitments made to the client"
Transcribe

Result: Complete transcription that never left your computer, with client commitments clearly marked.

Example 3: Greek Language Lecture

Scenario: 90-minute university lecture in Greek

Steps:

Use large model for best Greek support
Set language to "Greek only" or "Auto-detect"
Add custom instruction: "Extract key concepts and create a glossary of technical terms"
Transcribe

Result: Full Greek transcription with technical terms identified and defined.

Example 4: Bilingual Meeting (English + Greek)

Scenario: Meeting where participants switch between English and Greek

Steps:

Use medium or large model
Set language to "Multilingual (both)" or "Auto-detect"
Transcribe

Result: Accurate transcription with both languages correctly identified and transcribed.

How It Works (Behind the Scenes)

The Transcription Process

Pre-check: Plugin checks if this audio file was already transcribed
Model Check: Verifies the selected model is downloaded
Audio Loading: Reads the audio file from your vault
Chunking: For long files, splits audio into manageable segments (30-min chunks)
Transcription: Processes each chunk with Whisper
Speaker Diarization: If enabled, identifies different speakers
Language Detection: Automatically detects language(s) in the audio
Assembly: Combines all chunks into complete transcript

The Analysis Process

Transcript Review: AI reads the complete transcription
Context Understanding: Identifies the type of content (meeting, lecture, etc.)
Custom Instructions: Applies any user-defined analysis rules
Extraction: Pulls out:
- Summary (2-3 sentences)
- Key points (bullet list)
- Action items (as checkboxes with assignees)
- Follow-up questions
- Relevant tags
Formatting: Creates structured markdown output

Privacy & Data Flow

Local Processing:

Your Audio File → Your Computer → Whisper Model → Transcription
                                        ↓
                               Ollama (Local) → Analysis
                                        ↓
                            Your Vault (.md file)

Nothing leaves your computer. Complete privacy.

Cloud Processing:

Your Audio File → OpenAI/OpenRouter API → Transcription
                                                ↓
                        OpenRouter API → Analysis
                                                ↓
                            Your Vault (.md file)

Audio and transcript sent to external servers. Review your API provider's privacy policy.

Frequently Asked Questions (FAQ)

General Questions

Q: How accurate is the transcription?

A: Using the medium or large model, transcription accuracy is typically 90-95% for clear audio in English or Greek. Accuracy depends on:

Audio quality (clear recordings work best)
Background noise (quiet environments ideal)
Speaker clarity (distinct voices help)
Language complexity (technical jargon may need review)

Q: Can it handle multiple speakers?

A: Yes! When speaker diarization is enabled, the plugin identifies different speakers and labels them as "Speaker 1", "Speaker 2", etc. It cannot currently identify speakers by name automatically.

Q: What audio formats are supported?

A: Currently .m4a and .mp3 files. Support for .wav, .ogg, and .flac may be added in future versions.

Q: How long does transcription take?

A: Processing time varies:

Small model: ~0.1-0.2x realtime (10-min audio = 1-2 min processing)
Medium model: ~0.3-0.5x realtime (1-hour audio = 18-30 min processing)
Large model: ~0.5-1x realtime (1-hour audio = 30-60 min processing)
Cloud APIs: Much faster, usually 0.05-0.1x realtime

Q: Will it work offline?

A: Yes, if you use local processing. Once models are downloaded, you can transcribe without internet.

Technical Questions

Q: Where are models stored?

A: Models are stored in <vault>/.obsidian/plugins/obsidian-transcription-plugin/models/

Q: Can I use my own Whisper model?

A: Currently, the plugin uses official Whisper.cpp models from HuggingFace. Custom model support may be added later.

Q: What if I don't have Ollama installed?

A: You can still use cloud analysis via OpenRouter, or skip the analysis step and just get the transcription.

Q: How much disk space do I need?

A: Model sizes:

Tiny: 75 MB
Base: 142 MB
Small: 466 MB
Medium: 1.5 GB
Large: 2.9 GB

Plus temporary space for audio processing (usually 2-3x the audio file size).

Q: Does it work on mobile (iOS/Android)?

A: Not yet. Currently Windows desktop only. Mobile support may come in future updates.

Troubleshooting Questions

Q: Transcription failed with "model not found" error

A: Go to Settings → Audio Transcription → Model Management and download your selected model.

Q: The transcription is very inaccurate

A: Try these solutions:

Upgrade to a larger model (medium or large)
Check audio quality - clear recordings work best
Set the correct language instead of auto-detect
Ensure audio file isn't corrupted

Q: Plugin says "analysis already available" but I don't see a file

A: The markdown file might be in your configured output folder. Check Settings → Audio Transcription → Output Settings to see the folder path.

Q: Processing is very slow

Local processing is CPU-intensive. Close other applications.
Try a smaller model (small instead of medium)
Consider using cloud processing for faster results
Check if your antivirus is scanning the process

Q: Speaker diarization isn't working

A: Speaker diarization requires cloud processing with Assembly AI (coming in Phase 3) or local pyannote installation (advanced). Currently limited functionality.

Usage Questions

Q: Can I edit the transcription after it's created?

A: Absolutely! It's a markdown file in your vault. Edit it like any other note.

Q: Can I re-transcribe if I'm not happy with the results?

A: Yes. Delete the generated markdown file first, then transcribe again. The plugin skips files that already have analysis.

Q: Can I transcribe video files?

A: Not directly. Extract the audio first using a tool like VLC or FFmpeg, then transcribe the audio file.

Q: How do I share transcriptions with others?

A: They're standard markdown files. Export to PDF, copy the text, or share the .md file directly.

Privacy & Cost Questions

Q: Is my audio data private?

A: With local processing: Yes, completely private. Audio never leaves your device. With cloud processing: Audio is sent to API provider (OpenAI, OpenRouter). Check their privacy policies.

Q: How much do cloud APIs cost?

A: Approximate costs:

OpenAI Whisper: $0.006/minute ($7.20 per 20 hours)
OpenRouter: Varies by model, often cheaper
Local processing: Free (after model download)

Q: Do I need a paid Obsidian account?

A: No. This plugin works with free Obsidian.

Roadmap

Current Features (v1.0)

✓ Local Whisper.cpp transcription
✓ Cloud transcription (OpenAI, OpenRouter)
✓ Greek and English language support
✓ AI-powered analysis and extraction
✓ Customizable analysis instructions
✓ Automatic model management
✓ Duplicate detection
✓ Error retry logic
✓ Custom prompt templates

Planned Features (Future Versions)

v1.1 - Enhanced Analysis

Multiple analysis profiles (meeting, lecture, interview)
Improved speaker identification
Export to other formats (PDF, DOCX)

v1.2 - Speaker Diarization

Full speaker identification
Speaker labeling and naming
Improved multi-speaker accuracy

v2.0 - Advanced Features

Real-time transcription during recording
Video file support (auto-extract audio)
Batch processing (multiple files at once)
Mobile app support (iOS/Android)
Integration with other plugins (Calendar, Tasks)

Community Requests

Your feedback shapes the roadmap! Submit feature requests on GitHub.

Support & Community

Getting Help

Documentation: You're reading it!
GitHub Issues: Report bugs or request features at github.com/tzamtzis/obsidian-transcription-plugin
Obsidian Forum: Discuss the plugin with other users

Contributing

This is an open-source project! Contributions welcome:

Report bugs
Suggest features
Submit pull requests
Improve documentation
Share your use cases

License

MIT License - Free to use, modify, and distribute.

Credits

Built with:

Whisper.cpp - Fast C++ implementation of OpenAI Whisper
Obsidian API - Plugin framework
OpenAI Whisper - Original transcription model

Special Thanks:

OpenAI for creating Whisper
Georgi Gerganov for whisper.cpp
Obsidian team for the amazing plugin API
Beta testers and early adopters

Changelog

v1.0.0 (2026-01-27)

Initial release
Local and cloud transcription
Greek and English support
AI-powered analysis
Automatic model management
Windows desktop support

Made with ♥ for the Obsidian community

Transform your audio into knowledge. Start transcribing today!

For plugin developers

Search results and similarity scores are powered by semantic analysis of your plugin's README. If your plugin isn't appearing for searches you'd expect, try updating your README to clearly describe your plugin's purpose, features, and use cases.