Smart Transcriber


by Kazen

Record audio and get instant transcripts using the OpenAI Whisper API, with intelligent voice detection and a real-time sidebar display.


Smart Transcriber - Obsidian Voice Transcription Plugin

An Obsidian plugin for real-time voice transcription with the OpenAI Whisper API. Voice activity detection and activity-based timing decide when speech is cut into segments and sent for transcription. The result is accurate transcripts without uploading silence.

Features

  • 🎤 Smart Voice Detection: Voice activity detection that distinguishes human speech from other sounds
  • 🧠 Intelligent Timing: Segments are cut based on actual voice activity, not fixed time intervals
  • 🔇 Noise Suppression: Built-in signal processing filters out background noise and computer audio
  • 🤖 AI Transcription: Uses the OpenAI Whisper API for highly accurate speech-to-text conversion
  • 📝 Real-time Display: Transcripts update live in the sidebar as you speak
  • ✏️ Editable Results: Click a transcript to edit and refine it
  • 🌐 Multi-language Support: Auto-detection or manual selection from 12+ languages
  • 🎯 Adaptive Segmentation: Segments are uploaded only when a pause in speech is detected
  • ⚙️ Advanced Settings: Fine-tune voice detection, pause thresholds, and segment durations
  • 📊 Audio Level Monitoring: Real-time audio level visualization with voice activity indicators

Installation

Official Installation (Recommended)

  1. Open Obsidian Settings
  2. Go to Community plugins
  3. Browse community plugins
  4. Search for "Smart Transcriber"
  5. Install and enable the plugin

Manual Installation

  1. Download the latest release from the releases page
  2. Extract to your vault's .obsidian/plugins/smart-transcriber/ folder
  3. Enable the plugin in Obsidian settings

Development Installation

  1. Clone this repository into your vault's plugins folder as smart-transcriber:

    cd /path/to/your/vault/.obsidian/plugins
    git clone https://github.com/Kazzx9921/obsidian-smart-transcriber smart-transcriber
    cd smart-transcriber
    
  2. Install dependencies:

    npm install
    
  3. Build the plugin:

    npm run build
    
  4. Enable the plugin in Obsidian settings

Setup

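  1. Get an OpenAI API key:

    • Visit the OpenAI API keys page (https://platform.openai.com/api-keys)
    • Create a new secret key
    • Make sure your account has credits available, since Whisper API usage is billed by audio duration
  2. Configure the plugin:

    • Open the Smart Transcriber settings under Settings > Community plugins > Smart Transcriber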
    • Enter your OpenAI API key and click "Verify" to test it
    • Adjust recording settings (segment duration, pause threshold, etc.)
    • Configure language settings and voice detection parameters
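The README does not document how the "Verify" button tests a key. One plausible approach is an authenticated `GET https://api.openai.com/v1/models` request: HTTP 200 means the key was accepted, and 401 means it was rejected. The sketch below illustrates this under that assumption and is not the plugin's code. `verifyApiKey`, its result values, and the injectable `fetchImpl` parameter are hypothetical. Accepting a stub in place of the global `fetch` keeps the function testable.

```typescript
// Hypothetical sketch of an API key check; not the plugin's actual "Verify" code.
type VerifyResult = "valid" | "invalid" | "error";

// Minimal structural types, so the global fetch or a test stub can be passed in.
interface StatusResponse {
  status: number;
}
type GetFn = (
  url: string,
  init: { method: "GET"; headers: Record<string, string> },
) => Promise<StatusResponse>;

async function verifyApiKey(apiKey: string, fetchImpl: GetFn): Promise<VerifyResult> {
  const key = apiKey.trim();
  if (key === "") return "invalid";
  try {
    // Listing models requires authentication but sends no audio.
    const res = await fetchImpl("https://api.openai.com/v1/models", {
      method: "GET",
      headers: { Authorization: `Bearer ${key}` },
    });
    if (res.status === 200) return "valid";
    if (res.status === 401) return "invalid"; // unknown or revoked key
    return "error"; // rate limiting or a server problem: validity unknown
  } catch {
    return "error"; // network failure: validity unknown
  }
}
```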

Usage

Basic Recording

  1. Open Sidebar: Click the microphone icon in the ribbon or use the command palette
  2. Start Recording: Click the record button to begin voice transcription
  3. Grant Permissions: Allow microphone access when prompted
  4. Smart Detection: The plugin automatically detects when you're speaking vs. background noise
  5. Live Transcripts: Each segment's transcript appears in the sidebar shortly after you pause
  6. Stop Recording: Click the record button again to stop

Advanced Features

Smart Voice Detection

  • Human Voice Recognition: Distinguishes between human speech and computer audio
  • Confidence Scoring: Only processes audio with high-confidence voice detection
  • Background Noise Filtering: Automatically filters out ambient noise and system sounds
  • Adaptive Thresholds: Adjusts detection thresholds to your voice and recording environment
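The README does not describe how VoiceActivityDetector.ts works internally. The sketch below shows one common technique and assumes the detector receives frames of samples in [-1, 1]. It computes each frame's RMS level and tracks the ambient noise floor with an exponential moving average that is updated only on quiet frames. A frame counts as speech when it is several times louder than that floor. `AdaptiveVoiceDetector` and its constants (0.05 smoothing, 3x margin) are illustrative assumptions. An energy threshold alone cannot tell a voice from loud computer audio. The plugin's human-voice recognition and confidence scoring would need additional cues that this sketch omits.

```typescript
// Illustrative energy-based detector with an adaptive noise floor.
// Class name and constants are assumptions, not values from VoiceActivityDetector.ts.

// Root-mean-square amplitude of one frame of samples in the range [-1, 1].
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}

class AdaptiveVoiceDetector {
  private noiseFloor: number;
  private readonly smoothing: number;
  private readonly margin: number;
  private readonly minFloor: number;

  constructor(initialFloor = 0.01, smoothing = 0.05, margin = 3, minFloor = 0.001) {
    this.noiseFloor = initialFloor; // starting estimate of the ambient RMS level
    this.smoothing = smoothing; // weight of each new quiet frame in the moving average
    this.margin = margin; // speech must be this many times louder than the floor
    this.minFloor = minFloor; // keeps the floor from collapsing to zero in digital silence
  }

  // Classifies one frame; quiet frames also update the noise-floor estimate.
  isSpeech(samples: Float32Array): boolean {
    const level = rms(samples);
    const speech = level > this.noiseFloor * this.margin;
    if (!speech) {
      // Moving average over non-speech frames only, so talking does not raise the floor.
      const updated = (1 - this.smoothing) * this.noiseFloor + this.smoothing * level;
      this.noiseFloor = Math.max(this.minFloor, updated);
    }
    return speech;
  }

  get floor(): number {
    return this.noiseFloor;
  }
}
```

Steady background noise raises the floor, so a level that first counted as speech is later treated as noise.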

Intelligent Timing & Segmentation

  • Activity-Based Timing: The segment timer counts only the time during which voice is detected
  • Smart Pause Detection: Automatically uploads segments when voice pauses are detected
  • Minimum Duration Control: Ensures segments have sufficient content before processing
  • Configurable Thresholds: Adjust pause detection and minimum segment durations
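Together, these rules form a small state machine that runs once per detection tick (the 50 ms interval listed under Performance Optimizations). The sketch below is a hypothetical illustration, not SegmentManager.ts. It assumes a segment is uploaded in two cases: when a pause of at least the pause threshold follows at least the minimum amount of speech, or when active speech reaches the target segment duration. Only voiced ticks add to the active-speech total. An utterance shorter than the minimum stays buffered until more speech arrives; this is an assumption, and the plugin might discard it instead.

```typescript
// Hypothetical activity-based segment timer, not the plugin's SegmentManager.
interface TimingConfig {
  tickMs: number; // detection interval, e.g. 50 ms
  pauseThresholdMs: number; // silence that ends a segment
  minSegmentMs: number; // active speech required before a segment may be uploaded
  targetSegmentMs: number; // active speech at which a segment is cut even without a pause
}

type TickResult = "continue" | "upload";

class SegmentTimer {
  private readonly cfg: TimingConfig;
  private activeMs = 0; // counts only ticks with detected voice
  private silenceMs = 0; // silence since the last voiced tick

  constructor(cfg: TimingConfig) {
    this.cfg = cfg;
  }

  // Call once per detection tick with the voice detector's decision.
  tick(voiceDetected: boolean): TickResult {
    if (voiceDetected) {
      this.activeMs += this.cfg.tickMs;
      this.silenceMs = 0;
      return this.activeMs >= this.cfg.targetSegmentMs ? this.cut() : "continue";
    }
    if (this.activeMs === 0) return "continue"; // nothing buffered yet
    this.silenceMs += this.cfg.tickMs;
    const pauseReached = this.silenceMs >= this.cfg.pauseThresholdMs;
    // With too little speech so far, keep buffering and wait for more.
    return pauseReached && this.activeMs >= this.cfg.minSegmentMs ? this.cut() : "continue";
  }

  // Call when recording stops; true means leftover speech should still be uploaded.
  flush(): boolean {
    const hasSpeech = this.activeMs > 0;
    this.cut();
    return hasSpeech;
  }

  // Ends the current segment and tells the caller to upload it.
  private cut(): TickResult {
    this.activeMs = 0;
    this.silenceMs = 0;
    return "upload";
  }
}
```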

Settings Configuration

  • Segment Duration: Target duration for voice segments (3-30 seconds of active speech)
  • Pause Detection Threshold: How long to wait after voice stops before uploading (500–3000 ms)
  • Minimum Segment Duration: Minimum active speech time before creating a segment (1-10 seconds)
  • Language: Choose from 12+ languages or use auto-detection
  • Translation: Enable to translate non-English audio to English
  • Audio Level Monitoring: Real-time visual feedback of voice detection confidence
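A settings object matching the ranges above might look like the sketch below. The field names are hypothetical; the real interface lives in src/settings/PluginSettings.ts and is not shown in this README. The 8 s segment and 1 s pause defaults come from Performance Tips. The 3 s minimum-segment default and the use of `null` for auto-detection are assumptions. `clampSettings` keeps user-entered values inside the documented ranges.

```typescript
// Hypothetical settings shape; the real interface is in src/settings/PluginSettings.ts.
interface TranscriberSettings {
  apiKey: string;
  segmentDurationSec: number; // target active speech per segment, 3-30 s
  pauseThresholdMs: number; // silence before upload, 500-3000 ms
  minSegmentDurationSec: number; // minimum active speech per segment, 1-10 s
  language: string | null; // ISO-639-1 code such as "en", or null for auto-detection
  translateToEnglish: boolean;
}

const DEFAULT_SETTINGS: TranscriberSettings = {
  apiKey: "",
  segmentDurationSec: 8, // default given under Performance Tips
  pauseThresholdMs: 1000, // default given under Performance Tips
  minSegmentDurationSec: 3, // assumed default
  language: null,
  translateToEnglish: false,
};

const clamp = (value: number, min: number, max: number): number => Math.min(max, Math.max(min, value));

// Merges saved values over the defaults, then forces numeric fields into the documented ranges.
function clampSettings(saved: Partial<TranscriberSettings>): TranscriberSettings {
  const merged: TranscriberSettings = { ...DEFAULT_SETTINGS, ...saved };
  const segmentDurationSec = clamp(merged.segmentDurationSec, 3, 30);
  return {
    ...merged,
    segmentDurationSec,
    pauseThresholdMs: clamp(merged.pauseThresholdMs, 500, 3000),
    // Also keep the minimum no longer than the target segment duration.
    minSegmentDurationSec: clamp(merged.minSegmentDurationSec, 1, Math.min(10, segmentDurationSec)),
  };
}
```

In a plugin's `onload`, the saved object would typically come from Obsidian's `Plugin.loadData()`.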

Editing Transcripts

  • Click on any transcript segment to edit the text
  • Use the copy button to copy individual segments
  • Export all transcripts to a text file

Keyboard Shortcuts

  • Ctrl/Cmd + Shift + M: Toggle recording
  • Ctrl/Cmd + Shift + T: Open/close transcript sidebar

Technical Details

Architecture

  • Frontend: Svelte 4 + TypeScript
  • Audio Processing: Web Audio API and the MediaRecorder API
  • Build System: esbuild with Svelte plugin
  • API Integration: OpenAI Whisper API
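For reference, a single segment upload to OpenAI's audio endpoints can be sketched with the standard `fetch` and `FormData` APIs. The pieces taken from OpenAI's documented audio API are `POST /v1/audio/transcriptions`, `POST /v1/audio/translations` (which returns English text), and the `whisper-1` model. Everything else is an assumption of this sketch: the function and type names, the injectable `fetchImpl`, the `segment.webm` filename, and the error handling. The plugin's WhisperAPI.ts may differ.

```typescript
// Sketch of a single Whisper request. The endpoints and the "whisper-1" model are
// OpenAI's; the function and type names are hypothetical.
interface ApiResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
  text(): Promise<string>;
}
type PostFn = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: FormData },
) => Promise<ApiResponseLike>;

interface WhisperOptions {
  apiKey: string;
  language?: string; // ISO-639-1 code such as "en"; omit for auto-detection
  translate?: boolean; // true: use the translations endpoint, which returns English text
}

async function transcribeSegment(audio: Blob, opts: WhisperOptions, fetchImpl: PostFn): Promise<string> {
  const endpoint = opts.translate ? "translations" : "transcriptions";
  const form = new FormData();
  // The API relies on the file name's extension to recognize the audio format.
  form.append("file", audio, "segment.webm");
  form.append("model", "whisper-1");
  // Only the transcriptions endpoint accepts a language hint.
  if (opts.language && !opts.translate) form.append("language", opts.language);

  const res = await fetchImpl(`https://api.openai.com/v1/audio/${endpoint}`, {
    method: "POST",
    // No Content-Type header: fetch sets the multipart boundary itself.
    headers: { Authorization: `Bearer ${opts.apiKey}` },
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Whisper API error ${res.status}: ${await res.text()}`);
  }
  const data = (await res.json()) as { text?: unknown };
  if (typeof data.text !== "string") throw new Error("Unexpected response: no text field");
  return data.text;
}
```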

File Structure

smart-transcriber/
├── src/
│   ├── components/          # Svelte UI components
│   │   ├── SimpleVoiceRecorder.svelte    # Main recording interface
│   │   ├── VoiceTranscriberApp.svelte    # App container
│   │   ├── TranscriptDisplay.svelte      # Transcript display
│   │   ├── SimpleSettingsPanel.svelte    # Settings UI
│   │   └── TestComponent.svelte          # Testing components
│   ├── services/            # Core business logic
│   │   ├── VoiceActivityDetector.ts      # Smart voice detection
│   │   ├── WhisperAPI.ts                 # OpenAI API integration
│   │   ├── AudioRecorder.ts              # Audio capture
│   │   ├── SegmentManager.ts             # Segment management
│   │   └── AudioSourceManager.ts         # Audio source handling
│   ├── utils/               # Utility functions
│   │   ├── SignalProcessor.ts            # Audio signal processing
│   │   └── AudioLevelConverter.ts        # Audio level calculations
│   ├── settings/            # Plugin configuration
│   │   ├── PluginSettings.ts             # Settings interface
│   │   └── SettingTab.ts                 # Settings UI tab
│   └── views/               # Obsidian integration
│       └── SidebarView.ts                # Sidebar view implementation
├── main.ts                  # Plugin entry point
├── main.js                  # Compiled plugin bundle
├── manifest.json            # Plugin metadata
├── versions.json            # Version compatibility
├── LICENSE                  # MIT License
├── README.md                # Documentation
└── esbuild.config.mjs       # Build configuration
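The tree describes src/utils/AudioLevelConverter.ts only as handling audio level calculations. The sketch below shows a common approach for a level meter; it is an assumption, not the file's contents. It converts a linear RMS amplitude to decibels relative to full scale (dBFS = 20·log10(rms)) and maps a chosen range, here -60 to 0 dBFS, onto a 0–100 meter. The function names and the -60 dB and -100 dB limits are illustrative.

```typescript
// Hypothetical level conversion for a meter; not AudioLevelConverter.ts itself.

// Converts a linear RMS amplitude (0..1) to dBFS, where 0 dBFS is full scale.
// Silence would be -Infinity, so results are floored at floorDb.
function toDbfs(rms: number, floorDb = -100): number {
  if (rms <= 0) return floorDb;
  return Math.max(floorDb, 20 * Math.log10(rms));
}

// Maps dBFS onto a 0-100 meter: minDb (e.g. -60 dBFS) is empty, 0 dBFS is full.
function toMeterPercent(db: number, minDb = -60): number {
  const clamped = Math.min(0, Math.max(minDb, db));
  return ((clamped - minDb) / -minDb) * 100;
}
```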

Performance Optimizations

  • Smart Segmentation: Voice activity-based chunking eliminates empty segments
  • Intelligent Queuing: Sequential API requests with smart retry logic
  • Real-time Processing: 50ms detection intervals for responsive voice activity detection
  • Memory Management: Automatic cleanup of processed audio segments
  • Background Noise Reduction: Built-in signal processing keeps noise-only audio from being uploaded, which reduces API costs
  • Adaptive Confidence Scoring: Dynamic thresholds based on environment conditions
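Sequential queuing with retries can be sketched as a promise chain. Each upload starts only after the previous one settles, so transcripts return in recording order, and a failed call is retried with exponential backoff. `UploadQueue`, `withRetry`, the three-attempt limit, and the 500 ms base delay are assumptions for illustration; the README does not describe the plugin's actual retry rules.

```typescript
// Illustrative sequential upload queue with retries; names and limits are assumptions.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries a failing task with exponential backoff: baseDelayMs, then 2x, 4x, ...
async function withRetry<T>(task: () => Promise<T>, maxAttempts: number, baseDelayMs: number): Promise<T> {
  let attempt = 0;
  while (true) {
    attempt++;
    try {
      return await task();
    } catch (err) {
      if (attempt >= maxAttempts) throw err; // give up and surface the last error
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}

class UploadQueue {
  private tail: Promise<unknown> = Promise.resolve();
  private readonly maxAttempts: number;
  private readonly baseDelayMs: number;

  constructor(maxAttempts = 3, baseDelayMs = 500) {
    this.maxAttempts = maxAttempts;
    this.baseDelayMs = baseDelayMs;
  }

  // Starts a task only after every previously queued task has settled,
  // so results arrive in recording order.
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(() => withRetry(task, this.maxAttempts, this.baseDelayMs));
    // A task that fails for good must not block the tasks queued after it.
    this.tail = run.catch(() => undefined);
    return run;
  }
}
```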

Troubleshooting

Common Issues

"OpenAI API key is required"

  • Enter a valid API key in Smart Transcriber settings
  • Use the "Verify" button to test your API key
  • Ensure your OpenAI account has available credits

"Failed to initialize: Permission denied"

  • Grant microphone permission when Obsidian prompts for it
  • Check system privacy settings (macOS: System Settings > Privacy & Security > Microphone)
  • Restart Obsidian if permissions were recently changed
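When microphone access fails, `navigator.mediaDevices.getUserMedia` rejects with a `DOMException` whose `name` identifies the cause. The main cases are `NotAllowedError` for denied permission, `NotFoundError` when no input device exists, and `NotReadableError` when the device cannot be opened, for example because another application is using it. The sketch below maps those names to messages like the ones above. `describeMicError`, `openMicrophone`, and the message wording are hypothetical, not the plugin's code.

```typescript
// Hypothetical helpers; the message wording is illustrative, not the plugin's.
function describeMicError(err: unknown): string {
  const name =
    typeof err === "object" && err !== null && "name" in err ? String((err as { name: unknown }).name) : "";
  switch (name) {
    case "NotAllowedError": // the user or the OS refused access
      return "Permission denied: allow microphone access for Obsidian in your system privacy settings, then restart Obsidian.";
    case "NotFoundError": // no audio input device available
      return "No microphone found: connect an input device and try again.";
    case "NotReadableError": // device exists but could not be opened
      return "Microphone unavailable: it may be in use by another application.";
    default:
      return `Failed to initialize: ${err instanceof Error ? err.message : String(err)}`;
  }
}

// Structural type satisfied by navigator.mediaDevices (which yields a MediaStream).
interface MediaDevicesLike<S> {
  getUserMedia(constraints: { audio: boolean }): Promise<S>;
}

// Requests an audio-only stream and rethrows failures with a readable message.
async function openMicrophone<S>(devices: MediaDevicesLike<S>): Promise<S> {
  try {
    return await devices.getUserMedia({ audio: true });
  } catch (err) {
    throw new Error(describeMicError(err));
  }
}
```

In Obsidian, the caller would pass `navigator.mediaDevices`.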

"Voice not detected" or "No transcription"

  • Check if your microphone is working in other applications
  • Check the audio level: speak louder or move closer to the microphone
  • Try adjusting the "Pause Detection Threshold" in settings
  • Ensure background noise isn't too high

"Transcription failed" or API errors

  • Verify your internet connection is stable
  • Check that your OpenAI API key has sufficient credits
  • Try reducing the segment duration if uploads are timing out
  • Open Obsidian's developer console (Ctrl+Shift+I on Windows/Linux, Cmd+Option+I on macOS) for detailed error messages

Performance Tips

  • Microphone Quality: Use a good quality microphone positioned 6-12 inches from your mouth
  • Speaking Style: Speak clearly at a moderate pace with natural pauses
  • Environment: Record in a quiet environment to improve voice detection accuracy
  • Settings Optimization:
    • Start with default settings (8s segments, 1s pause threshold)
    • Adjust pause threshold based on your speaking rhythm (faster speakers: 500ms, slower: 2000ms)
    • Set minimum segment duration to 3-5s to avoid processing very short utterances
  • API Efficiency: Smart voice detection lowers API costs because only segments containing detected speech are uploaded

Privacy & Security

  • Audio Processing: Audio is only sent to OpenAI's Whisper API for transcription
  • No Local Storage: Audio segments are processed in memory and not stored on disk
  • Smart Filtering: Voice detection reduces the chance that system audio or background noise is sent for transcription
  • Secure API: Your API key is stored locally in Obsidian's plugin settings and is sent only to OpenAI, never to third parties
  • Local Transcripts: All transcription results are stored locally in your Obsidian vault
  • No Tracking: The plugin does not collect analytics or usage data

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

MIT License - see LICENSE for details

Support

Author

Created by Kazen - Website


Made with ❤️ for the Obsidian community. Smart Transcriber brings advanced voice recognition technology to your note-taking workflow.
