Smart Transcriber
by Kazen
Record audio and get instant transcripts using OpenAI Whisper API with intelligent voice detection and real-time sidebar display.
Smart Transcriber - Obsidian Voice Transcription Plugin
Real-time voice transcription plugin for Obsidian, built on the OpenAI Whisper API. Voice activity detection decides when you are speaking, and segments are cut at natural pauses instead of fixed time intervals.
Features
- Smart Voice Detection: Advanced voice activity detection with human voice recognition
- Intelligent Timing: Smart segmentation based on actual voice activity, not fixed time intervals
- Noise Suppression: Built-in signal processing to filter out background noise and computer audio
- AI Transcription: Uses OpenAI Whisper API for highly accurate speech-to-text conversion
- Real-time Display: Live transcription updates in the sidebar as you speak
- Editable Results: Click to edit and refine transcription results
- Multi-language Support: Auto-detection or manual selection from 12+ languages
- Adaptive Segmentation: Uploads segments only when voice pauses are detected
- Advanced Settings: Fine-tune voice detection, pause thresholds, and segment durations
- Audio Level Monitoring: Real-time audio level visualization with voice activity indicators
Installation
Official Installation (Recommended)
- Open Obsidian Settings
- Go to Community plugins
- Browse community plugins
- Search for "Smart Transcriber"
- Install and enable the plugin
Manual Installation
- Download the latest release from the releases page
- Extract to your vault's .obsidian/plugins/smart-transcriber/ folder
- Enable the plugin in Obsidian settings
Development Installation
- Clone this repository into your Obsidian plugins folder:
    cd /path/to/your/vault/.obsidian/plugins
    git clone https://github.com/Kazzx9921/obsidian-smart-transcriber
    cd obsidian-smart-transcriber
- Install dependencies:
    npm install
- Build the plugin:
    npm run build
- Enable the plugin in Obsidian settings
Setup
- Get OpenAI API Key:
  - Visit OpenAI API Keys
  - Create a new API key
  - Make sure you have credits available
- Configure Plugin:
  - Open Smart Transcriber settings in Obsidian Settings > Community plugins > Smart Transcriber
  - Enter your OpenAI API key and click "Verify" to test it
  - Adjust recording settings (segment duration, pause threshold, etc.)
  - Configure language settings and voice detection parameters
Usage
Basic Recording
- Open Sidebar: Click the microphone icon in the ribbon or use the command palette
- Start Recording: Click the record button to begin voice transcription
- Grant Permissions: Allow microphone access when prompted
- Smart Detection: The plugin automatically detects when you're speaking vs. background noise
- Live Transcripts: Watch as your speech is transcribed in real-time with intelligent segmentation
- Stop Recording: Click the button again to stop
Advanced Features
Smart Voice Detection
- Human Voice Recognition: Distinguishes between human speech and computer audio
- Confidence Scoring: Only processes audio with high confidence voice detection
- Background Noise Filtering: Automatically filters out ambient noise and system sounds
- Adaptive Thresholds: Learns and adapts to your voice patterns and environment
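The README does not show how VoiceActivityDetector.ts works internally, so the following is only a minimal sketch of one common approach: energy-based detection with a noise floor that adapts to quiet frames. The names `rms` and `EnergyVad` and every numeric default are assumptions made for illustration. This sketch does not attempt to tell human speech from computer audio, which would need spectral features beyond simple energy.

```typescript
// Hypothetical sketch: energy-based voice activity detection with an
// adaptive noise floor. Not the plugin's actual VoiceActivityDetector.

/** Root-mean-square level of one frame of samples in the range [-1, 1]. */
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumSquares += frame[i] * frame[i];
  }
  return Math.sqrt(sumSquares / frame.length);
}

class EnergyVad {
  private noiseFloor: number; // running estimate of the ambient level
  private readonly ratio: number; // how many times louder than the floor counts as voice
  private readonly adaptRate: number; // how quickly the floor follows quiet frames

  constructor(initialNoiseFloor = 0.01, ratio = 3, adaptRate = 0.05) {
    this.noiseFloor = initialNoiseFloor;
    this.ratio = ratio;
    this.adaptRate = adaptRate;
  }

  /** Returns a confidence in [0, 1] that the frame contains voice. */
  confidence(frame: Float32Array): number {
    const level = rms(frame);
    const threshold = this.noiseFloor * this.ratio;
    if (level < threshold) {
      // Quiet frame: treat it as background and move the floor toward it.
      this.noiseFloor += this.adaptRate * (level - this.noiseFloor);
      return 0;
    }
    // Map levels from threshold..2*threshold onto 0..1, clamped at 1.
    return Math.min(1, (level - threshold) / threshold);
  }

  /** Only frames at or above `minConfidence` are treated as voice. */
  isVoice(frame: Float32Array, minConfidence = 0.5): boolean {
    return this.confidence(frame) >= minConfidence;
  }
}
```

In a Web Audio setup, frames like these could come from an AnalyserNode's getFloatTimeDomainData, polled at the 50ms interval mentioned under Performance Optimizations.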
Intelligent Timing & Segmentation
- Activity-Based Timing: Timer only counts when voice is actively detected
- Smart Pause Detection: Automatically uploads segments when voice pauses are detected
- Minimum Duration Control: Ensures segments have sufficient content before processing
- Configurable Thresholds: Adjust pause detection and minimum segment durations
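The timing rules above can be sketched as a small state machine. This is an illustration under stated assumptions, not the plugin's SegmentManager: the names `SegmenterOptions` and `ActivitySegmenter` are hypothetical, and treating the target segment duration as a hard cap that closes a segment even without a pause is a guess about behavior the README does not spell out.

```typescript
// Hypothetical sketch of activity-based segmentation; names are illustrative
// and do not come from the plugin's SegmentManager.

interface SegmenterOptions {
  segmentDurationMs: number; // target active-speech time per segment
  pauseThresholdMs: number; // silence required before a segment is closed
  minSegmentDurationMs: number; // active speech required before a pause may close it
}

class ActivitySegmenter {
  private activeMs = 0; // voiced time accumulated in the current segment
  private silenceMs = 0; // length of the current run of silence
  private readonly options: SegmenterOptions;

  constructor(options: SegmenterOptions) {
    this.options = options;
  }

  /**
   * Feed one detection tick. Returns true when the current segment should
   * be closed and uploaded; the counters are reset in that case.
   */
  tick(voiceDetected: boolean, elapsedMs: number): boolean {
    if (voiceDetected) {
      this.activeMs += elapsedMs; // the timer only advances while voice is detected
      this.silenceMs = 0;
    } else {
      this.silenceMs += elapsedMs;
    }

    const { segmentDurationMs, pauseThresholdMs, minSegmentDurationMs } = this.options;
    // Assumption: the target duration acts as a hard cap on active speech.
    const reachedTarget = this.activeMs >= segmentDurationMs;
    const pausedLongEnough =
      this.silenceMs >= pauseThresholdMs && this.activeMs >= minSegmentDurationMs;

    if (reachedTarget || pausedLongEnough) {
      this.activeMs = 0;
      this.silenceMs = 0;
      return true;
    }
    return false;
  }
}
```

Because silence never advances `activeMs`, a long quiet stretch after a very short utterance does not produce a segment, which matches the minimum duration rule.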
Settings Configuration
- Segment Duration: Target duration for voice segments (3-30 seconds of active speech)
- Pause Detection Threshold: How long to wait after voice stops before uploading (500ms-3000ms)
- Minimum Segment Duration: Minimum active speech time before creating a segment (1-10 seconds)
- Language: Choose from 12+ languages or use auto-detection
- Translation: Enable to translate non-English audio to English
- Audio Level Monitoring: Real-time visual feedback of voice detection confidence
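One way to keep these values inside their documented ranges is to clamp them whenever settings are loaded. The sketch below is illustrative only: the field names, the `"auto"` sentinel for language, and the defaults marked as assumed are not taken from the plugin's PluginSettings.ts. The 8s segment and 1s pause defaults come from the Performance Tips section.

```typescript
// Hypothetical settings shape; field names are illustrative. The numeric
// ranges are the ones listed above.

interface RecordingSettings {
  segmentDurationSec: number; // 3-30 seconds of active speech
  pauseThresholdMs: number; // 500-3000 ms
  minSegmentDurationSec: number; // 1-10 seconds
  language: string; // "auto" (assumed sentinel) or a language code such as "en"
  translateToEnglish: boolean;
}

const DEFAULT_SETTINGS: RecordingSettings = {
  segmentDurationSec: 8,
  pauseThresholdMs: 1000,
  minSegmentDurationSec: 3, // assumed default
  language: "auto",
  translateToEnglish: false, // assumed default
};

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

/** Merges user values over the defaults and clamps them to the listed ranges. */
function normalizeSettings(partial: Partial<RecordingSettings>): RecordingSettings {
  const merged = { ...DEFAULT_SETTINGS, ...partial };
  const segmentDurationSec = clamp(merged.segmentDurationSec, 3, 30);
  return {
    ...merged,
    segmentDurationSec,
    pauseThresholdMs: clamp(merged.pauseThresholdMs, 500, 3000),
    // Design choice of this sketch: the minimum never exceeds the target duration.
    minSegmentDurationSec: Math.min(
      clamp(merged.minSegmentDurationSec, 1, 10),
      segmentDurationSec,
    ),
  };
}
```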
Editing Transcripts
- Click on any transcript segment to edit the text
- Use the copy button to copy individual segments
- Export all transcripts to a text file
Keyboard Shortcuts
- Ctrl/Cmd + Shift + M: Toggle recording
- Ctrl/Cmd + Shift + T: Open/close transcript sidebar
Technical Details
Architecture
- Frontend: Svelte 4 + TypeScript
- Audio Processing: Web Audio API with MediaRecorder
- Build System: esbuild with Svelte plugin
- API Integration: OpenAI Whisper API
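For the API integration, OpenAI exposes two audio endpoints: one for transcription and one for translation into English. Both accept the `whisper-1` model as a multipart form field. The sketch below only describes such a request without sending it. The helper name `buildWhisperRequest` and the `"auto"` sentinel are assumptions, and nothing here is taken from the plugin's WhisperAPI.ts.

```typescript
// Hypothetical helper: describes a Whisper request without sending it.

interface WhisperRequest {
  url: string;
  headers: Record<string, string>;
  fields: Record<string, string>; // multipart form fields, excluding the audio file
}

function buildWhisperRequest(
  apiKey: string,
  language: string, // "auto" (assumed sentinel) or an ISO-639-1 code such as "de"
  translateToEnglish: boolean,
): WhisperRequest {
  if (apiKey.trim() === "") {
    throw new Error("OpenAI API key is required");
  }
  const endpoint = translateToEnglish ? "translations" : "transcriptions";
  const fields: Record<string, string> = { model: "whisper-1" };
  // Only transcription takes a language hint; translation always outputs English.
  if (!translateToEnglish && language !== "auto") {
    fields["language"] = language;
  }
  return {
    url: `https://api.openai.com/v1/audio/${endpoint}`,
    headers: { Authorization: `Bearer ${apiKey}` },
    fields,
  };
}
```

To send it, the fields would be appended to a FormData object together with the recorded audio under the `file` field, then posted with `fetch` using the returned URL and headers.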
File Structure
smart-transcriber/
├── src/
│   ├── components/                      # Svelte UI components
│   │   ├── SimpleVoiceRecorder.svelte   # Main recording interface
│   │   ├── VoiceTranscriberApp.svelte   # App container
│   │   ├── TranscriptDisplay.svelte     # Transcript display
│   │   ├── SimpleSettingsPanel.svelte   # Settings UI
│   │   └── TestComponent.svelte         # Testing components
│   ├── services/                        # Core business logic
│   │   ├── VoiceActivityDetector.ts     # Smart voice detection
│   │   ├── WhisperAPI.ts                # OpenAI API integration
│   │   ├── AudioRecorder.ts             # Audio capture
│   │   ├── SegmentManager.ts            # Segment management
│   │   └── AudioSourceManager.ts        # Audio source handling
│   ├── utils/                           # Utility functions
│   │   ├── SignalProcessor.ts           # Audio signal processing
│   │   └── AudioLevelConverter.ts       # Audio level calculations
│   ├── settings/                        # Plugin configuration
│   │   ├── PluginSettings.ts            # Settings interface
│   │   └── SettingTab.ts                # Settings UI tab
│   └── views/                           # Obsidian integration
│       └── SidebarView.ts               # Sidebar view implementation
├── main.ts                              # Plugin entry point
├── main.js                              # Compiled plugin bundle
├── manifest.json                        # Plugin metadata
├── versions.json                        # Version compatibility
├── LICENSE                              # MIT License
├── README.md                            # Documentation
└── esbuild.config.mjs                   # Build configuration
Performance Optimizations
- Smart Segmentation: Voice activity-based chunking eliminates empty segments
- Intelligent Queuing: Sequential API requests with smart retry logic
- Real-time Processing: 50ms detection intervals for responsive voice activity detection
- Memory Management: Automatic cleanup of processed audio segments
- Background Noise Reduction: Built-in signal processing reduces API costs
- Adaptive Confidence Scoring: Dynamic thresholds based on environment conditions
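Sequential requests with retries can be sketched as a promise chain in which each task waits for the previous one. The README does not say which retry strategy the plugin uses, so the exponential backoff here is an assumption, as are the names `backoffDelayMs`, `sleep`, and `SequentialQueue` and all numeric defaults.

```typescript
// Hypothetical sketch of sequential uploads with retry. Exponential backoff
// is an assumption, not a documented behavior of the plugin.

/** Delay before retry number `attempt` (1-based), doubling up to a cap. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt - 1));
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

class SequentialQueue {
  // Resolves when every task enqueued so far has settled.
  private tail: Promise<unknown> = Promise.resolve();

  /** Runs `task` after all previously enqueued tasks have settled. */
  enqueue<T>(task: () => Promise<T>, maxRetries = 3): Promise<T> {
    const run = async (): Promise<T> => {
      for (let attempt = 0; ; attempt++) {
        try {
          return await task();
        } catch (error) {
          // Give up once the retry budget is spent.
          if (attempt >= maxRetries) throw error;
          await sleep(backoffDelayMs(attempt + 1));
        }
      }
    };
    const result = this.tail.then(run);
    // Keep the chain usable even if this task ultimately fails.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

Running uploads one at a time keeps transcripts in spoken order, at the cost of a slow request delaying the segments behind it.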
Troubleshooting
Common Issues
"OpenAI API key is required"
- Enter a valid API key in Smart Transcriber settings
- Use the "Verify" button to test your API key
- Ensure your OpenAI account has available credits
"Failed to initialize: Permission denied"
- Grant microphone permission when prompted by Obsidian or your operating system
- Check system privacy settings (macOS: System Settings > Privacy & Security > Microphone)
- Restart Obsidian if permissions were recently changed
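Microphone failures surface as rejected `getUserMedia` promises whose error `name` indicates the cause. The names handled below are standard DOMException names. The helper `describeMicrophoneError` and its messages are hypothetical and do not reproduce the plugin's own error text.

```typescript
// Hypothetical helper; the messages are illustrative, not the plugin's.
function describeMicrophoneError(error: { name: string }): string {
  switch (error.name) {
    case "NotAllowedError":
      return "Microphone permission was denied. Check system privacy settings.";
    case "NotFoundError":
      return "No microphone was found. Check that one is connected.";
    case "NotReadableError":
      return "The microphone could not be read. Another application may be using it.";
    default:
      return `Microphone could not be initialized (${error.name}).`;
  }
}
```

It could be called from the `catch` branch around `navigator.mediaDevices.getUserMedia({ audio: true })`.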
"Voice not detected" or "No transcription"
- Check if your microphone is working in other applications
- Adjust the audio level - speak louder or move closer to microphone
- Try adjusting the "Pause Detection Threshold" in settings
- Ensure background noise isn't too high
"Transcription failed" or API errors
- Verify your internet connection is stable
- Check that your OpenAI API key has sufficient credits
- Try reducing the segment duration if uploads are timing out
- Check the developer console (Ctrl+Shift+I, or Cmd+Option+I on macOS) for detailed error messages
Performance Tips
- Microphone Quality: Use a good quality microphone positioned 6-12 inches from your mouth
- Speaking Style: Speak clearly at a moderate pace with natural pauses
- Environment: Record in a quiet environment to improve voice detection accuracy
- Settings Optimization:
  - Start with default settings (8s segments, 1s pause threshold)
  - Adjust pause threshold based on your speaking rhythm (faster speakers: 500ms, slower: 2000ms)
  - Set minimum segment duration to 3-5s to avoid processing very short utterances
- API Efficiency: The smart voice detection significantly reduces API costs by only processing segments with actual speech
Privacy & Security
- Audio Processing: Audio is only sent to OpenAI's Whisper API for transcription
- No Local Storage: Audio segments are processed in memory and not stored on disk
- Smart Filtering: Voice detection prevents accidental recording of system audio or background noise
- Secure API: API key is stored locally in Obsidian settings and is sent only to OpenAI to authenticate requests
- Local Transcripts: All transcription results are stored locally in your Obsidian vault
- No Tracking: The plugin does not collect analytics or usage data
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
License
MIT License - see LICENSE for details
Support
- Report bugs in GitHub Issues
- Request features in GitHub Discussions
- Support development: Buy Me a Coffee
- Check the documentation for detailed guides and troubleshooting
Author
Created by Kazen - Website
Made with ❤️ for the Obsidian community. Smart Transcriber brings advanced voice recognition technology to your note-taking workflow.