Speech to Text
by Taesun Lee
Convert audio recordings to text using multiple AI providers (OpenAI Whisper, Deepgram)
Speech to Text for Obsidian
Convert audio recordings to text directly in Obsidian using multiple AI providers with advanced features like speaker diarization.
English
Features
Multi-Provider Audio Transcription
- OpenAI Whisper: High accuracy, stable performance
- Deepgram Nova-3: Latest model with 98% accuracy, 70% cost reduction
- Speaker Diarization: Automatic speaker separation with "Speaker 1:", "Speaker 2:" format
- Auto Selection: Automatically chooses the best provider for each file
- Supported Formats: M4A, MP3, WAV, MP4, WebM, OGG, FLAC
Multi-language Support
- Auto Detection: Automatic language recognition
- 40+ Languages: Korean, English, Japanese, Chinese, Spanish, French, German, etc.
- Provider Optimization: Each provider optimized for different languages
Smart Text Insertion & Speaker Recognition
- Cursor Position: Insert at current cursor location
- Note Positions: Beginning or end of note
- Auto Note Creation: Creates new note if no active editor
- Speaker Diarization: Automatic speaker identification and labeling
- Multi-Speaker Support: Clear separation for meetings, interviews, conversations
Performance & Architecture
- Nova-3 Model: 98% accuracy with $0.0043/min (70% cost reduction)
- Clean Architecture: Domain-driven design with clear separation of concerns
- Intelligent Provider Selection: Best provider based on file size and format
- Real-time Progress: Status bar progress indicator with cancellation support
- Async Processing: Non-blocking background processing
- Memory Management: Built-in memory monitoring and optimization
- Performance Benchmarking: Integrated performance monitoring tools
- Error Boundaries: Comprehensive error handling and recovery
- Dependency Injection: IoC container for better testability
- Event-Driven Architecture: Decoupled components with EventManager
- Batch Processing: Efficient batch request handling
- Caching Layer: Smart caching for improved performance
- Settings Migration: Automatic settings upgrade and validation
- Fallback Mechanism: Automatic provider switching on failure
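The fallback mechanism listed above can be pictured roughly as follows. This is a minimal sketch, not the plugin's actual code: the `Provider` interface and the `transcribeWithFallback` function are hypothetical names introduced only for illustration.

```typescript
// Hypothetical provider shape; the plugin's real interface may differ.
interface Provider {
  name: string;
  transcribe(audio: Uint8Array): Promise<string>;
}

// Try each provider in order and return the first successful result.
async function transcribeWithFallback(
  providers: Provider[],
  audio: Uint8Array,
): Promise<{ provider: string; text: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.transcribe(audio);
      return { provider: provider.name, text };
    } catch (err) {
      // Record the failure and move on to the next provider.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.name}: ${message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}
```

Ordering the list as primary provider first, backup provider second gives the "automatic provider switching on failure" behavior; the collected error messages make it easier to see why each attempt failed.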
Installation
Manual Installation
From Releases
- Download the latest release from Releases
- Extract files to your vault's .obsidian/plugins/obsidian-speech-to-text/ folder
- Restart Obsidian
- Enable "Speech to Text" in Community Plugins settings
Development Build
# Clone repository
git clone https://github.com/asyouplz/SpeechNote.git
cd SpeechNote
# Install dependencies
npm install
# Build
npm run build
# Copy to plugin folder
cp main.js manifest.json styles.css /path/to/your/vault/.obsidian/plugins/obsidian-speech-to-text/
Setup
API Key Configuration
1. Choose Provider
- Open Obsidian Settings → "Speech to Text"
- Select "Transcription Provider":
- OpenAI Whisper: High quality, stable
- Deepgram: Fast speed, large file support
- Auto: Automatic selection (recommended)
2. Get OpenAI API Key (for Whisper)
- Visit OpenAI Platform
- Sign in or create account
- Click "Create new secret key"
- Copy the key (warning: it is shown only once)
3. Get Deepgram API Key (for Deepgram)
- Visit Deepgram Console
- Sign up or sign in
- Go to "API Keys" menu
- Click "Create a New API Key"
- Copy the API key
4. Configure Plugin
- Open Obsidian Settings (Cmd/Ctrl + ,)
- Select "Speech to Text" from left menu
- Enter your API key(s):
- "OpenAI API Key" (for Whisper)
- "Deepgram API Key" (for Deepgram)
- Save settings
Usage
Basic Usage
Method 1: Command Palette
- Open Command Palette: Cmd/Ctrl + P
- Search: "Transcribe audio file"
- Select File: Choose audio file from list
- Wait: Monitor progress in status bar
- Complete: Text automatically inserted into note
Method 2: Context Menu
- File Explorer: Find audio file
- Right Click: Right-click on audio file
- Select: "Transcribe audio file"
- Auto Process: Transcription starts and inserts result
Method 3: Hotkeys
- Settings: Settings → Hotkeys → search "Transcribe audio file"
- Set Hotkey: Assign preferred key combination
- Execute: Use hotkey for quick access
Using Speaker Diarization
Enable Speaker Diarization
- Open Settings: Settings → Speech to Text → Deepgram Settings
- Enable Diarization: Toggle "Speaker Diarization" to ON
- Select Nova-3: Choose "Nova-3" model (default for new installations)
- Save Settings: Apply configuration
Example Results
Multi-speaker meeting audio:
Transcription output:
Speaker 1: Good morning everyone, let's start the meeting.
Speaker 2: Thank you. I'd like to discuss the project timeline.
Speaker 1: That sounds good. What are your thoughts?
Speaker 3: I think we should extend the deadline by one week.
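Output in this "Speaker N:" format could be produced from diarized utterances along the following lines. This is a sketch under assumptions: the `Utterance` shape (a zero-based `speaker` index plus a `transcript` string) and the `formatDiarized` helper are illustrative and are not taken from the plugin's source.

```typescript
// Assumed shape of one diarized utterance (hypothetical, zero-based speaker index).
interface Utterance {
  speaker: number;
  transcript: string;
}

// Merge consecutive utterances from the same speaker and label them "Speaker N:".
function formatDiarized(utterances: Utterance[]): string {
  const lines: { speaker: number; text: string }[] = [];
  for (const u of utterances) {
    const text = u.transcript.trim();
    if (text === "") continue; // skip empty segments
    const last = lines[lines.length - 1];
    if (last !== undefined && last.speaker === u.speaker) {
      last.text += " " + text; // same speaker keeps talking
    } else {
      lines.push({ speaker: u.speaker, text });
    }
  }
  // Speaker indices are shifted by one so labels start at "Speaker 1".
  return lines.map((l) => `Speaker ${l.speaker + 1}: ${l.text}`).join("\n");
}
```

Merging adjacent segments keeps one line per speaker turn, which matches the layout of the example above.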
Best Practices for Speaker Diarization
- Clear Audio: Use high-quality recordings for better accuracy
- Speaker Separation: Ensure speakers don't talk simultaneously
- Minimum Duration: Each speaker segment should be at least 2-3 seconds
- Audio Format: Use M4A, MP3, or WAV for optimal results
Supported Audio Formats
| Format | Extension | Whisper | Deepgram | Max Size (Whisper/Deepgram) | Diarization | Description |
|---|---|---|---|---|---|---|
| M4A | .m4a | Yes | Yes | 25MB/2GB | Yes | Apple default recording format |
| MP3 | .mp3 | Yes | Yes | 25MB/2GB | Yes | Universal audio format |
| WAV | .wav | Yes | Yes | 25MB/2GB | Yes | Lossless, large file size |
| MP4 | .mp4 | Yes | Yes | 25MB/2GB | Yes | Audio from video files |
| WebM | .webm | No | Yes | -/2GB | Yes | Web streaming format |
| OGG | .ogg | No | Yes | -/2GB | Yes | Open source audio format |
| FLAC | .flac | No | Yes | -/2GB | Yes | Lossless compression |
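The limits in this table suggest a simple eligibility check per file. The sketch below assumes that a "-" in the size column means Whisper is not used for that format in this plugin; the function name `eligibleProviders` and the constants are hypothetical, and how Auto mode ranks the eligible providers is not documented here.

```typescript
type ProviderName = "whisper" | "deepgram";

const MB = 1024 * 1024;
// Size limits and format support as read from the table above.
const WHISPER_MAX_BYTES = 25 * MB;
const DEEPGRAM_MAX_BYTES = 2 * 1024 * MB;
const WHISPER_FORMATS = new Set(["m4a", "mp3", "wav", "mp4"]);
const DEEPGRAM_FORMATS = new Set(["m4a", "mp3", "wav", "mp4", "webm", "ogg", "flac"]);

// Return the providers that can accept a file of this type and size.
function eligibleProviders(fileName: string, sizeBytes: number): ProviderName[] {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  const result: ProviderName[] = [];
  if (WHISPER_FORMATS.has(ext) && sizeBytes <= WHISPER_MAX_BYTES) result.push("whisper");
  if (DEEPGRAM_FORMATS.has(ext) && sizeBytes <= DEEPGRAM_MAX_BYTES) result.push("deepgram");
  return result;
}
```

For example, a 30 MB MP3 is only eligible for Deepgram, while an unsupported extension yields an empty list.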
Settings
Main Settings
- Provider Selection:
- Auto: Intelligent selection based on file
- OpenAI Whisper: High quality, stable performance
- Deepgram: Fast speed, speaker diarization support
- Language: Auto-detect or specific language selection
- Insert Position:
- Cursor position
- Beginning of note
- End of note
- Auto-insert: Automatic text insertion after transcription
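The three insert positions can be modeled as a pure string operation. This is an illustrative sketch only: `insertTranscript` and the `InsertPosition` type are hypothetical names, and the real plugin works through the Obsidian editor rather than on raw strings.

```typescript
type InsertPosition = "cursor" | "beginning" | "end";

// Insert `text` into `note`. `cursorOffset` is a character offset, used only for "cursor".
function insertTranscript(
  note: string,
  text: string,
  position: InsertPosition,
  cursorOffset = 0,
): string {
  switch (position) {
    case "beginning":
      return text + "\n" + note;
    case "end": {
      // Add a separating newline unless the note is empty or already ends with one.
      const separator = note === "" || note.endsWith("\n") ? "" : "\n";
      return note + separator + text;
    }
    case "cursor": {
      // Clamp the offset so an out-of-range cursor cannot corrupt the note.
      const at = Math.min(Math.max(cursorOffset, 0), note.length);
      return note.slice(0, at) + text + note.slice(at);
    }
  }
}
```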
Deepgram Settings
- Model Selection:
- Nova-3 (recommended): 98% accuracy, speaker diarization
- Nova-2: Previous generation, high accuracy
- Nova/Enhanced/Base: Legacy models
- Features:
- Speaker Diarization: Automatic speaker separation
- Smart Format: Intelligent text formatting
- Punctuation: Automatic punctuation
- Utterances: Segment by natural speech patterns
- Paragraphs: Automatic paragraph detection
Advanced Settings
- Performance:
- Batch Processing: Process multiple files efficiently
- Memory Limits: Configure memory usage thresholds
- Cache Duration: Result caching timeouts
- Max Parallel Requests: Control concurrent API calls
- Circuit Breaker: Automatic failure protection
- Network:
- Request Timeout: API request timeout settings
- Retry Policy: Automatic retry configuration
- Fallback Provider: Backup provider on failure
- Health Checks: Monitor provider availability
- Cost Management:
- Monthly Budget: Set spending limits
- Cost Limits: Per-request cost controls
- Budget Alerts: Get notified at threshold
- Auto Cost Optimization: Intelligent provider selection based on cost
- Quality Control:
- Quality Threshold: Minimum acceptable accuracy
- Confidence Level: Minimum transcription confidence
- Strict Language Mode: Enforce language consistency
- Post-Processing: Additional text refinement
- Text Formatting:
- Plain Text: Standard text output
- Markdown: Markdown-formatted output
- Quote Block: Insert as blockquote
- Bullet List: Format as list items
- Heading: Insert as headings
- Code Block: Format as code
- Callout: Use Obsidian callouts
- A/B Testing:
- Provider Comparison: Test multiple providers
- Traffic Split: Percentage-based routing
- Metric Tracking: Compare accuracy, speed, cost
- Duration Control: Set test duration
- Large File Handling:
- Auto Chunking: Split files automatically
- Chunk Size: Configure chunk size (MB)
- Overlap: Set chunk overlap for continuity
- Development:
- Debug Mode: Detailed logging
- Performance Monitoring: Track performance metrics
- Error Reporting: Enhanced error details
- Metrics Retention: Configure data retention period
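To make the chunk size and overlap settings concrete, the sketch below plans overlapping ranges over a file. It is an assumption-laden illustration: `planChunks` and `ChunkRange` are hypothetical names, and byte offsets stand in for whatever unit the plugin actually splits on (compressed audio usually cannot be cut at arbitrary bytes).

```typescript
// A half-open range [start, end) over the file, in bytes.
interface ChunkRange {
  start: number;
  end: number;
}

// Plan chunks of `chunkBytes` where each chunk repeats the last `overlapBytes` of the previous one.
function planChunks(totalBytes: number, chunkBytes: number, overlapBytes: number): ChunkRange[] {
  if (chunkBytes <= 0 || overlapBytes < 0 || overlapBytes >= chunkBytes) {
    throw new RangeError("chunkBytes must be positive and larger than overlapBytes");
  }
  const ranges: ChunkRange[] = [];
  const step = chunkBytes - overlapBytes; // how far each new chunk advances
  for (let start = 0; start < totalBytes; start += step) {
    const end = Math.min(start + chunkBytes, totalBytes);
    ranges.push({ start, end });
    if (end === totalBytes) break; // the last chunk reached the end of the file
  }
  return ranges;
}
```

The overlap means words that straddle a chunk boundary appear in both neighboring chunks, so the stitched transcript can keep continuity at the cost of some duplicated audio.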
Troubleshooting
Common Issues
"Invalid API Key" Error
Solutions:
- Verify API key format (OpenAI keys start with sk-)
- Check API key status on provider dashboard
- Ensure sufficient credits/active subscription
- Remove any extra spaces from key
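The format checks above can be automated before a key is saved. This is a minimal sketch with a hypothetical helper name (`checkOpenAiKey`); it only checks the "sk-" prefix and whitespace mentioned above and cannot tell whether a key is actually valid or funded.

```typescript
// Normalize a pasted OpenAI key and report obvious format problems.
function checkOpenAiKey(raw: string): { key: string; problems: string[] } {
  const key = raw.trim();
  const problems: string[] = [];
  if (key !== raw) {
    problems.push("leading or trailing whitespace was removed");
  }
  if (key === "") {
    problems.push("key is empty");
  } else {
    if (!key.startsWith("sk-")) {
      problems.push('key does not start with "sk-"');
    }
    if (/\s/.test(key)) {
      problems.push("key contains whitespace");
    }
  }
  return { key, problems };
}
```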
"File too large" Error
Solutions:
- Check file size limits (Whisper: 25MB, Deepgram: 2GB)
- Use Deepgram for larger files
- Compress audio files if needed
Speaker Diarization Not Working
Solutions:
- Ensure Nova-3 model is selected (required for diarization)
- Check "Speaker Diarization" is enabled in Deepgram settings
- Verify audio quality (clear speakers, minimal overlap)
- Use supported audio formats (M4A, MP3, WAV recommended)
- Check minimum speaker duration (2-3 seconds per segment)
No Audio Files Found
Solutions:
- Verify supported formats: .m4a, .mp3, .wav, .mp4, etc.
- Ensure files are in vault folder
- Restart Obsidian
- Wait for file indexing (large vaults)
Network Errors
Solutions:
- Check internet connection
- Verify VPN/proxy settings
- Check provider API status
Commands
| Command | Description | Status |
|---|---|---|
| Transcribe audio file | Select and transcribe audio file | Available |
| Cancel transcription | Cancel ongoing transcription | Available |
Development
Prerequisites
- Node.js 16.0.0+
- npm 7.0.0+
- Obsidian 0.15.0+
- TypeScript 4.7.4+
Development Setup
# Clone repository
git clone https://github.com/asyouplz/SpeechNote.git
cd SpeechNote
# Install dependencies
npm install
# Development mode (watch for changes)
npm run dev
# Production build
npm run build
# Code quality checks
npm run lint # Lint check
npm run lint:fix # Auto-fix lint issues
npm run format # Format code
npm run format:check # Check formatting
npm run typecheck # Type checking
# Testing
npm test # Run all tests
npm run test:unit # Unit tests only
npm run test:integration # Integration tests
npm run test:e2e # End-to-end tests
npm run test:coverage # Generate coverage report
npm run test:watch # Watch mode for TDD
# Clean build
npm run clean # Clean build artifacts
npm run clean:all # Clean everything including node_modules
# Full validation
npm run validate # Lint + Type check + Tests
npm run ci # Full CI pipeline
Project Structure
SpeechNote/
├── src/
│   ├── main.ts  # Plugin entry point
│   ├── application/  # Application services
│   │   ├── EditorService.ts  # Editor management
│   │   ├── EventManager.ts  # Event handling
│   │   ├── StateManager.ts  # State management
│   │   └── TextInsertionHandler.ts  # Text insertion logic
│   ├── architecture/  # Architecture components
│   │   ├── DependencyContainer.ts  # Dependency injection
│   │   ├── ErrorBoundary.ts  # Error handling boundaries
│   │   └── PluginLifecycleManager.ts  # Plugin lifecycle
│   ├── core/  # Core business logic
│   │   ├── LazyLoader.ts  # Lazy loading utilities
│   │   └── transcription/  # Transcription services
│   │       ├── AudioProcessor.ts  # Audio processing
│   │       ├── TextFormatter.ts  # Text formatting
│   │       └── TranscriptionService.ts  # Main transcription service
│   ├── domain/  # Domain models
│   │   ├── events/  # Domain events
│   │   └── models/  # Domain entities
│   ├── infrastructure/  # External integrations
│   │   ├── api/  # API clients
│   │   │   ├── providers/  # Provider implementations
│   │   │   │   ├── deepgram/  # Deepgram integration
│   │   │   │   ├── whisper/  # OpenAI Whisper integration
│   │   │   │   └── factory/  # Provider factory
│   │   │   ├── adapters/  # Interface adapters
│   │   │   ├── BatchRequestManager.ts  # Batch request handling
│   │   │   ├── FileUploadManager.ts  # File upload management
│   │   │   ├── SettingsAPI.ts  # Settings API
│   │   │   ├── SettingsMigrator.ts  # Settings migration
│   │   │   ├── SettingsValidator.ts  # Settings validation
│   │   │   └── TranscriberFactory.ts  # Transcriber factory
│   │   ├── audio/  # Audio utilities
│   │   ├── cache/  # Caching layer
│   │   ├── logging/  # Logging infrastructure
│   │   ├── security/  # Security utilities
│   │   └── storage/  # Storage management
│   ├── patterns/  # Design patterns
│   ├── testing/  # Testing utilities
│   ├── types/  # Type definitions
│   │   ├── DeepgramTypes.ts  # Deepgram type definitions
│   │   ├── events.ts  # Event types
│   │   ├── guards.ts  # Type guards
│   │   ├── resources.ts  # Resource types
│   │   └── strategy.ts  # Strategy pattern types
│   ├── ui/  # User interface
│   │   ├── commands/  # Command implementations
│   │   ├── formatting/  # Format options UI
│   │   ├── modals/  # Modal dialogs
│   │   ├── settings/  # Settings tab UI
│   │   └── statusbar/  # Status bar components
│   └── utils/  # Utilities
│       ├── error/  # Error handling utilities
│       ├── memory/  # Memory management
│       └── performance/  # Performance monitoring
├── __tests__/  # Test files
├── esbuild.config.mjs  # Build configuration
├── jest.config.js  # Test configuration
├── manifest.json  # Plugin metadata
├── package.json  # Project configuration
└── README.md  # Documentation
Architecture Highlights
Clean Architecture Layers
- Application Layer: Orchestrates use cases and coordinates domain logic
- Core Layer: Business logic and transcription services
- Domain Layer: Business entities and domain events
- Infrastructure Layer: External services and API integrations
- UI Layer: User interface components and settings management
Design Patterns Used
- Factory Pattern: TranscriberFactory for provider instantiation
- Adapter Pattern: API adapters for provider integration
- Observer Pattern: Event-driven architecture with EventManager
- Strategy Pattern: Multiple transcription providers
- Repository Pattern: Storage management abstraction
- Dependency Injection: IoC container for loose coupling
- Error Boundary Pattern: Comprehensive error handling
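As an illustration of how the Factory and Strategy patterns listed above can fit together, here is a small registry-style factory. It is a sketch, not the plugin's `TranscriberFactory`: the `Transcriber` interface, the `SimpleTranscriberFactory` class, and the creator signature are all assumptions made for this example.

```typescript
// Strategy: every provider exposes the same interface (hypothetical shape).
interface Transcriber {
  readonly name: string;
  transcribe(audio: Uint8Array): Promise<string>;
}

type TranscriberCreator = (apiKey: string) => Transcriber;

// Factory: maps a provider name to a function that builds the matching strategy.
class SimpleTranscriberFactory {
  private readonly creators = new Map<string, TranscriberCreator>();

  register(name: string, creator: TranscriberCreator): void {
    this.creators.set(name, creator);
  }

  create(name: string, apiKey: string): Transcriber {
    const creator = this.creators.get(name);
    if (creator === undefined) {
      throw new Error(`Unknown provider: ${name}`);
    }
    return creator(apiKey);
  }
}
```

Callers depend only on `Transcriber`, so adding a provider means registering one more creator rather than changing the code that requests a transcription.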
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Contribution Process
- Fork the repository
- Create feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'feat: add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Create Pull Request
Code Style Guidelines
- Follow TypeScript best practices
- Use ESLint and Prettier configurations
- Write unit tests for new features
- Update documentation for API changes
License
This project is licensed under the MIT License - see the LICENSE file for details.
Credits
Acknowledgments
- Obsidian Team: Obsidian Plugin API
- OpenAI: Whisper API
- Deepgram: Speech-to-Text API
- Community: Obsidian community feedback and contributions
Built With
- TypeScript
- ESBuild
- Jest
- ESLint & Prettier
Support
Need Help?
- Bug Reports: GitHub Issues
- Feature Requests: GitHub Issues
- Discussions: GitHub Discussions
Show Your Support
If this project helped you:
- Star on GitHub
- Share on social media
- Buy me a coffee
Recent Updates
v3.0.4 (Latest) - Settings Stability & Deepgram Cleanup
- Settings Stability: Improves plugin settings reliability
- Deepgram Refactor: Cleanup and refactor for provider integration
- General Fixes: Minor bug fixes and improvements
v3.0.1 - Enterprise Architecture & Performance
- Clean Architecture: Domain-driven design with clear separation of concerns
- Modular Structure: Organized into application, core, domain, infrastructure layers
- Performance Monitoring: Built-in performance benchmarking and memory management
- Error Boundaries: Comprehensive error handling with ErrorBoundary pattern
- Dependency Injection: DependencyContainer for better testability
- Test Coverage: Unit, integration, and E2E test suites
- Settings Migration: Automatic settings migration and validation
- Batch Processing: Efficient batch request management
v3.0.0 - Nova-3 & Speaker Diarization
- Nova-3 Model: Default model upgrade with 98% accuracy
- Speaker Diarization: Complete implementation with "Speaker 1:", "Speaker 2:" format
- Cost Optimization: 70% cost reduction with Deepgram Nova-3
- Code Quality: Improved type safety and code organization
v1.0.0 - Initial Release
- Multi-Provider Support: OpenAI Whisper & Deepgram integration
- Multi-language Support: 40+ languages with auto-detection
- Smart Text Insertion: Flexible insertion options
- Context Menu Integration: Right-click transcription
- Development: See below for contribution and release guidelines.
Development
Release Process (Automated)
This project uses semantic-release for fully automated versioning and releases.
- Merge PR to main: Ensure all commits follow the Conventional Commits specification.
- CI/CD Pipeline: GitHub Actions triggered on push to main will:
  - Analyze commits to determine the next version (feat -> minor, fix -> patch, BREAKING CHANGE -> major).
  - Update manifest.json, package.json, and versions.json.
  - Generate GitHub Release notes.
  - Create a git tag and GitHub Release with built assets.
Conventional Commits
We use commitlint and husky to enforce commit message formats.
Format: <type>(<scope>): <description>
- feat: A new feature (causes a minor version bump)
- fix: A bug fix (causes a patch version bump)
- perf: A performance improvement (causes a patch version bump)
- chore: Maintenance/Internal changes (no release)
- docs: Documentation changes (no release)
- refactor: Code change that neither fixes a bug nor adds a feature
Example: feat(audio): add support for deepgram v3
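The mapping from commit types to version bumps can be sketched as below. This is an illustration of the rules listed above, not how semantic-release is implemented; `bumpFor`, `releaseBump`, and the header regular expression are hypothetical, and only the `BREAKING CHANGE:` footer form is recognized.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Matches "<type>(<scope>): <description>" with an optional scope.
const HEADER = /^(\w+)(?:\(([^)]+)\))?: (.+)$/;

// Decide the bump implied by a single commit message.
function bumpFor(message: string): Bump {
  const [header = ""] = message.split("\n");
  const match = HEADER.exec(header);
  if (match === null) return "none"; // not a conventional commit
  if (/^BREAKING CHANGE: /m.test(message)) return "major";
  const type = match[1];
  if (type === "feat") return "minor";
  if (type === "fix" || type === "perf") return "patch";
  return "none"; // chore, docs, refactor, ...
}

// The release bump is the largest bump among all commits since the last release.
function releaseBump(messages: string[]): Bump {
  const order: Bump[] = ["none", "patch", "minor", "major"];
  let best: Bump = "none";
  for (const message of messages) {
    const bump = bumpFor(message);
    if (order.indexOf(bump) > order.indexOf(best)) best = bump;
  }
  return best;
}
```

With these rules, a batch containing one `docs:` commit, one `fix:` commit, and one `feat:` commit results in a minor release.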
Emergency Manual Release
If the automated system fails, use the emergency script:
./scripts/release-emergency.sh [patch|minor|major|VERSION]
Rollback Procedure
If a bad release is pushed:
- Revert: Revert the release commit on main.
- Delete Tag: Delete the problematic git tag locally and remotely:
  git tag -d v3.x.x
  git push origin :refs/tags/v3.x.x
- Delete Release: Manually delete the release on GitHub.
- Fix & Release: Push a fix commit using fix: to trigger a new patch release.
Made with ❤️ for the Obsidian community