Realtime Transcription
by garetneda-gif
Real-time speech-to-text powered by SenseVoice-Small. Supports Chinese, English, Japanese, Korean, and Cantonese with auto-translation, AI summarization, and text polishing.
Features
| Feature | Description |
|---|---|
| Local real-time transcription | Fully local inference, no internet required, streaming text display |
| Multi-language recognition | Chinese / English / Japanese / Korean / Cantonese |
| Recognition mode | Limit to Chinese-only, English-only, or mixed mode |
| Real-time profile | Stable mode (more accurate) / Fast mode (lower latency) |
| Auto translation | Automatically translate non-Chinese speech to Chinese via OpenAI-compatible API |
| AI text formalization | On-demand polishing of colloquial transcriptions into formal written text |
| AI auto-summary | Generate summaries after a configurable character threshold (default: 500 chars) |
| Meta-summary | Automatically generate a comprehensive summary after accumulating multiple summaries |
| Export to note | One-click export to Obsidian Markdown note; timestamp / AI-generated / manual title |
| Persistent history | Transcription history survives Obsidian restarts |
| Cross-platform | Fully compatible with macOS / Windows / Linux |
Architecture Overview
Obsidian Plugin (TypeScript)
├── src/
│ ├── main.ts # Plugin entry point, orchestrates all services
│ ├── settings.ts # Settings UI tab
│ ├── types.ts # TypeScript type definitions
│ ├── constants.ts # Shared constants
│ ├── services/
│ │ ├── BackendManager.ts # Python backend process lifecycle (start/stop)
│ │ ├── WebSocketClient.ts # WebSocket communication with backend
│ │ ├── AudioCapture.ts # Microphone capture (Web Audio API)
│ │ ├── TranslationService.ts # LLM API calls for translation
│ │ ├── SummaryService.ts # LLM API calls for summarization & AI naming
│ │ └── FormalizeService.ts # LLM API calls for text formalization
│ ├── views/
│ │ ├── TranscriptionView.ts # Main sidebar panel view
│ │ └── TitleInputModal.ts # Manual note-naming modal
│ └── utils/
│ ├── pluginPaths.ts # Plugin directory resolution
│ └── entrySerializer.ts # History serialization/deserialization
│
└── backend/ # Python backend
├── server.py # WebSocket server (sherpa-onnx inference)
├── download_model.py # Automatic model downloader
└── requirements.txt # Python dependencies
Data Flow:
Microphone → AudioCapture → WebSocket → server.py → sherpa-onnx
↓
partial/final text → aggregation → translate/summarize/formalize → view
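As a rough illustration of the aggregation stage above, the sketch below shows how "partial" results can overwrite a live preview while "final" results are committed and merged into paragraphs. The message shape and the constant name are illustrative assumptions, not the plugin's actual protocol:

```python
MAX_SEGMENT_CHARS = 320  # mirrors the "Max Chars per Segment" setting

class Aggregator:
    def __init__(self):
        self.segments = []   # committed text, one entry per displayed paragraph
        self.pending = ""    # live preview of the sentence being spoken

    def feed(self, message: dict) -> None:
        if message["type"] == "partial":
            # Partial results replace the preview; they are never appended.
            self.pending = message["text"]
        elif message["type"] == "final":
            text = message["text"]
            self.pending = ""
            # Append to the current segment until it grows past the limit,
            # then start a new one (the auto paragraph split).
            if self.segments and len(self.segments[-1]) + len(text) <= MAX_SEGMENT_CHARS:
                self.segments[-1] += text
            else:
                self.segments.append(text)

agg = Aggregator()
agg.feed({"type": "partial", "text": "hel"})
agg.feed({"type": "partial", "text": "hello wor"})
agg.feed({"type": "final", "text": "hello world. "})
agg.feed({"type": "final", "text": "second sentence."})
print(agg.segments)  # → ['hello world. second sentence.']
```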
Step 1: Install the Obsidian Plugin
Option A: From Release (recommended for regular users)
- Go to Releases and download the latest zip file
- Extract it and rename the folder to realtime-transcription
- Copy the folder to your Vault's plugins directory:
  - macOS / Linux: <your-vault>/.obsidian/plugins/realtime-transcription/
  - Windows: <your-vault>\.obsidian\plugins\realtime-transcription\

  Not sure where your Vault is? Open Obsidian → click the vault icon (bottom left) → view the local path.
- Open Obsidian → Settings → Community Plugins → disable Safe Mode → enable Realtime Transcription
Option B: Build from Source (for developers)
git clone https://github.com/garetneda-gif/obsidian-realtime-transcription.git
cd obsidian-realtime-transcription
npm install
npm run build
# Copy manifest.json, main.js, styles.css, backend/ to your Vault plugin directory
If you use remotely-save, sync may overwrite main.js with an older version at the end of a sync. Run this right after the sync finishes:
npm run post-sync-refresh -- --vault "/path/to/your/vault" --vault-name "Your Vault Name"
This command recopies the plugin files and then forces a plugin reload via the Obsidian CLI (plugin:reload).
Step 2: Install Python
Skip this step if you already have Python 3.10–3.12.
Check if Python is installed:
# macOS / Linux
python3 --version
# Windows (Command Prompt or PowerShell)
python --version
If you see Python 3.11.x (or similar), you're good. Otherwise, install Python for your OS:
| OS | How to install |
|---|---|
| macOS | Recommended: brew install python@3.12 (requires Homebrew), or download the installer from python.org |
| Windows | Download installer from python.org. Check "Add Python to PATH" during install. |
| Linux | sudo apt install python3.12 python3-pip (Ubuntu/Debian) |
Recommended versions: 3.10 / 3.11 / 3.12. Versions 3.13 and 3.14 have not been fully tested.
Step 3: Install Python Dependencies
A one-shot setup script is included in the backend/ folder. Run it once — no manual pip commands needed.
macOS / Linux
Double-click backend/setup.command, or run it in a terminal:
cd <your-vault>/.obsidian/plugins/realtime-transcription/backend
bash setup.command
Windows
Double-click backend\setup.bat, or run in Command Prompt:
cd <your-vault>\.obsidian\plugins\realtime-transcription\backend
setup.bat
The script automatically: creates a virtual environment → installs all dependencies → verifies the install → prints the Python path you need for Step 5.
Errors? Make sure Python 3.10–3.12 is installed and your terminal has internet access.
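A minimal sketch of the kind of verification the setup script performs. The module names below are assumptions for illustration; the real requirements.txt is authoritative:

```python
import importlib.util
import sys

def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            "is outside the supported 3.10-3.12 range"
        )
    # Assumed dependency module names, checked without importing them.
    for module in ("sherpa_onnx", "websockets"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing package: {module}")
    return problems

print(check_environment())
```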
Step 4: Prepare Model Files
You need to create a folder to store model files. The plugin will not create it automatically.
Create the model directory:
# macOS / Linux
mkdir -p ~/obsidian-models
# Windows (Command Prompt)
mkdir C:\Users\YourUsername\obsidian-models
Then download the models (choose one method):
Method 1: In-plugin download (recommended)
- Open Obsidian → Settings → Realtime Transcription → Model Settings
- Enter the path of the directory you just created in the Model Directory field:
  - macOS / Linux: /Users/YourUsername/obsidian-models
  - Windows: C:\Users\YourUsername\obsidian-models
- Click Download Model (~240 MB; requires internet and can take a while)
- A notification "Model download complete!" means success
Method 2: Manual download
Download each file individually into the same directory (all are standalone files, no extraction needed):
| File | Download Link | Size |
|---|---|---|
| model.int8.onnx | HuggingFace · Mirror (China) | ~229 MB |
| tokens.txt | HuggingFace · Mirror (China) | <1 MB |
| silero_vad.onnx | GitHub | ~1.8 MB |
Tip: Keep "Use Int8 quantized model" enabled (the default). It reduces the model size from 895 MB to 229 MB with negligible quality loss.
Verify all three files are present:
ls ~/obsidian-models
# Should show: model.int8.onnx tokens.txt silero_vad.onnx
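The same check can be done in a few lines of Python; this mirrors the file names the backend expects (a sketch, not the plugin's own code):

```python
from pathlib import Path

REQUIRED_MODEL_FILES = ("model.int8.onnx", "tokens.txt", "silero_vad.onnx")

def missing_model_files(model_dir: str) -> list[str]:
    """Return the required model files that are absent from model_dir."""
    d = Path(model_dir).expanduser()
    return [name for name in REQUIRED_MODEL_FILES if not (d / name).is_file()]
```

An empty result means all three files are in place and the backend should find them.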
Step 5: Configure the Plugin
Open Obsidian → Settings → Realtime Transcription and configure in order:
Backend Settings
- Python Path: Enter your Python path
  - macOS / Linux: python3 (works in most cases)
  - Windows: python (the plugin sets this as the default)

  If the default doesn't work, find the full Python path:
  - macOS / Linux: run which python3 in a terminal
  - Windows: run where python in Command Prompt and use the first result

  Path examples by OS:

  | OS | Python path example |
  |---|---|
  | macOS (system Python) | python3 or /usr/local/bin/python3 |
  | macOS (virtual env, recommended) | /Users/yourname/.../backend/venv/bin/python |
  | Windows | C:\Users\yourname\AppData\Local\Programs\Python\Python312\python.exe |
  | Linux | python3 or /usr/bin/python3 |
- Backend Port: Default 18888; usually no need to change
- Click Check Environment to verify your setup:
  - Success: notification "Environment check passed: Python + sherpa-onnx available" → proceed
  - Failure: see Environment Check Failures below
Model Settings
- Model Directory: Enter the full path of the directory you created in Step 4
- Recognition Mode: Chinese+English (default) / Chinese only / English only

  Getting Japanese or Korean output when speaking Chinese? Switch to Chinese only.
Translation / Formalization / Summary Settings (all optional)
These features require an AI API. Any OpenAI-compatible API works (OpenAI, DeepSeek, Qwen, local Ollama, etc.).
| Field | What to enter |
|---|---|
| API Endpoint | Full URL, e.g. https://api.deepseek.com/v1/chat/completions |
| API Key | Your service API key (usually starts with sk-) |
| Model Name | e.g. deepseek-chat, qwen-turbo, gpt-4o-mini |
If you don't need translation/summary now, just skip these sections and leave them disabled.
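For reference, all three features speak the standard OpenAI chat-completions wire format. The sketch below builds such a request with only the Python standard library; the endpoint, key, and system prompt are placeholders, and the plugin's own services are implemented in TypeScript:

```python
import json
import urllib.request

def build_chat_request(endpoint: str, api_key: str, model: str,
                       text: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completion request.

    Send it with urllib.request.urlopen(...) and parse the JSON response.
    """
    payload = {
        "model": model,
        "messages": [
            # Illustrative prompt; the plugin's actual prompts may differ.
            {"role": "system", "content": "Translate the user's text into Chinese."},
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Any provider that accepts this shape (DeepSeek, Qwen, a local Ollama server, etc.) should work with the plugin's settings above.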
Advanced Settings (optional, defaults work well)
| Setting | Description | Default |
|---|---|---|
| Realtime Profile | Stable (more accurate) / Fast (lower latency) | Stable |
| Realtime Preview | Show partial results while speaking | On |
| VAD Silence Threshold | Larger = fewer sentence splits | 1.0 s |
| Aggregation Window | Larger = longer paragraphs (more delay) | 4 s |
| Max Chars per Segment | Auto-split when segment exceeds this length | 320 chars |
Usage
- Click the microphone icon in the left Ribbon to open the transcription panel
- Click Start Recording
- Speak — text appears in real time on the right panel
- Click Stop Recording when done
- Optionally click the Formalize button on any entry to polish the text
- Click Export Note to save as an Obsidian Markdown file
Troubleshooting
Environment Check Failures
| Message | Likely Cause | Fix |
|---|---|---|
| "Environment check failed, please run pip install..." | sherpa-onnx not installed | Follow Step 3 to install dependencies, then retry |
| "Environment check failed" but packages are installed | Used a venv but Python Path still points to system Python | Set Python Path to the venv path, e.g. /path/to/backend/venv/bin/python |
| No response, button stays gray | Python Path field is empty | Enter python3 (macOS/Linux) or python (Windows) |
| "No such file or directory" | Python path is wrong | Run which python3 (macOS/Linux) or where python (Windows) to get the correct path |
| Python not found (Windows) | Python not added to PATH | Reinstall Python and check "Add Python to PATH" |
Backend Startup Failures
| Error Message | Cause | Fix |
|---|---|---|
| Model file missing: model.int8.onnx | Download incomplete or wrong directory | Check the model directory path; re-run the Download Model step |
| Model file missing: tokens.txt | Same as above | Same as above |
| Model file missing: silero_vad.onnx | Same as above | Same as above |
| Backend startup timed out (30s) | Slow first model load or a broken Python environment | Close other memory-heavy apps and retry; verify all dependencies are installed |
| [Errno 2] No such file or directory | Python path is wrong | Double-check the Python Path setting |
View detailed error logs:
- macOS: Cmd + Option + I → Console tab
- Windows: Ctrl + Shift + I → Console tab

Copy any red error messages and open a GitHub Issue for help.
Translation Returns 404
Check for a duplicated /v1 in your API URL:
# Wrong
https://api.example.com/v1v1/chat/completions
# Correct
https://api.example.com/v1/chat/completions
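If you want to guard against this programmatically, a one-line check catches the common paste error (a hedged sketch; the plugin does not ship this helper):

```python
def endpoint_has_duplicate_version(url: str) -> bool:
    """Detect a doubled version segment such as '/v1v1/' in an API endpoint URL."""
    return "/v1v1/" in url or url.count("/v1/") > 1

print(endpoint_has_duplicate_version("https://api.example.com/v1v1/chat/completions"))  # → True
print(endpoint_has_duplicate_version("https://api.example.com/v1/chat/completions"))    # → False
```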
Frequent 429 Rate-Limit Errors
- Switch to a model with higher rate limits or upgrade your API plan
- Disable auto-translation and translate manually instead
- Increase the Aggregation Window to reduce API call frequency
Transcription Segments Are Too Choppy
Increase these two values in Advanced Settings:
- VAD Silence Threshold (try 1.5–2.0 s)
- Aggregation Window (try 6–8 s)
Windows Backend Throws NotImplementedError
If you encounter NotImplementedError: add_signal_handler on v1.0.2 or earlier, upgrade to v1.0.3+. This has been fixed.
macOS Security Warning on First Run
macOS may block unnotarized Python scripts with an "unidentified developer" alert:
- Open System Settings → Privacy & Security
- Scroll down to find the blocked item and click Open Anyway
- Return to Obsidian and start recording again
Security Notes
- Do not commit data.json (which contains API keys) to Git or share it with others
- On a new machine, re-enter API keys manually in the plugin settings
Contributing
Pull requests and issues are welcome! Please:
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Commit your changes following Conventional Commits
- Open a Pull Request
License
MIT License — see LICENSE for details.