Realtime Transcription
by garetneda-gif
Real-time speech-to-text powered by SenseVoice-Small. Supports Chinese, English, Japanese, Korean, and Cantonese with auto-translation, AI summarization, and text polishing.
Features
| Feature | Description |
|---|---|
| Local real-time transcription | Fully local inference, no internet required, streaming text display |
| Multi-language recognition | Chinese / English / Japanese / Korean / Cantonese |
| Recognition mode | Limit to Chinese-only, English-only, or mixed mode |
| Real-time profile | Stable mode (more accurate) / Fast mode (lower latency) |
| Auto translation | Automatically translate non-Chinese speech to Chinese via OpenAI-compatible API |
| AI text formalization | On-demand polishing of colloquial transcriptions into formal written text |
| AI auto-summary | Generate summaries after a configurable character threshold (default: 500 chars) |
| Meta-summary | Automatically generate a comprehensive summary after accumulating multiple summaries |
| Export to note | One-click export to Obsidian Markdown note; timestamp / AI-generated / manual title |
| Persistent history | Transcription history survives Obsidian restarts |
| Cross-platform | Fully compatible with macOS / Windows / Linux |
Architecture Overview
Obsidian Plugin (TypeScript)
├── src/
│ ├── main.ts # Plugin entry point, orchestrates all services
│ ├── settings.ts # Settings UI tab
│ ├── types.ts # TypeScript type definitions
│ ├── constants.ts # Shared constants
│ ├── services/
│ │ ├── BackendManager.ts # Python backend process lifecycle (start/stop)
│ │ ├── WebSocketClient.ts # WebSocket communication with backend
│ │ ├── AudioCapture.ts # Microphone capture (Web Audio API)
│ │ ├── TranslationService.ts # LLM API calls for translation
│ │ ├── SummaryService.ts # LLM API calls for summarization & AI naming
│ │ └── FormalizeService.ts # LLM API calls for text formalization
│ ├── views/
│ │ ├── TranscriptionView.ts # Main sidebar panel view
│ │ └── TitleInputModal.ts # Manual note-naming modal
│ └── utils/
│ ├── pluginPaths.ts # Plugin directory resolution
│ └── entrySerializer.ts # History serialization/deserialization
│
└── backend/ # Python backend
├── server.py # WebSocket server (sherpa-onnx inference)
├── download_model.py # Automatic model downloader
└── requirements.txt # Python dependencies
Data Flow:
Microphone → AudioCapture → WebSocket → server.py → sherpa-onnx
↓
partial/final text → aggregation → translate/summarize/formalize → view
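As a rough illustration of the aggregation stage above, the sketch below shows how "partial" results can overwrite a live preview while "final" results are committed and merged into paragraphs. The message shape and the constant name are illustrative assumptions, not the plugin's actual protocol:

```python
MAX_SEGMENT_CHARS = 320  # mirrors the "Max Chars per Segment" setting

class Aggregator:
    def __init__(self):
        self.segments = []   # committed text, one entry per displayed paragraph
        self.pending = ""    # live preview of the sentence being spoken

    def feed(self, message: dict) -> None:
        if message["type"] == "partial":
            # Partial results replace the preview; they are never appended.
            self.pending = message["text"]
        elif message["type"] == "final":
            text = message["text"]
            self.pending = ""
            # Append to the current segment until it grows past the limit,
            # then start a new one (the auto paragraph split).
            if self.segments and len(self.segments[-1]) + len(text) <= MAX_SEGMENT_CHARS:
                self.segments[-1] += text
            else:
                self.segments.append(text)

agg = Aggregator()
agg.feed({"type": "partial", "text": "hel"})
agg.feed({"type": "partial", "text": "hello wor"})
agg.feed({"type": "final", "text": "hello world. "})
agg.feed({"type": "final", "text": "second sentence."})
print(agg.segments)  # → ['hello world. second sentence.']
```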
Step 1: Install the Obsidian Plugin
Option A: From Release (recommended for regular users)
- Go to Releases and download the latest zip file
- Extract it and rename the folder to realtime-transcription
- Copy the folder to your Vault's plugins directory:
  - macOS / Linux: <your-vault>/.obsidian/plugins/realtime-transcription/
  - Windows: <your-vault>\.obsidian\plugins\realtime-transcription\

  Not sure where your Vault is? Open Obsidian → click the vault icon (bottom left) → view the local path.
- Open Obsidian → Settings → Community Plugins → disable Safe Mode → enable Realtime Transcription
Option B: Build from Source (for developers)
git clone https://github.com/garetneda-gif/obsidian-realtime-transcription.git
cd obsidian-realtime-transcription
npm install
npm run build
# Copy manifest.json, main.js, styles.css, backend/ to your Vault plugin directory
If you use remotely-save, sync may overwrite main.js with an older version at the end of a sync. Run this right after the sync finishes:
npm run post-sync-refresh -- --vault "/path/to/your/vault" --vault-name "Your Vault Name"
This command recopies the plugin files and then forces a plugin reload via the Obsidian CLI (plugin:reload).
Step 2: Install Python
Skip this step if you already have Python 3.10–3.12.
Check if Python is installed:
# macOS / Linux
python3 --version
# Windows (Command Prompt or PowerShell)
python --version
If you see Python 3.11.x (or similar), you're good. Otherwise, install Python for your OS:
| OS | How to install |
|---|---|
| macOS | Recommended: brew install python@3.12 (requires Homebrew), or download the installer from python.org |
| Windows | Download installer from python.org. Check "Add Python to PATH" during install. |
| Linux | sudo apt install python3.12 python3-pip (Ubuntu/Debian) |
Recommended versions: 3.10 / 3.11 / 3.12. Versions 3.13 and 3.14 have not been fully tested.
Step 3: Install Python Dependencies
A one-shot setup script is included in the backend/ folder. Run it once — no manual pip commands needed.
macOS / Linux
Double-click backend/setup.command, or run it in a terminal:
cd <your-vault>/.obsidian/plugins/realtime-transcription/backend
bash setup.command
Windows
Double-click backend\setup.bat, or run in Command Prompt:
cd <your-vault>\.obsidian\plugins\realtime-transcription\backend
setup.bat
The script automatically: creates a virtual environment → installs all dependencies → verifies the install → prints the Python path you need for Step 5.
Errors? Make sure Python 3.10–3.12 is installed and your terminal has internet access.
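A minimal sketch of the kind of verification the setup script performs. The module names below are assumptions for illustration; the real requirements.txt is authoritative:

```python
import importlib.util
import sys

def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not ((3, 10) <= sys.version_info[:2] <= (3, 12)):
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            "is outside the supported 3.10-3.12 range"
        )
    # Assumed dependency module names, checked without importing them.
    for module in ("sherpa_onnx", "websockets"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"missing package: {module}")
    return problems

print(check_environment())
```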
Step 4: Prepare Model Files
You need to create a folder to store model files. The plugin will not create it automatically.
Create the model directory:
# macOS / Linux
mkdir -p ~/obsidian-models
# Windows (Command Prompt)
mkdir C:\Users\YourUsername\obsidian-models
Then download the models (choose one method):
Method 1: In-plugin download (recommended)
- Open Obsidian → Settings → Realtime Transcription → Model Settings
- Enter the path of the directory you just created in the Model Directory field:
  - macOS / Linux: /Users/YourUsername/obsidian-models
  - Windows: C:\Users\YourUsername\obsidian-models
- Click Download Model (~240 MB; requires internet and can take a while)
- A notification "Model download complete!" means success
Method 2: Manual download
Download each file individually into the same directory (all are standalone files, no extraction needed):
| File | Download Link | Size |
|---|---|---|
| model.int8.onnx | HuggingFace · Mirror (China) | ~229 MB |
| tokens.txt | HuggingFace · Mirror (China) | <1 MB |
| silero_vad.onnx | GitHub | ~1.8 MB |
Tip: Keep "Use Int8 quantized model" enabled (the default). It reduces the model size from 895 MB to 229 MB with negligible quality loss.
Verify all three files are present:
ls ~/obsidian-models
# Should show: model.int8.onnx tokens.txt silero_vad.onnx
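The same check can be done in a few lines of Python; this mirrors the file names the backend expects (a sketch, not the plugin's own code):

```python
from pathlib import Path

REQUIRED_MODEL_FILES = ("model.int8.onnx", "tokens.txt", "silero_vad.onnx")

def missing_model_files(model_dir: str) -> list[str]:
    """Return the required model files that are absent from model_dir."""
    d = Path(model_dir).expanduser()
    return [name for name in REQUIRED_MODEL_FILES if not (d / name).is_file()]
```

An empty result means all three files are in place and the backend should find them.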
Step 5: Configure the Plugin
Open Obsidian → Settings → Realtime Transcription and configure in order:
Backend Settings
- Python Path: Enter your Python path
  - macOS / Linux: python3 (works in most cases)
  - Windows: python (the plugin sets this as the default)

  If the default doesn't work, find the full Python path:
  - macOS / Linux: run which python3 in a terminal
  - Windows: run where python in Command Prompt and use the first result

  Path examples by OS:

  | OS | Python path example |
  |---|---|
  | macOS (system Python) | python3 or /usr/local/bin/python3 |
  | macOS (virtual env, recommended) | /Users/yourname/.../backend/venv/bin/python |
  | Windows | C:\Users\yourname\AppData\Local\Programs\Python\Python312\python.exe |
  | Linux | python3 or /usr/bin/python3 |
- Backend Port: Default 18888; usually no need to change
- Click Check Environment to verify your setup:
  - Success: notification "Environment check passed: Python + sherpa-onnx available" → proceed
  - Failure: see Environment Check Failures below
Model Settings
- Model Directory: Enter the full path of the directory you created in Step 4
- Recognition Mode: Chinese+English (default) / Chinese only / English only

  Getting Japanese or Korean output when speaking Chinese? Switch to Chinese only.
Translation / Formalization / Summary Settings (all optional)
These features require an AI API. Any OpenAI-compatible API works (OpenAI, DeepSeek, Qwen, local Ollama, etc.).
| Field | What to enter |
|---|---|
| API Endpoint | Full URL, e.g. https://api.deepseek.com/v1/chat/completions |
| API Key | Your service API key (usually starts with sk-) |
| Model Name | e.g. deepseek-chat, qwen-turbo, gpt-4o-mini |
If you don't need translation/summary now, just skip these sections and leave them disabled.
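For reference, all three features speak the standard OpenAI chat-completions wire format. The sketch below builds such a request with only the Python standard library; the endpoint, key, and system prompt are placeholders, and the plugin's own services are implemented in TypeScript:

```python
import json
import urllib.request

def build_chat_request(endpoint: str, api_key: str, model: str,
                       text: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completion request.

    Send it with urllib.request.urlopen(...) and parse the JSON response.
    """
    payload = {
        "model": model,
        "messages": [
            # Illustrative prompt; the plugin's actual prompts may differ.
            {"role": "system", "content": "Translate the user's text into Chinese."},
            {"role": "user", "content": text},
        ],
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Any provider that accepts this shape (DeepSeek, Qwen, a local Ollama server, etc.) should work with the plugin's settings above.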
Advanced Settings (optional, defaults work well)
| Setting | Description | Default |
|---|---|---|
| Realtime Profile | Stable (more accurate) / Fast (lower latency) | Stable |
| Realtime Preview | Show partial results while speaking | On |
| VAD Silence Threshold | Larger = fewer sentence splits | 1.0 s |
| Aggregation Window | Larger = longer paragraphs (more delay) | 4 s |
| Max Chars per Segment | Auto-split when segment exceeds this length | 320 chars |
Usage
- Click the microphone icon in the left Ribbon to open the transcription panel
- Click Start Recording
- Speak — text appears in real time on the right panel
- Click Stop Recording when done
- Optionally click the Formalize button on any entry to polish the text
- Click Export Note to save as an Obsidian Markdown file
Troubleshooting
Environment Check Failures
| Message | Likely Cause | Fix |
|---|---|---|
| "Environment check failed, please run pip install..." | sherpa-onnx not installed | Follow Step 3 to install dependencies, then retry |
| "Environment check failed" but packages are installed | Used a venv but Python Path still points to system Python | Set Python Path to the venv path, e.g. /path/to/backend/venv/bin/python |
| No response, button stays gray | Python Path field is empty | Enter python3 (macOS/Linux) or python (Windows) |
| "No such file or directory" | Python path is wrong | Run which python3 (macOS/Linux) or where python (Windows) to get the correct path |
| Python not found (Windows) | Python not added to PATH | Reinstall Python and check "Add Python to PATH" |
Backend Startup Failures
| Error Message | Cause | Fix |
|---|---|---|
| Model file missing: model.int8.onnx | Download incomplete or wrong directory | Check the model directory path; re-run the Download Model step |
| Model file missing: tokens.txt | Same as above | Same as above |
| Model file missing: silero_vad.onnx | Same as above | Same as above |
| Backend startup timed out (30s) | Slow first model load or a broken Python environment | Close other memory-heavy apps and retry; verify all dependencies are installed |
| [Errno 2] No such file or directory | Python path is wrong | Double-check the Python Path setting |
View detailed error logs:
- macOS: Cmd + Option + I → Console tab
- Windows: Ctrl + Shift + I → Console tab

Copy any red error messages and open a GitHub Issue for help.
Translation Returns 404
Check for a duplicated /v1 in your API URL:
# Wrong
https://api.example.com/v1v1/chat/completions
# Correct
https://api.example.com/v1/chat/completions
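If you want to guard against this programmatically, a one-line check catches the common paste error (a hedged sketch; the plugin does not ship this helper):

```python
def endpoint_has_duplicate_version(url: str) -> bool:
    """Detect a doubled version segment such as '/v1v1/' in an API endpoint URL."""
    return "/v1v1/" in url or url.count("/v1/") > 1

print(endpoint_has_duplicate_version("https://api.example.com/v1v1/chat/completions"))  # → True
print(endpoint_has_duplicate_version("https://api.example.com/v1/chat/completions"))    # → False
```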
Frequent 429 Rate-Limit Errors
- Switch to a model with higher rate limits or upgrade your API plan
- Disable auto-translation and translate manually instead
- Increase the Aggregation Window to reduce API call frequency
Transcription Segments Are Too Choppy
Increase these two values in Advanced Settings:
- VAD Silence Threshold (try 1.5–2.0 s)
- Aggregation Window (try 6–8 s)
Windows Backend Throws NotImplementedError
If you encounter NotImplementedError: add_signal_handler on v1.0.2 or earlier, upgrade to v1.0.3+. This has been fixed.
macOS Security Warning on First Run
macOS may block unnotarized Python scripts with an "unidentified developer" alert:
- Open System Settings → Privacy & Security
- Scroll down to find the blocked item and click Open Anyway
- Return to Obsidian and start recording again
Security Notes
- Do not commit data.json (which contains API keys) to Git or share it with others
- On a new machine, re-enter API keys manually in the plugin settings
Contributing
Pull requests and issues are welcome! Please:
- Fork the repository
- Create a feature branch: git checkout -b feature/my-feature
- Commit your changes following Conventional Commits
- Open a Pull Request
License
MIT License — see LICENSE for details.