Advanced
Advanced > Advanced (current)
Table of Contents
- Command Line Args
- Args
- GUI Args
- Rate Limiting Configuration Guide
- RPM (Requests Per Minute) Rate Limiting
- Concurrent Connection Limiting
- Best Practices
- Partial translation
- Specify source and target languages
- Translate wih exceptions
- Custom prompt
- Custom configuration
- Skip clean
- Translation cache
- Deployment as a public services
- Authentication and welcome page
- Glossary Support
Command Line Args
Execute the translation command in the command line to generate the translated document example-mono.pdf
and the bilingual document example-dual.pdf
in the current working directory. Use Google as the default translation service. More support translation services can find HERE.
In the following table, we list all advanced options for reference:
Args
Option | Function | Example |
---|---|---|
files | Local PDF file path | pdf2zh ~/local.pdf |
links | Online files | pdf2zh http://arxiv.org/paper.pdf |
--output | Output directory for files | pdf2zh example.pdf --output /outputpath |
--<Services> | Use specific service for translation | pdf2zh example.pdf --openai pdf2zh example.pdf --deepseek |
--help , -h | Show help message and exit | pdf2zh -h |
--config-file | Path to the configuration file | pdf2zh --config-file /path/to/config/config.toml |
--report-interval | Progress report interval in seconds | pdf2zh example.pdf --report-interval 5 |
--debug | Use debug logging level | pdf2zh example.pdf --debug |
--gui | Interact with GUI | pdf2zh --gui |
--warmup | Only download and verify required assets then exit | pdf2zh example.pdf --warmup |
--generate-offline-assets | Generate offline assets package in the specified directory | pdf2zh example.pdf --generate-offline-assets /path |
--restore-offline-assets | Restore offline assets package from the specified directory | pdf2zh example.pdf --restore-offline-assets /path |
--version | Show version then exit | pdf2zh --version |
--pages | Partial document translation | pdf2zh example.pdf --pages 1,2,1-,-3,3-5 |
--lang-in | The code of source language | pdf2zh example.pdf --lang-in en |
--lang-out | The code of target language | pdf2zh example.pdf --lang-out zh-CN |
--min-text-length | Minimum text length to translate | pdf2zh example.pdf --min-text-length 5 |
--rpc-doclayout | RPC service host address for document layout analysis | |
--qps | QPS limit for translation service | pdf2zh example.pdf --qps 200 |
--ignore-cache | Ignore translation cache | pdf2zh example.pdf --ignore-cache |
--custom-system-prompt | Custom system prompt for translation. Used for /no_think in Qwen 3 | pdf2zh example.pdf --custom-system-prompt "/no_think You are a professional, authentic machine translation engine" |
--pool-max-worker | Maximum number of workers for translation pool. If not set, will use qps as the number of workers | pdf2zh example.pdf --pool-max-worker 100 |
--no-auto-extract-glossary | Disable auto extract glossary | pdf2zh example.pdf --no-auto-extract-glossary |
--primary-font-family | Override primary font family for translated text. Choices: 'serif' for serif fonts, 'sans-serif' for sans-serif fonts, 'script' for script/italic fonts. If not specified, uses automatic font selection based on original text properties. | pdf2zh example.pdf --primary-font-family serif |
--no-dual | Do not output bilingual PDF files | pdf2zh example.pdf --no-dual |
--no-mono | Do not output monolingual PDF files | pdf2zh example.pdf --no-mono |
--formular-font-pattern | Font pattern to identify formula text | pdf2zh example.pdf --formular-font-pattern "(MS.*)" |
--formular-char-pattern | Character pattern to identify formula text | pdf2zh example.pdf --formular-char-pattern "(MS.*)" |
--split-short-line | Force split short line into different paragraphs | pdf2zh example.pdf --split-short-line |
--short-line-split-factor | Split threshold factor for short lines | |
--skip-clean | Skip PDF cleaning step | pdf2zh example.pdf --skip-clean |
--dual-translate-first | 在双 PDF 模式下优先放置翻译页 | pdf2zh example.pdf --dual-translate-first |
--disable-rich-text-translate | Disable rich text translation | pdf2zh example.pdf --disable-rich-text-translate |
--enhance-compatibility | Enable all compatibility enhancement options | pdf2zh example.pdf --enhance-compatibility |
--use-alternating-pages-dual | Use alternating pages mode for dual PDF | pdf2zh example.pdf --use-alternating-pages-dual |
--watermark-output-mode | Watermark output mode for PDF files | pdf2zh example.pdf --watermark-output-mode "NoWaterMark" |
--max-pages-per-part | Maximum pages per part for split translation | pdf2zh example.pdf --max-pages-per-part 1 |
--translate-table-text | Translate table text (experimental) | pdf2zh example.pdf --translate-table-text |
--skip-scanned-detection | Skip scanned detection | pdf2zh example.pdf --skip-scanned-detection |
--ocr-workaround | Force translated text to be black and add white background | pdf2zh example.pdf --ocr-workaround |
--auto-enable-ocr-workaround | Enable automatic OCR workaround. If a document is detected as heavily scanned, this will attempt to enable OCR processing and skip further scan detection. See documentation for details. (default: False) | pdf2zh example.pdf --auto-enable-ocr-workaround True |
--only-include-translated-page | Only include translated pages in the output PDF. Effective only when --pages is used. | pdf2zh example.pdf --pages 1-5 --only-include-translated-page |
--glossaries | Custom glossary for translation. | pdf2zh example.pdf --glossaries "glossary1.csv,glossary2.csv,glossary3.csv" |
--save-auto-extracted-glossary | save automatically extracted glossary. | pdf2zh example.pdf --save-auto-extracted-glossary |
--no-merge-alternating-line-numbers | Disable merging of alternating line numbers and text paragraphs in documents with line numbers | pdf2zh example.pdf --no-merge-alternating-line-numbers |
--no-remove-non-formula-lines | Disable removal of non-formula lines within paragraph areas | pdf2zh example.pdf --no-remove-non-formula-lines |
--non-formula-line-iou-threshold | Set IoU threshold for identifying non-formula lines (0.0-1.0) | pdf2zh example.pdf --non-formula-line-iou-threshold 0.85 |
--figure-table-protection-threshold | Set protection threshold for figures and tables (0.0-1.0). Lines within figures/tables will not be processed | pdf2zh example.pdf --figure-table-protection-threshold 0.95 |
--skip-formula-offset-calculation | Skip formula offset calculation during processing | pdf2zh example.pdf --skip-formula-offset-calculation |
GUI Args
Option | Function | Example |
---|---|---|
--share | Enable sharing mode | pdf2zh --gui --share |
--auth-file | Path to the authentication file | pdf2zh --gui --auth-file /path |
--welcome-page | Path to the welcome html file | pdf2zh --gui --welcome-page /path |
--enabled-services | Enabled translation services | pdf2zh --gui --enabled-services "Bing,OpenAI" |
--disable-gui-sensitive-input | Disable GUI sensitive input | pdf2zh --gui --disable-gui-sensitive-input |
--disable-config-auto-save | Disable automatic configuration saving | pdf2zh --gui --disable-config-auto-save |
--server-port | WebUI Port | pdf2zh --gui --server-port 7860 |
Rate Limiting Configuration Guide
When using translation services, proper rate limiting configuration is crucial to avoid API errors and optimize performance. This guide explains how to configure --qps
and --pool-max-worker
parameters based on different upstream service limitations.
Tip
It is recommended that the pool_size does not exceed 1000. If the pool_size calculated by the following method exceeds 1000, please use 1000.
RPM (Requests Per Minute) Rate Limiting
When the upstream service has RPM limitations, use the following calculation:
Calculation Formula: - qps = floor(rpm / 60)
- pool_size = qps * 10
Note
The factor of 10 is an empirical coefficient that generally works well for most scenarios.
Example: If your translation service has a limit of 600 RPM: - qps = floor(600 / 60) = 10
- pool_size = 10 * 10 = 100
Concurrent Connection Limiting
When the upstream service has concurrent connection limitations (like GLM official service), use this approach:
Calculation Formula: - pool_size = max(floor(0.9 * official_concurrent_limit), official_concurrent_limit - 20)
- qps = pool_size
Example: If your translation service allows 50 concurrent connections: - pool_size = max(floor(0.9 * 50), 50 - 20) = max(45, 30) = 45
- qps = 45
Best Practices
Tip
- Always start with conservative values and gradually increase if needed
- Monitor your service's response times and error rates
- Different services may require different optimization strategies
- Consider your specific use case and document size when setting these parameters
Partial translation
Use the --pages
parameter to translate a portion of a document.
- If the page numbers are consecutive, you can write it like this:
Tip
25-
includes all pages after page 25. If your document has 100 pages, this is equivalent to 25-100
.
Similarly, -25
includes all pages before page 25, which is equivalent to 1-25
.
- If the pages are not consecutive, you can use a comma
,
to separate them.
For example, if you want to translate the first and third pages, you can use the following command:
- If the pages include both consecutive and non-consecutive ranges, you can also connect them with a comma, like this:
This command will translate the first page, the third page, pages 10-20, and all pages from 25 to the end.
Specify source and target languages
See Google Languages Codes, DeepL Languages Codes
Translate wih exceptions
Use regex to specify formula fonts and characters that need to be preserved:
pdf2zh_next example.pdf --formular-font-pattern "(CM[^RT].*|MS.*|.*Ital)" --formular-char-pattern "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
Preserve Latex
, Mono
, Code
, Italic
, Symbol
and Math
fonts by default:
pdf2zh_next example.pdf --formular-font-pattern "(CM[^R]|MS.M|XY|MT|BL|RM|EU|LA|RS|LINE|LCIRCLE|TeX-|rsfs|txsy|wasy|stmary|.*Mono|.*Code|.*Ital|.*Sym|.*Math)"
Custom prompt
Custom system prompt for translation. It is mainly used to add the '/no_think' instruction of Qwen 3 in the prompt.
pdf2zh_next example.pdf --custom-system-prompt "/no_think You are a professional and reliable machine translation engine responsible for translating the input text into zh_CN.When translating, strictly follow the instructions below to ensure translation quality and preserve all formatting, tags, and placeholders:"
Custom configuration
There are multiple ways to modify and import the configuration file.
Note
Configuration File Hierarchy
When modifying the same parameter using different methods, the software will apply changes according to the priority order below.
Higher-ranked modifications will override lower-ranked ones.
cli/gui > env > user config file > default config file
- Modifying Configuration via Command Line Arguments
For most cases, you can directly pass your desired settings through command line arguments. Please refer to Command Line Args for more information.
For example, if you want to enable a GUI window, you can use the following command:
- Modifying Configuration via Environment Variables
You can replace the --
in command line arguments with PDF2ZH_
, connect parameters using =
, and replace -
with _
as environment variables.
For example, if you want to enable a GUI window, you can use the following command:
- User-Specified Configuration File
You can specify a configuration file using the command line argument below:
If you are unsure about the config file format, please refer to the default configuration file described below.
- Default Configuration File
The default configuration file is located at ~/.config/pdf2zh
. Please do not modify the configuration files in the default
directory. It is strongly recommended to refer to this configuration file's content and use User-Specified Configuration File to implement your own configuration file.
Tip
- By default, pdf2zh 2.0 automatically saves the current configuration to
~/.config/pdf2zh/config.v3.toml
each time you click the translate button in the GUI. This configuration file will be loaded by default on the next startup. - The configuration files in the
default
directory are automatically generated by the program. You can copy them for modification, but please do not modify them directly. - Configuration files may include version numbers such as "v2", "v3", etc. These are configuration file version numbers, not the version number of pdf2zh itself.
Skip clean
When this parameter is set to True, the PDF cleaning step will be skipped, which can improve compatibility and avoid some font processing issues.
Usage:
Or using environment variables:
Tip
When --enhance-compatibility
is enabled, Skip clean is automatically enabled.
Translation cache
PDFMathTranslate caches translated texts to increase speed and avoid unnecessary API calls for same contents. You can use --ignore-cache
option to ignore translation cache and force retranslation.
Deployment as a public services
When deploying a pdf2zh GUI on public services, you should modify the configuration file as described below.
Warning
This project has not been professionally audited for security, and may contain security vulnerabilities. Please evaluate the risks and take necessary security measures before deploying on public networks.
Tip
- When deploying publicly, both disable_gui_sensitive_input and disable_config_auto_save should be enabled.
- Separate different available services with English commas , .
A usable configuration is as follows:
[basic]
gui = true
[gui_settings]
enabled_services = "Bing,OpenAI"
disable_gui_sensitive_input = true
disable_config_auto_save = true
Authentication and welcome page
When using Authentication and welcome page to specify which user to use Web UI and custom the login page:
example auth.txt Each line contains two elements, username, and password, separated by a comma.
example welcome.html
<!DOCTYPE html>
<html>
<head>
<title>Simple HTML</title>
</head>
<body>
<h1>Hello, World!</h1>
<p>Welcome to my simple HTML page.</p>
</body>
</html>
Note
welcome page will work if only authentication file is not blank. If authentication file is blank, there will be no authentication. :)
A usable configuration is as follows:
[basic]
gui = true
[gui_settings]
auth_file = "/path/to/auth/file"
welcome_page = "/path/to/welcome/html/file"
Glossary Support
PDFMathTranslate supports the glossary table. The glossary tables file should be csv
file. There are three columns in file. Here is a demo glossary file:
source | target | tgt_lng |
---|---|---|
AutoML | 自动 ML | zh-CN |
a,a | a | zh-CN |
" | " | zh-CN |
For CLI user: You can use multiple files for glossary. And different files should be split by ,
.
For WebUI user:
You can upload your own glossary file now. After you uploaded the file, you can check them by click their name and the content shows below.