eSpeak: Adding or Improving a Language

Background

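eSpeak's native voice is English, and I wanted to make its reading sound more Chinese, so I set out to study how eSpeak assembles pronunciation. From a rough read of the English documentation, I found that it builds speech from phonemes, much like pinyin, rather than storing wav data for every Chinese character. The eSpeak data files are small, so they could not possibly hold wav data for that many characters. With a pinyin-style approach, eSpeak handles vowels and consonants quite well, and it also provides the espeakedit tool for editing them. For the details, see the eSpeak documentation section below.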

6. ADDING OR IMPROVING A LANGUAGE

Most of the work doesn't need any programming knowledge, just an understanding of the language, an awareness of its features, patience and attention to detail. Wikipedia is a good source of basic phonetic information, e.g. http://en.wikipedia.org/wiki/Vowel.
In many cases it should be fairly easy to add a rough implementation of a new language, hopefully enough to be intelligible. After that it's a gradual process of improvement.

6.1 Language Code
Generally, the language's international ISO 639-1 code is used to identify the language. It is used in the filenames which contain the language's data. In the examples below the code "fr" is used as an example. Replace this with the code of your language.
If the language does not have a 2-letter ISO 639-1 code, then use the 3-letter ISO 639-3 code. Language codes may differ from country codes.
It is possible to have different variants of a language for different dialects. For example, the sounds of some phonemes are changed, or some of the pronunciation rules differ.

6.2 Language Files
The following files are needed for your language.

- espeak-data/voices/fr. The voice file. This gives the language name and may set some options.
- phsource/ph_french. The phoneme definition file. This contains phoneme definitions for the vowels and consonants which the language uses. Usually it will contain mostly vowels. Most consonants will be inherited from the common phoneme definitions in the master phoneme file, phsource/phonemes. The master phoneme file needs to be edited to call your new ph_french file.
- dictsource/fr_rules. This contains the spelling-to-phoneme translation rules.
- dictsource/fr_list. This contains pronunciations for numbers, letter and symbol names, and words with exceptional pronunciations. It also gives attributes such as "unstressed" and "pause" to some common words.

The fr_rules and fr_list files are compiled to produce the file espeak-data/fr_dict, which eSpeak uses when it is speaking.
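As a quick check that nothing is missing, the file layout described in this section can be written down as a small Python helper. This is only a sketch: the function name `language_files` and its `ph_name` parameter are made up for illustration, and the paths are simply the ones listed in this section, taken relative to the eSpeak source tree.

```python
from pathlib import PurePosixPath


def language_files(code: str, ph_name: str) -> dict:
    """Return the files that section 6.2 lists for one language.

    `code` is the language code (e.g. "fr"); `ph_name` is the suffix of the
    phoneme definition file (e.g. "french" for ph_french). Both parameter
    names are hypothetical, chosen for this sketch.
    """
    return {
        "voice": PurePosixPath("espeak-data/voices") / code,
        "phonemes": PurePosixPath("phsource") / f"ph_{ph_name}",
        "rules": PurePosixPath("dictsource") / f"{code}_rules",
        "list": PurePosixPath("dictsource") / f"{code}_list",
        # Produced by compiling the rules and list files, not edited by hand.
        "compiled_dict": PurePosixPath("espeak-data") / f"{code}_dict",
    }


if __name__ == "__main__":
    for role, path in language_files("fr", "french").items():
        print(f"{role:14} {path}")
```

Swapping "fr" and "french" for your own language code and phoneme file name gives the list of files you would expect to create or generate.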

6.3 Voice File
Each language needs a voice file in espeak-data/voices or espeak-data/voices/test. The filename of the default voice for a language should be the same as the language code (e.g. "fr" for French).
Details of the contents of voice files are given in voices.html.
The simplest voice file would contain just 2 lines to give the language name and language code, e.g.:
name french
language fr
This language code specifies which phoneme table and dictionary to use (i.e. phonemetable fr and espeak-data/fr_dict). If needed, these can be overridden by phonemes and dictionary attributes in the voice file. For example, you may want to start the implementation of a new language by using the phoneme table of an existing language.
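The voice file described above is small enough to generate from a script. The Python sketch below builds the two-line minimal file and optionally appends the `phonemes` and `dictionary` overrides. The helper name `voice_file_text` is hypothetical, and the one-attribute-per-line `phonemes <name>` / `dictionary <name>` layout is my assumption based on the two-line example; voices.html is the authority on the exact syntax.

```python
def voice_file_text(name, language, phonemes=None, dictionary=None):
    """Build the text of a minimal eSpeak voice file.

    `phonemes` and `dictionary` are optional overrides for the phoneme table
    and dictionary that would otherwise be chosen from the language code.
    The attribute-per-line layout is an assumption of this sketch.
    """
    lines = [f"name {name}", f"language {language}"]
    if phonemes is not None:
        lines.append(f"phonemes {phonemes}")
    if dictionary is not None:
        lines.append(f"dictionary {dictionary}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Minimal file, then a variant that borrows an existing phoneme table.
    print(voice_file_text("french", "fr"), end="")
    print("---")
    print(voice_file_text("french", "fr", phonemes="en"), end="")
```

The second call illustrates the case mentioned in the text, where a new language starts out on the phoneme table of an existing one.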

6.4 Phoneme Definition File
You must first decide on the set of phonemes (vowel and consonant sounds) for the language. These should be defined in a phoneme definition file ph_xxxx, where "xxxx" is the name of your language. A reference to this file is then included at the end of the master phoneme file, phsource/phonemes, e.g.:

phonemetable   fr  base             
include   ph_french

This example defines a phoneme table "fr" which inherits the contents of phoneme table "base". Its contents are found in the file ph_french.
The base phoneme table contains definitions of a basic set of consonants, and also some "control" phonemes such as stress marks and pauses. These are defined in phsource/phonemes. The phoneme table for a language will inherit these, or alternatively it may inherit the phoneme table of another language which in turn inherits the base phoneme table.
The phonemes file for the language defines those additional phonemes which are not inherited (generally the vowels and diphthongs, plus any additional consonants that are needed), or phonemes whose definitions differ from the inherited version (e.g. the redefinition of a consonant).
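The inheritance described above behaves like a chain of lookups: a phoneme is taken from the language's own table if it is defined there, and otherwise from the table it inherits. The Python sketch below models only that idea. The class `PhonemeTable`, the phoneme names and the definition strings are placeholders invented for illustration, and this is not how eSpeak stores or compiles phoneme data.

```python
class PhonemeTable:
    """Toy model of a phoneme table that may inherit from a parent table."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own = {}  # phonemes defined (or redefined) in this table

    def define(self, phoneme, definition):
        self.own[phoneme] = definition

    def lookup(self, phoneme):
        """Return (table name, definition), searching this table first."""
        table = self
        while table is not None:
            if phoneme in table.own:
                return table.name, table.own[phoneme]
            table = table.parent
        raise KeyError(phoneme)


# Placeholder data: "base" supplies consonants, "fr" adds a vowel and
# redefines one consonant.
base = PhonemeTable("base")
base.define("p", "base consonant p")
base.define("t", "base consonant t")

fr = PhonemeTable("fr", parent=base)
fr.define("a", "french vowel a")
fr.define("t", "french variant of t")

if __name__ == "__main__":
    for ph in ("p", "t", "a"):
        print(ph, fr.lookup(ph))
```

Here "p" resolves to the base table, while "t" and "a" resolve to the language's own table, matching the two cases in the paragraph: additional phonemes and redefined ones.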
Details of phonemes files are given in phontab.html.
The "Compile phoneme data" function of the espeakedit program compiles the phonemes files of all languages to produce the files espeak-data/phontab, phonindex, and phondata which are used by eSpeak.
For many languages, the consonant phonemes which are already available in eSpeak, together with the available vowel files which can be used to define vowel phonemes, will be sufficient, at least for an initial implementation.

6.5 Dictionary Files
Once the language's phonemes have been defined, pronunciation dictionary data can be produced in order to translate the language's source text into phonemes. This consists of two source files: fr_rules (the spelling-to-phoneme rules) and fr_list (an exceptions list, and attributes of certain words). The corresponding compiled data file is espeak-data/fr_dict, which is produced from the fr_rules and fr_list sources by the command:
espeak --compile=fr
Or by using the espeakedit program.
Details of the contents of the dictionary files are given in dictionary.html.
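When iterating on the rules and list files it can be convenient to script the compile step. The Python sketch below wraps the `espeak --compile=fr` command shown above. The function names are hypothetical, and the optional `cwd` argument is only a convenience for the caller: this section does not say which directory the command has to be run from, so check that against your own installation.

```python
import shlex
import shutil
import subprocess


def compile_command(code):
    """Return the argument list for compiling one language's dictionary."""
    return ["espeak", f"--compile={code}"]


def compile_dictionary(code, cwd=None):
    """Run the compile command; raises if espeak is missing or fails."""
    if shutil.which("espeak") is None:
        raise FileNotFoundError("espeak executable not found on PATH")
    return subprocess.run(compile_command(code), cwd=cwd, check=True)


if __name__ == "__main__":
    # Only print the command here; call compile_dictionary("fr") to run it.
    print(shlex.join(compile_command("fr")))
```

Keeping the command construction separate from the call that runs it makes it easy to log or inspect the command before anything is executed.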
The fr_list file contains:
- Pronunciations which are exceptions to the rules in fr_rules (e.g. foreign names).
- Pronunciation of letter names, symbol names, and punctuation names.
- Pronunciation of numbers.
- Attributes for words. For example, common function words which should not be stressed, or conjunctions which should be preceded by a pause.

6.6 Program Code
The behaviour of the eSpeak program is controlled by various options such as:

- Default rules for which syllable of a word has the main stress.
- Relative lengths and amplitude of vowels in stressed and unstressed syllables.
- Which intonation tunes to use.
- Rules for speaking numbers.

The function SetTranslator() at the start of the source code file tr_languages.cpp recognizes the language code and sets the appropriate options. For a new language, you would add its language code and the required options in SetTranslator(). However, this may not be necessary during testing because most of the options can also be set in the voice file in espeak-data/voices (see Voice files).

6.7 Improving a Language
Listen carefully to the eSpeak voice. Try to identify what sounds wrong and what needs to be improved.

- Make the spelling-to-phoneme translation rules more accurate, including the position of stressed syllables within words. Some languages are easier than others. I expect most are easier than English.
- Improve the sounds of the phonemes. It may be that a phoneme should sound different depending on adjacent sounds, or whether it's at the start or the end of a word, between vowels, in a stressed or unstressed syllable, etc. This may consist of making small adjustments to vowel and diphthong quality or length, or adjusting the strength of consonants. Phoneme definitions can include conditional statements which can be used to change the sound of a phoneme depending on its environment. Bigger changes may involve recording new or replacement consonant sounds, or may even need program code to implement new types of sounds.
- Some common words should be added to the dictionary (the fr_list file for the language) with an "unstressed" attribute $u or $u+ (e.g. in English, words such as "the", "is", "had", "my", "she", "of", "in", "some"), or should be preceded by a short pause (such as "and", "but", "which"), or have other attributes, in order to make the speech flow better.
- Improve the rhythm of the speech by adjusting the relative lengths of vowels in different contexts, e.g. stressed/unstressed syllable, or depending on the following phonemes. This is important for making the speech sound good for the language.
- Make new intonation "tunes" for statements or questions (see Intonation).

If you are interested in working on a language, please contact me so that I can set up the initial data and discuss the features of the language.
For most of the eSpeak voices, I do not speak or understand the language, and I do not know how it should sound. I can only make improvements as a result of feedback from speakers of that language. If you want to help to improve a language, listen carefully and try to identify individual errors, either in the spelling-to-phoneme translation, the position of stressed syllables within words, the sound of phonemes, or problems with rhythm and vowel lengths.
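One way to track down individual errors in the spelling-to-phoneme translation is to print the phonemes eSpeak produces for a list of test words and read them alongside the audio. The Python sketch below does that through the espeak command line. The flags `-v` (voice), `-x` (write phoneme mnemonics to stdout) and `-q` (no audio) come from the espeak command-line help rather than from this section, and the function names and sample words are placeholders.

```python
import shutil
import subprocess


def phoneme_command(voice, text):
    """Argument list that prints phoneme mnemonics for `text` without audio."""
    return ["espeak", "-q", "-x", "-v", voice, text]


def phonemes_for(voice, text):
    """Return espeak's phoneme mnemonics for `text` as a stripped string."""
    if shutil.which("espeak") is None:
        raise FileNotFoundError("espeak executable not found on PATH")
    result = subprocess.run(
        phoneme_command(voice, text),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    words = ["bonjour", "merci"]  # sample test words; use your own list
    if shutil.which("espeak") is None:
        print("espeak is not installed; nothing to check")
    else:
        for word in words:
            print(f"{word:12} {phonemes_for('fr', word)}")
```

Re-running the same word list after each change to the rules or list files makes it easier to see which pronunciations changed and to report specific errors.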
