
Adapting the default acoustic model

Caution!
This tutorial uses the 5 pre-alpha release. It is not going to work for older versions.

This page describes how to do some simple acoustic model adaptation to improve speech recognition in your configuration. Please note that the adaptation doesn't necessarily adapt to a particular speaker: it just improves the fit between the adaptation data and the model. For example, you can adapt to your own voice to make dictation work well, but you can also adapt to your particular recording environment, your audio transmission channel, or your accent or the accent of your users. You can use a model trained with clean broadcast data and telephone data to produce a telephone acoustic model by doing adaptation. Cross-language adaptation also makes sense; for example, you can adapt an English model to the sounds of another language by creating a phoneset map and building a dictionary for the other language with the English phoneset.
The adaptation process takes transcribed data and improves the model you already have. It's more robust than training from scratch and can lead to good results even if your adaptation data is small. For example, 5 minutes of speech is enough to significantly improve dictation accuracy by adapting to a particular speaker.

The methods of adaptation are a bit different between PocketSphinx and Sphinx4 due to the different types of acoustic models used. For more technical information, read the article about Acoustic Model Types.

Creating an adaptation corpus

The first thing you need to do is to create a corpus of adaptation data. The corpus will consist of

  • a list of sentences
  • a dictionary describing the pronunciation of all the words in that list of sentences
  • a recording of you speaking each of those sentences

Required files


The actual set of sentences you use is somewhat arbitrary, but ideally it should have good coverage of the most frequently used words or phonemes in the type of text you want to recognize. For example, if you want to recognize isolated commands, you need to record them. If you want to recognize dictation, you need to record full sentences. For simple voice adaptation we have had good results simply by using sentences from the CMU ARCTIC text-to-speech databases. To that effect, here are the first 20 sentences from ARCTIC, a .fileids file, and a transcription file:

The sections below will refer to these files, so if you want to follow along, we recommend downloading them now. You should also make sure that you have downloaded and compiled sphinxbase and sphinxtrain.
An example entry in the .fileids file is: arctic_0001
An example entry in the transcription file is: <s> author of the danger trail philip steels etc </s> (arctic_0001)
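Since both files follow a rigid naming pattern, it can be convenient to generate them with a small script instead of typing the entries by hand. The following Python sketch is our own helper, not part of SphinxTrain; the FILL_IN placeholder is an assumption to be replaced with the real sentences:

```python
# Sketch: generate arctic20.fileids and a transcription skeleton.
# The utterance ids follow the tutorial's arctic_0001 .. arctic_0020
# pattern; the transcript text is a placeholder to fill in by hand.

ids = [f"arctic_{i:04d}" for i in range(1, 21)]

with open("arctic20.fileids", "w") as f:
    for utt_id in ids:
        f.write(utt_id + "\n")

with open("arctic20.transcription", "w") as f:
    for utt_id in ids:
        # Replace FILL_IN with the actual sentence for each recording.
        f.write(f"<s> FILL_IN </s> ({utt_id})\n")
```

After running it, replace each FILL_IN with the corresponding ARCTIC sentence, keeping the (arctic_NNNN) id at the end of each line.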

Recording your adaptation data

If you are adapting to a single speaker, you can record the adaptation data yourself. This is unfortunately a bit more complicated than it ought to be.
Basically, you need to record a single audio file for each sentence in the adaptation corpus, naming the files according to the names listed in arctic20.transcription and arctic20.fileids.
In addition, you need to make sure that you record at a sampling rate of 16 kHz (or 8 kHz if you adapt a telephone model) in mono, with a single channel.
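If you are not sure whether a recording meets these requirements, the header of a PCM WAV file can be inspected with Python's standard wave module. The sketch below uses a hypothetical check_wav helper of our own (not part of the CMUSphinx tools) and first creates a short 16 kHz mono file so the example is self-contained; in practice you would call check_wav on your own arctic_*.wav recordings:

```python
import wave

def check_wav(path, expected_rate=16000):
    """Read a WAV header and flag files that are not mono at the expected rate."""
    with wave.open(path, "rb") as w:
        rate, channels = w.getframerate(), w.getnchannels()
    ok = (rate == expected_rate and channels == 1)
    return rate, channels, ok

# Create a short dummy 16 kHz mono recording for illustration.
with wave.open("arctic_0001.wav", "wb") as w:
    w.setnchannels(1)                  # mono
    w.setsampwidth(2)                  # 16-bit samples
    w.setframerate(16000)              # 16 kHz
    w.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

rate, channels, ok = check_wav("arctic_0001.wav")
print(rate, channels, ok)  # prints: 16000 1 True
```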
The simplest way is to start a sound recorder like Audacity or Wavesurfer and read all the sentences into one big audio file. Then you can cut the audio into sentences in an audio editor and make sure every sentence is saved in the corresponding file. The file structure should look like this:
arctic_0001.wav 
arctic_0002.wav
.....
arctic_0019.wav
arctic20.fileids
arctic20.transcription
You should verify that these recordings sound okay. To do this, you can play them back with:
for i in *.wav; do play $i; done
If you already have a recording of the speaker, you can split it into sentences and create the .fileids and .transcription files accordingly.
If you are adapting to a channel, an accent, or some other generic property of the audio, you need to collect a bit more recordings manually. For example, in a call center you can record and transcribe a hundred calls and use them to improve the recognizer accuracy by means of adaptation.
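Whichever way you collected the data, it is worth verifying that the parts of the corpus agree before running the adaptation tools: every utterance id in the .fileids file should have both a WAV file on disk and a transcription line ending with that id in parentheses. A minimal sketch of such a check (our own helper, assuming the file layout shown above):

```python
import os
import re

def find_corpus_problems(fileids_path, transcription_path, wav_dir="."):
    """Report utterance ids that lack a .wav file or a transcription line."""
    with open(fileids_path) as f:
        ids = [line.strip() for line in f if line.strip()]

    # Collect the "(arctic_0001)" id at the end of each transcription line.
    transcribed = set()
    with open(transcription_path) as f:
        for line in f:
            m = re.search(r"\((\S+)\)\s*$", line)
            if m:
                transcribed.add(m.group(1))

    problems = []
    for utt_id in ids:
        if not os.path.exists(os.path.join(wav_dir, utt_id + ".wav")):
            problems.append(f"missing {utt_id}.wav")
        if utt_id not in transcribed:
            problems.append(f"missing transcription for {utt_id}")
    return problems
```

An empty list means the corpus is consistent; anything it reports should be fixed before feature extraction.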

Adapting the acoustic model


First we will copy the default acoustic model from PocketSphinx into the current directory in order to work on it. Assuming that you installed PocketSphinx under /usr/local, the acoustic model directory is /usr/local/share/pocketsphinx/model/en-us/en-us. Copy this directory to your working directory:
cp -a /usr/local/share/pocketsphinx/model/en-us/en-us .
Let's also copy the dictionary and the language model for testing:
cp -a /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict .
cp -a /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin .

Generating acoustic feature files

In order to run the adaptation tools, you must generate a set of acoustic feature files from these WAV audio recordings. This can be done with the sphinx_fe tool from SphinxBase. It is imperative that you use the same acoustic parameters to extract these features as were used to train the standard acoustic model. Since PocketSphinx 0.4, these are stored in a file called feat.params in the acoustic model directory. You can simply add it to the sphinx_fe command line, like this:
sphinx_fe -argfile en-us/feat.params \
-samprate 16000 -c arctic20.fileids \
-di . -do . -ei wav -eo mfc -mswav yes
You should now have the following files in your working directory:
en-us
arctic_0001.mfc
arctic_0001.wav
arctic_0002.mfc
arctic_0002.wav
arctic_0003.mfc
arctic_0003.wav
.....
arctic_0020.wav
arctic20.fileids
arctic20.transcription
cmudict-en-us.dict
en-us.lm.bin

Converting the sendump and mdef files

Some models, like en-us, are distributed in a compressed version, where extra files that are required for adaptation are excluded to save space. For the en-us model from PocketSphinx you can download the full version suitable for adaptation:
cmusphinx-en-us-ptm-5.2.tar.gz
Make sure you are using the full model with the mixture_weights file present.
If the mdef file inside the model has been converted to binary, you will also need to convert it back to the plain-text format used by the SphinxTrain tools. To do this, use the pocketsphinx_mdef_convert program:
pocketsphinx_mdef_convert -text en-us/mdef en-us/mdef.txt
In the downloads above, the mdef file is already in text form.

Accumulating observation counts

The next step in the adaptation is to collect statistics from the adaptation data.
This is done using the bw program from SphinxTrain. You should be able to find the bw tool in the sphinxtrain installation, in the folder /usr/local/libexec/sphinxtrain (or under another prefix on Linux) or in bin\Release (in the sphinxtrain directory on Windows). Copy it to the working directory along with the map_adapt and mk_s2sendump programs.
Now, to collect the statistics, run:
./bw \
-hmmdir en-us \
-moddeffn en-us/mdef.txt \
-ts2cbfn .ptm. \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-cmn current \
-agc none \
-dictfn cmudict-en-us.dict \
-ctlfn arctic20.fileids \
-lsnfn arctic20.transcription \
-accumdir .
Make sure the arguments in the bw command match the parameters in the feat.params file inside the acoustic model folder. Please note that not all the parameters from feat.params are supported by bw; for example, bw doesn't support upperf or other feature-extraction parameters. You only need to use the parameters which are accepted; the other parameters from feat.params should be skipped.
For example, for a continuous model you don't need to include the svspec option. Instead, you need to use just -ts2cbfn .cont. For semi-continuous models use -ts2cbfn .semi. If the model has a feature_transform file, like the en-us continuous model, you need to add the -lda feature_transform argument to bw, otherwise it will not work properly.
If you are missing the noisedict file, you also need an extra step: copy the fillerdict file into the directory that you chose in the hmmdir parameter, renaming it to noisedict.
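The rule above — reuse the feat.params values but drop anything bw does not accept — can be sketched as a small filter. Note that the whitelist below is only an illustrative assumption; consult bw itself for the authoritative list of accepted arguments:

```python
# Sketch: keep only the feat.params options that bw understands.
# BW_SUPPORTED is an illustrative assumption, not an official list.
BW_SUPPORTED = {"-feat", "-ts2cbfn", "-cmn", "-agc", "-svspec", "-lda", "-varnorm"}

def bw_args_from_feat_params(text):
    """Parse feat.params-style text and return bw-compatible argument pairs."""
    args = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in BW_SUPPORTED:
            args.extend(parts)
    return args

feat_params = """\
-lowerf 130
-upperf 6800
-feat 1s_c_d_dd
-agc none
-cmn current
"""
print(bw_args_from_feat_params(feat_params))
# prints: ['-feat', '1s_c_d_dd', '-agc', 'none', '-cmn', 'current']
```

Here -lowerf and -upperf are dropped, matching the advice above that bw does not accept feature-extraction parameters such as upperf.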

Creating a transformation with MLLR

MLLR transforms are supported by PocketSphinx and Sphinx4. MLLR is a cheap adaptation method that is suitable when the amount of data is limited, so it's a good idea to use MLLR for online adaptation. MLLR works best for a continuous model. Its effect for semi-continuous models is very limited, since semi-continuous models mostly rely on mixture weights. If you want the best accuracy you can combine MLLR adaptation with the MAP adaptation below. On the other hand, because MAP requires a lot of adaptation data, it is not really practical to use it for continuous models; for continuous models MLLR is more reasonable.
Next, we will generate an MLLR transformation, which we will pass to the decoder to adapt the acoustic model at run time. This is done with the mllr_solve program:
./mllr_solve \
-meanfn en-us/means \
-varfn en-us/variances \
-outmllrfn mllr_matrix -accumdir .
This command will create an adaptation data file called mllr_matrix. Now, if you wish to decode with the adapted model, simply add -mllr mllr_matrix (or whatever the path to your mllr_matrix file is) to your pocketsphinx command line.

Updating the acoustic model files with MAP

MAP is a different adaptation method. In this case, unlike with MLLR, we don't create a generic transform but update each parameter in the model. We will now copy the acoustic model directory and then overwrite the copy with the adapted model files:
cp -a en-us en-us-adapt
To apply the adaptation, use the map_adapt program:
./map_adapt \
-moddeffn en-us/mdef.txt \
-ts2cbfn .ptm. \
-meanfn en-us/means \
-varfn en-us/variances \
-mixwfn en-us/mixture_weights \
-tmatfn en-us/transition_matrices \
-accumdir . \
-mapmeanfn en-us-adapt/means \
-mapvarfn en-us-adapt/variances \
-mapmixwfn en-us-adapt/mixture_weights \
-maptmatfn en-us-adapt/transition_matrices

Recreating the adapted sendump file

If you want to save space, you can use the sendump file, which is supported by PocketSphinx; for Sphinx4 you don't need it. To recreate the sendump file from the updated mixture_weights file, run:
./mk_s2sendump \
-pocketsphinx yes \
-moddeffn en-us-adapt/mdef.txt \
-mixwfn en-us-adapt/mixture_weights \
-sendumpfn en-us-adapt/sendump
Congratulations! You now have an adapted acoustic model.
The en-us-adapt/mixture_weights and en-us-adapt/mdef.txt files are not used by the decoder, so, if you like, you can delete them to save some space.

Other acoustic models


For Sphinx4, the adaptation process is the same as for PocketSphinx, except that Sphinx4 cannot read the binary compressed mdef and sendump files; you need to keep the text mdef and the mixture_weights file instead.

Testing the adaptation

After you have done the adaptation, it's critical to test the adaptation quality. To do that you need to set up a test database similar to the one used for adaptation. To test the adaptation you need to configure the decoder with the required parameters; in particular, you need to have a language model <your.lm>. For more details see the tutorial on Building a Language Model. The detailed process of testing the model is covered in another part of the tutorial.
You can try to run the decoder on the original acoustic model and on the new acoustic model to estimate the improvement.
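The standard metric for such a comparison is the word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the hypothesis into the reference transcript, divided by the number of reference words. A self-contained sketch of the computation (our own helper; SphinxTrain also ships alignment scripts that compute this for you):

```python
def wer(reference, hypothesis):
    """Word error rate via edit distance between word lists."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance (Levenshtein) over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

before = wer("author of the danger trail", "author of a danger tail")
after = wer("author of the danger trail", "author of the danger trail")
print(before, after)  # prints: 0.4 0.0
```

A "10% relative improvement" then means, for example, the WER dropping from 0.40 to 0.36 after adaptation.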

Using the model


After adaptation, the acoustic model is located in the folder en-us-adapt. You need only that folder. The model should have the following files:
mdef
feat.params
mixture_weights
means
noisedict
transition_matrices
variances
depending on the type of the model you trained.
To use the model in PocketSphinx, simply put the model files into the resources of your application, then point to it with the -hmm option:
pocketsphinx_continuous -hmm `<your_new_model_folder>` -lm `<your_lm>` -dict `<your_dict>` -infile test.wav
or with the -hmm engine configuration option through the cmd_ln_init function. Alternatively, you can replace the old model files with the new ones.
To use the trained model in Sphinx4, you need to update the model location in the code.

Troubleshooting

If the adaptation didn't improve your results, first test the accuracy and make sure it's good.
I have no idea where to start looking for the problem…

  • Test whether the accuracy on the adaptation set improved
  • Accuracy improved on adaptation set ⇢ check if your adaptation set matches with your test set
  • Accuracy didn’t improve on adaptation set ⇢ you made a mistake during the adaptation


…or how much improvement I might expect through adaptation
From a few sentences you should get about a 10% relative WER improvement.
I'm lost about…
…whether it needs more/better training data, whether I'm not doing the adaptation correctly, whether my language model is the problem here, or whether there is something intrinsically wrong with my configuration.
Most likely you just ignored some error messages that were printed to you. You obviously need to provide more information and give access to your experiment files in order to get more definite advice.

What’s next
We hope the adapted model gives you acceptable results. If not, try to improve your adaptation process by:

  • Adding more adaptation data
  • Adapting your language model / using a better language model
