Caution! This tutorial uses the 5 pre-alpha release. It is not going to work for older versions.
This page describes how to do some simple acoustic model adaptation to improve speech recognition in your configuration. Please note that the adaptation doesn't necessarily adapt to a particular speaker. It just improves the fit between the adaptation data and the model. For example, you can adapt to your own voice to make dictation good, but you can also adapt to your particular recording environment, your audio transmission channel, or your accent or the accent of your users. You can use a model trained with clean broadcast data and telephone data to produce a telephone acoustic model by doing adaptation. Cross-language adaptation also makes sense; for example, you can adapt an English model to the sounds of another language by creating a phoneset map and creating a dictionary for the other language with an English phoneset.
The adaptation process takes transcribed data and improves the model you already have. It's more robust than training and can lead to good results even if your adaptation data is small. For example, 5 minutes of speech is enough to significantly improve dictation accuracy by adapting to a particular speaker.
The methods of adaptation differ somewhat between PocketSphinx and Sphinx4 due to the different types of acoustic models used. For more technical information on that, read the article about Acoustic Model Types.
The first thing you need to do is to create a corpus of adaptation data. The corpus will consist of a list of sentences (the transcription file), a list of the corresponding audio file names (the fileids file), and a recording of each sentence.
The sections below will refer to these files, so if you want to follow along, we recommend downloading them now. You should also make sure that you have downloaded and compiled sphinxbase and sphinxtrain.
An example line from the fileids file (arctic20.fileids): arctic_0001
An example line from the transcription file (arctic20.transcription): <s> author of the danger trail philip steels etc </s> (arctic_0001)
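Since the tools match transcription lines to fileids by position, it is worth checking that the two files stay in sync before going further. A minimal sketch in plain shell (file names taken from the examples above):

```shell
# Every transcription line must end with "(fileid)", where fileid is the
# corresponding line of the fileids file; report any line that does not.
paste arctic20.fileids arctic20.transcription | while read -r id rest; do
  case "$rest" in
    *"($id)") ;;                        # ids line up: nothing to report
    *) echo "mismatch for $id" ;;
  esac
done
```

If this prints nothing, the two files are consistent.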
In case you are adapting to a single speaker, you can record the adaptation data yourself. This is unfortunately a bit more complicated than it ought to be. Basically, you need to record a single audio file for each sentence in the adaptation corpus, naming the files according to the names listed in arctic20.transcription and arctic20.fileids.
In addition, you need to make sure that you record at a sampling rate of 16 kHz (or 8 kHz if you adapt a telephone model) in mono with a single channel.
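If you are unsure whether a recording really ended up at the right rate, you can read the sample rate straight out of the WAV header with standard tools. This sketch only works for canonical RIFF/WAVE files whose fmt chunk sits at the usual offset, where the sample rate occupies bytes 24-27 in little-endian order; the file name is assumed from this tutorial:

```shell
# Print the sample rate of a canonical WAV file; it should be 16000 (or 8000).
wav=arctic_0001.wav
set -- $(dd if="$wav" bs=1 skip=24 count=4 2>/dev/null | od -An -tu1)
echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
```

The channel count sits at bytes 22-23 of the same header and can be checked the same way; it should be 1.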
The simplest way would be to start a sound recorder like Audacity or Wavesurfer and read all sentences into one big audio file. Then you can cut the audio file into per-sentence files in an audio editor and make sure every sentence is saved in the corresponding file. The file structure should look like this:
                arctic_0001.wav  
                arctic_0002.wav
                .....
                arctic_0019.wav
                arctic20.fileids
                arctic20.transcription
You should verify that these recordings sound okay. To do this, you can play them back with:
                for i in *.wav; do play $i; done 
If you already have a recording of the speaker, you can split it into per-sentence files and create the .fileids and .transcription files.
If you are adapting to a channel, accent, or some other generic property of the audio, then you need to collect a bit more recorded material manually. For example, in a call center you can record and transcribe a hundred calls and use them to improve recognizer accuracy by means of adaptation.
First we will copy the default acoustic model from PocketSphinx into the current directory in order to work on it. Assuming that you installed PocketSphinx under /usr/local, the acoustic model directory is /usr/local/share/pocketsphinx/model/en-us/en-us. Copy this directory to your working directory:
                cp -a /usr/local/share/pocketsphinx/model/en-us/en-us .
Let's also copy the dictionary and the language model for testing:
cp -a /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict .
cp -a /usr/local/share/pocketsphinx/model/en-us/en-us.lm.bin .
In order to run the adaptation tools, you must generate a set of acoustic model feature files from these WAV audio recordings. This can be done with the sphinx_fe tool from SphinxBase. It is imperative that you use the same acoustic parameters to extract these features as were used to train the standard acoustic model. Since PocketSphinx 0.4, these are stored in a file called feat.params in the acoustic model directory. You can simply add it to the sphinx_fe command line, like this:
sphinx_fe -argfile en-us/feat.params \
    -samprate 16000 -c arctic20.fileids \
    -di . -do . -ei wav -eo mfc -mswav yes
You should now have the following files in your working directory:
                en-us
                arctic_0001.mfc
                arctic_0001.wav
                arctic_0002.mfc
                arctic_0002.wav
                arctic_0003.mfc
                arctic_0003.wav
                .....
                arctic_0020.wav
                arctic20.fileids
                arctic20.transcription
                cmudict-en-us.dict
                en-us.lm.bin
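A quick way to confirm that sphinx_fe actually produced a feature file for every utterance is to walk the fileids list (directory layout assumed from the listing above):

```shell
# Report any fileid that did not get a matching .mfc feature file.
while read -r id; do
  [ -f "$id.mfc" ] || echo "missing feature file: $id.mfc"
done < arctic20.fileids
```

Silent output means every recording was converted; any "missing" line points at an utterance that sphinx_fe skipped or failed on.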
Some models, like en-us, are distributed in a compressed version. Extra files that are required for adaptation are excluded to save space. For the en-us model from PocketSphinx, you can download the full version suitable for adaptation:
  cmusphinx-en-us-ptm-5.2.tar.gz 
Make sure you are using the full model with the mixture_weights file present.
If the mdef file inside the model has been converted to binary format, you will also need to convert it back to the plain text format used by the SphinxTrain tools. To do this, use the pocketsphinx_mdef_convert program:
                pocketsphinx_mdef_convert -text en-us/mdef en-us/mdef.txt
In the downloads, the mdef file is already in text form.
The next step in the adaptation is to collect statistics from the adaptation data.
This is done using the bw program from SphinxTrain. You should be able to find the bw tool in a sphinxtrain installation in the folder /usr/local/libexec/sphinxtrain (or under another prefix on Linux) or in bin\Release (in the sphinxtrain directory on Windows). Copy it to the working directory along with the map_adapt and mk_s2sendump programs.
                Now, to collect the statistics, run:
./bw \
    -hmmdir en-us \
    -moddeffn en-us/mdef.txt \
    -ts2cbfn .ptm. \
    -feat 1s_c_d_dd \
    -svspec 0-12/13-25/26-38 \
    -cmn current \
    -agc none \
    -dictfn cmudict-en-us.dict \
    -ctlfn arctic20.fileids \
    -lsnfn arctic20.transcription \
    -accumdir .
Make sure the arguments in the bw command match the parameters in the feat.params file inside the acoustic model folder. Please note that not all the parameters from feat.params are supported by bw. For example, bw doesn't support upperf or other feature extraction parameters. You only need to use the parameters that are accepted; other parameters from feat.params should be skipped.
For example, for a continuous model you don't need to include the svspec option. Instead, you just need to use -ts2cbfn .cont. For semi-continuous models, use -ts2cbfn .semi. If the model has a feature_transform file, like the en-us continuous model does, you need to add the -lda feature_transform argument to bw, otherwise it will not work properly.
If you are missing the noisedict file, you also need an extra step: copy the fillerdict file into the directory that you chose in the hmmdir parameter, renaming it to noisedict.
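That extra step can be scripted defensively so it only runs when the file is actually absent. A sketch, with the model directory name taken from this tutorial; the location of fillerdict is an assumption and depends on where your sphinxtrain copy keeps it:

```shell
# Copy fillerdict in as noisedict only if the model doesn't already have one.
hmmdir=en-us
if [ ! -f "$hmmdir/noisedict" ]; then
  cp fillerdict "$hmmdir/noisedict"
fi
```

Guarding the copy this way means an existing noisedict shipped with the model is never overwritten.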
MLLR transforms are supported by PocketSphinx and Sphinx4. MLLR is a cheap adaptation method that is suitable when the amount of adaptation data is limited, so it's a good idea to use MLLR for online adaptation. MLLR works best for a continuous model. Its effect on semi-continuous models is very limited, since semi-continuous models mostly rely on mixture weights. If you want the best accuracy, you can combine MLLR adaptation with the MAP adaptation described below. On the other hand, because MAP requires a lot of adaptation data, it is not really practical to use it for continuous models; for continuous models MLLR is more reasonable.
Next, we will generate an MLLR transformation, which we will pass to the decoder to adapt the acoustic model at run-time. This is done with the mllr_solve program:
./mllr_solve \
    -meanfn en-us/means \
    -varfn en-us/variances \
    -outmllrfn mllr_matrix -accumdir .
This command will create an adaptation data file called mllr_matrix. Now, if you wish to decode with the adapted model, simply add -mllr mllr_matrix (or whatever the path to the mllr_matrix file you created is) to your pocketsphinx command line.
MAP is a different adaptation method. In this case, unlike for MLLR, we don't create a generic transform but update each parameter in the model. We will now copy the acoustic model directory and overwrite the newly created directory with the adapted model files:
                cp -a en-us en-us-adapt
To apply the adaptation, use the map_adapt program:
./map_adapt \
    -moddeffn en-us/mdef.txt \
    -ts2cbfn .ptm. \
    -meanfn en-us/means \
    -varfn en-us/variances \
    -mixwfn en-us/mixture_weights \
    -tmatfn en-us/transition_matrices \
    -accumdir . \
    -mapmeanfn en-us-adapt/means \
    -mapvarfn en-us-adapt/variances \
    -mapmixwfn en-us-adapt/mixture_weights \
    -maptmatfn en-us-adapt/transition_matrices
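Before moving on, it is cheap to verify that map_adapt actually wrote every adapted parameter file (file names as used in the command above):

```shell
# Confirm all four adapted parameter files exist and are non-empty.
for f in means variances mixture_weights transition_matrices; do
  [ -s "en-us-adapt/$f" ] || echo "missing or empty: en-us-adapt/$f"
done
```

Any output here means the MAP update failed for that file and the bw or map_adapt logs should be re-checked.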
If you want to save space for the model, you can use the sendump file, which is supported by PocketSphinx. For Sphinx4 you don't need it. To recreate the sendump file from the updated mixture_weights file, run:
./mk_s2sendump \
    -pocketsphinx yes \
    -moddeffn en-us-adapt/mdef.txt \
    -mixwfn en-us-adapt/mixture_weights \
    -sendumpfn en-us-adapt/sendump
                Congratulations! You now have an adapted  acoustic model.
The en-us-adapt/mixture_weights and en-us-adapt/mdef.txt files are not used by the decoder, so, if you like, you can delete them to save some space.
For Sphinx4, the adaptation process is the same as for PocketSphinx, except that Sphinx4 cannot read the binary compressed mdef and sendump files; you need to keep the text mdef and the mixture_weights file instead.
After you have done the adaptation, it's critical to test the adaptation quality. To do that, you need to set up a database similar to the one used for adaptation. To test the adaptation you need to configure the decoder with the required parameters; in particular, you need to have a language model <your.lm>. For more details, see the tutorial on Building a Language Model. The detailed process of testing the model is covered in another part of the tutorial.
You can try running the decoder on both the original acoustic model and the new acoustic model to estimate the improvement.
After adaptation, the acoustic model is located in the folder en-us-adapt. You need only that folder. The model should have the following files:
                mdef
                feat.params
                mixture_weights
                means
                noisedict
                transition_matrices
                variances
depending on the type of the model you trained.
To use the model in PocketSphinx, simply put the model files into the resources of your application. Then point to it with the -hmm option:
pocketsphinx_continuous -hmm <your_new_model_folder> -lm <your_lm> -dict <your_dict> -infile test.wav
or with the -hmm engine configuration option passed through the cmd_ln_init function. Alternatively, you can replace the old model files with the new ones.
To use the trained model in Sphinx4, you need to update the model location in the code.
If the adaptation didn't improve your results, first test the accuracy and make sure it's good.
I have no idea where to start looking for the problem…
…or how much improvement I might expect through adaptation.
From just a few sentences you should get about a 10% relative WER improvement.
I'm lost about…
…whether it needs more/better training data, whether I'm not doing the adaptation correctly, whether my language model is the problem here, or whether there is something intrinsically wrong with my configuration.
Most likely you just ignored some error messages that were printed to you. You obviously need to provide more information and give access to your experiment files in order to get more definite advice.
What’s next
We hope the adapted model gives you acceptable results. If not, try to improve your adaptation process.