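This post covers tuning speech recognition: adjusting the weights, scales, or ratios of the acoustic model and the language model to get the best result. A lattice here means a raw part of the recognition output.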
I am evaluating language models of different sizes and tried to change the weight of the language model during decoding. I assumed that the acoustic-scale determines the language model weight as lm-weight = 1 - acoustic-scale. However, I quickly figured out that this is not the case. I read in a related post that the acoustic-scale should be close to the inverse of the lm-scale, and that the lm-scale is usually set in scoring. I tried modifying the acoustic-scale and noticed a reduction in WER when I decreased it. I am now wondering what the best way is to modify the influence of the language model. A pointer in the right direction would be highly appreciated.
It's the relative scale between the acoustic and language models that matters most (i.e. the ratio). In decoding, we generate lattices with the LM scale left at 1 and the acoustic scale varied. When rescoring lattices to get the best path during scoring, it happens to be more convenient to leave the acoustic scale at 1 and give the LM scale integer values. All that is important here is the ratio between the two. We typically expect good results when the acoustic scale used during decoding is set to the inverse of whatever LM scale was best in scoring. For example, if the best LM weight in scoring was 13, you would set the acoustic scale used in decoding to 1/13.
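The point about the ratio can be checked with a small sketch. It is a toy model, not Kaldi code: the hypotheses and their costs are invented, and `best_path` is a hypothetical helper that picks the hypothesis with the lowest combined scaled cost.

```python
from fractions import Fraction

# Invented hypotheses: (text, acoustic_cost, lm_cost).
# Costs are negative log-probabilities, so lower is better.
HYPOTHESES = [
    ("recognize speech", 120, 9),
    ("wreck a nice beach", 100, 12),
    ("recognise peach", 110, 10),
]


def best_path(hyps, acoustic_scale, lm_scale):
    """Return the text whose scaled acoustic + LM cost is lowest."""
    return min(
        hyps,
        key=lambda h: acoustic_scale * h[1] + lm_scale * h[2],
    )[0]


# Same ratio (1:13), expressed two ways: the winner is identical.
scoring_style = best_path(HYPOTHESES, acoustic_scale=1, lm_scale=13)
decoding_style = best_path(HYPOTHESES, acoustic_scale=Fraction(1, 13), lm_scale=1)

# A different ratio (1:1) can pick a different winner.
unweighted = best_path(HYPOTHESES, acoustic_scale=1, lm_scale=1)

print(scoring_style, "|", decoding_style, "|", unweighted)
```

Multiplying both scales by the same positive constant multiplies every combined cost by that constant, so the ranking cannot change. Only changing the ratio can.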
So the best idea is to do an initial decoding with the default acoustic-scale of 0.1, then rescore the lattice with the lm-scale set to integer values between roughly 7 and 15 (taken from another post). Then do another decoding with the acoustic-scale set to 1/lm-scale, using the lm-scale that gave the best results, and rescore the new lattice with that best lm-scale again. Currently, I am only using lattice-to-nbest and nbest-to-linear. To rescore with the same language model but a different lm-scale, I would follow the approach taken from the tutorial.
Is this the correct way or do I need to use another script?
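The selection step in that plan can be sketched as follows. This is only an illustration: `pick_scales` is a hypothetical helper, and the WER figures are invented placeholders, not measurements. In practice they would come from scoring the first-pass lattices at each LM weight.

```python
def pick_scales(wer_by_lmwt):
    """Return (best LM weight, acoustic scale for the second decoding pass).

    wer_by_lmwt maps an integer LM weight to the WER obtained when
    rescoring with it. Ties are broken in favour of the smaller weight.
    """
    best_lmwt = min(wer_by_lmwt, key=lambda w: (wer_by_lmwt[w], w))
    return best_lmwt, 1.0 / best_lmwt


# Placeholder WERs (%) for LM weights 7..15; invented for illustration.
example_wer = {
    7: 14.8, 8: 14.1, 9: 13.6, 10: 13.2, 11: 12.9,
    12: 12.7, 13: 12.5, 14: 12.6, 15: 12.9,
}

lmwt, acoustic_scale = pick_scales(example_wer)
print(f"best LM weight: {lmwt}, decode with acoustic scale {acoustic_scale:.4f}")
```

With these placeholder numbers the best LM weight is 13, so the second decoding pass would use an acoustic scale of 1/13.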
"So the best idea is to do an initial decoding with the default."
Yes. It might not make much (or any) difference, but it could make the lattice a little higher quality at the optimal acoustic scale.
Many of the programs that read lattices accept the --lm-scale option. I just added the --lm-scale option to lattice-to-nbest (update and recompile... previously it only accepted the --acoustic-scale option). Like most programs that scale lattices (but unlike lattice-copy), lattice-to-nbest reverses the scaling after doing its operation, so the output lattices will have the un-scaled values on them.
What do you mean by "the output lattices will have the un-scaled values"? Does this mean I cannot use lattice-to-nbest and then nbest-to-linear on the output?
No, you can still use it. lattice-to-nbest scales the input, then computes the n-best, then un-scales. It all depends on what you're going to do with the output. Ultimately you are going to have to read the documentation on lattices and decoding in Kaldi.
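The scale, select, un-scale sequence described above can be mimicked on a toy list of hypotheses. This is a sketch of the idea only, not how Kaldi implements lattice-to-nbest: the data is invented, and `scale`, `nbest` and `scaled_nbest` are hypothetical helpers. Fractions are used so that un-scaling restores the original costs exactly.

```python
from fractions import Fraction

# Invented hypotheses: (text, acoustic_cost, lm_cost); lower cost is better.
HYPS = [
    ("recognize speech", Fraction(120), Fraction(9)),
    ("wreck a nice beach", Fraction(100), Fraction(12)),
    ("recognise peach", Fraction(110), Fraction(10)),
]


def scale(hyps, acoustic_scale, lm_scale):
    """Multiply every acoustic and LM cost by the given scales."""
    return [(t, ac * acoustic_scale, lm * lm_scale) for t, ac, lm in hyps]


def nbest(hyps, n):
    """Keep the n hypotheses with the lowest total cost."""
    return sorted(hyps, key=lambda h: h[1] + h[2])[:n]


def scaled_nbest(hyps, n, acoustic_scale=1, lm_scale=1):
    """Scale, pick the n-best, then undo the scaling on the survivors."""
    a, l = Fraction(acoustic_scale), Fraction(lm_scale)
    top = nbest(scale(hyps, a, l), n)
    return scale(top, 1 / a, 1 / l)


result = scaled_nbest(HYPS, n=2, acoustic_scale=Fraction(1, 13))
for text, ac, lm in result:
    print(text, ac, lm)
```

The scales decide which hypotheses survive and in what order, but the costs attached to the output are the original ones. A later step therefore has to apply its own scales again if it needs them.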