This article covers the number of decision-tree leaves and the number of Gaussians in speech recognition acoustic model training, and contrasts how CMU Sphinx and Kaldi handle them.
a) I observed very large values for the variables numLeaves=2500 and numGauss=15000 in timit/s5/run.sh. Do they correspond to numTiedStates (senones) and numMixtureGaussians, respectively? If so, in Sphinx we used much lower values for a database the size of TIMIT, i.e. numTiedStates < 1000 and numMixtureGaussians between 8 and 32.
b) Also, I observed that the number of training iterations is fixed in advance here, unlike Sphinx, where training stops once the convergence ratio of the likelihoods no longer improves beyond a preset threshold.
Please explain the differences in point a) and the logic behind hard-coding the number of iterations in point b).
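To make the contrast in b) concrete, here is a minimal Python sketch of the two stopping rules (my own illustration only; run_em_iteration, the iteration count, and the threshold are hypothetical, not Kaldi or Sphinx internals):

```python
def train_gmm(run_em_iteration, num_iters=None, conv_thresh=None):
    """Toy EM driver contrasting two stopping rules.

    run_em_iteration() is a hypothetical callback that performs one EM
    pass over the data and returns the total log-likelihood.
    """
    prev = None
    i = 0
    while True:
        loglike = run_em_iteration()
        i += 1
        # Kaldi-style: simply run a fixed, preset number of iterations.
        if num_iters is not None and i >= num_iters:
            break
        # Sphinx-style: stop once the relative likelihood improvement
        # falls below a preset convergence threshold.
        if conv_thresh is not None and prev is not None:
            if (loglike - prev) / abs(prev) < conv_thresh:
                break
        prev = loglike
    return loglike
```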
numGauss is the total number of Gaussians over all leaves (the number of Gaussians per leaf is not fixed but varies according to data-count^0.2). Possibly numLeaves and numGauss were not tuned well for TIMIT. You could try smaller values and see if it helps. But I don't recommend using TIMIT; I prefer RM for small-scale debugging, or WSJ for larger scale.
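Here is a minimal sketch of the allocation rule described above (my own illustration, not Kaldi source code): the total numGauss is shared out over the leaves in proportion to each leaf's data count raised to the power 0.2.

```python
def allocate_gaussians(leaf_counts, tot_gauss, power=0.2, min_gauss=1):
    """Share a total Gaussian budget across leaves in proportion to
    count**power, so leaves with much more data get only modestly more
    Gaussians. Illustrative only; Kaldi's actual allocation differs in
    details such as rounding and per-state minimums."""
    weights = [c ** power for c in leaf_counts]
    total = sum(weights)
    return [max(min_gauss, round(tot_gauss * w / total)) for w in weights]

# Four leaves whose data counts span three orders of magnitude still
# end up with fairly similar Gaussian counts:
print(allocate_gaussians([100, 1000, 10000, 100000], tot_gauss=100))
```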
1) Does the number of pdfs reported by gmm-info mean numLeaves? If so, why were only 1722 pdfs trained when I had set numLeaves to 2500?
After doing the tree splitting, it clusters the leaves, so the number gets a little smaller.
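If you want to check this number programmatically, here is a small sketch wrapping the gmm-info call mentioned in the question (the output-parsing assumption and the example model path exp/tri1/final.mdl are mine):

```python
import subprocess

def num_pdfs(model_path):
    """Return the pdf count (the number of trained leaves) reported by
    gmm-info. Assumes gmm-info is on PATH and prints a line of the form
    "number of pdfs <N>"; adjust the parsing if your version differs."""
    out = subprocess.run(["gmm-info", model_path], capture_output=True,
                         text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("number of pdfs"):
            return int(line.split()[-1])
    raise ValueError("no 'number of pdfs' line in gmm-info output")

# e.g. num_pdfs("exp/tri1/final.mdl") may report 1722 even though
# numLeaves was set to 2500, because of the post-split clustering.
```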
2) Does the number of Gaussians per leaf vary across leaves? 3) http://www.cs.toronto.edu/~fritz/absps/icassp12_dbn.pdf suggests that correlated features such as log filterbanks work better than MFCCs. Why, then, do we try to decorrelate the computed MFCCs with an LDA+MLLT transform, as described at http://kaldi.sourceforge.net/dnn2.html?
That's an interaction with pre-training of DNNs. The dnn2 recipe does not use pre-training, so it's not an issue.
(a) In Sphinx-3, if we set numLeaves (tied states) to 2500, it trains exactly 2500 leaves, but Kaldi trains 1722. (b) Similarly, the number of Gaussians per leaf is not exactly the same for all leaves in Kaldi, unlike Sphinx-3. Do these two observations mean that Kaldi does not rigidly follow these two user-set parameters, and instead decides the best fit according to the data?
The number of leaves is slightly less than what you set because it clusters the leaves after splitting the tree to the specified size. numGauss is specified as a total across all states; the number of Gaussians for each state (pdf-id) is allocated according to a small power (0.2) of its count.
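A quick worked example of that 0.2 power (my own numbers, for illustration): a state with 100,000 frames of data gets a weight of 100000^0.2 = 10, while a state with only 32 frames gets 32^0.2 = 2, so a roughly 3000-fold difference in data turns into only a 5-fold difference in the Gaussian budget.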
I see. So, as I understand the reply, the difference is that in Kaldi the tree is split until it reaches the specified size, whereas in Sphinx-3 the tree is grown fully and then pruned to leave as many leaves as specified. The Sphinx information was taken from the "Pruning Decision Trees" section at http://www.speech.cs.cmu.edu/sphinxman/fr4.html.