This article covers the number of decision-tree leaves and the number of Gaussians in speech recognition acoustic model training, and contrasts how CMU Sphinx and Kaldi handle them.
a) I observed very large values for the variables numLeaves=2500 and numGauss=15000 in timit/s5/run.sh. Do they correspond to numTiedStates (senones) and numMixtureGaussians, respectively? If so, in Sphinx we used much lower values for a database the size of TIMIT, i.e. numTiedStates < 1000 and numMixtureGaussians between 8 and 32.
b) Also, I observed that the number of training iterations is fixed in advance here, unlike Sphinx, where training stops once the convergence ratio of the likelihoods no longer improves beyond a preset threshold.
Please explain the differences in point a) and the logic behind hard-coding the number of iterations in point b).
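To make the contrast in b) concrete, here is a minimal Python sketch of the two stopping rules (my own illustration only; run_em_iteration, the iteration count, and the threshold are hypothetical, not Kaldi or Sphinx internals):

```python
def train_gmm(run_em_iteration, num_iters=None, conv_thresh=None):
    """Toy EM driver contrasting two stopping rules.

    run_em_iteration() is a hypothetical callback that performs one EM
    pass over the data and returns the total log-likelihood.
    """
    prev = None
    i = 0
    while True:
        loglike = run_em_iteration()
        i += 1
        # Kaldi-style: simply run a fixed, preset number of iterations.
        if num_iters is not None and i >= num_iters:
            break
        # Sphinx-style: stop once the relative likelihood improvement
        # falls below a preset convergence threshold.
        if conv_thresh is not None and prev is not None:
            if (loglike - prev) / abs(prev) < conv_thresh:
                break
        prev = loglike
    return loglike
```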
numGauss is the total number of Gaussians over all leaves (the number of Gaussians per leaf is not fixed but varies according to data-count^0.2). Possibly numLeaves and numGauss were not tuned well for TIMIT. You could try smaller values and see if it helps. But I don't recommend using TIMIT; I prefer RM for small-scale debugging, or WSJ for larger scale.
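Here is a minimal sketch of the allocation rule described above (my own illustration, not Kaldi source code): the total numGauss is shared out over the leaves in proportion to each leaf's data count raised to the power 0.2.

```python
def allocate_gaussians(leaf_counts, tot_gauss, power=0.2, min_gauss=1):
    """Share a total Gaussian budget across leaves in proportion to
    count**power, so leaves with much more data get only modestly more
    Gaussians. Illustrative only; Kaldi's actual allocation differs in
    details such as rounding and per-state minimums."""
    weights = [c ** power for c in leaf_counts]
    total = sum(weights)
    return [max(min_gauss, round(tot_gauss * w / total)) for w in weights]

# Four leaves whose data counts span three orders of magnitude still
# end up with fairly similar Gaussian counts:
print(allocate_gaussians([100, 1000, 10000, 100000], tot_gauss=100))
```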
1) Does the number of pdfs reported by gmm-info mean numLeaves? If so, why were only 1722 pdfs trained when I had set numLeaves to 2500?
After doing the tree splitting, it clusters the leaves, so the number gets a little smaller.
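If you want to check this number programmatically, here is a small sketch wrapping the gmm-info call mentioned in the question (the output-parsing assumption and the example model path exp/tri1/final.mdl are mine):

```python
import subprocess

def num_pdfs(model_path):
    """Return the pdf count (the number of trained leaves) reported by
    gmm-info. Assumes gmm-info is on PATH and prints a line of the form
    "number of pdfs <N>"; adjust the parsing if your version differs."""
    out = subprocess.run(["gmm-info", model_path], capture_output=True,
                         text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("number of pdfs"):
            return int(line.split()[-1])
    raise ValueError("no 'number of pdfs' line in gmm-info output")

# e.g. num_pdfs("exp/tri1/final.mdl") may report 1722 even though
# numLeaves was set to 2500, because of the post-split clustering.
```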
2) Does the number of Gaussians per leaf vary across leaves? 3) http://www.cs.toronto.edu/~fritz/absps/icassp12_dbn.pdf suggests that correlated features such as log filterbanks work better than MFCCs. Why, then, do we try to decorrelate the computed MFCCs with an LDA+MLLT transform, as described at http://kaldi.sourceforge.net/dnn2.html?
That's an interaction with pre-training of DNNs. The dnn2 recipe does not use pre-training, so it's not an issue.
(a) In Sphinx-3, if we set numLeaves (tied states) to 2500, it trains exactly 2500 leaves, but Kaldi trains 1722. (b) Similarly, the number of Gaussians per leaf is not exactly the same for all leaves in Kaldi, unlike Sphinx-3. Do these two observations mean that Kaldi does not rigidly follow these two user-set parameters, and instead decides the best fit according to the data?
The number of leaves is slightly less than what you set because it clusters the leaves after splitting the tree to the specified size. numGauss is specified as a total across all states; the number of Gaussians for each state (pdf-id) is allocated according to a small power (0.2) of its count.
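A quick worked example of that 0.2 power (my own numbers, for illustration): a state with 100,000 frames of data gets a weight of 100000^0.2 = 10, while a state with only 32 frames gets 32^0.2 = 2, so a roughly 3000-fold difference in data turns into only a 5-fold difference in the Gaussian budget.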
I see. So, as I understand the reply, the difference is that in Kaldi the tree is split until it reaches the specified size, whereas in Sphinx-3 the tree is grown fully and then pruned to leave as many leaves as specified. The Sphinx information was taken from the "Pruning Decision Trees" section at http://www.speech.cs.cmu.edu/sphinxman/fr4.html.