train_diag_ubm parameters

Question

I am running the DNN stage, and in the local/online/run_nnet2_common.sh scripts I see different parameters in different examples (for example FE and SM) at the second stage (train_lda_mllt).
What is the meaning of --num-frames (400000 in rm and 200000 in fisher), and what is the meaning of the 512/256 parameter? (Is it an STFT frame parameter?)

Answer

I don't see the parameters you mention among the parameters of the steps/train_lda_mllt.sh script.

These parameters are used during UBM training. The first number is the number of frames sampled from the input and used for training; the second number is the number of Gaussians in the UBM.
The meaning of both is described in the script.
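To get a feel for how much audio the sampled frames represent, here is a minimal sketch that converts a frame count into minutes. It assumes a 10 ms frame shift, which is Kaldi's default for feature extraction but is not stated in this thread; frames_to_minutes is a hypothetical helper name, not part of Kaldi.

```shell
#!/usr/bin/env bash
# Assumption: 10 ms frame shift (Kaldi's default; the thread does not state
# the feature config actually used).
frame_shift_ms=10

# Hypothetical helper: print the approximate audio duration, in whole
# minutes (rounded down), covered by the given number of frames.
frames_to_minutes() {
  local num_frames=$1
  echo $(( num_frames * frame_shift_ms / 1000 / 60 ))
}

frames_to_minutes 400000   # value seen in the rm recipe     -> 66
frames_to_minutes 200000   # value seen in the fisher recipe -> 33
```

Under that assumption, 400000 frames is roughly an hour of audio and 200000 frames roughly half an hour.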

Reply

Right, you're right.
I meant the UBM step:

if [ $stage -le 3 ]; then
  # To train a diagonal UBM we don't need very much data, so use the smallest
  # subset. The input directory exp/nnet2_online/tri5a is only needed for
  # the splice-opts and the LDA transform.
  steps/online/nnet2/train_diag_ubm.sh --cmd "$train_cmd" --nj 4 --num-frames 400000 \
    data/train_hires 512 exp/nnet2_online/tri5a exp/nnet2_online/diag_ubm
fi

I want to apply this step to my own data, but I don't know which values to choose:

--num-frames: 400000 or 200000?
Number of Gaussians: 512 or 256?

Answer

I doubt that changing these values will give you any significant improvement. 256 and 400000 seem (to me) like good starting values. If you have a lot of data (and more patience), you can increase both of these values. There may be some insight into whether the number of Gaussians should be proportional to the number of speakers, but I'm not aware of any.
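One way to make the two values easy to experiment with is to keep them in variables. The sketch below only prints the command rather than running it; build_ubm_cmd, num_frames, and num_gauss are hypothetical names introduced for illustration, while the script path, options, and directories are copied from the snippet quoted in this thread.

```shell
#!/usr/bin/env bash
# Starting values suggested in the answer above; raise both if you have
# a lot of data (and patience).
num_frames=400000   # frames sampled from the input for UBM training
num_gauss=256       # number of Gaussians in the diagonal UBM

# Hypothetical helper: print (do not run) the train_diag_ubm.sh command
# for the given frame count and Gaussian count.
build_ubm_cmd() {
  local frames=$1 gauss=$2
  echo "steps/online/nnet2/train_diag_ubm.sh --cmd \"\$train_cmd\" --nj 4 --num-frames $frames" \
       "data/train_hires $gauss exp/nnet2_online/tri5a exp/nnet2_online/diag_ubm"
}

build_ubm_cmd "$num_frames" "$num_gauss"
```

Printing the command first makes it easy to check that the Gaussian count lands in the positional slot after the data directory before launching a real training run.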
