Gconsts computation error during triphone model initialization


Background

I recently ran into a stubborn training problem: after adding a speaker and some recordings, training would no longer go through. I found an English thread that looked helpful and translated it while reading; through it I found the cause of the problem. If you need help with something similar, you can reach me at QQ 396806883.

Discussion

I've been playing with Kaldi for several weeks, but since I don't have access to speech data, I try to adapt the provided recipes to my field of work, handwriting recognition. Lately, I focused on the triphone model building but ran into several issues in the gmm-init-model part.

I followed the steps in egs/rm/s1, and when it came to initializing the model with the tree stats accumulator, I got an error saying that it was impossible to compute GConsts. I looked at the accumulator file and noticed a bunch of zeros. After adding some logging to my code, I noticed that the problem appeared because some inv_var_(mix,d) values equal infinity, making the quantity added to gc (in diag-gmm.cc) NaN. I have lots of zeros in the accumulator file, so the computed variance was probably zero. Fair enough. But where might that come from?
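ComputeGconsts accumulates, per Gaussian, a constant term of the diagonal-covariance log-likelihood. A rough Python sketch of that arithmetic (illustrative only, not the actual Kaldi code in diag-gmm.cc) shows how an infinite inv_var turns gc into NaN:

```python
import math

def gconst(weight, means, inv_vars):
    """Sketch of the per-Gaussian constant term accumulated into gc
    (illustrative, not the Kaldi source)."""
    gc = math.log(weight) - 0.5 * len(means) * math.log(2 * math.pi)
    for m, iv in zip(means, inv_vars):
        # With iv = inf, log(iv) is +inf and the mean term is -inf,
        # so their sum is NaN, which poisons gc.
        gc += 0.5 * math.log(iv) - 0.5 * m * m * iv
    return gc

ok = gconst(0.5, [0.1, -0.2], [1.0, 2.0])           # finite
bad = gconst(0.5, [0.1, 2.0], [1.0, float("inf")])  # NaN
```

Once one dimension's inv_var is infinite, the whole gconst for that Gaussian is NaN, which is exactly the failure reported here.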

I would be glad to modify the monophone models, using different parameters for training, or a different number of Gaussians, but I don't quite grasp the process of tree building, or more precisely how acc-tree-stats works. That would surely help me get an idea of where the problem comes from, and in what ways I should modify the models.

Is there any published paper, presentation, or documentation page regarding this process that would help me understand what is going on?

 

You can search for "clustering mechanisms" or "decision tree" in http://kaldi.sourceforge.net/.

Even with zero stats it should still work - I think there's some kind of variance floor. Can you write your tree stats in text mode and show me a segment where it has zeros, e.g. are there zero counts or just zero variances? And are there any warnings when the tree is being built? Do you have any features that are sometimes exactly zero?

Actually - perhaps it's that the variance floor is enforced in the tree-building but not in the GMM initialization, which should be fixed - but it would still be nice to find out why this is happening, especially whether it's zero counts or zero stats.

 


Thanks for the quick answers. I have indeed encountered warnings when building the tree (something about the objective function increasing sometimes in the process). In acc-tree-stats I kept the --var-floor option at its default value. Here is a part of the tree stats:

EV 4 -1 0 0 0 1 2 2 0 
T GCL 10 0.01  [
  0 0 0 -1.498798 0 0 0 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.01622272 2.2 -1.071532 -0.09000503 0.008714597 0.0008714597 2.2 -0.5 0 0.0006453735 0.002258807 0 0 0.0008714597 0 0.0004444444 0.004333333 0 0 0 0 0 0 0 0 0.01350352 0.0309286 0.03281539 0.06875694 0.06207478 0.234446 0.3995356 0.2362493 0.04817961 0.01099208 0 0.008522665 
  0 0 0 0.2906588 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0001194287 1.32 13.07694 0.009423427 6.227425e-05 7.59442e-07 1.32 0.13 0 2.33811e-07 1.695129e-06 0 0 7.59442e-07 0 1.975309e-07 9.148149e-06 0 0 0 0 0 0 0 0 9.355468e-05 0.000512627 0.0005589068 0.001888634 0.001800474 0.01537258 0.04021805 0.01426532 0.001140429 0.0001208259 0 7.263582e-05 ]
EV 4 -1 0 0 0 1 2 2 1 
T GCL 100 0.01  [
  0 0 0 -22.48715 0 0 0 200 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.07540225 0.7620931 6.415057 0.6415307 0.1059166 0 0 0 0.2443876 11.9 33.29661 0.9611968 0.01903067 0.05081851 8.3 1.5 0.0003968254 0.01359181 0.0181314 0.001048454 0.0003527337 0.00824947 0.0008888889 0.01802466 0.02965013 0.0007662835 0 0.0206048 0 0 0 0.05672932 0.09366696 0.1857097 0.319918 0.6256077 0.9178567 0.8582629 1.827387 5.563567 1.5674 0.6419963 0.09375235 0.04809504 0.1995389 
  0 0 0 6.307413 0 0 0 400 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.004143945 0.1247071 5.382755 0.0963399 0.005075073 0 0 0 0.003979291 5.51 91.61123 0.07204046 0.0001218673 0.0003227233 3.41 0.39 1.574704e-07 8.290994e-06 2.284529e-05 3.786443e-07 1.24421e-07 1.015884e-05 7.901235e-07 2.258667e-05 5.676471e-05 5.871905e-07 0 9.802626e-05 0 0 0 0.0008267016 0.001380023 0.002993633 0.00657781 0.03486262 0.07613795 0.04386431 0.1241599 0.8765761 0.08414948 0.02746228 0.002978953 0.0005307752 0.00563317 ]
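Reading one of these entries (my interpretation of the dump, not official documentation): the header "T GCL 10 0.01" gives the count and the variance floor, and the two rows are per-dimension sums of x and of x². For the dimension where x sums to 20 and x² sums to 40, the implied variance is already exactly zero:

```python
# Interpreting one GaussClusterable entry from the dump above
# (assumed layout: header with count and var floor, then sum(x), then sum(x^2)).
count = 10.0
x_sum, x2_sum = 20.0, 40.0   # one dimension of the first entry

mean = x_sum / count                 # 2.0
var = x2_sum / count - mean * mean   # 4.0 - 4.0 = 0.0: a degenerate dimension
```

A zero variance here is what later becomes an infinite inv_var when inverted without a floor.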

After adding some logging (where the x and x2 stats are retrieved and modified to set up the means and variances (or rather inv_means and inv_vars)), I got the output below. Notice the 'inf' at the 7th component of inv_vars, causing "gc" to become NaN.

Stat 11
x is  [ 0.005 0.005 0.005 -128.801 0.005 0.005 0.005 1756 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 -0.440038 -69.2 -164.269 1.84454 -0.135006 -0.0454223 -59.4 6.801 -0.0712467 0.00318505 -0.00192241 -0.0581773 -0.0966148 -0.0243636 -0.195697 0.00598413 -0.0107021 -0.127381 -0.233625 -0.0475183 -2.23639 -1.20044 -0.521406 -0.0233868 0.005 0.0085977 0.0085977 0.0085977 0.0154942 -3.75901 -0.262131 0.12831 0.0695442 -2.24388 -10.1102 -35.2664 -8.75226 ]
x2 is  [ 0.005 0.005 0.005 22.8716 0.005 0.005 0.005 3512 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.00289159 35.36 1705.1 0.306589 0.00353024 0.0011242 28.44 2.001 5.90201e-05 0.00300042 0.00100117 3.71574e-05 0.00124808 0.00101996 0.00174234 0.00400205 0.00102116 0.00127727 0.00238958 0.00124035 0.0564312 0.027146 0.0126656 0.00318199 0.005 0.00402114 0.00402114 0.00402114 0.0040687 0.129385 0.00997947 0.0104759 0.0106462 0.0660989 0.603667 5.39491 0.435609 ]
After scale (count 878)
x is  [ 5.69476e-06 5.69476e-06 5.69476e-06 -0.146699 5.69476e-06 5.69476e-06 5.69476e-06 2 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 -0.000501183 -0.0788155 -0.187094 0.00210084 -0.000153766 -5.17338e-05 -0.0676538 0.00774601 -8.11465e-05 3.62763e-06 -2.18954e-06 -6.62612e-05 -0.00011004 -2.7749e-05 -0.000222889 6.81563e-06 -1.21892e-05 -0.00014508 -0.000266088 -5.41211e-05 -0.00254715 -0.00136725 -0.000593856 -2.66364e-05 5.69476e-06 9.79237e-06 9.79237e-06 9.79237e-06 1.76472e-05 -0.00428134 -0.000298555 0.000146139 7.92076e-05 -0.00255567 -0.011515 -0.0401667 -0.00996841 ]
x2 is [ 5.69476e-06 5.69476e-06 5.69476e-06 0.0260497 5.69476e-06 5.69476e-06 5.69476e-06 4 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 3.29338e-06 0.0402733 1.94203 0.00034919 4.02078e-06 1.28041e-06 0.0323918 0.00227904 6.72211e-08 3.41733e-06 1.14029e-06 4.23205e-08 1.4215e-06 1.16169e-06 1.98444e-06 4.55814e-06 1.16305e-06 1.45475e-06 2.72162e-06 1.4127e-06 6.42724e-05 3.0918e-05 1.44255e-05 3.62413e-06 5.69476e-06 4.57989e-06 4.57989e-06 4.57989e-06 4.63406e-06 0.000147363 1.13661e-05 1.19316e-05 1.21255e-05 7.52835e-05 0.000687548 0.00614454 0.000496138 ]
After addvec2
x is  [ 5.69476e-06 5.69476e-06 5.69476e-06 -0.146699 5.69476e-06 5.69476e-06 5.69476e-06 2 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 -0.000501183 -0.0788155 -0.187094 0.00210084 -0.000153766 -5.17338e-05 -0.0676538 0.00774601 -8.11465e-05 3.62763e-06 -2.18954e-06 -6.62612e-05 -0.00011004 -2.7749e-05 -0.000222889 6.81563e-06 -1.21892e-05 -0.00014508 -0.000266088 -5.41211e-05 -0.00254715 -0.00136725 -0.000593856 -2.66364e-05 5.69476e-06 9.79237e-06 9.79237e-06 9.79237e-06 1.76472e-05 -0.00428134 -0.000298555 0.000146139 7.92076e-05 -0.00255567 -0.011515 -0.0401667 -0.00996841 ]
x2 is [ 5.69473e-06 5.69473e-06 5.69473e-06 0.00452919 5.69473e-06 5.69473e-06 5.69473e-06 0 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 5.69473e-06 3.0422e-06 0.0340615 1.90702 0.000344777 3.99713e-06 1.27774e-06 0.0278148 0.00221904 6.06363e-08 3.41732e-06 1.14028e-06 3.793e-08 1.4094e-06 1.16092e-06 1.93476e-06 4.55809e-06 1.1629e-06 1.4337e-06 2.65082e-06 1.40977e-06 5.77844e-05 2.90487e-05 1.40728e-05 3.62342e-06 5.69473e-06 4.57979e-06 4.57979e-06 4.57979e-06 4.63374e-06 0.000129033 1.1277e-05 1.19102e-05 1.21193e-05 6.8752e-05 0.000554952 0.00453117 0.000396769 ]
After invert
x is  [ 5.69476e-06 5.69476e-06 5.69476e-06 -0.146699 5.69476e-06 5.69476e-06 5.69476e-06 2 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 5.69476e-06 -0.000501183 -0.0788155 -0.187094 0.00210084 -0.000153766 -5.17338e-05 -0.0676538 0.00774601 -8.11465e-05 3.62763e-06 -2.18954e-06 -6.62612e-05 -0.00011004 -2.7749e-05 -0.000222889 6.81563e-06 -1.21892e-05 -0.00014508 -0.000266088 -5.41211e-05 -0.00254715 -0.00136725 -0.000593856 -2.66364e-05 5.69476e-06 9.79237e-06 9.79237e-06 9.79237e-06 1.76472e-05 -0.00428134 -0.000298555 0.000146139 7.92076e-05 -0.00255567 -0.011515 -0.0401667 -0.00996841 ]
x2 is [ 175601 175601 175601 220.79 175601 175601 175601 inf 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 175601 328710 29.3587 0.524377 2900.43 250179 782633 35.9521 450.645 1.64918e+07 292627 876974 2.63644e+07 709524 861388 516860 219390 859918 697494 377242 709336 17305.7 34425 71059.1 275982 175601 218351 218351 218351 215808 7749.94 88676 83961.4 82513.3 14545 1801.96 220.693 2520.36 ]

I may indeed have some features that equal zero sometimes; I will check that. I do have zero stats, but that is indeed not what makes the program crash (actually the logs I show are obtained with stats (artificially) floored to 0.001). I don't know where to find counts, but if it is the "count" variable, then it is no zero-count problem (in the example, count is 878).

It just so happens in this particular case that for the seventh component after scaling (x := x/count, x2 := x2/count), x is 2 and x2 is 4. So x2 - x^2 is zero in this case, and inverting causes it to become infinity. This in turn makes gc NaN in the computation of GConsts.
OK - well, I'll have to look into exactly why it became infinity, and why the variance floor was not applied. But there is a more fundamental problem here. Your features are often exactly zero. This type of model is only applicable when your features are roughly Gaussian distributed. It's not an appropriate model otherwise. A lot of things will break if you have a lot of exact zeros in your data.
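The fix being discussed amounts to flooring the variance before inverting it, so a degenerate dimension yields a large but finite inv_var. A minimal sketch (the helper name is illustrative, not Kaldi's actual code; 0.01 is just the floor value seen in the dump above):

```python
def safe_inv_var(x_sum, x2_sum, count, var_floor=0.01):
    """Invert a per-dimension variance, flooring it first so an exactly
    zero variance gives a finite inv_var instead of inf."""
    mean = x_sum / count
    var = x2_sum / count - mean * mean
    return 1.0 / max(var, var_floor)

# The degenerate component from the logs: x=1756, x2=3512, count=878
iv = safe_inv_var(1756.0, 3512.0, 878.0)  # floored: 1/0.01 = 100.0, not inf
```

With the floor applied, gc stays finite even for a dimension whose stats collapse to zero variance.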

Also - in a few days there will be a Resource Management example with publicly available data. A guy called Vassil Panayotov is working on this.
By the way, I don't know if that would be helpful for you, but I wrote a simple GraphViz-based visualization for the trees, as you can see at the bottom of: http://vpanayotov.blogspot.com/2012/02/poor-mans-kaldi-recipe-setup.html

Indeed, the feature vectors have quite a few zeros in them (about 0.3% of the time).

I get the impression that the feature values come from a discrete space of some kind. If you were to add Gaussian noise to them with some variance comparable to the difference in the original discrete values, it might help.
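Dithering nearly discrete features can be sketched like this (illustrative helper, not part of any Kaldi tool; the noise level should be chosen relative to the spacing of the discrete values):

```python
import random

def dither(frames, noise_stddev):
    """Add small Gaussian noise to each feature value so per-dimension
    variances cannot collapse to exactly zero."""
    return [[v + random.gauss(0.0, noise_stddev) for v in frame]
            for frame in frames]

frames = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]  # a constant, degenerate dim
dithered = dither(frames, noise_stddev=0.01)
```

After dithering, the first dimension is no longer constant across frames, so its accumulated variance is strictly positive.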
As promised, I'm keeping you posted on my experiments. Indeed some features were "very discrete". I removed the most problematic ones while adding Gaussian noise to the others. At the same time I managed to understand the way statistics are accumulated and used.
Still, I get this problem in the tree building process. In the log I get a lot of "Objective function got worse when building tree" warnings, and eventually get stopped by an assertion failure:

build-tree: build-tree-utils.cc:321: kaldi::BaseFloat kaldi::ComputeInitialSplit(const std::vector<kaldi::Clusterable*, std::allocator<kaldi::Clusterable*> >&, const kaldi::Questions&, kaldi::EventKeyType, std::vector<int, std::allocator<int> >*): Assertion `!(this_objf < unsplit_objf - 0.01*(200 + std::abs(unsplit_objf)))' failed.

Could you find the values of objf and unsplit_objf? You may have to use gdb; see the Kaldi tutorial, which has a section on debugging (find it from kaldi.sf.net).

Here is the whole log, which gives the values of objf and unsplit_objf.

(...)
WARNING (build-tree:ComputeInitialSplit():build-tree-utils.cc:320) Objective function got worse when building tree: 3.77481e+06 < 3.77871e+06
WARNING (build-tree:ComputeInitialSplit():build-tree-utils.cc:320) Objective function got worse when building tree: 3.77418e+06 < 3.77871e+06
WARNING (build-tree:ComputeInitialSplit():build-tree-utils.cc:320) Objective function got worse when building tree: 100935 < 102269
build-tree: build-tree-utils.cc:321: kaldi::BaseFloat kaldi::ComputeInitialSplit(const std::vector<kaldi::Clusterable*, std::allocator<kaldi::Clusterable*> >&, const kaldi::Questions&, kaldi::EventKeyType, std::vector<int, std::allocator<int> >*): Assertion `!(this_objf < unsplit_objf - 0.01*(200 + std::abs(unsplit_objf)))' failed.
Aborted

Indeed, with

this_objf=100935
unsplit_objf=102269

the right-hand side value is 101244.31 (> 100935).
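The assertion in ComputeInitialSplit() (build-tree-utils.cc:321, quoted in the log above) requires this_objf >= unsplit_objf - 0.01*(200 + |unsplit_objf|); plugging in the logged values confirms why it fires:

```python
this_objf = 100935.0
unsplit_objf = 102269.0

# Tolerance used by the assertion in build-tree-utils.cc:
# !(this_objf < unsplit_objf - 0.01 * (200 + std::abs(unsplit_objf)))
threshold = unsplit_objf - 0.01 * (200 + abs(unsplit_objf))  # ~101244.31
fires = this_objf < threshold  # the split lost more than the tolerance allows
```

The tolerance permits roughly a 1% (plus a small absolute slack) drop before aborting; here the objective fell by about 1334, well past that margin.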

OK - that could be due to roundoff. Can you try compiling with -DKALDI_DOUBLEPRECISION=1 (modify kaldi.mk, then do make clean and make -j 8)? If this helps, I will look into how to make it work without requiring double-precision compilation.
I did that but still get the assertion failure. Actually, given the values of objf and unsplit_objf, it is normal that the assertion fails. What I don't get, however, is the reason for the choice of the 0.01 and 200 constant values in the assertion statement. I understand that a small difference can be due to roundoff, but we're talking about big differences here (more than 2,000 in the example). Can this nevertheless be due to roundoff? I still need to look a bit more into the code to understand how it works, because I don't understand why the objective function can decrease. I got the questions with "cluster-phones". Maybe I should try to come up with my own set of questions? Or can it still be due to some problems in my features, causing treeacc to contain strange values (though I removed the least Gaussian features and added Gaussian noise)?
The way you got the questions should not matter. It is asserting something that should be mathematically true. Can you create a .tgz file including the command line I need to run, and the files it needs to run the build-tree stage?
Just to summarize a bit, there are two problems I experienced and reported in this thread.

Model initialization
gmm-init-model failed because Kaldi was not able to compute GConsts. But I noticed some features were discrete, which should not happen, and even the (expected) continuous ones were too often zero. So I removed the discrete features and added Gaussian noise to the others. That seems to work.

Tree building
Lots of warnings saying the objective function got worse, and eventually so much worse that an assert failed. Just so I could go on with the training, I commented out this assertion, which obviously is not a proper way to proceed, but it allowed me to see whether the first problem I reported was fixed.
OK, I found and fixed the problem. Actually it was a kind of conceptual bug in the way the variance floor was applied while computing the objective function for Gaussian clustering.
A little update on my situation. I managed to train triphone systems following the provided recipes (again, with handwriting recognition data, since I don't have access to speech data and will work on handwriting recognition during my PhD anyway). I got something like 20% WER with the monophone approach and tried to train a context-dependent model. And the WER is now 99%, so something must be wrong with what I am trying to do.
First, I encountered an issue in the conversion of alignments (after gmm-align, acc-tree-stats, gmm-init-model, ...). I followed one recipe and converted the alignments from the monophone system into new alignments for my initialized context-dependent model. That seems to work. However, when I try to get the old alignments back, I get an error:

>> convert-ali $srcmodel gmm/tri/mdl/1.mdl gmm/tri/mdl/tree ark:gmm/tri/ali/0.ali ark,t:gmm/tri/ali/cur.ali 
LOG (convert-ali:main():convert-ali.cc:138) Succeeded converting alignments for 44194 files, failed for 0
>> convert-ali gmm/tri/mdl/1.mdl $srcmodel gmm/mono/mdl/tree ark:gmm/tri/ali/cur.ali ark,t:gmm/tri/ali/curbak.ali 
basic_filebuf::underflow error reading the file

Is that normal, or might it be the cause of the high WER I get? Note: when I build a tree with questions only about HMM states, it works and I get a reasonable WER. When I ask questions about context, I end up with a very bad model and also low counts for some states in the gmm-init-model logs.

This is weird. I have never seen this error before. See if you can find any earlier errors that might have caused this, or figure out what it's trying to read when it does this (e.g. run in gdb, and type "catch throw" to see the stack trace when it throws the exception).
