锐英源软件
DNN nnet2 online decoding gives incorrect results


I have successfully trained a DNN nnet2 online model with MFCC features, so I repeated the same procedure with fbank features.
However, I got an unexpectedly high WER: almost every sentence is recognized incorrectly.

I have checked compute_prob_valid.*.log and it looks fine. With MFCC the final value is 0.6065, and with fbank it is 0.573.

The command for the decoding is:
online2-wav-nnet2-latgen-faster --online=true --do-endpointing=false --config=online_nnet2_decoding.conf --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=0.1 --word-symbol-table=words.txt final.mdl HCLG.fst ...

I also checked online_nnet2_decoding.conf. It was generated correctly for fbank:
--feature-type=fbank
--fbank-config=...fbank.conf
--ivector-extraction-config=...ivector_extractor.conf
--endpoint.silence-phones=...

I would appreciate any hints that could help me find the problem!

I don't think I have ever run the setup with fbank features. There is really no point, because we use MFCCs without dimension
reduction, which are just a linearly transformed version of the fbank features. It is possible that there is a bug somewhere. Decode
with a higher verbose level and look for the objective-function changes reported for the iVectors. (You re-trained the iVector
extractor on top of fbank features, right?) That would help narrow down whether something is going wrong with the iVectors.
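The "linearly transformed" remark can be illustrated with a small sketch. It assumes only the DCT step that turns log filterbank energies into cepstra, with every coefficient kept; other steps a real front end may apply (for example liftering) are left out, and the input values are made up for illustration. With no coefficients dropped, the transform is an invertible matrix, so the cepstra carry the same information as the filterbank values.

```python
import math


def dct_matrix(n):
    """Build the n x n orthonormal DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append(
            [scale * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n)]
        )
    return rows


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Made-up log filterbank energies for one frame (illustrative values only).
log_mel = [2.0, 1.5, 0.3, -0.7, 1.1, 0.0, -1.2, 0.4]

C = dct_matrix(len(log_mel))

# Cepstra with no dimension reduction: as many coefficients as filterbank bins.
cepstra = matvec(C, log_mel)

# The matrix is orthonormal, so its transpose undoes the transform.
recovered = matvec(transpose(C), cepstra)

max_err = max(abs(a - b) for a, b in zip(log_mel, recovered))
print("max reconstruction error:", max_err)
```

If the higher-order coefficients were dropped, the matrix would no longer be square and the filterbank values could not be recovered exactly, which is why the remark is specific to MFCCs without dimension reduction.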

 

If so, why are the results from fbank and MFCC features slightly different?
Also, when I combined the results from those two systems, I got some improvement (based on an earlier experiment).

Regarding "Decode with a higher verbose level":

The objective function improvement from estimating the iVector looks correct: it increases as more frames are seen.

Do you think the problem is in the graph HCLG.fst?

VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8562
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8981
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.0219
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.2964
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.6309
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.8939
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 67.1176
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 67.5152
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 68.459
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 69.3028
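A log like the one above can be summarized with a short script. This is a minimal sketch: the function names, the reference level of 10 and the factor of 3 used for flagging are assumptions chosen for illustration and are not part of Kaldi. The sample text reuses three of the lines shown above.

```python
import re

# Matches the number at the end of the VLOG message shown in the log.
OBJF_RE = re.compile(
    r"Objective function improvement from estimating the iVector "
    r"\(vs\. default value\) is (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def objf_improvements(log_text):
    """Return every reported objective-function improvement as a float."""
    return [float(m.group(1)) for m in OBJF_RE.finditer(log_text)]


def summarize(values, expected=10.0, factor=3.0):
    """Return (mean, flagged). 'expected' and 'factor' are illustrative
    assumptions: the run is flagged when the mean exceeds expected * factor."""
    mean = sum(values) / len(values)
    return mean, mean > expected * factor


sample_log = """\
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8562
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 65.8981
VLOG[4] (online2-wav-nnet2-latgen-faster:GetIvector():ivector-extractor.cc:650) Objective function improvement from estimating the iVector (vs. default value) is 66.0219
"""

values = objf_improvements(sample_log)
mean, flagged = summarize(values)
print("count:", len(values), "mean:", round(mean, 4), "flagged:", flagged)
```

The same function could be run over the iVector extractor training logs so that the decoding-time and training-time averages can be compared side by side.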


Those objective function improvements are too large; they should be around 10. This could indicate a mismatch in the iVector extractor
(e.g. trained on the wrong data, or a mismatch in CMVN?).
What were the objf improvements like in training? The averages should have been printed in the log.

I found the mistake: I was using the wrong iVector extractor.

Thank you so much for your advice.
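A mix-up like this can be caught by checking where the paths in the decoding config point. The sketch below parses "--name=value" lines and reports path-valued options that fall outside an expected experiment directory. The directory names and the sample config are hypothetical, invented for illustration; the thread does not say which paths were actually involved.

```python
def parse_kaldi_config(text):
    """Parse '--name=value' lines into a dict, skipping blank lines
    and anything after a '#' comment marker."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("--") or "=" not in line:
            continue
        name, value = line[2:].split("=", 1)
        options[name.strip()] = value.strip()
    return options


def options_outside(options, expected_dir):
    """Return the path-valued options that do not live under expected_dir."""
    prefix = expected_dir.rstrip("/") + "/"
    return {
        name: value
        for name, value in options.items()
        if "/" in value and not value.startswith(prefix)
    }


# Hypothetical config: the iVector config still points at an MFCC experiment.
sample_conf = """\
--feature-type=fbank
--fbank-config=exp/nnet2_online_fbank/conf/fbank.conf
--ivector-extraction-config=exp/nnet2_online_mfcc/conf/ivector_extractor.conf
"""

suspicious = options_outside(
    parse_kaldi_config(sample_conf), "exp/nnet2_online_fbank"
)
for name, value in suspicious.items():
    print("check --%s=%s" % (name, value))
```

The iVector extraction config is itself a file of the same form, so the same check can be repeated on it to see which extractor files it references.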
