QR decomposition fails in nnet2 training (cross-entropy or discriminative)

I am using nnet2 to train a ReLU DNN, and QrStep often fails with the assertion "failed: KALDI_ISFINITE(x)".
In cross-entropy training, I set the learning rate lower (0.001) than the one I use for sigmoid (0.01), and the failure did not happen again. But in discriminative training (--num-jobs-nnet 4), I set the learning rate to 0.000006 (a very small value, I think), and once training passed iteration 70, almost all of the 4 jobs failed.

Matrix operations may have a numerical stability problem.

I haven't dug into the QR code; I am just reporting this.

Failure log:

nnet-combine-egs-discriminative ark:exp/fbank40_h5t2048_ReLU_degs/degs.$[((2-1+(90*4))%1071)+1].ark ark:- | nnet-train-discriminative-simple --silence-phones=1 --criterion=smbr --drop-frames=false --one-silence-class=true --boost=0.0 --acoustic-scale=0.1 --gpu-id=1 exp/fbank40_h5t2048_ReLU_NP_smbr_0.000006/90.mdl ark:- exp/fbank40_h5t2048_ReLU_NP_smbr_0.000006/91.2.mdl
Started at Tue Apr 7 10:36:43 CST 2015
nnet-combine-egs-discriminative ark:exp/fbank40_h5t2048_ReLU_degs/degs.362.ark ark:-
nnet-train-discriminative-simple --silence-phones=1 --criterion=smbr --drop-frames=false --one-silence-class=true --boost=0.0 --acoustic-scale=0.1 --gpu-id=1 exp/fbank40_h5t2048_ReLU_NP_smbr_0.000006/90.mdl ark:- exp/fbank40_h5t2048_ReLU_NP_smbr_0.000006/91.2.mdl
...

LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.836495, for component index 3
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.75548, for component index 3
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.660106, for component index 3
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.623963, for component index 5
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.226656, for component index 3
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.965409, for component index 13
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.63207, for component index 3
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.960904, for component index 3
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.632257, for component index 3
LOG (nnet-train-discriminative-simple:GetScalingFactor():nnet-component.cc:1911) Limiting step size using scaling factor 0.979007, for component index 13
KALDI_ASSERT: at nnet-train-discriminative-simple:QrStep:qr.cc:265, failed: KALDI_ISFINITE(x)
Stack trace is:
kaldi::KaldiGetStackTrace()
kaldi::KaldiAssertFailure_(char const, char const, int, char const)
void kaldi::QrStep<float>(int, float, float, kaldi::MatrixBase<float>)
void kaldi::QrInternal<float>(int, float, float, kaldi::MatrixBase<float>)
kaldi::SpMatrix<float>::Qr(kaldi::MatrixBase<float>)
.
.
.
kaldi::nnet2::NnetDiscriminativeUpdater::Update()
kaldi::nnet2::NnetDiscriminativeUpdate(kaldi::nnet2::AmNnet const&, kaldi::TransitionModel const&, kaldi::nnet2::NnetDiscriminativeUpdateOptions const&, kaldi::nnet2::DiscriminativeNnetExample const&, kaldi::nnet2::Nnet, kaldi::nnet2::NnetDiscriminativeStats)
nnet-train-discriminative-simple(main+0x50f) [0x65c03c]
/lib/x86_64-linux-gnu/libc.so.6(libc_start_main+0xf5) [0x7fe77c59eec5]
nnet-train-discriminative-simple() [0x65ba69]
WARNING (nnet-train-discriminative-simple:~Mutex():kaldi-mutex.cc:45) Error destroying pthread mutex; ignoring it as it could be a known issue that affects Haswell processors, see https://sourceware.org/bugzilla/show_bug.cgi?id=16657 If your processor is not Haswell and you see this message, it could be a bug in Kaldi. However it could be that multi-threaded code terminated messily.
KALDI_ASSERT: at nnet-train-discriminative-simple:QrStep:qr.cc:265, failed: KALDI_ISFINITE(x)
Stack trace is:
kaldi::KaldiGetStackTrace()
kaldi::KaldiAssertFailure_(char const, char const, int, char const)
void kaldi::QrStep<float>(int, float, float, kaldi::MatrixBase<float>)
void kaldi::QrInternal<float>(int, float, float, kaldi::MatrixBase<float>)
kaldi::SpMatrix<float>::Qr(kaldi::MatrixBase<float>)
.
.
.
kaldi::nnet2::NnetDiscriminativeUpdater::Update()
kaldi::nnet2::NnetDiscriminativeUpdate(kaldi::nnet2::AmNnet const&, kaldi::TransitionModel const&, kaldi::nnet2::NnetDiscriminativeUpdateOptions const&, kaldi::nnet2::DiscriminativeNnetExample const&, kaldi::nnet2::Nnet, kaldi::nnet2::NnetDiscriminativeStats)
nnet-train-discriminative-simple(main+0x50f) [0x65c03c]
/lib/x86_64-linux-gnu/libc.so.6(libc_start_main+0xf5) [0x7fe77c59eec5]
nnet-train-discriminative-simple() [0x65ba69]

Accounting: time=4 threads=1
Ended (code 255) at Tue Apr 7 10:36:47 CST 2015, elapsed time 4 seconds
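
The archive number in the first log line comes from the shell arithmetic `$[((2-1+(90*4))%1071)+1]`, which the next log line shows expanded to `degs.362.ark`. A minimal sketch of that arithmetic follows. The parameter names are hypothetical labels and the meaning given to each number is an assumption read off the command line (job 2 from `91.2.mdl`, iteration 90 from `90.mdl`, 4 jobs from `--num-jobs-nnet 4`, 1071 as the archive count), not something taken from the Kaldi scripts.

```python
def archive_index(job, iteration, num_jobs, num_archives):
    """Mirror the shell expression ((job-1 + iteration*num_jobs) % num_archives) + 1.

    All four parameter names are hypothetical; only the arithmetic
    itself is taken from the logged command.
    """
    return ((job - 1 + iteration * num_jobs) % num_archives) + 1


# Values as they appear in the logged command line.
print(archive_index(job=2, iteration=90, num_jobs=4, num_archives=1071))  # 362
```

Under that reading, each job on each iteration picks a different archive and wraps around after the last one.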

 

 

This isn't a problem with matrix operations; it's an instability of stochastic gradient descent when using nonlinearities with unbounded outputs, such as ReLU. We normally solve this by putting a NormalizeComponent after each ReLU component.
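
To illustrate why a normalization step after ReLU bounds the activations, here is a minimal pure-Python sketch. It assumes the normalization rescales each activation vector so that its root-mean-square is 1, which is my understanding of what NormalizeComponent does. The function names and the `floor` guard value are hypothetical choices for this example, not Kaldi's API or constants.

```python
import math


def relu(row):
    # ReLU has no upper bound: large inputs pass through unchanged.
    return [max(0.0, x) for x in row]


def rms(row):
    # Root-mean-square of one activation vector.
    return math.sqrt(sum(x * x for x in row) / len(row))


def normalize_rms(row, floor=1e-20):
    # Rescale the row so that its RMS is 1. `floor` is an illustrative
    # guard against division by zero for an all-zero row (assumed value).
    mean_sq = max(sum(x * x for x in row) / len(row), floor)
    scale = 1.0 / math.sqrt(mean_sq)
    return [x * scale for x in row]


small = relu([0.5, -1.0, 2.0, 0.1])
huge = relu([5e6, -1e7, 2e7, 1e6])

# Before normalization the scale of the activations differs by many
# orders of magnitude; after it, both rows have RMS close to 1.
print(rms(small), rms(normalize_rms(small)))
print(rms(huge), rms(normalize_rms(huge)))
```

Whatever the scale of the weights feeding the ReLU, the next layer then sees inputs of roughly constant magnitude, which is what keeps a large update from compounding into non-finite values.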

 

 

Actually, if your version of Kaldi is not up to date, it's possible that it is a problem with QR. But it has been stable for a long time now.

 

 

A learning rate of 0.000001 is OK; with 0.000002, training failed. It is tricky.
