I am using nnet2/train_pnorm_fast.sh to train DNNs. However, I always observe better performance (lower WER) when I have only one hidden layer. In my first experiments I used only 10 hours of data, so I thought maybe that was because I didn't have enough data to train the NNs. But I also observe a consistent reduction in accuracy as I increase the number of hidden layers from 1 to 5 (almost a 2% reduction per layer) when I have about 40 hours of data.
This happens in both the monolingual and multilingual cases.
Here is a sample of the command I use, which is mainly taken from run_online_decoding_nnet2_wsj.sh:
steps/nnet2/train_pnorm_fast.sh --stage $train_stage \
  --splice-width 7 \
  --feat-type raw \
  --cmvn-opts "--norm-means=false --norm-vars=false" \
  --num-threads "$num_threads" \
  --minibatch-size "$minibatch_size" \
  --parallel-opts "$parallel_opts" \
  --num-jobs-nnet 1 \
  --num-epochs-extra 10 --add-layers-period 1 \
  --num-hidden-layers $nl \
  --mix-up 4000 \
  --initial-learning-rate 0.02 --final-learning-rate 0.004 \
  --cmd "$decode_cmd" \
  --pnorm-input-dim 3000 \
  --pnorm-output-dim 00 \
  $sodir/data_mfcc/train $sodir/data_mfcc/lang $sodir/exp_mfcc/tri1_ali $dir || exit 1;
Does anyone have any suggestions as to why this happens? Does it have anything to do with the parameters?
I have seen a number of papers reporting 4 or 5 hidden layers to be a good choice for more or less the same experiments I do...
The learning rate might be too large (I'm using something like --initial-learning-rate 0.008 and --final-learning-rate 0.0008). You will have to experiment with that. Also, "--pnorm-output-dim 00" looks suspicious.
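A rough sketch of the question's command with those smaller learning rates, assuming the intended pnorm output dim was 300 rather than 00; all other options are unchanged from the question, and these values are only a starting point to tune from:

# Same command as in the question, with lower learning rates and the
# pnorm output dim set to 300 (assumed intent); other options unchanged.
steps/nnet2/train_pnorm_fast.sh --stage $train_stage \
  --splice-width 7 \
  --feat-type raw \
  --cmvn-opts "--norm-means=false --norm-vars=false" \
  --num-threads "$num_threads" \
  --minibatch-size "$minibatch_size" \
  --parallel-opts "$parallel_opts" \
  --num-jobs-nnet 1 \
  --num-epochs-extra 10 --add-layers-period 1 \
  --num-hidden-layers $nl \
  --mix-up 4000 \
  --initial-learning-rate 0.008 --final-learning-rate 0.0008 \
  --cmd "$decode_cmd" \
  --pnorm-input-dim 3000 \
  --pnorm-output-dim 300 \
  $sodir/data_mfcc/train $sodir/data_mfcc/lang $sodir/exp_mfcc/tri1_ali $dir || exit 1;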
You could also have too many parameters: try 2000/200 instead of 3000/300.
Also, the currently recommended script is train_pnorm_simple2.sh.
This uses less disk space and is a bit faster than train_pnorm_fast.sh.
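A minimal sketch of the same run switched over to train_pnorm_simple2.sh with the smaller 2000/200 dims; the option names here are assumed to carry over from train_pnorm_fast.sh where the two scripts overlap (the epoch-related options differ, so they are left at their defaults), and the script's own usage message should be checked before running:

# Sketch only: train_pnorm_simple2.sh with the suggested 2000/200 dims and
# the lower learning rates from above; epoch options left at their defaults.
steps/nnet2/train_pnorm_simple2.sh --stage $train_stage \
  --splice-width 7 --feat-type raw \
  --cmvn-opts "--norm-means=false --norm-vars=false" \
  --num-threads "$num_threads" --minibatch-size "$minibatch_size" \
  --parallel-opts "$parallel_opts" --num-jobs-nnet 1 \
  --num-hidden-layers $nl --mix-up 4000 \
  --initial-learning-rate 0.008 --final-learning-rate 0.0008 \
  --pnorm-input-dim 2000 --pnorm-output-dim 200 \
  --cmd "$decode_cmd" \
  $sodir/data_mfcc/train $sodir/data_mfcc/lang $sodir/exp_mfcc/tri1_ali $dir || exit 1;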
Also, in order to verify that what's going on is overtraining, you can check the final train/valid objective functions:
grep LOG exp/foo/log/compute_prob_*.final.log
If overtraining is what is happening, then as you add layers you should see the training objective function increasing and the validation objective function decreasing.
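For example, a small sketch to line up those numbers across runs with different numbers of hidden layers; the exp/mono_${nl}layer directory naming is hypothetical and should be replaced with the actual experiment directories:

# Compare final train/valid objectives across the 1- to 5-layer runs.
# NOTE: exp/mono_${nl}layer is a hypothetical naming scheme; substitute
# the real experiment directories.
for nl in 1 2 3 4 5; do
  echo "== ${nl} hidden layer(s) =="
  grep LOG exp/mono_${nl}layer/log/compute_prob_*.final.log
done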