I am using nnet2/train_pnorm_fast.sh to train DNNs. However, I always observe better performance (lower WER) when I have only one hidden layer. In my first experiments I used only 10 hours of data, so I thought maybe that was because I didn't have enough data to train the NNs. But I also observe a consistent reduction in accuracy as I increase the number of hidden layers from 1 to 5 (almost a 2% reduction per layer) when I have about 40 hours of data.
This happens in both the monolingual and multilingual cases.
Here is a sample of the command I use, which is mainly taken from run_online_decoding_nnet2_wsj.sh:
steps/nnet2/train_pnorm_fast.sh --stage $train_stage \
  --splice-width 7 \
  --feat-type raw \
  --cmvn-opts "--norm-means=false --norm-vars=false" \
  --num-threads "$num_threads" \
  --minibatch-size "$minibatch_size" \
  --parallel-opts "$parallel_opts" \
  --num-jobs-nnet 1 \
  --num-epochs-extra 10 --add-layers-period 1 \
  --num-hidden-layers $nl \
  --mix-up 4000 \
  --initial-learning-rate 0.02 --final-learning-rate 0.004 \
  --cmd "$decode_cmd" \
  --pnorm-input-dim 3000 \
  --pnorm-output-dim 00 \
  $sodir/data_mfcc/train $sodir/data_mfcc/lang $sodir/exp_mfcc/tri1_ali $dir || exit 1;
Does anyone have any suggestions as to why this happens? Does it have anything to do with the parameters?
I have seen a number of papers reporting 4 or 5 hidden layers to be a good choice for more or less the same experiments I do...
The learning rate might be too large (I'm using something like --initial-learning-rate 0.008 and --final-learning-rate 0.0008). You will have to experiment with that. Also, "--pnorm-output-dim 00" looks suspicious.
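A rough sketch of the question's command with those smaller learning rates, assuming the intended pnorm output dim was 300 rather than 00; all other options are unchanged from the question, and these values are only a starting point to tune from:

# Same command as in the question, with lower learning rates and the
# pnorm output dim set to 300 (assumed intent); other options unchanged.
steps/nnet2/train_pnorm_fast.sh --stage $train_stage \
  --splice-width 7 \
  --feat-type raw \
  --cmvn-opts "--norm-means=false --norm-vars=false" \
  --num-threads "$num_threads" \
  --minibatch-size "$minibatch_size" \
  --parallel-opts "$parallel_opts" \
  --num-jobs-nnet 1 \
  --num-epochs-extra 10 --add-layers-period 1 \
  --num-hidden-layers $nl \
  --mix-up 4000 \
  --initial-learning-rate 0.008 --final-learning-rate 0.0008 \
  --cmd "$decode_cmd" \
  --pnorm-input-dim 3000 \
  --pnorm-output-dim 300 \
  $sodir/data_mfcc/train $sodir/data_mfcc/lang $sodir/exp_mfcc/tri1_ali $dir || exit 1;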
You could also have too many parameters: try 2000/200 instead of 3000/300.
Also, the currently recommended script is train_pnorm_simple2.sh.
This uses less disk space and is a bit faster than train_pnorm_fast.sh.
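A minimal sketch of the same run switched over to train_pnorm_simple2.sh with the smaller 2000/200 dims; the option names here are assumed to carry over from train_pnorm_fast.sh where the two scripts overlap (the epoch-related options differ, so they are left at their defaults), and the script's own usage message should be checked before running:

# Sketch only: train_pnorm_simple2.sh with the suggested 2000/200 dims and
# the lower learning rates from above; epoch options left at their defaults.
steps/nnet2/train_pnorm_simple2.sh --stage $train_stage \
  --splice-width 7 --feat-type raw \
  --cmvn-opts "--norm-means=false --norm-vars=false" \
  --num-threads "$num_threads" --minibatch-size "$minibatch_size" \
  --parallel-opts "$parallel_opts" --num-jobs-nnet 1 \
  --num-hidden-layers $nl --mix-up 4000 \
  --initial-learning-rate 0.008 --final-learning-rate 0.0008 \
  --pnorm-input-dim 2000 --pnorm-output-dim 200 \
  --cmd "$decode_cmd" \
  $sodir/data_mfcc/train $sodir/data_mfcc/lang $sodir/exp_mfcc/tri1_ali $dir || exit 1;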
Also, in order to verify that what's going on is overtraining, you can check the final train/valid objective functions:
grep LOG exp/foo/log/compute_prob_*.final.log
If overtraining is what is happening, then as you add layers you should see the training objective function increasing and the validation objective function decreasing.
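For example, a small sketch to line up those numbers across runs with different numbers of hidden layers; the exp/mono_${nl}layer directory naming is hypothetical and should be replaced with the actual experiment directories:

# Compare final train/valid objectives across the 1- to 5-layer runs.
# NOTE: exp/mono_${nl}layer is a hypothetical naming scheme; substitute
# the real experiment directories.
for nl in 1 2 3 4 5; do
  echo "== ${nl} hidden layer(s) =="
  grep LOG exp/mono_${nl}layer/log/compute_prob_*.final.log
done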