Can we do Speaker Adapted Training using a p-norm DNN model trained by local/online/run_nnet2_ms.sh?
The DNN is already trained in a speaker-adaptive fashion -- the speaker identity is captured in those iVectors you have to train before training the DNN.
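For reference, the iVector stage that run_nnet2_ms.sh runs before DNN training looks roughly like the sketch below; the directory names, UBM size, and job counts are illustrative, so check the actual script for the exact arguments:

    # Train a diagonal UBM on the GMM system's features (exp/tri4b stands in
    # for whatever GMM directory the recipe actually uses).
    steps/online/nnet2/train_diag_ubm.sh --cmd "$train_cmd" --nj 30 \
      --num-frames 400000 data/train 512 exp/tri4b exp/nnet2_online/diag_ubm

    # Train the iVector extractor on top of the UBM.
    steps/online/nnet2/train_ivector_extractor.sh --cmd "$train_cmd" --nj 10 \
      data/train exp/nnet2_online/diag_ubm exp/nnet2_online/extractor

    # Extract online iVectors; these become extra DNN inputs, which is how
    # the speaker identity gets into the network.
    steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 30 \
      data/train exp/nnet2_online/extractor exp/nnet2_online/ivectors_train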
Generally speaking, leaving this particular script aside, can we do Speaker Adapted Training with a DNN model?
You can definitely do speaker-adaptive training on a DNN -- for example, as
demonstrated in that script. :)
Perhaps you should explain in more detail what you are after.
Let me explain: we already have a DNN model trained by local/online/run_nnet2_ms.sh, and now, at decoding time, we want to improve the recognition rate for specific speakers through speaker adaptation. Are there any solutions in Kaldi?
It already does adaptation, so there is nothing more you can do other than
keeping the adaptation history (there is a class in the code, something like
SpeakerAdaptationState). You could experiment with downweighting silence (see
the script), and with the --max-remembered-frames and --max-count options to
see if tuning them helps. Some of these options are in the iVector extraction
config.
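To make that concrete: the class referred to above is, as far as I can tell, OnlineIvectorExtractorAdaptationState in src/online2/online-ivector-feature.h, and the options live in the config written out when the model is prepared for online decoding. A minimal sketch, with illustrative paths and values rather than recommendations:

    # The tuning options sit in the iVector extraction config of the
    # online-decoding directory; defaults are something like:
    #   --max-remembered-frames=1000
    #   --max-count=0
    cat exp/nnet2_online/nnet_ms_online/conf/ivector_extractor.conf

    # When decoding with online2-wav-nnet2-latgen-faster, passing a spk2utt
    # map groups utterances by speaker, so the iVector adaptation state is
    # carried across a speaker's utterances instead of being reset each time.
    online2-wav-nnet2-latgen-faster \
      --config=exp/nnet2_online/nnet_ms_online/conf/online_nnet2_decoding.conf \
      exp/nnet2_online/nnet_ms_online/final.mdl \
      exp/nnet2_online/nnet_ms/graph/HCLG.fst \
      ark:data/test/spk2utt scp:data/test/wav.scp ark:/dev/null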
I guess you could try discriminative training on top of the network. I think there is an example in the wsj or swbd egs.
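If you want to try that, the nnet2 discriminative-training flow looks roughly like the sketch below, assuming the model lives in exp/nnet2_online/nnet_ms; all paths, job counts, and the sMBR criterion choice are illustrative, and for an online multi-splice model you would also pass the matching --online-ivector-dir to each step:

    # Align the training data with the existing DNN.
    steps/nnet2/align.sh --nj 30 --cmd "$train_cmd" \
      data/train data/lang exp/nnet2_online/nnet_ms exp/nnet2_online/nnet_ms_ali

    # Generate denominator lattices for the discriminative objective.
    steps/nnet2/make_denlats.sh --nj 30 --cmd "$decode_cmd" \
      data/train data/lang exp/nnet2_online/nnet_ms exp/nnet2_online/nnet_ms_denlats

    # Run sMBR training starting from the existing network.
    steps/nnet2/train_discriminative.sh --cmd "$decode_cmd" --criterion smbr \
      data/train data/lang exp/nnet2_online/nnet_ms_ali \
      exp/nnet2_online/nnet_ms_denlats exp/nnet2_online/nnet_ms/final.mdl \
      exp/nnet2_online/nnet_ms_smbr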