Can we do Speaker Adapted Training using a p-norm DNN model trained by local/online/run_nnet2_ms.sh?
The DNN is already trained in a speaker-adaptive fashion -- the speaker identity is captured in those iVectors you have to train before training the DNN.
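For reference, the iVector stage that run_nnet2_ms.sh runs before DNN training looks roughly like the sketch below; the directory names, UBM size, and job counts are illustrative, so check the actual script for the exact arguments:

    # Train a diagonal UBM on the GMM system's features (exp/tri4b stands in
    # for whatever GMM directory the recipe actually uses).
    steps/online/nnet2/train_diag_ubm.sh --cmd "$train_cmd" --nj 30 \
      --num-frames 400000 data/train 512 exp/tri4b exp/nnet2_online/diag_ubm

    # Train the iVector extractor on top of the UBM.
    steps/online/nnet2/train_ivector_extractor.sh --cmd "$train_cmd" --nj 10 \
      data/train exp/nnet2_online/diag_ubm exp/nnet2_online/extractor

    # Extract online iVectors; these become extra DNN inputs, which is how
    # the speaker identity gets into the network.
    steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 30 \
      data/train exp/nnet2_online/extractor exp/nnet2_online/ivectors_train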
Generally speaking, leaving this particular script aside, can we do Speaker Adapted Training with a DNN model?
You can definitely do speaker-adaptive training on a DNN -- for example, as
demonstrated in that script. :)
Perhaps you should explain in more detail what you are after.
Let me explain: we already have a DNN model trained by local/online/run_nnet2_ms.sh, and now, at decoding time, we want to improve the recognition rate for specific speakers through speaker adaptation. Are there any solutions in Kaldi?
It already does adaptation, so there is nothing more you can do other than
keeping the adaptation history (there is a class in the code, something like
SpeakerAdaptationState). You could experiment with downweighting silence (see
the script), and with the --max-remembered-frames and --max-count options to
see if tuning them helps. Some of these options are in the iVector extraction
config.
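To make that concrete: the class referred to above is, as far as I can tell, OnlineIvectorExtractorAdaptationState in src/online2/online-ivector-feature.h, and the options live in the config written out when the model is prepared for online decoding. A minimal sketch, with illustrative paths and values rather than recommendations:

    # The tuning options sit in the iVector extraction config of the
    # online-decoding directory; defaults are something like:
    #   --max-remembered-frames=1000
    #   --max-count=0
    cat exp/nnet2_online/nnet_ms_online/conf/ivector_extractor.conf

    # When decoding with online2-wav-nnet2-latgen-faster, passing a spk2utt
    # map groups utterances by speaker, so the iVector adaptation state is
    # carried across a speaker's utterances instead of being reset each time.
    online2-wav-nnet2-latgen-faster \
      --config=exp/nnet2_online/nnet_ms_online/conf/online_nnet2_decoding.conf \
      exp/nnet2_online/nnet_ms_online/final.mdl \
      exp/nnet2_online/nnet_ms/graph/HCLG.fst \
      ark:data/test/spk2utt scp:data/test/wav.scp ark:/dev/null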
I guess you could try discriminative training on top of the network. I think there is an example in the wsj or swbd egs.
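If you want to try that, the nnet2 discriminative-training flow looks roughly like the sketch below, assuming the model lives in exp/nnet2_online/nnet_ms; all paths, job counts, and the sMBR criterion choice are illustrative, and for an online multi-splice model you would also pass the matching --online-ivector-dir to each step:

    # Align the training data with the existing DNN.
    steps/nnet2/align.sh --nj 30 --cmd "$train_cmd" \
      data/train data/lang exp/nnet2_online/nnet_ms exp/nnet2_online/nnet_ms_ali

    # Generate denominator lattices for the discriminative objective.
    steps/nnet2/make_denlats.sh --nj 30 --cmd "$decode_cmd" \
      data/train data/lang exp/nnet2_online/nnet_ms exp/nnet2_online/nnet_ms_denlats

    # Run sMBR training starting from the existing network.
    steps/nnet2/train_discriminative.sh --cmd "$decode_cmd" --criterion smbr \
      data/train data/lang exp/nnet2_online/nnet_ms_ali \
      exp/nnet2_online/nnet_ms_denlats exp/nnet2_online/nnet_ms/final.mdl \
      exp/nnet2_online/nnet_ms_smbr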