锐英源软件
How to train a DNN frame by frame


I wish to train a DNN frame by frame, feeding each MFCC vector to 39 input nodes and its corresponding label (an array of fewer values) to the output nodes.

Please advise which tools in Kaldi will help me do that. I went through the list of tools in the src/nnet2bin/ directory and could see that nnet-init and nnet-train-simple could help, but I am still not clear on the exact process to follow.


You need to look at the example scripts. There are two separate setups, each with its own scripts:
egs//s5/local/run_dnn.sh, and
egs//s5/local/run_nnet2.sh

 

Thanks for the prompt reply. Following it, I looked at wsj/s5/local/run_dnn.sh and found that "nnet-train-frmshuff" is the tool the script uses to train the DNN. It is called with the following arguments:
nnet-train-frmshuff [options] (feature-rspecifier) (targets-rspecifier) (model-in) (model-out)

I wish to feed the MFCC values frame after frame. For that, I created an ark file, "file.ark", with the key in the first column and the MFCC values in the remaining columns.


$ head -2 file.ark
frame_1 34.805 -29.978 -4.929 -1.611 4.289 -5.250 1.293 -0.473 -6.691 -3.241
frame_2 35.042 -28.817 -4.269 -8.063 -3.532 1.059 2.302 1.211 -7.967 -5.489

I would like to know:
i) Can I feed this file directly as <feature-rspecifier> to the above tool?
ii) If not, how do I write a .scp file for the above file.ark?
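
For context on the file layout: Kaldi's feature tools generally expect one matrix per utterance key, not one keyed line per frame. The sketch below is not part of the original thread. It shows how the per-frame lines above could be regrouped into Kaldi's text-format matrix layout (key, then rows inside square brackets). The function name and the utterance id "utt1" are hypothetical, and the sample is shortened to three values per frame.

```python
def frames_to_kaldi_text_matrix(lines, utt_id):
    """Group per-frame lines ("key v1 v2 ...") into one text-format matrix.

    Assumes every line is one frame of the same utterance; the per-frame
    keys (frame_1, frame_2, ...) are dropped and replaced by utt_id.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or key-only lines
        values = [float(x) for x in fields[1:]]
        if rows and len(values) != len(rows[0]):
            raise ValueError("inconsistent feature dimension")
        rows.append(values)
    if not rows:
        raise ValueError("no frames found")
    # One row per frame; the closing bracket sits on the last row.
    body = "\n".join("  " + " ".join(str(v) for v in row) for row in rows)
    return f"{utt_id}  [\n{body} ]\n"


sample = [
    "frame_1 34.805 -29.978 -4.929",
    "frame_2 35.042 -28.817 -4.269",
]
text = frames_to_kaldi_text_matrix(sample, "utt1")
print(text, end="")
```

A text archive written this way can usually be converted to a binary archive plus index with something like `copy-feats ark,t:feats.txt ark,scp:feats.ark,feats.scp`, which writes the .scp file for you. This only covers the file format; it does not supply the temporal context or alignments that the training recipes rely on.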


I don't think you are going in the right direction here. The neural net training recipes use temporal context for the features, so they can't accept frame-by-frame input, and they require the label information. The right way to do this is to prepare your data as described in the "data preparation" part of the documentation at kaldi.sf.net, train your baseline GMM-based system (e.g. following one of the run.sh sequences in one of the example directories), and then do DNN training from there.
If you want to set it up as a standard machine-learning task with independent labels and training examples, and you just want to classify frames, like people sometimes do on TIMIT, I suggest you use some other setup. The purpose of Kaldi is to train real speech recognition systems; it's not really constructed as a generic machine-learning research tool.
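
To illustrate what "temporal context" means in the reply above, here is a minimal sketch (not from the thread) of frame splicing: each frame is concatenated with its neighbours before it reaches the network, so the input is a window of frames, not a single one. The function name, the context widths, and the choice to repeat edge frames at utterance boundaries are assumptions made for this illustration, not a description of any particular Kaldi recipe.

```python
def splice_frames(frames, left=2, right=2):
    """Concatenate each frame with `left` previous and `right` following frames.

    At the start and end of the utterance the first/last frame is repeated
    (an assumption for this sketch), so every output vector has dimension
    (left + 1 + right) * feature_dim.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid frame range
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Three toy 2-dimensional frames, spliced with one frame of context per side.
frames = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
out = splice_frames(frames, left=1, right=1)
for vec in out:
    print(vec)
```

With 39-dimensional MFCCs and, say, five frames of context on each side, the network input would have 39 × 11 = 429 values per frame, which is why a file of isolated, independently keyed frames does not match what these recipes consume.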


Copyright (c) 2004-2021 锐英源软件. All rights reserved.