When I traced the online2/online-feature-pipeline.cc source code, I found that the feature extraction procedure in the GMM decoder looks like this (assuming no pitch):
OnlineMfcc -> OnlineCmvn -> OnlineSpliceFrames -> OnlineTransform
But in the online2/online-nnet2-feature-pipeline.cc source code, the feature extraction procedure in the DNN decoder looks like this (assuming no pitch and no i-vector):
OnlineMfcc
My questions are:
1. Why is online CMVN not applied in the feature extraction procedure of the DNN decoder?
2. How can I apply online CMVN in the feature extraction procedure of the DNN decoder?
I tried applying CMVN to the feature extraction in the DNN decoder, GMM-style, but this reduced the accuracy.
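For readers unfamiliar with the term, here is a minimal sketch of what sliding-window cepstral mean normalization does to a stream of feature frames. It is not Kaldi's OnlineCmvn implementation: the function name `sliding_cmn`, the default `window` value, and the mean-only simplification (no variance normalization, no prior statistics) are all assumptions made for illustration.

```python
from collections import deque


def sliding_cmn(frames, window=600):
    """Subtract a running per-dimension mean from each frame.

    `frames` is a list of equal-length lists of floats (e.g. MFCC vectors).
    The mean is taken over the current frame and up to `window - 1`
    preceding frames. Hypothetical helper, mean normalization only.
    """
    if not frames:
        return []
    dim = len(frames[0])
    history = deque()          # frames currently inside the window
    running_sum = [0.0] * dim  # per-dimension sum over `history`
    out = []
    for frame in frames:
        history.append(frame)
        for d in range(dim):
            running_sum[d] += frame[d]
        if len(history) > window:
            # Drop the oldest frame once the window is full.
            old = history.popleft()
            for d in range(dim):
                running_sum[d] -= old[d]
        n = len(history)
        out.append([frame[d] - running_sum[d] / n for d in range(dim)])
    return out
```

In this sketch, adding the same constant offset to every frame leaves the output unchanged, which is the property that makes this kind of normalization useful against a fixed channel effect.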
Because there is i-vector adaptation going on, the idea is for the i-vector to learn any offset of the features, so you don't have to apply that normalization to the features. Also, the test condition needs to be matched to training, so to change this you'd have to change it in training too (and the training-time feature extraction is done at the script level).
In our case, the environment (channel) differs between the training and test speech, so we got much better LVCSR performance with CMVN than without it.
Do you mean that we do not need to apply CMVN in DNN decoding if we have an i-vector in the features? Does that mean LVCSR performance would be comparable between using an additional i-vector and using CMVN for feature extraction?
The i-vector method typically works well, but it doesn't always work
well if there is a very big train/test mismatch. We found, for
instance, that our models aren't always robust to differences in
volume because training data tends to be carefully volume normalized.
In the future we'll do volume perturbation during training.
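As a rough illustration of what volume perturbation means here, the sketch below scales all samples of one utterance by a single random gain. The function name `perturb_volume` and the gain range are assumptions chosen for the example, not values or code taken from Kaldi's training scripts.

```python
import random


def perturb_volume(samples, min_gain=0.125, max_gain=2.0, rng=None):
    """Scale every sample of one utterance by one random gain.

    `samples` is a list of floats (a waveform). Returns the scaled
    waveform and the gain that was drawn. The gain range is an
    illustrative assumption.
    """
    if rng is None:
        rng = random.Random()
    gain = rng.uniform(min_gain, max_gain)
    return [s * gain for s in samples], gain
```

Applying a different gain to each training utterance exposes the model to a range of volumes, so it is less likely to depend on the carefully normalized level of the original data.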
Sorry, I want to ask another question.
You use i-vectors in the DNN and CMVN in the GMM.
Why not use i-vectors in the GMM instead of CMVN?
A GMM classifier is not very good at combining inputs of different types and classifying them. It cannot learn complex dependencies between i-vector values and features. It can only approximate a simple convex distribution well, and even that task is somewhat complex because of GMM inefficiency. Deep neural networks are much better classifiers of complex functions: they can classify non-convex objects and even learn complex dependencies between features. That's why i-vectors can be used within the DNN framework to augment the features.
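To make "augment the features" concrete, here is a minimal sketch in which the same i-vector is appended to every acoustic frame before the frames are passed to the network. The helper name `append_ivector` is hypothetical, and using one fixed vector for the whole utterance is a simplification made for the example.

```python
def append_ivector(frames, ivector):
    """Return new frames with `ivector` appended to each one.

    `frames` is a list of per-frame feature lists (e.g. MFCCs) and
    `ivector` is a list of floats. Hypothetical helper for illustration.
    """
    return [list(frame) + list(ivector) for frame in frames]
```

The network then sees both the frame-level features and the speaker/channel summary in one input vector and can learn how they interact, which is the kind of dependency the post above says a GMM handles poorly.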