The sample script you have for online DNN decoding is for single-utterance decoding; it does not allow a long continuous audio stream to be decoded.
I saw that the API of the online decoder supports long streaming audio by enabling calls to finalize decoding and init decoding between utterances that are part of a continuous stream.
My question is: did you not implement such a sample because of a lack of time, or because the API is not tested?
Is there something I should know before implementing it myself?
The API's support for finalizing and re-initializing decoding between utterances was intended for multiple utterances of the same speaker, but it would work for the scenario you mention also.
The need never really came up. If you're talking about decoders like online2-wav-nnet2-latgen-threaded or -faster, what you say could certainly be done. For instance, you could use the existing endpointing code that's there; when an endpoint is detected, you could output a lattice and then restart decoding the same wav file from the point where you were.
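To make the control flow concrete, here is a rough sketch of the "decode until an endpoint, emit a result, restart from the same position" loop. It is only an illustration in Python, not Kaldi code: `ToyDecoder`, `endpoint_detected`, `decode_stream`, and the silence threshold are all made-up names and values, and the "lattice" is just the list of frames consumed. A real implementation would use the online2 decoder classes and the existing endpointing code instead.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ToyDecoder:
    """Hypothetical stand-in for an online decoder; NOT Kaldi's API."""
    frames: List[float] = field(default_factory=list)

    def init_decoding(self) -> None:
        # Start a fresh utterance.
        self.frames = []

    def advance(self, chunk: List[float]) -> None:
        # A real decoder would run the search here; we only buffer the frames.
        self.frames.extend(chunk)

    def finalize_decoding(self) -> List[float]:
        # A real decoder would return a lattice; we return the frames consumed.
        return list(self.frames)


def endpoint_detected(frames: List[float],
                      silence_threshold: float = 0.1,
                      trailing_silence: int = 3) -> bool:
    """Toy rule (an assumption): some speech, then N silent frames in a row."""
    if len(frames) < trailing_silence:
        return False
    head, tail = frames[:-trailing_silence], frames[-trailing_silence:]
    has_speech = any(abs(x) >= silence_threshold for x in head)
    return has_speech and all(abs(x) < silence_threshold for x in tail)


def decode_stream(chunks: Iterable[List[float]]) -> List[List[float]]:
    """Split one continuous stream into per-utterance results."""
    decoder = ToyDecoder()
    decoder.init_decoding()
    results = []
    for chunk in chunks:
        decoder.advance(chunk)
        if endpoint_detected(decoder.frames):
            # Endpoint: output the result, then restart from the current position.
            results.append(decoder.finalize_decoding())
            decoder.init_decoding()
    if decoder.frames:
        # Flush whatever is left when the stream ends.
        results.append(decoder.finalize_decoding())
    return results
```

Calling `decode_stream` with chunks of speech-like and silent frames returns one entry per detected utterance. The stream position is never rewound: each new utterance starts with the chunk that follows the endpoint.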
Incidentally, I have some changes to that (-threaded) decoder that I intend to commit soon, mostly in the internal code (not the main()), but they won't affect what you are doing. They enable down-weighting of silence in the iVector extraction (we found this was important in highly mismatched conditions), and they change the number of threads from 3 to 2 for simplicity.
Take a look at https://github.com/alumae/kaldi-gstreamer-server. It can decode a continuous stream using online DNN models, outputs partial and final hypotheses via a web-based API, and does endpointing.