Online recognition parameters for Kaldi nnet3


Background

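Kaldi's nnet3 models are widely used. Online recognition with nnet3 requires setting a number of parameters, and recognition quality is good only when the parameter values match the feature configuration used in training. There is very little documentation of this online. After some searching I found a web page containing the contents of an md file, but I could not find that file anywhere in the unpacked Kaldi source tree. I am sharing it here in the hope that we can all make progress together.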

Usage message and options

./online2-tcp-nnet3-decode-faster
Reads in audio from a network socket and performs online decoding with neural nets (nnet3 setup), with iVector-based speaker adaptation and endpointing.
Note: some configuration values and inputs are set via config files whose filenames are passed as options.
Usage: online2-tcp-nnet3-decode-faster [options] <nnet3-in> <fst-in> <word-symbol-table>
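As a rough illustration of the usage line above, the following Python sketch assembles the argument list for the decoder. The three paths and the option values are hypothetical placeholders rather than values from the original page; only the option names come from the list on this page, and the helper `build_decoder_command` is invented for the example.

```python
import shlex


def build_decoder_command(nnet3_in, fst_in, word_symbol_table, options=None):
    """Assemble argv for online2-tcp-nnet3-decode-faster.

    `options` maps option names (without the leading dashes) to values.
    Each one is rendered as --name=value and placed before the three
    positional arguments, matching the usage line.
    """
    argv = ["online2-tcp-nnet3-decode-faster"]
    for name, value in (options or {}).items():
        if isinstance(value, bool):
            # Boolean options are written as true/false.
            value = "true" if value else "false"
        argv.append(f"--{name}={value}")
    argv += [nnet3_in, fst_in, word_symbol_table]
    return argv


# Hypothetical paths and values, for illustration only.
cmd = build_decoder_command(
    "exp/model/final.mdl",
    "exp/model/graph/HCLG.fst",
    "exp/model/graph/words.txt",
    options={"samp-freq": 16000, "beam": 16, "produce-time": True},
)
print(shlex.join(cmd))
```

If the binary is on the PATH, the resulting list could be handed to `subprocess.run(cmd)`; building a list rather than a single string avoids shell quoting problems.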
Options:
--acoustic-scale : Scaling factor for acoustic log-likelihoods (float, default = 0.1)
--add-pitch : Append pitch features to raw MFCC/PLP/filterbank features [but not for iVector extraction] (bool, default = false)
--beam : Decoding beam. Larger->slower, more accurate. (float, default = 16)
--beam-delta : Increment used in decoding. This parameter is obscure and relates to a speedup in the way the max-active constraint is applied. Larger is more accurate. (float, default = 0.5)
--chunk-length : Length, in seconds, of each chunk that we process. (float, default = 0.18)
--computation.debug : If true, turn on debug for the neural net computation (very verbose!). Will be turned on regardless if --verbose >= 5. (bool, default = false)
--debug-computation : If true, turn on debug for the actual computation (very verbose!) (bool, default = false)
--delta : Tolerance used in determinization (float, default = 0.000976562)
--determinize-lattice : If true, determinize the lattice (lattice-determinization, keeping only best pdf-sequence for each word-sequence). (bool, default = true)
--endpoint.rule1.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
--endpoint.rule1.min-trailing-silence : This endpointing rule requires duration of trailing silence (in seconds) to be >= this value. (float, default = 5)
--endpoint.rule1.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
--endpoint.rule1.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
--endpoint.rule2.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 2)
--endpoint.rule2.min-trailing-silence : This endpointing rule requires duration of trailing silence (in seconds) to be >= this value. (float, default = 0.5)
--endpoint.rule2.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
--endpoint.rule2.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
--endpoint.rule3.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = 8)
--endpoint.rule3.min-trailing-silence : This endpointing rule requires duration of trailing silence (in seconds) to be >= this value. (float, default = 1)
--endpoint.rule3.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
--endpoint.rule3.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
--endpoint.rule4.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
--endpoint.rule4.min-trailing-silence : This endpointing rule requires duration of trailing silence (in seconds) to be >= this value. (float, default = 2)
--endpoint.rule4.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 0)
--endpoint.rule4.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = true)
--endpoint.rule5.max-relative-cost : This endpointing rule requires relative-cost of final-states to be <= this value (describes how good the probability of final-states is). (float, default = inf)
--endpoint.rule5.min-trailing-silence : This endpointing rule requires duration of trailing silence (in seconds) to be >= this value. (float, default = 0)
--endpoint.rule5.min-utterance-length : This endpointing rule requires utterance-length (in seconds) to be >= this value. (float, default = 20)
--endpoint.rule5.must-contain-nonsilence : If true, for this endpointing rule to apply there must be nonsilence in the best-path traceback. (bool, default = false)
--endpoint.silence-phones : List of phones that are considered to be silence phones by the endpointing code. (string, default = "")
--extra-left-context-initial : Extra left context to use at the first frame of an utterance (note: this will just consist of repeats of the first frame, and should not usually be necessary). (int, default = 0)
--fbank-config : Configuration file for filterbank features (e.g. conf/fbank.conf) (string, default = "")
--feature-type : Base feature type [mfcc, plp, fbank] (string, default = "mfcc")
--frame-subsampling-factor : Required if the frame-rate of the output (e.g. in 'chain' models) is less than the frame-rate of the original alignment. (int, default = 1)
--frames-per-chunk : Number of frames in each chunk that is separately evaluated by the neural net. Measured before any subsampling, if the --frame-subsampling-factor option is used (i.e. counts input frames). This is only advisory (may be rounded up if needed). (int, default = 20)
--hash-ratio : Setting used in decoder to control hash behavior (float, default = 2)
--ivector-extraction-config : Configuration file for online iVector extraction, see class OnlineIvectorExtractionConfig in the code (string, default = "")
--ivector-silence-weighting.max-state-duration : (RE weighting in iVector estimation for online decoding) Maximum allowed duration of a single transition-id; runs with durations longer than this will be weighted down to the silence-weight. (float, default = -1)
--ivector-silence-weighting.silence-phones : (RE weighting in iVector estimation for online decoding) List of integer ids of silence phones, separated by colons (or commas). Data that (according to the traceback of the decoder) corresponds to these phones will be downweighted by --silence-weight. (string, default = "")
--ivector-silence-weighting.silence-weight : (RE weighting in iVector estimation for online decoding) Weighting factor for frames that the decoder trace-back identifies as silence; only relevant if the --silence-phones option is set. (float, default = 1)
--lattice-beam : Lattice generation beam. Larger->slower, and deeper lattices (float, default = 10)
--max-active : Decoder max active states. Larger->slower; more accurate (int, default = 2147483647)
--max-mem : Maximum approximate memory usage in determinization (real usage might be many times this). (int, default = 50000000)
--mfcc-config : Configuration file for MFCC features (e.g. conf/mfcc.conf) (string, default = "")
--min-active : Decoder minimum #active states. (int, default = 200)
--minimize : If true, push and minimize after determinization. (bool, default = false)
--num-threads-startup : Number of threads used when initializing iVector extractor. (int, default = 8)
--online-pitch-config : Configuration file for online pitch features, if --add-pitch=true (e.g. conf/online_pitch.conf) (string, default = "")
--optimization.allocate-from-other : Instead of deleting a matrix of a given size and then allocating a matrix of the same size, allow re-use of that memory (bool, default = true)
--optimization.allow-left-merge : Set to false to disable left-merging of variables in remove-assignments (obscure option) (bool, default = true)
--optimization.allow-right-merge : Set to false to disable right-merging of variables in remove-assignments (obscure option) (bool, default = true)
--optimization.backprop-in-place : Set to false to disable optimization that allows in-place backprop (bool, default = true)
--optimization.consolidate-model-update : Set to false to disable optimization that consolidates the model-update phase of backprop (e.g. for recurrent architectures) (bool, default = true)
--optimization.convert-addition : Set to false to disable the optimization that converts Add commands into Copy commands wherever possible. (bool, default = true)
--optimization.extend-matrices : This optimization can reduce memory requirements for TDNNs when applied together with --convert-addition=true (bool, default = true)
--optimization.initialize-undefined : Set to false to disable optimization that avoids redundant zeroing (bool, default = true)
--optimization.max-deriv-time : You can set this to the maximum t value that you want derivatives to be computed at when updating the model. This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = 2147483647)
--optimization.max-deriv-time-relative : An alternative mechanism for setting the --max-deriv-time, suitable for situations where the length of the egs is variable. If set, it is equivalent to setting the --max-deriv-time to this value plus the largest 't' value in any 'output' node of the computation request. (int, default = 2147483647)
--optimization.memory-compression-level : This is only relevant to training, not decoding. Set this to 0, 1, 2; higher levels are more aggressive at reducing memory by compressing quantities needed for backprop, potentially at the expense of speed and the accuracy of derivatives. 0 means no compression at all; 1 means compression that shouldn't affect results at all. (int, default = 1)
--optimization.min-deriv-time : You can set this to the minimum t value that you want derivatives to be computed at when updating the model. This is an optimization that saves time in the backprop phase for recurrent frameworks (int, default = -2147483648)
--optimization.move-sizing-commands : Set to false to disable optimization that moves matrix allocation and deallocation commands to conserve memory. (bool, default = true)
--optimization.optimize : Set this to false to turn off all optimizations (bool, default = true)
--optimization.optimize-row-ops : Set to false to disable certain optimizations that act on operations of type *Row*. (bool, default = true)
--optimization.propagate-in-place : Set to false to disable optimization that allows in-place propagation (bool, default = true)
--optimization.remove-assignments : Set to false to disable optimization that removes redundant assignments (bool, default = true)
--optimization.snip-row-ops : Set this to false to disable an optimization that reduces the size of certain per-row operations (bool, default = true)
--optimization.split-row-ops : Set to false to disable an optimization that may replace some operations of type kCopyRowsMulti or kAddRowsMulti with up to two simpler operations. (bool, default = true)
--output-period : How often, in seconds, we check for changes in output. (float, default = 1)
--phone-determinize : If true, do an initial pass of determinization on both phones and words (see also --word-determinize) (bool, default = true)
--plp-config : Configuration file for PLP features (e.g. conf/plp.conf) (string, default = "")
--produce-time : Prepend begin/end times between endpoints (e.g. '5.46 6.81 <text_output>', in seconds) (bool, default = false)
--prune-interval : Interval (in frames) at which to prune tokens (int, default = 25)
--read-timeout : Number of seconds of timeout for TCP audio data to appear on the stream. Use -1 for blocking. (int, default = 3)
--samp-freq : Sampling frequency of the input signal (coded as 16-bit slinear). (float, default = 16000)
--word-determinize : If true, do a second pass of determinization on words only (see also --phone-determinize) (bool, default = true)
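The five --endpoint.ruleN groups in the list above each describe a set of conditions that must all hold for that rule to apply. The Python sketch below restates those conditions with the default values from the list. It assumes that an endpoint is declared as soon as any one rule applies, which is my understanding of Kaldi's endpointing code but is not stated in the help text; the class and function names are invented for the illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class EndpointRule:
    must_contain_nonsilence: bool
    min_trailing_silence: float   # seconds
    max_relative_cost: float
    min_utterance_length: float   # seconds


# Default values copied from the option list (rule1 .. rule5).
DEFAULT_RULES = [
    EndpointRule(False, 5.0, math.inf, 0.0),
    EndpointRule(True, 0.5, 2.0, 0.0),
    EndpointRule(True, 1.0, 8.0, 0.0),
    EndpointRule(True, 2.0, math.inf, 0.0),
    EndpointRule(False, 0.0, math.inf, 20.0),
]


def rule_applies(rule, contains_nonsilence, trailing_silence,
                 relative_cost, utterance_length):
    """All four conditions of a single rule must hold."""
    return ((contains_nonsilence or not rule.must_contain_nonsilence)
            and trailing_silence >= rule.min_trailing_silence
            and relative_cost <= rule.max_relative_cost
            and utterance_length >= rule.min_utterance_length)


def endpoint_detected(contains_nonsilence, trailing_silence,
                      relative_cost, utterance_length, rules=DEFAULT_RULES):
    """Assumption: the rules are OR-ed, so any one applying rule is enough."""
    return any(
        rule_applies(r, contains_nonsilence, trailing_silence,
                     relative_cost, utterance_length)
        for r in rules
    )


# Speech followed by 0.6 s of silence with a low relative cost: rule2 applies.
print(endpoint_detected(True, 0.6, 1.5, 3.0))
# Same silence but a high relative cost: no rule applies yet.
print(endpoint_detected(True, 0.6, 5.0, 3.0))
```

Read this way, rule1 is a timeout for input that never contains speech, rules 2 to 4 trade trailing silence against how confident the decoder is that the utterance has ended, and rule5 cuts off any utterance once it reaches 20 seconds.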
Standard options:
--config : Configuration file to read (this option may be repeated) (string, default = "")
--help : Print out usage message (bool, default = false)
--print-args : Print the command line arguments (to stderr) (bool, default = true)
--verbose : Verbose level (higher->more logging) (int, default = 0)
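The --config option reads options from a file instead of the command line. A Kaldi config file holds one `--name=value` entry per line. The sketch below writes such a file from a Python dictionary; the file name `online.conf`, the helper `render_config` and all the values are hypothetical and chosen only for illustration, so the values would need to be matched to the trained model.

```python
import tempfile
from pathlib import Path


def render_config(options):
    """Render a dict of options as config-file text, one --name=value per line."""
    lines = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Boolean options are written as true/false.
            value = "true" if value else "false"
        lines.append(f"--{name}={value}")
    return "\n".join(lines) + "\n"


# Hypothetical values; option names are taken from the list on this page.
options = {
    "beam": 16,
    "lattice-beam": 10,
    "frame-subsampling-factor": 3,
    "produce-time": True,
}

text = render_config(options)
config_path = Path(tempfile.mkdtemp()) / "online.conf"  # hypothetical name
config_path.write_text(text)
print(text, end="")
```

The decoder would then be started with `--config=` followed by the path of the written file. Keeping decoding settings in a file makes it easier to keep them next to the model they were tuned for.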

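The help text says the decoder reads audio from a network socket, that --samp-freq describes 16-bit signed linear input, and that --chunk-length is the amount of audio processed at a time. The sketch below works out what those defaults mean in samples and bytes and prepares audio in that form. It assumes mono audio and little-endian byte order, neither of which is stated in the help text, and the helper names are invented for the example.

```python
import struct


def chunk_size(samp_freq=16000.0, chunk_length=0.18):
    """Return (samples, bytes) per chunk for 16-bit mono audio."""
    samples = round(samp_freq * chunk_length)
    return samples, samples * 2  # 2 bytes per 16-bit sample


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit signed PCM.

    Assumption: little-endian byte order ('<' in the struct format).
    """
    ints = [round(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def iter_chunks(pcm, nbytes):
    """Yield successive slices of at most nbytes from a bytes object."""
    for start in range(0, len(pcm), nbytes):
        yield pcm[start:start + nbytes]


n_samples, n_bytes = chunk_size()
print(n_samples, n_bytes)

# One second of silence at 16 kHz, split into chunks of the default length.
one_second = floats_to_pcm16([0.0] * 16000)
chunks = list(iter_chunks(one_second, n_bytes))
print(len(chunks), len(chunks[-1]))
```

A client would write each chunk to a TCP socket connected to the decoder with `socket.sendall`; the host and port depend on how the server was started and are not covered by the option list on this page.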
Copyright (c) 2004-2021 锐英源软件. All rights reserved.