Zero-weight problem in monophone decoding

I'm trying to develop a recipe for a corpus using the WSJ-s5 egs as a template. During monophone decoding, however, I keep running into zero-weight warnings.

Below is an excerpt from decode.1.log:

"gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.083333 --allow-partial=true --word-symbol-table=exp/mono0a/graph_bg/words.txt exp/mono0a/final.mdl exp/mono0a/graph_bg/HCLG.fst 'ark,s,cs:apply-cmvn --utt2spk=ark:data/test/split8/1/utt2spk scp:data/test/split8/1/cmvn.scp scp:data/test/split8/1/feats.scp ark:- | add-deltas ark:- ark:- |' 'ark:|gzip -c > exp/mono0a/decode_bg/lat.1.gz'

		        add-deltas ark:- ark:-

		        apply-cmvn --utt2spk=ark:data/test/split8/1/utt2spk scp:data/test/split8/1/cmvn.scp scp:data/test/split8/1/feats.scp ark:-

		        nchlt-eng-500m-0001 !SIL rebellion !SIL

		        WARNING (gmm-latgen-faster:ComputeBackwardWeight():determinize-lattice-pruned.cc:1047) Total weight of input latice is zero.

		        [empty subset]

		        WARNING (gmm-latgen-faster:InitialToStateId():determinize-lattice-pruned.cc:587) Zero weight!

LOG (gmm-latgen-faster:DecodeUtteranceLatticeFaster():lattice-faster-decoder.cc:981) Log-like per frame for utterance nchlt-eng-500m-0001 is -7.68534 over 226 frames. ..."

A few lines from best_path.10.log:

"lattice-scale --inv-acoustic-scale=10 'ark:gunzip -c exp/mono0a/decode_bg/lat.*.gz|' ark:-

		        lattice-best-path --word-symbol-table=exp/mono0a/graph_bg/words.txt ark:- ark,t:exp/mono0a/decode_bg/scoring/10.tra

		        lattice-add-penalty --word-ins-penalty=0.0 ark:- ark:-

		        WARNING (lattice-best-path:main():lattice-best-path.cc:92) Best-path failed for key nchlt-eng-500m-0001 ... "
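To see how widespread the failures are, it can help to count the warnings across all decode and scoring logs instead of reading one excerpt. The following is a minimal sketch, not part of any Kaldi script: `count_warnings` is a hypothetical helper, and the two search strings are copied from the log excerpts above.

```shell
# Hypothetical helper: count how often each warning appears in the given logs.
# The search strings are taken verbatim from the excerpts in this thread.
count_warnings() (
  for pattern in 'Zero weight!' 'Best-path failed for key'; do
    # -F treats the pattern as a fixed string, -c prints the number of matching lines
    n=$(cat -- "$@" | grep -c -F -- "$pattern")
    printf '%s\t%s\n' "$n" "$pattern"
  done
)
```

Assuming the logs sit where the standard decode and scoring scripts usually put them, it could be called as `count_warnings exp/mono0a/decode_bg/log/decode.*.log exp/mono0a/decode_bg/scoring/log/best_path.*.log`; adjust the paths if your setup differs.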

I initially trained a bigram ARPA model using different toolkits (MIT-LM, SRILM and IRSTLM), but all gave the same error. I have since settled on a unigram FST (using the RM egs approach found in rm_prepare_grammar_ug.sh). I have most probably made a mistake somewhere in the data preparation phase, but I can't find where it is.

I think this may actually be a bug in the determinization code. Could you please add the option
--determinize-lattice=false to that command line, and send the output file exp/mono0a/decode_bg/lat.1.gz and your final.mdl to me at dpovey@gmail.com?
Then I'll be able to reproduce the problem and fix it (if it is a bug).
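For reference, the requested rerun might look like the sketch below. Every path and option value is copied from the decode.1.log excerpt, and the only change is the added --determinize-lattice=false. The function name and the `DRY_RUN` switch are hypothetical conveniences, not Kaldi features.

```shell
# Sketch only: re-run decode job 1 with lattice determinization turned off.
# Paths and option values come from the decode.1.log excerpt in this thread.
decode_job1_undeterminized() (
  dir=exp/mono0a
  sdata=data/test/split8/1
  feats="ark,s,cs:apply-cmvn --utt2spk=ark:$sdata/utt2spk scp:$sdata/cmvn.scp scp:$sdata/feats.scp ark:- | add-deltas ark:- ark:- |"
  # Build the full command line as positional parameters.
  set -- gmm-latgen-faster \
    --max-active=7000 --beam=13.0 --lattice-beam=6.0 \
    --acoustic-scale=0.083333 --allow-partial=true \
    --determinize-lattice=false \
    --word-symbol-table="$dir/graph_bg/words.txt" \
    "$dir/final.mdl" "$dir/graph_bg/HCLG.fst" "$feats" \
    "ark:|gzip -c > $dir/decode_bg/lat.1.gz"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"   # print one argument per line; nothing is executed
  else
    "$@"                 # run the decoder (requires Kaldi on PATH)
  fi
)
```

Running `DRY_RUN=1 decode_job1_undeterminized` first shows the exact arguments without touching the existing lattice file; the real run overwrites exp/mono0a/decode_bg/lat.1.gz with the undeterminized lattices.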

Adding --determinize-lattice=false helped: no errors.

Yes, no errors are expected in that case. I would run another program on the output (determinize-lattice-phone-pruned), which is expected to produce the error.
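A rough sketch of that second step follows. It assumes the program meant here is the Kaldi binary `lattice-determinize-phone-pruned`, called as model, input lattices, output lattices; check the name and options against your own build. The --acoustic-scale and --beam values simply mirror the decoder's --acoustic-scale and --lattice-beam from the log, which is also an assumption, as are the function name and the `DRY_RUN` switch.

```shell
# Sketch only: determinize the undeterminized lattices as a separate step,
# to try to reproduce the zero-weight warning outside the decoder.
redeterminize_job1() (
  dir=exp/mono0a
  # Assumed binary name and argument order; the output is discarded because
  # only the log messages are of interest here.
  set -- lattice-determinize-phone-pruned \
    --acoustic-scale=0.083333 --beam=6.0 \
    "$dir/final.mdl" \
    "ark:gunzip -c $dir/decode_bg/lat.1.gz|" \
    ark:/dev/null
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$@"   # print one argument per line; nothing is executed
  else
    "$@"                 # requires Kaldi on PATH
  fi
)
```

If the warning shows up again at this stage, the lattice file and final.mdl together should be enough to reproduce it elsewhere.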

Just to follow up on this: it looks like the problem went away after updating the code, recompiling and rerunning. If it recurs I'll look at
this again.
