I am trying to run the sre08 recipe. A weird format error was found when running scripts with pipelines.
For example, with this definition:
feats="ark,s,cs:add-deltas scp:$sdata/JOB/feats.scp ark:- | apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 ark:- ark:- | select-voiced-frames ark:- scp,s,cs:$sdata/JOB/vad.scp ark:- | subsample-feats --n=$subsample ark:- ark:- |"
A command like the following always fails with a format error:
$cmd JOB=1:$nj_full $dir/log/gselect.JOB.log \
  gmm-gselect --n=$num_gselect $dir/final.dubm "$feats" ark:- \| \
  fgmm-global-gselect-to-post --min-post=$min_post $dir/final.ubm "$feats" \
  ark,s,cs:- "ark:|gzip -c >$dir/post.JOB.gz" || exit 1;
Some error messages:
ERROR (gmm-gselect:Read(): kaldi-matrix.cc:1344) Failed to read matrix from stream. : Expected "[", got "archive" File position at start is -1, currently -1
ERROR (select-voiced-frames:Read():kaldi-matrix.cc:1344): Failed to read matrix from stream. : Expected "[", got "archive" File position at start is -1, currently -1
...
After some debugging, we found that the data fed into the pipes does not carry the '\0B' binary marker, so it is treated as text. It does not contain "[" either, though: the content begins immediately after the index name.
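The binary-versus-text check described here can be reproduced on a captured stream. The following is a minimal sketch, not Kaldi code: the function name kaldi_ark_mode and the 256-byte read limit are made up for illustration, and it assumes the first key contains no spaces and is followed by a single space. It only looks for the '\0B' marker right after that key.

```shell
# Classify the first entry of an archive-like stream read from stdin.
# Prints "binary" if the first key is followed by the \0B marker, else "text".
# kaldi_ark_mode is a hypothetical helper name, not part of Kaldi.
kaldi_ark_mode() {
  # Dump the first 256 bytes as space-separated hex bytes on one line.
  hex=$(head -c 256 | od -An -v -tx1 | tr '\n' ' ' | tr -s ' ')
  # Drop everything up to and including the first space byte (0x20),
  # which is assumed to terminate the key.
  rest=${hex#*" 20"}
  # In binary mode the next two bytes are 0x00 and 'B' (0x42).
  case "$rest" in
    " 00 42"*) echo binary ;;
    *)         echo text ;;
  esac
}
```

One way to use it is to pipe a single stage into it, for example "add-deltas scp:$sdata/JOB/feats.scp ark:- | kaldi_ark_mode", and then add one stage at a time to see where the stream stops looking like a valid archive.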
I tried splitting the piped stages into separate commands, and they generate correct results without any problem.
That is, replacing the original definition of $feats with separate commands like these works fine:
$cmd JOB=1:$nj_full $dir/log/add-deltas.JOB.log \
  add-deltas scp:$sdata/JOB/feats.scp ark:$sdata/JOB/feats_deltas.ark
$cmd JOB=1:$nj_full $dir/log/apply-cmvn-sliding.JOB.log \
  apply-cmvn-sliding --norm-vars=false --center=true --cmn-window=300 \
  ark:$sdata/JOB/feats_deltas.ark ark:$sdata/JOB/feats_deltas_cmvn.ark
$cmd JOB=1:$nj_full $dir/log/select-voiced-frames.JOB.log \
  select-voiced-frames ark:$sdata/JOB/feats_deltas_cmvn.ark scp,s,cs:$sdata/JOB/vad.scp \
  ark:$sdata/JOB/feats_deltas_cmvn_vad.ark
$cmd JOB=1:$nj_full $dir/log/subsample-feats.JOB.log \
  subsample-feats --n=$subsample ark:$sdata/JOB/feats_deltas_cmvn_vad.ark \
  ark:$sdata/JOB/feats_deltas_cmvn_vad_$subsample.ark
feats="ark,s,cs:$sdata/JOB/feats_deltas_cmvn_vad_$subsample.ark"
But the frequent I/O may not only slow things down but also take up a lot of disk space. Could someone help with this weird format issue?
Thanks a lot.
OK - make sure you have not changed the code (do "svn status | grep -v '?'"), and run "make test". It's acting like it's reading the string
"archive" from a stream, but that string is never printed by Kaldi.
Now, with a recently svn-synced copy of Kaldi, "svn status | grep -v '?'" prints nothing, "make test" reports SUCCESS for every test, and everything seems to work properly.
Is it possible that I somehow corrupted the source files, or was there some other issue?
The previous copy did NOT pass "make test".
Yes, possibly either you changed the code, or you had checked out a bad
version with a bug that I was not previously aware of, which was
later fixed. Anyway it doesn't matter. Please don't follow up on this
thread as I don't want to generate too much traffic on the list.