I have a lattice that was obtained by decoding an entire wav file; the decoder decoded it utterance by utterance. When I open the lattice I can see a stamp for each utterance (all I see is the utterance stamp followed by the corresponding lattice in binary format, which I can't read). Can anyone tell me how to divide this big lattice into small lattices, each containing only the FST for one utterance?
Have a look at the script show_lattice.sh in egs/wsj/s5/utils. The same trick it uses with lattice-to-fst should also work with lattice-copy.
That is an archive of lattices. You need to read the Kaldi I/O tutorial
section at kaldi.sf.net.
It is possible to do what you want with something like

    lattice-copy "ark:gunzip -c lat_archive.gz|" "scp:foo.scp"

where foo.scp has one line per utterance, like

    utt_id1 some/dir/utt_id1.fst
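A minimal shell sketch for writing foo.scp, assuming you already have a file whose first column is the utterance ID (for example a Kaldi-style utt2spk file in your data directory). The function name make_lattice_scp and the path data/test/utt2spk are hypothetical; some/dir is the placeholder directory from the example above.

```shell
# Hypothetical helper: print one "<utt_id> <dir>/<utt_id>.fst" line per
# utterance. Assumes the first column of the input file is the utterance ID
# (true of a Kaldi-style utt2spk file).
make_lattice_scp() {
  utt_list=$1   # e.g. data/test/utt2spk (assumed path)
  out_dir=$2    # directory that will hold one lattice file per utterance
  # Create the target directory so the per-utterance files have somewhere to go.
  mkdir -p "$out_dir"
  awk -v dir="$out_dir" '{ print $1, dir "/" $1 ".fst" }' "$utt_list"
}
```

Usage, assuming the IDs in the list match the keys in the lattice archive:

    make_lattice_scp data/test/utt2spk some/dir > foo.scp
    lattice-copy "ark:gunzip -c lat_archive.gz|" "scp:foo.scp"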
However, that is not generally a good thing to do; it's usually best to work with archives to avoid I/O on many small files. Also, OpenFst cannot read the Kaldi lattice format, because Kaldi lattices use a different weight type.
If you just want to look at the lattices, you could do

    lattice-copy "ark:gunzip -c lat_archive.gz|" ark,t:- | less

which will put them in text form.
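If you also want readable per-utterance files, here is a rough shell/awk sketch that splits that text output into one file per utterance. It assumes that, in the text archive, each lattice starts with a line whose first field is the utterance ID and ends with an empty line; check this against your own output before relying on it. The function name split_text_lattices and the .txt naming are hypothetical, and the resulting files are for inspection only.

```shell
# Hypothetical helper: split a text-form lattice archive read from stdin
# into <out_dir>/<utt_id>.txt files.
# Assumed layout: an utterance-ID line, then the lattice lines, then a blank line.
split_text_lattices() {
  out_dir=$1
  mkdir -p "$out_dir"
  awk -v dir="$out_dir" '
    # No lattice open and a non-blank line: this line holds the utterance ID.
    out == "" && NF > 0 { out = dir "/" $1 ".txt"; next }
    # A blank line ends the current lattice.
    NF == 0 {
      if (out != "") { close(out); out = "" }
      next
    }
    # Any other line (arc or final state) belongs to the current lattice.
    { print $0 > out }
  '
}
```

Usage:

    lattice-copy "ark:gunzip -c lat_archive.gz|" ark,t:- | split_text_lattices lat_txt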