Utterance has too few frames to align

 

I'm getting the following error on a number of my shorter audio files during alignment:

WARNING (align-equal-compiled:EqualAlign():fstext/fstext-utils-inl.h:930) EqualAlign: utterance has too few frames 25 to align.

What is the best way around this? Many of my audio files are very short, so I would prefer not to reject any. Is there a way to change the frame length or the frame overlap size so that I have more frames in each utterance? Is there anything I should watch out for if I go this route? What is the minimum number of frames that the alignment will work with?
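For reference, the relationship between audio duration, frame length, and frame shift can be sketched as below. This is a minimal illustration, not Kaldi code: the function name `num_frames` is hypothetical, the 25 ms window and 10 ms shift are assumed defaults, and it assumes only whole windows are kept (no padding at the edges).

```python
def num_frames(duration_ms: float,
               frame_length_ms: float = 25.0,
               frame_shift_ms: float = 10.0) -> int:
    """Count the whole analysis windows that fit in the audio.

    Assumption: no padding at the edges, so a recording shorter
    than one window yields zero frames.
    """
    if duration_ms < frame_length_ms:
        return 0
    return 1 + int((duration_ms - frame_length_ms) // frame_shift_ms)


if __name__ == "__main__":
    # A 265 ms recording with a 25 ms window and 10 ms shift.
    print(num_frames(265))                      # 25
    # Halving the shift roughly doubles the frame count.
    print(num_frames(265, frame_shift_ms=5.0))  # 49
```

Under these assumptions a 25-frame utterance is only about a quarter of a second long. A smaller frame shift does produce more frames, but it also changes what each HMM state duration means, so it is not a free fix.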

 

 

It means the number of frames is less than the minimum number of HMM states your model allows. Obviously this is way shorter than reasonable, so there must be a problem with your data or transcripts, or you are doing something wrong.

 

 

Some more info: I'm building monophone models for isolated phonemes based on the yesno recipe. For now, the recordings are all pretty short and my transcriptions contain only one phoneme. I had figured there were about 3 states per HMM, and that 25 frames of audio would be enough, but this obviously is not the case. Is there somewhere, perhaps in the train_mono.sh script, that I can specify the minimum number of HMM states?

 

 

There should be 3 states per HMM, yes. Presumably, for the particular utterance that failed to align, your transcript was too long, i.e. it had too many phones. So check the accuracy of your transcript.
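The constraint described here can be written down as a small check. This is a sketch under stated assumptions: every phone uses a strict left-to-right HMM with no skip transitions, so each state consumes at least one frame, and every phone has the same number of states (3, as stated above). The helper names are hypothetical and are not part of Kaldi.

```python
def min_frames_needed(phones, states_per_phone=3):
    """Fewest frames a phone sequence can occupy if every HMM
    state must emit at least one frame (assumed: no skip arcs)."""
    return len(phones) * states_per_phone


def can_align(num_frames, phones, states_per_phone=3):
    """True if the utterance has enough frames for the sequence."""
    return num_frames >= min_frames_needed(phones, states_per_phone)


if __name__ == "__main__":
    # A single phone fits easily in 25 frames.
    print(can_align(25, ["ih"]))                # True
    # Nine phones would need 27 frames, which is more than 25.
    print(can_align(25, ["ih"] * 9))            # False
    # Silence on both sides triples the requirement.
    print(min_frames_needed(["SIL", "ih", "SIL"]))  # 9
```

Silence models often use a different topology from ordinary phones, so the per-phone minimum may not be uniform in a real setup. Treat `states_per_phone` as a simplification.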

 

 

The transcripts in data/train/text are just one phoneme per file, like this:
ih_1 ih
ih_2 ih
t_1 t
t_2 t
etc.

But when I check the alignments using:

show-alignments data/lang/phones.txt exp/mono/final.mdl 'ark:gunzip -c exp/mono/ali.1.gz|'   

I see that it has added in silences as well:

ih_1 [ 4 1 1 1 13 15 15 15 15 8 5 5 5 18 ] [ 26 28 30 29 29 29 29 29 28 29 ]  ih_1  SIL                                                  ih                           

So it's adding silences itself during alignment. Is this weird? Should I supply more accurate transcription files with silences on either side of the phoneme?
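To see how the frames were split between the inserted silence and the phoneme, the bracketed groups in the show-alignments line can be counted. The sketch below assumes that each bracketed group corresponds to one phone and that each number inside it is the transition-id for one frame. The function name `frames_per_phone` is hypothetical.

```python
import re


def frames_per_phone(alignment_line: str) -> list[int]:
    """Return the number of frames in each bracketed group.

    Assumption: one bracketed group per phone, one integer
    (transition-id) per frame inside the group.
    """
    groups = re.findall(r"\[([^\]]*)\]", alignment_line)
    return [len(group.split()) for group in groups]


if __name__ == "__main__":
    line = ("ih_1 [ 4 1 1 1 13 15 15 15 15 8 5 5 5 18 ] "
            "[ 26 28 30 29 29 29 29 29 28 29 ]")
    counts = frames_per_phone(line)
    print(counts)       # [14, 10]
    print(sum(counts))  # 24
```

For the line quoted above, the silence takes 14 frames and ih takes 10.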

 

 

That is the optional silence that the lexicon allows. If you don't want it to be allowed, you can set the optional-silence probability to 0.0 when running prepare_lang.sh (its --sil-prob option).

 

 

Thanks Dan, that worked. There are no more warnings in exp/mono/log/align...log, and show-alignments shows just the one phoneme as the transcription.

One further question: is there a way of aligning using "optional-noise" as well as optional-silence? Many of my transcriptions would take the form:

< SIL | NOISE > phoneme < SIL | NOISE >

and I'd like to capture the noises and silences in the transcription without having to do it by hand.

 

 

There is not currently a way to allow more than one phone as the optional-silence phone in the lexicon FST. Fundamentally this is not very hard to do (you could change the script that creates the lexicon), but in many situations it wouldn't make sense. For example, if you always allowed silence and noise as options, what is to stop the meanings of silence and noise from switching over? Unless you had a lot of instances of noise in your transcripts, this could easily happen.

 

 

Sure, I see how that could happen. It's given two labels to use, but it doesn't know which type of sound should get which label. There are bursts of noise in many (but not all) of my audio files. I was thinking I could change the transcriptions to read:

ih_1 NOISE ih NOISE
ih_2 NOISE ih NOISE

and reintroduce the optional silence probability. Then, if I had a sound with no noise, would I be reintroducing the "EqualAlign: utterance has too few frames to align." error? Is there a way to skip over a part of the transcription if there aren't enough frames for it, as opposed to rejecting the entire file?

 

 

You might still get the error; it might be necessary to change the code. I think the way that code works is that it first selects a random path through the FST, and then, if the path is too long for the number of frames, it gives an error. The path through the alignment graph that it chose may not be the shortest one. It would not be difficult to change the code to handle this better; in the past we never had a reason to do so. The code could be changed so that, in the case where the path is too long, it tries a few different random paths to find one whose length is OK, and only fails if after (say) 10 tries it could not find a path that is short enough. Perhaps someone else on the list would offer to make that change for you; I don't have time. I can review the patch though.
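The retry idea described above can be illustrated with a toy model. This is not the Kaldi implementation. The graph encoding (each arc carries the number of HMM states it contributes, with 0 meaning the optional silence is skipped), the function names, and the fixed seed are all assumptions made for this sketch.

```python
import random

# Toy graph for "optional SIL, phone, optional SIL".
# Each arc is (states_contributed, next_node); node 3 is final.
TOY_GRAPH = {
    0: [(0, 1), (3, 1)],  # skip or take the leading silence
    1: [(3, 2)],          # the phone itself
    2: [(0, 3), (3, 3)],  # skip or take the trailing silence
}
FINAL_NODE = 3


def random_path_length(graph, start, final, rng):
    """Walk from start to final, picking a random arc at each
    node, and return the total number of states on the path."""
    node, length = start, 0
    while node != final:
        states, node = rng.choice(graph[node])
        length += states
    return length


def find_path_length(graph, start, final, num_frames,
                     max_tries=10, seed=0):
    """Try up to max_tries random paths. Return (length, rejected)
    where length is the first path that fits in num_frames, or
    None if every try was too long."""
    rng = random.Random(seed)
    rejected = []
    for _ in range(max_tries):
        length = random_path_length(graph, start, final, rng)
        if length <= num_frames:
            return length, rejected
        rejected.append(length)
    return None, rejected


if __name__ == "__main__":
    length, rejected = find_path_length(TOY_GRAPH, 0, FINAL_NODE, 4)
    print("accepted:", length, "rejected:", rejected)
```

With 4 frames only the path that skips both silences (3 states) fits, so a single random draw fails most of the time while several draws usually succeed. A deterministic shortest-path search would avoid the randomness altogether, which is a different design from the one proposed in the thread.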

 

 

Amelia, I checked in a modification of the EqualAlign function that will try several times to find a path through the FST. Can you please update Kaldi, recompile, and re-run your scripts? BTW, in case it fails again, it will at least provide us with some additional info: in the log, look for messages like this:

                EqualAlign: the randomly constructed paths lengths:
38,37,37,48,35,37,38,44,30