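A Ruiyingyuan (锐英源) open-source study note. If you repost it, please credit: "锐英源 www.wisestudy.cn, work by Mr. Sun (孙老师), phone 13803810136." Please also contact Mr. Sun if you need the full text.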

Speech Recognition and Synthesis Managed APIs in Windows Vista

 

Introduction

One of the coolest features introduced with Windows Vista is the new built-in speech recognition facility. To be fair, it was there in previous versions of Windows, but not in the useful form in which it is now available. Best of all, Microsoft provides a managed API with which developers can start digging into this rich technology. For a fuller explanation of the underlying technology, I highly recommend the Microsoft whitepaper. This tutorial walks you through building a simple text pad application, which we will then trick out with a speech synthesizer and a speech recognizer using the .NET managed API wrapper for SAPI 5.3. By the end of this tutorial, you will have a working application that reads your text back to you, obeys your voice commands, and takes dictation. But first, a word of caution: this code will only work with Visual Studio 2005 installed on Windows Vista. It does not work on XP, even with .NET 3.0 installed.

Background

Because Windows Vista has only recently been released, there are, as of this writing, several outstanding problems with developing on the platform. The biggest hurdle is that there are known compatibility problems between Visual Studio and Vista. Visual Studio .NET 2003 is not supported on Vista, and there are currently no plans to resolve any compatibility issues there. Visual Studio 2005 is supported, but in order to get it working well, you will need to make sure you also install Service Pack 1 for Visual Studio 2005. After this, you will also need to install a beta update for Vista called, somewhat confusingly, "Visual Studio 2005 Service Pack 1 Update for Windows Vista Beta". Even after doing all this, you will find that the cool new assemblies that come with Vista, such as the System.Speech assembly, still do not show up in the Add References dialog in Visual Studio. If you want them to show up, you will finally need to add a registry entry indicating where the Vista DLLs are to be found. Open the registry editor by running regedit.exe from the Vista search bar. Add the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\AssemblyFolders\v3.0 Assemblies and set its default value to C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0. (You can also add it under HKEY_CURRENT_USER, if you prefer.) Now we are ready to start programming in Windows Vista.
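For readers who would rather script the registry step than click through regedit, the entry can be sketched in C#. This is only an illustration: the class name AssemblyFolderSetup and its members are made up for this example, and the key path and folder are the ones quoted in the text. Writing under HKEY_LOCAL_MACHINE needs an elevated process.

```csharp
using Microsoft.Win32;

// Hypothetical helper (not part of the article's source code).
internal static class AssemblyFolderSetup
{
    // Key and folder as given in the text above.
    internal const string KeyPath =
        @"SOFTWARE\Microsoft\.NETFramework\AssemblyFolders\v3.0 Assemblies";
    internal const string AssemblyFolder =
        @"C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0";

    // Creates the key if needed and stores the folder as its default
    // (unnamed) value, which is what the empty value name means.
    internal static void Register()
    {
        using (RegistryKey key = Registry.LocalMachine.CreateSubKey(KeyPath))
        {
            key.SetValue(string.Empty, AssemblyFolder);
        }
    }
}
```

Swapping Registry.LocalMachine for Registry.CurrentUser would give the per-user variant mentioned above.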

Before working with the speech recognition and synthesis functionality, we need to prepare the ground with a decent text pad application to which we will add our cool new toys. Since this does not involve Vista, you do not really have to follow through this step in order to learn the speech recognition API. If you already have a good base application, you can skip ahead to the next section, Speechpad, and use the code there to trick out your app. If you do not have a suitable application at hand, but also have no interest in walking through the construction of a text pad application, you can just unzip the source code linked above and pull out the included Textpad project. The source code contains two Visual Studio 2005 projects: the Textpad project, which is the base application for the SR functionality, and Speechpad, which includes the final code.

All the same, for those with the time to do so, I feel there is much to gain from building an application from the ground up. The best way to learn a new technology is to use it oneself and to get one's hands dirty, as it were, since knowledge is always more than simply knowing that something is possible; it also involves knowing how to put that knowledge to work. We know by doing, or as Giambattista Vico put it, verum et factum convertuntur.


Textpad

Textpad is an MDI application containing two forms: a container, called Main.cs, and a child form, called TextDocument.cs. TextDocument.cs, in turn, contains a RichTextBox control.

Create a new project called Textpad. Add the "Main" and "TextDocument" forms to your project. Set the IsMdiContainer property of Main to true. Add a MainMenu control and an OpenFileDialog control (name it "openFileDialog1") to Main. Set the Filter property of the OpenFileDialog to "Text Files | *.txt", since we will only be working with text files in this project. Add a RichTextBox control to "TextDocument", name it "richTextBox1"; set its Dock property to "Fill" and its Modifiers property to "Internal".
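The designer settings above can also be expressed in code, which is sometimes easier to review. The sketch below is an illustration only: the TextpadSetup class is hypothetical, and the filter is written without spaces around the "|" separator, which is a common convention rather than something the article prescribes. Modifiers is a designer-only setting, so it has no counterpart here.

```csharp
using System.Windows.Forms;

// Hypothetical helper that mirrors the designer steps described above.
internal static class TextpadSetup
{
    // "description|pattern" pair; the spacing is an assumption of this sketch.
    internal const string TextFilter = "Text Files (*.txt)|*.txt";

    internal static void ConfigureMain(Form main, OpenFileDialog openFileDialog1)
    {
        main.IsMdiContainer = true;          // Main hosts the child documents
        openFileDialog1.Filter = TextFilter; // only offer text files
    }

    internal static void ConfigureDocument(RichTextBox richTextBox1)
    {
        richTextBox1.Dock = DockStyle.Fill;  // fill the whole child form
    }
}
```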


Add a MenuItem control to MainMenu called "File" by clicking on the MainMenu control in Designer mode and typing "File" where the control prompts you to "type here". Set the File item's MergeType property to "MergeItems". Add a second MenuItem called "Window". Under the "File" menu item, add three more Items: "New", "Open", and "Exit". Set the MergeOrder property of the "Exit" control to 2. When we start building the "TextDocument" Form, these merge properties will allow us to insert menu items from child forms between "Open" and "Exit".

Set the MdiList property of the Window menu item to true. This automatically allows it to keep track of your various child documents during runtime.


Next, we need some operations that will be triggered by our menu commands. The NewMDIChild() function creates a new instance of the TextDocument form as a child of the Main container. OpenFile() uses the OpenFileDialog control to retrieve the path to a text file selected by the user, and a StreamReader to extract the text of the file (make sure you add a using declaration for System.IO at the top of your form). It then calls an overloaded version of NewMDIChild() that takes the file name and displays it as the current document name, and then injects the text from the source file into the RichTextBox control of the new document. The Exit() method closes our Main form. Add handlers for the File menu items (by double-clicking on them) and have each handler call the appropriate operation: NewMDIChild(), OpenFile(), or Exit(). That takes care of your Main form.

#region Main File Operations

private void NewMDIChild()
{
    NewMDIChild("Untitled");
}

private void NewMDIChild(string filename)
{
    TextDocument newMDIChild = new TextDocument();
    newMDIChild.MdiParent = this;
    newMDIChild.Text = filename;
    newMDIChild.WindowState = FormWindowState.Maximized;
    newMDIChild.Show();
}

private void OpenFile()
{
    try
    {
        openFileDialog1.FileName = "";
        DialogResult dr = openFileDialog1.ShowDialog();
        if (dr == DialogResult.Cancel)
        {
            return;
        }
        string fileName = openFileDialog1.FileName;
        using (StreamReader sr = new StreamReader(fileName))
        {
            string text = sr.ReadToEnd();
            NewMDIChild(fileName, text);
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

private void NewMDIChild(string filename, string text)
{
    NewMDIChild(filename);
    LoadTextToActiveDocument(text);
}

private void LoadTextToActiveDocument(string text)
{
    TextDocument doc = (TextDocument)ActiveMdiChild;
    doc.richTextBox1.Text = text;
}

private void Exit()
{
    Dispose();
}

#endregion

To the TextDocument form, add a SaveFileDialog control, a MainMenu control, and a ContextMenuStrip control (set the ContextMenuStrip property of richTextBox1 to this new ContextMenuStrip). Set the SaveFileDialog's DefaultExt property to "txt" and its Filter property to "Text File | *.txt". Add "Cut", "Copy", "Paste", and "Delete" items to your ContextMenuStrip. Add a "File" menu item to your MainMenu, and then add "Save", "Save As", and "Close" menu items under it. Set the MergeType for "File" to "MergeItems". Set the MergeType properties of "Save", "Save As", and "Close" to "Add", and their MergeOrder properties to 1. This creates a nice effect in which the File menu of the child MDI form merges with the parent File menu.

The following methods will be called by the handlers for these menu items: Save(), SaveAs(), CloseDocument(), Cut(), Copy(), Paste(), Delete(), and InsertText(). Please note that the last five methods are scoped as internal, so they can be called by the parent form. This will be particularly important as we move on to the Speechpad project.

#region Document File Operations

private void SaveAs(string fileName)
{
    try
    {
        saveFileDialog1.FileName = fileName;
        DialogResult dr = saveFileDialog1.ShowDialog();
        if (dr == DialogResult.Cancel)
        {
            return;
        }
        string saveFileName = saveFileDialog1.FileName;
        Save(saveFileName);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

private void SaveAs()
{
    string fileName = this.Text;
    SaveAs(fileName);
}

internal void Save()
{
    string fileName = this.Text;
    Save(fileName);
}

private void Save(string fileName)
{
    string text = this.richTextBox1.Text;
    Save(fileName, text);
}

private void Save(string fileName, string text)
{
    try
    {
        using (StreamWriter sw = new StreamWriter(fileName, false))
        {
            sw.Write(text);
            sw.Flush();
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

private void CloseDocument()
{
    Dispose();
}

internal void Paste()
{
    try
    {
        IDataObject data = Clipboard.GetDataObject();
        if (data.GetDataPresent(DataFormats.Text))
        {
            InsertText(data.GetData(DataFormats.Text).ToString());
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

internal void InsertText(string text)
{
    RichTextBox theBox = richTextBox1;
    theBox.SelectedText = text;
}

internal void Copy()
{
    try
    {
        RichTextBox theBox = richTextBox1;
        Clipboard.Clear();
        Clipboard.SetDataObject(theBox.SelectedText);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

internal void Cut()
{
    Copy();
    Delete();
}

internal void Delete()
{
    richTextBox1.SelectedText = string.Empty;
}

#endregion

Once you hook up your menu item event handlers to the methods listed above, you should have a rather nice text pad application. With our base prepared, we are now in a position to start building some SR features.

Speechpad

Add a reference to the System.Speech assembly to your project. You should be able to find it in C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0\. Add using declarations for System.Speech, System.Speech.Recognition, and System.Speech.Synthesis to your Main form. The top of your Main.cs file should now look something like this:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;
using System.Speech;
using System.Speech.Synthesis;
using System.Speech.Recognition;

In design view, add two new menu items to the main menu in your Main form, labeled "Select Voice" and "Speech". For easy reference, name the first item selectVoiceMenuItem. We will use the "Select Voice" menu to programmatically list the synthetic voices that are available for reading Speechpad documents, using the three methods in the code sample below. LoadSelectVoiceMenu() loops through all voices that are installed on the operating system and creates a new menu item for each. voiceMenuItem_Click() is simply a handler that passes the click event on to the SelectVoice() method. SelectVoice() handles the toggling of the voices we have added to the "Select Voice" menu. Whenever a voice is selected, all others are deselected. If all voices are deselected, then we default to the first one.

Now that we have gotten this far, I should mention that all this trouble is a little silly if there is only one synthetic voice available, as there is when you first install Vista. Her name is Microsoft Anna, by the way. If you have Vista Ultimate or Vista Enterprise, you can use Windows Update to download an additional voice, named Microsoft Lili, which is contained in the Simplified Chinese MUI. She has a bit of an accent, but I am coming to find it rather charming. If you don't have one of the high-end flavors of Vista, however, you might consider leaving the voice selection code out of your project.

private void LoadSelectVoiceMenu()
{
    foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
    {
        MenuItem voiceMenuItem = new MenuItem(voice.VoiceInfo.Name);
        voiceMenuItem.RadioCheck = true;
        voiceMenuItem.Click += new EventHandler(voiceMenuItem_Click);
        this.selectVoiceMenuItem.MenuItems.Add(voiceMenuItem);
    }
    if (this.selectVoiceMenuItem.MenuItems.Count > 0)
    {
        this.selectVoiceMenuItem.MenuItems[0].Checked = true;
        selectedVoice = this.selectVoiceMenuItem.MenuItems[0].Text;
    }
}

private void voiceMenuItem_Click(object sender, EventArgs e)
{
    SelectVoice(sender);
}

private void SelectVoice(object sender)
{
    MenuItem mi = sender as MenuItem;
    if (mi != null)
    {
        //toggle checked value
        mi.Checked = !mi.Checked;

        if (mi.Checked)
        {
            //set selectedVoice variable
            selectedVoice = mi.Text;
            //clear all other checked items
            foreach (MenuItem voiceMi in this.selectVoiceMenuItem.MenuItems)
            {
                if (!voiceMi.Equals(mi))
                {
                    voiceMi.Checked = false;
                }
            }
        }
        else
        {
            //if deselecting, make first value checked,
            //so there is always a default value
            this.selectVoiceMenuItem.MenuItems[0].Checked = true;
            //keep selectedVoice in step with the item that is now checked
            selectedVoice = this.selectVoiceMenuItem.MenuItems[0].Text;
        }
    }
}
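Before deciding whether the voice menu is worth keeping, it can help to check which voices are actually installed on a given machine. The following console sketch is an illustration only (the VoiceLister class is not part of the article's source); it uses the same GetInstalledVoices() call as LoadSelectVoiceMenu() and prints a few VoiceInfo properties for each voice.

```csharp
using System;
using System.Speech.Synthesis;

// Hypothetical stand-alone console program; needs a reference to System.Speech.
internal static class VoiceLister
{
    private static void Main()
    {
        using (SpeechSynthesizer synthesizer = new SpeechSynthesizer())
        {
            foreach (InstalledVoice voice in synthesizer.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                // One line per voice: name, culture, gender, and whether it is enabled.
                Console.WriteLine("{0} ({1}, {2}), enabled: {3}",
                    info.Name, info.Culture, info.Gender, voice.Enabled);
            }
        }
    }
}
```

If only one line is printed, the "Select Voice" menu adds little and can be left out, as suggested above.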

We have not declared the selectedVoice class-level variable yet (your IntelliSense may have complained about it), so the next step is to do just that. While we are at it, we will also declare a private instance of the System.Speech.Synthesis.SpeechSynthesizer class and initialize it in the constructor, along with a call to the LoadSelectVoiceMenu() method from above:

#region Local Members

private SpeechSynthesizer synthesizer = null;
private string selectedVoice = string.Empty;

#endregion

public Main()
{
    InitializeComponent();
    synthesizer = new SpeechSynthesizer();
    LoadSelectVoiceMenu();
}

To allow the user to utilize the speech synthesizer, we will add two new menu items under the "Speech" menu, labeled "Read Selected Text" and "Read Document". In truth, there isn't really much to using the Vista speech synthesizer. All we do is pass a text string to our local SpeechSynthesizer object and let the operating system do the rest. Hook up event handlers for the click events of these two menu items to the following methods and you will be up and running with a speech-enabled application:

#region Speech Synthesizer Commands

private void ReadSelectedText()
{
    TextDocument doc = ActiveMdiChild as TextDocument;
    if (doc != null)
    {
        RichTextBox textBox = doc.richTextBox1;
        if (textBox != null)
        {
            string speakText = textBox.SelectedText;
            ReadAloud(speakText);
        }
    }
}

private void ReadDocument()
{
    TextDocument doc = ActiveMdiChild as TextDocument;
    if (doc != null)
    {
        RichTextBox textBox = doc.richTextBox1;
        if (textBox != null)
        {
            string speakText = textBox.Text;
            ReadAloud(speakText);
        }
    }
}

private void ReadAloud(string speakText)
{
    try
    {
        SetVoice();
        synthesizer.Speak(speakText);
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

private void SetVoice()
{
    try
    {
        synthesizer.SelectVoice(selectedVoice);
    }
    catch (Exception)
    {
        MessageBox.Show(selectedVoice + " is not available.");
    }
}

#endregion

Playing with the speech synthesizer is a lot of fun for about five minutes (ten if you have both Microsoft Anna and Microsoft Lili to work with) -- but after typing "Hello World" into your Speechpad document for the umpteenth time, you may want to do something a bit more challenging. If you do, then it is time to plug in your expensive microphone, since speech recognition really works best with a good expensive microphone. If you don't have one, however, then go ahead and plug in a cheap microphone. My cheap microphone seems to work fine. If you don't have a cheap microphone, either, I have heard that you can take a speaker and plug it into the mic jack of your computer, and if that doesn't cause an explosion, you can try talking into it.

While speech synthesis may be useful for certain specialized applications, voice commands, by contrast, are a feature that can be used to enrich any current WinForms application. With the SR Managed API, they are also easy to implement once you understand certain concepts such as the Grammar class and the SpeechRecognitionEngine.

We will begin by declaring a local instance of the speech engine and initializing it.



#region Local Members

private SpeechSynthesizer synthesizer = null;
private string selectedVoice = string.Empty;
private SpeechRecognitionEngine recognizer = null;

#endregion

public Main()
{
    InitializeComponent();
    synthesizer = new SpeechSynthesizer();
    LoadSelectVoiceMenu();
    recognizer = new SpeechRecognitionEngine();
    InitializeSpeechRecognitionEngine();
}

private void InitializeSpeechRecognitionEngine()
{
    recognizer.SetInputToDefaultAudioDevice();
    Grammar customGrammar = CreateCustomGrammar();
    recognizer.UnloadAllGrammars();
    recognizer.LoadGrammar(customGrammar);
    recognizer.SpeechRecognized +=
        new EventHandler<SpeechRecognizedEventArgs>(recognizer_SpeechRecognized);
    recognizer.SpeechHypothesized +=
        new EventHandler<SpeechHypothesizedEventArgs>(recognizer_SpeechHypothesized);
}

private Grammar CreateCustomGrammar()
{
    GrammarBuilder grammarBuilder = new GrammarBuilder();
    grammarBuilder.Append(new Choices("cut", "copy", "paste", "delete"));
    return new Grammar(grammarBuilder);
}

The speech recognition engine is the main workhorse of the speech recognition functionality. At one end, we configure the input device that the engine will listen on. In this case, we use the default device (whatever you have plugged in), though we can also select other inputs, such as specific wave files. At the other end, we capture two events raised by our speech recognition engine. As the engine attempts to interpret the incoming sound stream, it raises various "hypotheses" about what it thinks is the correct rendering of the speech input. When it finally settles on a value, and matches it to a value in the associated grammar objects, it raises a speech recognized event rather than a speech hypothesized event. If the word or phrase it settles on has no match in any associated grammar, a speech recognition rejected event (which we do not use in the present project) is raised instead.

In between, we set up rules to determine which words and phrases will raise a speech recognized event by configuring a Grammar object and associating it with our instance of the speech recognition engine. In the sample code above, we configure a very simple rule which states that a speech recognized event will be raised if any of the following words is uttered: "cut", "copy", "paste", or "delete". Note that we use a GrammarBuilder class to construct our custom grammar, and that the syntax of the GrammarBuilder class closely resembles the syntax of the StringBuilder class.
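To make the StringBuilder comparison concrete, here is a small sketch of a grammar built from more than one part, together with a handler for the rejected event that Speechpad itself does not use. Everything named here (GrammarSketch, the "select" phrases, the grammar name) is an illustrative assumption, not part of the article's source code.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical helper; needs a reference to System.Speech.
internal static class GrammarSketch
{
    // Appends a fixed word and then a set of alternatives, so the grammar
    // accepts "select word", "select line", or "select all".
    internal static Grammar CreateSelectGrammar()
    {
        GrammarBuilder builder = new GrammarBuilder();
        builder.Append("select");
        builder.Append(new Choices("word", "line", "all"));

        Grammar grammar = new Grammar(builder);
        grammar.Name = "select commands"; // made-up name, handy when debugging
        return grammar;
    }

    // Subscribes to the event raised when speech matches no loaded grammar.
    internal static void WatchRejections(SpeechRecognitionEngine recognizer)
    {
        recognizer.SpeechRecognitionRejected +=
            new EventHandler<SpeechRecognitionRejectedEventArgs>(OnRejected);
    }

    private static void OnRejected(object sender, SpeechRecognitionRejectedEventArgs e)
    {
        Console.WriteLine("Rejected: " + e.Result.Text);
    }
}
```

A grammar built this way would be loaded with recognizer.LoadGrammar(), exactly like the custom grammar above.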

This is the basic code for enabling voice commands for a WinForms application. We will now enhance the Speechpad application by adding a menu item to turn speech recognition on and off, a status bar so we can watch as the speech recognition engine interprets our words, and a function that will determine what action to take if one of our key words is captured by the engine.

Add a new menu item labeled "Speech Recognition" under the "Speech" menu item, below "Read Selected Text" and "Read Document". For convenience, name it speechRecognitionMenuItem. Add a handler to the new menu item, and use the following code to turn speech recognition on and off, as well as toggle the speech recognition menu item. Besides the RecognizeAsync() method that we use here, it is also possible to start the engine synchronously or, by passing it a RecognizeMode.Single parameter, cause the engine to stop after the first phrase it recognizes. The method we use to stop the engine, RecognizeAsyncStop(), is basically a polite way to stop the engine, since it will wait for the engine to finish any phrases it is currently processing before quitting. An impolite method, RecognizeAsyncCancel(), is also available -- to be used in emergency situations, perhaps.

private void speechRecognitionMenuItem_Click(object sender, EventArgs e)
{
    if (this.speechRecognitionMenuItem.Checked)
    {
        TurnSpeechRecognitionOff();
    }
    else
    {
        TurnSpeechRecognitionOn();
    }
}

private void TurnSpeechRecognitionOn()
{
    recognizer.RecognizeAsync(RecognizeMode.Multiple);
    this.speechRecognitionMenuItem.Checked = true;
}

private void TurnSpeechRecognitionOff()
{
    if (recognizer != null)
    {
        recognizer.RecognizeAsyncStop();
        this.speechRecognitionMenuItem.Checked = false;
    }
}
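The two other ways of starting the engine mentioned above can be sketched as follows. This is an illustration under assumptions: the SingleShot class and its method names are made up, and the recognizer passed in is assumed to have its input and grammars configured already.

```csharp
using System.Speech.Recognition;

// Hypothetical helper; needs a reference to System.Speech.
internal static class SingleShot
{
    // Synchronous variant: blocks the calling thread until the engine has a
    // result. Recognize() returns null when nothing was recognized.
    internal static string ListenOnce(SpeechRecognitionEngine recognizer)
    {
        RecognitionResult result = recognizer.Recognize();
        return result == null ? null : result.Text;
    }

    // Asynchronous variant: returns at once, and the engine stops by itself
    // after the first phrase; the result arrives through SpeechRecognized.
    internal static void ListenOnceAsync(SpeechRecognitionEngine recognizer)
    {
        recognizer.RecognizeAsync(RecognizeMode.Single);
    }
}
```

In a WinForms application the blocking variant would freeze the user interface while it waits, which is one reason Speechpad sticks to RecognizeAsync().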

We are actually going to use the RecognizeAsyncCancel() method now, since there is an emergency situation. The speech synthesizer, it turns out, cannot operate if the speech recognizer is still running. To get around this, we will need to disable the speech recognizer just before the synthesizer starts speaking, and then reactivate it once the synthesizer has completed its task. We will modify the ReadAloud() method to handle this.

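private void ReadAloud(string speakText)
{
    try
    {
        SetVoice();
        // Remember whether the user had recognition switched on, so that we
        // neither start the engine behind the user's back nor start it twice.
        bool wasListening = this.speechRecognitionMenuItem.Checked;
        if (wasListening)
        {
            recognizer.RecognizeAsyncCancel();
        }
        synthesizer.Speak(speakText);
        if (wasListening)
        {
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}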
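Speak() blocks until the text has been read, so the form stops responding while a long document is spoken. One possible alternative, sketched here as an assumption rather than as the article's approach, is SpeakAsync() together with the SpeakCompleted event. The AsyncReader class is hypothetical, and the sketch assumes the completion event arrives on the same thread that calls ReadAloud(), so the counter needs no locking.

```csharp
using System;
using System.Speech.Recognition;
using System.Speech.Synthesis;

// Hypothetical wrapper; needs a reference to System.Speech.
internal sealed class AsyncReader
{
    private readonly SpeechSynthesizer synthesizer;
    private readonly SpeechRecognitionEngine recognizer;
    private int pendingPrompts;   // prompts queued but not yet finished
    private bool resumeWhenDone;  // restart recognition after the last prompt?

    internal AsyncReader(SpeechSynthesizer synthesizer, SpeechRecognitionEngine recognizer)
    {
        this.synthesizer = synthesizer;
        this.recognizer = recognizer;
        this.synthesizer.SpeakCompleted +=
            new EventHandler<SpeakCompletedEventArgs>(OnSpeakCompleted);
    }

    // recognitionWasOn: pass the Checked state of the speech recognition menu item.
    internal void ReadAloud(string text, bool recognitionWasOn)
    {
        resumeWhenDone = recognitionWasOn;
        if (pendingPrompts == 0)
        {
            recognizer.RecognizeAsyncCancel(); // silence the recognizer first
        }
        pendingPrompts++;
        synthesizer.SpeakAsync(text);          // returns immediately
    }

    private void OnSpeakCompleted(object sender, SpeakCompletedEventArgs e)
    {
        pendingPrompts--;
        if (pendingPrompts == 0 && resumeWhenDone)
        {
            recognizer.RecognizeAsync(RecognizeMode.Multiple);
        }
    }
}
```

The counter is there so that recognition resumes only after the last queued prompt has finished, rather than once per prompt.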

The user now has the ability to turn speech recognition on and off. We can make the application more interesting by capturing the speech hypothesized event and displaying the results in a status bar on the Main form. Add a StatusStrip control to the Main form, and a ToolStripStatusLabel to the StatusStrip with its Spring property set to true. For convenience, call this label toolStripStatusLabel1. Use the following code to handle the speech hypothesized event and display the results:

private void recognizer_SpeechHypothesized(object sender,
    SpeechHypothesizedEventArgs e)
{
    GuessText(e.Result.Text);
}

private void GuessText(string guess)
{
    toolStripStatusLabel1.Text = guess;
    this.toolStripStatusLabel1.ForeColor = Color.DarkSalmon;
}
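Each hypothesis also carries a confidence score between 0 and 1 in e.Result.Confidence, which can be interesting to watch in the status bar alongside the text. The helper below is a small sketch of one way to format it; the GuessFormatter class is an assumption made for illustration.

```csharp
using System.Globalization;

// Hypothetical formatting helper, independent of the speech classes.
internal static class GuessFormatter
{
    // Produces text such as "cut (0.85)" from a phrase and its confidence.
    internal static string Format(string text, float confidence)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0} ({1:0.00})", text, confidence);
    }
}
```

Inside recognizer_SpeechHypothesized() it could be used as GuessText(GuessFormatter.Format(e.Result.Text, e.Result.Confidence)).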

Now that we can turn speech recognition on and off, as well as capture misinterpretations of the input stream, it is time to capture the speech recognized event and do something with it. The SpeechToAction() method will evaluate the recognized text and then call the appropriate method in the child form (these methods are accessible because we scoped them internal in the Textpad code above). In addition, we display the recognized text in the status bar, just as we did with hypothesized text, but in a different color in order to distinguish the two events.

private void recognizer_SpeechRecognized(object sender,
    SpeechRecognizedEventArgs e)
{
    string text = e.Result.Text;
    SpeechToAction(text);
}

private void SpeechToAction(string text)
{
    TextDocument document = ActiveMdiChild as TextDocument;
    if (document != null)
    {
        DetermineText(text);

        switch (text)
        {
            case "cut":
                document.Cut();
                break;
            case "copy":
                document.Copy();
                break;
            case "paste":
                document.Paste();
                break;
            case "delete":
                document.Delete();
                break;
        }
    }
}

private void DetermineText(string text)
{
    this.toolStripStatusLabel1.Text = text;
    this.toolStripStatusLabel1.ForeColor = Color.SteelBlue;
}

Now let's take Speechpad for a spin. Fire up the application and, if it compiles, create a new document. Type "Hello world." So far, so good. Turn on speech recognition by selecting the Speech Recognition item under the Speech menu. Highlight "Hello" and say the following phrase into your expensive microphone, inexpensive microphone, or speaker: delete. Now type "Save the cheerleader, save the". Not bad at all.

Voice command technology, as exemplified above, is probably the most useful and easiest-to-implement aspect of the speech recognition functionality provided by Vista. In a few days of work, any current application can be enabled to use it, and the potential for streamlining workflow and making it more efficient is truly breathtaking. The cool factor, of course, is also very high.


Having grown up watching Star Trek reruns, however, I can't help but feel that the dictation functionality is much more interesting. Computers are meant to be talked to and told what to do, not cajoled into doing tricks for us based on finger motions over a typewriter. My long-term goal is to be able to code by talking into my IDE in order to build UML diagrams and then, at a word, turn that into an application. What a brave new world that will be. Toward that end, the SR managed API provides the DictationGrammar class.

Whereas the Grammar class works as a gatekeeper, restricting the phrases that get through to the speech recognized handler down to a select set of rules, the DictationGrammar class, by default, kicks out the jams and lets all phrases through to the recognized handler.

In order to make Speechpad a dictation application, we will add the default DictationGrammar object to the list of grammars used by our speech recognition engine. We will also add a toggle menu item to turn dictation on and off. Finally, we will alter the SpeechToAction() method in order to insert any phrases that are not voice commands into the current Speechpad document as text.

Begin by declaring a local DictationGrammar member in our Main form, and then instantiate it in the Main constructor. Your code should look like this:

#region Local Members

private SpeechSynthesizer synthesizer = null;
private string selectedVoice = string.Empty;
private SpeechRecognitionEngine recognizer = null;
private DictationGrammar dictationGrammar = null;

#endregion

public Main()
{
    InitializeComponent();
    synthesizer = new SpeechSynthesizer();
    LoadSelectVoiceMenu();
    recognizer = new SpeechRecognitionEngine();
    InitializeSpeechRecognitionEngine();
    dictationGrammar = new DictationGrammar();
}

Create a new menu item under the Speech menu and label it "Take Dictation". Name it takeDictationMenuItem for convenience. Add a handler for the click event of the new menu item, and stub out TurnDictationOn() and TurnDictationOff() methods. TurnDictationOn() works by loading the local dictationGrammar object into the speech recognition engine. It also needs to turn speech recognition on if it is currently off, since dictation will not work if the speech recognition engine is disabled. TurnDictationOff() simply removes the local dictationGrammar object from the speech recognition engine's list of grammars.

private void takeDictationMenuItem_Click(object sender, EventArgs e)
{
    if (this.takeDictationMenuItem.Checked)
    {
        TurnDictationOff();
    }
    else
    {
        TurnDictationOn();
    }
}

private void TurnDictationOn()
{
    if (!speechRecognitionMenuItem.Checked)
    {
        TurnSpeechRecognitionOn();
    }
    recognizer.LoadGrammar(dictationGrammar);
    takeDictationMenuItem.Checked = true;
}

private void TurnDictationOff()
{
    // Only unload a grammar that is actually loaded, so this method is
    // safe to call even when dictation was never switched on.
    if (dictationGrammar != null && dictationGrammar.Loaded)
    {
        recognizer.UnloadGrammar(dictationGrammar);
    }
    takeDictationMenuItem.Checked = false;
}

For an extra touch of elegance, alter TurnSpeechRecognitionOff() by adding a line of code that turns off dictation whenever speech recognition is disabled:

TurnDictationOff();

Finally, we need to update the SpeechToAction() method so it will insert any text that is not a voice command into the current Speechpad document. Use the default statement of the switch control block to call the InsertText() method of the current document.

private void SpeechToAction(string text)
{
    TextDocument document = ActiveMdiChild as TextDocument;
    if (document != null)
    {
        DetermineText(text);
        switch (text)
        {
            case "cut":
                document.Cut();
                break;
            case "copy":
                document.Copy();
                break;
            case "paste":
                document.Paste();
                break;
            case "delete":
                document.Delete();
                break;
            default:
                document.InsertText(text);
                break;
        }
    }
}
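With both grammars loaded, the switch above decides between command and dictation purely by the recognized text. A possible refinement, sketched here as an assumption rather than as part of Speechpad, is to look at which grammar produced the result through e.Result.Grammar. The ResultRouter class and the "commands" name are made up for this example.

```csharp
using System.Speech.Recognition;

// Hypothetical helper; needs a reference to System.Speech.
internal static class ResultRouter
{
    // Name the command grammar would be given when it is created (assumption).
    internal const string CommandGrammarName = "commands";

    internal static bool IsCommand(string grammarName)
    {
        return grammarName == CommandGrammarName;
    }

    // True when the result was matched by the command grammar rather than
    // by the dictation grammar.
    internal static bool IsCommand(RecognitionResult result)
    {
        return result.Grammar != null && IsCommand(result.Grammar.Name);
    }
}
```

To use it, CreateCustomGrammar() would set the Name of the grammar it returns to ResultRouter.CommandGrammarName, and the recognized handler would run the command switch only when IsCommand(e.Result) is true, inserting the text otherwise.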

With that, we complete the speech recognition functionality for Speechpad. Now try it out. Open a new Speechpad document and type "Hello World." Turn on speech recognition. Select "Hello" and say delete. Turn on dictation. Say brave new.

This tutorial has demonstrated the essential code required to use speech synthesis, voice commands, and dictation in your .NET 2.0 Vista applications. It can serve as the basis for building speech recognition tools that take advantage of default as well as custom grammar rules to build advanced application interfaces. Besides the strange compatibility issues between Vista and Visual Studio, at the moment the greatest hurdle to using the Vista managed speech recognition API is the remarkable dearth of documentation and samples. This tutorial is intended to help alleviate that problem by providing a hands-on introduction to these new tools.

