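Original content from 锐英源 (Ruiyingyuan) Software. Reproduction in whole or in part, and any other form of unauthorized use, is prohibited; infringements will be pursued. 锐英源 Software has translated many classic open-source projects; the translations are technically advanced and well worth close study by beginners.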
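Speech recognition operates on audio, and audio is a special kind of data. A programmer looking at the raw numbers will struggle to spot problems, but a dedicated tool makes the job far easier, and Audacity is such a tool. Its features cover viewing, recording, tracks, generation, effects, analysis and plugins.

- Generation inserts commonly used, template-based audio segments into a track; it is the main feature for producing audio.
- Effects create enhanced audio from existing audio data.
- Analysis is an advanced form of viewing that can reveal regular patterns in the data.

Generally speaking, advanced audio engineers and mixing engineers need to master generation and effects, while ordinary users should concentrate on importing and listening. 锐英源 Software uses Audacity as an audio debugging tool for speech recognition, so we do not use generation or effects. We still recommend both, because they are the essence of Audacity.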
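On a speech recognition platform, correct recognition first requires correct input data. Audacity lets you check this by listening and by looking, and looking of course means inspecting the waveform. Save the input data in advance as a .pcm file, then bring it into Audacity as raw data. The Import menu is under the File menu: click File > Import > Raw Data and you will see the following dialog: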
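The dialog's settings mean the following:

- Encoding is how a single audio sample is represented, much as integers come in different types.
- Byte order applies to multi-byte integers. It says whether the most significant byte comes first or last; different orders yield different values, so it must be specified.
- Channels are specific to audio. Put simply, two ears mean two channels (stereo), although a computer can also produce a single channel (mono).
- Start offset is an offset into the file, in bytes.
- Amount to import is how much of the file's data to import.
- Sample rate is how many samples are captured per second. This is the most important setting: if it does not match the actual data, the waveform will look wrong (stretched or squeezed in time) and playback will sound too fast or too slow.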
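The byte-order setting matters because the same two bytes decode to different numbers. Below is a minimal C++ sketch that assumes the file holds headerless signed 16-bit PCM. That format is a common choice for speech-recognition input, but your pipeline may differ. The function name decodePcm16 is illustrative and is not part of Audacity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte order of the samples in a headerless .pcm file.
enum class ByteOrder { Little, Big };

// Decode headerless signed 16-bit PCM. `startOffset` mirrors the import
// dialog's start offset (in bytes); a trailing odd byte is ignored.
std::vector<std::int16_t> decodePcm16(const std::vector<std::uint8_t>& bytes,
                                      ByteOrder order,
                                      std::size_t startOffset = 0) {
    std::vector<std::int16_t> samples;
    for (std::size_t i = startOffset; i + 1 < bytes.size(); i += 2) {
        // Pick which byte of the pair is the low (least significant) one.
        std::uint8_t lo = (order == ByteOrder::Little) ? bytes[i] : bytes[i + 1];
        std::uint8_t hi = (order == ByteOrder::Little) ? bytes[i + 1] : bytes[i];
        auto bits = static_cast<std::uint16_t>((hi << 8) | lo);
        // Reinterpret the 16 bits as a two's-complement signed sample.
        samples.push_back(static_cast<std::int16_t>(bits));
    }
    return samples;
}
```

For the bytes 0x01 0x00, little-endian decoding gives 1 while big-endian decoding gives 256. This is why choosing the wrong byte order turns a speech recording into a noise-like waveform.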
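As the description above shows, audio data is just a long series of numbers, some positive and some negative. It can therefore be displayed as a bar-chart-like graph, as in the figure below: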
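The 1.0 and -1.0 in the figure are not actual sample values but points on a normalized scale. The X button at the top left closes the track. Audacity can hold multiple tracks together, and if the tracks fit together well, the whole is beautiful music. The waveform can be zoomed in and out, which makes it easier to examine how it changes over time.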
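The ±1.0 ruler values come from normalizing integer samples. The sketch below assumes one common convention, dividing signed 16-bit samples by 32768; Audacity's internal conversion may differ in detail. The names toNormalized and peakAmplitude are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Map signed 16-bit samples onto the normalized scale of the waveform
// ruler. Dividing by 32768 maps -32768 to exactly -1.0 and 32767 to just
// under +1.0 (about 0.99997).
std::vector<float> toNormalized(const std::vector<std::int16_t>& samples) {
    std::vector<float> out;
    out.reserve(samples.size());
    for (std::int16_t s : samples) {
        out.push_back(static_cast<float>(s) / 32768.0f);
    }
    return out;
}

// Peak amplitude on the same scale: the largest absolute sample value.
float peakAmplitude(const std::vector<float>& v) {
    float p = 0.0f;
    for (float x : v) p = std::max(p, std::abs(x));
    return p;
}
```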
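Audacity is open-source software: you can download the source code, compile and install it yourself, and it supports several operating systems. The article below describes its architecture; the original is at https://aosabook.org/en/audacity.html. For readers who want to improve their skills by studying open source, here are a few pointers:

- the ring buffer in PortAudio;
- how libraries are organized into a large application;
- how the main executable integrates with scripting;
- the interfaces between libraries within a large application.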
Audacity is a popular sound recorder and audio editor. It is a capable program while still being easy to use. The majority of users are on Windows but the same Audacity source code compiles to run on Linux and Mac too.
Dominic Mazzoni wrote the original version of Audacity in 1999 while he was a research student at Carnegie Mellon University. Dominic wanted to create a platform on which to develop and debug audio processing algorithms. The software grew to become useful in its own right in many other ways. Once Audacity was released as open source software, it attracted other developers. A small, gradually-changing team of enthusiasts have modified, maintained, tested, updated, written documentation for, helped users with, and translated Audacity's interface into other languages over the years.
One goal is that its user interface should be discoverable: people should be able to sit down without a manual and start using it right away, gradually discovering its features. This principle has been crucial in giving Audacity greater consistency to the user interface than there otherwise would be. For a project in which many people have a hand, this kind of unifying principle is more important than it might seem at first.
It would be good if the architecture of Audacity had a similar guiding principle, a similar kind of discoverability. The closest we have to that is "try and be consistent". When adding new code, developers try to follow the style and conventions of code nearby. In practice, though, the Audacity code base is a mix of well-structured and less well-structured code. Rather than an overall architecture the analogy of a small city is better: there are some impressive buildings but you will also find run-down neighborhoods that are more like a shanty town.
Audacity is layered upon several libraries. While most new programming in Audacity code doesn't require a detailed knowledge of exactly what is going on in these libraries, familiarity with their APIs and what they do is important. The two most important libraries are PortAudio which provides a low-level audio interface in a cross-platform way, and wxWidgets which provides GUI components in a cross-platform way.
When reading Audacity's code, it helps to realize that only a fraction of the code is essential. Libraries contribute a lot of optional features—though people who use those features might not consider them optional. For example, as well as having its own built-in audio effects, Audacity supports LADSPA (Linux Audio Developer's Simple Plugin API) for dynamically loadable plugin audio effects. The VAMP API in Audacity does the same thing for plugins that analyze audio. Without these APIs, Audacity would be less feature-rich, but it does not absolutely depend on these features.
Other optional libraries used by Audacity are libFLAC, libogg, and libvorbis. These provide various audio compression formats. MP3 format is catered for by dynamically loading the LAME or FFmpeg library. Licensing restrictions prevent these very popular compression libraries from being built-in.
Licensing is behind some other decisions about Audacity libraries and structure. For example, support for VST plugins is not built in because of licensing restrictions. We would also like to use the very efficient FFTW fast Fourier transform code in some of our code. However, we only provide that as an option for people who compile Audacity themselves, and instead fall back to a slightly slower version in our normal builds. As long as Audacity accepts plugins, it can be and has been argued that Audacity cannot use FFTW. FFTW's authors do not want their code to be available as a general service to arbitrary other code. So, the architectural decision to support plugins leads to a trade-off in what we can offer. It makes LADSPA plugins possible but bars us from using FFTW in our pre-built executables.
Architecture is also shaped by considerations of how best to use our scarce developer time. With a small team of developers, we do not have the resources to do, for example, the in-depth analysis of security loopholes that teams working on Firefox and Thunderbird do. However, we do not want Audacity to provide a route to bypass a firewall, so we have a rule not to have TCP/IP connections to or from Audacity at all. Avoiding TCP/IP cuts out many security concerns. The awareness of our limited resources leads us to better design. It helps us cut features that would cost us too much in developer time and focus on what is essential.
A similar concern for developers' time applies to scripting languages. We want scripting, but the code implementing the languages does not need to be in Audacity. It does not make sense to compile copies of each scripting language into Audacity to give users all the choices they could want.[1] We have instead implemented scripting with a single plugin module and a pipe, which we will cover later.
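The attraction of a pipe is that any language able to write text to a file can drive the program. The text does not show the protocol, so the sketch below assumes a simple line-oriented one. The client writes one command per line to a "to" pipe, then reads reply lines from a "from" pipe until a blank line. The function name sendCommand, the command text and the blank-line terminator are all assumptions for illustration. std::ostringstream and std::istringstream stand in for the two pipes, so the example runs without Audacity. With real named pipes, you would open std::ofstream and std::ifstream on the FIFO paths instead.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>   // string streams stand in for the pipes in the usage example
#include <string>
#include <vector>

// Hypothetical client side of a line-oriented command pipe.
std::vector<std::string> sendCommand(std::ostream& toPipe,
                                     std::istream& fromPipe,
                                     const std::string& command) {
    toPipe << command << '\n';
    toPipe.flush();  // push the command out of our buffer immediately
    std::vector<std::string> reply;
    std::string line;
    // Collect reply lines until an empty line (or end of stream).
    while (std::getline(fromPipe, line) && !line.empty()) {
        reply.push_back(line);
    }
    return reply;
}
```

Because the interpreter runs in its own process, Audacity only needs the small module that owns the pipe, not a copy of every scripting language.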
Figure 2.1: Layers in Audacity
Figure 2.1 shows some layers and modules in Audacity. The diagram highlights three important classes within wxWidgets, each of which has a reflection in Audacity. We're building higher-level abstractions from related lower-level ones. For example, the BlockFile system is a reflection of and is built on wxWidgets' wxFiles. It might, at some stage, make sense to split out BlockFiles, ShuttleGUI, and command handling into an intermediate library in their own right. This would encourage us to make them more general.
Lower down in the diagram is a narrow strip for "Platform Specific Implementation Layers." Both wxWidgets and PortAudio are OS abstraction layers. Both contain conditional code that chooses between different implementations depending on the target platform.
The "Other Supporting Libraries" category includes a wide collection of libraries. Interestingly quite a few of these rely on dynamically loaded modules. Those dynamic modules know nothing of wxWidgets.
On the Windows platform we used to compile Audacity as a single monolithic executable with wxWidgets and Audacity application code in the same executable. In 2008 we changed over to using a modular structure with wxWidgets as a separate DLL. This is to allow additional optional DLLs to be loaded at run time where those DLLs directly use features of wxWidgets. Plugins that plug in above the dotted line in the diagram can use wxWidgets.
The decision to use DLLs for wxWidgets has its downsides. The distribution is now larger, partly because many unused functions are provided in the DLLs that would previously have been optimized away. Audacity also takes longer to start up because each DLL is loaded separately. The advantages are considerable. We expect modules to have similar advantages for us as they do for Apache. As we see it, modules allow the core of Apache to be very stable while facilitating experimentation, special features and new ideas in the modules. Modules go a very long way to counteracting the temptation to fork a project to take it in a new direction. We think it's been a very important architectural change for us. We're expecting these advantages but have not seen them yet. Exposing the wxWidgets functions is only a first step and we have more to do to have a flexible modular system.
The structure of a program like Audacity clearly is not designed up front. It is something that develops over time. By and large the architecture we now have works well for us. We find ourselves fighting the architecture when we try to add features that affect many of the source files. For example, Audacity currently handles stereo and mono tracks in a special cased way. If you wanted to modify Audacity to handle surround sound you'd need to make changes in many classes in Audacity.
Going Beyond Stereo: The GetLink Story
Audacity has never had an abstraction for number of channels. Instead the abstraction it uses is to link audio channels. There is a function GetLink that returns the other audio channel in a pair if there are two and that returns NULL if the track is mono. Code that uses GetLink typically looks exactly as if it were originally written for mono and a test of (GetLink() != NULL) was later added to extend it to handle stereo. I'm not sure it was actually written that way, but I suspect it. There's no looping using GetLink to iterate through all channels in a linked list. Drawing, mixing, reading and writing all contain a test for the stereo case rather than general code that can work for n channels, where n is most likely to be one or two. To go for the more general code you'd need to change around 100 of these calls to the GetLink function, modifying at least 26 files.
It's easy to search the code to find GetLink calls and the changes needed are not that complex, so it is not as big a deal to fix this "problem" as it might sound at first. The GetLink story is not about a structural defect that is hard to fix. Rather it's illustrative of how a relatively small defect can travel into a lot of code, if allowed to.
With hindsight it would have been good to make the GetLink function private and instead provide an iterator to iterate through all channels in a track. This would have avoided much special case code for stereo, and at the same time made code that uses the list of audio channels agnostic with respect to the list implementation.
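The contrast between the two styles can be shown in a few lines. The sketch below is a hypothetical, heavily simplified model: Channel, Track and the peak functions are invented for illustration and are not Audacity's classes. The GetLink-style function hard-codes "mono plus at most one partner". The iterator-style one works for any channel count and never exposes how the channels are stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// GetLink style: each channel may point at one partner channel.
struct Channel {
    std::vector<float> samples;
    Channel* link = nullptr;  // partner channel, or nullptr for mono
    Channel* GetLink() const { return link; }
};

float peakGetLinkStyle(const Channel& c) {
    float p = 0.0f;
    for (float s : c.samples) p = std::max(p, std::abs(s));
    if (c.GetLink() != nullptr) {  // special case bolted on for stereo
        for (float s : c.GetLink()->samples) p = std::max(p, std::abs(s));
    }
    return p;  // a third channel could not even be represented
}

// Iterator style: the track owns N channels; callers just loop.
class Track {
public:
    explicit Track(std::vector<std::vector<float>> chans)
        : channels_(std::move(chans)) {}
    auto begin() const { return channels_.begin(); }
    auto end() const { return channels_.end(); }
private:
    std::vector<std::vector<float>> channels_;  // storage stays private
};

float peak(const Track& t) {
    float p = 0.0f;
    for (const auto& ch : t)
        for (float s : ch) p = std::max(p, std::abs(s));
    return p;
}
```

Swapping the vector in Track for another container would not touch any caller, which is the "agnostic with respect to the list implementation" benefit described above.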
The more modular design is likely to drive us towards better hiding of internal structure. As we define and extend an external API we'll need to look more closely at the functions we're providing. This will draw our attention to abstractions that we don't want to lock in to an external API.
The most significant single library for Audacity user interface programmers is the wxWidgets GUI library, which provides such things as buttons, sliders, check boxes, windows and dialogs. It provides the most visible cross-platform behavior. The wxWidgets library has its own string class wxString, it has cross-platform abstractions for threads, filesystems, and fonts, and a mechanism for localization to other languages, all of which we use. We advise people new to Audacity development to first download wxWidgets and compile and experiment with some of the samples that come with that library. wxWidgets is a relatively thin layer on the underlying GUI objects provided by the operating system.
To build up complex dialogs wxWidgets provides not only individual widget elements but also sizers that control the elements' sizes and positions. This is a lot nicer than giving absolute fixed positions to graphical elements. If the widgets are resized either directly by the user or, say, by using a different font size, the positioning of the elements in the dialogs updates in a very natural way. Sizers are important for a cross-platform application. Without them we might have to have custom layouts of dialogs for each platform.
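To illustrate what a sizer does, here is a deliberately simplified sketch of a one-axis box layout. It is loosely modeled on the "proportion" that wxWidgets sizers take for each item, but it is not wxWidgets' algorithm: the real wxBoxSizer also handles borders, alignment and minimum-size edge cases. The names Item and layoutBox are hypothetical. Each item gets its minimum size, and any extra space is shared among stretchable items according to their proportions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Item {
    int minSize;     // smallest size the widget accepts (grows with font size)
    int proportion;  // 0 = stays at minSize; >0 = share of the extra space
};

std::vector<int> layoutBox(const std::vector<Item>& items, int available) {
    int minTotal = 0, propTotal = 0;
    for (const Item& it : items) {
        minTotal += it.minSize;
        propTotal += it.proportion;
    }
    int extra = available > minTotal ? available - minTotal : 0;

    std::vector<int> sizes;
    int given = 0, lastStretch = -1;
    for (std::size_t i = 0; i < items.size(); ++i) {
        int share = propTotal > 0 ? extra * items[i].proportion / propTotal : 0;
        sizes.push_back(items[i].minSize + share);
        given += share;
        if (items[i].proportion > 0) lastStretch = static_cast<int>(i);
    }
    // Integer division can leave a few pixels over; give them to the last
    // stretchable item so the sizes add up to the available space.
    if (lastStretch >= 0) sizes[lastStretch] += extra - given;
    return sizes;
}
```

Whenever the dialog is resized or the font changes, the layout is simply recomputed. That is why sizer-based dialogs adapt naturally across platforms, where fixed positions would not.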
Often the design for these dialogs is in a resource file that is read by the program. However, in Audacity we exclusively compile dialog designs into the program as a series of calls to wxWidgets functions. This provides maximum flexibility: the exact contents and behavior of a dialog can be determined by application-level code.
You could at one time find places in Audacity where the initial code for creating a GUI had clearly been code-generated using a graphical dialog building tool. Those tools helped us get a basic design. Over time the basic code was hacked around to add new features, resulting in many places where new dialogs were created by copying and modifying existing, already hacked-around dialog code.
After a number of years of such development we found that large sections of the Audacity source code, particularly the dialogs for configuring user preferences, consisted of tangled repetitive code. That code, though simple in what it did, was surprisingly hard to follow. Part of the problem was that the sequence in which dialogs were built up was quite arbitrary: smaller elements were combined into larger ones and eventually into complete dialogs, but the order in which elements were created by the code did not (and did not need to) resemble the order elements were laid out on screen. The code was also verbose and repetitive. There was GUI-related code to transfer data from preferences stored on disk to intermediate variables, code to transfer from intermediate variables to the displayed GUI, code to transfer from the displayed GUI to intermediate variables, and code to transfer from intermediate variables to the stored preferences. There were comments in the code along the lines of //this is a mess, but it was quite some time before anything was done about it.
The solution to untangling all this code was a new class, ShuttleGui, that much reduced the number of lines of code needed to specify a dialog, making the code more readable. ShuttleGui is an extra layer between the wxWidgets library and Audacity. Its job is to transfer information between the two. Here's an example which results in the GUI elements pictured in Figure 2.2.
    ShuttleGui S;
    // GUI Structure
    S.StartStatic("Some Title",...);
    {
        S.AddButton("Some Button",...);
        S.TieCheckbox("Some Checkbox",...);
    }
    S.EndStatic();
Figure 2.2: Example Dialog
This code defines a static box in a dialog and that box contains a button and a checkbox. The correspondence between the code and the dialog should be clear. The StartStatic and EndStatic are paired calls. Other similar StartSomething/EndSomething pairs, which must match, are used for controlling other aspects of layout of the dialog. The curly brackets and the indenting that goes with them aren't needed for this to be correct code. We adopted the convention of adding them in to make the structure and particularly the matching of the paired calls obvious. It really helps readability in larger examples.
The source code shown does not just create the dialog. The code after the comment "//GUI Structure" can also be used to shuttle data from the dialog out to where the user preferences are stored, and to shuttle data back in. Previously a lot of the repetitive code came from the need to do this. Nowadays that code is only written once and is buried within the ShuttleGui class.
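The idea of one dialog description that shuttles data in either direction can be sketched without wxWidgets. This is a hypothetical miniature: MiniShuttle, ShuttleMode and DescribeDialog are invented names, not Audacity's real ShuttleGui API. A std::map stands in for both the stored preferences and the on-screen checkboxes. The same DescribeDialog function first loads preferences into the GUI and later saves the user's changes back.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class ShuttleMode { PrefsToGui, GuiToPrefs };

using Prefs = std::map<std::string, bool>;     // stands in for stored preferences
using GuiState = std::map<std::string, bool>;  // stands in for on-screen checkboxes

class MiniShuttle {
public:
    MiniShuttle(ShuttleMode mode, Prefs& prefs, GuiState& gui)
        : mode_(mode), prefs_(prefs), gui_(gui) {}

    // One call both declares the checkbox and moves its value in the
    // direction selected by the mode.
    void TieCheckbox(const std::string& key, bool defaultValue) {
        if (mode_ == ShuttleMode::PrefsToGui) {
            auto it = prefs_.find(key);
            gui_[key] = (it != prefs_.end()) ? it->second : defaultValue;
        } else {
            prefs_[key] = gui_[key];
        }
    }

private:
    ShuttleMode mode_;
    Prefs& prefs_;
    GuiState& gui_;
};

// The dialog is described once and reused for both directions.
void DescribeDialog(MiniShuttle& S) {
    S.TieCheckbox("/Hypothetical/ShowClipping", false);
    S.TieCheckbox("/Hypothetical/AutoScroll", true);
}
```

With this design, adding a checkbox means adding one TieCheckbox line. The load and save paths cannot drift apart, because they are the same code.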
There are other extensions to the basic wxWidgets in Audacity. Audacity has its own class for managing toolbars. Why doesn't it use wxWidgets' built-in toolbar class? The reason is historic: Audacity's toolbars were written before wxWidgets provided a toolbar class.
The main panel in Audacity which displays audio waveforms is the TrackPanel. This is a custom control drawn by Audacity. It's made up of components such as smaller panels with track information, a ruler for the timebase, rulers for amplitude, and tracks which may show waveforms, spectra or textual labels. The tracks can be resized and moved around by dragging. The tracks containing textual labels make use of our own re-implementation of an editable text box rather than using the built-in text box. You might think these panels, tracks and rulers should each be a wxWidgets component, but they are not.
Figure 2.3: Audacity Interface with Track Panel Elements Labelled
The screenshot shown in Figure 2.3 shows the Audacity user interface. All the components that have been labelled are custom for Audacity. As far as wxWidgets is concerned there is one wxWidget component for the TrackPanel. Audacity code, not wxWidgets, takes care of the positioning and repainting within that.
The way all these components fit together to make the TrackPanel is truly horrible. (It's the code that's horrible; the end result the user sees looks just fine.) The GUI and application-specific code is all mixed together, not separated cleanly. In a good design only our application-specific code should know about left and right audio channels, decibels, muting and soloing. GUI elements should be application agnostic elements that are reusable in a non-audio application. Even the purely GUI parts of TrackPanel are a patchwork of special case code with absolute positions and sizes and not enough abstraction. It would be so much nicer, cleaner and more consistent if these special components were self-contained GUI elements and if they used sizers with the same kinds of interface as wxWidgets uses.
To get to such a TrackPanel we'd need a new sizer for wxWidgets that can move and resize tracks or, indeed, any other widget. wxWidgets sizers aren't yet that flexible. As a spin off benefit we could use that sizer elsewhere. We could use it in the toolbars that hold the buttons, making it easy to customize the order of buttons within a toolbar by dragging.
Some exploratory work has been done in creating and using such sizers, but not enough. Some experiments with making the GUI components fully fledged wxWidgets ran into a problem: doing so reduces our control over repainting of the widgets, resulting in flicker when resizing and moving components. We would need to extensively modify wxWidgets to achieve flicker-free repainting, and better separate the resizing steps from the repainting steps.
A second reason to be wary of this approach for the TrackPanel is that we already know wxWidgets starts running very slowly when there are large numbers of widgets. This is mostly outside wxWidgets' control. Each wxWidget, button, and text entry box uses a resource from the windowing system. Each has a handle to access it. Processing large numbers of these takes time. Processing is slow even when the majority of widgets are hidden or off screen. We want to be able to use many small widgets on our tracks.
The best solution is to use a flyweight pattern, lightweight widgets that we draw ourselves, which do not have corresponding objects that consume windowing system resources or handles. We would use a structure like wxWidgets's sizers and component widgets, and give the components a similar API but not actually derive from wxWidgets classes. We'd be refactoring our existing TrackPanel code so that its structure became a lot clearer. If this were an easy solution it would already have been done, but diverging opinions about exactly what we want to end up with derailed an earlier attempt. Generalizing our current ad hoc approach would take significant design work and coding. There is a great temptation to leave complex code that already works well enough alone.
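Here is a minimal sketch of windowless lightweight widgets, under the following assumptions. Host stands in for the single native wxWidgets control (such as the TrackPanel), Draw returns a string instead of painting, and all class names are hypothetical. Following the flyweight pattern, every MuteButton shares one immutable ButtonStyle object rather than holding its own copy. No widget owns a windowing-system handle: the host paints them all and routes clicks by hit-testing.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Shared, immutable drawing data: one instance serves every button.
struct ButtonStyle { std::string font; int border; };

struct Rect {
    int x, y, w, h;
    bool Contains(int px, int py) const {
        return px >= x && px < x + w && py >= y && py < y + h;
    }
};

// A windowless widget: a plain object with no native window handle.
class LightWidget {
public:
    explicit LightWidget(Rect r) : rect_(r) {}
    virtual ~LightWidget() = default;
    virtual std::string Draw() const = 0;  // stands in for real painting
    virtual void OnClick() {}
    const Rect& GetRect() const { return rect_; }
private:
    Rect rect_;
};

class MuteButton : public LightWidget {
public:
    MuteButton(Rect r, std::shared_ptr<const ButtonStyle> style)
        : LightWidget(r), style_(std::move(style)) {}
    std::string Draw() const override {
        return std::string(muted_ ? "Mute[on]" : "Mute[off]") + "/" + style_->font;
    }
    void OnClick() override { muted_ = !muted_; }
    bool IsMuted() const { return muted_; }
private:
    std::shared_ptr<const ButtonStyle> style_;  // shared intrinsic state
    bool muted_ = false;                        // per-widget extrinsic state
};

// The one real native control: paints every light widget itself and
// dispatches mouse clicks by hit-testing.
class Host {
public:
    MuteButton& AddMute(Rect r, std::shared_ptr<const ButtonStyle> s) {
        auto w = std::make_unique<MuteButton>(r, std::move(s));
        MuteButton& ref = *w;
        widgets_.push_back(std::move(w));
        return ref;
    }
    std::vector<std::string> Paint() const {
        std::vector<std::string> out;
        for (const auto& w : widgets_) out.push_back(w->Draw());
        return out;
    }
    bool Click(int x, int y) {
        for (auto& w : widgets_)
            if (w->GetRect().Contains(x, y)) { w->OnClick(); return true; }
        return false;
    }
private:
    std::vector<std::unique_ptr<LightWidget>> widgets_;
};
```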
PortAudio is the audio library that gives Audacity the ability to play and record audio in a cross-platform way. Without it Audacity would not be able to use the sound card of the device it's running on. PortAudio provides the ring buffers, sample rate conversion when playing/recording and, crucially, provides an API that hides the differences between audio on Mac, Linux and Windows. Within PortAudio there are alternative implementation files to support this API for each platform.
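Sample-rate conversion is easy to illustrate in its crudest form. The sketch below uses linear interpolation. It is not what PortAudio or the audio drivers do internally, since good converters use band-limited filtering to avoid aliasing, and resampleLinear is an illustrative name. Each output sample is read at a fractional position in the input and interpolated between its two nearest input samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<float> resampleLinear(const std::vector<float>& in,
                                  double inRate, double outRate) {
    std::vector<float> out;
    if (in.empty() || inRate <= 0 || outRate <= 0) return out;
    const double step = inRate / outRate;  // input samples per output sample
    const auto outLen = static_cast<std::size_t>(in.size() * outRate / inRate);
    out.reserve(outLen);
    for (std::size_t i = 0; i < outLen; ++i) {
        double pos = i * step;
        auto k = static_cast<std::size_t>(pos);
        if (k >= in.size()) k = in.size() - 1;  // guard against rounding
        double frac = pos - static_cast<double>(k);
        float a = in[k];
        float b = (k + 1 < in.size()) ? in[k + 1] : in[k];  // hold last sample
        out.push_back(static_cast<float>(a + (b - a) * frac));
    }
    return out;
}
```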
I've never needed to dig into PortAudio to follow what happens inside. It is, however, useful to know how we interface with PortAudio. Audacity accepts data packets from PortAudio (recording) and sends packets to PortAudio (playback). It's worth looking at exactly how the sending and receiving happens, and how it fits in with reading and writing to disk and updates to the screen.
Several different processes are going on at the same time. Some happen frequently, transfer small amounts of data, and must be responded to quickly. Others happen less frequently, transfer larger quantities of data, and the exact timing of when they happen is less critical. This is an impedance mismatch between the processes, and buffers are used to accommodate it. A second part of the picture is that we are dealing with audio devices, hard drives, and the screen. We don't go down to the wire and so have to work with the APIs we're given. Whilst we would like each of our processes to look similar, for example to have each running from a wxThread, we don't have that luxury (Figure 2.4).
Note: the Kaldi open-source project also uses PortAudio, and 锐英源 Software has studied and used its ring-buffer code.
Figure 2.4: Threads and Buffers in Playback and Recording
One audio thread is started by PortAudio code and interacts directly with the audio device. This is what drives recording or playback. This thread has to be responsive or packets will get lost. The thread, under the control of PortAudio code, calls audacityAudioCallback which, when recording, adds newly arrived small packets to a larger (five second) capture buffer. When playing back it takes small chunks off a five second playback buffer. The PortAudio library knows nothing about wxWidgets and so this thread created by PortAudio is a pthread.
A second thread is started by code in Audacity's class AudioIO. When recording, AudioIO takes the data from the capture buffer and appends it to Audacity's tracks so that it will eventually get displayed. Additionally, when enough data has been added, AudioIO writes the data to disk. This same thread also does the disk reads for audio playback. The function AudioIO::FillBuffers is the key function here and depending on the settings of some Boolean variables, handles both recording and playback in the one function. It's important that the one function handle both directions. Both the recording and playback parts are used at the same time when doing "software play through," where you overdub what was previously recorded. In the AudioIO thread we are totally at the mercy of the operating system's disk IO. We may stall for an unknown length of time reading or writing to a disk. We could not do those reads or writes in audacityAudioCallback because of the need to be responsive there.
Communication between these two threads happens via shared variables. Because we control which threads are writing to these variables and when, we avoid the need for more expensive mutexes.
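The arrangement in which each thread is the only writer of its own variable is the classic single-producer/single-consumer ring buffer. The minimal C++11 sketch below assumes one "callback" thread and one "buffer/disk" thread. In modern C++, std::atomic with acquire/release ordering is what makes such shared variables safe without a mutex. SpscRing and RunDemo are illustrative names. PortAudio's own ring buffer (pa_ringbuffer.c) is worth reading alongside this sketch, but it has a different API. For scale, a five-second mono capture buffer at 44100 Hz would hold 5 × 44100 = 220500 samples.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Only the producer writes writeIndex_, only the consumer writes readIndex_.
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {}

    // Producer ("audio callback") side: never blocks; returns how many fit.
    std::size_t Write(const float* src, std::size_t n) {
        std::size_t w = writeIndex_.load(std::memory_order_relaxed);
        std::size_t r = readIndex_.load(std::memory_order_acquire);
        std::size_t done = 0;
        while (done < n && (w + 1) % buf_.size() != r) {
            buf_[w] = src[done++];
            w = (w + 1) % buf_.size();
        }
        writeIndex_.store(w, std::memory_order_release);  // publish the samples
        return done;
    }

    // Consumer ("FillBuffers") side: takes whatever is available.
    std::size_t Read(float* dst, std::size_t n) {
        std::size_t r = readIndex_.load(std::memory_order_relaxed);
        std::size_t w = writeIndex_.load(std::memory_order_acquire);
        std::size_t done = 0;
        while (done < n && r != w) {
            dst[done++] = buf_[r];
            r = (r + 1) % buf_.size();
        }
        readIndex_.store(r, std::memory_order_release);  // free the slots
        return done;
    }

private:
    std::vector<float> buf_;  // one slot kept empty to tell full from empty
    std::atomic<std::size_t> writeIndex_{0};
    std::atomic<std::size_t> readIndex_{0};
};

// Demo: a producer thread pushes small packets; the caller drains them.
// Returns true if every sample arrived exactly once, in order.
bool RunDemo(std::size_t total) {
    SpscRing ring(256);
    std::thread producer([&] {
        std::vector<float> packet(32);
        std::size_t next = 0;
        while (next < total) {
            std::size_t n = std::min<std::size_t>(packet.size(), total - next);
            for (std::size_t i = 0; i < n; ++i) packet[i] = float(next + i);
            std::size_t sent = ring.Write(packet.data(), n);
            next += sent;  // retry the rest; a real callback would drop it
            if (sent == 0) std::this_thread::yield();
        }
    });
    std::vector<float> chunk(100);
    std::size_t expected = 0;
    bool ok = true;
    while (expected < total) {
        std::size_t got = ring.Read(chunk.data(), chunk.size());
        for (std::size_t i = 0; i < got; ++i) {
            if (chunk[i] != float(expected)) ok = false;
            ++expected;
        }
        if (got == 0) std::this_thread::yield();
    }
    producer.join();
    return ok;
}
```

Write reports how many samples fit. In a real audio callback, anything that does not fit is lost, and that overrun is exactly what the "must be responsive" requirement is guarding against.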
In both playback and recording, there is an additional requirement: Audacity also needs to update the GUI. This is the least time critical operation. The update happens in the main GUI thread and is due to a periodic timer that ticks twenty times a second. This timer's tick causes TrackPanel::OnTimer to be called, and if updates to the GUI are found to be needed, they are applied. This main GUI thread is created within wxWidgets rather than by our own code. It is special in that other threads cannot directly update the GUI. Using a timer to get the GUI thread to check if it needs to update the screen allows us to reduce the number of repaints to a level that is acceptable for a responsive display, and not make too heavy demands on processor time for displaying.
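The timer-driven update can be reduced to a "dirty flag" that other threads set and only the GUI thread clears. The sketch below is hypothetical: RepaintScheduler is an invented name, and the repaint is just a counter. In a wxWidgets program the tick would typically come from a wxTimer running at a 50 ms interval (20 ticks per second). However many updates arrive between ticks, at most one repaint happens per tick.

```cpp
#include <atomic>
#include <cassert>

class RepaintScheduler {
public:
    // Any thread: note that new audio has arrived. Never touches the GUI.
    void MarkDirty() { dirty_.store(true, std::memory_order_release); }

    // GUI thread, once per timer tick. Returns true if a repaint happened.
    bool OnTimerTick() {
        if (!dirty_.exchange(false, std::memory_order_acq_rel)) return false;
        ++repaints_;  // stands in for refreshing the track panel
        return true;
    }

    int Repaints() const { return repaints_; }

private:
    std::atomic<bool> dirty_{false};
    int repaints_ = 0;  // only ever touched by the GUI thread
};
```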
Is it good design to have an audio device thread, a buffer/disk thread and a GUI thread with periodic timer to handle these audio data transfers? It is somewhat ad hoc to have these three different threads that are not based on a single abstract base class. However, the ad-hockery is largely dictated by the libraries we use. PortAudio expects to create a thread itself. The wxWidgets framework automatically has a GUI thread. Our need for a buffer filling thread is dictated by our need to fix the impedance mismatch between the frequent small packets of the audio device thread and the less frequent larger packets of the disk drive. There is very clear benefit in using these libraries. The cost in using the libraries is that we end up using the abstractions they provide. As a result we copy data in memory from one place to another more than is strictly necessary. In fast data switches I've worked on, I've seen extremely efficient code for handling these kinds of impedance mismatches that is interrupt driven and does not use threads at all. Pointers to buffers are passed around rather than copying data. You can only do that if the libraries you are using are designed with a richer buffer abstraction. Using the existing interfaces, we're forced to use threads and we're forced to copy data.
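The "pass pointers, not data" alternative can be sketched with ownership transfer. Each packet is a heap buffer, and handing it to the next stage moves a pointer rather than copying samples. This is a single-threaded illustration with hypothetical names (Packet, Stage). A real multi-threaded pipeline would need a thread-safe or lock-free queue between stages.

```cpp
#include <cassert>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// A packet is a uniquely owned heap buffer; moving it moves one pointer.
using Packet = std::unique_ptr<std::vector<float>>;

class Stage {
public:
    void Push(Packet p) { q_.push(std::move(p)); }
    Packet Pop() {  // returns nullptr when the stage is empty
        if (q_.empty()) return nullptr;
        Packet p = std::move(q_.front());
        q_.pop();
        return p;
    }
private:
    std::queue<Packet> q_;
};
```

The samples stay at the same address from capture to disk; only ownership changes hands.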
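One thread is started by PortAudio code and interacts directly with the audio device. This is what drives recording or playback. This thread has to be responsive, or packets will get lost. The thread, under the control of PortAudio code, calls audacityAudioCallback, which, when recording, adds newly arrived small packets to a larger (five-second) capture buffer. When playing back, it takes small chunks off a five-second playback buffer. The PortAudio library knows nothing about wxWidgets, so this thread created by PortAudio is a pthread.
A second thread is started by code in Audacity's class AudioIO. When recording, AudioIO takes the data from the capture buffer and appends it to Audacity's tracks so that it will eventually get displayed. Additionally, when enough data has been added, AudioIO writes the data to disk. This same thread also does the disk reads for audio playback. The function AudioIO::FillBuffers is the key function here; depending on the settings of some Boolean variables, it handles both recording and playback in the one function. It's important that the one function handle both directions. Both the recording and playback parts are used at the same time when doing "software play through", where you overdub what was previously recorded. In the AudioIO thread we are totally at the mercy of the operating system's disk IO. We may stall for an unknown length of time reading or writing to a disk. We could not do those reads or writes in audacityAudioCallback because of the need to be responsive there.
Communication between these two threads happens via shared variables. Because we control which threads write to these variables and when, we avoid the need for more expensive mutexes.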
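A common way to share data between exactly one writer thread and one reader thread without a mutex is a single-producer/single-consumer ring buffer in which each index is written by only one side. The sketch below is a generic illustration under that assumption, not Audacity's or PortAudio's code (PortAudio does ship its own ring-buffer utility in pa_ringbuffer.h). The class name is hypothetical and samples are plain floats.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Minimal single-producer/single-consumer ring buffer of float samples.
// Exactly one thread may call Write() and exactly one other thread may call
// Read(). write_ is only stored by the producer and read_ only by the
// consumer, so no mutex is needed. One slot stays empty to tell full from empty.
class SampleRingBuffer {
public:
    explicit SampleRingBuffer(std::size_t capacity) : buf_(capacity + 1) {}

    // Producer side (e.g. the audio callback while recording).
    // Copies as many samples as fit and returns how many were written.
    std::size_t Write(const float* src, std::size_t n) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t r = read_.load(std::memory_order_acquire);
        const std::size_t size = buf_.size();
        const std::size_t space = (r + size - w - 1) % size;
        if (n > space) n = space;
        for (std::size_t i = 0; i < n; ++i) buf_[(w + i) % size] = src[i];
        write_.store((w + n) % size, std::memory_order_release);  // publish
        return n;
    }

    // Consumer side (e.g. the buffer/disk thread).
    // Copies up to n available samples and returns how many were read.
    std::size_t Read(float* dst, std::size_t n) {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        const std::size_t w = write_.load(std::memory_order_acquire);
        const std::size_t size = buf_.size();
        const std::size_t avail = (w + size - r) % size;
        if (n > avail) n = avail;
        for (std::size_t i = 0; i < n; ++i) dst[i] = buf_[(r + i) % size];
        read_.store((r + n) % size, std::memory_order_release);  // free the slots
        return n;
    }

    // Number of samples currently waiting to be read.
    std::size_t Available() const {
        const std::size_t size = buf_.size();
        return (write_.load(std::memory_order_acquire) + size -
                read_.load(std::memory_order_acquire)) % size;
    }

private:
    std::vector<float> buf_;
    std::atomic<std::size_t> read_{0};
    std::atomic<std::size_t> write_{0};
};
```

Sized for the five-second buffers described above, the capacity would be about 5 * sampleRate * channels samples, for example 441,000 for 44.1 kHz stereo.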
One of the challenges faced by Audacity is supporting insertions and deletions into audio recordings that may be hours long. Recordings can easily be too long to fit in available RAM. If an audio recording is in a single disk file, inserting audio somewhere near the start of that file could mean moving a lot of data to make way. Copying that data on disk would be time consuming and mean that Audacity could then not respond rapidly to simple edits.
Audacity's solution to this is to divide audio files into many BlockFiles, each of which could be around 1 MB. This is the main reason Audacity has its own audio file format, a master file with the extension .aup. It is an XML file which coordinates the various blocks. Changes near the start of a long audio recording might affect just one block and the master .aup file.
BlockFiles balance two conflicting forces. We can insert and delete audio without excessive copying, and during playback we are guaranteed to get reasonably large chunks of audio with each request to the disk. The smaller the blocks, the more potential disk requests to fetch the same amount of audio data; the larger the blocks, the more copying on insertions and deletions.
Audacity's BlockFiles never have internal free space and they never grow beyond the maximum block size. To keep this true when we insert or delete we may end up copying up to one block's worth of data. When we don't need a BlockFile anymore we delete it. The BlockFiles are reference counted so if we delete some audio, the relevant BlockFiles will still hang around to support the undo mechanism until we save. There is never a need to garbage collect free space within Audacity BlockFiles, which we would need to do with an all-in-one-file approach.
Merging and splitting larger chunks of data is the bread and butter of data management systems, from B-trees to Google's BigTable tablets to the management of unrolled linked lists. Figure 2.5 shows what happens in Audacity when removing a span of audio near the start.
Figure 2.5: Before deletion, .aup file and BlockFiles hold the sequence ABCDEFGHIJKLMNO. After deletion of FGHI, two BlockFiles are merged.
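The delete-and-merge behaviour in Figure 2.5 can be modelled in a few lines. This is a hypothetical sketch, not Audacity's own code: letters stand in for samples, the maximum block length is 4 instead of about 1 MB, and the block boundaries are chosen for illustration and need not match the figure. Untouched blocks are shared through std::shared_ptr, which also shows how reference counting lets an undo snapshot keep old blocks alive.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Each block is immutable and reference counted: an undo snapshot that keeps a
// copy of the Sequence keeps old blocks alive without copying any audio.
// Characters stand in for samples so the example mirrors the ABC...O letters.
using Block = std::shared_ptr<const std::string>;
using Sequence = std::vector<Block>;

constexpr std::size_t kMaxBlockLen = 4;  // stand-in for a ~1 MB maximum

Sequence MakeSequence(const std::vector<std::string>& pieces) {
    Sequence seq;
    for (const auto& p : pieces) seq.push_back(std::make_shared<std::string>(p));
    return seq;
}

// Writes pending samples as new blocks no longer than kMaxBlockLen.
void FlushLeftover(std::string& leftover, Sequence& out) {
    for (std::size_t i = 0; i < leftover.size(); i += kMaxBlockLen)
        out.push_back(std::make_shared<std::string>(leftover.substr(i, kMaxBlockLen)));
    leftover.clear();
}

// Deletes samples [start, start + len). Blocks wholly outside the range are
// shared with the old sequence; only the surviving pieces of the boundary
// blocks are copied, merged into as few new blocks as the size limit allows.
Sequence DeleteRange(const Sequence& seq, std::size_t start, std::size_t len) {
    const std::size_t end = start + len;
    Sequence out;
    std::string leftover;  // head of first touched block + tail of last one
    std::size_t pos = 0;   // sample offset of the current block
    for (const Block& b : seq) {
        const std::size_t bStart = pos;
        const std::size_t bEnd = pos + b->size();
        pos = bEnd;
        if (bEnd <= start || bStart >= end) {
            FlushLeftover(leftover, out);  // no-op unless we just passed the range
            out.push_back(b);              // shared: reference count goes up, no copy
            continue;
        }
        if (bStart < start) leftover += b->substr(0, start - bStart);  // head kept
        if (bEnd > end) leftover += b->substr(end - bStart);            // tail kept
    }
    FlushLeftover(leftover, out);
    return out;
}
```

With blocks ABCD | EFGH | IJKL | MNO, deleting FGHI copies only E and JKL, which fit into a single merged block EJKL; ABCD and MNO are reused as they are.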
BlockFiles aren't just used for the audio itself. There are also BlockFiles that cache summary information. If Audacity is asked to display a four-hour recording on screen, it is not acceptable for it to process the entire audio each time it redraws the screen. Instead it uses summary information, which gives the maximum and minimum audio amplitude over ranges of time. When zoomed in, Audacity draws using actual samples. When zoomed out, Audacity draws using summary information.
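A summary of this kind is a list of per-window minimum and maximum values. The sketch below assumes a fixed window size chosen by the caller (the actual sizes, and any extra fields Audacity stores, are not given in the text) and also shows that a coarser summary can be built from a finer one without reading the samples again. The names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One min/max pair per window of samples.
struct MinMax {
    float min;
    float max;
};

// Builds one summary entry per `window` samples; the last window may be short.
std::vector<MinMax> BuildSummary(const std::vector<float>& samples, std::size_t window) {
    std::vector<MinMax> summary;
    for (std::size_t i = 0; i < samples.size(); i += window) {
        const std::size_t end = std::min(samples.size(), i + window);
        const auto first = samples.begin() + static_cast<std::ptrdiff_t>(i);
        const auto last = samples.begin() + static_cast<std::ptrdiff_t>(end);
        const auto [lo, hi] = std::minmax_element(first, last);
        summary.push_back({*lo, *hi});
    }
    return summary;
}

// Merges every `factor` consecutive entries into one, giving a coarser level
// for zoomed-out views without touching the original samples.
std::vector<MinMax> Coarsen(const std::vector<MinMax>& fine, std::size_t factor) {
    std::vector<MinMax> coarse;
    for (std::size_t i = 0; i < fine.size(); i += factor) {
        MinMax m = fine[i];
        for (std::size_t j = i + 1; j < std::min(fine.size(), i + factor); ++j) {
            m.min = std::min(m.min, fine[j].min);
            m.max = std::max(m.max, fine[j].max);
        }
        coarse.push_back(m);
    }
    return coarse;
}
```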
A refinement in the BlockFile system is that the blocks needn't be files created by Audacity. They can be references to subsections of audio files, such as a timespan from audio stored in the .wav format. A user can create an Audacity project, import audio from a .wav file and mix a number of tracks whilst only creating BlockFiles for the summary information. This saves disk space and saves time in copying audio. All told it is, however, a rather bad idea. Far too many of our users have removed the original audio .wav file thinking there will be a complete copy in the Audacity project folder. That's not so, and without the original .wav file the audio project can no longer be played. The default in Audacity nowadays is to always copy imported audio, creating new BlockFiles in the process.
The BlockFile solution ran into problems on Windows systems where having a large number of BlockFiles performed very poorly. This appeared to be because Windows was much slower handling files when there were many in the same directory, a similar problem to the slowdown with large numbers of widgets. A later addition was made to use a hierarchy of subdirectories, never with more than a hundred files in each subdirectory.
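One way to guarantee the hundred-files limit is to derive the directory path from the block number. The naming scheme below is purely hypothetical and is not Audacity's real layout: blocks that share n / 100 land in the same leaf directory, so each leaf holds at most 100 files, and each directory level holds at most 100 subdirectories while n stays below 1,000,000.

```cpp
#include <cstdio>
#include <string>

// Hypothetical scheme: block n is stored at dA/dB/bNNNNNN.au, where
// A = (n / 10000) % 100 and B = (n / 100) % 100.
std::string BlockPath(unsigned long n) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "d%02lu/d%02lu/b%06lu.au",
                  (n / 10000) % 100, (n / 100) % 100, n);
    return buf;
}
```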
The main problem with the BlockFile structure is that it is exposed to end users. We often hear from users who move the .aup file and don't realize they also need to move the folder containing all the BlockFiles too. It would be better if Audacity projects were a single file with Audacity taking responsibility for how the space inside the file is used. If anything this would increase performance rather than reduce it. The main additional code needed would be for garbage collection. A simple approach to that would be to copy the blocks to a new file when saving if more than a set percentage of the file were unused.
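The garbage-collection rule suggested here, copying blocks to a new file on save when too much of the file is unused, might look like this for a hypothetical single-file project format. BlockExtent, the threshold parameter and both function names are assumptions for illustration; the sketch also assumes live extents do not overlap and lie within the file.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical single-file project: each live block occupies a byte range.
struct BlockExtent {
    std::uint64_t offset;
    std::uint64_t length;
};

// True if more than `maxUnusedFraction` of the file holds no live block.
bool ShouldCompact(const std::vector<BlockExtent>& live, std::uint64_t fileSize,
                   double maxUnusedFraction) {
    if (fileSize == 0) return false;
    std::uint64_t used = 0;
    for (const auto& b : live) used += b.length;
    const double unused = static_cast<double>(fileSize - used) / static_cast<double>(fileSize);
    return unused > maxUnusedFraction;
}

// Plans the copy into a new file: live blocks are laid out back to back in
// their existing order. Returns the new extents.
std::vector<BlockExtent> PlanCompaction(const std::vector<BlockExtent>& live) {
    std::vector<BlockExtent> moved;
    std::uint64_t next = 0;
    for (const auto& b : live) {
        moved.push_back({next, b.length});
        next += b.length;
    }
    return moved;
}
```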
Audacity has an experimental plugin that supports multiple scripting languages. It provides a scripting interface over a named pipe. The commands exposed via scripting are in a textual format, as are the responses. As long as the user's scripting language can write text to and read text from a named pipe, the scripting language can drive Audacity. Audio and other high-volume data does not need to travel on the pipe (Figure 2.6).
Figure 2.6: Scripting Plugin Provides Scripting Over a Named Pipe
The plugin itself knows nothing about the content of the text traffic that it carries. It is only responsible for conveying it. The plugin interface (or rudimentary extension point) used by the scripting plugin to plug in to Audacity already exposes Audacity commands in textual format. So, the scripting plugin is small, its main content being code for the pipe.
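Because the traffic is plain text, a client needs very little code. This sketch assumes the client already has the two directions of the named pipe open as C++ streams and that each reply ends with a blank line; both the command syntax and that terminator are assumptions, not the plugin's documented protocol. Taking generic streams keeps the function testable with string streams.

```cpp
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream can stand in for the pipes
#include <string>

// Sends one textual command and collects the reply lines.
// Assumption: the reply is terminated by an empty line.
std::string SendCommand(std::ostream& toApp, std::istream& fromApp, const std::string& command) {
    toApp << command << '\n' << std::flush;  // flush so the other end sees it now
    std::string reply;
    std::string line;
    while (std::getline(fromApp, line)) {
        if (line.empty()) break;  // assumed end-of-reply marker
        reply += line;
        reply += '\n';
    }
    return reply;
}
```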
Unfortunately a pipe introduces similar security risks to having a TCP/IP connection—and we've ruled out TCP/IP connections for Audacity on security grounds. To reduce that risk the plugin is an optional DLL. You have to make a deliberate decision to obtain and use it and it comes with a health/security warning.
After the scripting feature had already been started, a suggestion surfaced in the feature requests page of our wiki that we should consider using KDE's D-Bus standard to provide an inter-process call mechanism using TCP/IP. We'd already started going down a different route but it still might make sense to adapt the interface we've ended up with to support D-Bus.
Origins of Scripting Code
The scripting feature grew from an enthusiast's adaptation of Audacity for a particular need that was heading in the direction of being a fork. These features, together called CleanSpeech, provide for mp3 conversion of sermons. CleanSpeech adds new effects such as truncate silence (the effect finds and cuts out long silences in audio) and the ability to apply a fixed sequence of existing noise removal effects, normalization and mp3 conversion to a whole batch of audio recordings. We wanted some of the excellent functionality in this, but the way it was written was too special-case for Audacity. Bringing it into mainstream Audacity led us to code for a flexible sequence rather than a fixed sequence. The flexible sequence could use any of the effects via a look-up table for command names and a Shuttle class to persist the command parameters to a textual format in user preferences. This feature is called batch chains. Very deliberately we stopped short of adding conditionals or calculation to avoid inventing an ad hoc scripting language.
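A batch chain in the sense described above, a flat list of named commands with textual parameters dispatched through a look-up table, can be sketched as follows. The line syntax, ChainStep, ParseStep and RunChain are hypothetical stand-ins for Audacity's own command table and Shuttle class; as in the text, there are no conditionals or calculations.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical chain line format: "CommandName: key=value key=value".
using Params = std::map<std::string, std::string>;
using Command = std::function<void(const Params&)>;

struct ChainStep {
    std::string name;
    Params params;  // parameters stay as text, as when persisted in preferences
};

ChainStep ParseStep(const std::string& line) {
    ChainStep step;
    const auto colon = line.find(':');
    step.name = line.substr(0, colon);  // whole line if there is no colon
    if (colon == std::string::npos) return step;
    std::istringstream rest(line.substr(colon + 1));
    std::string token;
    while (rest >> token) {
        const auto eq = token.find('=');
        if (eq != std::string::npos) step.params[token.substr(0, eq)] = token.substr(eq + 1);
    }
    return step;
}

// Runs every step top to bottom; returns false at the first unknown command.
bool RunChain(const std::vector<std::string>& lines, const std::map<std::string, Command>& table) {
    for (const auto& line : lines) {
        const ChainStep step = ParseStep(line);
        const auto it = table.find(step.name);
        if (it == table.end()) return false;
        it->second(step.params);
    }
    return true;
}
```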
In retrospect the effort to avoid a fork has been well worthwhile. There is still a CleanSpeech mode buried in Audacity that can be set by modifying a preference. It also cuts down the user interface, removing advanced features. A simplified version of Audacity has been requested for other uses, most notably in schools. The problem is that each person's view of which are the advanced features and which are the essential ones is different. We've subsequently implemented a simple hack that leverages the translation mechanism. When the translation of a menu item starts with a "#" it is no longer shown in the menus. That way people who want to reduce the menus can make choices themselves without recompiling—more general and less invasive than the mCleanspeech flag in Audacity, which in time we may be able to remove entirely.
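The "#" convention amounts to a filter applied after translation. In this hypothetical sketch the translate callback stands in for whatever lookup the translation mechanism performs; the function name and the menu labels are illustrative.

```cpp
#include <functional>
#include <string>
#include <vector>

// Returns the translated labels that should appear in a menu. Any item whose
// translation starts with '#' is hidden, so a custom translation file can
// trim the menus without recompiling.
std::vector<std::string> VisibleMenuItems(
    const std::vector<std::string>& items,
    const std::function<std::string(const std::string&)>& translate) {
    std::vector<std::string> shown;
    for (const auto& item : items) {
        const std::string label = translate(item);
        if (!label.empty() && label[0] == '#') continue;  // hidden by the translator
        shown.push_back(label);
    }
    return shown;
}
```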
The CleanSpeech work gave us batch chains and the ability to truncate silence. Both have attracted additional improvement from outside the core team. Batch chains directly led on to the scripting feature. That in turn has begun the process of supporting more general purpose plugins to adapt Audacity.
Audacity does not have real-time effects, that is, audio effects that are calculated on demand as the audio plays. Instead in Audacity you apply an effect and must wait for it to complete. Real-time effects and rendering of audio effects in the background whilst the user interface stays responsive are among the most frequently made feature requests for Audacity.
A problem we have is that what may be a real-time effect on one machine may not run fast enough to be real-time on a much slower machine. Audacity runs on a wide range of machines. We'd like a graceful fallback. On a slower machine we'd still want to be able to request an effect be applied to an entire track and to then listen to the processed audio near the middle of the track, after a small wait, with Audacity knowing to process that part first. On a machine too slow to render the effect in real time we'd be able to listen to the audio until playback caught up with the rendering. To do this we'd need to remove the restrictions that audio effects hold up the user interface and that the order of processing the audio blocks is strictly left to right.
A relatively recent addition in Audacity called on demand loading has many of the elements we need for real time effects, though it doesn't involve audio effects at all. When you import an audio file into Audacity, it can now make the summary BlockFiles in a background task. Audacity will show a placeholder of diagonal blue and gray stripes for audio that it has not yet processed and respond to many user commands whilst the audio is still being loaded. The blocks do not have to be processed in left-to-right order. The intention has always been that the same code will in due course be used for real-time effects.
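Processing blocks out of order mostly comes down to choosing which block to work on next. A minimal, hypothetical policy is to pick the unprocessed block nearest the point the user is listening to or looking at, rather than the leftmost one. The function below illustrates that policy and is not Audacity's on-demand code.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Returns the index of the unprocessed block closest to `focus` (ties go to
// the lower index), or std::nullopt once every block is done.
std::optional<std::size_t> NextBlockToProcess(const std::vector<bool>& done, std::size_t focus) {
    std::optional<std::size_t> best;
    std::size_t bestDist = 0;
    for (std::size_t i = 0; i < done.size(); ++i) {
        if (done[i]) continue;
        const std::size_t dist = i > focus ? i - focus : focus - i;
        if (!best || dist < bestDist) {
            best = i;
            bestDist = dist;
        }
    }
    return best;
}
```

With ten blocks and the focus on block 6, repeatedly calling this gives the order 6, 5, 7, 4, 8, 3, 9, 2, 1, 0.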
On demand loading gives us an evolutionary approach to adding real time effects. It's a step that avoids some of the complexities of making the effects themselves real-time. Real-time effects will additionally need overlap between the blocks, otherwise effects like echo will not join up correctly. We'll also need to allow parameters to vary as the audio is playing. By doing on demand loading first, the code gets used at an earlier stage than it otherwise would. It will get feedback and refinement from actual use.
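The need for overlap is easy to demonstrate with a single-repeat (feed-forward) echo, y[n] = x[n] + gain * x[n - delay]. In this sketch each block is handed `delay` samples of context from before its start; with that overlap, processing block by block gives exactly the same result as processing the whole track. It is an illustration, not Audacity's Echo effect, and the function names are hypothetical; a feedback echo would instead need its state carried from block to block.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Feed-forward echo over a whole buffer: y[n] = x[n] + gain * x[n - delay].
std::vector<float> Echo(const std::vector<float>& x, std::size_t delay, float gain) {
    std::vector<float> y(x);
    for (std::size_t n = delay; n < x.size(); ++n) y[n] += gain * x[n - delay];
    return y;
}

// Processes only samples [begin, end) of the track. Up to `delay` samples
// before `begin` are included as overlap; without them the first `delay`
// outputs of every block would be missing their echo.
std::vector<float> EchoBlock(const std::vector<float>& x, std::size_t begin, std::size_t end,
                             std::size_t delay, float gain) {
    const std::size_t ctx = std::min(begin, delay);  // overlap actually available
    std::vector<float> window(x.begin() + (begin - ctx), x.begin() + end);
    std::vector<float> y = Echo(window, delay, gain);
    return std::vector<float>(y.begin() + ctx, y.end());  // drop the overlap
}
```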
The earlier sections of this chapter illustrate how good structure contributes to a program's growth, or how the absence of good structure hinders it.
The more you look, the more obvious it is that Audacity is a community effort. The community is larger than just those contributing directly because it depends on libraries, each of which has its own community with its own domain experts. Having read about the mix of structure in Audacity it probably comes as no surprise that the community developing it welcomes new developers and is well able to handle a wide range of skill levels.
For me there is no question that the nature of the community behind Audacity is reflected in the strengths and weaknesses of the code. A more closed group could write high quality code more consistently than we have, but it would be harder to match the range of capabilities Audacity has with fewer people contributing.