Project Overview
In this notebook, you will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline!
We begin by investigating the LibriSpeech dataset that will be used to train and evaluate your models. Your algorithm will first convert any raw audio to feature representations that are commonly used for ASR. You will then move on to building neural networks that can map these audio features to transcribed text. After learning about the basic types of layers that are often used for deep learning-based approaches to ASR, you will engage in your own investigations by creating and testing your own state-of-the-art models. Throughout the notebook, we provide recommended research papers for additional reading and links to GitHub repositories with interesting implementations.
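To make the pipeline concrete, here is a minimal sketch in the spirit of the notebook's simplest starter model: a Keras network that maps a sequence of audio feature frames to per-timestep character probabilities (the CTC loss and decoding that turn these into text sit on top of this in the notebook). The dimensions are assumptions for illustration.

```python
# A minimal acoustic-model sketch: a recurrent layer maps a sequence of
# audio feature frames to per-timestep character probabilities.
# Assumed dimensions: 161 spectrogram bins per frame; 29 output symbols
# (26 letters, space, apostrophe, and the CTC blank).
from keras.models import Model
from keras.layers import Input, GRU, Activation

def simple_rnn_model(input_dim=161, output_dim=29):
    # Accept variable-length sequences of feature vectors
    input_data = Input(name='the_input', shape=(None, input_dim))
    # One recurrent layer emits a score per output symbol at each timestep
    rnn = GRU(output_dim, return_sequences=True, name='rnn')(input_data)
    # Softmax over symbols at each timestep, ready for a CTC loss/decoder
    y_pred = Activation('softmax', name='softmax')(rnn)
    return Model(inputs=input_data, outputs=y_pred)
```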
Project Instructions
Amazon Web Services
This project requires GPU acceleration to run efficiently. Please refer to the Udacity instructions for setting up a GPU instance for this project (link for AIND students), and refer to the project instructions in the classroom for setup.
You should run this project with GPU acceleration for best performance.
```
git clone https://github.com/udacity/AIND-VUI-Capstone.git
cd AIND-VUI-Capstone
pip install -r requirements.txt
```
Note: A Keras/Windows bug may produce the following error after the first epoch of training Model 0: 'rawunicodeescape' codec can't decode bytes in position 54-55: truncated \uXXXX. To fix this:
Browse to the Libav website.
```
cd ..
python create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json
python create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json
```
```
python -m ipykernel install --user --name aind-vui --display-name "aind-vui"
jupyter notebook vui_notebook.ipynb
```
Suggestions to Make your Project Stand Out!
(1) Add a Language Model to the Decoder
The performance of the decoding step can be greatly enhanced by incorporating a language model. Build your own language model from scratch, or leverage a repository or toolkit that you find online to improve your predictions.
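As a starting point, here is a minimal sketch of the rescoring idea: combine each candidate transcription's acoustic score with a character-bigram language-model score and keep the best total. Every name below is illustrative; a real system would integrate the LM into the beam search itself and use a stronger model (for example, one built with the KenLM toolkit).

```python
# Hypothetical rescoring sketch: re-rank candidate transcriptions by
# acoustic score plus a smoothed character-bigram LM score.
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count character unigrams/bigrams from training transcripts."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        for a, b in zip(sent, sent[1:]):
            unigrams[a] += 1
            bigrams[(a, b)] += 1
    return unigrams, bigrams

def lm_log_prob(text, unigrams, bigrams, vocab_size=28):
    """Add-one smoothed log probability of a character sequence."""
    return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
               for a, b in zip(text, text[1:]))

def rescore(candidates, unigrams, bigrams, alpha=0.5):
    """candidates: (acoustic_log_prob, text) pairs; alpha weights the LM."""
    best = max(candidates,
               key=lambda c: c[0] + alpha * lm_log_prob(c[1], unigrams, bigrams))
    return best[1]
```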
(2) Train on Bigger Data
In the project, you used some of the smaller downloads from the LibriSpeech corpus. Try training your model on some larger datasets - instead of using dev-clean.tar.gz, download one of the larger training sets on the website.
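For example, train-clean-100.tar.gz is one of the larger training subsets listed on the LibriSpeech download page; its URL is assumed here to follow the same pattern as the dev-clean link used in the project. A sketch of fetching and unpacking it:

```python
# Download and extract a larger LibriSpeech subset (a multi-GB file).
# The URL is assumed to follow the same pattern as the dev-clean download.
import urllib.request
import tarfile

url = 'http://www.openslr.org/resources/12/train-clean-100.tar.gz'
archive = 'train-clean-100.tar.gz'
urllib.request.urlretrieve(url, archive)
with tarfile.open(archive, 'r:gz') as tar:
    tar.extractall()  # unpacks into LibriSpeech/train-clean-100/
```

After extraction, rerun create_desc_json.py against the new directory to regenerate the corpus JSON files.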
(3) Try out Different Audio Features
In this project, you had the choice to use either spectrogram or MFCC features. Take the time to test the performance of both of these features. For a special challenge, train a network that uses raw audio waveforms!
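A sketch of what that comparison might look like, using librosa as a stand-in (an assumption; the notebook computes these features with its own utilities) and a hypothetical sample.wav:

```python
# Compare the two feature types for one audio file. librosa and the
# filename are assumptions for illustration.
import numpy as np
import librosa

y, sr = librosa.load('sample.wav', sr=None)  # hypothetical input file

# Log-magnitude spectrogram: shape (freq_bins, frames), high-dimensional
spectrogram = np.log(np.abs(librosa.stft(y, n_fft=320, hop_length=160)) + 1e-10)

# 13 MFCCs: shape (13, frames), a much more compact summary of each frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(spectrogram.shape, mfcc.shape)
```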
Special Thanks
We have borrowed the create_desc_json.py and flac_to_wav.sh files from the ba-dls-deepspeech repository, along with some functions used to generate spectrograms.