 
Project Overview
In this notebook, you will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline!
                
We begin by investigating the LibriSpeech dataset that will be used to train and evaluate your models. Your algorithm will first convert any raw audio to the feature representations commonly used for ASR. You will then move on to building neural networks that can map these audio features to transcribed text. After learning about the basic types of layers often used in deep learning approaches to ASR, you will conduct your own investigations by creating and testing your own state-of-the-art models. Throughout the notebook, we provide recommended research papers for additional reading and links to GitHub repositories with interesting implementations.
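The raw-audio-to-feature step described above can be sketched with plain NumPy. This is a minimal illustration, not the notebook's actual extraction code: the window and hop sizes, the 8 kHz sample rate, and the 440 Hz test tone are all illustrative assumptions.

```python
import numpy as np

def spectrogram(signal, window=256, hop=128):
    """Return a (frames, window//2 + 1) log-magnitude spectrogram."""
    n_frames = 1 + (len(signal) - window) // hop
    frames = np.stack([signal[i * hop: i * hop + window]
                       for i in range(n_frames)])
    frames = frames * np.hanning(window)       # taper each frame
    mag = np.abs(np.fft.rfft(frames, axis=1))  # magnitude spectrum per frame
    return np.log(mag + 1e-10)                 # log-compress, avoid log(0)

# Toy input: one second of a 440 Hz tone sampled at 8 kHz
sr = 8000
t = np.arange(sr) / sr
feats = spectrogram(np.sin(2 * np.pi * 440 * t))
print(feats.shape)  # (61, 129): 61 time frames x 129 frequency bins
```

The log compression at the end mirrors common ASR practice: it tames the large dynamic range of speech energy before the features reach a neural network.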
                Project Instructions
Amazon Web Services
This project requires GPU acceleration to run efficiently. Please refer to the Udacity instructions for setting up a GPU instance, and to the project instructions in the classroom for setup (link for AIND students).
You should run this project with GPU acceleration for best performance.
git clone https://github.com/udacity/AIND-VUI-Capstone.git
cd AIND-VUI-Capstone
pip install -r requirements.txt
Note: after the first training epoch of model 0, the following Keras/Windows error may appear: 'rawunicodeescape' codec can't decode bytes in position 54-55: truncated \uXXXX. To resolve it:
Browse to the Libav website
cd ..
python create_desc_json.py LibriSpeech/dev-clean/ train_corpus.json
python create_desc_json.py LibriSpeech/test-clean/ valid_corpus.json
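The create_desc_json.py script produces a line-delimited JSON corpus file. The sketch below shows the general shape of such a file; the "key"/"duration"/"text" field names follow the ba-dls-deepspeech convention, and the sample record is a made-up placeholder — verify both against your generated train_corpus.json.

```python
import json
import os
import tempfile

records = [
    # Hypothetical entry; real .wav paths come from flac_to_wav.sh output.
    {"key": "LibriSpeech/dev-clean/sample-0000.wav",
     "duration": 2.5,
     "text": "hello world"},
]

path = os.path.join(tempfile.gettempdir(), "train_corpus_demo.json")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")   # one JSON object per line

# Reading it back, one record per line:
with open(path) as f:
    corpus = [json.loads(line) for line in f]
print(corpus[0]["text"])  # hello world
```

The one-object-per-line layout lets a data generator stream large corpora without parsing the whole file into memory.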
python -m ipykernel install --user --name aind-vui --display-name "aind-vui"
jupyter notebook vui_notebook.ipynb
Suggestions to Make your Project Stand Out!
(1) Add a Language Model to the Decoder
The performance of the decoding step can be greatly enhanced by incorporating a language model. Build your own language model from scratch, or leverage a repository or toolkit that you find online to improve your predictions.
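As a minimal illustration of why this helps (the acoustic scores and the tiny bigram table below are entirely made up), rescoring acoustically similar hypotheses with a toy bigram language model can flip the decision toward the more plausible transcription:

```python
import math

# log P(word | previous word) for a tiny hypothetical bigram model
bigram_logp = {
    ("<s>", "the"): math.log(0.5),
    ("the", "cat"): math.log(0.2),
    ("the", "cab"): math.log(0.01),
}

def lm_score(words, unk_logp=math.log(1e-4)):
    """Sum bigram log-probabilities, backing off to a floor for unseen pairs."""
    score, prev = 0.0, "<s>"
    for w in words:
        score += bigram_logp.get((prev, w), unk_logp)
        prev = w
    return score

# Suppose the acoustic model scores two hypotheses almost equally:
candidates = {"the cat": -4.0, "the cab": -3.9}   # acoustic log-likelihoods
alpha = 0.8                                       # language-model weight

best = max(candidates,
           key=lambda s: candidates[s] + alpha * lm_score(s.split()))
print(best)  # the cat
```

In a real decoder the same weighted combination is applied inside beam search at every step, rather than to complete hypotheses after the fact.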
(2) Train on Bigger Data
In the project, you used some of the smaller downloads from the LibriSpeech corpus. Try training your model on some larger datasets: instead of using dev-clean.tar.gz, download one of the larger training sets from the website.
(3) Try out Different Audio Features
In this project, you had the choice to use either spectrogram or MFCC features. Take the time to test the performance of both of these features. For a special challenge, train a network that uses raw audio waveforms!
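For intuition on how the two feature types relate, here is a rough pure-NumPy sketch (the filter count, coefficient count, and input sizes are illustrative assumptions, not the notebook's extraction code): MFCCs pool a power spectrogram through a mel-spaced filterbank, log-compress it, and decorrelate with a DCT, yielding far fewer coefficients per frame than the raw spectrogram.

```python
import numpy as np

def mel_filterbank(n_filters=13, n_fft_bins=129, sr=8000):
    """Triangular filters spaced evenly on the mel scale."""
    hz = np.linspace(0, sr / 2, n_fft_bins)
    mel = 2595 * np.log10(1 + hz / 700)            # Hz -> mel
    centers = np.linspace(0, mel[-1], n_filters + 2)
    fb = np.zeros((n_filters, n_fft_bins))
    for i in range(n_filters):
        left, mid, right = centers[i], centers[i + 1], centers[i + 2]
        fb[i] = np.clip(np.minimum((mel - left) / (mid - left),
                                   (right - mel) / (right - mid)), 0, None)
    return fb

def mfcc(power_spec, n_coeffs=13):
    """power_spec: (frames, n_fft_bins) -> (frames, n_coeffs) MFCC-like features."""
    fb = mel_filterbank(n_fft_bins=power_spec.shape[1])
    logmel = np.log(power_spec @ fb.T + 1e-10)
    # DCT-II along the filter axis decorrelates the log-mel energies
    n = logmel.shape[1]
    basis = np.cos(np.pi / n * (np.arange(n)[:, None] + 0.5) * np.arange(n))
    return (logmel @ basis)[:, :n_coeffs]

# Random stand-in for a (61 frames x 129 bins) power spectrogram
feats = mfcc(np.random.rand(61, 129))
print(feats.shape)  # (61, 13)
```

The 129-dimensional spectrogram frame collapses to 13 coefficients, which is the usual trade-off: MFCCs are compact and decorrelated, while spectrograms keep more detail for the network to learn from.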
                Special Thanks
We have borrowed the create_desc_json.py and flac_to_wav.sh files from the ba-dls-deepspeech repository, along with some functions used to generate spectrograms.