Transcription Services & Audio Corpora

“We make Speech Intelligently Simple”

What Are ASR Speech Corpora?

Speech corpora are recordings of speech along with their word-aligned transcriptions.  They have several uses in natural language systems, including training speaker models, collecting frequencies for word and phrase usage, and developing models of interaction styles.
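To make "word-aligned transcription" concrete, here is a minimal sketch in Python. The AlignedWord record, the sample utterance, and its timestamps are hypothetical illustrations rather than a MatrixHCI or industry format. The sketch pairs each word with its start and end time in the recording, then collects word and phrase frequencies, one of the uses mentioned above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One transcribed word with its position in the audio (hypothetical schema)."""
    word: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


# Hypothetical corpus entry: invented words and timestamps, for illustration only.
utterance = [
    AlignedWord("turn", 0.00, 0.22),
    AlignedWord("on", 0.22, 0.36),
    AlignedWord("the", 0.36, 0.45),
    AlignedWord("light", 0.45, 0.80),
    AlignedWord("turn", 1.10, 1.31),
    AlignedWord("off", 1.31, 1.52),
    AlignedWord("the", 1.52, 1.60),
    AlignedWord("light", 1.60, 1.95),
]


def word_frequencies(words):
    """Count how often each word occurs, ignoring case."""
    return Counter(w.word.lower() for w in words)


def phrase_frequencies(words, n=2):
    """Count how often each run of n consecutive words occurs."""
    tokens = [w.word.lower() for w in words]
    return Counter(zip(*(tokens[i:] for i in range(n))))


if __name__ == "__main__":
    print(word_frequencies(utterance).most_common(3))
    print(phrase_frequencies(utterance).most_common(1))
```

A real corpus would hold many thousands of such utterances, usually with the audio file and speaker metadata stored alongside each transcription.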

All of the above contribute to more successful and accurate speech recognition interfaces.  Additionally, the performance of an automatic speech recognition (ASR) system is directly related to the size (and quality) of the speech corpus used to train it.

This is why all of the major players in ASR are working hard to increase the size of their corpora.  One way they do this is through the user agreement: when you use ASR systems such as Siri and Google, your voice recordings and requests are captured and can be used for up to two years to better train their acoustic models.

ASR Speech Corpora

As the demand for specialized speech recognition applications increases and technology leans more to speech interfaces over keyboards, the demand for quality recorded and transcribed audio corpora is on the rise.

Despite the fact that speech is our most common form of communication, there aren’t many speech corpora in the world.  The performance of an automatic speech recognition (ASR) system is directly related to the size (and quality) of the speech corpus used to train it.

So, all of the major players in ASR are working hard to increase the size of their corpora.

Unfortunately, most major general-purpose speech recognition systems today, such as Siri, Google Now, and Nuance, collect voice data by capturing what users say while using the product, and this audio is used to build better acoustic models for their ASR systems.

Several major problems arise when developers try to use ASR from these companies as the speech recognition interface to their own applications:

  1. These speech platforms are not fully open and available to developers for customization and modification to meet specific needs.
  2. The acoustic recordings and their associated transcriptions are not public, so developers cannot customize them or use them for their specific design needs.
  3. These companies, in most cases, are unwilling to modify existing acoustic models or develop new ones that would allow their systems to adapt to unique, application-specific speech recognition interfaces.
  4. Outside of these ASR systems (whose audio corpora are unavailable), there are very few sources that developers and companies can turn to for new or specialized speech corpora that have been recorded and transcribed for their specific needs.

Unavailability of Accurate and Adaptable Speech Corpus

Despite the fact that speech is our most common form of communication, there aren’t many speech corpora in the world.

The Linguistic Data Consortium (LDC) has a large collection of research results and audio corpora; however, their quality varies when it comes to developing high-quality ASR systems, and in many cases the products are 10-20 years old.

SRI, the originator of Siri, does not sell its speech datasets.

Amazon, Apple, Google, and Microsoft are all actively working to build datasets, but they don’t sell their data.

Nuance provides best-in-class ASR for specific environments only (e.g., Dragon Dictate), and it also does not sell its speech datasets.

Unfortunately, this leaves little opportunity to obtain high-quality, specifically focused speech data for training, whether you build your own ASR system or have someone else design and build it for you.

MatrixHCI can design, collect, and transcribe any unique speech dataset that is required to meet your needs.  Please refer to our page titled “Transcription Services” for more information on customer-developed audio corpora.

Existing Speech Corpora and Availability

To meet the tremendous and critical need for existing speech datasets for using and developing ASR systems, MatrixHCI is in the process of completing the following audio corpora for the development of ASR systems.

These audio corpora are available to internal MatrixHCI developers as we develop your custom ASR interface, and they are available for license to companies who choose to do their own ASR development.

High-quality, up-to-date audio corpora are scarce today. Our speech datasets have been developed to give our customers the very latest speech data, with quality recordings that reflect the most current speech dialects and speech characteristics.

MatrixHCI offers Custom Software Development with an emphasis on Advanced Cutting-Edge Speech Recognition

If you have a need for specialized custom speech recognition solutions, please contact us for a free confidential consultation.