Audio Corpora Libraries
CUSTOM DEVELOPED AND EXISTING AUDIO CORPORA
Acoustic Speech Corpora Libraries & Licensed ASR Training Data
Existing Corpora Libraries
MatrixHCI is developing a variety of quality acoustic speech corpora for use in speech recognition research and development. These libraries are available for license to companies in need of large collections of transcribed speech samples.
The libraries that have been developed for release include:
✔ Conversational speech
✔ Dictational speech
✔ Aviation-related speech corpora
Spoken languages include English and Spanish. We are also compiling an audio corpus of English speakers who use “vocal fry,” technically called “creaky voice,” in their speech. Vocal fry is produced by compressing the vocal cords, which reduces airflow through the larynx and lowers the frequency of vibration, giving speech a rattling or “creaky” sound.
It is most common among younger women in the US and is becoming ever more pervasive in everyday communication.
Vocal fry can reduce speech recognition accuracy. This new audio corpus is specifically designed to close that gap by allowing acoustic models to better handle this speaking style.
Available English Conversational Audio Corpora
English Conversational Speech 200 hrs.
✔ Geographically diverse set of 400 speakers, each speaking conversationally for 30 minutes.
✔ Recorded at 96 kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference table mic, and a noise-canceling headset mic.
✔ Additionally, audio has been collected through both iPhone/iOS and Android devices over the AT&T phone network.
For license details on these audio corpora, please contact us.
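The 200-hour figure for this corpus follows from the speaker count and per-speaker duration listed above. As a minimal sketch of that arithmetic (the function name is hypothetical and is not part of any MatrixHCI tooling):

```python
def corpus_hours(speakers: int, minutes_per_speaker: float) -> float:
    """Total speech duration in hours, given speakers and minutes per speaker."""
    return speakers * minutes_per_speaker / 60


# 400 speakers at 30 minutes each, as listed for this corpus
print(corpus_hours(400, 30))  # 200.0
```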
Available English Dictational Audio Corpora
✔ Geographically diverse set of 400 speakers, each reading from a set of phonetically balanced utterances for 30 minutes.
✔ Recorded at 96 kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference table mic, and a noise-canceling headset mic.
✔ Additionally, audio has been collected through both iPhone/iOS and Android devices over the AT&T phone network.
For license details on these audio corpora, please contact us.
Available English (Vocal Fry) Conversational Audio Corpora
This English-language corpus includes only speakers who exhibit vocal fry in their speech.
It is intended to help increase the accuracy of speech recognition acoustic models for younger speakers who use this speaking style.
English (Vocal Fry) Conversational Speech 100 hrs.
✔ Geographically diverse set of 400 speakers, each speaking conversationally for 30 minutes.
✔ Recorded at 96 kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference table mic, and a noise-canceling headset mic.
✔ Additionally, audio has been collected through both iPhone/iOS and Android devices over the AT&T phone network.
For license details on these audio corpora, please contact us.
Available Spanish Conversational Audio Corpora
Spanish Conversational Speech 200 hrs.
✔ Geographically diverse set of 400 speakers, each speaking conversationally for 30 minutes.
✔ Recorded at 96 kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference table mic, and a noise-canceling headset mic.
✔ Additionally, audio has been collected through both iPhone/iOS and Android devices over the AT&T phone network.
For license details on these audio corpora, please contact us.
Available Spanish Dictational Audio Corpora
✔ Geographically diverse set of 400 speakers, each reading from a set of phonetically balanced utterances for 30 minutes.
✔ Recorded at 96 kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference table mic, and a noise-canceling headset mic.
✔ Additionally, audio has been collected through both iPhone/iOS and Android devices over the AT&T phone network.
For license details on these audio corpora, please contact us.
Available Air Traffic Control Audio Corpus
Audio has been collected from many phases of flight, including ground, taxi to takeoff, departure, approach, and en route.
Additionally, audio has been collected at selected major airports during periods of abnormally high traffic.
The ATC audio corpus can be custom configured to meet your needs. Contact MatrixHCI for a consultation and to learn how we can help enable speech recognition in the aviation and aerospace sectors.
Audio was collected using custom-built J-pole antennas tuned to the middle of the ATC frequency band. All recordings are captured as raw WAV at 96 kHz using MatrixHCI’s remote audio collection bus, located directly at each airport for quality reception.
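A licensee may want to confirm that a delivered recording really is 96 kHz before using it for training. The sketch below shows one way to read the sample rate from a WAV header with Python's standard `wave` module. It assumes uncompressed PCM WAV files. The helper names, the mono 16-bit layout, and the silent demo clip are illustrative assumptions only and do not describe the actual corpus files.

```python
import io
import wave


def wav_sample_rate(source) -> int:
    """Return the sample rate in Hz from a PCM WAV path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def make_silent_wav(sample_rate: int, seconds: float = 0.01) -> io.BytesIO:
    """Build a short silent mono 16-bit WAV in memory (a stand-in for a real file)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # assumed mono for the demo
        wav.setsampwidth(2)   # assumed 16-bit samples for the demo
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    buffer.seek(0)
    return buffer


demo = make_silent_wav(96_000)
print(wav_sample_rate(demo))  # 96000
```

In practice, `wav_sample_rate` would be called with the path of a delivered file in place of the in-memory demo clip.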
This audio includes both ATC-to-pilot and pilot-to-ATC transmissions.
Additionally, many of the transcribed recordings have associated ADS-B data.
MatrixHCI offers Custom Software Development with an emphasis on Advanced Cutting-Edge Speech Recognition
If you need specialized custom speech recognition solutions, please contact us for a free, confidential consultation.