Audio Corpora Data

Audio Corpora Libraries

CUSTOM DEVELOPED AND EXISTING AUDIO CORPORA

Acoustic Speech Corpora Libraries & Licensed ASR Training Data

MatrixHCI develops and licenses high-quality acoustic speech corpora designed for training and improving automatic speech recognition (ASR) systems. Our libraries include conversational, dictational, aviation, and vocal-fry speech recordings in both English and Spanish—captured from geographically diverse speakers across multiple microphone types, mobile devices, and network pathways. With raw 96kHz recordings and professionally aligned transcriptions, our datasets provide the accuracy, realism, and linguistic diversity needed to build next-generation speech engines. If your organization requires specialized, modern corpora for ASR training or speech research, MatrixHCI offers scalable licensing and custom dataset design.

Existing Corpora Libraries

MatrixHCI is developing a variety of quality acoustic speech corpora for use in speech recognition research and development. These libraries are available for license to companies in need of large collections of transcribed speech samples.

The libraries that have been developed for release include:

✔ Conversational speech

✔ Dictational speech

✔ Aviation-related speech

Spoken languages include English and Spanish. We are also compiling an audio corpus of English speakers who use “vocal fry,” technically called “creaky voice.” Vocal fry is produced by compressing the vocal cords, which reduces both the airflow through the larynx and the frequency of vibration, causing speech to sound rattled or “creaky.”

It is most common among younger women in the US and is becoming ever more pervasive in everyday communication.

Vocal fry can reduce speech recognition accuracy. This new audio corpus is specifically designed to bridge that gap and allow acoustic models to better handle this speaking behavior.

Available English Conversational Audio Corpora

MatrixHCI has released the following audio corpus of spoken English.

English Conversational Speech 200 hrs.

✔ Geographically diverse set of 400 speakers, each speaking conversationally for 30 minutes.
✔ Recorded at 96kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference-table microphone, and a noise-canceling headset microphone.
✔ Additionally, audio has been collected and passed through both iPhone/iOS and Android devices over the AT&T phone network.
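
The 200-hour figure follows directly from the speaker count and session length. The short Python sketch below works through that arithmetic and adds a rough storage estimate for a single recording source. The 24-bit mono format is an assumption made purely for illustration, since the bit depth and channel count are not stated here.

```python
# Corpus size arithmetic for the figures listed above.
SPEAKERS = 400
MINUTES_PER_SPEAKER = 30
SAMPLE_RATE_HZ = 96_000

# Assumptions for illustration only (not stated in the corpus description).
ASSUMED_BYTES_PER_SAMPLE = 3   # 24-bit PCM
ASSUMED_CHANNELS = 1           # mono


def total_hours(speakers: int, minutes_each: int) -> float:
    """Total speech duration in hours."""
    return speakers * minutes_each / 60


def raw_size_gb(hours: float, sample_rate_hz: int,
                bytes_per_sample: int, channels: int) -> float:
    """Uncompressed PCM size in gigabytes (10**9 bytes) for one source."""
    seconds = hours * 3600
    return seconds * sample_rate_hz * bytes_per_sample * channels / 1e9


hours = total_hours(SPEAKERS, MINUTES_PER_SPEAKER)
size = raw_size_gb(hours, SAMPLE_RATE_HZ,
                   ASSUMED_BYTES_PER_SAMPLE, ASSUMED_CHANNELS)
print(f"{hours:.0f} hours, about {size:.0f} GB per recording source")
```

Under these assumptions, each recording source works out to roughly 207 GB of uncompressed audio; a different bit depth or channel count would scale that figure proportionally.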

For more information about license details for this audio corpus, please contact us.

Available English Dictational Audio Corpora

English Dictational Speech 200 hrs.

✔ Geographically diverse set of 400 speakers, each reading from a set of phonetically balanced utterances for 30 minutes.
✔ Recorded at 96kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference-table microphone, and a noise-canceling headset microphone.
✔ Additionally, audio has been collected and passed through both iPhone/iOS and Android devices over the AT&T phone network.
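
A licensee may want to confirm that a delivered file matches the stated 96kHz sample rate before using it for training. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV headers. The function names and the file path in the usage note are hypothetical, and the actual delivery format of the corpus files is not specified here.

```python
import wave

EXPECTED_RATE_HZ = 96_000  # sample rate stated for these corpora


def describe_wav(path: str) -> dict:
    """Read basic format fields from the header of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def is_expected_rate(path: str, expected_hz: int = EXPECTED_RATE_HZ) -> bool:
    """True when the file's header reports the expected sample rate."""
    return describe_wav(path)["sample_rate_hz"] == expected_hz
```

For example, `is_expected_rate("speaker_0001.wav")` (a hypothetical file name) would return `True` only if the header reports 96,000 samples per second.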

For more information about license details for this audio corpus, please contact us.

Available English (Vocal Fry) Conversational Audio Corpora

The following English audio corpus includes only speakers who exhibit vocal fry in their speech.

It is intended to help increase the accuracy of speech recognition acoustic models for younger speakers who use this speaking style.

English (Vocal Fry) Conversational Speech 100 hrs.

✔ Geographically diverse set of 400 speakers, each speaking conversationally for 30 minutes.
✔ Recorded at 96kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference-table microphone, and a noise-canceling headset microphone.
✔ Additionally, audio has been collected and passed through both iPhone/iOS and Android devices over the AT&T phone network.

For more information about license details for this audio corpus, please contact us.

Available Spanish Conversational Audio Corpora

MatrixHCI has released the following audio corpus of spoken Spanish.

Spanish Conversational Speech 200 hrs.

✔ Geographically diverse set of 400 speakers, each speaking conversationally for 30 minutes.
✔ Recorded at 96kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference-table microphone, and a noise-canceling headset microphone.
✔ Additionally, audio has been collected and passed through both iPhone/iOS and Android devices over the AT&T phone network.

For more information about license details for this audio corpus, please contact us.

Available Spanish Dictational Audio Corpora

Spanish Dictational Speech 200 hrs.

✔ Geographically diverse set of 400 speakers, each reading from a set of phonetically balanced utterances for 30 minutes.
✔ Recorded at 96kHz from 6 different recording sources, including a high-quality cardioid microphone, a lapel microphone, an omnidirectional conference-table microphone, and a noise-canceling headset microphone.
✔ Additionally, audio has been collected and passed through both iPhone/iOS and Android devices over the AT&T phone network.

For more information about license details for this audio corpus, please contact us.

Available Air Traffic Control Audio Corpus

MatrixHCI has released an audio corpus of Air Traffic Control (ATC) transmissions from many major airports throughout the US.

Audio has been collected from many phases of flight, including Ground, Taxi to Takeoff, Departure, Approach, and En Route.

Additionally, audio has been collected at selected major airports during periods of abnormally high traffic.

The ATC audio corpus can be custom configured to meet your needs. Contact MatrixHCI for a consultation and to learn how we can help enable speech recognition in the aviation and aerospace sectors.

Audio was collected using custom-built J-pole antennas tuned to the middle of the ATC frequency band. All recordings are captured as raw WAV files at 96kHz using MatrixHCI’s remote audio collection bus, located directly at each airport for quality reception.
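
To make the idea of tuning to the middle of the ATC frequencies concrete, the sketch below computes a mid-band frequency and the corresponding free-space element lengths for a J-pole, which consists of a half-wave radiator fed by a quarter-wave matching stub. It assumes the civil VHF airband of roughly 118 to 137 MHz. The band edges and the omission of any shortening (velocity) factor are assumptions for illustration, and the results do not describe MatrixHCI's actual antenna dimensions.

```python
SPEED_OF_LIGHT_M_S = 299_792_458

# Assumed band edges for civil VHF air traffic voice communication.
BAND_LOW_MHZ = 118.0
BAND_HIGH_MHZ = 137.0


def band_center_mhz(low_mhz: float, high_mhz: float) -> float:
    """Arithmetic midpoint of a frequency band."""
    return (low_mhz + high_mhz) / 2


def jpole_lengths(freq_mhz: float) -> dict:
    """Free-space J-pole element lengths in metres (no shortening factor)."""
    wavelength = SPEED_OF_LIGHT_M_S / (freq_mhz * 1e6)
    return {
        "wavelength_m": wavelength,
        "radiator_m": wavelength / 2,  # half-wave radiating element
        "stub_m": wavelength / 4,      # quarter-wave matching stub
    }


center = band_center_mhz(BAND_LOW_MHZ, BAND_HIGH_MHZ)
lengths = jpole_lengths(center)
print(f"Center frequency: {center:.1f} MHz")
for name, metres in lengths.items():
    print(f"{name}: {metres:.3f} m")
```

With these assumed band edges the center falls at 127.5 MHz, giving a free-space wavelength of about 2.35 m. A practical build would typically shorten the elements slightly to account for conductor properties.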

This audio includes both ATC-to-pilot and pilot-to-ATC transmissions.

Additionally, many recordings also have ADS-B data associated with the transcribed recordings.


MatrixHCI offers custom software development with an emphasis on advanced, cutting-edge speech recognition.

If you have a need for specialized custom speech recognition solutions, please contact us for a free confidential consultation.

CONTACT US