Transcription Services

“When It Comes to Training Speech Recognition Systems, It’s All About the Acoustic Details”

Custom or Existing Speech Corpora?

There are many factors to consider when choosing audio data for training your acoustic models.  First, is there reliable speech audio available that can be used to build your custom acoustic models?  There are few avenues for selecting and licensing audio that has been recorded and transcribed in a way that is suitable for acoustic model development.  Most speech recognition companies that develop their own acoustic models hold them close and do not open them up to other companies for further development and customization to particular acoustic needs and environments.  Products like Siri, Google, and Nuance, to mention a few, collect audio from users, but they do not make that audio available to third parties for custom-focused development.

There are a few sources that sell acoustic speech corpora, but these libraries are limited, very costly, and in many cases do not fit the exact needs of client requirements.  In addition, many of them were collected over ten years ago, so they do not reflect current speech characteristics such as conversational social media speech and the vocal fracking that has been adopted by today’s younger generation.

When possible, MatrixHCI will use audio corpora that are already available, as this saves time and resources.  This previously collected and transcribed audio can be used to build “baseline” models, which are then continually extended by collecting audio from your own environment and using it to further train and refine your acoustic models.  Audio collected outside of your actual environment is called “out of domain” audio.  Audio that is collected using your system and by your customers is considered “in domain” audio.  Once a baseline acoustic model has been developed, “in domain” audio is highly preferred because it contains the actual audio artifacts that your system configuration produces.  It is estimated that 30 hours of “in domain” audio is equivalent to 300 hours of “out of domain” audio.
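As a rough illustration of that estimate, the sketch below converts a mix of “in domain” and “out of domain” hours into equivalent “out of domain” hours.  The 10:1 weight is taken directly from the 30-to-300 figure above; the function name and the idea of simply adding the weighted hours are assumptions made for illustration, not a MatrixHCI formula.

```python
# Assumption: a flat 10:1 weight, derived from the estimate that
# 30 hours of in-domain audio is worth about 300 hours of out-of-domain audio.
IN_DOMAIN_WEIGHT = 300 / 30


def equivalent_out_of_domain_hours(in_domain_hours, out_of_domain_hours,
                                   weight=IN_DOMAIN_WEIGHT):
    """Express a mixed corpus as equivalent out-of-domain hours (hypothetical helper)."""
    if in_domain_hours < 0 or out_of_domain_hours < 0:
        raise ValueError("hours must be non-negative")
    return in_domain_hours * weight + out_of_domain_hours


if __name__ == "__main__":
    # 5 hours collected from your own system plus a 100-hour licensed corpus.
    print(equivalent_out_of_domain_hours(5, 100))
```

The linear weighting is only a planning aid; the real benefit of additional audio depends on how well it is transcribed and how closely it matches your environment.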

Baseline acoustic models are initially created with the goal of about 70% recognition accuracy, and these baseline acoustic models are then improved and refined using your “in domain” audio as it is collected and transcribed.  Because good, usable, and affordable acoustic audio corpora are scarce, MatrixHCI is in the process of creating high quality audio corpora for building these baseline models in ways that yield the highest initial accuracy rate.  MatrixHCI will choose, when available, the best existing audio corpora for your projects, whether from outside sources or from MatrixHCI’s own internal speech corpora library, and when none exists for your needs, will create it.
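The text does not say how recognition accuracy is measured.  One common convention, assumed here, is word-level accuracy: one minus the word error rate, where errors are counted as the word-level edit distance between a reference transcript and the recognizer output.  The sketch below shows that calculation; the function names are illustrative.

```python
def word_edit_distance(reference, hypothesis):
    """Minimum number of word substitutions, insertions, and deletions."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the processed reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (edit distance / reference length), floored at 0."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript is empty")
    return max(0.0, 1.0 - word_edit_distance(reference, hypothesis) / ref_len)


if __name__ == "__main__":
    reference = "cleared for takeoff runway two seven left wind calm sir"
    hypothesis = "cleared for takeoff runway two seven left when come sir"
    print(round(word_accuracy(reference, hypothesis), 2))
```

Averaging this score over a held-out set of transcribed recordings gives a figure that can be compared against a baseline target.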


MatrixHCI Speech Corpora Library

MatrixHCI is planning an initial line of speech corpora for 2016 and its initial speech corpora library offering will include:

  • Spanish Conversational Speech Corpora (Blended Geographical Accents)
  • Spanish Dictational Speech Corpora (Blended Geographical Accents)
  • English Conversational Speech Corpora (Blended Geographical Accents)
  • English Dictational Speech Corpora (Blended Geographical Accents)
  • English – Vocal Fracking Conversational Speech Corpora (Blended Geographical Accents)
  • English – Vocal Fracking Dictational Speech Corpora (Blended Geographical Accents)

Each of the above speech corpora is available in four different microphone capture configurations in order to best meet your needs (Cardioid Mic, Lapel Mic, Omnidirectional Conference Mic, and Noise Canceling Headset Mic).

Additionally, the above speech corpora have also been captured on both Android and iPhone devices over the AT&T wireless network.

  • Air Traffic Control (ATC) and Pilot-to-ATC transmissions at a variety of airport locations.

The ATC/Pilot speech corpora have been captured professionally on our mobile audio capture bus using custom-built J-pole antennas tuned to the center ranges of FAA ATC transmission frequencies, and they cover all phases of flight, including ground, departure, arrival, and en route.

Additionally, many of our captured hours have accompanying ADS-B data associated with each recording.  All audio recordings are captured at 96 kHz using the best multichannel recording devices available in the industry.

Please refer to our audio corpus price and availability list for more details.

Capturing Your Own Speech Corpora

A well-trained acoustic model uses less data, not more.  It is actually possible to over-train an acoustic model, overloading it with hours and hours of speech data that ultimately reduce speech recognition accuracy instead of increasing it.

The age-old saying “Garbage in … Garbage out” applies when it comes to training your acoustic models for their highest accuracy potential.

That is why it is so important to carefully select ongoing audio corpus data from your customer’s real audio environment when training or retraining your acoustic models.

In order to provide the very best audio corpus data for your ongoing acoustic model training, your data needs to be carefully and accurately transcribed with a focus on speech recognition training.

If you capture voice data from your customer sites and it’s improperly or inaccurately transcribed and used in retraining your acoustic models, it can have a serious negative impact on the speech recognition accuracy of your speech interface.


Matrix Speech Transcription Services

MatrixHCI is an industry leader in the development and training of custom speech recognition systems.  As a result, we know what transcription data needs to be captured from your ongoing audio samples in order to give you the best retraining results for your speech system.

When your business model allows you to collect real-time “in domain” live voice data, you have a gold mine of accurate audio samples that truly represent your client’s or customer’s needs.

With proper transcription of your audio, you can use it to further increase your speech recognition interface accuracy, and also allow your system to change with your customer’s audio requirements.

Custom Speech Recognition Transcription Services

MatrixHCI offers you a highly trained transcription service designed specifically with the tuning and retraining of your acoustic model in mind.  No other transcription service offers audio transcription solely focused on providing data that is necessary for improving the accuracy of your acoustic models.

Because we train and tune acoustic models, we know specifically what to extract from your audio samples that will produce the best results for increased accuracy of your speech recognition system.  MatrixHCI takes into consideration and identifies a myriad of data that is critical to the ongoing retraining and tuning of your speech recognition system.  Here are just some of the factors that are considered and captured when we transcribe your audio for use in future retraining and speech recognition tuning:

– Gender, Age, Language, Accents, Background, Capture Device, Word Location, Sentence Location, Unintelligible Words, Mispronounced Words, Sampling Rate, File Compression, Audio Quality Analysis, Type of Speech Environment, Phonemes, and more …
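To make the list above concrete, here is a minimal sketch of how such annotations might be stored alongside a transcript.  Every class name, field name, and the 20% threshold is hypothetical and chosen only for illustration; this is not MatrixHCI’s actual transcription schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WordAnnotation:
    """One transcribed word with its location in the recording."""
    text: str
    start_sec: float
    end_sec: float
    unintelligible: bool = False
    mispronounced: bool = False


@dataclass
class TranscribedUtterance:
    """Transcript plus speaker and recording metadata (hypothetical fields)."""
    audio_file: str
    language: str
    sampling_rate_hz: int
    capture_device: str
    speech_environment: str
    speaker_gender: Optional[str] = None
    speaker_age_range: Optional[str] = None
    accent: Optional[str] = None
    file_compression: Optional[str] = None
    words: List[WordAnnotation] = field(default_factory=list)

    def transcript(self) -> str:
        return " ".join(w.text for w in self.words)

    def usable_for_training(self, max_unintelligible_ratio: float = 0.2) -> bool:
        """Reject empty utterances and those with too many unintelligible words.

        The 0.2 default is an arbitrary illustrative threshold.
        """
        if not self.words:
            return False
        bad = sum(1 for w in self.words if w.unintelligible)
        return bad / len(self.words) <= max_unintelligible_ratio


if __name__ == "__main__":
    utterance = TranscribedUtterance(
        audio_file="call_0001.wav",
        language="en-US",
        sampling_rate_hz=8000,
        capture_device="Noise Canceling Headset Mic",
        speech_environment="call center",
        words=[
            WordAnnotation("check", 0.10, 0.42),
            WordAnnotation("my", 0.42, 0.55),
            WordAnnotation("balance", 0.55, 1.05),
        ],
    )
    print(utterance.transcript(), utterance.usable_for_training())
```

Keeping flags such as unintelligible or mispronounced words at the word level makes it possible to filter poor samples before retraining, in line with the “Garbage in … Garbage out” point made earlier.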


Taking Speech Recognition Full Circle

MatrixHCI customers who can capture and collect real-time live audio from their own customers are able to constantly retrain and tune their speech recognition system to the ever-changing environment of those customers.  This allows you to continually adapt your speech system to the immediate needs of your customers, especially when you branch out into new markets that contain types of speech your system has not encountered before.

Constant Speech Recognition Evolution

With our new custom audio transcription services, you can now have a speech recognition system that constantly evolves with your customers.  As the market changes or the speech characteristics of your customer base change, your system will be ready to handle your customers’ constantly changing needs and requirements, providing them with the level of satisfaction they need for a positive interactive experience with your speech-enabled system.


MatrixHCI Custom Transcription Services

If you have audio that you are collecting from your customers or if you need us to collect audio for you, let MatrixHCI show you how you can put this critical data to work in order to improve the quality of your speech interface.

Additionally, if you have audio that is available for capture but you aren’t capturing the audio yet, ask us how we can capture and transcribe this data for you so that you can use it in improving your speech system and your customer’s overall satisfaction with the services that you offer.

Please contact MatrixHCI to learn more about how we can help you meet your current and future speech application and transcription needs.

MatrixHCI offers Custom Software Development with an emphasis on Advanced Cutting-Edge Speech Recognition

If you have a need for specialized custom speech recognition solutions, please contact us for a free confidential consultation.