Custom Grammars

Custom Grammar Development

Speech Recognition Grammar Development and ASR Accuracy Testing

Grammars, along with acoustic models discussed on the next page, are critical components in a successful speech application.  Grammars are the rails that guide the application and connect the voice interface to the decoder.  In fact, accuracy rates for automatic speech recognition (ASR) almost always improve with cycles of grammar testing and tuning.

Grammars describe all the likely terms the core speech engine should expect, and they control when various speech functions are switched on and off.  A professionally tuned grammar can deliver robust results by better defining the target space and by specifying whether to accept, ignore, or expand upon certain input.  Grammars are pivotal because they provide the intelligence that makes the core engine more efficient and reliable.

MatrixHCI is rewriting the rules with its “Grammar Engineering” practice.  Whereas many ordinary grammars are static XML files, our dynamic approach goes beyond normal conventions to make certain your speech application can identify the key words or phrases it is expecting.

Please see the Grammar FAQ section below to better understand our grammar tuning and development capabilities:

What is the difference between a grammar (linguistic model) and an acoustic model?

The acoustic model stores information about the sounds that make up words.  To recognize the proper words and sentences, however, speech recognition also needs a linguistic model, which can be either a language model or a grammar.

A language model is a file that contains the probabilities of words and word sequences (unigrams, bigrams, and trigrams).  Within the language model, words can also be weighted by importance or frequency of occurrence to help improve recognition results.
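As a rough illustration of what "probabilities of word sequences" means, the sketch below estimates bigram probabilities from raw counts. The toy corpus and the function name are assumptions made up for this example; production language models are trained on far larger data and use smoothing, which is omitted here.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(second word | first word) from raw counts (no smoothing)."""
    first_word_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        # Count every adjacent word pair, and how often each word starts a pair.
        for first, second in zip(words, words[1:]):
            first_word_counts[first] += 1
            bigram_counts[(first, second)] += 1
    return {
        pair: count / first_word_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


# Hypothetical toy corpus, for illustration only.
corpus = [
    "check my balance",
    "check my account",
    "check my balance please",
    "transfer my balance",
]
probs = bigram_probabilities(corpus)
print(probs[("my", "balance")])  # 3 of the 4 words following "my" are "balance"
```

In this toy data, "balance" is far more likely than "account" after "my", which is the kind of evidence a decoder can use to choose between similar-sounding candidates.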

A grammar is a much smaller file containing sets of predefined combinations of words.  These combinations are organized into specific ordered structures called “rules.”  The rules help the speech recognition system accept wanted speech and reject unwanted speech.

In general, language models are used for dictation applications, and grammars are used in command-and-control and telephony systems such as interactive voice response (IVR).

When, how, and why are grammars written?

Grammar files are developed during the voice user interface (VUI) design phase.  Grammars are written to specifications collected about the nature and flow of the application.  Technically, grammars are written to a document standard such as SRGS (the W3C Speech Recognition Grammar Specification), which is commonly used together with VoiceXML.
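To make the document standard concrete, here is a minimal sketch of a grammar in the SRGS XML form, loaded with Python's standard library so the phrases in each rule can be listed. The "confirm" rule and its phrases are hypothetical, and the helper only handles simple one-of lists rather than the full specification.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical confirmation grammar written in the SRGS XML form.
GRAMMAR_XML = """<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="confirm">
  <rule id="confirm" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
      <item>repeat that</item>
    </one-of>
  </rule>
</grammar>"""


def list_rule_phrases(grammar_xml):
    """Return {rule id: [phrases]} for rules built from simple <item> lists."""
    root = ET.fromstring(grammar_xml)
    phrases = {}
    for rule in root.findall(f"{{{SRGS_NS}}}rule"):
        items = rule.findall(f".//{{{SRGS_NS}}}item")
        phrases[rule.get("id")] = [item.text.strip() for item in items]
    return phrases


rules = list_rule_phrases(GRAMMAR_XML)
print(rules)
```

Keeping each rule small and named in this way is one reason portions of a grammar can be reused across applications.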

Grammar files are used in all voice recognition applications.  New systems require grammar development, and existing grammars may need to be rewritten during the migration of legacy systems.  When designed properly, grammar files, or portions thereof, can be reused.

What is involved in tuning grammars?

Grammar files, like acoustic models, require a cycle of testing and tuning to reach an optimal performance balance.  This process involves setting a benchmark and a target.  In order to improve accuracy, it is necessary to explore and define the following terms in relation to the specific application:

Phonetic Dictionaries – Phonetic dictionaries give a phoneme-by-phoneme representation of target words and can accommodate a variety of perceived pronunciations.  This is especially helpful for proper nouns (e.g., company, person, or place names), where people may guess at the pronunciation.

OOV – Out-of-vocabulary input includes vocal gestures like “umm, ahh, yeah,” as well as filler responses such as “I think so” or “ok.”  If the application is aware of these in advance, they can be ignored or perhaps equated with another response (e.g., ok = yes).

False Positives – A false positive occurs when the system cannot rule out a word and accepts it as a match because it sounds similar to a target word.
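The OOV handling described above can be sketched as a small normalization step applied to decoded text. The filler set, the synonym table, and the function name are hypothetical choices for this example; a real application would build them from its own call data.

```python
# Hypothetical vocal gestures to ignore outright.
FILLERS = {"umm", "ahh", "uh"}

# Hypothetical out-of-grammar responses equated with a grammar target.
SYNONYMS = {
    "ok": "yes",
    "yeah": "yes",
    "i think so": "yes",
    "nope": "no",
}


def normalize(utterance):
    """Drop filler words, then map a known filler response to its target."""
    words = [word for word in utterance.lower().split() if word not in FILLERS]
    phrase = " ".join(words)
    return SYNONYMS.get(phrase, phrase)


print(normalize("umm ok"))      # filler dropped, "ok" equated with "yes"
print(normalize("I think so"))  # whole phrase equated with "yes"
print(normalize("ahh no"))      # filler dropped, "no" passes through
```

An empty string after normalization means the caller produced only fillers, which the application can treat as no input rather than as a rejected answer.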

What is involved in testing grammars?

Grammar files provide a window to study the interaction between the predefined inputs and the actual decoded audio.  As such, MatrixHCI uses various methods to benchmark existing performance and to monitor improvements as definitions and variants are added to the grammar.

Confidence thresholds – A confidence score indicates how closely an input matches a target in the grammar file.  These scores are compared against a threshold to decide whether a match is accepted or rejected.  Careful analysis may reveal clusters of words or users that have low scores.  Through the testing and tuning process, measures are taken to improve the system accuracy.

Regression testing – After an error is fixed, earlier tests are run again to ensure that the fix didn’t introduce further errors.
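The two methods above can be illustrated together in a short sketch: a threshold decides acceptance, and a regression check confirms that a tuned setting scores no worse than the benchmark on the same test set. The data, the score values, and all names here are assumptions for illustration; real recognizers report confidence on their own scales.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    expected: Optional[str]  # what should be returned; None means "reject"
    recognized: str          # best grammar match reported by the recognizer
    confidence: float        # hypothetical score between 0 and 1


def decide(result, threshold):
    """Accept the match only when its confidence reaches the threshold."""
    return result.recognized if result.confidence >= threshold else None


def score(results, threshold):
    """Fraction of test utterances that were handled as expected."""
    correct = sum(decide(r, threshold) == r.expected for r in results)
    return correct / len(results)


def regression_check(results, baseline, threshold):
    """True when the tuned threshold scores at least as well as the benchmark."""
    return score(results, threshold) >= baseline


# Hypothetical test set; the third entry is out-of-grammar speech that
# resembled "yes" and becomes a false positive if it is accepted.
results = [
    Result("yes", "yes", 0.91),
    Result("no", "no", 0.62),
    Result(None, "yes", 0.48),
    Result("no", "no", 0.55),
]

for threshold in (0.4, 0.5, 0.6):
    print(threshold, score(results, threshold))
```

With this data a threshold set too low admits the false positive and one set too high rejects a valid "no", which is the balance that tuning tries to find.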

What is grammar engineering?

Grammar engineering describes the advanced techniques used at MatrixHCI to push beyond common practices.  We are interested in greater functionality and control through the use of creative grammar logic.  We seek to use grammar files to improve security, to provide a platform for business rules that steer the dialog, and to make speech recognition applications more resilient, for example when handling conversational input.

For instance, testing may show that speakers with certain accents consistently fail to meet the minimum confidence thresholds.  In such cases, a second “reserve” grammar with different limits can be activated if the system thinks an accent is present.  This type of iterative software “loop” allows differing courses of action based on the real-time input.  Creative AI-based solutions that tighten the margins of present-day ASR systems highlight how MatrixHCI’s Grammar Engineering service is redefining the art.
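One way to picture the "reserve" grammar idea is the sketch below, which switches to a profile with a lower threshold after repeated near-misses. This is not MatrixHCI's implementation: the profile names, the threshold values, and the use of consecutive rejections as a stand-in for accent detection are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrammarProfile:
    name: str
    threshold: float  # minimum confidence required to accept a match


# Hypothetical profiles; real limits would come from testing and tuning.
PRIMARY = GrammarProfile("primary", 0.60)
RESERVE = GrammarProfile("reserve", 0.45)


class GrammarSelector:
    """Switch to the reserve profile after repeated rejections."""

    def __init__(self, max_misses=2):
        self.active = PRIMARY
        self.max_misses = max_misses
        self.misses = 0

    def handle(self, phrase, confidence):
        """Return the phrase if accepted under the active profile, else None."""
        if confidence >= self.active.threshold:
            self.misses = 0
            return phrase
        self.misses += 1
        # Assumption: consecutive rejections suggest the primary limits
        # do not suit this speaker, so the reserve profile is activated.
        if self.active is PRIMARY and self.misses >= self.max_misses:
            self.active = RESERVE
            self.misses = 0
        return None


selector = GrammarSelector()
outcomes = [selector.handle("yes", c) for c in (0.50, 0.52, 0.50)]
print(outcomes, selector.active.name)
```

The first two attempts are rejected under the primary limits, and the third attempt with a similar score is accepted once the reserve profile is active.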

MatrixHCI offers Custom Software Development with an emphasis on Advanced Cutting-Edge Speech Recognition

If you have a need for specialized custom speech recognition solutions, please contact us for a free confidential consultation.