
What Does Google Speech Recognition API Mean For The Industry

Posted by Chris Kikel


Google Voice Recognition API

In March 2016, Google announced that it was making its voice recognition technology accessible to third-party software developers. Through a set of cloud-based interfaces, developers will be able to add speech support to their applications using Google’s speech-to-text technology. So what does the Google speech recognition API mean for the industry?

Key Technology Differences

Google’s API is implemented as cloud-based technology. The biggest objections stem from the requirement that the device capturing the speech be connected to the web, which has the potential to introduce processing latency. It also raises privacy concerns, because the data must reside on web-based servers, where it is vulnerable to unauthorized access.
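To make the cloud dependency concrete, here is a minimal sketch of what an application would send over the network for each recognition request. It only builds the request body and does not contact Google. The endpoint and field names follow the v1 REST interface of Cloud Speech-to-Text as I understand it, which is newer than the 2016 release discussed here; the function name, the 16 kHz sample rate, and the placeholder audio are assumptions for illustration.

```python
import base64
import json

# Assumed endpoint (v1 REST interface); the article does not name one.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes, language_code="en-US", sample_rate_hertz=16000):
    """Build the JSON body for a synchronous recognition request.

    Hypothetical helper: the raw audio itself is embedded in the request,
    which is why the device must be online and why the audio leaves the device.
    """
    return {
        "config": {
            "encoding": "LINEAR16",              # uncompressed 16-bit PCM
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {
            # Binary audio is base64-encoded so it can travel inside JSON.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }


# Placeholder bytes standing in for one second of recorded audio.
body = build_recognize_request(b"\x00\x01" * 16000)
payload = json.dumps(body)
```

An application would then POST `payload` to `RECOGNIZE_URL` with its credentials and wait for the transcript, so every utterance costs at least one network round trip.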

Even so, the Google API excels in many areas, including:

Support for more than 80 languages
Superior performance in noisy situations
Real-time transcription
Machine learning algorithms that continuously improve recognition with use
Speech recognition accuracy that outperforms most competitors

Google vs Nuance

Nuance has been a recognized leader in voice recognition since the 1990s. However, Google is well positioned to dominate the market, given the popularity of Google in general.

Nuance’s strength is its dedicated solutions. For example, Dragon Medical is designed specifically for healthcare professionals. There is also a legal version for lawyers and a professional version for business users. Additionally, Nuance boasts a higher accuracy rate: 99%, compared to Google’s 92%.

Google vs Siri

Apple’s Siri is a voice recognition interface limited to iOS devices. Google’s technology, by contrast, runs across platforms, including Mac computers and iOS smartphones and tablets.

Only recently has Apple begun to bring in software developers to add machine learning support to its speech recognition. Google has the advantage of its machine learning technology, which is implemented by extremely sophisticated neural networks.

Google also supports more languages. Siri’s speech recognition accuracy rate is around 88%, just below Google’s 92%.
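Accuracy percentages are easier to compare as error counts. The short sketch below converts the rates quoted in this post into expected errors per 1,000 words. It assumes the figures are word-level accuracy, which the sources for these numbers do not confirm, and the function name is made up for illustration.

```python
# Accuracy rates as quoted in this post (assumed to be word-level accuracy).
accuracy = {"Nuance": 0.99, "Google": 0.92, "Siri": 0.88}


def errors_per_1000_words(rate):
    """Expected number of misrecognized words in a 1,000-word dictation."""
    return round((1 - rate) * 1000)


for name, rate in accuracy.items():
    print(f"{name}: about {errors_per_1000_words(rate)} errors per 1,000 words")
```

Under that assumption, a few percentage points of accuracy translate into a large difference in how many corrections a user has to make by hand.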

Google vs Microsoft Cortana

Microsoft’s Cortana is a speech recognition technology that ships with Windows operating systems. Microsoft also offers Cortana to software developers as an API. However, Cortana’s language support is the most limited of all the speech recognition APIs mentioned: it covers only 11 languages and 3 regions, significantly fewer than the more than 80 languages supported by Google’s API.

The Cortana API is supported on iOS and Android, though its speech recognition has been criticized for performance that is not on par with Google’s or that of other competitors. This will be an impediment to Cortana’s acceptance and popularity.