Apple has published a new paper in its Machine Learning Journal providing technical details on how it improves Siri’s ability to recognize the names of local points of interest. In a post titled “Finding Local Destinations with Siri’s Regionally Specific Language Models for Speech Recognition,” Apple explains how iOS incorporates the user’s current location into the speech recognition system behind Siri, going beyond general-purpose speech recognition so that Siri can more accurately recognize named entities like local businesses.
While Apple acknowledges that most virtual assistants have little trouble correctly recognizing and understanding the names of high-profile businesses like Starbucks, accurately identifying the names of the millions of smaller, local businesses and services that users ask about is a much greater challenge. To address this, Apple began by incorporating knowledge of the user’s location into the speech recognition system, not only to identify businesses that might be near the user, but also to build a regional acoustic model of how users in that area are likely to pronounce the names of local businesses, and to handle the many business names that have little to no representation in Siri’s language model database.
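Apple’s paper doesn’t publish its scoring code, but a common way to bias a general language model toward regional vocabulary, sketched below under that assumption, is to linearly interpolate its word probabilities with a small region-specific model. All model contents and the `weight` value here are invented for illustration.

```python
# Illustrative sketch (not Apple's implementation): mix a general language
# model with a regional "Geo-LM" by linear interpolation, so rare local
# business names that the general model has never seen still get usable
# probability mass.

def interpolated_prob(word: str, general_lm: dict, geo_lm: dict,
                      weight: float = 0.3) -> float:
    """Blend general and regional unigram probabilities for `word`.

    `weight` controls how strongly the regional model is trusted; a tiny
    floor probability stands in for proper smoothing.
    """
    floor = 1e-9
    p_general = general_lm.get(word, floor)
    p_geo = geo_lm.get(word, floor)
    return (1 - weight) * p_general + weight * p_geo


# Toy models: "philz" is a hypothetical local coffee chain that only the
# regional model knows about.
general = {"starbucks": 0.02, "coffee": 0.05}
bay_area = {"philz": 0.04, "coffee": 0.06}

print(interpolated_prob("philz", general, bay_area))
```

Without the regional model, “philz” would score near zero; the interpolation lifts it to a competitive probability while leaving common words largely unchanged.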
Apple created 169 customized language models, called “Geo-LMs,” one for each Combined Statistical Area (CSA) in the United States, along with a single global Geo-LM for situations where the user is outside the predefined areas or their location simply can’t be determined. Each Geo-LM supplies additional data that gets fed into Siri alongside the standard acoustic models, providing speech recognition data specific to the user’s region and helping Siri better understand the user’s intended sequence of words as well as region-specific diction and pronunciations of business names.
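The selection step described above can be sketched roughly as follows. This is a simplified illustration, not Apple’s code: the CSA names and bounding boxes are invented, and a real system would use proper geofence polygons rather than rectangles.

```python
# Hypothetical sketch of Geo-LM selection: find the CSA whose area contains
# the user's coordinates, and fall back to the single global model when the
# user is outside every predefined area or their location is unknown.

from typing import Optional

# (min_lat, max_lat, min_lon, max_lon) per CSA -- toy values for illustration
CSA_BOUNDS = {
    "San Jose-San Francisco-Oakland": (36.9, 38.9, -123.1, -121.2),
    "New York-Newark": (40.1, 41.6, -75.4, -71.8),
}


def select_geo_lm(lat: Optional[float], lon: Optional[float]) -> str:
    """Return the Geo-LM id for the user's location, or the global fallback."""
    if lat is None or lon is None:  # location can't be determined
        return "global"
    for csa, (lo_lat, hi_lat, lo_lon, hi_lon) in CSA_BOUNDS.items():
        if lo_lat <= lat <= hi_lat and lo_lon <= lon <= hi_lon:
            return csa
    return "global"  # outside all predefined areas


print(select_geo_lm(37.77, -122.42))  # a San Francisco coordinate
print(select_geo_lm(None, None))      # location unavailable
```

The key design point the paper describes is the fallback: recognition never fails for lack of a regional model, it just proceeds with the less specialized global one.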