rc3.org

Strong opinions, weakly held

Voice interfaces and third-party app integration

If voice-activated user interfaces like Siri on the new iPhone 4S really take off, third-party application developers are going to want in on the action. As John Gruber points out in his iPhone 4S review, that poses an interesting set of problems:

People are going to start clamoring for third-party Siri integration as soon as they see Siri in action. But I’m not sure what form that integration could take. Best I can think is that apps could hook up to (as yet utterly hypothetical) Siri APIs much in the same way that Mac apps can supply system-wide Services menu items. But how would they keep from stomping on one another? If Siri supported third-party apps and you said, “Schedule lunch tomorrow at noon,” what would Siri do if you have multiple Siri-enabled calendar apps installed? This is similar to the dilemma Mac OS X faces when you open a document with a file extension that multiple installed apps register support for.

And here’s a specific example of what he’s talking about, one that involves only the built-in applications:

Here’s an example. Wolfram Alpha has terrific stock-price information and comparison features. I link to them frequently for stock info from Daring Fireball. So I tried asking Siri, “What was Apple’s stock price 10 years ago?” But once Siri groks that you’re asking about a stock price, it queries the built-in Stocks app for data, and the Stocks app doesn’t have historical data that goes back that far. “What did Apple’s stock price close at today?” works, but asking for historical data does not. But Wolfram Alpha has that data.

Working around those sorts of problems is difficult even with regular touch or point-and-click interfaces, where it’s easy to wind up in Preferences Hell. Dealing with them at the voice level is going to be even more complex.

2 Comments

  1. The third-party app I most want to be Siri-enabled (having not upgraded yet, and having only seen demos) is OmniFocus, and my thought was that you’d have to speak directly to the app you want to listen to you, as in “OmniFocus, create a project called XYZ.”

    Also: “MLB, what’s the score of the Nats game right now?” Hehe.

  2. Android has already solved this using something called intents. An intent consists of an action and some data. For example, the action could be VIEW and the data a URL, but there are many more combinations: a SEND action (used for sharing) with some text as the data, a DIAL action with a phone number as the data, and so on.

    Any installed app can declare which intent actions and data prefixes it handles. This provides loose coupling between actions and the apps that handle them.

    Some folks are trying to bring this system to the web in general; see http://webintents.org.

    For more of a feel for how this works on Android, see the developer docs at http://developer.android.com/guide/topics/intents/intents-filters.html and http://developer.android.com/reference/android/content/Intent.html
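The intent mechanism described in the second comment can be illustrated with a toy model, which also covers the first commenter's idea of addressing an app by name. This is a minimal Python sketch under stated assumptions, not Android's actual API: the `Request`, `Registration` and `resolve` names, the `SCHEDULE` and `CREATE_PROJECT` actions, and the `OtherCalendar` app are all hypothetical. A request aimed at a named app goes only to that app; otherwise every app whose declared action matches, and whose declared data prefix matches the start of the data, is a candidate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Request:
    """A toy 'intent': an action plus some data (hypothetical model, not Android's Intent)."""
    action: str
    data: str
    target: Optional[str] = None  # set only when the user names the app, e.g. "OmniFocus, ..."


@dataclass(frozen=True)
class Registration:
    """What an installed app declares it can handle (hypothetical model)."""
    app: str
    action: str
    data_prefix: str = ""  # an empty prefix matches any data


def resolve(request: Request, registry: List[Registration]) -> List[str]:
    """Return, in registration order, the apps that could handle the request."""
    candidates: List[str] = []
    for reg in registry:
        # Explicit request: ignore every app except the one that was named.
        if request.target is not None and reg.app != request.target:
            continue
        # Implicit match: same action, and the data starts with the declared prefix.
        if reg.action == request.action and request.data.startswith(reg.data_prefix):
            if reg.app not in candidates:
                candidates.append(reg.app)
    return candidates


# Hypothetical installed apps and their declarations.
registry = [
    Registration("Calendar", "SCHEDULE"),
    Registration("OtherCalendar", "SCHEDULE"),
    Registration("OmniFocus", "CREATE_PROJECT"),
    Registration("Browser", "VIEW", "http:"),
]

# Implicit: two calendar apps both claim the request, so it is ambiguous.
print(resolve(Request("SCHEDULE", "lunch tomorrow at noon"), registry))
# ['Calendar', 'OtherCalendar']

# Explicit: naming the app removes the ambiguity.
print(resolve(Request("CREATE_PROJECT", "XYZ", target="OmniFocus"), registry))
# ['OmniFocus']

# Data prefixes narrow the match.
print(resolve(Request("VIEW", "http://webintents.org"), registry))
# ['Browser']
```

Loose coupling only gets you a list of candidates. When that list has more than one entry, as with the two calendar apps here, something still has to pick one, whether a prompt or a stored default, which is the same dilemma Gruber describes for Siri.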


© 2024 rc3.org
