One of my favorite movies growing up was Disney Channel’s Smart House. Pat, the Smart House AI, could play the best music, make you a better dancer, and even kick out the bully when he started causing trouble at the party she organized for the kids, all without being asked!
Today’s voice-activated “assistants” like Siri, Cortana, and Alexa have been on the market for years, but they still fall short of capturing the imagination the way the voice-activated assistants inhabiting cultural masterpieces like Smart House and 2001: A Space Odyssey do. Serving more as voice-controlled UI translators, today’s assistants match trigger words and phrases to fairly specific actions. Machine learning sometimes makes these queries more flexible, but in the end we’re still asking a device to complete a single action.
This call-and-response behavior has kept me wondering when these services could truly be considered assistants. As a designer, I’m thrilled by the future possibilities of asynchronous assistant workflows. Imagine being able to ask Google to actively monitor flights and prompt you only when it needs you to confirm a ticket. We have conversational bots and UIs, proactive discovery engines like Google Now, and vast knowledge graphs, but the promise of a true digital assistant still eludes us.
To me, a digital assistant must meet four criteria before it can be considered more than a conversational user interface. Pat from Smart House met all of these. In 1999.
Support conversational and sequential requests
When I speak with my assistant, it should continue to listen and respond in appropriate circumstances. With Google Assistant, I should be able to reply naturally, rather than having to prepend the wake phrase “OK, Google” to every subsequent request. Additionally, an assistant should be able to understand and respond to a string of requests like “Turn on the TV, dim the lights, and play The OA”.
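As a toy illustration (not any real assistant’s API), a compound request like the one above could be split on commas and “and” before each piece is matched to an action in sequence:

```python
# Toy sketch: splitting a compound utterance into sequential commands.
# This is illustrative only; a real assistant would use an NLU model.
import re

def split_requests(utterance: str) -> list:
    """Split a compound request on commas and the word 'and'."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", utterance.strip())
    return [p for p in (part.strip() for part in parts) if p]

print(split_requests("Turn on the TV, dim the lights, and play The OA"))
# → ['Turn on the TV', 'dim the lights', 'play The OA']
```

Each resulting command can then be handled one at a time, preserving the order the user spoke them in.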
Be consistent across surfaces
A digital assistant should be consistent across all of the surfaces from which a user can access it. It should be aware of the data and functions available from a phone, a speaker, a computer, a car, and so on, and be able to route your request accordingly. If I ask to text my mom from my computer, it should be able to send the message from my phone, no questions asked. At a minimum, the assistant should support graceful degradation and ask if I want to complete the task on another device.
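At its simplest, this kind of routing is a capability lookup across devices. A minimal sketch, with made-up device names and capabilities, might look like this:

```python
# Minimal sketch of capability-based routing across surfaces.
# Device names and capability labels here are hypothetical.
def route(needed: str, devices: dict):
    """Return the first device advertising the needed capability, else None."""
    for name, capabilities in devices.items():
        if needed in capabilities:
            return name
    return None  # graceful degradation: the caller can offer a hand-off

devices = {
    "laptop": {"browse", "email"},
    "phone": {"sms", "call", "browse"},
    "speaker": {"music"},
}
print(route("sms", devices))  # → phone: the text goes out even if asked from the laptop
```

Returning `None` rather than failing outright is what leaves room for the graceful-degradation prompt: “Want me to do that on your phone?”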
Provide asynchronous queries and autonomous resolutions
Assistants should be able to receive complicated, non-immediate requests and work on them in the background while continuing to respond in the present. A great example of this is offline scheduling, which a number of startups like Clara Labs have started to offer. My assistant could speak with you or your assistant behind the scenes, then notify me once a meeting is scheduled or some options are available.
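In programming terms, this is a long-running task handed off to the background, with a notification on completion. A hedged sketch, where `negotiate_meeting` is a stand-in for a real back-and-forth that might span days:

```python
# Sketch: a background "scheduling" task that notifies when resolved.
# negotiate_meeting is a hypothetical stand-in for a real negotiation.
from concurrent.futures import ThreadPoolExecutor

def negotiate_meeting(attendees):
    # A real assistant would exchange messages over hours or days here.
    return f"Meeting booked with {', '.join(attendees)}"

with ThreadPoolExecutor() as pool:
    future = pool.submit(negotiate_meeting, ["you", "your assistant"])
    # The assistant stays responsive here; the notification fires on completion.
    future.add_done_callback(lambda f: print(f.result()))
```

The key property is that the user never blocks on the request: they fire it off, carry on, and only hear back when there is something worth confirming.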
Proactively offer insight
Finally, the assistant should be able to proactively offer insight when needed. This could mean alerting me to the resolution of an asynchronous request, or something even more autonomous, like noticing that I’m about to leave for work and warning me about an accident on the way.
Individually, most of this functionality is already out there, but no major digital assistant meets all four criteria. The building blocks are in place, though asynchronous and proactive tasks remain out of reach for most of us. These tasks pose incredibly complex challenges for designers and engineers, but as with scheduling, we’re already starting to see the potential. By improving on the first two criteria, we can at least start to make Alexa, Cortana, Siri, and Google behave more like a human assistant, so that they can eventually become a more natural extension of our technology and our homes – just like Pat.