Put What Where?
Like many people, my husband and I have been increasing our home's intelligence as of late. We can vacuum the house while we're gone, program the lights to simulate our presence, and turn the heat up at 3am without leaving our bed. Santa brought our home a cerebral cortex to coordinate all these smart gadgets. I won't divulge which digital assistant he brought, but I can say it's not Jarvis.
As I think about what we might automate and voice-control next, I can't help but think about what I can't automate: "Alexa, put that there." Alexa is going to choke on that one. This might not bother many people, but as a developer who has spent 8 years working with an SDK that brings the physical and digital worlds into the same space as you, it bothers me. In an advertisement for Google Home, a little boy asks Google to show him a star system "on the TV." Modern families likely have more than one TV, so in reality Google would have responded with the dreaded "which TV?" and then slowly and methodically listed off every TV in the house. I want that little boy to point at the TV and say "that one."
The software behind my home's automation is still missing two critical pieces of information that we as humans learn before we can talk: space and time. Let's talk about space first. A child, before she can say "car", can point to one when asked. Conversely, without understanding the concept of a question, she can point to something in the room and expect an adult to tell her what it is. My husband is excited for a digital assistant so he can ask it questions about the world with our son, but it won't be able to answer the very first question my son is capable of asking with only a gesture: "what is that?"
Oblong's g-speak SDK builds the real world location, size, and orientation into the core of every object, which allows us to calculate the "where." Looking back at the Google Home example, if the system knows where the boy is, the direction he's pointing, and where all the TVs are located in the house, then it can compute the television intersected by the directional ray extended from his pointing finger. In g-speak, we call this intersection calculation a "WhackCheck." Voilà, we know where "there" is.
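To make the "where" a little more concrete, here is a minimal Python sketch of that kind of ray-versus-screen test. It is not g-speak's WhackCheck API; the `Screen` class, `ray_hit`, and `pointed_screen` are hypothetical names made up for illustration, and each TV is assumed to be a flat rectangle described by a center, two unit vectors, and a size in millimeters.

```python
from dataclasses import dataclass

# All vectors are (x, y, z) tuples in millimeters. Names are illustrative only.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def scale(a, s):
    return (a[0] * s, a[1] * s, a[2] * s)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

@dataclass
class Screen:
    name: str
    center: tuple   # middle of the screen
    over: tuple     # unit vector along the screen's width
    up: tuple       # unit vector along the screen's height
    width: float
    height: float

def ray_hit(origin, direction, screen):
    """Return the distance along the ray to the screen, or None if it misses."""
    normal = cross(screen.over, screen.up)
    denom = dot(direction, normal)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the screen's plane
    t = dot(sub(screen.center, origin), normal) / denom
    if t <= 0:
        return None  # the screen is behind the pointer
    offset = sub(add(origin, scale(direction, t)), screen.center)
    if abs(dot(offset, screen.over)) > screen.width / 2:
        return None  # hit the plane, but left or right of the screen
    if abs(dot(offset, screen.up)) > screen.height / 2:
        return None  # hit the plane, but above or below the screen
    return t

def pointed_screen(origin, direction, screens):
    """Return the nearest screen the ray intersects, or None."""
    best = None
    for screen in screens:
        t = ray_hit(origin, direction, screen)
        if t is not None and (best is None or t < best[0]):
            best = (t, screen)
    return best[1] if best else None

tvs = [
    Screen("living room", (0, 1000, -3000), (1, 0, 0), (0, 1, 0), 1200, 700),
    Screen("bedroom", (4000, 1000, 0), (0, 0, 1), (0, 1, 0), 1200, 700),
]
hand = (0, 1000, 0)
target = pointed_screen(hand, (0, 0, -1), tvs)
print(target.name if target else "no TV there")
```

With the hand position and pointing direction in hand, "that one" becomes a lookup over every known screen rather than a round of "which TV?"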
But what about "that?" That's where time comes in. Yes, you can ask Siri® to set an alarm at a given time or remind you to take the turkey out of the oven in 4 hours, but there's a subtle issue with time in the request to "put that there." Siri® needs to know the mapping of time and space to determine where you were pointing when you said "that" vs. where you were pointing when you said "there."
Our version of Put That There using g-speak, the Google Cloud Speech API, and a Leap Motion device.
This past summer, we lovingly threw together a re-enactment of Chris Schmandt's 1979 "Put That There" demo using g-speak, the Google Cloud Speech API, and a Leap Motion device. Ignoring issues with false hand and voice recognition data, the only challenge we needed to "hack" around was that most of the current cloud-based speech-to-text services fail to provide timing information for specific words. You stream or upload voice audio and receive back the text, but there is no direct way to determine where the user was pointing when they said "that" without knowing when they said it. Time takes a bit of a back seat in today's public SDKs. In g-speak, we place time on equal footing with space so that every event / action / reaction is associated with a time and can be played back.
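Here is a rough Python sketch of why the timing matters. It assumes word-level timestamps are available from somewhere, which is exactly the piece that was missing for us, and it assumes the pointing target is logged as a time-sorted list of samples. The function names and the sample data are hypothetical, not part of g-speak or any speech API.

```python
from bisect import bisect_left

def sample_at(samples, t):
    """Return the value of the pointing sample closest in time to t.

    samples is a list of (timestamp_seconds, value) pairs sorted by timestamp.
    """
    if not samples:
        raise ValueError("no pointing samples recorded")
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i == 0:
        return samples[0][1]
    if i == len(samples):
        return samples[-1][1]
    before, after = samples[i - 1], samples[i]
    # Pick whichever neighbor is nearer in time; ties go to the earlier one.
    return before[1] if t - before[0] <= after[0] - t else after[1]

def resolve_deictics(words, samples):
    """Map each spoken 'that' / 'there' to what was pointed at when it was said."""
    return {word: sample_at(samples, t)
            for word, t in words if word in ("that", "there")}

# Hypothetical pointing log: (seconds, thing under the pointing ray).
pointing = [
    (0.0, "lamp"),
    (0.5, "blue square"),
    (1.0, "blue square"),
    (1.5, "top left corner"),
    (2.0, "top left corner"),
]
# Hypothetical word timings: (word, seconds at which it was spoken).
spoken = [("put", 0.4), ("that", 0.9), ("there", 1.8)]

print(resolve_deictics(spoken, pointing))
```

Without the per-word times, all the system has is the sentence and a pile of pointing samples, with no way to tell which sample belongs to "that" and which to "there."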
Maybe building my own Jarvis isn't such a bad idea. We have devices that can scan a room and create a 3D model. We have increasingly better voice recognition software and AI to understand what our questions mean. We have hand-tracking devices that work decently well in indoor lighting. And we have g-speak to bring it all together. It's on, Zuckerberg.
Working with Watson
The goal of each Watson Experience Center, located in New York, San Francisco, and Cambridge, is to demystify AI and challenge visitors' expectations through more tangible demonstrations of Watson technology. Visitors are guided through a series of narratives and data interfaces, each grounded in IBM's current capabilities in machine learning and AI. These sit alongside a host of Mezzanine rooms where participants further collaborate to build solutions together.
The process for creating each experience begins with dynamic, collaborative research. Subject matter experts take members of the design and engineering teams through real-world scenarios—disaster response, financial crimes investigation, oil and gas management, product research, world news analysis—where we identify and test applicable data sets. From there, we move our ideas quickly to scale.
Access to the immersive pixel canvas for everyone involved is key to the process. Designers must be able to see their ideas outside of the confines of 15″ laptops and prescriptive software. Using tools tuned for rapid iteration at scale, our capable team of designers, data artists, and engineers works side by side to envision and define each experience. The result is more than a polished marketing narrative; it's an active interface that allows the exploration of data with accurate demonstrations of Watson's capabilities, one that customers can see themselves in.
Under the Hood
Underlying the digital canvas is a robust spatial operating environment, g-speak, which allows our team to position real data in a true spatial context. Every data point within the system, and even the UI itself, is defined in real-world coordinates (measured in millimeters, not pixels). Gestures, directional pointing, and proximity to screens help us create interfaces that more closely understand user intent and more effectively humanize the UI.
This award-nominated collaboration with IBM is prototyped and developed at scale at Oblong’s headquarters in Los Angeles as well as IBM’s Immersive AI Lab in Austin. While these spaces are typically invite-only, IBM is increasingly open to sharing the content and the unique design ideas that drive its success with the public. This November, during Austin Design Week, IBM will host a tour of their Watson Immersive AI Lab, including live demonstrations of the work and a Q&A session with leaders from the creative team.
Can't make it to Austin? Contact our Solutions team for a glimpse of our vision of the future at our headquarters in the Arts District in Los Angeles.