Origins: Arriving Here

13 November 2008


Here you are.

You are Oblong. There are different ways to count, and depending on how you do it you are two and a half years old; or you are four years old; or almost fifteen; or you’re a quarter century; or you appear to have been born on 13 November 2008. You have many parts, and ideas have arrived from all over the place to build you, but there are some central strands too. Go back.

1.   It’s about 1994. Part of you is pursuing a new line of research at the MIT Media Laboratory, trying to make information more literally spatial. Your feeling is that, ten years in, the GUI that’s taken over the world’s idea of interface isn’t getting at everything there is. Substantial swaths of human brain are dedicated to understanding space, understanding geometry, understanding physical structure. A cartoon of a messy desk surface doesn’t much tax these swaths. The swaths can work harder, ought to be made to. You propose that information — and maybe especially the newly-blooming internet — has a topology but not yet a topography.

You build a number of examples of how giving information a topography might work. You build something like a flight simulator for music, in which a listener-pilot swoops and zooms and dives above a vast active living musical score. The music plays in synchrony with dynamic elements that swarm atop the score to keep track of time, to keep track of instruments. It’s true that one of the elements is a blue ball that bounces in time to the music and touches down on the notes that are playing just then. Also true: a ball like that can never be entirely serious, but it can be entirely effective. What works is assisted synaesthesia, making sound seem like sight and looking seem like hearing. The time in which music happens is turned into the space of the animated score.

What are the shortcomings of the flight simulator for music? One is that it appears on a screen, a big CRT. The musical topography is dimensional and beckons, but you’re separated from it by glass. The second is that you must fly using a mouse. There are many ways to navigate above and around a self-playing musical score, many different vantages that reveal something useful or unique or desirable; and the transitions between vantages are equally illuminating. Transitions can show how things are connected. But the mouse only moves two ways. People call those ways X and Y.

2.   Excellent persons who work on VR solve the first shortcoming by putting the pilot behind the glass, right back there with the graphics. The goggles wipe out the regular world so that the graphics world can enfold the pilot. Yet here in the mid-1990s, it suddenly seems to you that perhaps something is backwards. For one thing, Jeff Bridges had a fairly rotten time when pulled into the graphics by the MCP in the movie TRON. For another, it’s not the real world that is the problem. Many parts of the real world are perfectly good, and people spend a lot of time becoming experts at using the real world. Perhaps the problem is that the graphics aren’t enough like the real world. This does not mean that the graphics should look more like the real world. Your brain does not actually care about that. It means that perhaps the graphics should behave more like the real world.

And the second shortcoming: when you need to describe to someone else how you want the music landscape to move, you move your two hands in midair. The someone else instantly sees the music landscape in your hands, even though the only thing between your hands is midair. This is because you and the someone else are experts at holding things and moving them around and seeing what things look like as they’re held and moved. When you need to tell the flight simulator for music this same thing, one of your very capable hands descends to clutch the mouse and scrape it around. You scrape it in X and Y. Your fingers, clutching, can’t do anything else. Your other hand is useless.

So it suddenly seems to you here in the mid-1990s that putting the graphics in the real world might possibly solve both problems right at the same time.

3.   Just then at MIT it turns out that there are a few amazing people who are already thinking about other aspects of that very same idea: the idea that the real world is very useful and computers ought to make an effort to do more there. These people call themselves the Tangible Media Group. They’re smarter and have been thinking about this stuff longer. Because it’s the most natural thing in the world, you join up with them.

4.   The first thing to figure out is how to get graphics into the real world. You mean for graphics to go actually everywhere. Graphics should be able to go on floors and walls, which stay put, but also on tables, which only mostly stay put, and on chairs and clipboards and pets and people and all else that stays put rarely. These surfaces can’t display graphics on their own, and shouldn’t have to. So you’ll use something like a projector, or several. It occurs to you that the place to put projectors is where lightbulbs are. A lightbulb always has a privileged position. You suggest that maybe the lightbulb is already a projector, a projector that sends out a single giant pixel. So the world’s lightbulbs just need to be granted higher resolution. Then the planetwide infrastructure for illumination can be an infrastructure for information.

One-way information isn’t very good. But you’re already putting a little projector in the new lightbulb. The bulb’s glass lets light pass both ways, so something should be done with the light that moves from outside in. Put a video camera in there too, immediately next to the tiny projector.

This is the I/O Bulb: an input-output lightbulb.

5.   An I/O Bulb looks out at the world, and so any computation that uses an I/O Bulb has a way to figure out what’s going on out there. Then it can paint a piece of the world with information, in response to what’s going on. An I/O Bulb grips the real world on both sides, info coming and going. Is a single I/O Bulb enough? It is not. You are serious when you say that every surface should be a potential site for interaction. You don’t mean everywhere all at once. There must be discretion, which is the same as saying design. But still: it might need to happen anywhere, so you have to make it work everywhere. Use more than one, then. Put enough I/O Bulbs around a room so that every corner, every cranny can be lit up with interaction. Coordinate their activities until computation that depends on them sees the room as one continuous input-output space.

This is the Luminous Room: architecture with computation respectfully overlaid.

6.   You build different I/O Bulbs, to test what it means, to show the idea working. You build one on a weighted base with a small projector and miniature camera on hinged rods. It looks like a Luxo lamp. This is an I/O Bulb for a desk, for producing a small adjustable spotlight of interaction. You build a larger one for the ceiling. A motor lets it rotate around a vertical axis. A mirror is on board, mounted at an angle on a second motor. Together the two motors can sweep the beam of interaction around to any part of the room. The Seiko Epson company shows unutterable generosity. As part of a collaboration, Seiko Epson builds a tiny I/O Bulb that is actually the size of a normal lightbulb. It is beautiful. You build other fixed I/O Bulb configurations, optimized to serve particular tables, specific walls. Always a projector and a camera. Always light going both ways.

This is the hardware for building Luminous Rooms.

7.   Now you need software. The software must manifest a purpose, some purpose for which anyone would want to use a Luminous Room. There ought to be no end of purposes like that. Choose a few.

You build a prototyping workbench for optical engineers. You build a fluid flow simulator, a kind of digital wind tunnel. You build an environment for testing urban planning scenarios. The optics workbench has little models of lasers, mirrors, lenses, beamsplitters. The I/O Bulb locates these for the benefit of an optics simulation. The I/O Bulb projects the calculated laser beam path down onto the table, aligned precisely with the physical models. When you move or rotate the optics models, the beam path reacts instantly.

In every way but one they’re two wholly different things, the physical models and the projected simulation results. Perceptually, they are not different at all. Perceptually, they are part of the same causal system. This is all that matters. When you rotate a mirror, the beam reflected by it sweeps around accurately. The simulation happens in the real world.

The fluid flow system accounts for the presence of real-world objects. Real-world objects are valid obstacles in the simulated wind tunnel. You can see the projected fluid diverting around the obstacles — your hand, a sandwich, a drafting tool — as you know fluid has to. Physical models of buildings in the urban planning testbed cast computational shadows. Trying a different orientation means twisting the building with your hand. The shadows change accordingly. You understand the cause and effect with your hands and with your eyes.

This is how the world works.

Through 1998 and 1999 you build many other things in the Luminous Room. You build a whole office whose walls know how to store digital objects in physical containers that you can hold. In this office unwanted objects, physical and digital alike, go in a real trash can. You install a second workbench across the room from the first. When you swivel a laser model and aim it toward the new workbench, the beam that disappears off the first table reappears on the second.

Of course it works this way. This is the way human minds know the world to work, so this is how things must work. It is not optional.

8.   To make all this happen, you’ve designed the Luminous Room software to understand space. It acknowledges that people and objects and architecture all occupy one continuous space. But it also acknowledges that projected pixels end up in that same space.

So in the Luminous Room, every pixel has real-world coordinates.
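A minimal sketch of what that can mean, in code. The names here are illustrative, not the actual Luminous Room software: a display is described by where it sits in the room and how big it is, and a pixel index becomes a point in room coordinates.

```python
import numpy as np

# A display described in room coordinates: a corner, two edge vectors,
# and a pixel resolution. (Class and field names are illustrative.)
class Display:
    def __init__(self, corner, across, up, res_x, res_y):
        self.corner = np.asarray(corner, dtype=float)  # room position of pixel (0, 0), meters
        self.across = np.asarray(across, dtype=float)  # vector spanning the display's width
        self.up = np.asarray(up, dtype=float)          # vector spanning the display's height
        self.res_x, self.res_y = res_x, res_y

    def pixel_to_room(self, px, py):
        """Room-space position of the center of pixel (px, py)."""
        u = (px + 0.5) / self.res_x
        v = (py + 0.5) / self.res_y
        return self.corner + u * self.across + v * self.up

# A wall display two meters wide and 1.2 meters tall, lower-left corner
# at (1.0, 0.9, 0.0) in the room:
wall = Display(corner=(1.0, 0.9, 0.0), across=(2.0, 0.0, 0.0),
               up=(0.0, 1.2, 0.0), res_x=1920, res_y=1080)
print(wall.pixel_to_room(960, 540))  # roughly the middle of the wall, in meters
```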

9.   Toward the end of this time you also begin building applications that depend on no external physical objects. These applications concern themselves only with human hands. For many kinds of work, free hands enable highly effective interaction. Human hands, excused from having to grasp other objects, are incredibly dextrous, sophisticated, and expressive instruments for conveying intent.

Pointing, pushing, palping, poking: this is how hands convey geometric and spatial intent, superbly. Arranging fingers into specific poses or describing trajectories through the air: this is how hands describe semantic intent, gracefully.

In this mode, the interaction now involves hands and pixels solely. You find that the programmatic formalism of pixels existing in real space — in the same space as the hands — is of redoubled importance. Every screen, every display has a real-world location, size, orientation. These properties are imparted to the pixels housed by each display. Collections of pixels arranged to represent something coherent in turn inherit that geometry.

Everything that appears on a display is a real object in the world.
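One way to see what that buys, continuing the same illustrative sketch from above (again, assumed names, not the real system): because the display occupies real space, a pointing hand is just a ray in room coordinates, and finding what it points at is a plane intersection converted back into pixel coordinates.

```python
import numpy as np

# Reuses the illustrative Display and wall defined in the earlier sketch.
def point_at(display, origin, direction):
    """Return the (px, py) pixel a room-space ray hits, or None."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    normal = np.cross(display.across, display.up)
    denom = np.dot(direction, normal)
    if abs(denom) < 1e-9:
        return None                      # pointing parallel to the display
    t = np.dot(display.corner - origin, normal) / denom
    if t < 0:
        return None                      # the display is behind the hand
    hit = origin + t * direction
    rel = hit - display.corner
    u = np.dot(rel, display.across) / np.dot(display.across, display.across)
    v = np.dot(rel, display.up) / np.dot(display.up, display.up)
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None                      # points off the edge of the screen
    return int(u * display.res_x), int(v * display.res_y)

# A hand two meters back from the wall, pointing straight at it:
print(point_at(wall, origin=(2.0, 1.5, 2.0), direction=(0.0, 0.0, -1.0)))  # (960, 540)
```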

You begin to design a language of hands, a gestural language. The vocabulary and grammar of this language are very much about describing, manipulating, and navigating through space, but the language also prescribes additional elements that render it general. You intend this language to be broad and expressive enough, in fact, that it can drive a general purpose computing environment. Such an environment will not look much at all like existing general purpose computing environments.

It is a few months later that Minority Report visits your lab. The great production designer Alex McDowell and the legendary propmaster Jerry Moss have arrived to see MIT’s emerging technologies. They intend to extrapolate forward from what they find, in order to depict the plausible future demanded by the film’s director. They spend a lot of time with the Luminous Room systems.

10.   It’s the new millennium, and you are three thousand miles farther west. You are the science & technology advisor to the film Minority Report, which is in preproduction. You have the broad charge of ensuring that all the future technology seen in the movie’s 2054 is plausible. But your major responsibility is to design the interface that will be used in several key scenes. In these scenes, characters must exercise an astonishing control over vast streams of image and video data.

You adapt the gestural language from the Luminous Room work. You train the actors to use this language. They become adept, even though it is partly an exercise in mime. The production will shoot the actors performing gestural tasks in front of an enormous transparent screen, but the screen will be blank, a prop. Graphics will be composited onto the screen in post-production. You understand that for a sense of causality to emerge the actors must be able to clearly visualize the effects of their gestural work. You assemble a training video showing this.

When the time comes to shoot, the director explains what sequence of analysis should occur in each scene. You translate this into the gestural language. You explain what graphical elements the actors would be seeing on different parts of the screen, how they are manipulating those elements. You make sure that a detailed account of all this is subsequently available to the editor and the visual effects people. Continuity of the original intent is critical. The cameras roll.

The movie appears in 2002. The scenes of gestural computation show something apparently real. You are a nascent part of Oblong, and it seems to you that the promise of a new kind of human-machine interface has been vindicated a second time. The actors in the film are some of the very first g-speak users.