September 9, 2010

Making TV more like live theatre

I haven't done a blog entry in far too long, so here goes...

What follows is an idea I had more than a decade ago.

There is an old line:

Theatre is life.
Film is art.
Television is furniture.

There is a lot of merit in that.  Live theatre is immersive; it brings the audience and the actors together, each reacting to the other; the degree of emotional involvement of the audience with the actors is beyond anything that exists in the world of film or television.

So I wondered, way back in the last century, what would it take to make television more like live theatre.

Back then I was working on network video - I had helped to start Precept Software back in 1995, where we invented IP/TV.

Sometime before the year 2000 I was working on fast cut insertion of commercials tailored to each individual viewer.

I revisited an earlier idea, that of doing video conferencing using avatar images that would be morphed in real-time - see 000147.html.  And I thought: perhaps we could dynamically alter the content of the video stream on a per-viewer basis?

My first idea was that we ought to develop a meta track that would exist alongside the video and audio tracks of a program.  That meta track could contain descriptions (for example pixel lists) and dynamics of objects in the video.  Remember the old movie "Repo Man", in which the scenes contained generic product packages - the kind supermarkets used to sell.  Imagine that this meta track contained the dimensions and motion descriptions for things like generic drink cans.  With that information a set-top box or video head-end could do very late patching - just before delivery or rendering of the video to the viewer - in which the generic object would be replaced by a particular product.

That way we could deliver soft-drink product images to teenage viewers and beer product images to adults.
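
To make that concrete, here is a rough sketch of what one meta track entry might look like, written as a bit of Python.  The field names, the polygon-per-frame idea, and the patch_frame helper are all my own invention for illustration, not any real standard.

    # One hypothetical meta-track entry describing a generic, replaceable
    # object (a generic drink can) over a span of frames.  Everything here
    # is illustrative; no real format is implied.
    generic_can = {
        "object_id": "drink-can-01",        # stable id within the program
        "frame_range": (14200, 14320),      # first and last frame it appears in
        "outlines": {                        # per-frame pixel-list outline
            14200: [(612, 340), (640, 340), (640, 410), (612, 410)],
            # ... one polygon per frame, following the can as it moves ...
        },
        "categories": ["beverage"],         # what kinds of products may be patched in
    }

    def patch_frame(frame, entry, product_image, frame_number):
        """Very late substitution: paint the chosen product over the generic
        object's outline just before the frame is shown.  fill_polygon is a
        stand-in for whatever drawing call the real renderer provides."""
        outline = entry["outlines"].get(frame_number)
        if outline is not None:
            frame.fill_polygon(outline, product_image)
        return frame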

I went further and thought that we could do the same kind of mapping for audio and for video elements such as faces - thus allowing viewers, perhaps, to replace the actors' voices and faces with their own.  (I figured that one of the big customers for that kind of technology would be the porn business.)

But that still did not get me the kind of theatrical experience that I was trying to recreate.

So I thought - suppose we put sensors - things like directional microphones - into viewers' set-top boxes?  With directional microphones I could design a video that could be morphed in response to sounds in the user's viewing space.

For example, consider the old movie "Wait Until Dark", in which bad people do things like sneak up on a blind Audrey Hepburn.  The audience always gasps.  Seen in a full theatre, the movie is a real thriller as the audience members unknowingly react to one another.  Viewed at home on a TV, the experience is flat.

Imagine that we add a meta track to "Wait Until Dark" so that during those suspense-building scenes our set-top box microphones listen for sounds in the viewing space.  If there is a sound we could have a scripted morphing of the video to recreate what happens in a live theatre performance - the actor might pause, look into the audience, and fix his/her eyes on the audience member who made the sound.  Imagine that you are watching "Wait Until Dark" on your home theatre system and you gasp as the bad guy sneaks up on Audrey Hepburn - and imagine that the bad guy pauses, turns his eyes, and looks at you.  That would immediately make you part of the scene; it would drag you into the action; your emotional involvement would be vastly increased.
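
Here is a sketch of how a set-top box might act on such a cue.  Again, this is only my own illustration, with made-up names and thresholds; it assumes the meta track marks certain frame ranges as "listening" windows and names an alternate, pre-built morph for each one.

    # Hypothetical listening windows from the meta track: during these
    # frame ranges the box watches the directional microphone, and if the
    # viewer makes a sound it splices in a morphed variant of the scene.
    LISTEN_WINDOWS = [
        # (start_frame, end_frame, morph_clip_id)
        (52000, 52600, "villain-glance-left"),
        (61800, 62200, "villain-glance-right"),
    ]

    GASP_THRESHOLD = 0.2   # normalized microphone level that counts as a reaction

    def choose_morph(frame_number, mic_level, mic_bearing):
        """Return the id of a morph clip to splice in, or None to leave the
        original video untouched.  A fancier version would pick the clip
        whose eye-line best matches mic_bearing, so the actor appears to
        look straight at whoever gasped."""
        for start, end, clip_id in LISTEN_WINDOWS:
            if start <= frame_number <= end and mic_level > GASP_THRESHOLD:
                return clip_id
        return None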

I mentioned this a while back to a friend who makes videos; she was appalled.  And I think that there is good reason to fear that such techniques would be overused and become hackneyed - like the zoom shots in early 1970s films.  But when used with care I think that such techniques could significantly bridge the gap between theatre and video.

I've mentioned this idea to many people and in many places over the years.  But unfortunately I've never had the opportunity to build such a system.

I can imagine that we could define standard formats for meta track information - so that we could edit them along with the audio and video tracks in tools like Apple's Final Cut or Sony Vegas.

New movies could define meta track objects as early as the pre-production phase.

With the right kind of information it would be possible to do very late alteration of video (and audio) content - as late as the last few milliseconds before the video is rendered onto a screen or the audio sent to the speakers.  I believe that modern set-top boxes would easily have the horsepower to do that kind of substitution if the meta track data were sufficiently detailed.
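
As a back-of-the-envelope check on that claim (my own arithmetic, not a measurement of any real box): at 30 frames per second there are about 33 milliseconds available per frame, so the substitution step only has to fit into whatever is left after decoding.

    # Illustrative per-frame timing budget; the decode and patch times
    # below are assumptions, not measurements.
    FPS = 30
    FRAME_BUDGET_MS = 1000.0 / FPS      # about 33 ms per frame at 30 fps

    assumed_decode_ms = 20.0            # time to decode one frame
    assumed_patch_ms = 5.0              # time to repaint the generic objects

    slack_ms = FRAME_BUDGET_MS - (assumed_decode_ms + assumed_patch_ms)
    print("per-frame slack: %.1f ms" % slack_ms)   # positive means substitution fits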

Posted by karl at September 9, 2010 12:55 PM