Six Colors



By Joe Rosensteel

Vision Pro and the challenge of 3D movies

[Image: stereoscopic plane example (Apple)]

No matter what you think of the Vision Pro headset or 3D movies, it’s become apparent over the last few weeks that a lot of people need a primer on stereoscopic 3D movies. Love them or hate them, there’s no escaping that they’re going to be a subject of conversation again, just as they were more than a decade ago.

Captive audience

Back in the 2000s, there was a push to increase movie ticket prices without making major alterations to seating. Stereoscopic movies were an interesting possibility. Sure, they were more difficult and expensive to make, but the advent of digital projectors meant that theaters could be adapted to show them relatively easily. And of course, a 3D blockbuster with impressive visual effects would give audiences a reason to pay a bit more.

Most 3D theaters use a digital projector with a polarizer from a company called RealD mounted in front of it. Left- and right-eye images are projected onto the screen in rapid alternation, and special glasses worn by the audience filter the polarized light so that each eye sees only its own image. It’s the same principle as polarized sunglasses or circular polarizer filters for cameras. (Extremely bright or contrasty parts of the image might bleed through from one eye to the other, creating “ghosting.”)

The problem with this approach is that a single projector can only put out so much light, and by the time it’s split between two eyes and filtered through the polarizing glasses, each eye sees less than half of it. The result is that 3D movies often look dim. Then there are the gross plastic glasses, which have to fit over any prescription lenses you might need to wear.

So to recap: They wanted you to pay more in order to see a movie that wasn’t as bright or clear, and you’d need to wear some weird glasses for the privilege.

Stereo sausage making

I was there at the dawn of the third age of stereo. I started working in visual effects in 2005, and the first stereoscopic project I worked on was “Monster House 3D” in 2006. The job mostly consisted of taking a movie that had been made by another team, adding a second stereo camera view, and re-rendering the existing work. It sounds easy, but it was laborious. Not everything just pops into place when you push the render button several months after it was pushed the first time.

We also had to play with depth to cheat the distances. Human vision infers depth from context (which only takes one eye) or from disparity (the difference between the two images your brain processes). We can fake that disparity by feeding a separate image to each eye to create the illusion of depth.

There are two values that we’re concerned with in stereoscopic cinematography:

  • Interaxial, the distance between the two stereo cameras. The distance between human eyes is fixed at about 65mm, but the distance between cameras can be anything.
  • Convergence, the distance at which the two camera views line up, which defines the screen plane. Objects with positive parallax recede behind the screen, and objects with negative parallax stick out in front of it.

Increasing the interaxial distance while leaving your convergence point on the same spot will increase the separation of everything in front of and behind the screen plane, which has the effect of increasing the volume of the shot, making it feel more dimensional. The reverse is also true, and you can use that effect to flatten things out. It’s all relative to the screen plane and the camera separation.
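
To make that relationship concrete, here’s a minimal sketch of the idealized math for a parallel camera rig “converged” by shifting the sensors. The focal length, interaxial, and distances are made-up illustrative numbers, not values from any real production:

```python
def sensor_disparity_mm(interaxial_mm, focal_mm, convergence_mm, object_mm):
    """Idealized image disparity for a parallel stereo rig converged at
    convergence_mm by shifting the sensors.

    Positive result = positive parallax (object recedes behind the screen),
    zero            = object sits on the screen plane,
    negative result = negative parallax (object pops out toward the viewer).
    """
    return interaxial_mm * focal_mm * (1.0 / convergence_mm - 1.0 / object_mm)

# Hypothetical setup: 65mm interaxial, 35mm lens, converged at 5 meters.
for object_mm in (2_000, 5_000, 50_000):
    d = sensor_disparity_mm(65, 35, 5_000, object_mm)
    print(f"object at {object_mm / 1000:>4} m -> disparity {d:+.2f} mm")

# Doubling the interaxial doubles every disparity, which is the "more
# volume" effect described above; shrinking it flattens the shot out.
```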

The screen plane is literally where the screen is. There’s no complicated math to figure out where it is in space because it’s always going to be literally where it is in space. Your eyes are converged on the screen when you’re watching a movie. Generally everything in the movie should be close to that plane, and all depth is relative to it.

If something breaks frame, meaning the object touches or extends past the screen border, then it should be at the depth of the screen or behind it. This gives the effect that you’re watching the film through a window. Anything that breaks frame but is supposed to be in front of the screen plane will cause visual discomfort, because it creates a depth conflict with the edge. This is why movies generally keep characters at the screen depth and reserve pop-out moments for things that can be centered in the frame without touching the edges.
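
A tiny sketch of that rule of thumb, with a hypothetical bounding box and a made-up parallax value, just to make the edge-violation check concrete:

```python
def edge_violation(bbox, parallax_px, frame_w, frame_h):
    """Flag the classic stereo "edge violation": an object that sits in
    front of the screen (negative parallax) while also touching the
    frame border, creating a depth conflict with the edge.

    bbox is (left, top, right, bottom) in pixels.
    """
    left, top, right, bottom = bbox
    touches_edge = left <= 0 or top <= 0 or right >= frame_w or bottom >= frame_h
    return touches_edge and parallax_px < 0

# A character popping out of the screen while clipped by the left edge:
print(edge_violation((0, 200, 400, 900), -12, 1920, 1080))  # True
```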

2D moviemaking relies on bokeh and depth of field to separate out planes of depth and to isolate the action or characters that the audience should be paying attention to. However, when you push something out of focus in stereo, you mush away all the disparity. An extremely out of focus region will eventually converge to screen depth. Most stereo versions of 2D films reduce focus effects for that very reason.

If a 2D movie is made without these compositional considerations (think of an animated movie where things don’t have to be in certain spots because of gravity), then things get even more complicated. For everything to feel like it’s in a comfortable spot, a simple 2D shot might have to be replicated with multiple pairs of stereo cameras, with certain assets or characters rendered by a particular camera pair to place them in depth relative to the screen.

Stereo conversion

This brings us to the stereo conversion of live-action films. It’s probably the most reviled and most misunderstood part of stereo filmmaking.

A movie is shot entirely in 2D. The 2D movie source material is then declared to be the “primary eye,” and is generally retained as either the left or right image. (On my projects, it was generally the left image.) With the left eye being taken care of, everything in the right eye needs to be reconstructed to match the left eye—but with an appropriate stereo offset. This means chopping up the left eye image into pieces and using those pieces to create the illusion of a parallax shift.

First, all film noise and grain has to be removed. If the original noise were warped along with the picture, it would appear at the depth of every object, as if it were glued to it; if it were left identical in both eyes, the grain would float in space at the screen depth, like a sheer curtain. Either effect would break the illusion.

Next it’s time to chop up the images. It’s a difficult task, even if you’re working with greenscreen and bluescreen elements. Everything needs to have rotoscoped mattes for both external edges and any internal surfaces that need to come forward or recede based on the relative depth of that object.

Then there’s the challenge of occlusion. A shift in parallax means that you can see items in the background that weren’t visible in the original shot. This creates a challenge in both eyes, because whatever you create in one eye has to match in the other, or you end up with a confusing disparity. This means some paint and matting has to happen in the left eye, too, in order to preserve the depth effect.

This is hard. Think about an actor (or the person in the last Portrait Mode photo you took) who has all these thin little hairs going off the edges of their head. Those all get chopped away or blurred out, because those little hairs can’t be individually matted. When it makes sense to, they will be rotoscoped, though even then they’ll likely be filled in with constant color or tracked in patches that approximate the thin wisps from the original plate. It’s movie magic.

Also consider anything that’s transparent or reflective, like glass or chrome. The depth of a reflective or transparent object is not the depth of its reflections, or of what’s seen through the glass. Again, consider a Portrait Mode photo of a wine glass. You’ll see that the depth map incorrectly infers that everything visible through the glass is on the same plane as the glass. Once again, someone will need to paint out prominent reflections in the source material and then recreate them in order to provide appropriate transparency and depth that matches in both eyes.

Next, consider mapping the internal volume of any shot. Sometimes it’s done manually, sliding things around by eye. This is incredibly time intensive, and you’ll usually end up with objects on multiple flat planes like they’re mounted on cardboard because it’s just harder to fake the depth. It’s also very difficult to maintain consistency over a sequence of similar shots.

Another approach is to generate a depth map from simple proxy geometry matted by the rotoscoped edges. Or you can derive a depth map from motion vectors between frames. Or you can build one by assigning different depth values to rotoscoped elements. Any of these depth map methods can be combined in a single shot.
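
Here’s a toy sketch of the parallax-shift idea those depth maps feed into. This is not any studio’s actual pipeline — real conversions lean on rotoscoped mattes and hand-painted occlusion fills — but it shows why the holes appear where they do:

```python
import numpy as np

def warp_to_second_eye(primary, disparity):
    """Forward-warp the primary (left) eye into a fake second eye by
    shifting each pixel horizontally by its disparity.

    primary:   H x W x 3 array, the original 2D frame
    disparity: H x W integer array of shifts in pixels;
               positive = behind the screen, negative = in front of it
    """
    h, w, _ = primary.shape
    second = np.zeros_like(primary)
    filled = np.zeros((h, w), dtype=bool)

    for y in range(h):
        for x in range(w):
            nx = x + int(disparity[y, x])
            if 0 <= nx < w:
                # A real renderer resolves overlaps by depth; this toy
                # version just lets later pixels overwrite earlier ones.
                second[y, nx] = primary[y, x]
                filled[y, nx] = True

    # The unfilled pixels are the occlusion reveals described above.
    # Studios paint these by hand; here we just smear the nearest neighbor.
    for y in range(h):
        for x in range(1, w):
            if not filled[y, x] and filled[y, x - 1]:
                second[y, x] = second[y, x - 1]
    return second
```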

As you might imagine, these approaches to manufacturing a duplicate view after the 2D movie is made each have their pros and cons, depending on what’s being done and how quickly it needs to be done.

You might naively assume that shooting with two cameras in native stereo is easier.

Native stereo

Recall what I said about that 65mm distance between our eyes. Two movie-camera lenses simply won’t fit 65mm apart. Instead, camera rigs are created that rely on beam splitters to capture some of the light and bounce it to the second lens. This means you’ve doubled the volume of your camera gear, and added a large box with a beam splitter inside of it. Everything has to be mounted and geared to match exactly, or the resulting images won’t be usable in stereo.

Sometimes, even when the shot is technically good, people might creatively not be happy with the result. If a stereo shot doesn’t come out right, it’s more economical to convert the primary image into stereo rather than get everyone back on set and reshoot.

Shooting in native 3D is, therefore, an enormous burden. Almost no one who isn’t James Cameron is interested in doing this. They’d rather produce the film in 2D and let the studio handle making a 3D version if they want to.

And what about the VFX? They can all be rendered in 3D, but it adds significant time and expense. All mattes pulled for greenscreen and bluescreen shots need to be done twice. There aren’t a lot of savings to be gained here. It’s twice the work.

Apple’s double take

At one point, every film needed a stereo version to drive up ticket prices, and then the process had to get cheaper and faster, until there wasn’t much of a reason to see anything in stereo. No one really loved it except the guy making the Avatar movies. Even if someone wanted to watch a stereo movie again, they’d have to contend with the lackluster home-video situation left over from the tail end of the late-2000s 3D trend. People genuinely want brighter, more contrasty images and punchier colors, all things stereo diminishes.

But now here’s Apple, which will be able to create a market that features stereoscopic media again. The technology is different, in that there’s a little display in front of each eye, so no polarized light or flickering active-shutter glasses. But stereo movies are still stereo movies. It’s still a left and right image. It’s still captured by a bulky camera rig, or post-converted, and it still requires more labor and expense.

I was surprised to see Sigmund Judge report this:

Based on conversations this week with people familiar with its production, I can confirm that the upcoming Monsterverse series has been shooting in Apple’s Spatial Video format, unveiled earlier this week during Apple’s opening keynote at WWDC.

I don’t know if something got lost in a game of telephone, but I think it’s pretty safe to assume that Apple is using the same tools everyone else has been using. Still, it’s surprising that Apple is choosing to make a ten-episode TV show in stereo, because that’s a considerable expense no matter how they choose to do it.

There is, however, a nice video by Apple’s Chris Flick on how to deliver video content for spatial experiences. It concerns presentation of media using logical extensions of current technology, and you’ll see a novel, low-resolution disparity map (what Flick calls a parallax map) that’s used to help float subtitles and captions at a comfortable viewing depth. Mastering existing stereoscopic media is going to require some work, but Apple can certainly do it themselves.
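
Flick’s video is the reference for how Apple actually handles this; purely as an illustration of why a low-resolution parallax map is handy, a playback engine could use it to keep captions floating just in front of whatever scene content they overlap. Everything below, including the function name and the margin value, is a hypothetical sketch rather than Apple’s API:

```python
import numpy as np

def caption_disparity(parallax_map, caption_region, margin=2.0):
    """Pick a disparity for a caption so it floats slightly in front of
    the closest scene content it would overlap.

    parallax_map:   low-res 2D array of per-tile disparities
                    (negative = in front of the screen)
    caption_region: (row_slice, col_slice) of the tiles under the caption
    margin:         extra negative parallax so the text sits clearly in front
    """
    nearest = parallax_map[caption_region].min()  # most negative = closest
    return min(nearest, 0.0) - margin

# Hypothetical 4x6 tile map: something leans out of the screen near the
# bottom of frame (negative values), so the caption has to come forward too.
tiles = np.array([
    [ 3,  3,  3,  3,  3,  3],
    [ 2,  2,  1,  1,  2,  2],
    [ 1,  0, -1, -1,  0,  1],
    [ 0, -2, -4, -4, -2,  0],
], dtype=float)

print(caption_disparity(tiles, (slice(3, 4), slice(1, 5))))  # -> -6.0
```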

What’s really going to bake your noodle is this: The Vision Pro is employing the same stereoscopic stuff to produce a left eye and a right eye view in real time. It captures your environment in native stereo, it renders dialogs and windows in depth, and it renders a movie viewing screen for you to watch a stereoscopic movie where screen depth is an imaginary space created by left and right eye views of a thing that has left and right eye views. It’s a stereo view inside a stereo view.

You don’t need to become an expert on 3D technology, but I’m of the opinion that it helps to have a little history. And it’s not like we don’t have months to wait before the Vision Pro ships. In the meantime, it’s probably best to consider the impact on your bank account if you buy a headset that will make you see double.

