I read the article and I read your comment but I still don't quite get how all the pieces fit together.
To make it concrete, let's say we want to set a pixel to red, with 20% transparency.
During the transparent phase we light up the object with white light (always). We set this particular pixel to 20% transparency for the red, green, and blue channels.
During the opaque phase we light up the pixel directly (how? I didn't understand this part. Is it just a strong spotlight aimed at the display?) and set red to 80% transparency (100% red * 80% opaque), and we set the green and blue channels to block all light.
So it's important to call out here that they're using a modified LCD panel. A traditional LCD panel has two components: a backlight that constantly provides solid white light, and the LCD in front of it, which darkens each pixel. This is why on your computer monitor, if you send it a "black" signal, you still see some light. The LCD is as dark as it will go, but some light still bleeds through because the backlight is on all the time.
In the paper, they're effectively using two backlight configurations. When displaying the solid object, they shine the backlight on the object, which scatters some of that light back towards the viewer. During this phase, they mask the image the LCD wants to display by having it render a greyscale image, where black pixels should hide the physical object, and white pixels should let the physical object shine through. In this mode, the physical object becomes the backlight, effectively.
In the image display mode, they point the backlights towards the LCD panel, and display their superimposed image. In this mode, the object is not being illuminated, and so its contribution to the image is minimal. This is also why an enclosure is required, because the display needs to be able to throw the physical object into shadow during this second phase.
All of this is, of course, happening extremely quickly. The panel alternates between these two modes so fast that the human eye can't perceive the flicker, and persistence of vision simply combines the light contributions from both modes. The result is what you see in the images and video example.
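To put rough numbers on the example from the question, here is a minimal sketch of how the two phases might add up for one pixel. It is a toy model, not the paper's method: it assumes both phases get an equal share of each cycle, that the eye simply averages the light over the cycle, and that the object contributes nothing during the image phase. The function name `perceived_pixel` and its parameters are made up for illustration.

```python
def perceived_pixel(object_rgb, mask, image_rgb, object_duty=0.5):
    """Time-average one pixel over a full two-phase cycle.

    object_rgb  -- light the lit object sends toward this pixel (0..1 per channel)
    mask        -- greyscale LCD transmission during the object phase (0..1)
    image_rgb   -- per-channel LCD transmission during the image phase (0..1)
    object_duty -- assumed fraction of the cycle spent in the object phase
    """
    image_duty = 1.0 - object_duty
    return tuple(
        # Object phase: object light attenuated by the greyscale mask.
        # Image phase: backlight (taken as 1.0) attenuated by the LCD color.
        object_duty * o * mask + image_duty * i
        for o, i in zip(object_rgb, image_rgb)
    )


# Red overlay at 20% transparency in front of a white object:
# mask lets 20% through on all channels, image phase shows 80% red only.
result = perceived_pixel((1.0, 1.0, 1.0), mask=0.2, image_rgb=(0.8, 0.0, 0.0))
print(result)
```

Under those assumptions the pixel works out to roughly (0.5, 0.1, 0.1): mostly red, with a little of the white object showing through on every channel.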
I don't think the enclosure is strictly necessary for this to work. How much light from the object passes through the display will depend on the switchable diffuser's transparency. The lower its transparency during the diffusing state, the less need for the object to go dark.
However, they're effectively blocking 50% of the light coming from the object, so an enclosure with additional illumination might compensate for that.
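As a back-of-the-envelope check on that 50% figure, here is a small sketch of how much brighter the object lighting would need to be to look as bright as it would without the display in front of it. It assumes the object is only visible for a fixed fraction of each cycle and that the eye averages linearly over time; `illumination_gain` is a hypothetical helper, not anything from the paper.

```python
def illumination_gain(object_duty):
    """Brightness multiplier needed to offset time-multiplexing losses.

    object_duty -- assumed fraction of each cycle in which the object
                   is visible through the display (0 < object_duty <= 1)
    """
    if not 0.0 < object_duty <= 1.0:
        raise ValueError("object_duty must be in (0, 1]")
    # The time-averaged brightness is object_duty * L, so the lighting
    # must be scaled by 1 / object_duty to get back to L.
    return 1.0 / object_duty


print(illumination_gain(0.5))
```

With the object visible half the time, the lighting inside the enclosure would need to be about twice as bright, ignoring any other losses in the panel itself.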