Worth noting that this does not apply to physical cameras - a pixel is not, in fact, a point sample, but the integral over a sub-region of the sensor plane. It's also not a complete integral - the red pixels in an image are interpolated from squares that cover only a quarter of the image plane (on 95+% of sensors). Then you bring in low-pass filters (or don't), and the signal theory starts to get a bit complicated.
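For the curious, here's a minimal numpy sketch of a standard RGGB Bayer color filter array, just to show the coverage fractions (the toy sensor size is illustrative, not any particular camera):

```python
import numpy as np

# RGGB Bayer color filter array: each sensel integrates light over its
# area, but only for one color channel.
h, w = 4, 6  # toy sensor size, purely illustrative
rows, cols = np.mgrid[0:h, 0:w]

red   = (rows % 2 == 0) & (cols % 2 == 0)  # red sensels
green = (rows % 2) != (cols % 2)           # green sensels
blue  = (rows % 2 == 1) & (cols % 2 == 1)  # blue sensels

# Coverage: red and blue each see 1/4 of the plane, green sees 1/2.
print(red.mean(), green.mean(), blue.mean())  # 0.25 0.5 0.25
```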
It doesn't apply to screens either, as pixels are - manifestly - little squares. Your screen does not apply any sort of lovely reconstruction filter over this "array of point samples".
In short, it's wrong. You can model an image as an array of point samples - however, these are not "pixels".
Interestingly, the memo did talk about screens, and argued that they don't support the pixels-as-squares model because there are "overlapping shapes that serve as natural reconstruction filters"...
But that was in the context of old CRT and Sony Trinitron monitors! I was wondering what it'd say about LCD screens, but the memo is from 1995, and the first standalone LCD monitors only appeared in the mid-1990s and were expensive [1].
What it says about CRT electron beams no longer applies, but I'm guessing this still does:
> The value of a pixel is converted, for each primary color, to a voltage level. This stepped voltage is passed through electronics which, by its very nature, rounds off the edges of the level steps
> Your eye then integrates the light pattern from a group of triads into a color
> There are, as usual in imaging, overlapping shapes that serve as natural reconstruction filters
Your screen is the "lovely reconstruction filter over this 'array of point samples'".
This is, for LCDs, usually an array of little squares... sort of (probably more accurately described as an array of little rectangles of different colors). Things get more complicated when you start talking about less traditional subpixel arrangements like PenTile, or the behavior of old CRTs (where you don't necessarily have fully discrete pixels at all).
It's a reconstruction of sorts, but arguably pretty much the worst possible reconstruction, and pretty far from lovely compared to, e.g., the near-perfect reconstruction filters in audio.
I wonder if there have been any experiments constructing displays with optical filters to provide better reconstruction. I guess the visual analogue would be image upscaling, and in that sense the reconstruction that LCDs etc. provide would be comparable to nearest-neighbor scaling (which generally sucks).
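As a rough sketch of that comparison (assuming Pillow >= 9 is available; the filenames are placeholders): nearest neighbor is exactly the "little squares" reconstruction, while Lanczos is a windowed-sinc reconstruction of the same point samples.

```python
from PIL import Image

img = Image.open("input.png")  # placeholder filename
size = (img.width * 8, img.height * 8)

# Nearest neighbor = each point sample painted as a little square.
blocky = img.resize(size, Image.Resampling.NEAREST)
# Lanczos = windowed-sinc reconstruction of the same point samples.
smooth = img.resize(size, Image.Resampling.LANCZOS)

blocky.save("nearest.png")
smooth.save("lanczos.png")
```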
It depends on the goals. If the goal is accurate frequency response and in particular absence of aliasing, then a little square is a bad reconstruction filter.
But for graphics, why should this be the goal? Perhaps a better goal is high contrast at edges, and for that the box filter is one of the very best. An additional advantage of the box filter is that its weights are all nonnegative, so there's no clipping beyond white and black. This is especially helpful when rendering text.
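Both halves of that tradeoff are easy to check numerically; a small numpy sketch (the specific frequencies are just illustrative):

```python
import numpy as np

# Frequency response of the unit box filter is |sinc(f)|
# (np.sinc(x) = sin(pi*x)/(pi*x), f in units of the sample rate):
print(abs(np.sinc(0.5)))   # ~0.637 at Nyquist: slow roll-off
print(abs(np.sinc(1.43)))  # ~0.217: first alias sidelobe barely suppressed

# But box weights are nonnegative, so a reconstructed edge never
# overshoots past the sample values -- no ringing beyond white/black:
samples = np.array([0.0, 0.0, 1.0, 1.0])  # a step edge
box = np.repeat(samples, 4)               # box reconstruction at 4x
print(box.min(), box.max())               # 0.0 1.0
```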
And honestly I believe that those huge sinc-approximating reconstruction filters are overly fetishized even in the audio space. The main reason they sound "nearly perfect" is that the cutoff is safely outside the audible range. Try filtering a perfect slow sawtooth through a brick-wall filter with a cutoff well inside the audio range, say 8 kHz: it sounds like a very audible "ping" at that frequency, with pre-echo to boot, because of the symmetry of sinc.
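A quick way to compute (if not hear) this, as a sketch: brick-wall filter a naive sawtooth in the frequency domain and watch the ringing appear at every discontinuity (sample rate and frequencies are just example values).

```python
import numpy as np

sr = 48000                      # sample rate in Hz, illustrative
f0, cutoff = 50, 8000
t = np.arange(sr) / sr
saw = 2 * (t * f0 % 1.0) - 1.0  # naive 50 Hz sawtooth, one second

# Ideal brick-wall low-pass via the FFT: zero everything above 8 kHz.
spec = np.fft.rfft(saw)
freqs = np.fft.rfftfreq(len(saw), 1 / sr)
spec[freqs > cutoff] = 0
filtered = np.fft.irfft(spec, len(saw))

# Each edge now rings symmetrically (pre-echo included) at ~8 kHz,
# overshooting by roughly 9% of the jump (Gibbs phenomenon).
print(filtered.max())  # noticeably above 1.0
```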
One problem with pixelation is that if you take your beautifully sharp-edged, pixel-aligned square and move it 1 mm to the left, it will not be perfectly sharp-edged anymore (see the sketch below). So it lacks a certain uniformity: the rendering is not shift-invariant. Comparing again to audio, afaik (please correct me if I'm wrong!) you can phase-shift/delay signals however you wish and it should not add distortion.
Afaik this would come up in text rendering, where the glyphs and strokes inevitably will not align with the pixel grid, but you would know better about that.
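A tiny numpy sketch of the shift problem, using area coverage (i.e., box filtering) to rasterize a single vertical edge at two subpixel positions:

```python
import numpy as np

def render_edge(edge_pos, n=8):
    # Brightness of each unit pixel [i, i+1) = fraction of its area
    # lying right of a black-to-white edge at x = edge_pos.
    i = np.arange(n)
    return np.clip(i + 1 - edge_pos, 0.0, 1.0)

print(render_edge(4.0))  # [0 0 0 0 1 1 1 1]   -> perfectly sharp
print(render_edge(4.5))  # [0 0 0 0 0.5 1 1 1] -> a gray transition pixel
```

The grid-aligned edge is one sample wide; shifted half a pixel, the same edge acquires a 50% gray pixel, which is exactly the non-uniformity (and the text-rendering headache) described above.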
> you can phase shift/delay signals however you wish and it should not add distortion.
Digital audio sample rates and bit depths are far beyond the limits of perception, so there's plenty of resolution to shift by, e.g., half a wavelength for noise cancelling, or by fractions of a wavelength for beamforming (at least if you use the highest sample rates and resolution your equipment supports, rather than the CD-audio defaults).
The spatial resolution of computer graphics does not yet have such comfortable headroom. In the best cases, displays have caught up to normal human visual acuity, but usually not vernier acuity. Once displays outpace eyeballs to the same extent that sound cards outpace ears, it will be possible to shift an image by 1 mm and have it remain as sharp-looking as the original, because a sharp edge smeared across several pixels will still be sharp enough to look "perfectly sharp" to a human.
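For the audio side, the standard trick is a fractional-delay filter: shift a windowed sinc by a sub-sample amount and convolve. A sketch in numpy (the tap count and window choice here are arbitrary):

```python
import numpy as np

def fractional_delay(x, delay, taps=63):
    # Windowed sinc shifted by a fractional number of samples:
    # (near) distortion-free within the passband.
    n = np.arange(taps) - (taps - 1) / 2
    h = np.sinc(n - delay) * np.hamming(taps)
    h /= h.sum()  # unity gain at DC
    return np.convolve(x, h, mode="same")

sr = 48000
t = np.arange(1024) / sr
x = np.sin(2 * np.pi * 1000 * t)  # 1 kHz tone
y = fractional_delay(x, 0.5)      # delayed by half a sample (~10.4 us)
```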
The de-Bayering filter in a camera is a lot more sophisticated than a simple interpolation of pixels of the same color. Most real-world images have a huge amount of correlation between the R, G, and B channels, because few colors are perfectly saturated. Demosaicing software takes advantage of this and achieves much higher resolution than simple per-channel interpolation ever could.
Many RAW developers give you a choice of which de-Bayering algorithm to use. Some are optimized for maximum detail retention. You can easily see the difference on a black-and-white resolution test target, which has perfect correlation between the R, G, and B values.
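For contrast, here's roughly what the naive per-channel baseline looks like for the green channel, assuming an RGGB mosaic and scipy at hand; correlation-aware algorithms (AHD, DCB, etc.) are what RAW developers layer on top of something like this:

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_green(mosaic):
    # Naive bilinear demosaic: each missing green value is the average
    # of its four green neighbors. No use of R/B correlation at all.
    h, w = mosaic.shape
    rows, cols = np.mgrid[0:h, 0:w]
    green = ((rows % 2) != (cols % 2)).astype(float)  # RGGB green sites
    k = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]) / 4.0
    interp = convolve(mosaic * green, k, mode="mirror")
    weight = convolve(green, k, mode="mirror")
    return mosaic * green + (1 - green) * interp / np.maximum(weight, 1e-9)
```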