Actually, it decompresses to a 5.8MB PNG. However, many graphics programs may choose to use three bytes per pixel when rendering the image, and because it has incredibly large dimensions, that representation would take up 141GB of RAM.
Better graphics programs will not attempt to put the whole image into RAM, but only decompress the pieces needed for processing it.
I remember working with multi-megapixel images on systems with far less than 1MB of RAM, many years ago. Perhaps this is a good example of how more hardware resources lead to more of them being wasted - RAM has grown so much that most images fit completely in it, so programmers assume they can load every image whole without a second thought, when often all that's needed is a tiny subset of the data.
Even if the image data is compressed, there's absolutely no need to keep all of it in memory - just decompress incrementally into a small, fixed-size buffer until you get to the "plaintext" position desired, ignoring everything before that. The fact that it's compressed also means that, with suitable algorithms, you can skip over huge spans at once - this is particularly easy to do with RLE and LZ - and the compression ratio actually boosts the speed of seeking to a specific position.
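To make that concrete, here's a rough sketch of the fixed-buffer approach against a zlib stream - the function name, buffer sizes, and target offset are made up for illustration, but everything it calls is plain zlib:

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    /* Decompress a zlib stream from fp, discarding output until `target`
     * bytes of plaintext have gone by, then copy up to want_len bytes of
     * what follows (from the current chunk) for the caller.  Memory use
     * stays bounded by the two small buffers. */
    int seek_in_stream(FILE *fp, unsigned long target,
                       unsigned char *want, size_t want_len)
    {
        unsigned char in[16384], out[16384];
        unsigned long produced = 0;
        z_stream zs;
        memset(&zs, 0, sizeof zs);
        if (inflateInit(&zs) != Z_OK)
            return -1;

        int ret = Z_OK;
        while (ret != Z_STREAM_END) {
            zs.avail_in = fread(in, 1, sizeof in, fp);
            if (zs.avail_in == 0)
                break;
            zs.next_in = in;
            do {
                zs.next_out = out;
                zs.avail_out = sizeof out;
                ret = inflate(&zs, Z_NO_FLUSH);
                if (ret != Z_OK && ret != Z_STREAM_END) {
                    inflateEnd(&zs);
                    return -1;
                }
                size_t got = sizeof out - zs.avail_out;
                if (produced + got > target) {
                    /* reached the region of interest */
                    size_t off = target - produced;
                    size_t n = got - off < want_len ? got - off : want_len;
                    memcpy(want, out + off, n);
                    inflateEnd(&zs);
                    return (int)n;
                }
                produced += got;   /* everything before target is just dropped */
            } while (zs.avail_out == 0);
        }
        inflateEnd(&zs);
        return 0;
    }

Skipping faster than this (without running the match copies at all) takes format-specific tricks, but even this naive version never holds more than the 16KB output buffer of plaintext at once.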
Currently (hopefully...) no application is attempting to read entire video files into memory before processing them, but I wonder if that might change in the future as RAM becomes even bigger, and we'll start to get "video decompression bombs" instead?
This! Command line programs have no excuse; they should never need to decompress the entire file to memory. GUI image editors and web browsers probably do generally need to, but there are definitely techniques for dealing with more pixels than you can display.
Anyway, do you know some of these "better" graphics programs that actually behave this way, especially command line processing? I am interested in finding more of them.
EDIT: Okay, I have to add & admit that by "no excuse", I actually mean somewhat the opposite. ;) I mean that it's possible to do streaming image processing on compressed formats, not that it's trivial to do or as easy as decompressing the file in a single call. I just wish that programs would handle very large images more often, and it sucks when they don't, even though I know it's possible. Especially programs intended for dealing with large images, like Hugin. Now, I know it's a PITA to tile & stream compressed formats because I've done it, but I'm sure I've written image I/O that decompresses the entire file to RAM 100x more frequently than anything that tiles and/or streams, because I've only handled tiling or streaming myself once, and it was harder. :P
What you describe sounds a bit like demand paging of a memory-mapped file. The problem with implementing it for a 2d image is that a given rectangular region doesn't map to a contiguous area in memory. It's easy to construct a long thin image that would cause problems for a line-based demand paging strategy. For example, ten pixels high by a billion pixels wide.
Edit: skipping sections of the line to get to the region of interest is fine, I suppose, but what's really needed is a hierarchical quadtree-like organization of the storage, surely...
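For what it's worth, the tiled/quadtree idea boils down to something like this (a purely hypothetical fixed-tile layout, just to show the arithmetic): any rectangular region touches a bounded number of tiles, whereas with scanline storage the ten-pixel-high, billion-pixel-wide case puts consecutive rows of the region roughly 3GB apart in the file.

    #include <stdint.h>

    /* Hypothetical tiled layout: the image is cut into TILE x TILE pixel
     * tiles stored one after another, row of tiles by row of tiles. */
    #define TILE 256

    typedef struct {
        uint32_t tiles_across;    /* ceil(image_width / TILE)      */
        uint32_t bytes_per_tile;  /* TILE * TILE * bytes_per_pixel */
    } tiled_layout;

    /* File offset of the tile containing pixel (x, y). */
    static uint64_t tile_offset(const tiled_layout *L, uint32_t x, uint32_t y)
    {
        uint64_t tx = x / TILE, ty = y / TILE;
        return (ty * L->tiles_across + tx) * (uint64_t)L->bytes_per_tile;
    }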
It is a bit like demand paging, yes. There are formats for which rectangular regions map to contiguous blocks of memory.
TIFF has specs for tiles, strips, subfiles, layers, etc. AFAIK, hardly anyone uses those, but they certainly exist.
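For the record, reading one tile out of a tiled TIFF with libtiff looks roughly like this (error handling trimmed, and it assumes the file was actually written tiled):

    #include <stdint.h>
    #include <tiffio.h>

    /* Read just the tile containing pixel (px, py), never touching the
     * rest of the image. */
    int read_one_tile(const char *path, uint32_t px, uint32_t py)
    {
        TIFF *tif = TIFFOpen(path, "r");
        if (!tif || !TIFFIsTiled(tif)) {
            if (tif) TIFFClose(tif);
            return -1;
        }
        tdata_t buf = _TIFFmalloc(TIFFTileSize(tif));    /* one tile's worth */
        if (!buf || TIFFReadTile(tif, buf, px, py, 0, 0) < 0) {
            if (buf) _TIFFfree(buf);
            TIFFClose(tif);
            return -1;
        }
        /* ... process the tile ... */
        _TIFFfree(buf);
        TIFFClose(tif);
        return 0;
    }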
JPEG is also composed of 8x8 squares and is easy to stream. The API has per-scanline read & write callbacks IIRC. But you're right; a very very wide one might be a case it can't deal with.
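The classic libjpeg decode loop, for reference - one scanline in memory at a time, so peak usage is a row buffer plus decoder state (which still doesn't help if a single row is absurdly wide):

    #include <stdio.h>
    #include <jpeglib.h>

    /* Decode a JPEG one scanline at a time; error handling is left to
     * the default (exit-on-error) handler for brevity. */
    void stream_scanlines(FILE *infile)
    {
        struct jpeg_decompress_struct cinfo;
        struct jpeg_error_mgr jerr;

        cinfo.err = jpeg_std_error(&jerr);
        jpeg_create_decompress(&cinfo);
        jpeg_stdio_src(&cinfo, infile);
        jpeg_read_header(&cinfo, TRUE);
        jpeg_start_decompress(&cinfo);

        int row_stride = cinfo.output_width * cinfo.output_components;
        JSAMPARRAY row = (*cinfo.mem->alloc_sarray)
            ((j_common_ptr)&cinfo, JPOOL_IMAGE, row_stride, 1);

        while (cinfo.output_scanline < cinfo.output_height) {
            jpeg_read_scanlines(&cinfo, row, 1);
            /* ... process row[0], then let it be overwritten ... */
        }
        jpeg_finish_decompress(&cinfo);
        jpeg_destroy_decompress(&cinfo);
    }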
One of the rules of secure programming is that any program used in an even remotely security-sensitive context - and anything displaying a Portable Network Graphic is likely to be used in such a context - must allow resource usage limits to be specified. In this case that could be a cap on dimensions or on the total RAM allowed to be used. Limits need not be hard, either, but could produce a query, for instance the way very long-running scripts in the browser ask you if they should continue.
Now, go find an API/library for dealing with PNGs that allows you to pass in such a limit, let alone pass in a callback for dealing with violations. Go ahead. I'll wait.
(The Internet being what it is, if there is one, someone will pop up in a reply in five minutes citing it. If so, my compliments to the authors! But I think we can all agree that in general image APIs do not offer this control. In fact, in general, if you submit a patch to allow it, it would probably be rejected from most projects as unnecessarily complicating the API.)
This is the sort of thing that I mean when I say that we are so utterly buried in insecure coding practices that we can hardly even perceive it around us. I should add this as another example in http://www.jerf.org/iri/post/2942 .
These new libpng versions do not impose any arbitrary limits on the memory consumption and number of ancillary chunks, but they do allow applications to do so via the png_set_chunk_malloc_max() and png_set_chunk_cache_max() functions, respectively.
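So with a recent-enough libpng, setting ceilings up front is roughly this (the specific numbers are arbitrary examples; there's also png_set_user_limits() for capping dimensions):

    #include <png.h>

    /* Create a PNG reader with explicit ceilings; anything exceeding
     * them makes libpng error out instead of allocating. */
    png_structp make_limited_reader(void)
    {
        png_structp png = png_create_read_struct(PNG_LIBPNG_VER_STRING,
                                                 NULL, NULL, NULL);
        if (!png)
            return NULL;
        png_set_user_limits(png, 16384, 16384);         /* max width, height    */
        png_set_chunk_cache_max(png, 128);              /* max ancillary chunks */
        png_set_chunk_malloc_max(png, 8 * 1024 * 1024); /* max bytes per chunk  */
        return png;
    }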
You're overdoing it a bit. I believe the most popular API/library for server-side manipulation of images is ImageMagick, and it has a few options for specifying limits that will easily protect against decompression bombs.
That being said, even with these limits, it's undeniable that something like ImageMagick still has a very large attack surface (especially since it uses many third-party libraries), so it should run in its own heavily unprivileged or sandboxed process.
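For completeness, those limits can also be set from code via the MagickWand C API, something like this (ImageMagick 6 header shown - version 7 moved it to MagickWand/MagickWand.h - and the numbers are just examples):

    #include <wand/MagickWand.h>

    /* Cap ImageMagick's resource use before decoding anything untrusted.
     * Past the memory cap the pixel cache spills to disk, so capping
     * disk too turns a decompression bomb into a clean failure. */
    void cap_imagemagick(void)
    {
        MagickWandGenesis();
        MagickSetResourceLimit(MemoryResource,  256UL * 1024 * 1024);  /* bytes */
        MagickSetResourceLimit(MapResource,     512UL * 1024 * 1024);
        MagickSetResourceLimit(DiskResource,   1024UL * 1024 * 1024);
    }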
I should have added my "no credit for showing the existence of limits if your code doesn't use them" criterion in advance. Now it looks like I'm moving the goal posts. Oh well; you can see a similar point in my linked post from several months ago which at least adds credence to the idea that this isn't a new reaction of mine.
As you can see by reading that post as well, I'd also contend it really ought to be in the core API, not an optional thing that defaults to no limits. A sensible default could be imposed, too, though that turns out to be tricky to define the closer you look at it.
I think anyone that accepts scanned documents on their site will add these limits sooner or later out of necessity. At my last job we had them, because occasionally users would upload a 6000 dpi letter-sized page as a 1-bit tiff, or other such nonsense.
I'm actually about to add the same limits at my current job. It's an online slideshow builder, and most of the slides are scanned photographs. Last week someone uploaded an 11k x 17k JPG which hit us during peak load, and it caused quite a bit of trouble while the server was trying to build a thumbnail of it.
> I'd also contend it really ought to be in the core API, not an optional thing that defaults to no limits.
These limits really ought to be at a higher level, such as a ulimit on each apache process. In a large code base it requires too much effort to protect against every single avenue of potential accidental denial of service.
For example, at a previous job we had a lot of reports where the date range could be customized. This was fine for a long time, until a large client came along and ran his reports for the past five years. Since it was written in Perl, it'd use up all the RAM available on the system to generate the report, locking up the only webserver we had.
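A process-level cap of that sort is a one-liner, for what it's worth - either `ulimit -v` in whatever wrapper starts the workers, or setrlimit() from the process itself (the 512MB below is an arbitrary figure):

    #include <sys/resource.h>

    /* Cap this process's address space; any allocation that would push
     * past the limit fails instead of taking the whole box down. */
    int cap_address_space(void)
    {
        struct rlimit rl = { 512UL * 1024 * 1024, 512UL * 1024 * 1024 };
        return setrlimit(RLIMIT_AS, &rl);   /* roughly what `ulimit -v` does */
    }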
Ultimately it's going to require a malloc to get the space for all those pixels. That is where things should fail. If not, how is one to specify what the image size limits should be? Ever try to open the Blue Marble images from NASA? In a web browser? Back in 2001?
Any decent library (of any sort) should already be providing hooks for its memory allocation. This isn't necessarily provided as a security feature, and it's common for your callbacks not to get much in the way of information about what's going on, but it will allow you to at least crudely put a cap on the library's memory usage.
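zlib is a good example of what that looks like in practice: your callback gets (opaque, items, size) and nothing else, but even that is enough for a crude budget (the 8MB figure and the no-accounting-on-free simplification here are mine):

    #include <stdlib.h>
    #include <zlib.h>

    #define BUDGET (8UL * 1024 * 1024)

    /* Refuse to let the decoder grow past a fixed budget.  opaque points
     * at a running total; frees aren't credited back, so this is only a
     * crude cap, but it bounds the damage. */
    static voidpf capped_alloc(voidpf opaque, uInt items, uInt size)
    {
        unsigned long *used = opaque;
        unsigned long want = (unsigned long)items * size;
        if (*used + want > BUDGET)
            return Z_NULL;              /* zlib reports Z_MEM_ERROR */
        *used += want;
        return malloc(want);
    }

    static void capped_free(voidpf opaque, voidpf address)
    {
        (void)opaque;
        free(address);
    }

    /* Hook them up before inflateInit(): */
    void use_capped_allocator(z_stream *zs, unsigned long *used)
    {
        *used = 0;
        zs->zalloc = capped_alloc;
        zs->zfree  = capped_free;
        zs->opaque = used;
    }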
I'm inclined to agree. As legitimate image sizes increase, there's more need to sanely limit the resources thrown at such images.
While on vacation last week, I finally grokked that my relatively cheap Nikon camera is producing 6000x4000 images... that's about 100MB uncompressed. As a mobile app developer, I'm becoming painfully aware of how images crossing the 25MB uncompressed line are breaking apps, with some still-in-use 256MB RAM iOS devices crashing when memory fills under normal usage plus a few instances of such large images (1-2 vacation photos can easily overwhelm available memory).
Some image programs will allocate space based on the metadata in the file; the actual image data isn't required for that. So if a byte or two of the file is corrupted (or even missing), there's nothing stopping the reported size from being in the gigapixel range.
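Which is exactly why the claimed dimensions should be checked against a policy cap before anything is allocated. A sketch for PNG, where width and height sit at fixed offsets in the IHDR chunk (the 64-megapixel cap is arbitrary):

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_PIXELS (64ULL * 1024 * 1024)

    static uint32_t be32(const unsigned char *p)
    {
        return ((uint32_t)p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3];
    }

    /* Bytes 16-23 of a PNG are the IHDR width and height, big-endian.
     * Reject the file before trusting those numbers with an allocation. */
    int claimed_size_ok(FILE *fp)
    {
        unsigned char hdr[24];
        if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
            return 0;
        uint64_t w = be32(hdr + 16), h = be32(hdr + 20);
        return w != 0 && h != 0 && w * h <= MAX_PIXELS;
    }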