Hacker News
Treemaps are awesome (phronemophobic.com)
340 points by capableweb on July 25, 2023 | 73 comments



Nice post - treemaps are great!

My friend and I made a codebase visualisation tool (https://www.codeatlas.dev/gallery) that's based on Voronoi treemaps, maybe of interest as an illustration of the aesthetics with a non-rectangular layout!

We've opted for zooming through double-clicks as the main method of navigating the map, because in deep codebases, the individual cells quickly get too small to accurately target with the cursor as shown in the key-path label approach!

If anyone's interested, this is also available as a GitHub Action to generate the treemap during CI: https://github.com/codeatlasHQ/codebase-visualizer-action


Hijacking top comment to link TreeSheets. It seems to have been designed for note taking using treemaps.

https://strlen.com/treesheets/


What an amazing tool! Certainly going to integrate it into some of my codebases. I am actually fascinated by your charts; would you mind sharing which library you used? I would love to use it in some of my personal projects.


Sadly, I mostly dislike tree maps. By and large, I never actually see the tree structure that they are supposedly helping me see. Well done ones can be pretty, but that is true of basically every visualization.

The worst is when someone shows a tree-map-like view of something like a CPU, even though that is not necessarily how its parts are logically connected. Many then go for an odd mix of tree and heat maps, but fail to actually show anything useful that a simple ordered list couldn't also show.


Personal favorite, visualizing used disk space in order to find large directories that can be deleted. I use QDirStat on Linux and WizTree on Windows.

Showing CPU performance of an application is a pretty neat use case for one type of treemap too. These are typically called flamegraphs, but in reality I think they're just upside-down treemaps :)


yeah i have a deep infatuation with it too. i like disk inventory x and the visualization really works for me. i wanted to visualize my financial accounts and how much each one had etc (rather than seeing it in numbers) so i built a script to generate files in bytes representing the amount of money in each account and then build a temporary directory and then load it in disk inventory x. really interesting.

i ended up building my own library in react to render various things like what i spend my time in throughout the day (using rescuetime api) to gain better insights into my day. really neat stuff!


I did something similar a little bit more directly in my finance visualization app. (It's written in F# and there was a .NET Treemap control I could easily use. At the moment it mostly relies on Mint [Intuit] CSV exports, but the benefit to that is that I could treemap based on the Mint categorization tree.)


Kinda love that creative use of disk inventory x. Kudos.


sounds pretty cool; willing to publish your libs?


Flamegraphs aren't treemaps because they don't fill the entire viewport. They're really just call stacks on a timeline.

Treemaps can be used though, KCachegrind for example uses treemaps much like KDirStat/WinDirStat: https://blog.equanimity.nl/images/qcachegrind.png


Flamegraphs are (usually) not a timeline.

> The x-axis spans the sample population. It does not show the passing of time from left to right, as most graphs do. The left to right ordering has no meaning (it's sorted alphabetically to maximize frame merging).

https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html


Yeah, I tend to prefer sunburst charts for visualizing disk space (I wish there were one for process size, too). I first saw this in kfilelight and on mac I ended up purchasing "DaisyDisk". The hierarchy is clearer and big outliers still stand out.


Unfortunately, most sunburst charts for disk space fail to scale things properly, so things that look larger are actually smaller. I don't know if this is the case for DaisyDisk, but it is surely something to watch out for.

> This is a flame graph (which is an adjacency diagram with an inverted icicle layout), using polar coordinates. It is very pretty, and as someone said "it always wows". Sunbursts are the new pie chart. Deeper slices exaggerate their size, and look visually larger. The smaller-looking slice 2 in this image represents 27.7 Mbytes, whereas the larger-looking slice 1 is 25.6 Mbytes. This visualization is actually showing that slice 1 is smaller than slice 2, although I bet most people would think it was the other way around! The problem is that to understand this correctly requires the comparison of angles, instead of lengths or areas, which has been evaluated as a more difficult perceptive task.

https://www.brendangregg.com/blog/2017-02-06/flamegraphs-vs-...
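To put rough numbers on the "deeper slices exaggerate their size" point, here is a minimal sketch. It assumes a sunburst where the angle of a slice is proportional to its value and every ring has the same width. The ring depths (2 and 5) and the 1000 MB total are made up for illustration and are not taken from the image in the linked post; only the 27.7 and 25.6 Mbyte values come from the quote.

```python
import math

def sector_area(value, total, ring, ring_width=1.0):
    """Drawn area of a sunburst slice.

    The angle is proportional to value/total; the ring index fixes the
    inner and outer radii (assumption: all rings have equal width).
    """
    angle = 2 * math.pi * value / total
    r_in = ring * ring_width
    r_out = (ring + 1) * ring_width
    # Area of an annular sector: (angle / 2) * (r_out^2 - r_in^2)
    return 0.5 * angle * (r_out ** 2 - r_in ** 2)

TOTAL_MB = 1000.0  # hypothetical total, only needed to turn values into angles

shallow = sector_area(27.7, TOTAL_MB, ring=2)  # bigger value, closer to the center
deep = sector_area(25.6, TOTAL_MB, ring=5)     # smaller value, further out

ratio = deep / shallow
print(f"deep/shallow area ratio: {ratio:.2f}")
```

Under these assumptions the slice with the smaller value is drawn with about twice the area, which is why you have to compare angles rather than areas to read the chart correctly.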


Yes exactly!! It took me a bit too long to realize that treemaps visualize the same structure as flame graphs, and I like flame graphs better

Also, I think flame graphs are easier to implement, and correspond better to the underlying data (fewer visual illusions)

Both of them take tree-shaped data, and a quantity, and you want to attribute the space of the children to the parents. For disk space it's:

    100 /home/andy/foo.txt
     42 /home/andy/src/bar.txt

For function profiles it's

    100 f() g()
     42 f() g() h()

So you can use both visualizations with both kinds of data.
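A minimal sketch of that shared step, attributing each leaf's quantity to all of its ancestors. The `rollup` helper is a hypothetical name for illustration, not something from an existing tool; the records are the two examples above.

```python
from collections import defaultdict

def rollup(records, sep):
    """Add each record's quantity to the record itself and to every ancestor prefix."""
    totals = defaultdict(int)
    for size, path in records:
        parts = [p for p in path.split(sep) if p]
        for depth in range(1, len(parts) + 1):
            totals[sep.join(parts[:depth])] += size
    return dict(totals)

# Disk space: quantity is a size, hierarchy is the directory path.
disk = [(100, "/home/andy/foo.txt"), (42, "/home/andy/src/bar.txt")]
# Function profile: quantity is a sample count, hierarchy is the call stack.
profile = [(100, "f() g()"), (42, "f() g() h()")]

disk_totals = rollup(disk, "/")
profile_totals = rollup(profile, " ")

for name, total in sorted(disk_totals.items()):
    print(total, name)
for name, total in sorted(profile_totals.items()):
    print(total, name)
```

Either set of totals can then drive a treemap (area per node) or a flame graph (width per node); only the drawing step differs.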

Also, ncdu is a curses UI for disk space, and I have to admit it's pretty much as effective in practice as tree maps. It doesn't necessarily give you a nice overview though.


They do not do the same thing.

A good treemap actually hides the tree structure. (From the link, the GrandPerspective one is good; the Baobab one is useless due to the borders/padding.) In a tree map you can see siblings, cousins, and second-degree cousins of similar importance (size) next to each other, and you can see smaller items because size is represented as the surface of a rectangle.

On a flame graph, you can visualise the entire tree in all of its depth, but since size is now just length instead of surface, it is harder to compare tiny slivers. A flame graph can, however, be ordered by criteria other than size.

Sunbursts are circular distorted flame graphs and are of dubious utility.


Hm I prefer the Baobab one

What's an example of where you'd want to see the "siblings" ?

In my experience with disk files and function profiles, the parent-child and path to the root is what solves the problem


I love tree maps for disk space visualization.

A tree map is not used to see the tree structure. The tree is actually the information that is hidden. Especially if you use the treemap without borders/padding/bezels/names (IMO if you display those it ruins the usability of a treemap). What the tree map is good for is:

- seeing a flat representation of the entire space and therefore instantly seeing the largest files no matter how deeply nested. With a tree or a sunburst you need to drill down from largest folder to largest file because there are only so many concentric rings that can be shown. With a flat file list ordered by size you lose all other information.

- seeing rough percentages by type. Maybe you have plenty of small videos taking space. Maybe you have just a few huge videos. Maybe you have one big video and plenty of small ones. All of this can be seen at a glance but is obscured by a summary by file type and is obscured by a sunburst or a tree because of depth.

- seeing the distribution of those file types. Are all of the small videos in the same place? Are they spread around and mixed with other files? Are certain types usually present together? For example, if you store RAWs and JPEGs paired together, you would be able to see that as a surface of two colors mixed together. If you store RAWs and JPEGs completely separated, you would also be able to see that, as two distinct surfaces of different colors.

- seeing similar structures. Since a tree map places surfaces near each other if they are neighbours or near neighbors in the tree structure, it is usually easy to spot duplicate trees because they look similar even when they are not identical. A duplicate finder is a separate tool and it also doesn't handle well the scenario of duplicate but changed and therefore not identical.


My problem with this is that it looks too similar to a CPU heat/component map.

And, frankly, the actual layout in treemaps is basically arbitrary. Indeed many early "disk utilization" charts that look a lot like treemaps were entirely based on physical location on the disk. As such, the logical layout was not on display, and you were only able to see the physical layout. Which was usually important, as you could see if there were obviously bad sectors or if fragmentation was out of hand.

That is, the very concept of "neighbor" that treemaps show is completely logical in nature, and doesn't really indicate any physical neighboring relationship at all. I suspect that is why I'm not a huge fan of them.


I think I agree. Treesize/windirstat’s list of folders in size order is just as useful as a treemap.

Maybe the treemap lets you home in on the big files quicker, but it only saves a few seconds really, as you can expand a folder view to find the offending big files.


Maybe I am unique, but especially when you start dealing with many levels of nested directories, it is much faster to grok both the absolute and relative sizes of files and directories with a visual representation versus a textual representation like an indented list.

Expanding subdirectories is suboptimal for me: too disorganized, and too many directories to click.


Typically, `du -h . | sort -hr | head -n10` is more than sufficient to identify the large files for me to consider. Yes, it can be redundant, but I can't think of any time it hasn't helped me get the data I need.

I'll note that the visualizations are neat. And I certainly want to like them. But I also almost always need to know the name of the files more than anything, as it can help me decide how important a file is. If I'm impatient, I can add a `-d3` to the beginning.


> Sadly, I mostly dislike tree maps. At large, I never actually see the tree structure that they are supposedly helping me see.

I think the tree map examples in the article are missing the padding that you would add to every layer. Apparently, I didn't find any good examples with a quick search. I think this is a crucial feature of tree maps.


Why would you ever want the padding? For me the padding makes the treemap useless.


Padding in tree maps lets you extract the exact tree structure from the visualization as every layer is shown in a visible box. So it makes the visualization more useful. I found an example of such a tree map here [1].

[1] https://towardsdatascience.com/make-a-treemap-in-python-426c...


They are good for showing storage usage, because they create a direct metaphor: the sizes of the elements in the tree map correspond to the actual proportion of space that they take up. The tree map also captures the hierarchy: related files in the same directory (e.g. videos) are clustered into the same rectangle.
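For anyone curious how that metaphor falls out of the layout, here is a minimal sketch of the simplest treemap layout ("slice and dice"), which splits each rectangle among its children in proportion to their sizes and alternates the split direction per level. The tree, the file names, and the 100x100 canvas are made up for illustration, and the code assumes every size is positive. Real tools usually use a more elaborate layout to avoid thin slivers.

```python
def total(node):
    """Size of a node: its own 'size' for a leaf, else the sum over its children."""
    children = node.get("children")
    if not children:
        return node["size"]
    return sum(total(child) for child in children)

def layout(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout. Returns a list of (name, x, y, w, h) for the leaves."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    whole = total(node)  # assumed > 0
    offset = 0.0
    for child in children:
        share = total(child) / whole
        if depth % 2 == 0:
            # Even depth: split along the x axis.
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            # Odd depth: split along the y axis.
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

# Hypothetical directory tree; sizes are arbitrary units.
tree = {
    "name": "root",
    "children": [
        {"name": "videos", "children": [
            {"name": "a.mp4", "size": 60},
            {"name": "b.mp4", "size": 20},
        ]},
        {"name": "notes.txt", "size": 20},
    ],
}

rects = layout(tree, 0.0, 0.0, 100.0, 100.0)
areas = {name: w * h for name, x, y, w, h in rects}
for name, area in areas.items():
    print(name, round(area))
```

Each leaf's area is its share of the total times the canvas area, and both video files end up inside the single rectangle allotted to their directory.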


What they're missing is a link graph as given in https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-523.pdf


Uh. That's 87 pages. I'd like to have better treemaps, but not enough to read 87 pages. Mind pointing to a particular example?


What's even worse: if you show it to someone unaccustomed to it, they will just be confused.


Sankey is a good alternative that is much more readable


I love tree maps so much. It's the only visualization that I've come across that can summarize large amounts of hierarchical data. I really wish there was better 3D and live interactive support for them.

We use maps all the time for getting around in the real world, so I'm always surprised that these kinds of maps aren't more popular. I guess it's tough to come up with spatial parameters or something.

d3.js also has quite a few other very cool visualizations that are good at summarizing hierarchical data on their example page.

https://observablehq.com/@d3/gallery


I don't think tree maps can do anything that flame graphs can't:

https://news.ycombinator.com/item?id=36872165

I think flame graphs can handle more data because they are more trivially zoomed and panned (along one dimension only), and they can stack high as well.

Flame graphs seem to be easier to label. The size of the label itself doesn't distort the visual correspondence, and the bottom levels are easily visible without mouse-over

https://www.brendangregg.com/blog/2017-02-06/flamegraphs-vs-...


I find flame graphs misleading with regard to sizes, because what you primarily perceive as big is the area (width x height), whereas the actual value is just the width. In profiler output, for example, the same time usage can appear bigger or smaller depending on how deep the call graph is nested and on how (non-)"branchy" it is. Flame graphs show useful info, but you have to consciously disregard the area sizes.


Yeah I would agree that TreeMaps are more intuitive on first glance. When most people look at a flame graph for the first time, they think the X axis is time, like a time series

But it's really the proportion of time (or a space measure)

But once you get used to it, flame graphs show all the same info, and IMO it's just as good or nicer


I often use treemaps for quick datavis on hierarchical data, like this (ancient) analysis of Chrome binary size: https://neugierig.org/software/chromium/bloat/.

As part of that I made a little web-based treemap widget called "webtreemap" and published a version that I use in unix pipelines. It accepts data on stdin and dumps self-contained HTML on stdout.

    # install program:
    $ npm install --global webtreemap

    # pipe any utility that outputs size+hierarchy data into it:
    $ du | webtreemap > demo.html

    # interact in browser:
    $ open demo.html


I really like the circular packing variant of it from D3.js (https://d3-graph-gallery.com/circularpacking.html). Its usage in CodeScene shows how powerful this visualization can be:

https://codescene.io/projects/30382/jobs/643265/results/code...


>Treemaps struggle with data that is either (wide and shallow) or (thin and deep).

That hasn't been my experience. If treemaps have a limitation, it is that they can't deal well with very large numbers of highly homogeneous items (e.g. all nearly the same size). The worst case is if you have to show over a million items of the same size in a 1000x1000 bitmap.


Isn't that an example of wide and shallow?


I understood "wide and shallow" to mean for example a directory tree with a single directory and lots of files. The example I gave is independent of where in the hierarchy the files are.


I made a tool to display my photo collection as basically a tree map: https://github.com/TOGoS/PicGrid

It's pretty good for idly looking through pictures, or in some cases for finding photos that I know approximately when I took and what they might look like, without needing to use any other metadata.

Not the best example of it in action, but here's my entire collection of saved screenshots: http://picture-files.nuke24.net/uri-res/raw/urn:bitprint:LZR...


Cushion treemap variation adds pseudo-3d lighting that helps visualize nesting.

https://www.win.tue.nl/sequoiaview/


I love treemaps, but I don't understand the appeal of the cushion shading, it just seems to add visual noise. I prefer to use color for depth.


I'm neutral on cushion shading, but I like the underlying property that you don't need borders to separate items. You can render many small items and not worry about a bunch of 1px borders messing things up.

Same goes for shading with a color gradient or some stylization thereof


I get that as the motivation 20 years ago, I just think it's irrelevant when I have a 4k monitor. I use a 2px border and a small line of text for a title at the top of parent nodes. I have seen the complaint that this reduces the accuracy of the area-representation, but it's a minor and very worthwhile tradeoff.


From experience, it may be minor or it may be quite major depending on the shape of your data. If you have a whole bunch of small items that, combined, only account for say 1% of the total, then adding borders and labels to all of them may significantly skew that proportion, and the proportions of its parent/ancestors
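A back-of-the-envelope sketch of that skew, assuming square cells and a border drawn inside each cell on all four sides. The cell sizes are hypothetical; the 2px border is the one mentioned upthread.

```python
def visible_fraction(side_px, border_px):
    """Fraction of a square cell's area that is left once the border is drawn inside it."""
    inner = max(side_px - 2 * border_px, 0)
    return (inner * inner) / (side_px * side_px)

big = visible_fraction(400, 2)   # one large item in a 400x400 cell
small = visible_fraction(20, 2)  # one of many small items in a 20x20 cell

print(f"large cell keeps {big:.1%} of its area")
print(f"small cell keeps {small:.1%} of its area")
```

With these numbers the large cell loses about 2% of its area to the border while each small cell loses 36%, so a cluster of small items reads as noticeably smaller than its true share.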


Yes, that's true. It depends on your use case. I'm almost always focused on the largest items, e.g. for space recovery. But also, being able to zoom in and out negates that issue in every case for me personally.


A treemap visualization of disk space is found in WinDirStat, which is a Windows port of KDirStat from KDE.

A key visualization technique in WinDirStat's treemap is the use of Phong shading to give the squares a fake 3-D look, as if they were convex, shiny domes sticking out of the image toward the user. This cue provides clear separation, even between adjacent cells that use the same base color/hue.


I use Disk Inventory X all the time to clean up my disk:

https://www.derlien.com/


Are there any good open-source JS libraries that can create treemaps and offer some customization and flexibility as part of the APIs? The ones I've seen that come with charting libraries are quite disappointing.


I built my own a few years back for similar reasons:

https://github.com/imranghory/treemap-squared

The code is pretty straight-forward if you want to customize



https://treemap.yatsyk.com/

Disclaimer: I'm the author


This is so weird! I was just like 5min ago using a treemap for data I haven't looked at in half a year and I don't use treemaps any other time.

I work in infosec. The use cases I have for it are typically ones where I have a correlation between A->B (e.g. network connection patterns), but then which specific artifact in each interaction is popular (destination IP)? OK, but how much of the dataset is that relationship, and what about with each artifact in play? One treemap can display all of that! This is my second favorite visualization after line charts.


Shameful plug: I've made an npm library for treemaps [1]. There is a sample app [2] that can draw a treemap of your drive file system, similar to SequoiaView or Disk Inventory X, in the browser using the file system API.

[1] https://treemap.yatsyk.com/ [2] https://treemap.yatsyk.com/folderstat


Jonathan Blow recently spent a weekend streaming himself building a tree map viewer from scratch, for his own code base and in his own programming language.

Treemap Viewer, part 1: https://www.youtube.com/watch?v=BqF2SbY99B8&t=1455s
Treemap Viewer, part 7 (the last): https://www.youtube.com/watch?v=f_TWdyAbKwQ


Is there an interesting extension for non-unique items? (Obviously it's not a tree anymore, but some hierarchical data contains only a 'few' duplicate elements, so an alteration of the treemap could be useful.)


I thought this post was going to be about tree (species) maps that you see in parks sometimes: https://inhabitat.com/wp-content/blogs.dir/1/files/2015/03/T...

These are awesome too.


I love treemaps! I still use an ancient program on Windows called SpaceMonger for visualizing my disk usage which uses a tree map visualization.


Aww, I remember SpaceMonger. I’d guess that I found it from a Scott Hanselman recommendation.

I switched to “SpaceSniffer” at some point; it's quite similar, just a bit more polished. There’s probably a newer/better one now, but SpaceSniffer still suits me fine.


A pie chart could serve a similar purpose, and can be much easier to interpret. I like this interactive pie chart for profiling Webpack bundle size. I've used it several times at work to help find and reduce bloat in our bundles.

https://alexkuz.github.io/webpack-chart/


Given how horrifically bad people are at interpreting pie charts, that does not bode well for treemaps.

    "[Pie] charts are bad and that the only thing worse than one pie chart is lots of them"
    -Edward Tufte

https://scc.ms.unimelb.edu.au/resources/data-visualisation-a...

https://www.businessinsider.com/pie-charts-are-the-worst-201...

https://www.data-to-viz.com/caveat/pie.html

etc etc


Correction:

The chart I'm talking about has multiple names, but is not a simple pie chart. Thanks to funcDropShadow for pointing this out. The names: sunburst chart, multilevel pie chart, and radial treemap.

https://www.anychart.com/chartopedia/chart-type/sunburst-cha...


Your example is usually referred to as a sunburst chart. It shares all the drawbacks of pie charts, though. Some would say sunburst charts make it even harder than pie charts to correctly understand the relative size of elements.


I often confuse Treemaps vs Heatmaps.

This one is a treemap by the article's definition but is called a heatmap for as long as I can remember: https://www.marketbeat.com/market-data/sector-performance/


I ran into the same confusion: I called my project [0] a git heat map, not realising that a similar term was used for other, completely different visualisations.

[0] https://github.com/jmforsythe/Git-Heat-Map


Heatmap is one of those terms that gets applied to a few different things, especially choropleths. I've also seen it applied to grids (like hour of the day vs. day of the week, where each cell is how much X happened in that hour).


Imo I'd say it's both, since the color is measuring something and not categorizing here.


I always assumed that heatmaps were a type of treemap


>To date, they've mostly been used for displaying the files consuming all of your disk space

popular example being kdirstat


Shameless plug for my project [0] using treemaps to visualise a git repo

[0] https://github.com/jmforsythe/Git-Heat-Map


I love tree maps, but hate that there's no easy way to integrate a time axis with it, aside from scrubbing.


Flame graphs are kind of like treemaps with time axes.


Yea, tree maps needlessly occupy the second dimension whereas just one is sufficient for displaying the main metric.


SpaceMonger (1997) uses treemaps and is one of the best hard disk viewing tools around.

The original I thought was best (v1.40). Sean archived and MIT'ed the source code. It's a great story from college kid to now - https://www.werkema.com/programming/the-spacemonger-1-x-post...

.exe here - https://archive.org/details/spcmn140_zip

Treemaps (SpaceMonger) are one of the rare 2 dimensional visualization tools that actually work. Humans work better in lower dimensions. Most ~2Ds look pretty for people who can't understand the data anyway. 3D's are tools to extract VC money.



