I actually started with PCA, but NMF proved easier to understand, since the negative values in PCA components are hard to interpret. I didn't consider UMAP, but I'd be interested to see how it performs here.
It should be easy, yeah. For NMF, the activations array is reshaped from (layers, neurons, token position) down into (layers × neurons, token position), i.e. the layer and neuron axes are merged into one. We then present that matrix to sklearn's NMF model. I would assume UMAP would operate on that same matrix. The matrix is called 'merged_act' and is located here:
https://github.com/jalammar/ecco/blob/1e957a4c1c9bd49c203993...
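To make the reshaping described above concrete, here is a minimal sketch, not the actual ecco code. The array sizes, the random non-negative data, the variable names other than merged_act, and the choice to pass the transposed matrix (token positions as samples) to NMF are all assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical sizes; real activations would come from the model.
n_layers, n_neurons, n_positions = 2, 16, 10

# Stand-in data. NMF requires non-negative input, so sample from [0, 1).
rng = np.random.default_rng(0)
activations = rng.random((n_layers, n_neurons, n_positions))

# Merge the layer and neuron axes:
# (layers, neurons, positions) -> (layers * neurons, positions)
merged_act = activations.reshape(n_layers * n_neurons, n_positions)

# Assumption: treat each token position as a sample and each
# (layer, neuron) pair as a feature, hence the transpose.
n_components = 3
model = NMF(n_components=n_components, init="random",
            random_state=0, max_iter=1000)
position_factors = model.fit_transform(merged_act.T)  # (positions, components)
neuron_loadings = model.components_                    # (components, layers * neurons)

print(merged_act.shape, position_factors.shape, neuron_loadings.shape)
```

Another reducer that exposes a fit_transform method (sklearn's PCA, for example) could presumably be given the same merged_act matrix in place of NMF, which is why adding more algorithms should be a small change.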
Is there a theoretical reason to choose NMF over other dimensionality reduction algorithms, e.g. UMAP?
Is it easy to add other DR algorithms? I may submit a PR adding those in if it is...