Thanks for this. I was looking for someone to comment on this point.
The list of apps that people most "want to delete" is highly correlated with the list of apps with the most MAUs.
The right thing to look at here is the percentage of users who want to delete the app. Otherwise this table is just telling us that Instagram has more users than other apps—e.g., 1% of 1 billion > 1% of 1 million—which I suspect most people already know.
For people not familiar with what a healthy brain looks like in a CT image, here [1] is a reference.
Fig. A in the original post corresponds to the same view (axial) of the brain as shown in the first image in [1].
The dark cashew-shaped blobs in the center of the healthy image in [1] correspond to what's labeled LV (for "lateral ventricles") in the original post. These cavities are filled with a fluid called cerebrospinal fluid (CSF) which serves to protect and clear waste from the brain.
Other examples of enlarged ventricles can be found, e.g., in a disease called normal pressure hydrocephalus (NPH) (example here: [2]).
So, needless to say, this case of ventricular enlargement is extremely severe, even compared to NPH.
I agree that most of the value in a clinical application won't come from the often (but not always) relatively small performance gains by tweaking your neural network architecture or fiddling with the loss function. Collecting a high-quality and diverse dataset is important for training and arguably even more important for validation because you want to show that the deployed model is reliable.
But before deploying a model, I'd argue that it is worth testing a few architectures to determine whether one is substantially better than the rest. It can be a pain to test out a bunch of architectures, but the ones we mention in the article have many freely available implementations (and we provide implementations too!). So you can drop in one of these architectures and test it out pretty easily (especially if you skip hyperparameter tuning initially).
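To make that concrete, here's a rough sketch of the kind of drop-in comparison I mean. The model names, the evaluate() helper, and val_loader are placeholders for illustration, not the specific setup from the article:

    import torch
    import torchvision.models as models

    # Candidate architectures to compare on the same validation split.
    # (Placeholder models; swap in whatever implementations you're using.)
    candidates = {
        "resnet18": models.resnet18(num_classes=2),
        "densenet121": models.densenet121(num_classes=2),
        "efficientnet_b0": models.efficientnet_b0(num_classes=2),
    }

    def evaluate(model, loader):
        # Simple accuracy over a validation DataLoader.
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for x, y in loader:
                preds = model(x).argmax(dim=1)
                correct += (preds == y).sum().item()
                total += y.numel()
        return correct / total

    # for name, model in candidates.items():
    #     train(model)  # same data, same schedule, default hyperparameters
    #     print(name, evaluate(model, val_loader))

The point is just that with the same data and training loop, swapping the architecture is a one-line change, so the comparison is cheap relative to the rest of the project.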
Spending too much time fussing over a 2-3% performance gain is silly, but sometimes, surprisingly, the difference in performance by choosing another architecture can be much greater. I wish I had more intuition as to why some architectures perform well and others don't. It would certainly make R&D easier if you could totally ignore the architecture choice.
Regarding smaller batch sizes and batch normalization: Have you found another normalization layer to work better for small batch sizes?
I agree that the mean and variance of the small batches won't be representative of the true mean and variance, but in practice, I've used batch norm for small batch sizes successfully (e.g., <8). In medical imaging, due to memory constraints, I commonly see batch sizes of 2 (or even 1, although it's not really "batch norm" at that point).
The paper "Revisiting small batch training for deep neural networks" [1] discusses the benefits of small batch sizes even in the presence of batch norm (see Fig. 13, 14). They only look at some standard CV datasets, so it isn't conclusive by any means, but the experimental results jive with my experience and what appears to be other researchers experience.
Seconded; his grad probability classes were some of my favorites a few years ago (also hi!).
Measure theoretic probability significantly influenced how I think about nearly everything today, which is as strong an endorsement of these notes as I can muster.
Just out of curiosity, can you say a bit about how it influences your everyday thinking? And why is measure theory so essential to really understanding probability? I sort of understand why it's necessary to have the language of measure theory and to be able to talk about the measures/probabilities of uncountable sets, but I don't really understand beyond that. I have the equivalent of an undergrad's understanding of measure theory after numerous goes at it, but I haven't ever been able to piece it all together into a cohesive understanding of the area the way I have for, say, linear algebra.
> can you say a bit about how it influences your everyday thinking?
Yes, I'd like to hear about that, too. I took Theory of Probability classes, and I appreciate that some complicated machinery is necessary to avoid some neat paradoxes, but I must admit that measure theory hasn't taken my thinking or intuition forward at all.
Prior to giving real analysis and measure theory a serious go, I feel as though I was carrying around quite a lot of notational baggage that was essentially opaque to me. A lot of it was simply "received knowledge" and not at all cleanly organized in my mind.
For example, I remember fumbling over a modelling problem involving mixed random variables (that is, random variables with both continuous and discrete parts), and in retrospect the problem was that I just didn't have a clear understanding of what a random variable is, and how it relates to mathematical objects and concepts that I was more familiar with, like functions and vector spaces.
The point, for me, was not about needing to use the language of sigma-algebras to solve the types of problems that I come across in my job (electrical engineering and data analysis). It was more about going through the exercise of constructing the tools that I was using day-to-day, so that I could manipulate them with more confidence and creativity.
Do you think that real analysis and measure theory helped you get a better grip on the notion of a r.v. than just the simple "function from the sample space to the real line" definition? I'm slightly tempted to take, or at least try to self-study, real analysis and eventually measure theory, but everyone I know (including profs) has told me not to bother if I'm not going to do theoretical stuff.
It depends what you mean by "getting a better grip". There are books on scientific topics that do not rely on technical details. When they are great, they are so exactly because, even with this constraint, they manage to clearly convey the elemental notions to a layman ([0] is a great example). It is debatable whether the grip you get in this way is better or not. Certainly it can get deeper, when complemented with the right analytical tools.
Intuitively, for me probability theory is a bit clunky before measure theory. Like how we have a different formula for the expectation depending on whether the distribution is continuous or discrete, we use a probability density function (pdf) vs. a probability mass function (pmf), and so on.
We know this stuff is basically getting at the same underlying quantities. Now imagine a distribution over something both continuous and discrete. For example, something that measures temperature but breaks past a certain threshold. What should the expectation be for such an instrument? Or imagine a distribution on differently sized arrays of real numbers. How do you define a valid density function? Measure theory gives you the formalisms for those kinds of problems. In practice you don't need it very often, but it keeps you on firm ground when you do.
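To make the temperature example concrete (reading the broken instrument as a measurement censored at some threshold c; the notation is just for illustration): if the true temperature T has density f below c and the instrument reports c whenever T >= c, the reading X has a mixed law \mu, and the single measure-theoretic definition of expectation

    E[X] = \int x \, d\mu(x) = \int_{-\infty}^{c} x f(x)\, dx + c \cdot P(T \ge c)

covers it, with the same definition handling the purely discrete case (\mu a sum of point masses) and the purely continuous case (\mu given by a density).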
I struggled through graduate analysis and measure theory as prereqs just to get to measure theoretic probability.
But I didn't retain much since it wasn't good for building intuition (informal proofs were better for that) and a lot of the corner cases it fixed didn't matter for the real world.