I agree that authors typically cherry-pick the samples that illustrate their main message most forcefully.

That's why discriminating readers (like us) should rigorously ignore the pretty pictures and instead look only at: 1) the amount of separation between groups (is the effect big?), 2) the statistical significance across the given population (is the signal consistent?), 3) the constraints the authors used to select that population (is it representative of the real world?), and 4) whether the discriminating signal they chose selectively detects the causal effect they propose.

No picture can do all of that.
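
For what it's worth, the first two checks are easy to put into numbers. Here is a rough Python sketch, assuming two small groups of measurements: Cohen's d as one common way to express the separation between groups, and a permutation test as one way to ask whether the difference is consistent across the population. The data and the function names (`cohens_d`, `permutation_p_value`) are made up for illustration and don't come from any particular paper. Checks 3 and 4 depend on study design and can't be computed this way.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardized mean difference (b minus a) using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.mean(b) - statistics.mean(a)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(b) - statistics.mean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
        )
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Made-up measurements, for illustration only.
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3, 4.2, 3.7]
treated = [4.9, 5.2, 4.6, 5.0, 5.3, 4.8, 4.7, 5.1]

d = cohens_d(control, treated)
p = permutation_p_value(control, treated)
print(f"Cohen's d = {d:.2f}, permutation p = {p:.4f}")
```

A large d with a small p only covers points 1 and 2. Whether the population is representative and whether the signal really tracks the proposed cause still have to be judged from the methods section.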