> I wish machine learning research didn't respond so strongly to trends and hype,

It's really because nobody actually understands what's going on inside an ML algorithm. When you give it a ginormous dataset, what data is it really using to arrive at an output like

[0.0000999192346, .91128756789, 0, .62819364, 32.8172]?

Because what I do for ML is a supervised fit, then use a held-out set to test and confirm fitness, then unleash it on untrained data and check the results. But I have no real understanding of what those numbers actually represent. I mean, does .91128756789 represent the curve around the nose, or is it skin color, or is it a facial encoding of 3D shape?
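
For what it's worth, here's a minimal sketch of that fit/test/deploy loop in Python. The dataset and classifier are arbitrary stand-ins, not anything from this thread:

    # Minimal sketch of the supervised fit -> held-out test loop.
    # Dataset and model are placeholders for illustration only.
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))

    # The learned coefficients are just numbers; nothing here tells you
    # what any single one "means" in terms of the input pixels.
    print(clf.coef_[0][:5])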

> I'm still wondering what, if anything, is going to supplant deep learning.

I think it'll be a slow climb to actual understanding. Right now, we have object identifiers in NNs. They work, after terabytes of images and petaflops of CPU/GPU time. It's only brute force with 'magic black boxes' - and that provides results but no understanding. The next steps are actually deciphering what that understanding is, or making straight-up algorithms that can differentiate between things.




Shouldn't it be possible to backpropagate those categorical outputs all the way back to the inputs/features (NOT the weights) after a forward pass, to localize their sensitivity with respect to the actual pixels for a given prediction? I imagine that would have to give at least some insight.
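
That's essentially a vanilla gradient saliency map. A rough sketch with TensorFlow, assuming a pre-trained Keras classifier (the model choice here is arbitrary):

    import tensorflow as tf

    # Any pre-trained Keras classifier works; MobileNetV2 is just a stand-in.
    model = tf.keras.applications.MobileNetV2(weights="imagenet")

    def saliency(image, class_index):
        """Gradient of one class score w.r.t. the input pixels."""
        x = tf.convert_to_tensor(image[None, ...], dtype=tf.float32)
        with tf.GradientTape() as tape:
            tape.watch(x)                    # track the input, not the weights
            score = model(x)[0, class_index]
        grads = tape.gradient(score, x)      # d(score) / d(pixel)
        # Collapse channels to one per-pixel sensitivity value.
        return tf.reduce_max(tf.abs(grads), axis=-1)[0]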

Beyond that, the repeated convolution/max-pool steps can be understood as applying something akin to a multi-level wavelet decomposition, which is pretty well understood. It's how classical matched filtering, Haar cascading, and a wide variety of preceding image classification methods operated at their first steps too.
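
The analogy is easy to see by running an actual multi-level Haar decomposition; a short sketch using PyWavelets, with a random array standing in for an image:

    import numpy as np
    import pywt  # PyWavelets

    img = np.random.rand(256, 256)  # stand-in for a grayscale image

    # Three-level 2D Haar decomposition: each level halves the resolution
    # and splits the image into a coarse approximation plus horizontal,
    # vertical, and diagonal detail bands - structurally similar to what
    # stacked conv + downsample stages compute.
    coeffs = pywt.wavedec2(img, "haar", level=3)
    approx, detail_levels = coeffs[0], coeffs[1:]
    print(approx.shape)             # (32, 32) coarse band
    for h, v, d in detail_levels:
        print(h.shape, v.shape, d.shape)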

CNNs/deep learning really don't seem like a black box at all when examined in sequence. But randomized ensemble methods (random forests, etc.) are, to me at least, a bit more mysterious in how well they perform out of the box with little tuning.


I'm in no way a researcher or even an enthusiast of machine learning, but I'm pretty sure I came across a paper posted on HN a few days ago that did exactly what you and the parent poster are describing: figuring out which pixels contributed most to a machine learning model's prediction. I'll see if I can find it.

Edit: yep, found it.

SmoothGrad: removing noise by adding noise, https://arxiv.org/abs/1706.03825

Web page with explanations and examples:

https://tensorflow.github.io/saliency/

I couldn't find the HN thread, but there was no discussion as far as I remember.
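
For reference, the core of SmoothGrad is just averaging vanilla input gradients over noisy copies of the image; a rough sketch, assuming a Keras classifier like the one sketched upthread:

    import tensorflow as tf

    def smoothgrad(model, image, class_index, n_samples=50, noise_std=0.15):
        """Average input gradients over Gaussian-perturbed copies.

        Per the paper, noise_std is best chosen relative to the input's
        value range (roughly 10-20% of max - min).
        """
        x = tf.convert_to_tensor(image, dtype=tf.float32)
        total = tf.zeros_like(x)
        for _ in range(n_samples):
            noisy = x + tf.random.normal(tf.shape(x), stddev=noise_std)
            with tf.GradientTape() as tape:
                tape.watch(noisy)
                score = model(noisy[None, ...])[0, class_index]
            total += tape.gradient(score, noisy)
        return (total / n_samples).numpy()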


Bagging and bootstrap ensemble methods aren't really that confusing. Just think of them as stochastic gradient descent on a much larger hypothetical data set.

The effect is the same one that occurs when you get a group of people together to estimate the number of jelly beans in a jar. All the estimators are biased, but if those biases are drawn from a zero-mean distribution, the standard deviation of the averaged estimate goes down as the number of estimators increases.
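
A quick simulation of that effect (all the numbers here are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    true_count = 1000  # jelly beans actually in the jar

    for n in (1, 10, 100, 1000):
        # Each guesser's error is drawn from a zero-mean distribution;
        # average n guesses, and repeat the experiment 2000 times.
        averages = [
            np.mean(true_count + rng.normal(0, 200, size=n))
            for _ in range(2000)
        ]
        print(n, round(float(np.std(averages)), 1))  # shrinks ~ 1/sqrt(n)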


I think you might be on to something, but the big problem here is that the input is hundreds of gigabytes or terabytes. It's hard to understand what a feature is, or even why it was selected.

I can certainly observe what's being selected once the state machine is generated, but I have no clue how it was constructed to produce those features. To determine that, I'd have to watch the state of the machine as it "grows" to the final result.



