Off topic, but it's very interesting to observe the ratio of upvotes / comments.
On any given ChatGPT topic there are usually hundreds of comments. Here, so far: 100 upvotes, only 7 comments.
The book looks great - and given the authors, it almost certainly is (I will buy it for sure). It makes me think, though, about the state of 'ML / AI / Data Science' - and the cynical part of me thinks this upvotes / comments ratio reflects the fact that most people interested in the AI hype have not really touched the underlying concepts and don't have any deeper understanding of the maths / stats behind it.
PS. That being said, I didn't leave a meaningful comment on the linked topic either.
Books like this are key to getting started with Machine Learning/AI, and this particular one is very good. I started my ML journey with it.
There is a lot of hype around AI and it is going to be like the dotcom bubble.
Unlike crypto, AI has real uses right now, and I am saying this without taking any LLM products into account. But there is also a lot of hype and wishful thinking, and this bubble is going to burst and hurt a lot of people. That won't stop people from making real money in the short term, though. Many hypers I know understand this well.
But AI is here to stay. And even after the bubble bursts, there will be real uses of AI all around us.
> and this bubble is going to burst and hurt a lot of people
> But AI is here to stay. And even after the bubble bursts, there will be real uses of AI all around us.
This is, as an ML researcher, exactly where I'm at (in belief). The utility is high, but so is the noise. Rather, the utility is sufficient. The danger of ML is not so much X-risk or malevolent AGI, but dumb ML being used inappropriately. In general, that means using ML without understanding its limitations and without checks to ensure that when hallucinations happen they don't cause major problems. But we're headed in a direction where we're becoming more reliant on these systems, and once we have a few big issues with hallucinations the bubble will burst, which can set us back a lot in our progress toward AGI. Previous winters were caused by a lack of timely progress, but the next winter will happen because we shoot ourselves in the foot. Unfortunately, the more people you give guns to, the more likely this is to happen -- especially when there's no safety training or even acknowledgement of danger (or worse, the only discussion is about being shot by others).
Well if you care more about the study than the money (which is nice -- though I'm at the tail of grad school), it makes more sense to chase knowledge than chase metrics. Then again, I don't come from a computer science background, so maybe not having that momentum helps.
This and Elements (of Statistical Learning) were the intro to ML for me as well.
I understand your sentiment, but we also have to accept that for a lot of ML use cases just calling the ChatGPT API is a 100x better approach than creating your own ML model, and thus there is really no need to understand any math.
As an example, I am building an AI nutrition-counting app, and I use ChatGPT function calling. I can just add a field that holds, say, an emoji of the food, and it automatically classifies any food to the right emoji. There is absolutely no need to know gradient descent or any fundamental property to be able to do that.
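For anyone curious what that looks like in practice, here is a minimal sketch of the idea using the OpenAI Python SDK's function-calling (tools) interface. The model name, function name, and fields are illustrative assumptions, not the actual app's code:

```python
# Hypothetical sketch: classify a logged food item to an emoji via function calling.
# Model name, "log_food", and the field names are illustrative, not the real app.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [{
    "type": "function",
    "function": {
        "name": "log_food",
        "description": "Record a food item the user ate.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "Name of the food"},
                "calories": {"type": "number", "description": "Estimated calories"},
                "emoji": {"type": "string", "description": "Single emoji that best represents the food"},
            },
            "required": ["name", "calories", "emoji"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any tools-capable chat model works here
    messages=[{"role": "user", "content": "I had a bowl of ramen for lunch"}],
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "log_food"}},
)

# The model returns the structured arguments as a JSON string.
args = json.loads(response.choices[0].message.tool_calls[0].function.arguments)
print(args["emoji"], args["name"], args["calories"])
```

The "classifier" here is just a JSON schema field description; there is no training loop and no gradients anywhere in sight, which is the point being made.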
As an ML engineer, I look forward to the day OpenAI ups their prices 5x and companies hire people like me as consultants to replace their expensive API calls with an SVM or random forest that can run on a smartphone.
That intern who figured out how to make a POST request and accidentally committed the API keys to public GitHub? Long gone. The rise and grind manager who discovered ChatGPT in April? Retired. But we will be there, ready to cut your costs by 95% because people couldn’t be bothered to understand the basics of what they’re using, in exchange for a sizable consulting fee of course.
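And the replacement really can be that small. A rough sketch of the kind of swap being described, using scikit-learn (the texts and labels below are made up for illustration):

```python
# Tiny text classifier (TF-IDF + linear SVM) standing in for a per-call LLM API.
# Training data is invented purely for illustration.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = [
    "refund my order",
    "app crashes on launch",
    "love the new update",
    "card was charged twice",
]
labels = ["billing", "bug", "praise", "billing"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(texts, labels)

# Likely -> ['billing'], given the word overlap with the training examples.
print(model.predict(["I was billed two times this month"]))
```

A fitted pipeline like this is at most a few megabytes, runs comfortably on phone-class hardware, and has no per-call cost.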
There's a meme showing someone stepping over all the steps needed to understand how to think about and analyze data, straight to BERT. Well, now people are stepping past BERT to Stable Diffusion and ChatGPT. It's been like this for years. Most work environments suffer from it in a bad way. I don't envy practicing data scientists managing expectations.
Interesting seeing job postings wanting 5+ years of LLM experience. Unless you were at OpenAI working on GPT-1 or at Google on BERT, there's no one in the world with that much experience, and your shitty startup/Fortune 500 company can't afford them anyway.
>Interesting seeing job postings wanting 5+ years of LLM experience.
Not really interesting.
Similar to job postings some years back (and even recently) wanting n+ years of Rails experience when DHH had created it significantly less than n years earlier.
As an ML researcher, I don't think you're far off the mark.
W.r.t. HN, it's almost all hype and no "science". People have strong convictions but not strong evidence. They happily cite papers, but only read the abstracts and miss the essential nuance. Especially in a field where suggesting limitations puts you at high risk of rejection (reviewers just copy-paste them and thank you for the work).
W.r.t. academia, it is a bit better, but I find that in general there are a lot of researchers missing math fundamentals. I know or have met people at top universities or top labs who don't know the difference between likelihood and probability, or who don't understand probability density. Even ones working on diffusion. But I will say that, in general, the most prominent researchers do have these skills. You'll notice, though, that they aren't publishing as fast and their works might not even be as popular. A lot of research right now goes into parameter tuning and throwing compute at the problem. I've been a bit vocal about this, mostly because it is a barrier to other types of research (I'll admit the tuning is needed, but we need to be honest that it isn't high innovation either, and it is hard to prove these results are better given that we haven't tuned other models/architectures to the same degree).
tldr: You're pretty spot on. There's a shit ton of noise in ML/AI. Especially on HN
Edit:
I thought I should also suggest Richard McElreath's Statistical Rethinking (https://xcelab.net/rm/statistical-rethinking/), which is a more enjoyable read than ISLR and will also introduce you to Bayesian stats (the lectures are also on YouTube). I'd also suggest Gelman's Regression and Other Stories (https://avehtari.github.io/ROS-Examples/).
>don't know the difference between likelihood and probability. Similarly ones that don't understand probability density.
I'm a PhD student at a "top university", in a research group primarily focused on data science (NLP, LLMs, blah blah blah). I'm 100% sure I am the only person in the group of ~25 (including profs/postdocs) who knows the difference between f(θ|x) and f(x|θ). In fact, I'm pretty sure I'm the only person who has ever even seen f(θ|x) (because I took a stats sequence out of Casella & Berger). This group puts out dozens of papers a year. My research focus is not data science (compilers).
Well, your name says something about who you are (and might mean you can guess at the roots of mine :). I often find PL people are more likely to have good math chops because math is taken seriously in their field.
Fwiw, at CVPR last year I asked every author of a diffusion paper about likelihood or score, and only two gave me meaningful answers (one compared their model's density against the data's density, which was estimated through an explicit density method. Yeah, parametric vs parametric, but diffusion is not a tractable-density method). It is really impressive that people who work with probability and likelihood every day do not understand the difference (I see many assume they are the same, not just not know the difference).
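For readers following along, the distinction under discussion is standard textbook material; here is a minimal statement (my summary, not taken from any of the papers mentioned):

```latex
% The same function, read two ways:
\[
  \underbrace{f(x \mid \theta)}_{\text{density: fix } \theta,\ \text{vary } x}
  \qquad\text{vs.}\qquad
  \underbrace{L(\theta \mid x) \;:=\; f(x \mid \theta)}_{\text{likelihood: fix the observed } x,\ \text{vary } \theta}
\]
% The density integrates to 1 over x; the likelihood generally does NOT integrate
% to 1 over theta, so it is not a probability distribution on the parameter.
% Bayes' rule is what turns it back into one:
\[
  f(\theta \mid x) \;=\; \frac{f(x \mid \theta)\, f(\theta)}{\int f(x \mid \theta')\, f(\theta')\, d\theta'}.
\]
```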
>and might mean you can guess at the roots of mine :)
your services are required on the busy beaver thread!
>It is really impressive that people who are working with probability and likelihood every day do not understand the difference
i think i come away from the whole experience (the phd, even though i'm not done yet) with a deep skepticism/cynicism of very many things. but it's probably not what you expect. i just don't think the math is at all relevant/important epistemically as long as you can run the experiments efficiently. which is exactly what you see happening - people with access to gobs of compute develop good intuition that leads them towards breakthroughs, and people that don't have access to compute struggle and make do with the formalisms. it's not much different in physics, where the good experimentalists aren't born that way, they're made in the well-funded labs.
i firmly believe that in ML, the math does not matter at all, beyond the tiny bit of calculus and linear algebra you need to kind of understand forwards and backwards. of course every time i say this on here i'm skewered/debated to death on it, as if i don't know what i'm talking about :shrug:
Bayesians and frequentists should bury the hatchet. Both approaches are useful, and the best tool depends on the problem/environment. There's no one-size-fits-all in statistics.
> your services are required on the busy beaver thread!
Lol I didn't even see it. I'm assuming this is w.r.t Mutual Information's video?
> the phd, even though i'm not done yet
I'm at about a similar point (last year). Most of my cynicism though is around academia and publishing. Making conferences the de facto target for publishing was a mistake. Zero-shot submissions in a zero-sum game environment? Can't see how that would go wrong...
> it's not much different in physics, where the good experimentalists aren't born that way, they're made in the well-funded labs.
Coming from the experimental physics side (my undergrad), there is a big factor, though. Generally, the experimentalists who were good at the math and could learn to intuit it (especially the uncertainty) did better. But you're absolutely correct about the __well funded__ part being a big indicator. When I've worked at gov labs I didn't notice a difference in intellect between peers from different schools (of a wide variety of prestige), but what did stand out was simply experience. Your no-name-school physicists could pick up the skills fast, but they just never had the opportunities the prestigious-school students did. It didn't make too big of a difference, but it is an interesting note, especially since it tells us how to make more of those higher-status researchers...
> i firmly believe that in ML, the math does not matter at all
My opinion is that this is highly context dependent. Most research right now is about optimization and tuning, and with respect to that, I fully agree. I'm including in that even some architecture search, such as "replace CNN with Transformer" and similar things. This you can do pretty much empirically. The only big point I'll make here is that people do not understand the limitations of their metrics (especially parametric metrics), the biases of the datasets, and the biases of their architectures, so it creates a really weird environment where we aren't comparing things fairly. (It is also why what works in research doesn't always work out well in industry.)

But if we're talking about interpretability, understanding, novel architecture design, evaluation methods, and so on, then I do think it matters. There's a lot that we can actually understand about ML -- how these models work and how they form answers -- that isn't discussed, not because it hasn't been researched but because the research has a higher barrier to entry and people don't even understand the results. It isn't uncommon to see a top-tier paper empirically find what a theoretical paper (with experiments, but lower compute) found 5-10 years back, where the recent work didn't even know about the prior work. Where the higher-level math really helps is in being able to read deeper and evaluate deeper.

Fwiw, every time I make a "math is necessary" argument, I get a lot of pushback. But I think this is because both sides have two camps. On the pro-math side there are people who legitimately believe it (like me) -- who usually talk about high-dimensional statistics and other things well past calculus -- and people who say it to make themselves feel smart -- people who often think calculus is high-level math or say "linear algebra" as if it is just what's in David Lay's book. On the anti-math side I think there are the hype people who just don't care and the people who are doing other things and don't really end up using it. For the latter, I do think they still benefit a lot from the intuition about these systems that they gained from those math courses. But then again, the classic thing in math education is that you struggle while you learn it and then, once you know it, it is trivial.
For research, I firmly believe you need both the high math and the "knob turners." I just think academia and conferences should focus on the former and industry should focus on the latter. The problem is that we have these people operating in the exact same space, and we're comparing works that used 100+ years of compute hours against works that had a month or two. This isn't a great way to tell whether one architecture is better than another, since hyperparameters matter so much. It's just making for bad research and railroading.
There is, naturally, a reason for this. GPT is effectively a nice wrapper around an otherwise complicated set of issues. I almost think of it as a GUI instead of a console. Yeah, you lose some of the functionality and control, but a lot of people will take it and run with it simply because it is so much easier.
Case in point: Google's AML AI, which promises to do away with pesky model validation and such (because it will do everything in a closed box you will have no reason to investigate). I am already looking forward to the conversations with regulators.
A lot of people do not need to know about underlying concepts in AI - they just use it. Though it is interesting to see that even on this website this seems to be the case.
This is pure gatekeeping. The math behind LLMs, that is, the math behind neural nets, is undergrad freshman-level calculus and some linear algebra. Not really complex at all. Can you deal with derivatives, the chain rule, and matrix multiplications? Great, you know all the "math" behind Deep Learning.
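To make that concrete, here is roughly the level of math being claimed as sufficient: one linear layer, a squared loss, and a gradient that is nothing but the chain rule plus a matrix multiply (a toy NumPy sketch with made-up numbers, not any framework's internals):

```python
# Toy illustration of the "freshman calculus + linear algebra" claim:
# forward and backward pass of a single linear layer with a squared loss,
# sanity-checked against a finite-difference approximation.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # batch of 8 inputs, 3 features
y = rng.normal(size=(8, 1))   # targets
W = rng.normal(size=(3, 1))   # weights of one linear layer

def loss(W):
    err = X @ W - y                   # forward pass: one matrix multiplication
    return 0.5 * np.mean(err ** 2)

# Backward pass via the chain rule: dL/dW = X^T (XW - y) / n
grad = X.T @ (X @ W - y) / len(X)

# Finite-difference check of one gradient entry
eps = 1e-6
W_pert = W.copy()
W_pert[0, 0] += eps
numeric = (loss(W_pert) - loss(W)) / eps
print(grad[0, 0], numeric)  # should agree to several decimal places
```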
That's the beginning of the math, but definitely not "the math behind LLMs". That includes probability theory, measure theory, topology, and more. Most people don't even acknowledge this, but then again, unless you're deep in a subject you don't really know its complexities. Red flags should go off whenever anyone says "it's just <x>" or calls something simple. It's like the professor saying the proof is trivial when that's the hardest part of the entire problem.
People like you are hilarious. You're sitting high in your ivory tower thinking that no one without a PhD in CS from Stanford/Berkeley/MIT can do what you do. Meanwhile, people will take the Fast.ai course and be training full LLMs from scratch in 6 months, all while you moan that "they don't even understand the REAL math". Yawn.
People like you are funny because you don't realize I'm actually also calling out a lot of ivory tower people.
Also, I'm not saying you need the math to train a model. I'm not sure you even need linear algebra to do that; it's mostly just programming. That's why I called it the beginning. I was directly responding to your claim that this is all you need __to understand__. Because let's be real, you don't need to know backprop (and thus derivatives and the chain rule) to train models. The math is about how to analyze your models. You know, specifically what academia is supposed to be doing. Research and engineering overlap, but they aren't necessarily the same thing.
Besides, math education is notorious for being essentially free. Who needs Stanford/Berkeley/MIT when textbooks are widely available? (Btw, CS doesn't typically produce mathematicians.)
If you want, you can get all of that math at any top-500 university, even in undergrad. I agree that people will have impact and deploy these models without that understanding, but you don't need to be at Stanford to gain it, and it's desirable if you want to do research instead of deployment.