There might be more than one reason for an ongoing crisis, and different takes on who's responsible. However, Maduro is responsible for the huge number of refugees fleeing Venezuela, and we (and some other countries in the region) have obligations to help asylum seekers.
Language is just a form; what exactly is encoded inside the model can be very different. And encoding logical reasoning in the weights and activation functions is more than possible.
Models solving IMO-level problems, imo, prove it.
I also think you greatly overestimate human intelligence; the fact that we got general intelligence at all is little more than a side effect of evolution.
Isn't this what Tao is addressing in the link: that LLMs haven't encoded reasoning? Success on the IMO is misleading because the problems are synthetic, with known solutions that are subject to contamination (answers to similar questions are available in textbooks and online).
He also discusses his view on the similarities and differences between mathematics and natural language. Tao says mathematics is driven entirely by efficiency, so presumably using natural language to do mathematics is a step backwards.
That being said, with a setup of 2-4 H100s you'll be able to generate with a batch size of around 128, i.e. it's 128 humans, not one. And just like that, the difference in efficiency isn't that high anymore.
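The batching point is simple arithmetic. A minimal sketch (the throughput and batch-size numbers here are illustrative assumptions, not measurements of any particular setup):

```rust
fn main() {
    // Assumed aggregate throughput for 2-4 H100s serving a batched workload.
    let aggregate_tokens_per_sec = 2_000.0_f64;
    let batch_size = 128.0_f64;

    // Each concurrent stream ("human") still gets a usable rate...
    let per_stream = aggregate_tokens_per_sec / batch_size;
    assert!(per_stream > 10.0);

    // ...but judging efficiency by a single stream understates total
    // output by the batch size, since all 128 streams run at once.
    println!("per-stream tok/s: {per_stream:.1}");
}
```

So a one-model-vs-one-human comparison divides the hardware's output by the batch size before comparing.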
Would you rather be treated (medically) by the first 2,000 people? Do you think code would be written better by the first 2,000? I get being unhappy with the current political class, but claims like this are wild to me.
Politicians aren't making technical choices. They make value judgments.
Not "it's raining, should I wear galoshes?" but "should I wear blue or red pants?" Not "should I buy a sports car or an SUV?" but "do I want to do the things you can do with a sports car, or the things you can do with an SUV?"
It can get confusing because experts can help inform value judgments, but they don't carry any more weight in making them than any other person.
When it comes to those choices, preferring a random selection from a large group over a small, hand-picked group of "experts" is at least an arguable point of view.
> In the original comparison second category of people have much higher intellect than average.
I think intellect may sometimes be weakly and unreliably associated with some kinds of competence, but not with integrity, which is just as important. I think power seekers self-select to be more self-serving.
The issue with your original statement is that it implies a negative correlation between intellect and being a good politician. All the issues you describe (being self-serving and power-seeking) apply to anyone regardless of intellect (and I still think they apply less to people of high intellect, just because they see the bigger picture, but I might be wrong).
We need to optimize for less self-serving behavior and more integrity, but we should strive for smarter people up there too.
The mechanics are exactly the same: it's not Tesla's revenues driving returns for investors, it's new investors putting their money into the stock at a very high price.
If you believe Tesla is a Ponzi scheme, then you also believe that either the SEC is knowingly keeping a Ponzi scheme going (one that is getting included in indexes), or the SEC doesn't know, or you are wrong.
> Groq raised $750 million at a valuation of about $6.9 billion three months ago. Investors in the round included Blackrock and Neuberger Berman, as well as Samsung, Cisco, Altimeter and 1789 Capital, where Donald Trump Jr. is a partner.
It makes it very hard not to think of this as a way to funnel money to the current administration.
I know this sounds conspiracy-theory grade, but $20B is too much for Groq.
The value of Groq comes from its excellent price-to-performance ratio. Its inference speeds are faster than those of H200s, and its costs are among the lowest in the industry. Running similar batch jobs, Groq can sometimes be more than 10x faster than other providers. These figures matter when building practical applications for production use. It's common for me to run workloads on Groq that cost less than $100, while the same workload can approach $1,000 on Bedrock or Gemini. They have tuned a set of open-source models that can now deliver a full application, and the speeds have let me offload a lot of functionality from heuristics to straight-up LLM calls.
To me, intellect has two parts: "creativity" and "correctness". From this perspective, a random sampler is infinitely "creative": over (infinite) time it can come up with an answer to any given problem. And it feels natural that base models are more "creative" (because that's what's being measured in the paper), while RL models are more "correct" (that's the slope of the curve from the paper).
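The "infinitely creative over infinite time" intuition is just the pass@k curve. A sketch with a hypothetical per-sample success probability `p` (the value 0.001 is made up for illustration):

```rust
/// Probability that at least one of k independent samples succeeds,
/// given per-sample success probability p: pass@k = 1 - (1 - p)^k.
fn pass_at_k(p: f64, k: u32) -> f64 {
    1.0 - (1.0 - p).powi(k as i32)
}

fn main() {
    // With few samples, a near-random sampler almost never succeeds...
    assert!(pass_at_k(0.001, 10) < 0.01);
    // ...but pass@k approaches 1 as k grows: "creative" in the limit.
    assert!(pass_at_k(0.001, 10_000) > 0.99);
    println!("pass@10 = {:.4}", pass_at_k(0.001, 10));
    println!("pass@10000 = {:.4}", pass_at_k(0.001, 10_000));
}
```

RL then trades some of that tail coverage for a higher per-sample `p`, which is the slope change the paper measures.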
If you use a lot of Arc<Mutex<Box<T>>>, languages with a proper runtime (like Go or Java) are probably going to be more performant; after all, they are built with those abstractions in mind. So the question isn't only what the nature of the problem is, but also how common the problem is, and whether Rust is the correct way to solve it.
If you use a lot of Arc<Mutex<Box<T>>>, you should probably just learn to use Rust properly and use Arc<Mutex<T>> instead, because it pretty much never makes sense to have a Box inside an Arc...
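A minimal sketch of the usual pattern: Arc already heap-allocates the Mutex and its contents, so the extra Box would only add a second, pointless indirection.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Arc<Mutex<T>>: one heap allocation holding the refcount, the lock,
    // and T itself. Arc<Mutex<Box<T>>> would add a second allocation
    // and pointer chase for nothing.
    let counter: Arc<Mutex<u64>> = Arc::new(Mutex::new(0));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter); // cheap refcount bump
            thread::spawn(move || {
                for _ in 0..1000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    assert_eq!(*counter.lock().unwrap(), 4000);
    println!("{}", *counter.lock().unwrap());
}
```

The Box only earns its keep for unsized contents, e.g. Arc<Mutex<Box<dyn Trait>>>, where T has no fixed size.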
I say that as someone who thinks Rust's learning curve is the main reason it rarely makes economic sense to use it.
If GPU demand growth continues to outpace GPU production growth, that is necessarily going to change. Older GPUs may not be cost-competitive to operate compared with newer ones, but when the alternative is no GPU...