More

espadrine · 2025-09-09T08:39:24 1757407164

Past Mistral investors: JC Decaux (urban advertizing), CMA CGM CEO (maritime logistics), Iliad CEO (Internet service provider), Salesforce (client relation management), Samsung (electronics), Cisco (network hardware), NVIDIA (chips designer)[0]. I agree ASML is a surprising choice, but I guess investments are not necessarily directly connected to the company purpose.

BTW, I generated that list by asking my default search engine, which is Mistral Le Chat: indeed, using Cerebras chips, the responses are so fast that it became competitive with asking Google Search. A lot of comments claim it is worse, but in my experience it is the fastest, and for all but very advanced mathematical questions, it has similar quality to its best competitors. Even LMArena’s Elo indicates it wins 46% of the time against ChatGPT.

[0]: https://mistral.ai/fr/news/mistral-ai-raises-1-7-b-to-accele...

andruby · 2025-09-09T11:49:47 1757418587

The list seems to be missing a couple of other notable investors: Eric Schmidt (former Google CEO), Andreessen Horowitz, Lightspeed Venture Partners, General Catalyst and Microsoft (only $16M).

boringg · 2025-09-09T12:40:56 1757421656

I didn't realize Mistral was A16z's pony in the race unless the splashed across the board(?)

espadrine · 2025-08-20T08:18:41 1755677921

At least it is not unprecedented. Palantir raised a series I in 2020 after 17 years of operation.

Temporary_31337 · 2025-08-20T16:56:35 1755708995

At that time the Palantit valuation was considered 'hefty / overpriced' at $9B. Current stock price valuation, post IPO is a completely detached from fundamentals ~$378B

if you were to apply the same ratio to Databricks it would have to trade at 42 000 000 000 000 000 USD - enough to buy the entire US sovereign debt, the moon, all earth's minerals with plenty to spare. A completely rational market if you ask me.

espadrine · 2025-08-01T22:08:21 1754086101

It would be interesting to have two generations per model without cherry picking, so that the Elo estimation can include an easy-to-compute standard deviation estimation.

espadrine · 2025-07-17T18:49:31 1752778171

The best model there is 2.5B parameters. I can believe that a model 10x bigger is somewhat better.

One element of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard, and shows up as ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS has +1 WER on average compared to ASR, it would imply that Voxtral does have a lead on ASR.

[0]: https://mistral.ai/news/voxtral

nomad_horse · 2025-07-17T18:57:24 1752778644

There are larger models in there, a 8B and a 6B. By this logic they should be above 2B model, yet we don't see this. That's why we have open standard benchmarks, to measure this directly - not hypothesize by the models' sizes or do some cross-dataset arithmetics.

Also note that, Voxtral's capacity is not necessarily all devoted to speech, since it "Retains the text understanding capabilities of its language model backbone"

espadrine · 2025-07-17T15:14:29 1752765269

I agree that there are some robotic designs that unnecessarily mimic human limbs. I have in mind heads, and feet (instead of wheels).

A hand however, is useful because so many manufactured objects have been constructed for their purpose.

swiftcoder · 2025-07-17T16:00:59 1752768059

Feet are used for roughly the same reason a human-like hand is preferable - human-designed spaces tend to not be perfectly compatible with wheeled locomotion.

The ability to negotiate stairs is table stakes for a household robot. It's already a pain when one's Roomba-like is defeated by a small ledge...

espadrine · 2025-07-05T07:12:42 1751699562

Here is someone that had significant corruption until they stopped: https://www.xda-developers.com/why-not-to-spin-down-nas-hard...

There are many similar articles.

philjohn · 2025-07-05T08:34:16 1751704456

I wonder if they were just hit with the bathtub curve?

Or perhaps the fact that my IronWolf drives are 5400rpm rather than 7200rpm means they're still going strong after 4 years with no issues spinning down after 20 minutes.

Or maybe I'm just insanely lucky? Before I moved to my desktop machine being 100% SSD I used hard drives for close to 30 years and never had a drive go bad. I did tend to use drives for a max of 3-5 years though before upgrading for more space.

billfor · 2025-07-05T15:34:12 1751729652

I wonder if it has to do with the type of HDD. The red NAS drives may not like to be spun down as much. I spin down my drives and have not had a problem except for one drive, after 10 years continuous running, but I use consumer desktop drives which probably expect to be cycled a lot more than a NAS.

espadrine · 2025-06-29T19:25:28 1751225128

Easy yes. Even VPS providers need to maintain the IP, since your DNS typically points to that IP. You can also typically move the IP to another machine from the same provider.

But as a resut, VPS often have a different price for public IPs compared to private IPs. For instance, it costs €0.004/h per IP at Scaleway.

ricardo81 · 2025-06-29T19:37:27 1751225847

I've used dozens of VPS providers in the past, albeit 'low end' in this instance- they constantly change IP addresses because they're renting them, buying them, etc.

For IPv4 definitely a problem.

I maybe used around 100 VPS hosts, less well known ones beyong DO etc. I'd get a dozen IP change notices a year.

Case in point: https://lowendtalk.com/discussion/160162/aio-ip-related-ipv4...

espadrine · 2025-03-30T08:36:42 1743323802

> I rather define freedom by the government not deciding what's good for me

Does that mean you are against this bill?

Before the bill, your community could either use your own water system, without fluoride, or use the wider system, which has fluoride.

After this bill, your community no longer can use your own water system with fluoride, and the wider system also does not have fluoride.

On a first read, this bill makes the government remove a choice for you, deciding what is good for you.

(I don’t have a horse in this race honestly. Assuming everyone can get toothpaste and toothbrushes, the effect is the same. But the wording of the bill is strange: “may not add fluoride” rather than just “is allowed not to add”.)

Source: https://archive.is/Nustz

pclmulqdq · 2025-03-30T10:49:47 1743331787

"Your community could decide" is not freedom in the American political sense. In US political theory, the unit of freedom is the individual, full stop.

nulbyte · 2025-03-30T12:19:13 1743337153

I've seen this retort elsewhere. It's not true. It's why corporations are persons with first amendment and other freedoms.

pclmulqdq · 2025-03-30T12:36:09 1743338169

Corporations are not actually persons. You should read the Citizens United opinion if that's what you think it meant.

Corporate owners are people engaging in voluntary transactions. Their freedom was essentially the question in Citizens United. "Corporations are people" emerged from the media as an oversimplified version.

thfuran · 2025-03-30T13:42:02 1743342122

They are not natural persons, but they are juridical persons and were so before Citizens United.

pclmulqdq · 2025-03-30T14:43:25 1743345805

Exactly this. But remember that "legal personhood" basically just means "able to enter contracts" and does not imply any sort of human rights or humanity. It does not mean anything like what "personhood" normally means.

This has also been the case for the entire history of corporations, which is longer than the history of the US.

espadrine · 2025-03-19T11:55:20 1742385320

> I still haven't seen any good reasoning for why NASA would delay the return flight

So many comments spread unsourced assertions on the topic on both sides. Let me change that.

On 24 Aug 2024, NASA stated in a conference published on X[0]:

> NASA has decided that Butch and Suni will return with Crew-9 next February.

So the decision sounded like it stemmed from NASA, and the plan a year ago was for a return in the time frame that actually occurred.

On 28 Sept 2024, the spacecraft that would bring Ms Williams and Mr Wilmore was launched. They restated the same plan[1]:

> A SpaceX Falcon 9 rocket and a Dragon spacecraft will launch Crew-9 to the space station for about a five-month mission. Hague and Gorbunov will join Butch Wilmore and Suni Williams, who are already aboard the space station, and all will return to Earth as a crew of four in February 2025.

Thus there was no action after Mr Trump took office that changed the plan. (Note that the SpaceX article we’re commenting on has a mistake, stating that Mr Grebenkin has come down on 18 March, but Mr Grebenkin came down on 25 Oct; it is indeed Gorbunov that came down.)

On 7 March 2025, Ken Bowersox, associate administrator, Space Operations Mission Directorate, stated[2] on the motivation for this:

> When it comes to adding on missions or bringing a capsule home early: those were always options but we ruled them out pretty quickly just based on how much money we've got in our budget and the importance of keeping crews on the International Space Station.

When asked specifically about a request from Mr Musk to have an earlier return:

> Was anyone outside NASA or the White House involved in the decision not to bring Suni and Butch back sooner? — There may have been some conversations that I wasn't part of. When we made the technical decisions about Starliner […] our leadership at NASA was trying to make sure that we considered everything just at a technical level and that's what we did.

[0]: https://x.com/NASA/status/1827396382702878966

[1]: https://www.nasa.gov/blogs/commercialcrew/2024/09/28/nasas-s...

[2]: https://www.youtube.com/live/Xq5CH-d4IuU?si=-q-J65i7G_d6Mcxs...

espadrine · 2025-03-05T16:13:17 1741191197

> if a llm will run with usable performance at that scale?

Yes.

The reason: MoE. They are able to run at a good speed because they don't load all of the weights into the GPU cores.

For instance, DeepSeek R1 uses 404 GB in Q4 quantization[0], containing 256 experts of which 8 are routed to[1] (very roughly 13 GB per forward pass). With a memory bandwidth of 800 GB/s[3], the Mac Studio will be able to output 800/13 = 62 tokens per second.

[0]: https://ollama.com/library/deepseek-r1

[1]: https://arxiv.org/pdf/2412.19437

[2]: https://www.apple.com/newsroom/2025/03/apple-unveils-new-mac...

_aavaa_ · 2025-03-05T16:33:00 1741192380

This doesn’t sound correct.

You don’t know which expert you’ll need for each layer, so you either keep them all loaded in memory or stream them from disk

espadrine · 2025-03-05T16:39:43 1741192783

In RAM, yes. But if you compute an activation, you need to load the weights from RAM to the GPU core.

_aavaa_ · 2025-03-05T16:44:28 1741193068

Got you, yeah I misread you commend the first time around

kgwgk · 2025-03-05T16:36:11 1741192571

Note that 404 < 512

fullstackchris · 2025-03-05T16:20:14 1741191614

You seem like you know what you are talking about... mind if I ask what your thoughts on quantization are? Its unclear to me if quantization affects quality... I feel like I've heard yes and no arguments

espadrine · 2025-03-05T16:38:31 1741192711

There is no question that quantization degrades quality. The GGUF R1 uses Q4_K_M, which, on Llama-3-8B, increases the perplexity by 0.18[0]. Many plots show increasing degradation as you quantize more[1].

That said, it is possible to train a model in a quantization-aware way[2][3], which improves the quality a bit, although not higher than the raw model.

Also, a loss in quality may not be perceptible in a specific use-case. Famously LMArena.ai tested Llama 3.1 405B with bf16 and fp8, and the latter was only 2 Elo points below, well within measurement error.

[0]: https://github.com/ggml-org/llama.cpp/blob/master/examples/q...

[1]: https://github.com/ggml-org/llama.cpp/discussions/5063#discu...

[2]: https://pytorch.org/blog/quantization-aware-training/

[3]: https://mistral.ai/news/ministraux

sosuke · 2025-03-05T16:27:25 1741192045

I don't know what I'm talking about but when I first asked your question this https://gist.github.com/Artefact2/b5f810600771265fc1e3944228... helped start me on a path to understanding. I think.

But if you don't already know the question your asking is not at all something I could distill down into a sentence or to that would make sense to a lay-person. Even then I know I couldn't distill it at all sorry.

Edit: I found this link I referenced above on quantized models by bartowski on huggingface https://huggingface.co/bartowski/Qwen2.5-Coder-14B-GGUF#whic...

Ambix · 2025-03-06T10:23:50 1741256630

I did my own experiments and it looks like (surprisingly) Q4KM models often outperforms Q6 and Q8 quantised models.

For bigger models (in range of 8B - 70B) the Q4KM is very good, there are no any degradation compared to full FP16 models.