Does anybody know of an inference provider that offers input token caching? It should be almost required for agentic use: first for speed, but also because almost all conversations start where the previous one ended, so costs can end up much higher without caching.
I would have expected good providers like Together, Fireworks, etc. to support it, but I can't find it, unless I run vLLM myself on self-hosted instances.
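For reference, this is roughly what the self-hosted route looks like: a minimal sketch assuming a recent vLLM build with automatic prefix caching; the model name and prompt strings are just placeholders.

```python
# Minimal sketch: self-hosted vLLM with automatic prefix caching enabled.
# Assumes a recent vLLM version; the model name below is only an example.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # swap in whatever model you actually serve
    enable_prefix_caching=True,                # reuse KV cache for shared prompt prefixes
)

shared_history = "..."  # the long conversation prefix that every turn repeats
params = SamplingParams(max_tokens=256)

# The second call shares its prefix with the first, so those prompt tokens
# should hit the KV cache instead of being recomputed from scratch.
out1 = llm.generate(shared_history + "\nUser: first question", params)
out2 = llm.generate(shared_history + "\nUser: follow-up question", params)
```

The equivalent for the OpenAI-compatible server is starting it with the `--enable-prefix-caching` flag; recent vLLM releases may already enable it by default.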
I think people are just too quick to assume this is amazing before it is actually there. Which doesn't mean it won't get there.
Somehow, if I take the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at maybe 75-80%. Not 95. Assuming agents just solve most problems is incorrect, even though they are really good at first prototypes.
Also, in my experience agents are great up to a point and then fall off a cliff. Not gradually. The types of errors you get past that point are so diverse, one cannot even explain them.
I noticed a similar trend in selling on X. Make a claim, peg it to some product A with good sales (Cursor, Claude, Gemini, etc.), then say the best way to use A is with our own product or guide, be it an MCP or something else.
For some of these I see something like 15k followers on X, but then no LinkedIn page, for example. The website is always a company you cannot contact, and they claim to do everything.
Yes. The article is clickbait. With such a title I would have expected the majority of the area to be dummy silicon, but it is just structurally more silicon, exactly like a framed picture may be mostly wood by mass.
Your statement is incorrect. The analysis was done by a professional firm: dummy silicon shims are used because the dies are thinned, per AMD's own disclosures. Those silicon shims are bonded to the compute and SRAM dies.
I ended up disabling Copilot. The reason is that the completions do not always integrate with the rest of the code, in particular producing non-matching brackets. Often it just repeats some other part of the code. I had far fewer cases of this with Cody, though arguably the difference is not huge. And then add the choice of models on top of that.
I noticed I've had a lot fewer of these problems in the last few weeks. I suspect the Copilot team has put a lot more effort into quality-of-life recently.
For instance, I'd often get a problem where I'd type "foo(", and VS Code would auto-close the parenthesis, so my cursor would be in "foo(|)", but Copilot wouldn't be aware of the auto-close, so it would suggest "bar)" as a completion, leading to "foo(bar))" if I accepted it. But I haven't had this problem in recent versions. Other similar papercuts I'd noticed have been fixed.
I haven't used Cody, though, so I don't know how they compare.
It seems recent years have given us a lot of these licenses, first for core infra software and now for LLMs. In very legalese terms they all basically say: these top 5-10 tech companies will not compete fairly with us, thus they are banned from using the software; everyone else is welcome to use everything.
I wonder, if US antitrust regulation actually starts to work well (and I see some signs of that happening), whether all these licenses will revert back to fully open source.
Exactly. The whole thing reads like propaganda. It puts interesting topics up front only to then move on and push an agenda that sounds super political to me.
Yes, some languages are underrepresented and there are some thresholds. But exactly: it is well known that putting the threshold just slightly higher or lower would probably not materially affect the model.
The product they often presented as having started in 20% time is Google News. I don't know the actual details; this is just what I remember from my time at Google (2006-2012).