Does anybody know of an inference provider that offers input token caching? It should be almost required for agentic use: first for speed, but also because almost all conversations start where the previous one ended, so costs can end up much higher without caching.
I would have expected good providers like Together, Fireworks, etc. to support it, but I can't find it, unless I run vLLM myself on self-hosted instances.
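For reference, this is roughly what the self-hosted route looks like: a minimal sketch assuming a recent vLLM build with automatic prefix caching; the model name and prompt strings are just placeholders.

```python
# Minimal sketch: self-hosted vLLM with automatic prefix caching enabled.
# Assumes a recent vLLM version; the model name below is only an example.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # swap in whatever model you actually serve
    enable_prefix_caching=True,                # reuse KV cache for shared prompt prefixes
)

shared_history = "..."  # the long conversation prefix that every turn repeats
params = SamplingParams(max_tokens=256)

# The second call shares its prefix with the first, so those prompt tokens
# should hit the KV cache instead of being recomputed from scratch.
out1 = llm.generate(shared_history + "\nUser: first question", params)
out2 = llm.generate(shared_history + "\nUser: follow-up question", params)
```

The equivalent for the OpenAI-compatible server is starting it with the `--enable-prefix-caching` flag; recent vLLM releases may already enable it by default.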
I think people are just too quick to assume this is amazing before it is actually there. Which doesn't mean it won't get there.
Somehow, if I take the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at maybe 75-80%. Not 95. Assuming agents just solve most problems is incorrect, even though they are really good at first prototypes.
Also, in my experience agents are great up to a point and then fall off a cliff. Not gradually. The types of errors you get past that point are so diverse, one cannot even explain them.
I noticed a similar trend in selling on X. Make a claim, peg it to some product A with good sales (Cursor, Claude, Gemini, etc.), then say the best way to use A is with our own product or guide, be it an MCP or something else.
For some of these I see something like 15k followers on X, but then no LinkedIn page, for example. The website is always a company you cannot contact, and they claim to do everything.
Yes. The article is clickbait. With such a title I would have expected the majority of the area to be dummy silicon, but it is just structurally more silicon, exactly like a framed picture may be mostly wood by mass.
Your statement is incorrect. The analysis was done by a professional firm: dummy silicon shims are used because the dies are thinned, per AMD's own disclosures. Those silicon shims are bonded to the compute and SRAM dies.
I ended up disabling Copilot. The reason is that the completions do not always integrate with the rest of the code, in particular producing non-matching brackets. Often it just repeats some other part of the code. I had far fewer cases of this with Cody, though arguably the difference is not huge. And then add the choice of models on top of that.
I noticed I've had a lot fewer of these problems in the last few weeks. I suspect the Copilot team has put a lot more effort into quality-of-life recently.
For instance, I'd often get a problem where I'd type "foo(", and VS Code would auto-close the parenthesis, so my cursor would be in "foo(|)", but Copilot wouldn't be aware of the auto-close, so it would suggest "bar)" as a completion, leading to "foo(bar))" if I accepted it. But I haven't had this problem in recent versions. Other similar papercuts I'd noticed have been fixed.
I haven't used Cody, though, so I don't know how they compare.
It seems recent years have given us a lot of these licenses, first for core infra software and now for LLMs. In very legalese terms they all basically say: these top 5-10 tech companies will not compete fairly with us, thus they are banned from using the software; everyone else is welcome to use everything.
I wonder, if US antitrust regulation actually starts to work well (and I see some signs of that happening), whether all these licenses will revert back to fully open source.
Exactly. The whole thing reads like propaganda. It puts interesting topics up front only to then move on and push an agenda that sounds super political to me.
Yes, some languages are underrepresented and there are some thresholds. But exactly: it is well known that putting the threshold just slightly higher or lower would probably not materially affect the model.
The product they often presented as having started in 20% time is Google News. I don't know the actual details; this is just what I remember from my time at Google (2006-2012).