Yes, I have some experience with TiDB. It is pretty amazing, actually. They came up with a novel way of distributing data across nodes that gives strong consistency while also maintaining great performance. We are recommending it to some of our clients who are looking for an easy scaling option with MySQL (TiDB is MySQL-compatible at the connector level).
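For anyone wondering what "MySQL-compatible at the connector level" means in practice: a stock MySQL driver can talk to TiDB unchanged. A minimal sketch (host and credentials are placeholders; 4000 is TiDB's default SQL port):

```python
# Any MySQL driver works because TiDB speaks the MySQL wire protocol.
# Host/user/password below are placeholders, not a real deployment.
import pymysql

conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="root", password="", database="test")
try:
    with conn.cursor() as cur:
        cur.execute("SELECT VERSION()")
        print(cur.fetchone())  # MySQL-compatible version string with a TiDB suffix
finally:
    conn.close()
```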
> I asked the AI to write me some code to get a list of all the objects in an S3 bucket
They didn’t ask for all the objects in the first returned page of the query; they asked for all the objects. The necessary context is there.
LLMs are just on par with devs who don’t read tickets properly / don’t pay attention to the API they’re calling (I’ve had this exact case happen with someone on a previous team, and it was a combination of both).
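For reference, getting all the objects means paginating; a rough boto3 sketch (the bucket name is a placeholder):

```python
import boto3

s3 = boto3.client("s3")
keys = []
# list_objects_v2 caps each response at 1000 keys, so paginate to get them all
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-bucket"):
    for obj in page.get("Contents", []):
        keys.append(obj["Key"])
```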
The binaries are a few KB at the moment. I haven't measured the build times, because it's too early for that to mean much. The set of supported types and functions is very limited, so it will change a lot over time as I add more stuff.
One interesting thing is that `eval()` support will require custom WebAssembly host functions, because you can't do custom code generation in WASM. Thus by default the project will assume "no eval" compilation. In this mode it will be possible to do a lot of optimizations, for example removing unused parts of the language/types, applying certain optimizations knowing exactly which types the script is dealing with, etc. So a simple script that doesn't use a lot of the builtins should eventually result in a fairly small binary.
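To illustrate what a host function looks like in general (this is a generic wasmtime-py sketch, not the project's actual interface; `host_eval` is a made-up name standing in for whatever an eval shim would call):

```python
from wasmtime import Engine, Store, Module, Instance, Func, FuncType, ValType

engine = Engine()
store = Store(engine)

# Guest module that delegates "eval-like" work to an imported host function,
# since it cannot generate and run new code inside the sandbox itself.
module = Module(engine, """
(module
  (import "env" "host_eval" (func $host_eval (param i32) (result i32)))
  (func (export "run") (param i32) (result i32)
    local.get 0
    call $host_eval))
""")

# The host supplies the implementation; here it just doubles the input.
host_eval = Func(store, FuncType([ValType.i32()], [ValType.i32()]), lambda x: x * 2)

instance = Instance(store, module, [host_eval])
print(instance.exports(store)["run"](store, 21))  # -> 42
```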
If you're using an LLM as a compressed version of a search index, you'll be constantly fighting hallucinations. Respectfully, you're not thinking big-picture enough.
There are LLMs today that are amazing at coding, and when you allow them to iterate (e.g. respond to compiler errors), the quality is pretty impressive. If you can run an LLM 3x faster, you can enable a much bigger feedback loop in the same period of time.
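A rough sketch of that kind of loop, with `generate_code` standing in for whatever model/client you use and Go's compiler as the checker:

```python
import subprocess
import tempfile

def generate_code(prompt: str) -> str:
    """Placeholder for an actual LLM call (OpenAI, Ollama, whatever)."""
    raise NotImplementedError

def compile_loop(task: str, max_iterations: int = 3) -> str:
    """Ask the model for code, try to build it, and feed errors back until it compiles."""
    prompt = task
    source = ""
    for _ in range(max_iterations):
        source = generate_code(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".go", delete=False) as f:
            f.write(source)
        result = subprocess.run(["go", "build", f.name], capture_output=True, text=True)
        if result.returncode == 0:
            return source  # it builds; hand off to tests / human review
        # Feed the compiler output back so the model can fix its own mistakes.
        prompt = f"{task}\n\nPrevious attempt:\n{source}\n\nCompiler errors:\n{result.stderr}"
    return source
```

A 3x faster model means roughly 3x more of these round trips in the same wall-clock time.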
There are efforts to enable LLMs to "think" by using chain-of-thought, where the LLM writes out reasoning in a "proof"-style list of steps. Sometimes, like a person, they reach a dead end logic-wise. If you can run 3x faster, you can start to run the "thought chain" as more of a "tree", where the logic is critiqued and adapted and where many different solutions can be tried. This can all happen in parallel (well, each sub-branch can).
Then there are "agent" use cases, where an LLM has to take actions on its own in response to real-world situations. Speed really impacts user-perception of quality.
> There are LLMs today that are amazing at coding, and when you allow them to iterate (e.g. respond to compiler errors), the quality is pretty impressive. If you can run an LLM 3x faster, you can enable a much bigger feedback loop in the same period of time.
Well, now the compiler is the bottleneck, isn't it? And you would still need a human to check for bugs that aren't caught by the compiler.
Still nice to have inference speed improvements tho.
Something will always be the bottleneck, and it probably won’t be the speed of electrons for a while ;)
Some compilers (Go) are faster than others (javac), and some languages are interpreted and can only be checked through tests. Moving the bottleneck from the AI code-gen step to the same bottleneck a person has seems like a win.
And yet it takes a non-zero amount of time. I think an apt comparison is a language like C++ vs Python. Yea, technically you can write the same logic in both, but you can't genuinely say that "spelling out the code" takes the same amount of time in each. It becomes a meaningful difference across weeks of work.
With LLM pair programming, you can basically say "add a button to this widget that calls this callback" or "call this API with the result of this operation", and the LLM will spit out code that does that thing. If your change is entirely within 1-2 files and under 300 LOC, it arrives in a few seconds, right in your IDE, and is probably syntactically correct.
It's human-driven, and the LLM just handles the writing. The LLM isn't doing large refactors, nor is it designing scalable systems on its own. A human is doing that still. But it does speed up the process noticeably.
If the speed is used to get better quality with no more input from the user, then sure, that is great. But that is not the only way to get better quality (though I agree there is some low-hanging fruit in the area).
To be honest, most LLMs are reasonable at coding, but they're not great.
Sure, they can code small stuff.
But they can't refactor large software projects, or upgrade them.
Upgrading large Java projects is exactly what AWS wants you to believe their tooling can do, but the ergonomics aren't great.
I think most of the capability problems with coding agents aren't the AI itself, it's that we haven't cracked how to let them interact with the codebase effectively yet. When I refactor something, I'm not doing it all at once, it's a step by step process. None of the individual steps are that complicated. Translating that over to an agent feels like we just haven't got the right harness yet.
Honestly, most software tasks aren’t refactoring large projects, so it’s probably OK.
As the world gets more internet-connected and more online, we’ll have an ever-expanding list of “small stuff”: glue code that mixes an ever-growing list of data sources/sinks and visualizations together. Many of which are “write once” and left running.
Big companies (e.g. Google) have built complex build systems (e.g. Bazel) to isolate small reusable libraries within a larger repo, which was a necessity to help unbelievably large development teams manage a shared repository. An LLM acting in its small corner of the world seems well suited to this sort of tooling, even if it can’t refactor large projects spanning large changes.
I suspect we’ll develop even more abstractions and layers to isolate LLMs and their knowledge of the world. We already have containers and orchestration enabling “serverless” applications, and embedded webviews for GUIs.
Think about ChatGPT and its Python interpreter, or Claude and its web view. They all come with nice harnesses to support a boilerplate-free playground for short bits of code. That may continue to accelerate and grow in power.
> The biggest time sink for me is validating answers so not sure I agree on that take.
But you're assuming that it'll always be validated by humans. I'd imagine that most validation (and subsequent processing, especially going forward) will be done by machines.
By comparison with reality. The initial LLMs had "reality" be a training set of text; when ChatGPT came out, everyone rapidly expanded into RLHF (reinforcement learning from human feedback), and now that there are vision-and-text models, the training and feedback are grounded in a much broader slice of reality than just text.
That's one way to do it, but it's overkill for this specific thing; self-driving cars or robotics, or natural use of smart-[phone|watch|glass|doorbell|fridge] devices, are likely sufficient.
Total surveillance may be necessary for other reasons, like making sure organised crime can't blackmail anyone because the state already knows it all, but it's overkill for AI.
Not if you source your training data from reality.
Are you treating "the internet" as "reality" with this line of questions?
The internet is the map; don't mistake the map for the territory. It's fine as a bootstrap but not as the final result, just like it's OK for a human to research a topic by reading Wikipedia but not to use it as the only source.
Sooner or later someone is going to figure out how to do active training on AI models. It's the holy grail of AI before AGI. This would allow you to do base training on a small set of very high quality data, and then let the model actively decide what it wants to train on going forward or let it "forget" what it wants to unlearn.
1. AI can do what we can do, in much the same way we can do it, because it's biologically inspired. Not a perfect copy, but close enough for the general case of this argument.
2. AI can't ever be perfect because of the same reasons we can't ever be perfect: it's impossible to become certain of anything in finite time and with finite examples.
3. AI can still reach higher performance in specific things than us — not everything, not yet — because the information processing speedup going from synapses to transistors is of the same order of magnitude as walking is to continental drift, so when there exists sufficient training data to overcome the inefficiency of the model, we can make models absorb approximately all of that information.
Does the AI need to know or the curator of the dataset? If the curator took a camera and walked outside (or let a drone wander around for a while), do you believe this problem would still arise?
This looks really interesting and could be a nice addition to my daily work.
I just downloaded the application but am unable to add OpenAI API keys. It looks like it's probably on my end (I run quite aggressive DNS blocking lists). So my guess here is: I'm unable to add API keys when telemetry is blocked.
Suggestion: please add some error message when this occurs. As in: did the request fail (500), was the key faulty, etc.
Thank you for the direct actionable feedback, we will improve that messaging.
Regarding debugging your specific problem: when an API key is added, the local process attempts a 1-token request to the cheapest model on that platform (in your case, gpt-4o-mini on OpenAI) to verify that the key works. Though if the account has no balance, this request may fail even though it costs a fraction of a fraction of a penny (and anything that fails that request will cause the API key to be considered invalid).
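Conceptually the check is just something like this (a sketch using the OpenAI Python SDK, not the exact implementation):

```python
from openai import OpenAI, AuthenticationError, APIConnectionError, RateLimitError

def key_looks_valid(api_key: str) -> bool:
    """Fire a minimal 1-token request at a cheap model to verify the key."""
    client = OpenAI(api_key=api_key)
    try:
        client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "ping"}],
            max_tokens=1,
        )
        return True
    except AuthenticationError:
        return False  # bad key
    except RateLimitError:
        return False  # no balance/quota; still treated as an invalid key here
    except APIConnectionError:
        return False  # DNS/network blocked, as in the parent comment
```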
Built a couple of things with Semantic Kernel: some private test projects, but also two customer-facing applications and one internal one.
It's heavily tilted towards OpenAI and its offerings (either through the OpenAI API or through Azure). However, it works decently enough with other alternatives as well, like Hugging Face or Ollama. Compared to the others (CrewAI etc.), I kind of feel like Semantic Kernel hasn't really solved observability yet. Sure, you can connect whatever logging/metrics solution .NET supports, but it's not as seamless as the others. Semantic Kernel is available in .NET, Java and Python, but it's quite obvious .NET is a lot more polished than the others. Python usually gets new features faster, or at least PoCs or previews.
Some learnings from it all:
- It's quite easy to get started with
- I like the distinction between native plugins and text-based ones (whether a plugin should run code or not); see the sketch below
- There is a feeling of black magic in the background, in the sense of observability
- A bit more manual work to get things in order, compared to the alternatives
- Rapid development; it's quite clear the development team at Microsoft is putting a lot of work into this library
All in all, if you feel comfortable writing C#, then Semantic Kernel is totally a viable option. If you prefer Python over anything else, then I would say LlamaIndex or LangChain is probably a better option (for now).
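For the native-plugin point above, a minimal sketch against the Python SDK (decorator and method names are from my memory of the 1.x API and may differ between Semantic Kernel versions):

```python
from datetime import date
from typing import Annotated

from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function

class TimePlugin:
    """A 'native' plugin: plain code the model can invoke, as opposed to a prompt-based one."""

    @kernel_function(name="today", description="Returns today's date in ISO 8601 format.")
    def today(self) -> Annotated[str, "The current date"]:
        return date.today().isoformat()

kernel = Kernel()
kernel.add_plugin(TimePlugin(), plugin_name="time")
```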
Thanks. I would have preferred to use Go instead of Python, but somehow the language isn't picking up much in terms of new LLM frameworks.
As of now, I am using very lightweight abstractions over prompts in Python, and that gets the job done. But it is way too early, and I can see how pipelining multiple LLM calls would need a good library that is not too complex and involved. In the end it is just an API call, and you hope for the best result :)
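The kind of lightweight pipelining I mean is barely more than this (with `complete()` standing in for whichever API you actually call):

```python
from typing import Callable

Step = Callable[[str], str]

def complete(prompt: str) -> str:
    """Placeholder for a single LLM API call."""
    raise NotImplementedError

def pipeline(*steps: Step) -> Step:
    """Chain steps so each one's output becomes the next one's input."""
    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return run

summarize = lambda text: complete(f"Summarize the following:\n{text}")
translate = lambda text: complete(f"Translate to English:\n{text}")
summarize_then_translate = pipeline(summarize, translate)
```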
Since you prefer Go, you might be interested in one of my pet projects, where I've glued together some libraries that basically let you code all interactions with LLMs through Lua. The project is written in Go.
Currently it only supports Ollama, but I've been thinking about adding support for more providers.
As you can see, it's at a very early stage. I'm not a Go developer, and I use this repository as a way to explore things both within Ollama and with Go.
I'll probably add more things as time goes by, but it isn't something I hack on every day or, for that matter, every week. It's just something to poke around and explore things with.
I don't see it like that at all. Some 0-days can (somewhat) be mitigated by other hardware/software.
I'd rather have as many "known" 0-days out in the open than have it the other way, even if it means I won't see any updates to affected devices or software.
Just want to say thank you for your contribution. A lot of the wonky stuff made it really funny to watch, but at the same time impressive (for what it was).
Do you have any links to share with details about the project? It would be fun to understand more about the process, but also the work in general.
A lot of it is not mine to share. I was basically a CXO hire brought in by Skyler Hartle and Brian Habersberger, who created it*, to work with them on assessing the viability of the product they'd built behind the scenes as a business, its levels of scale, etc. Skyler and Brian are amazing, genius-level dudes (and also HN users!). I know the whole story end to end, pretty much exactly as it unfolded, and it's not really anything like this blog post thinks it is, to the point that I don't think I even agree with the author. Again: I'm sorry, but much of it is not mine to share, for a great many reasons.
*As a side project, over years and years, before it took off "overnight". There is a lot of unmentioned tech they built that's totally unrelated to LLMs etc., much of it still worth not mentioning, because it's still "cutting edge" even today.
It's going to be interesting to see what Wendell finds out about the oddities on Windows.
I think with this CPU it will also be the first time I'll no longer dual boot for gaming, or for that matter have a dedicated Windows machine only for gaming.
Does anyone have any experience with TiDB? I hadn't heard about it before this post.