
It's WALL-E, but for devs.

The NovaMin version is called Clinical Repair in Germany.

Thanks!

You can have that today with Nim.

Nim feels like a really amazing language. There were some minor things I wanted to do with it, like trying to solve a Codeforces question out of mere curiosity, just to build something on top of it.

I felt that although it is similar to Python, you can't underestimate Python's standard-library features, which I found lacking in Nim. I'm not sure if that was a skill issue. Yes, they are similar languages, but I would still say I really welcome a language like SPy too.

The funny thing is that I ended up architecting a really complicated solution to a simple problem in Nim and was proud of it. Then I asked ChatGPT, thinking there was no way it could find anything simpler in Nim, and it came back with something that worked in roughly 7-12 lines; my jaw dropped. Maybe ChatGPT could be decent for learning Nim, or reading some Nim books for sure, but the package environment etc. felt really brittle as well.

I think there are good features in both Nim and SPy, and I welcome both personally.


GPT is amazing at Nim. I've used it to find a subtle bug in a macro that's hundreds of lines of code.

There don't seem to be great web frameworks like Flask, Django, or FastAPI for Nim.

"Great" smells very subjective. I went to https://forum.nim-lang.org/ . Put "flask" in the search box. Second hit (https://forum.nim-lang.org/search?q=flask) is this: https://forum.nim-lang.org/t/11032 . That mentions not one but 2 projects (https://github.com/HapticX/happyx & https://github.com/planety/prologue).

If either/both are not "great enough" in some particulars you want, why not raise a github issue? (Or even better look into adding said particulars yourself? This is really the main way Python grew its ecosystem.)


That's where it's at. I'm using the 1600D vectors from OpenAI models for findsight.ai, stored SuperBit-quantized. Even without fancy indexing, a full scan (1 search vector -> 5M stored vectors) takes less than 40ms. And with basic binning, it's nearly instant.
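As a rough illustration of the idea (not the actual findsight.ai code; plain sign-binarization stands in for SuperBit here, and sizes are scaled down so it runs quickly):

    import numpy as np

    N, D = 100_000, 1600          # toy corpus; the real one is ~5M vectors

    def binarize(v):
        # Sign-quantize and pack: 1600 float dims -> 200 bytes per vector.
        return np.packbits(v > 0, axis=-1)

    rng = np.random.default_rng(0)
    stored = binarize(rng.standard_normal((N, D), dtype=np.float32))

    def search(query, k=10):
        q = binarize(query)
        # Hamming distance = popcount of XOR over the packed bytes.
        dists = np.unpackbits(stored ^ q, axis=-1).sum(axis=1)
        return np.argsort(dists)[:k]

    print(search(rng.standard_normal(D, dtype=np.float32)))

The scan is just one XOR-plus-popcount pass over the packed bits, which is why it can stay fast even without an index.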

This is at the expense of precision/recall though, isn't it?

With the quant size I'm using, recall is >95%.

Approximate nearest neighbor searches don't cost precision. Just recall.

In terms of GC quality, Nim comes to mind.

I keep ignoring Nim for some reason. How fast is it with all the checks on? The benchmarks for it, Julia, and Swift typically turn off safety checks, which is not how I would run them.

Since anything/0 = infinity, these kinds of things always depend upon what programs do and, as a sibling comment correctly observes, how much they interfere with SIMD autovectorization and several other things.

That said, as a rough guideline, nim c -d=release can certainly be almost the same speed as -d=danger and is often within a few (single-digit) percent. E.g.:

    .../bu(main)$ nim c -d=useMalloc --panics=on --cc=clang -d=release -o=/t/rel unfold.nim
    Hint: mm: orc; opt: speed; options: -d:release
    61608 lines; 0.976s; 140.723MiB peakmem; proj: .../bu/unfold.nim; out: /t/rel [SuccessX]
    .../bu(main)$ nim c -d=useMalloc --panics=on --cc=clang -d=danger -o=/t/dan unfold.nim
    Hint: mm: orc; opt: speed; options: -d:danger
    61608 lines; 2.705s; 141.629MiB peakmem; proj: .../bu/unfold.nim; out: /t/dan [SuccessX]
    .../bu(main)$ seq 1 100000 > /t/dat
    .../bu(main)$ /t
    /t$ re=(chrt 99 taskset -c 2 env -i HOME=$HOME PATH=$PATH)
    /t$ $re tim "./dan -n50 <dat>/n" "./rel -n50 <dat>/n"
    225.5 +- 1.2 μs (AlreadySubtracted)Overhead
    4177 +- 15 μs   ./dan -n50 <dat>/n
    4302 +- 17 μs   ./rel -n50 <dat>/n
    /t$ a (4302 +- 17)/(4177 +- 15)
    1.0299 +- 0.0055
    /t$ a 299./55
    5.43636... # kurtosis=>5.4 sigmas is not so significant
Of course, as per my first sentence, the best benchmarks are your own applications run against your own data and its idiosyncratic distributions.

EDIT: btw, /t -> /tmp which is a /dev/shm bind mount while /n -> /dev/null.


In Julia, at least, bounds checks tend to be a pretty minor hit (~20%) unless the bounds check gets in the way of vectorization

I've always tried to avoid situations that could lead to SQLITE_BUSY. SQLITE_BUSY is an architecture smell. For standard SQLite in WAL, I usually structure an app with a read "connection" pool, and a single-entry write connection pool. Making the application aware of who _actually_ holds the write lock gives you the ability to proactively design access patterns, not try to react in the moment, and to get observability into lock contention, etc.
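Roughly this shape, as a Python sketch (the real pools are fancier; the path and helper names here are made up):

    import sqlite3, threading

    DB_PATH = "app.db"   # hypothetical path

    # The one and only write connection; WAL so readers never block it.
    _writer = sqlite3.connect(DB_PATH, check_same_thread=False)
    _writer.execute("PRAGMA journal_mode=WAL")
    _writer_lock = threading.Lock()

    def write(fn):
        # Run fn(conn) on the single writer: the app, not SQLite,
        # decides who holds the write lock.
        with _writer_lock:
            with _writer:            # implicit BEGIN ... COMMIT / ROLLBACK
                return fn(_writer)

    # Read side: one read-only connection per thread (stand-in for a pool).
    _local = threading.local()

    def read(fn):
        conn = getattr(_local, "conn", None)
        if conn is None:
            conn = _local.conn = sqlite3.connect(
                f"file:{DB_PATH}?mode=ro", uri=True, check_same_thread=False)
        return fn(conn)

Instrumenting write() is also where the lock-contention observability comes from: you can time how long callers wait on _writer_lock.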

Even with that pattern (which I use too) you still need to ensure those write operations always start a transaction at the beginning in order to avoid SQLITE_BUSY.
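In Python terms that means an explicit BEGIN IMMEDIATE up front; a sketch, assuming you control the writer connection:

    def write_immediate(conn, fn):
        # Take the write lock at the start of the transaction (BEGIN
        # IMMEDIATE) instead of at the first write statement, so
        # contention shows up as a busy_timeout wait here rather than
        # as SQLITE_BUSY in the middle of the transaction.
        conn.isolation_level = None      # manage transactions explicitly
        conn.execute("BEGIN IMMEDIATE")
        try:
            result = fn(conn)
            conn.execute("COMMIT")
            return result
        except Exception:
            conn.execute("ROLLBACK")
            raise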

Yes, indeed. In my apps, which are mostly Nim, my pool manager ensures this always happens - along with a host of other optimizations. I often start with barebones SQLite and then later switch to LiteSync (distributed SQLite with multi-master replication), so I keep the lock management at the app level to adapt to whatever backend I'm using.

I am really curious about LiteSync. Any chance you could share a bit on your experiences with it (recognising it’s somewhat off-topic…). Do you run with multiple primaries? What sort of use cases do you reach to it for? Conflict resolution seems a bit simplistic at first glance (from the perspective of someone very into CRDTs), have you experienced any issues as a result of that?

Yes! This is the way.

Honestly, it's the key to getting the most out of SQLite. It also allows for transaction batching and various other forms of batching that can massively improve write throughput.
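A rough sketch of what that batching can look like, with a queue feeding a single writer (names invented):

    import queue

    jobs = queue.Queue()   # (sql, params) tuples from any thread

    def writer_loop(conn, batch_size=256):
        # Drain queued writes and commit them in batches: one fsync per
        # batch instead of one per statement.
        while True:
            batch = [jobs.get()]
            while len(batch) < batch_size:
                try:
                    batch.append(jobs.get_nowait())
                except queue.Empty:
                    break
            with conn:                   # single transaction for the batch
                for sql, params in batch:
                    conn.execute(sql, params)

    # e.g. threading.Thread(target=writer_loop, args=(writer_conn,),
    #      daemon=True).start()
    # jobs.put(("INSERT INTO log(msg) VALUES (?)", ("hello",)))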


I mean, you're not wrong, and that is one way to solve it, but the whole point of a sensibly-designed WAL -- never mind database engine -- is that you do not need to commit to some sort of actor model to get your db to serialise writes.

These are performance optimizations. SQLite does serialize writes. Avoiding concurrent writes to begin with just avoids some overhead on locking.

"performance optimisation" --- yeees, well, if you don't care about data integrity between your reads and writes. Who knows when those writes you scheduled really get written. And what of rollbacks due to constraint violations? There's we co-locate transactions with code: they are intertwined. But yes, a queue-writer is fine for a wide range of tasks, but not everything.

The problem is that we need to contort our software to make SQLite not suck at writes.


This is just FUD. The reason SQLite does locking to begin with is to avoid data corruption. Almost every statement this blog post makes about concurrency in SQLite is wrong, so it's little surprise that their application doesn't do what they expect.

>Who knows when those writes you scheduled really get written

When a commit completes for a transaction, that transaction has been durably written. No mystery. That's true whether you decide to restrict writes to a single thread in your application or not.


> When a commit completes for a transaction, that transaction has been durably written. No mystery. That's true whether you decide to restrict writes to a single thread in your application or not.

Usually this is true but there are edge cases for certain journaled file systems. IIRC sqlite.org has a discussion on this.


> there are edge cases for certain journaled file systems. IIRC sqlite.org has a discussion on this.

Can't currently find it but I guess it comes under the "if the OS or hardware lies to SQLite, what can it do?" banner?


That might have been it. Overall the whole "How to corrupt your database" article was quite a good read:

https://sqlite.org/howtocorrupt.html


You are talking about low-level stuff like syncing to the filesystem, ensuring that data is journalled and atomicity is maintained; I am, in actual fact, not.

Dislocating DML from the code that triggers it creates many problems around ensuring proper data integrity, and it divorces you from consistent reads of uncommitted data that you may want to tightly control before committing. By punting it to a dedicated writer you're removing the ability to ensure serialised modification of your data and the ability to cleanly react to integrity errors that may arise. If you don't need that? Go ahead. But it's not FUD. We build relational, ACID-compliant databases this way for a reason.


Oh, I think you're picturing executing your transaction logic and then sending writes off to a background queue. I agree, that's not a general strategy - it only works for certain cases.

I just meant that if you can structure your application to run write transactions in a single thread (the whole transaction and its associated logic, not just deferring writing the end result to a separate thread), then you minimize contention at the SQLite level.


> Who knows when those writes you scheduled really get written

I await the write to complete before my next read in my application logic, same as any other bit of code that interacts with a database or does other IO. Just because another thread handles interacting with the writer connection, doesn't mean my logic thread just walks away pretending the write finished successfully in 0ms.
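Something like this, reusing the write()/read() helpers sketched further up the thread (table and values are made up):

    from concurrent.futures import ThreadPoolExecutor

    write_pool = ThreadPoolExecutor(max_workers=1)   # the single writer thread

    # Logic thread: hand the write off, then block until it actually
    # committed (or raised a constraint violation) before reading.
    fut = write_pool.submit(write, lambda c: c.execute(
        "UPDATE accounts SET balance = balance - 1 WHERE id = ?", (42,)))
    fut.result()    # re-raises any error from the writer right here
    rows = read(lambda c: c.execute(
        "SELECT balance FROM accounts WHERE id = ?", (42,)).fetchall())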


SQLite, for the most part, uses polling locks. That means it checks if a lock is available to be taken, and if it's not, it sleeps for a bit, then checks again, until this times out.

This becomes increasingly inefficient as contention increases, as you can easily get into a situation where everyone is sleeping, waiting for others, for a few milliseconds.

Ensuring all, or most, writes are serialized improves this.
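In Python's sqlite3, for instance, that polling window is what the connect timeout / busy_timeout knob controls (a small illustration, not tied to anyone's codebase):

    import sqlite3

    # When this connection finds the write lock taken, it sleeps and
    # retries for up to 5 seconds before surfacing SQLITE_BUSY. With many
    # concurrent writers, most of that time is spent sleeping, which is
    # why funnelling writes through one connection helps.
    conn = sqlite3.connect("app.db", timeout=5.0)
    conn.execute("PRAGMA busy_timeout = 5000")   # same knob, in milliseconds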


Yeah, it's just heat. Some e-bikes have a clutch to disengage the motor to prevent this motor braking.


Wouldn't it be much simpler to disengage the battery electrically than with a physical clutch? Or is the "clutch" just electronic?

And where is the heat dumped? Why does the physical resistance disappear when the battery is disconnected?


If you want to get reliable automated fixes today, I'd encourage you to enable code scanning on your repo. It's free for open-source repos and includes Copilot Autofix (also for free).

We've already seen more than 100,000 fixes applied with Autofix in the last 6 months, and we're constantly improving it. It's powered by CodeQL, our deterministic and in-depth static analysis engine, which also recently gained support for Rust.

To enable it, go to your repo -> Security -> Code scanning.

Read more about how autofix works here: https://docs.github.com/en/code-security/code-scanning/manag...

And stay tuned for GitHub Universe in a few weeks for other relevant announcements ;).

Disclaimer: I'm the Product lead on detection & remediation engines at GitHub


Please tell your people about 2FA SMS delivery issues to certain West African countries. I'd rather have it via email or have the option of WhatsApp

I was fine before 2FA and I'm willing to pay to go without. Same username

Can't scan my code if I can't access my account


> Crazy amount at a crazy high speed

That's 300GB/s slower than my old Mac Studio (M1 Ultra). Memory speeds in 2025 remain thoroughly unimpressive outside of high-end GPUs and fully integrated systems.


The server systems have that much memory bandwidth per socket. Also, that generation supports DDR5-6400 but they were using DDR5-5200. Using the faster stuff gets you 614GB/s per socket, i.e. a dual socket system with DDR5-6400 is >1200GB/s. And in those systems that's just for the CPU; a GPU/accelerator gets its own.

The M1 Ultra doesn't have 800GB/s because it's "integrated", it simply has 16 channels of DDR5-6400, which it could have whether it was soldered or not. And none of the more recent Apple chips have any more than that.
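For reference, the arithmetic behind those per-socket and M1 Ultra figures (a quick sketch assuming 64-bit channels):

    def bandwidth_gb_s(channels, mt_per_s, bus_bits=64):
        # MT/s * bytes per transfer * number of channels, in GB/s (decimal)
        return channels * mt_per_s * (bus_bits // 8) / 1000

    print(bandwidth_gb_s(12, 5200))   # ~499 GB/s: 12ch DDR5-5200
    print(bandwidth_gb_s(12, 6400))   # ~614 GB/s: 12ch DDR5-6400 per socket
    print(bandwidth_gb_s(16, 6400))   # ~819 GB/s: 16 channels, i.e. the M1 Ultra's ~800GB/s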

It's the GPUs that use integrated memory, i.e. GDDR or HBM. That actually gets you somewhere -- the RTX 5090 has 1.8TB/s with GDDR7, the MI300X has 5.3TB/s with HBM3. But that stuff is also more expensive which limits how much of it you get, e.g. the MI300X has 192GB of HBM3, whereas normal servers support 6TB per socket.

And it's the same problem with Apple even though there's no great reason for it to be. The 2019 Intel Xeon Mac Pro supported 1.5TB of RAM -- still in slots -- but the newer ones barely reach a third of that at the top end.


> The M1 Ultra doesn't have 800GB/s because it's "integrated", it simply has 16 channels of DDR5-6400, which it could have whether it was soldered or not.

The M1 Ultra has LPDDR5, not DDR5. And the M1 Ultra was running its memory at 6400MT/s about two and a half years before any EPYC or Xeon parts supported that speed—due in part to the fact that the memory on a M1 Ultra is soldered down. And as far as I can tell, neither Intel nor AMD has shipped a CPU socket supporting 16 channels of DRAM; they're having enough trouble with 12 channels per socket often meaning you need the full width of a 19-inch rack for DIMM slots.


LPDDR5 is "low power DDR5". The difference between that and ordinary DDR5 isn't that it's faster, it's that it runs at a lower voltage to save power in battery-operated devices. DDR5-6400 DIMMs were available for desktop systems around the same time as Apple. Servers are more conservative about timings for reliability reasons, the same as they use ECC memory and Apple doesn't. Moreover, while Apple was soldering their memory, Dell was shipping systems using CAMM with LPDDR5 that isn't soldered, and there are now systems from multiple vendors with CAMM2 and LPDDR5X.

Existing servers typically have 12 channels per socket, but they also have two DIMMs per channel, so you could double the number of channels per socket without taking up any more space for slots. You could also use CAMM which takes up less space.

They don't currently use more than 12 channels per socket even though they could because that's enough to not be a constraint for most common workloads, more channels increase costs, and people with workloads that need more can get systems with more sockets. Apple only uses more because they're using the same memory for the GPU and that is often constrained by memory bandwidth.


> Existing servers typically have 12 channels per socket, but they also have two DIMMs per channel, so you could double the number of channels per socket without taking up any more space for slots. You could also use CAMM which takes up less space.

Usually this comes at a pretty sizable hit to MHz available. For example STH notes that their Zen5 ASRock Rack EPYC4000D4U goes from DDR5-5600 down to DDR5-3600 with the second slot populated, a 35% drop in throughput. https://www.servethehome.com/amd-epyc-4005-grado-is-great-an...


It comes with a drop in performance because there are then two sticks on the same channel. Having the same number of slots and twice as many channels would be a way around that.

(It's also because of servers being ultra-cautious again. The desktops say the same thing in the manual but then don't enforce it in the BIOS and people run two sticks per channel at the full speed all over the place.)


Do you have a benchmark that shows the M1 Ultra CPU to memory throughput?


Ah nice, I’ve been meaning to try TCC with Nim.

