It's very surprising to me that the state-of-the-art tools for data entry and digitizing still require a lot of supervision. From the article it's not that surprising that handwritten documents are harder for old-school OCR or AI, as those can be hard even for humans in some cases. But tables and different layouts seem like low-hanging fruit for vision models.
Speaking of "the state of the art tools": those might be 6 months or 20 years old. Surfaced opinions might rely on software that a company licensed 2 years ago. Sadly, we need to take this enterprise speed of adoption into account.
This might not be bad as long as Astral is allowed to continue to work on improving ty, uv and ruff. I do worry that they'll get distracted by their Codex job duties, though.
Kind of looks like Minecraft if it was built out of Voxatron (millions of little destructible cubes). That seems like a very, very difficult thing to do at that scale. On top of that, making an engine and a language. This guy must have interesting things to say.
>the reference implementation from Physically Based Rendering (Pharr, Jakob, Humphreys)
I'd like to know a little about the process you went through for the port. That book sounds like an excellent resource to start from, but what was it like using it and the code?
I did lots of manual refactoring of the initial prototype in Trace.jl (by Anton Smirnov, who I think ported an earlier version of the pbrt book).
This helped me familiarize myself with the math, the infrastructure, and the general problems a raytracer faces, and it laid the groundwork for the overall architecture and for knowing what to pay attention to for fast GPU execution. One key insight was that you don't actually need an UberMaterial: instead you can use a MultiTypeSet for storing different materials and lights, which allows fast and concretely typed iteration.
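The per-type-storage idea can be sketched with a rough C++ analogy (hypothetical Lambertian/Metal types, not the actual pbrt or Trace.jl design): keep one homogeneous vector per concrete material type instead of a single tagged union, so every iteration loop stays monomorphic, which is the property that makes specialized (GPU) code generation straightforward.

```cpp
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical material types, stood in for illustration.
struct Lambertian { float albedo; };
struct Metal { float roughness; };

// A "multi-type set": a tuple of homogeneous vectors, one per type,
// instead of one vector of a fat UberMaterial union.
using MaterialSet = std::tuple<std::vector<Lambertian>, std::vector<Metal>>;

// Visit every material; the callable is instantiated once per concrete
// type, so each inner loop is monomorphic (no per-element branching).
template <typename F>
void for_each_material(const MaterialSet& set, F&& f) {
    std::apply([&](const auto&... vecs) {
        auto visit_vec = [&](const auto& vec) {
            for (const auto& m : vec) f(m);
        };
        (visit_vec(vecs), ...);
    }, set);
}
```

Each expansion of the fold visits one concretely typed vector, which is roughly what makes this pattern friendlier to fast, specialized codegen than dispatching on a type tag per element.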
Then I found that pbrt moved away from the initial design, and I used Claude Code to port large parts of the new C++ code to Julia. This led to a pretty bad port, and I had lots of back and forth to fix bugs, improve the GPU acceleration, make the code more concise and "Julian", and correct the AI's mistakes and bogus design decisions ;) This polish isn't really over yet, but it works well enough and is fast enough for a beta release!
As someone who currently dabbles in both, that prediction seems a bit unrealistic. Julia is a fantastic language, but it has some trade-offs that need to be considered. Probably the most well known is `time to first x`. Julia, like Python, is used comfortably in notebooks, but loading libraries can take a minute, compared to Python where it happens right away. That may lead you to not reach for it when you want to do quick testing of something, especially plotting. You can mitigate this somewhat by loading all the libraries you'll ever need at startup (preferably long before you are ready to experiment), but that assumes you already know what libraries you'll need for what you're wanting to try.
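The load-at-startup mitigation can be as simple as a few lines in your Julia config file (a sketch; assumes Plots and DataFrames are the packages you actually use, and that they're already installed):

```julia
# ~/.julia/config/startup.jl -- hypothetical example: preload the
# packages you always reach for, so their load cost is paid once when
# the session starts rather than in the middle of experimenting.
using Plots
using DataFrames
```

The trade-off is exactly the one described above: every session pays the load time up front, whether or not you end up needing those packages.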
What prediction? Maybe I need to rephrase what I said: my prediction is that if Julia ever wants to have a shot at replacing Python, it absolutely has to solve the time to first x problem!
That's what I mean by shipping fully ahead-of-time compiled binaries and interpreting more glue code, both of which have the potential to solve the time to first x problem.
That's probably down to exceptions, possibly the only C++ feature that pervades your code even if you don't use it. The compiler has to emit code so that an exception at any point unwinds the stack in a sensible way, etc. Try -fno-exceptions (and -fno-rtti might save some memory while you're at it).
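One practical consequence of -fno-exceptions is that failures then need a non-throwing reporting style. A minimal sketch of the common std::optional pattern (parse_int is a hypothetical helper, not from the thread):

```cpp
#include <cassert>
#include <optional>
#include <string>

// With -fno-exceptions, a function can't throw on bad input;
// returning std::optional (or an error code) is a common substitute.
std::optional<int> parse_int(const std::string& s) {
    if (s.empty()) return std::nullopt;
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return std::nullopt;  // reject non-digits
        value = value * 10 + (c - '0');
    }
    return value;
}
```

Callers check `has_value()` instead of wrapping the call in try/catch, which is what lets the compiler skip generating unwind machinery entirely.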
Regrettably not every C++ feature is free if you don't use it. But there aren't many that aren't.
I like the idea of a DSL for scraping, but my scrapers do more than extract text. I also download files (+ monitor download progress) and intercept images (+ check for partially loaded or failed images). So it seems my use case isn't really covered by this.
Thanks for the idea, actually! It's difficult to cover every use case in the 0.1.0 release, but I'll take this into account. Downloading files/images could likely be abstracted into just an HTTP source, and the data sources could be merged in some way.
I had to look up what KDL is and what `Functional Source License, Version 1.1, ALv2 Future License` is.
So KDL is like another JSON or YAML.
FSL-1.1-ALv2 is an almost-but-not-quite open source license that after 2 years becomes available under a real open source license. It's to prevent freeloading by companies or something. Sounds fine to me, actually.
Effectively, it's not meant to restrict people from using it, even in a commercial setting, just to protect my personal interests in what I want to do with it in a commercial setting.
KDL is more than just another JSON or YAML: it's node-based. Its output in libraries is effectively an AST, and its use cases are open-ended.
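For a flavor of that node-based structure, here's a small KDL fragment (hypothetical content, loosely in the style of the examples on kdl.dev): each line is a named node that can carry arguments, key=value properties, and nested child nodes.

```kdl
// a node with a child block
package {
    name "my-app"        // node with one string argument
    version "1.0.0"
    // a node with a property and its own children
    dependencies platform="windows" {
        winapi "1.0.0"
    }
}
```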
Domain-Specific Language. It's a fairly common acronym, I feel like. SQL is a DSL, so is CSS, and Rust macros let you create DSLs, for example. The opposite is a general-purpose language, like Python or JavaScript.
I've been looking into UI libraries lately, like Qt and Slint, and wondered why they chose to create a DSL over just using CSS, HTML, and a little bit of JS. But I imagine C++ generation from a small DSL is easier than from JS.