Hacker News | mullr's comments

"The Lightning Tamers", by Kathy Joseph, is a wonderful and accessible book about the history of electricity. Her youtube channel is great too: https://www.youtube.com/c/KathyLovesPhysicsHistory



I had the opposite experience. I worked for a company with a bunch of Clojure projects, written by people of varying levels of experience. I had to do some cross-cutting changes and feared the worst. But when I actually got down to it, everything more or less made sense.

Why did this happen?

- We had a small common framework that everybody used, at the very highest level (think application lifecycle management). That imposed some amount of consistency at the most basic run-the-program stage.

- The devs communicated openly, a lot, so there was some general consensus on what to do, and what not to do.

- The team at large was very suspicious of introducing new macros. You could do it, but you'd better have a really good reason.

- When I went to make the changes, I didn't have to worry about spooky-action-at-a-distance kinds of consequences anywhere NEAR as much as I do in other languages. Being strict with your state management, as Clojure strongly encourages, REALLY pays off here.

The actual problems I had were entirely related to the overall build system, the fractured nature of the source control, and figuring out who was responsible for what code once we were 3 reorgs deep. The code itself was remarkably resilient to all this nonsense.


> We had a small common framework that everybody used

Python applications seem to benefit from this as well, and I've encountered a surprising amount of resistance to it from other developers. I think everyone has been burned at least once by "over-designing", building too much of the wrong abstraction. But as a result, they're never willing to commit to a common internal framework, even long after the need for one has become painfully obvious.

Usually this happens among developers who have been solo developing a project for a while, and see themselves as YAGNI zealots fighting the good fight against excessive abstraction and overengineering.

I understand and sympathize with the sentiment. But when every design decision is ad-hoc and as-needed, it makes it really really hard for external contributors to make changes to the existing codebase. It discourages contributors from "big-picture thinking" and eventually leads to the dreaded Ball of Mud design, with some combination of:

• meandering flow control

• poorly-defined or nonexistent interface boundaries

• inconsistent naming

• lack of documentation, or incorrect documentation and/or comments

• redundant safety checks, or absent safety checks

• poor runtime performance

• difficult-to-test code that freely mixes I/O and business logic, requiring complicated test fixtures, tests that are difficult or impossible to change, and poor test coverage (antipattern and code smell: "idk how to test that, don't waste your time. did you run it on the QA environment and check that it worked?")

I think it arises as a misunderstanding of the forces that lead to overengineering and building incorrect abstractions in the first place. The problem is usually one of expanding the scope of an abstraction or framework too early, not of building the abstraction or framework in the first place.


Every Linux C/C++/Rust developer should know about https://github.com/KDAB/hotspot. It's convenient and fast. I use it for Rust all the time, and it provides all of these features on the back of regular old `perf`.


What perf record settings do you use? Trying to use dwarf has never worked well for me with Rust, so I've been using LBR, but even then it seems to get which instructions belong to which function wrong a significant portion of the time.


I've had no problems with dwarf.


With optimizations and debug symbols turned on, and the arch specified, perf report very often puts some pieces of functions in the calling function for me when I use dwarf. Do you do anything specific in the build?


You need to activate the (somewhat slow?) inline-stack-aware addr2line integration for optimized builds.

It doesn't place them there, they exist there (due to inlining).
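For what it's worth, here's the general shape of a setup that works for me (the binary name is made up; the Cargo fragment keeps debug info in optimized builds):

```shell
# Cargo.toml — keep debug info in release builds:
#   [profile.release]
#   debug = true

# DWARF-based call graphs:
perf record --call-graph dwarf ./target/release/myapp
# or LBR-based, on CPUs that support it:
perf record --call-graph lbr ./target/release/myapp
perf report
```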


Ah, very nice! I’ve been looking for something like Instruments on Linux since I like to click around and this looks cool.


Error recovery in nom is left as a very obtuse exercise to the reader. Custom error reporting is difficult at best. That stuff is supposed to be better in chumsky; I don’t know if it actually is.

However, my own parser is currently written in nom, and my plan is to port it over to tree-sitter. Its error recovery is completely automatic, and a fair sight better than anything I have time to do by hand.


nom chumsky?


Thank you for this revelation. I'd always imagined nom being about "eating tokens" but this makes so much sense for a parser.


How do tree-sitter's ergonomics compare to these other two?


Caveats: I've used nom in anger, chumsky hardly at all, and tree-sitter only for prototyping. I'm using it for parsing a DSL, essentially a small programming language.

The essential difference between nom/chumsky and tree-sitter is that the former are libraries for constructing parsers out of smaller parsers, whereas tree-sitter takes a grammar specification and produces a parser. This may seem like a small distinction at first, but it makes a massive difference in practice.

As far as ergonomics go, that's a rather subjective question. On the surface, the parser combinator libraries seem easier to use. They integrate well with the host language, so you can stay in the same environment. But this comes with a caveat: parser combinators are a functional programming pattern, and Rust is only kind of a functional language, if you treat it juuuuust right. This will make itself known when your program isn't quite right; I've seen type errors that take up an entire terminal window or more. It's also very difficult to decompose a parser into functions. In the best case, you need to write your functions to be generic over type constraints that are subtle and hard to write. (Again, if you get this wrong, the errors are overwhelming.) I often give up and just copy the code. I have at times believed that some of these types are impossible to write down in a program (and can only exist in the type inferencer), but I don't know if that's actually true.

*deep breath*

Tree-sitter's user interface is rather different. You write your grammar in a JavaScript internal DSL, which gets run and produces a JSON file, and then a code generator reads that and produces C source code (I think the codegen is now written in Rust). This is a much more roundabout way of getting to a parser, but it's worth it because: (1) tree-sitter was designed for parsing programming languages while nom very clearly was not, and (2) the parsers it generates are REALLY GOOD. Tree-sitter knows about operator precedence, which nom cannot handle natively (there's a PR open for the next version: https://github.com/Geal/nom/pull/1362). Tree-sitter's parsing algorithm (GLR) is also tolerant of recursion patterns, left recursion in particular, that will send a parser combinator library off into the weeds unless it uses special transformations to accommodate them.

It might sound like I'm shitting on nom here, but that's not the goal. It's a fantastic piece of work, and I've gotten a lot of value from it. But it's not for parsing programming languages. Reach for nom when you want to parse a binary file or protocol.

As for chumsky: the fact that it's a parser combinator library in Rust means that it's going to be subject to a lot of the same issues as nom, fundamentally. That's why I'm targeting tree-sitter next.

There's no reason tree-sitter grammars couldn't be written in an internal DSL, perhaps in parser-combinator style (https://github.com/engelberg/instaparse does this). That could smooth over a lot of the rough edges.


Tree-sitter appears to be ultra-focused on producing valid syntax trees really fast. This is great for e.g. syntax highlighting, but suboptimal in cases where you are writing a reference parser for your custom language and want to provide very useful error descriptions. Chumsky seems to be more suited for the latter (and also has a section of its tutorial about precedence[0], so it seems to handle at least that case).

This overview of parser tradeoffs may be helpful: https://blog.jez.io/tree-sitter-limitations/.

[0] https://github.com/zesterer/chumsky/blob/master/tutorial.md#...


For Kagi, at least, there's a very well integrated search customization method that they didn't bother to show here. For any search result, you can add a ranking adjustment for the site it came from. This is directly in the results, so it's very accessible, and quite easy. One of the choices is 'pin', which is fantastic for technical work: 'sqlite.org' is now boosted over everything else, for me, and it's exactly what I want. I could just as easily take it out, if it becomes a problem.


Yeah, it's really not. You CAN do Knuth-style literate programming in org-mode (https://orgmode.org/manual/Extracting-Source-Code.html). I used it to make http://mullr.github.io/micrologic/literate.html.

The experience completely cured me of Knuth-style literate programming, fwiw. It's really great for making a lasting artifact about a program that's completely done. But I can count the number of programs I've worked on like that on zero fingers. Even this one isn't really done, but the cost of updating the essay along with the code discouraged me from working on it any more.


> It's really great for making a lasting artifact about a program that's completely done.

My emacs configuration is the exact opposite of a program that's completely done, but I find literate programming good for managing its complexity.

> The experience completely cured me of Knuth-style literate programming, fwiw.

If it's not too much to ask, do you mind sharing some of the pain points?


> If it's not too much to ask, do you mind sharing some of the pain points?

To me, the point of literate programming is that you have a coherent (literate, if you will) document that explains how the program actually works, and the reason it's put together the way it is. This is NOT an easy thing to write. It takes as much organization as the program itself. I found the document structure to be continuously in flux as I updated the program to deal with new requirements. So either the document would be poorly structured, or I would spend a LOT of time keeping it good.


Note that Knuth has written dozens of books and is perpetually making edits and corrections to them (not just typos, but often extensive rewrites while making sure that the book remains coherent as a whole): this is an activity that comes easily to him, so one can see how he can do it with programs too. The rest of us may find it harder.

But then again, we can ask: if we cannot keep the program's document structure as a whole up-to-date, then are we really editing the program properly? A major risk of introducing bugs is that one may edit a part of a program without taking into account the broader context, such that the integrity of the program is lost, and literate programming (which forces us to update "the whole program" every time) can be considered a mitigation of this risk… so IMO the greater time that it takes to update the whole program could actually be a good thing, saving time in debugging or whatever.

Edit: Also, part of the trick is to put only as much as you think is really relevant in the "text" part: literate programming does not necessarily mean over-commenting everything (Knuth doesn't either), it's just an orientation that what you're doing is writing a document. It's ok for most of that document to be code, as long as you think you've presented it well enough.


Yeah, I encountered this difficulty while writing the "literate" API documentation that my blog post was based on. The example in my post is very simplistic. The real thing makes numerous database queries to set up data, then creates an entire mock "client" with a mock employee, mock location, mock services, appointments, etc., by making successive requests to the (local dev) API. Order matters. And it's still a pretty simple example of literate programming (if, as someone mentioned earlier, it could be called "literate programming" at all in the sense that Knuth meant it).

In that regard, I'd say it is. True, we don't tangle and weave to separate artifacts. In fact, it doesn't tangle at all. However, I'd argue that in this case tangling isn't the point of the program. It's essentially a TUI to build an HTML artifact. Obviously we can't tangle `restclient` source blocks, but nor are we just dumping the results of `restclient` requests. Everything is piped into bash and post-processed by `jq` to clean up the final result for display.

That said, I accept that it's still not really "literate programming". It's literate API documentation.


I’ve experimented with NOWEB in the past and it’s cool. I would imagine it being good for teaching. Like “these are the steps we have to take to accomplish the desired result” and then break down every step with the ability to actually build the code at the end.


You mean like 'u' for micro, and 'lambda'? I think this is pretty common.

Regardless, you're probably doing yourself a disservice if you're allowing things like that to take choices out of your toolbelt. Perhaps the worst offender is TLA+, where you have to write actual ascii art and latex inside your code (yes, I know model/spec). I despise it, to be clear, but it's still a pretty good tool and often the right thing to reach for


Or Agda where you basically have to use emacs with the agda extension to get all the characters.

Less than or equal ends up looking like:

  data _≤_ : ℕ → ℕ → Set where
    z≤n : {n : ℕ} → zero ≤ n
    s≤s : {n m : ℕ} → n ≤ m → suc n ≤ suc m


And to really rub salt in the wound, they have a syntax that looks very much like stock latex for math symbols, and SOME of them are the same, but not all of them! (all / forall is the one that comes to mind; it's been a little while)


It always ALWAYS trips me up that TLA+ uses \A and \E while LaTeX uses \forall and \exists


What are people doing about this on the client side? The solution that comes to mind is to do all my Rust builds in a sandbox of some kind, but with rust-analyzer involved, I'd likely have to put my editor in there as well.


There's some work towards moving the scarier parts of rust builds (e.g. procedural macros, that run arbitrary code) into a wasm-based sandbox. E.g. [1]. Obviously doesn't make the final artifacts safe to run though, and I also wouldn't trust LLVM to have no bugs exploitable by feeding it bad code, but at least it would raise the bar.

[1] https://github.com/dtolnay/watt

Edit: And someone on reddit brought up vscode's dev containers [2], to move everything into docker. Obviously docker isn't really a security sandbox, but again it raises the bar.

[2] https://code.visualstudio.com/docs/remote/containers


At first glance, watt looks like a substantial improvement that would close the door on arbitrary code execution by proc macro crates. Yes, please! While this may not solve the general problem of package identity validation, it closes a Rust-specific hole that hopefully doesn't need to exist.

Now if only `build.rs` could be nerfed...


build.rs is particularly useful for Rust because it is routinely used to compile C/C++ object files as a previous step, which is crucial to having solid Rust to C/C++ FFI.

It is no different from a ./configure script or any other pre-build script. Lots of builds require these, and "nerfing" it just makes building Rust harder. Cargo is already a crippled build system that requires extensions like cargo-make to be useful. Getting rid of something so fundamentally required by modern software, with no standard fallback, would be a massive blow to the ecosystem.

I really am not convinced that there is anything "scary" about a build.rs file - other than that standard tools like rust-analyzer find it sane to run external code during initialization. Your language server shouldn't be coupled to the build system and require it to run!

(And yes, Cargo is a build system - it's just a bad one)


*sigh*, probably "nerfed" wasn't the greatest choice of words... I'm writing such an FFI crate right now, and I use a `build.rs`. I can still wish that the package management system didn't have to fall back to running arbitrary code, or that there was some way to sandbox that code. That would make it easier for people to trust my crate!


