> business rules engines, complex event processing, and related technologies are still marginal in the industry for reasons I don't completely understand
Translating between the complex, implicit intentions expressed in colloquial language and the formal languages used in software and proof assistants is usually very time-consuming and difficult.
By the time you’ve formalized the rules, the context in which they made sense will have changed and much of the work will be outdated. Plus, time and money spent formalizing rules is time and money not spent on core business needs.
That's definitely true, but I do think production rules have some uses that are less obvious.
For instance, XSLT is not "an overcomplicated Jinja 2"; it's based on production rules. Hardly anybody seems to know that, so they just think it's a Jinja 2 that doesn't do what they want.
Production rules are remarkably effective at dealing with deep asynchrony, say a process that involves some steps done by computers and some done by people, like a loan application being processed by a bank that has to be looked at by a loan officer. They could be an answer to the asynchronous communication problems in the web browser. See also complex event processing.
Production rules could also be a more disciplined way to address the problems that stored procedures in databases are used to solve.
I've written systems where production rules are used in the control plane to set up and tear down data pipelines with multiple phases in a way that can exploit the opportunistic parallelism that can be found in sprawling commercial batch jobs. (The Jena folks told me what I was doing wasn't supported but I'd spent a lot of time with the source code and there was no problem.)
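To make concrete what I mean by production rules driving an asynchronous, multi-step process, here is a minimal forward-chaining sketch in plain Python. It is not Jena, XSLT, or any real engine's syntax, and the fact tuples and rule names are made up for illustration:

```python
# Minimal forward-chaining production-rule sketch (illustrative only;
# the fact tuples and rule names are invented, not from any real engine).

facts = {("application", "app-1", "submitted")}

def rule_assign_officer(wm):
    """When an application is submitted and unassigned, assign a loan officer."""
    new = set()
    for (kind, app, state) in wm:
        if (kind == "application" and state == "submitted"
                and ("assigned", app, "officer") not in wm):
            new.add(("assigned", app, "officer"))
    return new

def rule_approve(wm):
    """When a review has passed, mark the application approved."""
    new = set()
    for (kind, app, state) in wm:
        if kind == "review" and state == "passed":
            new.add(("application", app, "approved"))
    return new

RULES = [rule_assign_officer, rule_approve]

def run(wm):
    """Fire rules until no rule adds a new fact (a fixpoint)."""
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            added = rule(wm) - wm
            if added:
                wm |= added
                changed = True
    return wm

run(facts)                                # assigns an officer, then stalls
facts.add(("review", "app-1", "passed"))  # a human finishes their step, hours later
run(facts)                                # picks up exactly where things left off
print(("application", "app-1", "approved") in facts)  # True
```

The point of the shape is that external events (a human finishing a review, a batch job completing) just add facts to working memory, and the next pass of the rule loop picks up from whatever state exists, which is what makes this style comfortable with deep asynchrony.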
In order to get to your destination, you need to explain where you want to go. Whatever you call that “imperative language”, in order to actually get the thing you want, you have to explain it. That’s an unavoidable aspect of interacting with anything that responds to commands, computer or not.
If the AI misunderstands those instructions and takes you to a slightly different place than you want to go, that’s a huge problem. But it’s bound to happen if you’re writing machine instructions in a natural language like English and in an environment where the same instructions aren’t consistently or deterministically interpreted. It’s even more likely if the destination or task is particularly difficult or complex to explain at the desired level of detail.
There’s a certain irreducible level of complexity involved in directing and translating a user’s intent into machine output simply and reliably. People keep trying to “solve” it, but the issue keeps reasserting itself generation after generation. Over half a century ago, COBOL was “plain English,” and people assumed it would make interacting with computers like giving instructions to another employee.
The primary difficulty is not the language used to articulate intent; the primary difficulty is articulating intent.
This is a weak argument. I use normal taxis and ask the driver to take me to a place in natural language, a process which is certainly non-deterministic.
And the taxi driver has an intelligence that enables them to interpret your destination, even if it's ambiguous. Even then, mistakes happen all the time, with taxis going to a different place than the passenger intended because the names were similar.
The specific events that follow when asking a taxi driver where to go may not be exactly repeatable, but reality enforces physical determinism that is not explicitly understood by probabilistic token predictors. If you drive into a wall you will obey deterministic laws of momentum. If you drive off a cliff you will obey deterministic laws of gravity. These are certainties, not high probabilities. A physical taxi cannot have a catastrophic instant change in implementation and have its wheels or engine disappear when it stops to pick you up. A human taxi driver cannot instantly swap their physical taxi for a submarine, they cannot swap new york with paris, they cannot pass through buildings… the real world has a physically determined option-space that symbolic token predictors don’t understand yet.
And the reason humans are good at interpreting human intent correctly is not just that we’ve had billions of years of training with direct access to physical reality, but because we all share the same basic structure of inbuilt assumptions and “training history”. When interacting with a machine, so many of those basic unstated shared assumptions are absent, which is why it takes more effort to explicitly articulate what it is exactly that you want.
We’re getting much better at getting machines to infer intent from plain English, but even if we created a machine which could perfectly interpret our intentions, that still doesn’t solve the issue of needing to explain what you want in enough detail to actually get it for most tasks. Moving from point A to point B is a pretty simple task to describe. Many tasks aren’t like that, and the complexity comes as much from explaining what it is you want as it does from the implementation.
We can (unreliably) write more code in natural English now. At its core it’s the same thing: detailed instructions telling the computer what it should do.
I’ve been saying this for years now: you can’t avoid communicating what you want a computer to do. The specific requirements have to be stated somewhere.
Inferring intent from plain English prompts and context is a powerful way for computers to guess what you want from underspecified requirements, but the problem of defining what you want specifically always requires you to convey some irreducible amount of information. Whether it’s code, highly specific plain English, or detailed tests, if you care about correctness they all basically converge to the same thing and the same amount of work.
> if you care about correctness they all basically converge to the same thing and the same amount of work.
That's the part I'd push back on. They're not the same amount of work.
When I'm writing the code myself, it's basically a ton of "plumbing" of loops and ifs and keeping track of counters and making sure I'm not making off-by-one errors and not making punctuation mistakes and all the rest. It actually takes quite a lot of brain energy and time to get that all perfect.
It saves a lot of time to write the function definition in plain English, have the LLM generate a bunch of tests that you verify are the correct definition... and then let the LLM take care of all the loops and indexing and punctuation and plumbing.
I regularly cut what used to be an entire afternoon or day's worth of work down into 30 minutes. I spend 10 minutes writing the design for what will be 500-1,000 lines of code, 5 minutes answering the LLM's questions about it, 5 minutes skimming the code to make sure it all looks vaguely plausible (no obvious red flags), 5 minutes ensuring the unit tests cover everything I can think of (almost always, the LLM has thought of a bunch of edge cases I never would have bothered to test), and another 5 minutes telling it to fix things, like its unit tests make me suddenly realize there's an edge case that should be defined differently.
The idea that it's the "same amount of work" is crazy to me. It's so much more efficient. And in all honesty, the code is more reliable too because it tests things that I usually wouldn't bother with, because writing all the tests is so boring.
> When I'm writing the code myself, it's basically a ton of "plumbing" of loops and ifs and keeping track of counters and making sure I'm not making off-by-one errors and not making punctuation mistakes and all the rest. It actually takes quite a lot of brain energy and time to get that all perfect.
All of that "plumbing" affects behavior. My argument is that all of the brain energy used when checking that behavior is necessary in order to check that behavior. Do you have a test for an off-by-one error? Do you have a test to make sure your counter behaves correctly when there are multiple components on the same page? Do you have a test to make sure errors don't cause the component to crash? Do you have a test to ensure non-UTF-8 text or binary data in a text input throws a validation error? Etc, etc. If you're checking all the details for correct behavior, the effort involved converges to roughly the same thing.
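To be concrete about the kind of checking I mean, here is a sketch of those tests in pytest style; `paginate` and `validate_text_input` are hypothetical stand-ins for whatever the real code under test does:

```python
# Pytest-style sketches of the edge-case checks mentioned above.
# `paginate` and `validate_text_input` are toy, hypothetical examples.
import pytest

def paginate(items, page_size):
    # Toy implementation so the tests below actually run.
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]

def validate_text_input(data: bytes) -> str:
    # Toy implementation: reject anything that is not valid UTF-8 text.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError("input must be UTF-8 text") from exc

def test_no_off_by_one_at_exact_page_boundary():
    # 10 items with a page size of 5 must give exactly 2 pages, not 3.
    assert len(paginate(list(range(10)), 5)) == 2

def test_trailing_partial_page_is_kept():
    assert paginate(list(range(11)), 5)[-1] == [10]

def test_non_utf8_input_raises_validation_error():
    with pytest.raises(ValueError):
        validate_text_input(b"\xff\xfebinary junk")
```

None of these is hard to write on its own; the work is in enumerating them, and that enumeration is the same understanding you need to write the plumbing correctly in the first place.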
If you're not checking all of that plumbing, you don't know whether or not the behavior is correct. And the level of abstraction used when working with agents and LLMs is not the same as when working with a higher level language, because LLMs make no guarantees about the correspondence between input and output. Compilers and programming languages are meticulously designed to ensure that output is exactly what is specified. There are bugs and edge cases in compilers and quirks based on different hardware, so it's not always 100% perfect, but it's 99.9999% perfect.
When you use an LLM, you have no guarantees about what it's doing, and in a way that's categorically different than not knowing what a compiler does. Very few people know all of the steps that break down `console.log("hello world")` into the electrical signals that get sent to the pixels on a screen on a modern OS using modern hardware given the complexity of the stack, but they do know with as close as is humanly possible to 100% certainty that a correctly configured environment will result in that statement outputting the text "hello world" to a console. They do not need to know the implementation because the contract is deterministic and well defined. Prompts are not deterministic nor well defined, so if you want to verify it's doing what you want it to do, you have to check what it's doing in detail.
Your basic argument here is that you can save a lot of time by trusting the LLM will faithfully wire the code as you want, and that you can write tests to sanity check behavior and verify that. That's a valid argument, if you're ok tolerating a certain level of uncertainty about behavior that you haven't meticulously checked or tested. The more you want to meticulously check behavior, the more effort it takes, and the more it converges to the effort involved in just writing the code normally.
> If you're checking all the details for correct behavior, the effort involved converges to roughly the same thing.
Except it doesn't. It's much less work to verify the tests.
> That's a valid argument, if you're ok tolerating a certain level of uncertainty about behavior that you haven't meticulously checked or tested.
I'm a realist, and know that I, like all other programmers, am fallible. Nobody writes perfect code. So yes, I'm ok tolerating a certain level of uncertainty about everybody's code, because there's no other choice.
I can get the same level of uncertainty in far less time with an LLM. That's what makes it great.
> Except it doesn't. It's much less work to verify the tests.
This is only true when there is less information in those tests. You can argue that the extra information you see in the implementation doesn't matter as long as it does what the tests say, but the amount of uncertainty depends on the amount of information omitted in the tests. There's a threshold over which the effort of avoiding uncertainty becomes the same as the effort involved in just writing the code. Whether or not you think that's important depends on the problem you're working on and your tolerance for error and uncertainty, and there's no hard and fast rule for that. But if you want to approach 100% correctness, you need to attempt to specify your intentions 100% precisely. The fact that humans make mistakes and miscommunicate their intentions does not change the basic fact that a human needs to communicate their intention for a machine to fulfill that intention. The more precise the communication, the more work that's involved, regardless of whether you're verifying that precision after something generates it or generating it yourself.
> I can get the same level of uncertainty in far less time with an LLM. That's what makes it great.
I have a low tolerance for uncertainty in software, so I usually can't reach a level I find acceptable with an LLM. Fallible people who understand the intentions and current function of a codebase have a capacity that a statistical amalgamation of tokens trained on fallible people's output simply does not have. People may not use their capacity to verify alignment between intention and execution well, but they have it.
Again, I'm not denying that there are plenty of problems where the level of uncertainty involved in AI-generated code is acceptable. I just think it's fundamentally true that extra precision requires extra work, and there's simply no way to avoid that.
> I have a low tolerance for uncertainty in software
I think that's what's leading you to the unusual position that "This is only true when there is less information in those tests."
I don't believe in perfection. It's rarely achieved despite one's best efforts -- it's a mirage. What we can realistically look for is a statistical level of reliability that tests help achieve.
At the end of the day, it's about delivering value. If you can on average deliver 5x value with an LLM because of the speed, or 1.05x value because you verified every line of code 3 times and avoided a rare bug that neither you nor the LLM thought to test (compared to the 1x value of a non-perfectionist developer), then I know which one I'm choosing.
> Such laws cannot be enforced. Enforcement can only be arbitrary.
I am against criminalizing cryptography and largely agree that banning it is infeasible given the extent of its proliferation and the ease of replicating it, so I'm playing devil's advocate here:
Laws banning math related to manufacturing nuclear weapons can be and have been enforced. It's important to take legal threats like ChatControl seriously and not just dismiss them as absurd, unenforceable overreach, even if that's likely true.
Banning math in relation to nuclear weapons was typically very specific and most often involved hardware export controls.
The key word in what the previous poster said was 'arbitrary': the laws end up a nonsensical mess because the math has a huge number of industrial, commercial, and personal uses. When one range of uses is suddenly banned, it leads to a situation where law enforcement tends to go after particular groups for who they are, not what they've done.
I’ve talked to their devs, met them in person, and trust them. Most of their stack is public and all the primitives they use are available and well documented (see https://github.com/holepunchto and https://docs.pears.com/). I’ve used that stack and verified it does what is advertised, and I believe they’re planning a full open-source release of the parts that aren’t already public.
I'm probably ignorant of the specific issues that make more advanced typesetting for journal submissions necessary, but I don't understand why some academic flavor of markdown isn't the standard. I'd advocate for that before either LaTeX or Typst.
I absolutely get the importance of typesetting for people who publish physical books, magazines, etc., but when it comes to research I don't see the value of typesetting anything. Journals or print publishers should be responsible for typesetting submissions to fit their style, paper size, etc., and researchers should just be responsible for delivering their research in a format that's simpler and more content-focused.
> I don't understand why some academic flavor of markdown isn't the standard
I would argue that is exactly what LaTeX is... I studied mathematics in university, and from what I recall, every major publisher provided a LaTeX template for articles and textbooks. Likewise, pretty much every mathematics presentation uses Beamer slides, and most mathematicians are able to "compile" subsets of LaTeX in their head. Websites like MSE and MO use MathJax precisely so that people can write notation as they would on assignments, notes, papers, etc.
Note: I am not saying people particularly like LaTeX as a tool. However, the vast majority of the complaints about LaTeX do seem to be from computer science people. Many mathematics students just create an Overleaf (formerly ShareLaTeX) account and call it a day. Of course, nobody enjoys notes taking 10 seconds to compile, or the first 100 lines of their thesis being a preamble, but the important part is the ability to express notation across a variety of mediums, and the publisher support (as GP mentioned).
I agree the standard for mathematical notation is LaTeX, but it’s only needed for fairly limited parts of a document. It makes more sense to me as something you’d use in snippets like `$\sum_{n=1}^{10}n$` than as something that should control the whole document.
Markdown and MathJax are, imo, way more web-friendly than a full LaTeX document and avoid the distracting, unnecessary aspects of LaTeX.
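For comparison, below is roughly the smallest standalone LaTeX document that typesets that one sum (illustrative; a real paper’s preamble is far longer). In markdown rendered with MathJax, the only part you would write yourself is the `$...$` snippet, assuming the inline `$` delimiters are enabled in the MathJax config; everything else here is document plumbing the publisher could supply.

```latex
% Minimal standalone LaTeX document for a single inline formula.
% Everything except the one math snippet is boilerplate.
\documentclass{article}
\begin{document}
The sum of the first ten integers is $\sum_{n=1}^{10} n = 55$.
\end{document}
```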
As for publisher support, that’s what frustrates me most: HTML was specifically designed for academics at CERN to publish and link documents. Instead of using an HTML-friendly format like markdown, publishers demand a format designed for printing content onto a physical piece of paper. So now HTML is used for almost everything except academic papers, and academic papers are all PDFs submitted to journals or preprint servers with no links.