More

maerch · 2025-12-12T05:52:02 1765518722

The closest I come to working with part-time, minimum-wage workers is working with student employees. Even then, they earn more and usually work more than five hours a week.

Most of the time, I end up putting in more work than I get out of it. Onboarding, reviewing, and mentoring all take significant time.

Even with the best students we had, paying around 400 euros a month, I would not say that I saved five hours a week.

And even when they reach the point of being truly productive, they are usually already finished with their studies. If we then hire them full-time, they cost significantly more.

maerch · 2025-11-19T13:59:13 1763560753

Factorio 2.0 seemed to pull it off. I think that as long as users don’t feel misled by a DLC that only adds a few skins, they generally appreciate larger updates to a game.

maerch · 2025-10-13T10:25:40 1760351140

Exactly this. I thought about getting a T7, but the price is just ridiculous. And it’s not even like you’re paying for quality, there are so many complaints about both minor and major issues.

maerch · 2025-10-09T13:40:24 1760017224

People being prevented from doing their job because of code formatting? In my nearly 20 years of development, that statement was indeed true, but only before the age of formatters. Back then, endless hours were spent on recurring discussions and nitpicky stylistic reviews. The supposed gains were minimal, maybe saving a few seconds parsing a line faster. And if something is really hard to read, adding a prettier-ignore comment above the lines works wonders. The number of times I’ve actually needed it since? Just a handful.

Code style is a Pareto-optimal problem space: what one person finds readable may look like complete chaos to someone else. There’s no objective truth, and that’s why I believe that in a project involving multiple people, spending time on this is largely a waste of time.

maerch · 2025-10-08T04:50:35 1759899035

> My experience is it often generates code that is subtlety incorrect. And I'll waste time debugging it.

> […]

> Or it'll help me debug my code and point out things I've missed.

I made both of these statements myself and later wondered why I had never connected them.

In the beginning, I used AI a lot to help me debug my own code, mostly through ChatGPT.

Later, I started using an AI agent that generated code, but it often didn’t work perfectly. I spent a lot of time trying to steer the AI to improve the output. Sometimes it worked, but other times it was just frustrating and felt like a waste of time.

At some point, I combined these two approaches: I cleared the context, told the AI that there was some code that wasn’t working as expected, and asked it to perform a root cause analysis, starting by trying to reproduce the issue. I was very surprised by how much better the agent became at finding and eventually fixing problems when I framed the task from this different perspective.

Now, I have commands in Claude Code for this and other due diligence tasks, and it’s been a long time since I last felt like I was wasting my time.

maerch · 2025-10-02T19:15:36 1759432536

I still have a bad taste in my mouth after all those GPT-5 hype articles that claimed the model was just one step away from AGI.

gardnr · 2025-10-02T20:52:47 1759438367

TBF, they all believed that scaling reinforcement learning would achieve the next level. They had planned to "war-dial" reasoning "solutions" to generate synthetic datasets which achieved "success" on complex reasoning tasks. This only really produced incremental improvements at the cost of test-time compute.

Now Grok is publicly boasting PhD level reasoning while Surge AI and Scale AI are focusing on high quality datasets curated by actual PhD humans.

Surge AI is boasting $1B in revenue, and I am wondering how much of that was paid in X.ai stock: https://podcasts.apple.com/us/podcast/the-startup-powering-t...

In my opinion the major advancements of 2025 have been more efficient models. They have made smaller models much, much better (including MoE models) but have failed to meaningfully push the SoTA on huge models; at least when looking at the USA companies.

ACCount37 · 2025-10-02T22:38:19 1759444699

Raw model size is still pegged by the hardware.

You can try to build a monster the size of GPT-4.5, but even if you could actually make the training stable and efficient at this scale, you still would suffer trying to serve it to the users.

Next generation of AI hardware should put them in reach, and I expect that model scale would grow in lockstep with new hardware becoming available.

svachalek · 2025-10-02T21:23:11 1759440191

Same, qwen3 omni blows my mind for what a 30b-A3b model can do. I had a video chat with it and it correctly identified plant species I showed it.

adastra22 · 2025-10-02T23:32:09 1759447929

Without defining “AGI” that’s always true, and trivially so.

maerch · 2025-10-02T16:08:36 1759421316

> The agent follows references like a human analyst would. No chunks. No embeddings. No reranking. Just intelligent navigation.

I think this sums it up well. Working with LLMs is already confusing and unpredictable. Adding a convoluted RAG pipeline (unless it is truly necessary because of context size limitations) only makes things worse compared to simply emulating what we would normally do.

maerch · 2025-08-28T13:23:56 1756387436

Apart from the “—“, what else gives it away? Just asking from a non-native perspective.

Romario77 · 2025-08-28T13:49:10 1756388950

It's just too bombastic for what it is - listing some equations with brief explanation and implementation.

If you don't know these things on some level already the post doesn't give you too much (far from 95%), it's a brief reference of some of the formulas used in machine learning/AI.

random3 · 2025-08-28T18:27:40 1756405660

Slop brings back memories of literature teachers red-marking my "bombastic" terms in primary school essays

TFortunato · 2025-08-28T13:40:42 1756388442

This is probably not going to be a very helpful answer, but I sort of think of it this way: you probably have favorite authors or artist (or maybe some really dislike!), where you could probably take a look at a piece of their work, even if its new to you, and immediately recognize their voice & style.

A lot of LLM chat models have a very particular voice and style they use by default, especially in these longer form "Sure, I can help you write a blog article about X!" type responses. Some pieces of writing just scream "ChatGPT wrote this", even if they don't include em-dashes, hah!

TFortunato · 2025-08-28T13:50:04 1756389004

OK, on reflection, there are a few things,

Kace's response is absolutely right that the summaries tend to be a place where there is a big giveaway.

There is also something about the way they use "you" and the article itself... E.g. the "you now have a comprehensive resource to understand and apply ML math. Point anyone asking about core ML math here..." bit. This isn't something you would really expect to read in a human written article. It's a ChatBot presenting it's work to "you", the single user it's conversing with, not an author addressing their readers. Even if you ask the bot to write you an article for a blog, a lot of times it's response tends to mix in these chatty bits that address the user or directly references to the users questions / prompts in some way, which can be really jarring when transferred to a different medium w/o some editing

kace91 · 2025-08-28T13:26:55 1756387615

Not op, but it is very clearly the final summary telling the user that the post they asked the AI to write is now created.

gandalfgreybeer · 2025-08-29T08:45:52 1756457152

I stopped reading the post before that and went back to check. It's so blatant...especially when it mentions visualizations.

> With theoretical explanations, practical implementations, and visualizations, you now have a comprehensive resource to understand and apply ML math. Point anyone asking about core ML math here—they’ll learn 95% of what they need in one place!

gandalfgreybeer · 2025-08-29T08:43:49 1756457029

As someone who tended to use "—" in a lot of my writing naturally before, the prevalence of its usage by LLMs frustrate me a lot. I now have to rewrite things that felt natural just so no one will think I'm an LLM.

nxobject · 2025-08-28T19:06:37 1756407997

Three things come to mind:

- bold-face item headers (eg “Practical Significance:”)

- lists of complex descriptors non-technical parts of the writing (“ With theoretical explanations, practical implementations, and visualizations”)

- the cheery, optimistic note that underlines a goal plausibly derived from a prompt. (eg “ Let’s dive into the equations that power this fascinating field!”)

cgadski · 2025-08-28T13:43:23 1756388603

It's not really about the language. If someone doesn't speak English well and wants to use a model to translate it, that's cool. What I'm picking up on is the dishonesty and vapidness. The article _doesn't_ explore linear algebra, it _doesn't_ have visualizations, it's _not_ a comprehensive resource, and reading this won't teach you anything beyond keywords and formulas.

What makes me angry about LLM slop is imagining how this looks to a student learning this stuff. Putting a post like this on your personal blog is implicitly saying: as long as you know some some "equations" and remember the keywords, a language model can do the rest of the thinking for you! It's encouraging people to forgo learning.

maerch · 2025-08-27T16:10:54 1756311054

I’m really trying to understand your point, so please bear with me.

As I see it, this prompt is essentially an "executable script". In your view, should all prompts be analyzed and possibly blocked based on heuristics that flag malicious intent? Should we also prevent the LLM from simply writing an equivalent script in a programming language, even if it is never executed? How is this different from requiring all programming languages (at least from big companies with big engineering teams) to include such security checks before code is compiled?

echelon · 2025-08-27T19:39:06 1756323546

Prompts are not just executable scripts. They are API calls to servers that are listening and that can provide dynamic responses.

These companies can staff up a team to begin countering this. It's going to be necessary going forward.

There are inexpensive, specialized models that can quickly characterize adversarial requests. It doesn't have to be perfect, just enough to assign a risk score. Say from [0, 100], or whatever normalized range you want.

A combination of online, async, and offline systems can analyze the daily flux in requests and flag accounts and query patterns that need further investigation. This can happen when diverse risk signals trigger heuristics. Once a threshold has been triggered, it can escalate to manual review, rate limiting, a notification sent to the user, or even automatic account temporary suspension.

There are plenty of clues in this attack behavior that can lead to the tracking and identification of some number of attackers, and the relevant bodies can be made aware of any positively ID'd attackers: any URLs, hostnames, domains, accounts, or wallets that are being exfiltrated to can be shut down, flagged, or cordoned off and made subject of further investigation by other companies or the authorities. Countermeasures can be deployed.

The entire system can be mathematically modeled and controlled. It can be observed, traced, and replayed as an investagorory tool and means of restitution.

This is part of a partnership with law enforcement and the broader public. Red teams, government agencies, other companies, citizen bug and vuln reporters, customers, et al. can participate once the systems are built.

maerch · 2025-07-30T06:03:24 1753855404

I am not sure how good your test really is. Or at least how high your bar is.

Paul Erdös was told about this problem with multiple explanations and just rejected the answer. He could not believe it until they ran a simulation.

QuantumGood · 2025-07-30T22:30:19 1753914619

In my experience, as Harvard outlined long ago, the two main issues with decision making are frame blindness (don't consider enough other ways of thinking about the issue) and non-rigorous frame choice (jumping to conclusions).

But an even more fundamental cause, as a teacher, is that I often find seemingly different frames to both simply be misunderstood, not understood and rejected. I learned by trying many ways of presenting what I thought the best frame was. So I learned that "explanations" may be received primarily as noise, with "What is actually being said" being replaced with, incorrectly, by "What I think you probably mean". Whenever someone replies "okay" to a yes or no comment/statement, I find they have always misunderstood the statement, and learned how often people will attempt to move forwards without understanding where they are.

And if multiple explanations are just restatings of the same frame (as is common in casual arguments), it's impossible to compare frames, because only one is being presented.. It's the old "if you think aren't making any mistakes, that's another mistake".

Often, a faulty frame clears up both what is wrong with another frame, as well as leading to a best frame. I usually find the most fundamental frame is the most useful.

For example, I found many Reddit forums discussing a problem with selecting the choice of audio output (speaker) on Fire TV Sticks. If you go through the initial setup, sometimes it will give you a choice (first level of flow chart), but often not the next level choice, which you need. And setup will not continue. Then it turned out that old remotes and new remotes had the volume buttons in a different location, and there were two sets of what looked like volume buttons. When you pressed the actual volume buttons, everything worked normally. When you pressed the up/down arrows where the old volume buttons had been, you had to restart setup many times.

The correct framing of the problem was "Volume buttons are now on the left, not the right". It was not a software setup issue. Or wondering why you're key doesn't work, but you're at the wrong car. Or it's not a problem with your starter motor, you're out of gas. Etc.

johnnyanmac · 2025-07-30T07:21:05 1753860065

I don't know who Paul Erdös is, so this isn't useful information without considering why they rejected the answer and what counterarguments were provided. It is an unintuitive problem space to consider when approaching it as a simple probability problem, and not one where revealing new context changes the odds.

QuantumGood · 2025-07-31T07:35:09 1753947309

Erdös published more papers than any other mathematician in history—and collaborated with more than 500 coauthors, giving rise to the concept of the "Erdős number," a (playful) measure of collaborative proximity among mathematicians