Hacker News | mtrovo's comments

LLMs fail at laser-focused troubleshooting, but they excel at brute-force breadth. Priming an agent to list 50 distinct possible causes for a database connection failure and investigate each one of them works better than hoping it guesses the exact root cause.
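The breadth-first idea above can be sketched as a prompt-construction helper. This is a minimal illustration of the technique, not anything from the article; the function name and template wording are my own.

```python
# Hypothetical sketch: instead of asking the agent for "the" root cause,
# push it toward enumerating many candidates with a concrete check for each.

def build_breadth_prompt(symptom: str, n_causes: int = 50) -> str:
    """Build a prompt that primes an agent for breadth, not a single guess."""
    return (
        f"A system is failing with: {symptom}\n"
        f"List {n_causes} distinct possible root causes, from most to least "
        "likely. For each one, give a single concrete check (a command, "
        "query, or log to inspect) that would confirm or rule it out. "
        "Do not stop at the first plausible cause."
    )

prompt = build_breadth_prompt("database connection failure", n_causes=50)
```

The point is that the prompt itself encodes the "investigate each one" discipline, rather than hoping the model volunteers it.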

The main takeaway from this article for me is that battle scars can be used to unbog these agents. That explains the current productivity divide we're seeing: seniors use their past experience to unbog agents, while juniors naturally frame the problem as brute-force breadth. Mid-career devs get the worst results: they try to force agents through rigid logic without the deep experience to back it up.


I don’t think mid-career devs are inherently worse; if anything, they’re in the best position to adapt. The real skill shift isn’t “prompt better” vs “think harder”. Rather, it’s knowing when to explore vs when to cut the tree down.

The interesting thing about religions as a whole is that the timespan is so big that you can really see how the backbone of the narrative stays the same while the fanbase, and how they pick winners, changes a lot. The Vatican itself is a theocratic state created by an agreement between the pope and Mussolini.

And if you wanna go back even further, just remember that while Europe and the Christian countries were living in the Dark Ages, the Islamic world was the one driving forward scientific knowledge and the exchange of ideas with the East. https://en.wikipedia.org/wiki/Islamic_Golden_Age


Behold VikTok, the new competitor to Moltbook, soon to be acquihired by Oracle.

I think the main issue is treating the LLM as an unrestrained black box; there's a reason nobody outside tech trusts LLMs so blindly.

The only way to make LLMs useful for now is to restrain their hallucinations as much as possible with evals, and these evals need to be very clear about what goals you're optimizing for.
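The eval idea can be sketched as a tiny scoring harness: a set of cases encoding the goal, and a loop that scores a model function against them. The cases and the stub model here are illustrative stand-ins, not a real eval suite or a real LLM call.

```python
# Minimal eval-harness sketch, assuming exact-match is the stated goal.
from typing import Callable

EVAL_CASES = [
    # (input, expected) pairs make the optimization target explicit
    ("capital of France", "Paris"),
    ("2 + 2", "4"),
]

def run_evals(model: Callable[[str], str]) -> float:
    """Return the fraction of cases the model answers exactly right."""
    passed = sum(1 for q, expected in EVAL_CASES if model(q) == expected)
    return passed / len(EVAL_CASES)

# Stub standing in for a real LLM call; it gets one of the two cases wrong.
def stub_model(q: str) -> str:
    return {"capital of France": "Paris", "2 + 2": "5"}.get(q, "")

score = run_evals(stub_model)  # 0.5: one of two cases passes
```

Real eval suites are fuzzier (graded rubrics, LLM-as-judge), but the shape is the same: the goal lives in the cases, not in the model.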

See Karpathy's work on the autoresearch agent and how it carries out experiments; it might be useful for what you're doing.


> there's a reason nobody outside tech trusts LLMs so blindly.

Man, I wish this were true. I know a bunch of non-tech people who just trust random shit that ChatGPT made up.

I had an architect tell me "ask chatgpt" when I asked her the difference between two industrial standard measures :)

We've had politicians share LLM crap and researchers publish papers with hallucinated citations..

It's not just tech people.


We were working on translations for Arabic and in the spec it said to use "Arabic numerals" for numbers. Our PM said that "according to ChatGPT that means we need to use Arabic script numbers, not Arabic numerals".

It took a lot of back-and-forths with her to convince her that the numbers she uses every day are "Arabic numerals". Even the author of the spec could barely convince her -- it took a meeting with the Arabic translators (several different ones) to finally do it. Think about that for a minute. People won't believe subject matter experts over an LLM.

We're cooked.


Kind of a tangent but that did make me curious about how numbers are written in Arabic: https://en.wikipedia.org/wiki/Eastern_Arabic_numerals
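To make the thread's distinction concrete: "Arabic numerals" are the Western digits 0-9, while Eastern Arabic (Arabic-Indic) numerals are the separate glyphs at U+0660-U+0669. Converting between the two is a simple character translation; this snippet is just my own illustration of the Unicode facts.

```python
# Western "Arabic numerals" vs Eastern Arabic (Arabic-Indic) digit glyphs.
WESTERN = "0123456789"
EASTERN = "٠١٢٣٤٥٦٧٨٩"  # U+0660 through U+0669

to_eastern = str.maketrans(WESTERN, EASTERN)
to_western = str.maketrans(EASTERN, WESTERN)

print("2024".translate(to_eastern))   # ٢٠٢٤
print("٢٠٢٤".translate(to_western))   # 2024
```

So the PM's "Arabic-script numbers" do exist as distinct codepoints; they're just not what "Arabic numerals" means.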


I guess "Western Arabic" would have been more precise.


The architect should have required Hindu numbers. Same result, but even more confusion.


Man this is maddening.


Honestly I think we're just becoming more aware of this way of thinking. It's certainly exacerbated now that everyone has "an expert" in their pocket.

It's no different than conspiracy theorists. We saw a lot more with the rise in access to the internet. Not because they didn't put in work to find answers to their questions, but because they don't know how to properly evaluate things and because they think that if they're wrong then it's a (very) bad thing.

But the same thing happens with tons of topics, and it's way more socially acceptable. Look how everyone has strong opinions on topics like climate, rockets, nuclear, immigration, and all that. The problem isn't having opinions or thoughts, but the strength of them compared to the level of expertise. How many people think they're experts after a few YouTube videos or just reading the intro to the wiki page?

Your PM is no different. The only difference is the things they believed in, not the way they formed beliefs. But they still had strong feelings about something they didn't know much about. It became "their expert" vs "your expert" rather than "oh, thanks for letting me know". And that's the underlying problem. It's terrifying to see how common it is. But I think it also leads to a (partial) solution. At least a first step. But then again, domain experts typically have strong self doubt. It's a feature, not a bug, but I'm not sure how many people are willing to be comfortable with being uncomfortable


There’s a real possibility the same people would believe anything they read on social media or via Google, and that’s something worthy of attention.

And the worst part is, these people don't even use the flagship thinking models, they use the default fast ones.


In my experience, people outside of tech have nearly limitless faith in AI, to the point that when it clashes with traditional sources of truth, people start to question them rather than the LLM.


Clothes, wristwatches, cars, you name it. It's a very common play by luxury brands; Hermès Birkins are the most famous example that comes to mind, and they follow a very similar playbook.

Apart from the KYC aspect of the process, it's their way of solving the problem of artificial scarcity on the second-hand market, as the article explains. They want a second-hand market to exist to signal that this is a luxury item, but not one so large that excess supply tanks the price.


It also solves the real problem of labor scarcity. If you have X master watchmakers available to make a halo product you can only get so much output from them. You can increase X, increase production efficiencies (reduce labor input), or limit supply. The first two reduce exclusivity and perceived quality so the third makes sense if you can live without growth or can grow via high pricing strategies.


> This document was written by an LLM (Claude) and then iteratively de-LLMed by that same LLM under instruction from a human, in a conversation that went roughly like this

This is hilarious.


The hardware looks fine, but Apple's software vision is so confusing.

MacBook Neo is cheaper and weaker than a MacBook Air, yet shares the same price and single-app mindset as an iPad. It uses a phone chip similar to an iPad Pro, but gets multi-user support and a keyboard.

I struggle to run Tahoe on my 16GB M2 Air, and somehow I'm supposed to believe running it on an 8GB phone chip is gonna be alright. If that's true, it has me wondering what exactly the role of iPadOS is anyway.

Ultimately, it feels like iPadOS and Tahoe are on a collision course toward a middle ground that nobody asked for.


>It uses a phone chip similar to an iPad Pro

You're mixed up here. The iPad Pro has used the same M series processors as MacBooks since April 2021.


I'm (unfortunately) running Tahoe on my M1 MacBook Pro and don't notice it being slower than Sequoia. Where is your slowdown?

I dislike Tahoe as much as anyone else, but performance isn't the problem for me.


At least that's the story LLM lab leaders wanna tell everyone; it just happens to be a very good story if you wanna hype your valuation before investment rounds.

Working with LLMs on a daily basis, I'd say that's not happening, not the way they're trying to sell it. You can get rid of five vendor headcount executing a manual process that should have been automated 10 years ago; you're not automating processes involving highly paid people with a 1% error tolerance, where an error could cost you $10M+ in fines or jail time.

The day I see Amodei or Sam flying on a vibe-coded airplane is the day I'll believe what they're talking about.


Aero software people are not highly paid. It's a travesty.


> Workforce re-skilling programs should prioritize “fusion skills”, such as prompt engineering, data stewardship and human-in-the-loop decision-making that enhance human-AI complementarity.

First time I've seen the term "fusion skills" used in this context, and I really like it.


I created something very similar, but to display Raindrop links on the reMarkable and sync highlights back into Raindrop. I also added a GenAI-powered paper summary as a preface to the papers it sends to the reMarkable, which is working quite well.

As you mentioned, the reMarkable file format is a PITA for extracting highlights. One thing that helped a lot was adding an OCR fix phase that uses the Gemini Flash model to fix common OCR errors and to merge single highlights that span pages.
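The cross-page merge step can be sketched without the LLM part. The comment's actual pipeline uses a Gemini Flash call; this pure-Python heuristic is my own simplification: if a highlight ends mid-sentence and the next one starts lowercase, treat them as one highlight split by a page break.

```python
# Hypothetical heuristic for merging highlights split across page boundaries.
def merge_cross_page(highlights: list[str]) -> list[str]:
    merged: list[str] = []
    for h in highlights:
        h = h.strip()
        if (merged
                and not merged[-1].endswith((".", "!", "?"))
                and h[:1].islower()):
            # Previous highlight ended mid-sentence and this one continues it.
            merged[-1] = merged[-1] + " " + h
        else:
            merged.append(h)
    return merged

parts = ["The model fails when the", "context window overflows.", "New point."]
print(merge_cross_page(parts))
# ['The model fails when the context window overflows.', 'New point.']
```

An LLM pass handles the cases this heuristic misses (OCR typos at the seam, hyphenated words), but the structural merge is cheap to do first.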


Can you share yours? I'm starting to play with reMarkable tooling & it would be neat to compare what folks are doing.


Oh nice, that's a great idea! I'm exploring OCR of handwritten notes for future features, will give the Gemini pipeline a try.


