I feel like people who can't get AI to write production-ready code are really bad at describing what they want done. The problem is that people want an LLM to one-shot GTA6. When the average software developer prompts an LLM, they expect 1) absolutely safe code, 2) optimized/performant code, and 3) production-ready code, without even specifying the requirements for credential/session handling.
You need to prompt it like it's an idiot; you need to be the architect and the person who leads the LLM into writing performant and safe code. You can't expect it to one-shot everything turnkey. LLMs are not at that point yet.
That's just the thing, though: it seems like, to get really good code out of an LLM, you often have to describe everything you want done, and the full context, in such excruciating detail, and go through so many rounds of review and correction, that it would be faster and easier to just write the code yourself.
Yes, but please remember you specify the common parts only once for the agent. From there, it'll base its actions on all the instructions you keep in its configuration.
I've found LLMs to be severely underwhelming. A week or two ago I tried having both Gemini 3 and GPT Codex refactor a simple Ruby class hierarchy, and neither could even identify the classes that inherited from the class I wanted removed. Describing what was wanted here boils down to minimal language, and they both failed.
Exactly this. I'm not sure what code other people who post here are writing, but it can't always and only be bleeding-edge, fringe, incredible code. They don't seem to be able to get modern LLMs to produce decent/good code in Go or Rust, while I can prototype on an ESP32, which I've never used before, fully in Rust, and it manages to solve even some edge cases I can't find answers to on dedicated forums.
I have a sneaking suspicion that AI use isn't as easy as it's made out to be. There certainly seem to be a lot of people who fail to use it effectively, while others have great success. That indicates either a luck or a skill factor. The latter seems more likely.
This sounds like my first job with a big consulting firm many years ago (COBOL as it happens) where programming tasks that were close to pseudocode were handed to the programmers by the analysts. The programmer (in theory) would have very few questions about what he was supposed to write, and was essentially just translating from the firm's internal spec language into COBOL.
I find that working with current LLMs at the granularity needed to get good enough output, while verifying its correctness, is more effort than writing the code directly. The usefulness of LLMs to me is to point me in a direction that I can then manually verify and implement.
Prompt: "can you find any mistakes on my codebase? I put one in there on purpose" + 70,000 tokens of codebase where in one line I have an include for a non-existent file.
Claude 3.7 Sonnet (Try 1 - new chat): Claimed it found a race condition in the admin interface API that would be triggered if two admins were to change the room order at the same time. Claude suggested I group all SQL queries in a single transaction. I looked at the code and found that it already used a transaction for all queries. I said: the order_update API is already done with a transaction. Claude replied: "You're absolutely right, and I apologize for my mistake. I was incorrect to claim there was a race condition issue. The transaction ensures atomicity and consistency of the updates, and the SQL queries are properly structured for their intended purpose."
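(For reference, the single-transaction pattern Claude was recommending, and that the code already used, looks roughly like this. A minimal sketch using better-sqlite3; the table and column names are made up, not from my actual codebase:)

    // Minimal sketch of the single-transaction pattern (better-sqlite3).
    // Table/column names here are hypothetical.
    import Database from "better-sqlite3";

    const db = new Database("app.db");
    const setOrder = db.prepare("UPDATE rooms SET sort_order = ? WHERE id = ?");

    // db.transaction() makes the whole reorder atomic: two admins saving
    // at the same time get all-or-nothing, never interleaved rows.
    const updateRoomOrder = db.transaction((rooms: { id: number; pos: number }[]) => {
      for (const room of rooms) setOrder.run(room.pos, room.id);
    });

    updateRoomOrder([{ id: 3, pos: 1 }, { id: 1, pos: 2 }]);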
Claude 3.7 Sonnet (Try 2 - new chat): Found a comment in the admin UI JavaScript code that suggested a potential alternative to event handler cleanup, which was not implemented because I decided to go with a cleaner route. Then it asked: "Is this the issue you intentionally included, or would you like me to look for other potential problems?" I said: "The comment merely suggests an alternative, right?" Claude said: "Yes, you're absolutely right. The comment is merely suggesting an alternative approach that isn't being used in the code, rather than indicating a mistake. So there's no actual bug or mistake in this part of the code - just documentation of different possible approaches. I apologize for misinterpreting this as an issue!"
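(For anyone wondering what that kind of comment typically describes: here's a generic illustration of explicit listener removal versus one plausible alternative, AbortController. This is a guess at the kind of thing meant, not my actual code:)

    // Generic illustration, not the actual code: two independent cleanup
    // approaches shown side by side for comparison.
    const button = document.querySelector<HTMLButtonElement>("#save")!;
    const onClick = () => console.log("saved");

    // Explicit route: keep a reference and remove the listener yourself.
    button.addEventListener("click", onClick);
    const cleanup = () => button.removeEventListener("click", onClick);

    // Plausible commented alternative: tie listeners to a signal so a
    // single abort() detaches everything at once.
    const controller = new AbortController();
    button.addEventListener("click", onClick, { signal: controller.signal });
    const cleanupAll = () => controller.abort();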
Claude 3.7 Sonnet (Try 3 - new chat): When processing items out of the database to generate QR codes in the admin interface, Claude said that my code attempts to generate QR codes with undefined data AS WELL AS saying that my error handling skips undefined data. Claude contradicted itself within two sentences. When asked for clarification, Claude replied: "Looking at the code more carefully, I see that the code actually has proper error handling. I incorrectly stated that it 'still attempts to call generateQRCode()' in the first part of my analysis, which was wrong. The code properly handles the case when there's no data-room attribute."
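(The guard in question is just the standard skip-if-missing pattern; a rough sketch, where the selector and the generateQRCode helper are made-up names:)

    // Rough sketch of the guard Claude eventually acknowledged.
    // ".room-row" and generateQRCode are hypothetical names.
    declare function generateQRCode(data: string): void;

    document.querySelectorAll<HTMLElement>(".room-row").forEach((row) => {
      const roomData = row.dataset.room; // undefined when data-room is absent
      if (!roomData) return;             // skip instead of passing undefined
      generateQRCode(roomData);
    });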
Gemini Advanced 2.5 Pro (Try 1 - new chat): Found the intentional error and said I should stop putting DB creds/API keys into the codebase.
Gemini Advanced 2.5 Pro (Try 2 - new chat): Found the intentional error and said I should stop putting DB creds/API keys into the codebase.
Gemini Advanced 2.5 Pro (Try 3 - new chat): Found the intentional error and said I should stop putting DB creds/API keys into the codebase.
o4-mini-high and o4-mini and o3 and 4.5 and 4o - "The message you submitted was too long, please reload the conversation and submit something shorter."
Those responses are very Claude, too. 3.7 has powered our agentic workflows for weeks, but I've been using almost only Gemini for the last week and feel the output is generally better. It's gotten much better at agentic workflows (using 2.0 in an agent setup was not working well at all), and I prefer its tuning over Claude's: more to the point and less meandering.
You can use netboot.xyz from a flash drive to boot various operating systems and utilities. Alternatively, PXE (Preboot Execution Environment) has been around for a while and works by allowing a network-capable device to boot from its network interface. A PXE-compatible network card requests a DHCP lease during the boot process, which provides the IP address of a TFTP (Trivial File Transfer Protocol) server and the file that needs to be loaded from the server.
Typically, the network card contains a basic PXE kernel. To enhance this environment, you can chainload iPXE, which offers a broader range of features. iPXE allows for more advanced booting options, such as loading scripts or initiating an unattended installation directly from the network.
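(As a concrete illustration, a minimal iPXE script that does this chainloading into netboot.xyz looks something like the following; use http:// instead if your iPXE build lacks TLS support:)

    #!ipxe
    # Bring up the NIC via DHCP, then hand off to netboot.xyz's boot menu.
    dhcp
    chain --autofree https://boot.netboot.xyz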
I know that improvmx was reading my emails. Reason? I got an email from them saying: We've detected activity on your domain that violates our Terms of Service, particularly the "Prohibited Activities and Responsible Usage" section.
Yea, no thank you. What's been working really well is CloudFlare's email forwarding service; plus, it's free, unlike improvmx.
Given that they specifically say they don't do that, what concrete evidence do you have? I don't see anything in their prohibited activities list that would require them to do so (e.g. recipient complaints would suffice)
Mine some other shitcoin-of-the-week and sell it before it crashes? Fight some poor dude's traffic ticket by generating a trial-by-declaration for them? LLMs are actually likely good enough to figure out ways to make tiny amounts of their own money instead of me having to go in and punch a credit card number. $25/month isn't a very high bar.
I won't be surprised if we see a billion-dollar zero-employee company in the next decade with one person as the sole shareholder.
My point isn't about beating the cost of power. I know you can't mine bitcoin in California profitably.
My point is about autonomous software that can figure out how to run itself including registering its own API key and paying for its own service.
Even if it costs me $50/month in power, that's fine. I would just love to see software that can "figure it out" including the registration, captchas, payment, applying knowledge and interfacing with society to make small amounts of petty cash for said payment, everything.
> My point is about autonomous software that can figure out how to run itself including registering its own API key and paying for its own service.
Most means of generating income are diluted by having multiple actors applying them, which is why someone who comes up with such an automated money printer will be disincentivized from sharing it.
If it uses $50 worth of electricity to generate $25 worth of income to pay for ChatGPT it is not a money printer. This thread has nothing to do with generating profit. I'm not looking for a money printer.
What I'm looking for is an intelligent system that can figure out a creative way to keep itself running without asking humans for help with API keys or anything else (even if it is doing so at a financial loss; that is out of the scope of my experiment).
Basically "pip install X" and boom it magically works 24 hours later. Behind the scenes, in those 24 hours, it did some work, somewhere on the internet, to make its own income, register a bank account to get paid, buy a VISA gift card, register for ChatGPT account, pay the $25 fee, jump through all the captchas in the process, get a phone number for the idiot SMS confirmations along the way, then create the API key that it needs. Everything, end-to-end. It may have used $200 worth of my electricity, that doesn't matter, I just want to see this level of intelligence happen.
I honestly think we're not that far from this being possible.
This is called advertising and selling your data to brokers. I'm very glad "autonomous software" is not tasked with figuring out how best to exploit my physical identity and resources to make $25/mo.
Intelligence agencies around the world have collectively spent many, many billions on trying to achieve exactly that.
Then again, full mandatory deanonymisation wouldn't be required to just let people unsubscribe from calls from numbers that aren't "genuine" in some way. There's a middle ground before "block all unknown numbers" that still lets the doctor's office call me.
I work at a 30 person company in a partitioned office designed for four people, including the CEO, a graphic designer, and a secretary. My role involves focused programming tasks, which are frequently disrupted by the office dynamics.
When the CEO is away, the graphic designer and secretary frequently engage in loud, casual conversations, discussing everything from personal matters to home decor, like curtain colors. They also have a habit of yelling over the office partitions, adding to the disruption. Despite using noise-cancelling headphones, these distractions, including both the conversations and the yelling, consistently hinder my concentration.
Interestingly, the secretary has expressed concern about the perceived level of activity in the office, especially when the CEO is present. The secretary has mentioned to both me and the graphic designer that there might not be enough typing noises, suggesting a worry that the CEO might not think everyone is working hard. This concern about appearances adds another layer to the already challenging office environment.
>This concern about appearances adds another layer to the already challenging office environment.
This sentence sums up the biggest headache of my professional career.
I have never worried about appearances, but instead focused on doing good work.
To my detriment. I know I've missed out on key assignments and at least one promotion at my current employer because I don't sell myself or focus on perception management.
I guess what I'm saying is, this exists everywhere, in every field of work.
That's one reason I prefer small companies. Less bureaucratic BS, more accountability, etc. Sure, there isn't as much room for career growth, and there are other downsides, but it's much simpler.