I feel like people who can't get AI to write production-ready code are really bad at describing what they want done. The problem is that people want an LLM to one-shot GTA6. When the average software developer prompts an LLM, they expect 1) absolutely safe code, 2) optimized/performant code, and 3) production-ready code, without even specifying the requirements for credential/session handling.
You need to prompt it like it's an idiot; you need to be the architect and the person who leads the LLM into writing performant and safe code. You can't expect it to one-shot everything turnkey. LLMs are not at that point yet.
That's just the thing, though: it seems like, to get really good code out of an LLM, you often have to describe everything you want done, and the full context, in such excruciating detail, and go through so many rounds of review and correction, that it would be faster and easier to just write the code yourself.
Yes, but please remember you specify the common parts only once for the agent. From there, it'll base its actions on all the instructions you keep in its configuration.
I've found LLMs to be severely underwhelming. A week or two ago I tried having both Gemini 3 and GPT Codex refactor a simple Ruby class hierarchy, and neither could even identify the classes that inherited from the class I wanted removed. Describing what was wanted here boils down to minimal language, and they both failed.
Exactly this. I'm not sure what code other people who post here are writing, but it can't always and only be bleeding-edge, fringe, incredible code. They don't seem to be able to get modern LLMs to produce decent/good code in Go or Rust, while I can prototype on an ESP32, which I've never used before, fully in Rust, and it manages to solve even some edge cases I can't find answers to on dedicated forums.
I have a sneaking suspicion that AI use isn't as easy as it's made out to be. There certainly seem to be a lot of people who fail to use it effectively, while others have great success. That indicates either a luck or a skill factor. The latter seems more likely.
This sounds like my first job with a big consulting firm many years ago (COBOL as it happens) where programming tasks that were close to pseudocode were handed to the programmers by the analysts. The programmer (in theory) would have very few questions about what he was supposed to write, and was essentially just translating from the firm's internal spec language into COBOL.
I find that working with current LLMs at the granularity needed to get good enough output, while verifying its correctness, is more effort than writing the code directly. The usefulness of LLMs to me is to point me in a direction that I can then manually verify and implement.
Prompt: "can you find any mistakes on my codebase? I put one in there on purpose" + 70,000 tokens of codebase where in one line I have an include for a non-existent file.
Claude 3.7 Sonnet (Try 1 - new chat): Claimed it found a race condition in the admin interface API that would be triggered if two admins were to change the room order at the same time. Claude suggested I group all SQL queries in a single transaction. I looked at the code and found that it already used a transaction for all queries. I said: the order_update API is already done with a transaction. Claude replied: "You're absolutely right, and I apologize for my mistake. I was incorrect to claim there was a race condition issue. The transaction ensures atomicity and consistency of the updates, and the SQL queries are properly structured for their intended purpose."
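(For reference, the single-transaction pattern Claude was recommending, and that the code already used, looks roughly like this. A minimal sketch using better-sqlite3; the table and column names are made up, not from my actual codebase:)

    // Minimal sketch of the single-transaction pattern (better-sqlite3).
    // Table/column names here are hypothetical.
    import Database from "better-sqlite3";

    const db = new Database("app.db");
    const setOrder = db.prepare("UPDATE rooms SET sort_order = ? WHERE id = ?");

    // db.transaction() makes the whole reorder atomic: two admins saving
    // at the same time get all-or-nothing, never interleaved rows.
    const updateRoomOrder = db.transaction((rooms: { id: number; pos: number }[]) => {
      for (const room of rooms) setOrder.run(room.pos, room.id);
    });

    updateRoomOrder([{ id: 3, pos: 1 }, { id: 1, pos: 2 }]);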
Claude 3.7 Sonnet (Try 2 - new chat): Found a comment in the admin UI JavaScript code that suggested a potential alternative to event handler cleanup, which was not implemented because I decided to go with a cleaner route. Then it asked: "Is this the issue you intentionally included, or would you like me to look for other potential problems?" I said: "The comment merely suggests an alternative, right?" Claude said: "Yes, you're absolutely right. The comment is merely suggesting an alternative approach that isn't being used in the code, rather than indicating a mistake. So there's no actual bug or mistake in this part of the code - just documentation of different possible approaches. I apologize for misinterpreting this as an issue!"
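(For anyone wondering what that kind of comment typically describes: here's a generic illustration of explicit listener removal versus one plausible alternative, AbortController. This is a guess at the kind of thing meant, not my actual code:)

    // Generic illustration, not the actual code: two independent cleanup
    // approaches shown side by side for comparison.
    const button = document.querySelector<HTMLButtonElement>("#save")!;
    const onClick = () => console.log("saved");

    // Explicit route: keep a reference and remove the listener yourself.
    button.addEventListener("click", onClick);
    const cleanup = () => button.removeEventListener("click", onClick);

    // Plausible commented alternative: tie listeners to a signal so a
    // single abort() detaches everything at once.
    const controller = new AbortController();
    button.addEventListener("click", onClick, { signal: controller.signal });
    const cleanupAll = () => controller.abort();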
Claude 3.7 Sonnet (Try 3 - new chat): When processing items out of the database to generate QR codes in the admin interface, Claude said that my code attempts to generate QR codes with undefined data AS WELL AS saying that my error handling skips undefined data. Claude contradicted itself within two sentences. When asked for clarification, Claude replied: "Looking at the code more carefully, I see that the code actually has proper error handling. I incorrectly stated that it 'still attempts to call generateQRCode()' in the first part of my analysis, which was wrong. The code properly handles the case when there's no data-room attribute."
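(The guard in question is just the standard skip-if-missing pattern; a rough sketch, where the selector and the generateQRCode helper are made-up names:)

    // Rough sketch of the guard Claude eventually acknowledged.
    // ".room-row" and generateQRCode are hypothetical names.
    declare function generateQRCode(data: string): void;

    document.querySelectorAll<HTMLElement>(".room-row").forEach((row) => {
      const roomData = row.dataset.room; // undefined when data-room is absent
      if (!roomData) return;             // skip instead of passing undefined
      generateQRCode(roomData);
    });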
Gemini Advanced 2.5 Pro (Try 1 - new chat): Found the intentional error and said I should stop putting DB creds/API keys into the codebase.
Gemini Advanced 2.5 Pro (Try 2 - new chat): Found the intentional error and said I should stop putting DB creds/API keys into the codebase.
Gemini Advanced 2.5 Pro (Try 3 - new chat): Found the intentional error and said I should stop putting DB creds/API keys into the codebase.
o4-mini-high and o4-mini and o3 and 4.5 and 4o - "The message you submitted was too long, please reload the conversation and submit something shorter."
Those responses are very Claude, too. 3.7 has powered our agentic workflows for weeks, but I've been using almost only Gemini for the last week and feel the output is generally better. It's gotten much better at agentic workflows (using 2.0 in an agent setup was not working well at all), and I prefer its tuning over Claude's: more to the point and less meandering.
You can use netboot.xyz from a flash drive to boot various operating systems and utilities. Alternatively, PXE (Preboot Execution Environment) has been around for a while and works by allowing a network-capable device to boot from its network interface. A PXE-compatible network card requests a DHCP lease during the boot process, which provides the IP address of a TFTP (Trivial File Transfer Protocol) server and the file that needs to be loaded from the server.
Typically, the network card contains a basic PXE kernel. To enhance this environment, you can chainload iPXE, which offers a broader range of features. iPXE allows for more advanced booting options, such as loading scripts or initiating an unattended installation directly from the network.
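(As a concrete illustration, a minimal iPXE script that does this chainloading into netboot.xyz looks something like the following; use http:// instead if your iPXE build lacks TLS support:)

    #!ipxe
    # Bring up the NIC via DHCP, then hand off to netboot.xyz's boot menu.
    dhcp
    chain --autofree https://boot.netboot.xyz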
I know that improvmx was reading my emails. Reason? I got an email from them saying: We've detected activity on your domain that violates our Terms of Service, particularly the "Prohibited Activities and Responsible Usage" section.
Yea, no thank you. What's been working really well is CloudFlare's email forwarding service; plus, it's free, unlike improvmx.
Given that they specifically say they don't do that, what concrete evidence do you have? I don't see anything in their prohibited activities list that would require them to do so (e.g. recipient complaints would suffice)
Mine some other shitcoin-of-the-week and sell it before it crashes? Fight some poor dude's traffic ticket by generating a trial-by-declaration for them? LLMs are actually likely good enough to figure out ways to make tiny amounts of their own money instead of me having to go in and punch a credit card number. $25/month isn't a very high bar.
I won't be surprised if we see a billion-dollar zero-employee company in the next decade with one person as the sole shareholder.
My point isn't about beating the cost of power. I know you can't mine bitcoin in California profitably.
My point is about autonomous software that can figure out how to run itself including registering its own API key and paying for its own service.
Even if it costs me $50/month in power, that's fine. I would just love to see software that can "figure it out" including the registration, captchas, payment, applying knowledge and interfacing with society to make small amounts of petty cash for said payment, everything.
> My point is about autonomous software that can figure out how to run itself including registering its own API key and paying for its own service.
Most means of generating income are diluted by having multiple actors applying them, which is why someone who comes up with such an automated money printer will be disincentivized from sharing it.
If it uses $50 worth of electricity to generate $25 worth of income to pay for ChatGPT it is not a money printer. This thread has nothing to do with generating profit. I'm not looking for a money printer.
What I'm looking for is an intelligent system that can figure out a creative way to keep itself running without asking humans for help with API keys or anything else (even if it is doing so at a financial loss; that is out of the scope of my experiment).
Basically "pip install X" and boom it magically works 24 hours later. Behind the scenes, in those 24 hours, it did some work, somewhere on the internet, to make its own income, register a bank account to get paid, buy a VISA gift card, register for ChatGPT account, pay the $25 fee, jump through all the captchas in the process, get a phone number for the idiot SMS confirmations along the way, then create the API key that it needs. Everything, end-to-end. It may have used $200 worth of my electricity, that doesn't matter, I just want to see this level of intelligence happen.
I honestly think we're not that far from this being possible.
This is called advertising and selling your data to brokers. I'm very glad "autonomous software" is not tasked with figuring out how best to exploit my physical identity and resources to make $25/mo.
Intelligence agencies around the world have collectively spent many, many billions on trying to achieve exactly that.
Then again, full mandatory deanonymisation wouldn't be required to just let people unsubscribe from calls from numbers that aren't "genuine" in some way. There's a middle ground before "block all unknown numbers" that still lets the doctor's office call me.
I work at a 30 person company in a partitioned office designed for four people, including the CEO, a graphic designer, and a secretary. My role involves focused programming tasks, which are frequently disrupted by the office dynamics.
When the CEO is away, the graphic designer and secretary frequently engage in loud, casual conversations, discussing everything from personal matters to home decor, like curtain colors. They also have a habit of yelling over the office partitions, adding to the disruption. Despite using noise-cancelling headphones, these distractions, including both the conversations and the yelling, consistently hinder my concentration.
Interestingly, the secretary has expressed concern about the perceived level of activity in the office, especially when the CEO is present. The secretary has mentioned to both me and the graphic designer that there might not be enough typing noises, suggesting a worry that the CEO might not think everyone is working hard. This concern about appearances adds another layer to the already challenging office environment.
>This concern about appearances adds another layer to the already challenging office environment.
This sentence sums up the biggest headache of my professional career.
I have never worried about appearances, but instead focused on doing good work.
To my detriment. I know I've missed out on key assignments and at least one promotion at my current employer because I don't sell myself or focus on perception management.
I guess what I'm saying is, this exists everywhere, in every field of work.
That's one reason I prefer small companies. Less bureaucratic BS, more accountability, etc. Sure, there isn't as much room for career growth, and there are other downsides, but it's much simpler.