Hacker News | gwynforthewyn's comments

What's the conversation that you're looking to have here? There are fairly widespread claims that GPT-5 is worse than 4, and that's what the help article you've linked to says. I'm not sure how this furthers dialogue about or understanding of LLMs, though; it reads to _me_ like this question just reinforces a notion that lots of people already agree with.

What's your aim here, sgt3v? I'd love to positively contribute, but I don't see how this link gets us anywhere.


Maybe to prompt more anecdotes on how gpt-$ is the money-making GPT: the one where they gut quality and hold prices steady to reduce losses?

I can tell you that what the post describes is exactly what I've seen too: degraded performance and excruciating slowness.


I've been seeing teammates go from promising juniors to people who won't think, and I've tried hard here to say where I think they're going wrong.

Like the great engineers who came before us and told us what they had learned (Rob Pike, Jez Humble, Martin Fowler, Bob Martin), it's up to those of us with a bit more experience to help the junior generation get through this modern problem space and grow healthily. First, we need to name the problem we see, and for me that's what I wrote about here.


There’s always been this draw in software engineering to find the silver bullet that will allow you to turn off your brain and just vibe your way to a solution. It might be OOP or TDD or pair programming or BDD or any number of other “best practices”. This is just an unusual situation where someone really can turn off their brain and get a solution that compiles and solves the problem, and so for the type of person that doesn’t want to think, it feels like they found what they’re looking for. But there’s still no silver bullet for complexity. I guess there’s nothing to do but reject the PR and say “Explain this code to me, then I’ll review it.”


Most juniors I watch get overwhelmed by complexity very quickly, because they don't know how to follow strategies like BDD or TDD, which isolate parcels of complexity (e.g. how the program is supposed to behave) from other parcels (e.g. how the code actually works).

Even worse though, they all seem to think that the solution to becoming overwhelmed by complexity isn't to parcel it up with strategies like BDD and TDD, but to just get better at stuffing more complexity into their brains.
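To make "parcels" concrete, here's the kind of thing I mean in Go: the table-driven test holds the "how it should behave" parcel, and ApplyDiscount (a made-up example function) holds the "how it works" parcel separately:

  package shop

  import "testing"

  // ApplyDiscount is a stand-in implementation: the "how the code works" parcel.
  func ApplyDiscount(total float64) float64 {
      if total >= 100 {
          return total * 0.9
      }
      return total
  }

  // The test is the other parcel: what the program is supposed to do,
  // written without caring how ApplyDiscount works internally.
  func TestApplyDiscount(t *testing.T) {
      cases := []struct {
          name  string
          total float64
          want  float64
      }{
          {"no discount under 100", 50, 50},
          {"10% off at 100 or over", 200, 180},
      }
      for _, c := range cases {
          t.Run(c.name, func(t *testing.T) {
              if got := ApplyDiscount(c.total); got != c.want {
                  t.Errorf("ApplyDiscount(%v) = %v, want %v", c.total, got, c.want)
              }
          })
      }
  }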

To be honest, I see a similar attitude with LLMs where loads of people think you just need to stuff more into the context window and tweak the prompt and then it'll be reliable.


"People who won't think" resonates with me for the draw I've felt being pulled towards by chatbots, and I've got plenty of experience in software and electrical engineering. They're pretty damn helpful to aid discovery and rubber ducking, but even trying to evaluate different products/approaches versus one another they will hallucinate wild facts and tie them together with a nice polished narrative. It's easy enough to believe them as it is, never mind if I had less expertise. I've found that I have to consciously pull the ripcord at a certain point, telling myself that if I really want the answer to some question I've got to spend the time digging into it myself.


I have to disagree. The same people who won't think existed in previous generations as well. The only difference was, they blindly regurgitated what Bob Martin et al. were saying.


There will always be people who won't think. The specific problem the OP is noticing is "promising juniors" who then fizzle out and become these people who won't think. I think this is an interesting category to look at. Is AI making more of the non-thinkers appear promising erroneously? Or is it a comfortable but insidious offramp for the lazy-minded who previously would only think because they had to? Or are the demands of the tech industry, increasing velocity at the cost of conscientiousness, tempting/forcing juniors to use AI to preserve their jobs, but at the cost of their careers?


I read over the author's analysis of the `mkdir` error. The author thinks that the abundance of error codes mkdir can return could've confused Gemini, but typically we don't check for every error code; we just compare the exit status against the only code that means "success", i.e. 0.
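In Go, for instance, the usual shape is roughly this (a sketch for a Unix-ish system where mkdir is a real binary; the directory name is just illustrative):

  package main

  import (
      "errors"
      "log"
      "os/exec"
  )

  func main() {
      // We don't enumerate mkdir's many documented error codes;
      // exit status 0 means success, anything else is a failure.
      err := exec.Command("mkdir", "anuraag_xyz_project").Run()
      var exitErr *exec.ExitError
      if errors.As(err, &exitErr) {
          log.Fatalf("mkdir failed with exit status %d", exitErr.ExitCode())
      } else if err != nil {
          log.Fatal(err)
      }
  }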

I'm wondering if the `mkdir ..\anuraag_xyz project` failed because `..` is outside of the Gemini sandbox. That _seems_ like it should be very easy to check, but let's be real: this specific failure is such a cool combination of an obviously simple condition and a really surprising result that maybe having Gemini validate that commands take place in its own secure context is actually hard.

Anyone with more Gemini experience able to shine a light on what the error actually was?


Glad to see someone else curious!

The problem that the author/LLM suggests happened would have resulted in a file or folder called `anuraag_xyz_project` existing on the desktop (overwritten many times), but the command output shows no such file. I think that's the smoking gun.

Here's one missing piece: when Gemini ran `move * "..\anuraag_xyz project"`, it thought (as did the LLM summary) that this would move all files and folders, but in fact it only moves top-level files, not directories. That's probably why, after this command, it "unexpectedly" found the existing folders still there, and why it then tried to move the folders manually.

If the Gemini CLI was actually running the commands it says it was, then there should have been SOMETHING there at the end of all of that moving.

The Gemini CLI repeatedly insists throughout the conversation that "I can only see and interact with files and folders inside the project directory" (despite its apparent willingness to work around its tools and do otherwise), so I think you may be onto something. Not sure how that would result in `move`ing files into the void, though.


Yeah, given that after the first move attempt the only thing left in the original folder was subfolders (meaning the files had been "moved"), the only thing I can think is that "Shell move" must have seen that the target folder was outside of the project folder, so instead of moving the files it deleted them, because "hey, at least that's halfway to the goal state".


I’ve worked with Azure for a few years now, AWS and classic data centres for 15 years before that.

It's pretty clear if you check GitHub that Azure's services and documentation are written by distributed teams with little coordination. We have a saying in-house: the info is all in their docs, but the sentences and paragraphs for even trivial things are split across ten or fifteen articles.

I see a problem like granting */read in an innocuously named role and am left wondering if it was pragmatism, because figuring out least privilege was tough, or a junior who didn’t know better and was just trying to make progress.

I'm on a phone and can't search GitHub effectively, but I'd swear there was a comment or note on the golang implementation of msal saying that it used non-idiomatic Go, with no real support for many of the auth flows in v1, because it was written by an enthusiastic junior dev and released with little review. The modern version looks better, but I felt like I got a window into Azure back when I read that.

Building large services is hard, and my hat is off to Microsoft for making it work, but sometimes we get to see that it's just teams of developers doing it for them, and those teams look a lot like the teams we work with every day. There's no secret sauce, except that MS has the resources to iterate until the thing mostly works most of the time.


> It's pretty clear if you check GitHub that Azure's services and documentation are written by distributed teams with little coordination.

I've come to the same conclusion after dealing with (and reporting) jankiness in both the Azure (ARM) API and especially the CLI. [0] is a nice issue I look at every once in a while. I think an installed az cli is now 700 MB+ of Python code and several different bundled Python versions...

[0]: https://github.com/Azure/azure-cli/issues/7387


Why do all these use Python? AWS, GCP, Azure: all three CLIs use Python; they're slow, bloated, and heavy to install... what advantage does Python really offer here? You can't in any sensible way rely on it being installed (in your linked issue we see that they actually bundle it), so it's not even an 'easy' runtime.


Python takes up less than 16 MB on disk (python3.11-minimal + libpython3.11-stdlib on Debian) so whatever Microsoft did to make their Azure CLI package take up almost 700 MB, I don't think the language is the problem.


They bundle versioned API schemas... looooots of them.

It would be a garbage fire in any language.


It might well be part of the problem. Certainly any language can be inefficient, especially in terms of size, if you don't pay attention (I've certainly found this with Go recently). But as I said, it's also slow (interpreting code, or dealing with cached versions of it), and it's not obvious to me why all three major cloud CLIs have chosen it over alternatives.


I don't understand the Python hate. What would they use instead?

Python is installed on most systems and easy to install when it's not. Only Azure is dumb enough to bundle it, and that was a complaint in the bug - there's no good reason to do so in this day and age.

The performance bottleneck in all three is usually the network communication. Have you seen cases where the Python CLI app itself was using 100% of a CPU and slowing things down? I personally haven't.

Looking at the crazy way Azure packaged their CLI, it's hard to believe they weren't making it bloated on purpose.


> Python is installed on most systems (...)

Not on Windows.

And which Python are you talking about? I mean, Python 3 is forward compatible, but you're SoL if you have the bad luck of having an older interpreter installed and you want to run a script that uses a new construct.


I don't understand why Windows people are completely okay with having to install all kinds of crazy service packs and Visual C++ runtimes anytime they install anything, but having to install Python separately makes it a no-go.


A type-safe and memory-safe language, like Rust? Or their own C#, perhaps?


Python is a memory-safe and type-safe language.

Also, AWS is 10 years older than Rust, and C# only ran on Windows (at least it certainly did when AWS was created, and it was laughably more difficult to get running on Linux or OS X than Python).


> Python is a ... typesafe language.

You're funny

> is laughably more difficult to get running on Linux or OSX than Python).

$(dotnet publish) <https://learn.microsoft.com/en-us/dotnet/core/deploying/sing...> is the way they solve that problem in modern .net
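For reference, the single-file publish looks something like this (flags from memory; the linked doc is canonical):

  dotnet publish -c Release -r linux-x64 --self-contained true -p:PublishSingleFile=true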

And, unlike the scripting language you mentioned, most of the languages on the CLR actually are statically typed


> $(dotnet publish) <https://learn.microsoft.com/en-us/dotnet/core/deploying/sing...> is the way they solve that problem in modern .net

That may work now, but it didn't exist when AWS was started.


It's legitimately fun to see people gaining hope that something will be done about this and then losing hope, again and again. Thanks for the laugh.

This is how you can tell that the people doing systems work aren't running the SDK project. A gig of dependencies for a few Python scripts is hard to swallow.


> It's pretty clear if you check GitHub that Azure's services and documentation are written by distributed teams with little coordination. We have a saying in-house: the info is all in their docs, but the sentences and paragraphs for even trivial things are split across ten or fifteen articles.

You can say that for the APIs themselves. It's like every API call has 80% of the info I want, but the other 20% that logically belongs with that 80% has to come from multiple other API calls.

The one that annoys me on a daily basis is fetching the logs for a pipeline run. The endpoint is `_apis/build/builds/<id>/logs` and it returns a list of pipeline log objects without the task name that generated them. You get an object with these fields: `{"lineCount", "createdOn", "lastChangedOn", "id", "type", "url"}`, but no mention of the pipeline stage that generated it: whether it's the build step, the test run, the publishing stage, etc. And those IDs change (for example, if you re-run a failed job, the unit tests may have ID 4 from the first run and ID 17 for the second try), so you can't just rely on them.

And the pipeline log viewer on the website is garbage. When you click the link to view the logs, it doesn't show you the logs it's already collected but starts showing new logs from that point forward, and even then it sometimes truncates output and skips lines. Somehow they managed to make trawling through logs even worse than it would normally be.


  … it was written by an enthusiastic junior dev and released with little review
This feels true of so many Windows applications. A super rough POC that then gets released and set in stone forever.


The new Notepad would hang for minutes if you used it to open a large text file. It also stuttered when scrolling. It’s incredible to see something so low quality make it into a core operating system app release.


> a junior who didn’t know better and was just trying to make progress

While totally plausible, that's kinda beside the point IMO. This shows that, regardless of how it happened, they don't have sufficient test coverage of these roles. Meaning the built-in roles cannot be trusted.


AWS documentation is similarly bad: I used to joke that it was all written down to remind the service team of something, rather than as something useful for users to read in advance to understand the service.


It's only a feeling, but I'd swear I've seen variations on this post across half a dozen software-adjacent subreddits every day for the last month. The common denominator has always been "Paying $200 for Claude Max is a steal", with absolutely no evidence of what the author did with it.

I honestly think we’re being played.


Yeah, I was about to say, it sounds a lot like this guy is just riding an intense high from getting Claude to build some side-project he's been putting off, which I feel is like 90% of all cases where someone writes a post like this.

But then I never really hear any update on whether the high is still there or if it's tapered off and now they're hitting reality.


For sure.

Fwiw, I use Claude Pro in my own side project. When it’s great, I think it’s a miracle. But when it hits a problem it’s a moron.

Recently I was fascinated to see it (a) integrate swagger into a golang project in two minutes, with full docs added for my API endpoints, and then (b) spend 90 minutes unable to figure out that it couldn't align a circle to the edge of a canvas, because it was moving the circle in increments of 20px and the canvas was 150px wide (no multiple of 20 ever lands on 150).

Where it’s good it’s a very good tool, where it’s bad it’s very bad indeed.


I don't think you're getting played.

I think it's legitimately possible to get something done in a week that used to take 3 months, without realizing that you haven't actually done that.

You might have all the features that would have taken 3 months, but you personally have zero understanding of the code produced. And that code is horrible. The LLM won't be able to take it further, and you won't either.

I think we're seeing people on a high before they've come to understand what they have.


A 4-year-old Reddit account, but only recently active, with a name ending in 4 digits.


Agreed. Where's the code?


Where's the 10x, 20x, or whatever increase in profit from all the AI "productivity"? Typing is not the challenging aspect of writing code. Writing boilerplate faster isn't the superpower that a lot of non-technical people seem to think it is.


"it handles the boilerplate!" Has always been the weirdest argument about most things. Like, sure...so does a library. That's why we have libraries.

(Or at the extreme end, this is what something like C++ templates were for).


Libraries and frameworks remove some boilerplate but there's still tons of it. It's rare a library exposes a single doTheThingINeed() function that runs a business. Everyone needs boring but domain specific code.


Yeah, the current vibe for me seems to be: congratulations, you trained a giant machine that makes copying code from Stack Overflow marginally faster.


Are we returning to ideas having importance, now that PoCs are cheaper and easier? Any dev on HN can create a Facebook competitor, but getting the traffic to shift will require some magical thinking.


A PoC has never really been a problem that needed solving. The hard part is going from that to a product that's actually fit for purpose. AWS will happily drain your bank account because you're throwing tons of resources at a poor implementation. Hackers will also happily exploit trivial security vulnerabilities a vibe coder had no ability to identify, let alone fix.

This is not the first time the industry has been down this road. They're not in themselves bad tools but their hype engenders a lot of overconfidence in non-technical users.


> with absolutely no evidence of what the author did with it.

I've got colleagues desperate to be seen as on board with all the AI experiments and they just write things like "tool X is MILES better than tool Y". Leadership laps it up

They never provide any evidence.


Watch: it's just Dario Amodei's marketing Reddit account.


> I honestly think we’re being played.

Supposing this is true, who is playing us and why?


I didn't intend anything super nefarious there; I just think it's Influencer-style marketing that's going on.


Ah, I see.


Sam Altman and the money.


> Especially interesting for software that are 99.9% of the time waiting for inference to come back to you.

In a different domain, I've seen a CLI tool that requests an OAuth token get rewritten from Python to Rust for a huge performance boost. The Rust version had requested a token and presented it back to the app within a few milliseconds, but it took Python about five seconds just to load the modules the OAuth vendor recommends.
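If anyone wants to see where that startup time goes, CPython can print per-module import times (here msal is just a stand-in for whatever modules your vendor recommends; the report goes to stderr):

  python -X importtime -c "import msal" 2>&1 | tail -n 20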

That’s a huge performance boost, never mind how much simpler it is to distribute a compiled binary.


Python's startup cost is terrible. Same with Node. Go is very good, but Rust is excellent.

Even if a GC'ed language like Go is very fast at allocating/deallocating memory, Rust often has no need to allocate/deallocate that memory in the first place. The programmer gives the compiler the tools to optimize memory management, and machines are better at optimizing memory than humans. (Some kinds of optimizations, anyway.)


TBH I'm still surprised how quickly Go programs start up, given how much stuff there is in init() functions even in the standard library (e.g. unicode tables, etc.)


I’ve spent some time optimizing Python performance in a web app and CLI, and yeah it absolutely sucks.

Module import cost is enormous, and while Python is dynamic enough that you can do lots of cute tricks to defer it past startup in a long-running app, for one-shot CLI operations that don't run a daemon or something there's just nothing you can do.

I really enjoy Python as a language and an ecosystem, and feel it very much has its place…which is absolutely not anywhere that performance matters.

EDIT: and there's no viable alternative. Python is the ML language.


Packaging Python apps is pure hell. npm gets a lot of shit, but Python deserves as much if not more.


I know far too much about Python packaging while only knowing a little about it.

I agree it’s hell. But I’ve not found many comprehensive packaging solutions that aren’t gnarly in some way.

IMHO the Python packaging community has done an excellent job of producing tools to make packaging easy for folks, especially if you're using GitHub Actions. Check out: https://github.com/pypa/cibuildwheel

PyPA has an extensive list of GitHub Actions for various use cases.

I think most of us end up in the "pure hell" because we read the docs on how to build a package instead of using the tools the experts created to hide the chaos. A bit like how building a deb by hand is a lot harder than using the tools that do it for you.
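For the curious, a minimal cibuildwheel workflow is roughly this (from memory; the project README has the canonical version, including the exact pinned action version):

  name: Build wheels
  on: [push]
  jobs:
    build_wheels:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v4
        # Builds wheels for each supported Python into ./wheelhouse
        - name: Build wheels
          uses: pypa/cibuildwheel@v2.17.0
        - uses: actions/upload-artifact@v4
          with:
            path: ./wheelhouse/*.whl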


That's fair. I'm also thinking about the sheer size of Python apps that make use of the GPU. I have to imagine a C++ app performing neural network shenanigans wouldn't be >1GB before downloading weights.


I've tried; it is still gigabytes, unless you dynamically link against user-installed CUDA libraries from C++. Which I don't recommend.


Oof


Speaking of which, I didn't realise Node had a built-in packaging feature to turn scripts into a single executable:

https://nodejs.org/api/single-executable-applications.html


I was not aware of that feature either, thanks for the heads-up.

In my opinion, bundling the application payload would be sufficient for interpreted languages like Python and JavaScript.


Figuring out the technique for this involved reading a number of GitHub issues, so I tried to make it as simple as possible to show the two-step process for being able to compile your migrations.

The benefit of this approach to migrations is being able to use Go itself to figure out whether a migration should run, e.g. you can check an environment variable to see if you're in the dev environment, and if you are, run a migration to populate your seed data.
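As a sketch of that env-var example (assuming pressly/goose v3 and a made-up users table; the file needs a numeric prefix like 00002_seed_users.go so goose can order it):

  package migrations

  import (
      "database/sql"
      "os"

      "github.com/pressly/goose/v3"
  )

  func init() {
      // Register the Go migration with goose at package load time.
      goose.AddMigration(upSeedUsers, downSeedUsers)
  }

  // upSeedUsers populates seed data, but only in the dev environment.
  func upSeedUsers(tx *sql.Tx) error {
      if os.Getenv("APP_ENV") != "dev" {
          return nil // no-op elsewhere; goose still records it as applied
      }
      _, err := tx.Exec(`INSERT INTO users (name) VALUES ('dev-user')`)
      return err
  }

  func downSeedUsers(tx *sql.Tx) error {
      _, err := tx.Exec(`DELETE FROM users WHERE name = 'dev-user'`)
      return err
  }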


A few simple command equivalents to help a Rails developer get started with 'goose'.


Are you just saying here that you open a shell, redirect cat's output to /dev/null, and then use the terminal buffer for notes? I can't quite parse out from your comment what you mean as a workflow; I checked out my copy of Unix Power Tools and didn't see anything clarifying what you mean in either Chapter 43 "Redirecting Input and Output" or flicking through section IV "Basic Editing".

If you have a few minutes and could clarify, I'd appreciate it. I love a good *nix workflow.


I found it! Section 48.3 "A Scratchpad on Your Screen" from the first edition.

https://web.deu.edu.tr/doc/oreily/unix/upt/ch48_03.htm


Yes, that's it: "cat > /dev/null". Or you can even skip running cat and just use "> /dev/null". You exit by typing ctrl-d, which sends an EOF (and then sends all the output to /dev/null!).


So I just tried it, and we can use this with zellij sessions as well, which is pretty nice and has a really, really nice UI/UX.


cat > ~/anyrandomfile also works if you want a persistent connection / storage to save things.

I am not sure if noteux.com works in the same way.

I am also not sure, but is there a way to create persistent terminal sessions, where each one is going to /dev/null but the session persists and you can connect to any of them?

Maybe using zellij / tmux?
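tmux can do that: each named session keeps running after you detach, and you can reattach from any terminal. For example:

  tmux new-session -s notes   # start a persistent session
  cat > /dev/null             # scratchpad; detach with ctrl-b d
  tmux attach -t notes        # reconnect later, scrollback intact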


Simple technique, but I use it enough that it felt useful to share.


I think there's potential for a race condition between when you close the Listener and when the http server spins up on the no-longer-reserved port; we've seen this with lots of tests being run in parallel. You can return the listener, use t.Cleanup to close it, and serve your web server on the listener directly.
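A minimal sketch of what I mean (untested, just net and net/http; the helper name is made up):

  package mypkg

  import (
      "net"
      "net/http"
      "testing"
  )

  // newTestServer reserves a port and serves on that same listener,
  // so there's no close-then-rebind window for a parallel test to steal it.
  func newTestServer(t *testing.T, h http.Handler) string {
      t.Helper()
      l, err := net.Listen("tcp", "127.0.0.1:0") // kernel picks a free port
      if err != nil {
          t.Fatal(err)
      }
      srv := &http.Server{Handler: h}
      go srv.Serve(l)
      t.Cleanup(func() { _ = srv.Close() }) // Close also closes the listener
      return "http://" + l.Addr().String()
  }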


The race condition is definitely there, but if you use t.Cleanup to close the listener then the listener does not stop listening on the port until the end of the test. If you try to bind an http server to the port during a test then your http server will stop and return the error "bind: address already in use".

I'd be interested to know how you're working around that, if you are able to share.

