
GPT-4 is a scary improvement over 3.5, especially for handling code. It will be the literal definition of awesome when these models get a large enough context space to hold a small-medium sized codebase.

I've been playing around with it for an hour seeing what it can do to refactor some of the things we have with the most tech debt, and it is astounding how well it does with how little context I give it.



There are already some cool projects that help LLMs go beyond the context window limitation and work with even larger codebases, like https://github.com/jerryjliu/llama_index and https://github.com/hwchase17/langchain.
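The core trick behind these projects is retrieval: chunk the codebase, score the chunks against the query, and send only the top hits to the model. A toy sketch of that idea, using plain keyword overlap where real systems use embeddings (all function names here are illustrative, not the actual APIs of either library):

```python
import re

def chunks(text, lines_per_chunk=20):
    """Split a source file into fixed-size line chunks."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def score(chunk, query):
    """Crude relevance: count shared words (splitting identifiers too)."""
    words = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return len(words(chunk) & words(query))

def top_chunks(codebase, query, k=3):
    """Return the k chunks most relevant to the query, across all files."""
    pool = [c for source in codebase.values() for c in chunks(source)]
    return sorted(pool, key=lambda c: score(c, query), reverse=True)[:k]
```

Only the returned chunks go into the prompt, which is how a codebase far larger than the context window can still be queried.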


The fundamental techniques they use are highly lossy and far inferior to ultra-long context length models, where you can do it all in one prompt. Hate to break it to you and all the others.


> Hate to break it to you and all the others.

Jeez. Their comment is quite obviously a complementary one in response to the limitation rather than a corrective one about the limitation.


The methods they employ improve the context being given to the model, irrespective of the context length. Even when context lengths improve, these methods will still be used to decrease the search space and the resources required for a single task (think stream search vs indexed search).
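The stream-vs-indexed distinction can be shown in a few lines: scanning every document on every query, versus building an index once and answering each query with a lookup (a minimal sketch, not how any particular retrieval library implements it):

```python
from collections import defaultdict

def stream_search(docs, term):
    """O(total text) on every single query."""
    return sorted(name for name, text in docs.items() if term in text.split())

def build_index(docs):
    """One pass up front: word -> set of documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.split():
            index[word].add(name)
    return index

def indexed_search(index, term):
    """Each query is now a dictionary lookup."""
    return sorted(index.get(term, set()))
```

The retrieval tooling above plays the role of the index for prompts: the expensive scan happens once, not per model call.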

I’m also curious: which paper are you referencing that finds more context, rather than more relevant context, yields better results?

A good survey of the methods for “Augmented Language Models” (CoT, etc.) is here: https://arxiv.org/pdf/2302.07842.pdf


Where can someone find and try ultra-long context length models?

Any links?


The longest one that is generally available is always going to be yourself :)


My context model is getting shorter and fuzzier.


… but still the weights are increasing ;)


The only thing holding this back now is lack of enough context. That’s the big nut to crack. How do you hold enough information in memory at once?


GPT-4 supports 32k tokens, which I guesstimate would be ~25k code tokens and perhaps ~2.5k lines of code. So it's already enough to work on a whole module at once. If you're using small microservices, it might already be good enough.
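Back-of-envelope version of that estimate (the per-line token count is an assumed rough average, not a measured figure):

```python
context_window = 32_000   # GPT-4 32k variant
code_tokens = 25_000      # after leaving room for instructions and output
tokens_per_line = 10      # assumed rough average for typical code
lines_of_code = code_tokens // tokens_per_line
print(lines_of_code)      # 2500
```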


Also, if you were doing it personally, you'd probably not take the whole codebase, but look at function signatures/docs as you get further from the code you care about. While there are probably benefits in clever systems for summarising and iteratively working on the code, you can get away with just cramming a load of context in.
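One cheap way to build that kind of "distant context" is to reduce far-away files to their function signatures plus the first docstring line, using Python's stdlib `ast` module (a sketch of the idea, not what any existing tool does):

```python
import ast

def signatures(source):
    """Compress a module to one line per function: signature + doc summary."""
    sigs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            doc = ast.get_docstring(node)
            summary = f"  # {doc.splitlines()[0]}" if doc else ""
            sigs.append(f"def {node.name}({args}):{summary}")
    return sigs
```

Feeding the model full source for nearby files and only these one-liners for the rest stretches the same 32k budget a long way.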

I have been playing with GPT-4 and it's good. It's easy to imagine a bunch of use cases where even the maximal cost of a few dollars ($1.80 ish for the context alone) for a single query is worth it. If it's good enough to save you time, it's very easy to cross the threshold where it's cheaper than a person.
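The "$1.80 ish" figure follows from GPT-4's launch pricing: the 32k model charged $0.06 per 1k prompt tokens ($0.12 per 1k completion tokens), so a nearly full prompt comes to:

```python
prompt_tokens = 30_000          # a nearly full 32k window, input side
price_per_1k_prompt = 0.06      # USD, GPT-4-32k launch pricing
cost = prompt_tokens / 1000 * price_per_1k_prompt
print(f"${cost:.2f}")           # $1.80
```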


The 32k tokens version should do fairly well on such a codebase. I don’t know if it’s the one used in ChatGPT.


I’m not even kidding when I say I just did 6h or so of work in around 45 minutes


I'm worried it's going to decimate the already wobbly junior market. Who will be the senior devs of tomorrow to wrangle the LLM output, once it replaces the juniors' job of coding all day?


All programmers that use AI just got a boost. I tried it and it's amazing; it writes complex code on demand like I've never seen before. But the competition gets the boost too, so now we have to compete with devs armed with GPT-4. Humans are still the differentiating factor: which company will use AI better?


Certainly it will.


Do you let it refactor your code base? How do you feed the code to it? Just copy paste?


Yup, there's a bit of a learning curve in getting it to output the most useful stuff with the fewest prompts.

Figure out what your goal is, then start by giving it a wide context. At first it will give wrong answers due to lack of context; with each wrong answer, give it some more context for what you think the provided solution is missing most. Eventually you get 100% working code, or something so close that you can easily and quickly finish it yourself.

This is one strategy, but there are many that you can use to get it to reduce the burden of refactoring.
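The loop above can be sketched like this, with `ask_model` as a hypothetical stand-in for a real chat API call (the stub just pretends the model succeeds once enough context has been supplied):

```python
def ask_model(goal, context):
    """Stub: a real version would send goal + context to an LLM API."""
    return "working code" if len(context) >= 3 else "wrong answer"

def refine(goal, hints):
    """Start wide; after each wrong answer, add the context it was missing."""
    context = ["wide initial context"]
    answer = ask_model(goal, context)
    for hint in hints:
        if answer == "working code":
            break
        context.append(hint)                 # supply what was missing most
        answer = ask_model(goal, context)
    return answer, context
```

The point of the structure is that context grows only as far as needed, which keeps both the token bill and the prompt count down.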


I can't get it to run a single query today. Are you on the paying plan?



