
Does this support context? Like can you hold conversations with it, or is it just instruct-trained rather than chat-trained?


Models don't support context. You feed prior prompt/response pairs back into the input.


I misspoke. My question is whether the model was fine-tuned on multiple prompt/response pairs (with later prompts referencing earlier ones) or only on one-shot prompts. I expect fine-tuning on e.g.:

User: Pick an animal.
Bot: Horse
User: How many legs does it have?
Bot: 4

to be different from just fine-tuning it on:

User: Pick an animal.
Bot: Horse

Especially when it comes to training the bot to ask follow-up questions, etc.


How does one find out what the max input size is?


Search <insert model name> context window size.

For LLaMA it's 2048 tokens


What is it for ChatGPT?


4k for GPT-3.5 and 8k-32k for GPT-4 (there are multiple versions: https://platform.openai.com/docs/models/gpt-4)


Are you sure there aren't some additional tricks?

Is it possible they are doing summarization when you use the web interface to GPT-4?

Because I have some extremely long conversations, and when I ask it whether it remembers the beginning of the conversation, it does.


32k tokens is novella-to-short-novel length, so it can handle an extremely long conversation. But, sure, it's possible they are using summarization and/or storage-and-retrieval hacks to extend beyond what fits in the context window naively.


You can always implement summarization; LangChain, for example, supports this. But that doesn't change how many tokens the model itself is consuming.
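
Rough sketch of the idea in Python (this is not LangChain's actual API; llm here stands in for any hypothetical completion call):

    def summarize_history(llm, turns, keep_last=4):
        """Compress all but the last few turns into a single summary turn."""
        old, recent = turns[:-keep_last], turns[-keep_last:]
        if not old:
            return turns
        summary = llm("Summarize this conversation so far:\n" + "\n".join(old))
        # The model still reads summary + recent turns as ordinary tokens;
        # summarizing shrinks the prompt, it doesn't enlarge the context window.
        return ["Summary of earlier conversation: " + summary] + recent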


Somehow I doubt there is any type of “prompt compression” being done.


It's not an uncommon technique so I'm not sure why you're so doubtful.


Because it introduces a possibly recursive application of the model itself and is a bit messy in that there would be some loss of information. It certainly would be a good technique for external applications, though.

Anyway, I got some clarification from ChatGPT itself:

Question: In a growing conversation, does ChatGPT create a summarized version of the conversation history and send it to the model?

Answer:

In a growing conversation, ChatGPT does not create a summarized version of the entire conversation history and send it to the model. Instead, the model only uses the most recent turns of the conversation to generate a response to the current turn.

However, some implementations of ChatGPT do use a technique called "history trimming" to limit the size of the context window that the model considers. History trimming involves discarding some of the oldest turns of the conversation to keep the context window within a manageable size. This allows the model to focus on the most recent turns of the conversation, which may be more relevant to the current turn.

For example, OpenAI's GPT-3 API includes a feature called "context chunking" that automatically trims the conversation history to a maximum length of 2048 tokens. If the conversation history exceeds this length, the oldest turns are discarded to keep the context window within the maximum length.

Overall, while ChatGPT does not create a summarized version of the entire conversation history, some implementations may use history trimming to limit the size of the context window and improve the model's performance.


Why would you trust what ChatGPT answered?

Here's what I got out of it:

Q: In a growing conversation, does ChatGPT create a summarized version of the conversation history to build the prompt when it's getting too big for its context size?

A: Yes, ChatGPT may create a summarized version of the conversation history when it's getting too big for its context size. This is because the GPT-3.5 architecture used by ChatGPT has a limited context size, which means that it can only take into account a certain number of previous messages in the conversation when generating a response.

To work around this limitation, ChatGPT may use a summarization algorithm to condense the conversation history into a more manageable size, while still preserving the key points and important information. This summarization process may involve techniques such as text clustering, topic modeling, or text summarization algorithms like BERTSum, T5, or GPT itself.

Once the conversation history has been summarized, ChatGPT can use the condensed version as the prompt to generate a response that is more focused and relevant to the current topic of discussion.


> recursive application of the model itself

That's how chat with LLMs works. The LLM is not stateful; with every response you need to submit the entirety of the past conversation, including its own past output.

> a bit messy in that there would be some loss of information

There has to be a loss of information either way. Their API calls for the web interface point very strongly toward enabling summarization once you exceed a given length.

> Anyway, I got some clarification from ChatGPT itself

As the other comment points out, what you got was a complete hallucination.

"Context chunking" is not a thing, there's a context window and if you submit more than it can fit, the API returns an error response. If you almost fill the window, it returns what it can and but raises an error flag.

There's a system prompt, but ChatGPT uses that to kick off the conversation; you can see in the API calls that even your first message is submitted as a user message, and storing that system prompt is still external to the LLM.


Wow, yes, point taken. In the absence of any definitive description of what it actually does, I suppose we have to piece together info from the API docs and possibly the InstructGPT paper.


Where a token is roughly equal to a word?


1,000 tokens ≈ 750 words


One token is about 3/4 of a word.
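
That 3/4 is just a rule of thumb for English text; for an exact count you can use OpenAI's tiktoken library (sketch, assuming the tiktoken package is installed):

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    text = "Hacker News is a social news website."
    print(len(text.split()), "words ->", len(enc.encode(text)), "tokens")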


Oh is that how it works under the hood?


It's not a secret. With the OpenAI API you have to keep sending the previous questions and responses on top of your new question; essentially you are asking a new question but giving the model more context via the earlier questions and answers.
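
Concretely, reusing the animal example from upthread (a sketch against the 0.x-era openai chat API):

    import openai  # 0.x-era SDK

    # The model is stateless: every call resubmits the whole transcript.
    messages = [{"role": "user", "content": "Pick an animal."}]
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages,
    )["choices"][0]["message"]["content"]  # e.g. "Horse"

    # To "continue" the chat, append both sides and send everything again.
    messages += [
        {"role": "assistant", "content": reply},
        {"role": "user", "content": "How many legs does it have?"},
    ]
    reply = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", messages=messages,
    )["choices"][0]["message"]["content"]  # e.g. "4"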


Thank you for clarifying this mystery. My mental model of how ChatGPT works is now clearer. I was somehow thinking that in chat mode it would either (a) need to “update state”, or (b) be fed increasingly longer history. I thought (b) would slow down later responses but I guess with their ginormous compute it’s not perceptible to users.


Someone made a version with a buffer that keeps only a fixed-size context history, so it stays at a constant speed. It will only focus on the latest questions.
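
Something like this, where count_tokens is a stand-in for a real tokenizer such as tiktoken:

    def trim_to_window(messages, count_tokens, budget=2048):
        """Drop the oldest turns until the transcript fits the token budget."""
        kept = list(messages)
        while kept and sum(count_tokens(m["content"]) for m in kept) > budget:
            kept.pop(0)  # the oldest turn falls out of the window first
        return kept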


Yes. The chat functionality is really just a clever trick. All LLMs are still just predicting the next token in a sequence (i.e. finishing the sentence).


Yes, it's why in longer conversations it might forget what was said earlier in the thread. The token limits still apply.



