I misspoke. My question is whether the model was fine-tuned on multiple prompt/response pairs (with later prompts referencing earlier ones) or only on one-shot prompts. I expect fine-tuning on e.g.:
    User: Pick an animal.
    Bot: Horse
    User: How many legs does it have?
    Bot: 4

to be different from just fine-tuning it on:

    User: Pick an animal.
    Bot: Horse
Especially when it comes to e.g. training the bot to ask follow-up questions, etc.
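To make the distinction concrete, here's a sketch of the two kinds of training examples (the dict format here is made up for illustration; OpenAI hasn't published their actual fine-tuning data format):

    # Hypothetical training examples; the real format OpenAI uses is not public.
    single_turn_example = [
        {"role": "user", "content": "Pick an animal."},
        {"role": "bot", "content": "Horse"},
    ]

    multi_turn_example = [
        {"role": "user", "content": "Pick an animal."},
        {"role": "bot", "content": "Horse"},
        # "it" only resolves via the earlier turn, so training on examples
        # like this is what would teach the model to use conversation history.
        {"role": "user", "content": "How many legs does it have?"},
        {"role": "bot", "content": "4"},
    ]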
32k tokens is novella-to-short-novel length, so it can handle an extremely long conversation. But, sure, it's possible they are using summarization and/or storage-and-retrieval hacks to extend beyond what fits naively in the context window.
Because it introduces a possibly recursive application of the model itself and is a bit messy in that there would be some loss of information. It certainly would be a good technique for external applications, though.
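For what it's worth, that recursive trick would look roughly like this (a sketch: count_tokens and summarize are hypothetical helpers, with summarize calling the model itself, and the budget is arbitrary):

    def build_prompt(turns, count_tokens, summarize, budget=4000):
        # Walk backwards through the conversation, keeping the most
        # recent turns verbatim until the token budget is nearly spent.
        recent = []
        used = 0
        while turns and used + count_tokens(turns[-1]) <= budget:
            used += count_tokens(turns[-1])
            recent.insert(0, turns.pop())
        # Anything older gets compressed by applying the model to itself;
        # this is the recursive, lossy part.
        if turns:
            recent.insert(0, "Summary of earlier conversation: " + summarize(turns))
        return "\n".join(recent)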
Anyway, I got some clarification from ChatGPT itself:
Question: In a growing conversation, does ChatGPT create a summarized version of the conversation history and send it to the model?
Answer:
In a growing conversation, ChatGPT does not create a summarized version of the entire conversation history and send it to the model. Instead, the model only uses the most recent turns of the conversation to generate a response to the current turn.
However, some implementations of ChatGPT do use a technique called "history trimming" to limit the size of the context window that the model considers. History trimming involves discarding some of the oldest turns of the conversation to keep the context window within a manageable size. This allows the model to focus on the most recent turns of the conversation, which may be more relevant to the current turn.
For example, OpenAI's GPT-3 API includes a feature called "context chunking" that automatically trims the conversation history to a maximum length of 2048 tokens. If the conversation history exceeds this length, the oldest turns are discarded to keep the context window within the maximum length.
Overall, while ChatGPT does not create a summarized version of the entire conversation history, some implementations may use history trimming to limit the size of the context window and improve the model's performance.
Q: In a growing conversation, does ChatGPT create a summarized version of the conversation history to build the prompt when it's getting too big for its context size?
A: Yes, ChatGPT may create a summarized version of the conversation history when it's getting too big for its context size. This is because the GPT-3.5 architecture used by ChatGPT has a limited context size, which means that it can only take into account a certain number of previous messages in the conversation when generating a response.
To work around this limitation, ChatGPT may use a summarization algorithm to condense the conversation history into a more manageable size, while still preserving the key points and important information. This summarization process may involve techniques such as text clustering, topic modeling, or text summarization algorithms like BERTSum, T5, or GPT itself.
Once the conversation history has been summarized, ChatGPT can use the condensed version as the prompt to generate a response that is more focused and relevant to the current topic of discussion.
That's how chat with LLMs works. The LLM is not stateful; with every request you have to submit the entirety of the past conversation, including the model's own earlier output.
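A sketch of what that looks like in practice (the transcript format is arbitrary, and llm stands in for any completion call):

    history = []

    def chat(user_message, llm):
        # No state lives in the model: the entire transcript, including
        # the bot's own earlier replies, is resent on every turn.
        history.append("User: " + user_message)
        reply = llm("\n".join(history) + "\nBot:")
        history.append("Bot: " + reply)
        return reply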
> a bit messy in that there would be some loss of information
There has to be a loss of information either way. The API calls made by the web interface point very strongly toward summarization being enabled once you exceed a given length.
> Anyway, I got some clarification from ChatGPT itself
As the other comment points out, what you got was a complete hallucination.
"Context chunking" is not a thing, there's a context window and if you submit more than it can fit, the API returns an error response. If you almost fill the window, it returns what it can and but raises an error flag.
There's a system prompt, but ChatGPT only uses that to kick off the conversation. You can see in the API calls that even your first message is submitted as a user message, and storing that system prompt still happens externally to the LLM.
Wow, yes, point taken. In the absence of any definitive description of what it actually does, I suppose we have to piece together info from the API docs and possibly the InstructGPT paper.
It's not a secret: with the OpenAI API you have to keep sending the previous questions and responses on top of your new question. Essentially you're asking a new question but giving the model more context via the earlier questions and answers.
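Concretely, the loop looks like this (same pre-1.0 client style as above; error handling omitted):

    import openai

    messages = []  # grows for the lifetime of the "chat"

    def ask(question):
        messages.append({"role": "user", "content": question})
        resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
        answer = resp.choices[0].message.content
        # The assistant's reply goes back into the list, so the next call
        # carries the full question/answer history as context.
        messages.append({"role": "assistant", "content": answer})
        return answer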
Thank you for clarifying this mystery. My mental model of how ChatGPT works is now clearer. I was somehow thinking that in chat mode it would either (a) need to “update state”, or (b) be fed increasingly longer history. I thought (b) would slow down later responses but I guess with their ginormous compute it’s not perceptible to users.
Someone made a version with a fixed-size context buffer, so it stays at constant speed; it only focuses on the latest questions.
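Presumably something like a sliding window (a sketch; the window size is arbitrary and llm stands in for any completion call):

    from collections import deque

    window = deque(maxlen=8)  # only the last 8 turns survive

    def ask(question, llm):
        # Old turns fall off the left end automatically, so the prompt
        # length, and therefore the response time, stays roughly constant.
        window.append("User: " + question)
        reply = llm("\n".join(window) + "\nBot:")
        window.append("Bot: " + reply)
        return reply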
Yes. The chat functionality is really just a clever trick. All LLMs are still just predicting the next token in a sequence (i.e. finishing the sentence).