As a followup on LLama2's prompting, here's a good thread with some more details...

As a followup on LLama2's prompting, here's a good thread with some more details: https://www.reddit.com/r/LocalLLaMA/comments/155po2p/get_lla... (see also: https://www.reddit.com/r/LocalLLaMA/comments/1561vn5/here_is... - it's a bit complex and people are still wrapping their heads around it)

Note, it is possible to reinject <<SYS>> prompts during the conversation to keep the LLM on target: https://twitter.com/overlordayn/status/1681631554672513025 (but obviously this should be the first thing you filter and track if you are running an LLM that is accessible to end-users - rebuff is a lib w/ some good ideas to start with)

With new fine tunes coming out, it's worth noting again that different datasets/models all use slightly different prompt formats (many are using multiple datasets these days, so it's hard to say just how much it matters now): https://www.reddit.com/r/LocalLLaMA/comments/13lwwux/comment...

Ideally, fine tunes for LLama2 would regularize all datasets to the official tokens/format and inference interfaces could standardize too (or there could be some metadata collected for what model uses what, but some of the low hanging fruit that's out for the future I guess). One other thing to keep in mind is that all the benchmarks/leaderboards don't use instruct formatting at all, and don't really represent capabilities for a model. IMO, ELO-style rankings against models for specific tasks would probably be more representative.