
Yeah, it's MUCH harder to use because of the lack of instruction tuning.

You have to lean on much older prompt engineering tricks - there are a few initial tips in the LLaMA FAQ here: https://github.com/facebookresearch/llama/blob/main/FAQ.md#2...
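The gist of those tricks: the raw models complete text rather than follow instructions, so you usually get better results by framing the task as a document the model can continue. An illustrative, made-up example (not one taken from the FAQ):

    # Instruction phrasing tends to flop on a base model:
    instruct_style = "Write a haiku about spring."

    # Completion phrasing gives it a pattern to continue:
    completion_style = "Here is a haiku about spring:\n"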



Are you getting useful content out of the 7B model? It goes off the rails way too often for me to find it useful.


You might want to tune the sampler, for example by setting a lower temperature. Also, the 4-bit RTN quantisation seems to be messing up the model; GPTQ quantisation will perhaps do much better.
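For intuition on both suggestions, here's a minimal numpy sketch (illustrative names, not llama.cpp's actual code) of what temperature and top-k do to sampling, and where round-to-nearest (RTN) quantisation error comes from. Note llama.cpp's q4_0 format quantises weights in small blocks with a scale per block; the single per-tensor scale below is a simplification:

    import numpy as np

    def sample(logits, temperature=0.7, top_k=40):
        # Keep only the top_k highest logits, then sharpen with temperature:
        # a lower temperature concentrates probability on the best tokens,
        # which reins in a model that keeps going off the rails.
        kept = np.argpartition(logits, -top_k)[-top_k:]
        scaled = logits[kept] / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return np.random.choice(kept, p=probs)

    def rtn_4bit(w):
        # Round-to-nearest 4-bit quantisation with a single scale.
        # The rounding step is lossy; (w - rtn_4bit(w)) is the error that
        # GPTQ reduces by compensating for rounding using calibration data.
        scale = np.abs(w).max() / 7.0      # int4 values span roughly [-8, 7]
        q = np.clip(np.round(w / scale), -8, 7)
        return q * scale                   # dequantised weights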


Use `--top_p 2 --top_k 40 --repeat_penalty 1.176 --temp 0.7` with llama.cpp
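Two notes on those flags, as I understand them: top_p is a cumulative-probability cutoff, so any value >= 1.0 effectively disables nucleus sampling and leaves top_k doing the filtering. And repeat_penalty follows the scheme from the CTRL paper: logits of tokens already seen in the recent context get scaled so they're less likely to be picked again. A rough Python sketch of that idea (not llama.cpp's actual code):

    import numpy as np

    def apply_repeat_penalty(logits, recent_tokens, penalty=1.176):
        # Penalise tokens that already appeared in the recent context window.
        # Dividing a positive logit (or multiplying a negative one) by the
        # penalty lowers that token's chance of being sampled again.
        for t in set(recent_tokens):
            if logits[t] > 0:
                logits[t] /= penalty
            else:
                logits[t] *= penalty
        return logits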


Not bad with these settings:

    ./main -m ./models/7B/ggml-model-q4_0.bin \
    --top_p 2 --top_k 40 \
    --repeat_penalty 1.176 \
    --temp 0.7 \
    -p 'async fn download_url(url: &str)'


    async fn download_url(url: &str) -> io::Result<String> {
      let url = URL(string_value=url);
      if let Some(err) = url.verify() {} // nope, just skip the downloading part
      else match err == None {  // works now
        true => Ok(String::from(match url.open("get")?{
            |res| res.ok().expect_str(&url)?,
            |err: io::Error| Err(io::ErrorKind(uint16_t::MAX as u8))),
            false => Err(io::Error


lol,

    ./main -m ./models/7B/ggml-model-q4_0.bin \
    --top_p 2 --top_k 40 \
    --repeat_penalty 1.176 \
    --temp 0.7 \
    -p 'To seduce a woman, you first have to'
output:

    import numpy as np
    from scipy.linalg import norm, LinAlgError
    np.random.seed(10)
    x = -2*norm(LinAlgError())[0]  # error message is too long for command line use
    print x [end of text]


What fork are you using?

repeat_penalty is not an option.



It's a new feature :) Pull latest from master.


Have you tried using the original repo?



