
Yeah, it's MUCH harder to use because of the lack of instruction tuning.

You have to lean on much older prompt engineering tricks - there are a few initial tips in the LLaMA FAQ here: https://github.com/facebookresearch/llama/blob/main/FAQ.md#2...
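The gist of those tricks: the raw models complete text rather than follow instructions, so you usually get better results by framing the task as a document the model can continue. An illustrative, made-up example (not one taken from the FAQ):

    # Instruction phrasing tends to flop on a base model:
    instruct_style = "Write a haiku about spring."

    # Completion phrasing gives it a pattern to continue:
    completion_style = "Here is a haiku about spring:\n"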



Are you getting useful content out of the 7B model? It goes off the rails way too often for me to find it useful.


You might want to tune the sampler, for example by setting a lower temperature. Also, the 4-bit RTN quantisation seems to be messing up the model; GPTQ quantisation will perhaps do much better.
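For intuition on both suggestions, here's a minimal numpy sketch (illustrative names, not llama.cpp's actual code) of what temperature and top-k do to sampling, and where round-to-nearest (RTN) quantisation error comes from. Note llama.cpp's q4_0 format quantises weights in small blocks with a scale per block; the single per-tensor scale below is a simplification:

    import numpy as np

    def sample(logits, temperature=0.7, top_k=40):
        # Keep only the top_k highest logits, then sharpen with temperature:
        # a lower temperature concentrates probability on the best tokens,
        # which reins in a model that keeps going off the rails.
        kept = np.argpartition(logits, -top_k)[-top_k:]
        scaled = logits[kept] / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return np.random.choice(kept, p=probs)

    def rtn_4bit(w):
        # Round-to-nearest 4-bit quantisation with a single scale.
        # The rounding step is lossy; (w - rtn_4bit(w)) is the error that
        # GPTQ reduces by compensating for rounding using calibration data.
        scale = np.abs(w).max() / 7.0      # int4 values span roughly [-8, 7]
        q = np.clip(np.round(w / scale), -8, 7)
        return q * scale                   # dequantised weights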


Use `--top_p 2 --top_k 40 --repeat_penalty 1.176 --temp 0.7` with llama.cpp
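Two notes on those flags, as I understand them: top_p is a cumulative-probability cutoff, so any value >= 1.0 effectively disables nucleus sampling and leaves top_k doing the filtering. And repeat_penalty follows the scheme from the CTRL paper: logits of tokens already seen in the recent context get scaled so they're less likely to be picked again. A rough Python sketch of that idea (not llama.cpp's actual code):

    import numpy as np

    def apply_repeat_penalty(logits, recent_tokens, penalty=1.176):
        # Penalise tokens that already appeared in the recent context window.
        # Dividing a positive logit (or multiplying a negative one) by the
        # penalty lowers that token's chance of being sampled again.
        for t in set(recent_tokens):
            if logits[t] > 0:
                logits[t] /= penalty
            else:
                logits[t] *= penalty
        return logits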


Not bad with these settings:

    ./main -m ./models/7B/ggml-model-q4_0.bin \
    --top_p 2 --top_k 40 \
    --repeat_penalty 1.176 \
    --temp 0.7 \
    -p 'async fn download_url(url: &str)'


    async fn download_url(url: &str) -> io::Result<String> {
      let url = URL(string_value=url);
      if let Some(err) = url.verify() {} // nope, just skip the downloading part
      else match err == None {  // works now
        true => Ok(String::from(match url.open("get")?{
            |res| res.ok().expect_str(&url)?,
            |err: io::Error| Err(io::ErrorKind(uint16_t::MAX as u8))),
            false => Err(io::Error


lol,

    ./main -m ./models/7B/ggml-model-q4_0.bin \
    --top_p 2 --top_k 40 \
    --repeat_penalty 1.176 \
    --temp 0.7 \
    -p 'To seduce a woman, you first have to'
output:

    import numpy as np
    from scipy.linalg import norm, LinAlgError
    np.random.seed(10)
    x = -2*norm(LinAlgError())[0]  # error message is too long for command line use
    print x [end of text]


What fork are you using?

repeat_penalty is not an option.



It's a new feature :) Pull latest from master.


Have you tried using the original repo?



