Based on my limited runs, I think 4-bit quantization is detrimental to the output quality:
> /main -m ~/Downloads/llama/7B/ggml-model-q4_0.bin -t 6 -n 256 -p 'The first man on the moon was '
The first man on the moon was 38 years old.
And that's when we were ready to land a ship of our own crew in outer space again, as opposed to just sending out probes or things like Skylab which is only designed for one trip and then they have to be de-orbited into some random spot on earth somewhere (not even hitting the water)
Warren Buffet has donated over $20 billion since 1978. His net worth today stands at more than a half trillion dollars ($53 Billiard). He's currently living in Omaha, NE as opposed to his earlier home of New York City/Berkshire Mountains area and he still lives like nothing changed except for being able to spend $20 billion.
Social Security is now paying out more than it collects because people are dying... That means that we're living longer past when Social security was supposed to run dry (65) [end of text]
I believe the quality loss is because this is RTN (round-to-nearest) quantization. If you use the "4chan version", which is 4-bit GPTQ, the loss from quantization should be very small.
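For context, RTN just rounds each weight to the nearest representable level using a per-block scale, and whatever error that rounding introduces is simply kept; GPTQ instead adjusts the remaining weights to compensate for the rounding error, which is why it tends to lose less quality at the same bit width. Here is a minimal sketch of what RTN 4-bit block quantization looks like; the block size, signed range, and packing are illustrative and not the exact ggml q4_0 layout:

```python
import numpy as np

def rtn_q4_block(weights, block_size=32):
    """Round-to-nearest 4-bit quantization with one scale per block.

    Roughly in the spirit of ggml's q4_0 (symmetric, no zero-point);
    details here are illustrative, not the actual file format.
    """
    w = weights.reshape(-1, block_size)
    # One scale per block: map the largest magnitude onto the
    # signed 4-bit range [-8, 7].
    scale = np.max(np.abs(w), axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0                      # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7)      # the "round to nearest" step
    dequant = q * scale                          # what the model actually computes with
    return q.astype(np.int8), scale, dequant.reshape(weights.shape)

# Example: how much error plain RTN leaves behind on random weights
rng = np.random.default_rng(0)
w = rng.normal(size=(4096,)).astype(np.float32)
_, _, w_hat = rtn_q4_block(w)
print("mean abs quantization error:", np.abs(w - w_hat).mean())
```

Nothing in that loop tries to undo the rounding error, which is the whole difference from GPTQ-style methods.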
The OP article eventually gets around to demonstrating the model, and its output is similarly bad, jumping from George Washington to the purported physical fitness of Donald Trump.