There are a lot of optimizations that can be done. Here's one with a potential 15x AVX speedup, for example: https://github.com/ggerganov/llama.cpp/pull/996