You can look at https://github.com/ggerganov/llama.cpp/blob/master/llama.cpp... for examples of the different layers across a number of models, with their implementations further down in the code. TL;DR: yes, they are very similar. I can see a lot of value in something that can just run these models. Even if you only supported Llama 2, there are tons of options available.