Meta has an in-house accelerator that the Triton inference engine supports (which they use almost exclusively for their fake content/fake profiles project). Triton is legacy software and, afaik, does not have a Vulkan backend, so Meta may be locked out of better options until it gets one.
That doesn't stop Meta's Llama family of models from running on anything and everything _outside_ of Meta, though. Llama.cpp runs on practically everything, for example, but Meta itself doesn't use it.