https://github.com/feizc/Visual-LLaMA
Someone claims to be working on a modified LLaMA model that can understand images. He also appears to be working on a large number of other projects, so it could be pure vapor.