The question is: how does prompt processing time on this compare to the M3 Ultra? That one sucks at RAG, even though it can technically handle huge models and long contexts.
Prompt processing on Apple Silicon might benefit from using the NPU (Apple Neural Engine). Note that the NPU is a poor fit when you're limited by memory bandwidth, as in token-by-token generation, but prompt processing is compute-bound. It just needs someone to do the work.
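A rough roofline-style sketch of why prefill and generation hit different limits. It assumes a dense model doing about 2 FLOPs per parameter per token and streaming its weights from memory once per forward pass. It ignores attention FLOPs and KV-cache traffic. The hardware numbers (30 TFLOP/s, 800 GB/s) and the 70B / 4-bit model are hypothetical placeholders, not measured M3 Ultra or Neural Engine specs.

```python
def arithmetic_intensity(n_tokens: int, bytes_per_param: float) -> float:
    """FLOPs per byte of weight traffic for one forward pass over n_tokens.

    Assumes ~2 FLOPs (one multiply-add) per parameter per token and that
    the weights are read from memory once per pass.
    """
    return 2.0 * n_tokens / bytes_per_param


def limiting_factor(n_tokens: int, bytes_per_param: float,
                    peak_flops: float, mem_bw: float) -> str:
    """Return 'compute' or 'memory' depending on which side of the ridge point we are on."""
    ridge = peak_flops / mem_bw  # FLOPs/byte where both limits are equal
    if arithmetic_intensity(n_tokens, bytes_per_param) > ridge:
        return "compute"
    return "memory"


def pass_time_s(params: float, n_tokens: int, bytes_per_param: float,
                peak_flops: float, mem_bw: float) -> float:
    """Lower-bound time for one forward pass: the slower of compute and weight streaming."""
    compute_s = 2.0 * params * n_tokens / peak_flops
    memory_s = params * bytes_per_param / mem_bw
    return max(compute_s, memory_s)


if __name__ == "__main__":
    # Hypothetical hardware and model, for illustration only.
    PEAK = 30e12      # FLOP/s
    BW = 800e9        # bytes/s
    PARAMS = 70e9     # 70B-parameter model
    BPP = 0.5         # 4-bit weights

    for n in (1, 2048):
        print(n, limiting_factor(n, BPP, PEAK, BW),
              round(pass_time_s(PARAMS, n, BPP, PEAK, BW), 3))
        # Doubling PEAK (e.g. adding an accelerator) only helps the compute-bound case.
        print("  2x compute:", round(pass_time_s(PARAMS, n, BPP, 2 * PEAK, BW), 3))
```

With these placeholder numbers, a single generated token is memory-bound, so extra compute leaves its time unchanged. A 2048-token prompt is compute-bound, so doubling FLOP/s roughly halves prefill time. That is the case where offloading to the NPU could pay off.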