If you described Speculative Decoding, 3bit Quantization, and Adapter based weig...

reqo · on June 14, 2024

I am curious, what do you mean by "Adapter based weight selection"?

supriyo-biswas · on June 14, 2024

That's apparently Apple's term for LoRA[1].

jimmySixDOF · on June 14, 2024

Sort of, but adapters allow for multiple weight adjustments (think LoRAs), each tuned for a specific skill, so it is more like an extra-optimized mixture-of-experts or multi-agent approach. They have a slide listing adapters such as summarization, prioritization, tone (happy, business, etc.), editor, and so on. This is not to be confused with Intents, which is how on-device apps publish their capabilities to the Intelligence system for real NPU, OS-level multi-agent tool use.
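For anyone unfamiliar with the LoRA idea behind this, here is a minimal plain-Python sketch, assuming the standard LoRA formulation (W = W0 + (alpha / r) * B @ A) rather than anything Apple has documented. One frozen base matrix is shared, and "selecting a skill" just means choosing which small low-rank update gets added. `AdapterModel`, `register`, and `forward` are hypothetical names, and the adapter names are borrowed from the slide purely as labels.

```python
# Hypothetical sketch of per-skill LoRA adapter selection (not Apple's code).
Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x: Matrix, y: Matrix, scale: float = 1.0) -> Matrix:
    """Elementwise x + scale * y."""
    return [[a + scale * b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


class AdapterModel:
    def __init__(self, base: Matrix):
        self.base = base  # frozen base weight W0, shared by every adapter
        self.adapters: dict[str, tuple[Matrix, Matrix, float]] = {}

    def register(self, name: str, B: Matrix, A: Matrix, alpha: float) -> None:
        # B is d_out x r, A is r x d_in; LoRA scales the update by alpha / r.
        rank = len(A)
        self.adapters[name] = (B, A, alpha / rank)

    def weight_for(self, name: str) -> Matrix:
        # Effective weight for the selected skill: W0 + (alpha / r) * B @ A.
        B, A, scale = self.adapters[name]
        return add(self.base, matmul(B, A), scale)

    def forward(self, name: str, x: Matrix) -> Matrix:
        # x is a column vector stored as a d_in x 1 matrix.
        return matmul(self.weight_for(name), x)


model = AdapterModel([[1.0, 0.0], [0.0, 1.0]])
model.register("summarization", [[1.0], [0.0]], [[0.0, 1.0]], alpha=1.0)
model.register("tone", [[0.0], [2.0]], [[1.0, 0.0]], alpha=1.0)
print(model.forward("summarization", [[3.0], [4.0]]))  # [[7.0], [4.0]]
print(model.forward("tone", [[3.0], [4.0]]))           # [[3.0], [10.0]]
```

Each adapter here is only a d_out x r plus r x d_in pair, which is why many skills can share one base model cheaply.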