I have MLCChat on my old Note 9 phone. It is actually still a great phone, but it has 5GB of RAM. Running an on-device model is the first and only use case where the RAM actually matters.
And it has a headphone jack, OK? I just hate Bluetooth earbuds. And yeah, it's a problem, but I digress.
When I run a 2.5B model, I get respectable output. Takes a minute or two to process the context, then output begins at somewhere on the order of 4 to 10 tokens per sec.
So, I just make a query and give it a few and I have my response.
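If you want the same trick on a laptop, the MLC project ships a Python package too. Here's a minimal sketch, assuming the mlc_llm package's MLCEngine API and an mlc-ai Gemma 2 2B build on Hugging Face (the exact model ID is my guess; check what they actually publish):

```python
# Minimal sketch: query a local Gemma 2 2B through mlc_llm's OpenAI-style API.
# Assumptions: the mlc_llm package is installed and the model ID below exists;
# the MLCChat phone app wraps this same runtime.
from mlc_llm import MLCEngine

# ~2B params quantized to 4 bits is roughly 1.3 GB of weights,
# which is why this sort of model fits on a phone with 5 GB of RAM.
model = "HF://mlc-ai/gemma-2-2b-it-q4f16_1-MLC"
engine = MLCEngine(model)

# Stream the reply token by token, like watching the phone screen fill in.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "How do I treat a sprained ankle?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()
```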
Here is how I see it:
That little model (which is Gemma 2 2B, sorry) knows a lot of stuff. It has knowledge I don't, and it gives it to me in a reasonable, though predictable, way. Answers always have a certain "teacher reminding a student how it all goes" tone.
I don't care. Better is nice, but if I were stuck somewhere with no network, being able to query that model would be amazing!
First aid, how to make fires, materials and uses, fixing stuff, theories of operation, what things mean, and more: it's all in that thing, ready for me to take advantage of.
I consider what I have fast. And it will get one or two orders of magnitude faster over the next few years, too.
I did it on a lark (ask the model what that means) and was surprised to see I gained a nice tool.
> First aid, how to make fires, materials and uses
This scares me more than it should...
Please do not trust an AI in actual life-and-death situations... Sure, if it is literally your only option, but this implies you have a device on you that could make a phone call to an emergency number, where a real human with real training and actually correct knowledge can assist you.
Even as an avid hiker, the number of times I've been out of cell service is minuscule, and I absolutely refresh my knowledge of first aid regularly, along with any potential threats, before a hike somewhere new.
I fired the model up at 8:39; it was ready to chat just before 8:40, at which point I input the following prompt:
Please define "lark" for me, as in "I did it on a lark."
Output began a bit before 8:41, and was complete before 8:43:
---Begin Model Output
"On a lark" means without much planning or reason, impulsively or unexpectedly.
Here's how it works:
"I did it on a lark." This means the person acted without a specific plan or reason.
"A lark" refers to the lark bird, a small songbird known for its playful nature.
So, "on a lark" suggests something spontaneous, casual, and often with a touch of playful whimsy.
Here are some other ways to understand it:
Without much thought: The person was likely to do it quickly and without much consideration.
Based on a whim: It's something they decided to do without a solid plan.
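As a rough sanity check, those coarse timestamps square with the 4 to 10 tokens per sec figure. Here's the back-of-the-envelope math (the word count and the tokens-per-word ratio are both rough assumptions):

```python
# Back-of-the-envelope check of the timestamps against the reported decode rate.
words = 110                  # approximate word count of the model output above
tokens = round(words * 1.3)  # ~1.3 tokens per English word is a common rule of thumb
for rate in (4, 10):         # the 4-10 tokens/sec range reported earlier
    print(f"{tokens} tokens at {rate} tok/s -> {tokens / rate:.0f} s of decoding")
# ~14-36 s of decoding fits easily in the "before 8:41" to "before 8:43" window.
```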
You know that's not the point. You can't even ask a dictionary app a question. You more or less just give it input and it gives you what it has.
I can ask this model some pretty intense things and get meaningful output back, it doesn't take very long, and I don't need an internet connection. Like I said, it's spiffy.
I have used it sans network connection, will use it again, and will be happy I had the option.
Better will come. That is just fine. I have something very useful today, right now.
Anyone who needs a local LLM, or understands the value of one, would be OK with this kind of output.