I have MLC Chat on my old Note 9 phone. It is actually still a great phone, but it has 5GB RAM. Running an on-device model is the first and only use case where that RAM actually matters.
And it has a headphone jack, OK? I just hate Bluetooth earbuds. And yeah, that isn't really a problem, but I digress.
When I run a 2B model, I get respectable output. It takes a minute or two to process the context, then output begins at somewhere on the order of 4 to 10 tokens per second.
So, I just make a query, give it a few minutes, and I have my response.
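For anyone curious what this workflow looks like outside the app, here's a minimal sketch of the same query-and-wait loop on a laptop using llama-cpp-python (a different stack than MLC Chat, but the idea is the same; the GGUF filename is a placeholder for whatever small quantized model you have on hand):

```python
# Rough sketch: query a small local model and measure tokens/sec.
# Uses llama-cpp-python rather than MLC Chat; the model file below is a
# placeholder for any small quantized GGUF build (e.g. a Gemma 2 2B quant).
import time
from llama_cpp import Llama

llm = Llama(model_path="gemma-2-2b-it-Q4_K_M.gguf", n_ctx=2048, verbose=False)

start = time.perf_counter()
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What does doing something 'on a lark' mean?"}],
    max_tokens=128,
)
elapsed = time.perf_counter() - start

# Print the answer and the effective generation speed.
print(out["choices"][0]["message"]["content"].strip())
print(f"{out['usage']['completion_tokens'] / elapsed:.1f} tokens/sec")
```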
Here is how I see it:
That little model (Gemma 2 2B) knows a lot of stuff. It has knowledge I don't, and it gives it to me in a reasonable, though predictable, way. Answers always have a certain "teacher reminding a student how it all goes" tone.
I don't care. Better is nice, but if I were stuck somewhere with no network, being able to query that model is amazing!
First aid, how to make fires, materials and their uses, fixing stuff, theories of operation, what things mean, and more are all in that thing, ready for me to take advantage of.
I consider what I have fast. And it will get one or two orders of magnitude faster over the next few years, too.
I did it on a lark (ask the model what that means) and was surprised to find I had gained a nice tool.
> First aid, how to make fires, materials and uses
This scares me more than it should...
Please do not trust an AI in actual life-and-death situations... Sure, if it is literally your only option, but this setup implies you have a device on you that could make a phone call to an emergency number, where a real human with real training and actually correct knowledge can assist you.
Even as an avid hiker, the number of times I've been out of cell service is minuscule, and I absolutely refresh my knowledge of first aid regularly, and of any potential threats before a hike somewhere new.