More and more I'm starting to realize that cost savings are a minor consideration for local LLMs. If it is too slow, it becomes unusable, so much so that you might as well use public LLM endpoints. Unless you really care about getting things done locally without sending information to another server.
With the OpenAI API/ChatGPT, I get responses much faster than I can read them, and for simple questions that means I just need a glimpse of the response, copy & paste, and get things done. Whereas with a local LLM, I watch it painstakingly print preambles that I don't care about, and get what I actually need after 20 seconds (on a fast GPU).
And I am not yet talking about context window etc.
I have been researching how people integrate local LLMs into their workflows. My finding is that most people play with them for a short time and that's about it, and most people are much better off spending money on OpenAI credits (which can last a very long time with typical usage) than getting a beefed-up Mac Studio or building a machine with a 4090.
My tooling doesn't measure TPS yet. It feels snappy to me on MLX.
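For what it's worth, a rough way to eyeball TPS without dedicated tooling is to stream from whatever OpenAI-compatible endpoint the local server exposes and count chunks against the wall clock. A minimal sketch (the base URL and model name are placeholders, not anything from this thread):

    # Rough tokens/sec estimate against a local OpenAI-compatible endpoint.
    # base_url and model are assumptions -- adjust for whatever server you run.
    import time
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    start = time.monotonic()
    pieces = 0
    stream = client.chat.completions.create(
        model="local-model",  # placeholder model identifier
        messages=[{"role": "user", "content": "Summarize TCP slow start in two sentences."}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            pieces += 1  # each streamed chunk is roughly one token
    elapsed = time.monotonic() - start
    print(f"~{pieces / elapsed:.1f} tokens/sec over {elapsed:.1f}s")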
I agree that hosted models are usually a better option for most people - much faster, higher quality, handle longer inputs, really cheap.
I enjoy local models for research and for the occasional offline scenario.
I'm also interested in their applications for journalism, specifically for dealing with extremely sensitive data like leaked information from confidential sources.
>I'm also interested in their applications for journalism, specifically for dealing with extremely sensitive data like leaked information from confidential sources.
I think it is NOT just you. Most companies with decent management also would not want their data going anywhere outside the physical servers they have control of. But yeah, for most people, just use an app and a hosted server. But this is HN; there are people here hosting their own email servers, so it shouldn't be too hard to run an LLM locally.
Yeah, this has been confusing me a bit. I'm not complaining by ANY means, but why does it suddenly feel like everyone cares about data privacy in LLM contexts, way more than previous attitudes to allowing data to sit on a bunch of random SaaS products?
I assume because of the assumption that the AI companies will train off of your data, causing it to leak? But I thought all these services had enterprise tiers where they'll promise not to do that?
Again, I'm not complaining, it's good to see people caring about where their data goes. Just interesting that they care now, but not before. (In some ways LLMs should be one of the safer services, since they don't even really need to store any data, they can delete it after the query or conversation is over.)
Laundering of data through training makes it a more complicated case than a simple data theft or copyright infringement.
Leaks could be accidental, e.g. due to an employee logging in to their free-as-in-labor personal account instead of a no-training Enterprise account.
It's safer to have a complete ban on providers that may collect data for training.
Their entire business model is based on taking other people's stuff. I can't imagine someone willingly drowning with a sinking ship whose cargo hold is full of lifeboats - just because they promised they would.
Being caught doing that would be wildly harmful to their business - billions of dollars harmful, especially given the contracts they sign with their customers. The brand damage would be unimaginably expensive too.
There is no world in which training on customer data without permission would be worth it for AWS.
? One single random document, maybe, but as an aggregate, I understood some parties were trying to scrape indiscriminately - the "big data" way. And if some of that input is sensitive, and is stored somewhere in the NN, it may come out in an output - in theory...
Actually I never researched the details of the potential phenomenon - that anything personal may be stored (not just George III but Random Randy) - but it seems possible.
There's a pretty common misconception that training LLMs is about loading in as much data as possible no matter the source.
That might have been true a few years ago but today the top AI labs are all focusing on quality: they're trying to find the best possible sources of high quality tokens, not randomly dumping in anything they can obtain.
> Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it's not even clear how prior LLMs learn anything at all.
Obviously the training data should preferably be high quality - but there you have a (pseudo-, as I insisted elsewhere, citing the right to have read whatever is in any public library) problem with "copyright".
If there is some advantage to quantity though, then achieving high quality raises questions about tradeoffs and workflows - sources where authors are "free participants" could have odd data seep in.
And whether such data may be reflected in outputs remains an open question (probably tackled by work I have not read... Ars longa, vita brevis).
In Scandinavia, financial-related servers must be in the country! That always sounded like a sane approach. The whole "put your data on SaaS or AWS" thing just seems like the same "let's shift the responsibility to a big player".
Any important data should NOT be on devices that are NOT physically within our jurisdiction.
Or GitHub. I'm always amused when people don't want to send fractions of their code to an LLM but happily host it on GitHub. All big LLM providers offer no-training-on-your-data business plans.
It's unlikely they think Microsoft or GitHub wants to steal it.
With LLMs, they're thinking of examples that regurgitated proprietary code, and, contrary to everyday general observation, valuable proprietary code does exist.
But with GitHub, the thinking is generally the opposite: the worry is that the code is terrible, and seeing it would be like giant blinkenlights indicating the way in.
That's why AWS Bedrock and Google Vertex AI and Azure AI model inference exist - they're all hosted LLM services that offer the same compliance guarantees that you get from regular AWS-style hosting agreements.
AWS has a strong track record, a clear business model that isn’t predicated on gathering as much data as possible, and an awful lot to lose if they break their promises.
Lots of AI companies have some of these, but not to the same extent.
> "Most company with decent management also would not want their data going to anything outside the physical server they have in control of."
Most companies' physical and digital security controls are so much worse than anything from AWS or Google. Note I don't include Azure... but "a physical server they have control of" is a phrase that screams vulnerability.
If they get hit by a government subpoena because a journalist has been using them to analyze leaked corporate or government secret files, I also trust them to honor that subpoena.
Sometimes journalists deal with material that they cannot risk leaving their own machine.
"News is what somebody somewhere wants to suppress"
> Whereas on local LLM, I watch it painstakingly prints preambles that I don't care about, and get what I actually need after 20 seconds.
You may need to "right-size" the models you use to match your hardware and TPS expectations, which may mean using a smaller version of the model with faster TPS, upgrading your hardware, or paying for hosted models.
Alternatively, if you can use agentic workflows or tools like Aider, you don't have to watch a large model work slowly on local hardware. Instead you queue work for it, go to sleep, or eat, or do other work, and then much later look over the pull requests whenever it completes them.
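As a rough sketch of what "queue work and walk away" can look like (the task texts and branch names are illustrative, and the --yes/--message flags are assumed to exist in your installed Aider version):

    # Minimal "queue and walk away" sketch: one non-interactive Aider session
    # per task, each on its own branch, to be reviewed later.
    import subprocess

    tasks = [
        "Add unit tests for the date-parsing helpers",
        "Fix the off-by-one error in pagination",
    ]

    for i, task in enumerate(tasks, start=1):
        branch = f"aider-task-{i}"
        subprocess.run(["git", "checkout", "-b", branch], check=True)
        subprocess.run(["aider", "--yes", "--message", task], check=True)  # assumed flags
        subprocess.run(["git", "checkout", "-"], check=True)  # back to the previous branch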
I have a 4070 Super for gaming and have used it to play with LLMs a few times. It is by no means a bad card, but I realize that unless I want to get a 4090 or a new Mac that I have no other use for, I can only run smaller models. However, most smaller models aren't satisfactory and are still slower than hosted LLMs. I haven't found a model that I am happy with for my hardware.
Regarding agentic workflows - sounds nice, but I am too scared to try it out, based on my experience with standard LLMs like GPT or Claude for writing code. Small snippets or filling in missing unit tests, fine; anything more complicated has been a disaster for me.
As I understand it, these models are limited by GPU memory far more than by GPU compute. You'd be better off with dual 4070s than with a single 4090, unless the 4090 has more RAM than the other two combined.
I have never found any agent able to put together sensible pull requests without constant hand holding. I shudder to think of what those repositories must look like.
Sometimes TPS doesn't matter. I've generated textual descriptions for 100K or so images in my photo archive, some of which I have absolutely no interest in uploading to someone else's computer. This works pretty well with Gemma. I use local LLMs all the time for things where privacy is even remotely important. I estimate this constitutes easily a quarter of my LLM usage.
This is a really cool idea. Do you pretrain the model so it can tag people? I have so many photos that it seems impossible to ever categorize them; using a workflow like yours might help a lot.
No, tagging of people is already handled by another model. Gemma just describes what's in the image, and produces a comma separated list of keywords. No additional training is required besides a few tweaks to the prompt so that it outputs just the description, without any "fluff". E.g. it normally prepends such outputs with "Here's a description of the image:" unless you really insist that it should output only the description. I suppose I could use constrained decoding into JSON or something to achieve the same, but I didn't mess with that.
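For anyone curious, a minimal sketch of that kind of captioning loop, assuming the ollama Python package and a locally pulled Gemma vision model (the model tag and prompt wording here are illustrative, not the exact setup described above):

    # Sketch: caption a local image with a locally hosted Gemma vision model.
    # Assumes `pip install ollama` and a pulled multimodal model; "gemma3" is a
    # placeholder tag and the prompt is illustrative.
    import ollama

    PROMPT = (
        "Describe this photo in one short paragraph, then give a comma-separated "
        "list of keywords. Output only the description and keywords, with no "
        "preamble such as 'Here is a description of the image:'."
    )

    def describe(image_path: str) -> str:
        response = ollama.chat(
            model="gemma3",
            messages=[{"role": "user", "content": PROMPT, "images": [image_path]}],
        )
        return response["message"]["content"].strip()

    print(describe("photos/IMG_0001.jpg"))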
On some images where Gemma3 struggles Mistral Small produces better descriptions, BTW. But it seems harder to make it follow my instructions exactly.
I'm looking forward to the day when I can also do this with videos, a lot of which I also have no interest in uploading to someone else's computer.
Search is indeed hit and miss. Immich, for instance, currently does absolutely nothing with the EXIF "description" field, so I store textual descriptions on the side as well. I have found Immich's search by image embeddings to be pretty weak at recall, and even weaker at ranking. IIRC Lightroom Classic (which I also use, but haven't found a way to automate this for without writing an extension) does search that field, but ranking is a bit of a dumpster fire, so your best bet is searching uncommon terms or constraining search by metadata (e.g. not just "black kitten" but "black kitten AND 2025"). I expect this to improve significantly over time - it's a fairly obvious thing to add given the available tech.
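In case it helps anyone, here is one way "storing descriptions on the side" can look, as a sketch (the sidecar naming convention is my own assumption, and exiftool is only needed if you also want the EXIF field written for tools that do read it):

    # Sketch: keep a plain-text sidecar next to each image, and optionally also
    # write EXIF ImageDescription via exiftool. The .txt suffix and the use of
    # exiftool are assumptions, not Immich or Lightroom features.
    import subprocess
    from pathlib import Path

    def store_description(image_path: str, description: str, write_exif: bool = False) -> None:
        sidecar = Path(image_path).with_suffix(".txt")
        sidecar.write_text(description, encoding="utf-8")
        if write_exif:
            subprocess.run(
                ["exiftool", "-overwrite_original", f"-ImageDescription={description}", image_path],
                check=True,
            )

    store_description("photos/IMG_0001.jpg", "Black kitten on a windowsill; keywords: cat, kitten, window")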
I was thinking of doing the same, but I would like to include people's names in the description. For example, "Jennifer looking out in the desert sky."
As it stands, Gemma will just say "Woman looking out in the desert sky."
Most search rankers do not consider word order, so if you could also append the person's name at the end of text description, it'd probably work well enough for retrieval and ranking at least.
If you want natural language to resolve the names, that'd at a minimum require bounding boxes of the faces and their corresponding names. It'd also require either preprocessing, or specialized training, or both. To my knowledge no locally-hostable model as of today has that. I don't know if any proprietary models can do this either, but it's certainly worth a try - they might just do it. The vast majority of the things they can do are emergent, meaning they were never specifically trained to do them.
> More and more I start to realize that cost saving is a small problem for local LLMs. If it is too slow, it becomes unusable, so much that you might as well use public LLM endpoints. Unless you really care about getting things done locally without sending information to another server.
There is another aspect to consider, aside from privacy.
These models are trained by downloading every scrap of information from the internet, including the works of many, many authors who have never consented to that. And they are for sure not going to get a share of the profits, if there are ever going to be any. If you use a cloud provider, you are basically saying that is all fine. You are happy to pay them, and to make yourself dependent on their service, based on work that wasn't theirs to use.
However, if you use a local model, the authors still did not give consent, but one could argue that the company that made the model is at least giving back to the community. They don't get any money out of it, and you are not becoming dependent on their hyper capitalist service. No rent-seeking. The benefits of the work are free to use for everyone. This makes using AI a little more acceptable from a moral standpoint.