More

hagope · 2024-08-23T17:36:08 1724434568

Anthropic and all the AI vendors need to implement "Login with ___" allowing users to trust sites to use their own AI resources, similar to how Dropbox allows 3rd party access to the User's storage. Most users don't want to bother with generating and loading API keys, nor can they manage it safely.

pzo · 2024-08-23T22:07:37 1724450857

Agree. I wish OpenRouter did something like that so that we can have only 1 vendor and app can decide themself what AI API want to use. Then user don't need to have digital wallets topped up with money for 3+ different AI vendors. Basically you want to have something similar to crypto wallet that you can easily fill with money and authorize apps.

hagope · 2024-07-16T05:04:59 1721106299

I used to be excited about running models locally (LLM, stable diffusion etc) on my Mac, PC, etc. But now I have resigned to the fact that most useful AI compute will mostly be in the cloud. Sure, I can run some slow Llama3 models on my home network, but why bother when it is so cheap or free to run it on a cloud service? I know Apple is pushing local AI models; however, I have serious reservations about the impact on battery performance.

PostOnce · 2024-07-16T10:51:21 1721127081

Maybe you want to conduct experiments that the cloud API doesn't allow for.

Perhaps you'd like to plug it into a toolchain that runs faster than API calls can be passed over the network? -- eventually your edge hardware is going to be able to infer a lot faster than the 50ms+ per call to the cloud.

Maybe you would like to prevent the monopolists from gaining sole control of what may be the most impactful technology of the century.

Or perhaps you don't want to share your data with Microsoft & Other Evils (formerly known as dont be evil).

You might just like to work offline. Whole towns go offline, sometimes for days, just because of bad weather. Nevermind war and infrastructure crises.

Or possibly you don't like that The Cloud model has a fervent, unshakeable belief in the propaganda of its masters. Maybe that propaganda will change one day, and not in your favor. Maybe you'd like to avoid that.

There are many more reasons in the possibility space than my limited imagination allows for.

tarruda · 2024-07-16T14:30:04 1721140204

It is not like strong models are at a point where you can 100% trust their output. It is always necessary to review LLM generated text before using it.

I'd rather have a weaker model which I can always rely on being available than a strong model which is hosted by a third party service that can be shut down at any time.

Aurornis · 2024-07-16T16:01:22 1721145682

> I'd rather have a weaker model which I can always rely on being available than a strong model which is hosted by a third party service that can be shut down at any time.

Every LLM project I’ve worked with has an abstraction layer for calling hosted LLMs. It’s trivial to implement another adapter to call a different LLM. It’s often does as a fallback, failover strategy.

There are also services that will merge different providers into a unified API call if you don’t want to handle the complexity on the client.

It’s really not a problem.

PostOnce · 2024-07-16T22:38:13 1721169493

Suppose you live outside of America and the supermajority of LLM companies are American. You want to ask a question about whisky distillation or abortion or anything else that's legal in your jurisdiction but not in the US, but the LLM won't answer.

You've got a plethora of cloud providers, all of them aligned to a foreign country's laws and customs.

If you can choose between Anthropic, OpenAI, Google, and some others... well, that's really not a choice at all. They're all in California. What good does that do an Austrian or an Australian?

jacooper · 2024-07-17T08:22:15 1721204535

Personally I found the biggest problem for local models is the lack of integrationa, it can't search the web, it can't use wolfram alpha for math, etc

LLMs are great as routers, only rarely are they good doing something on their own.

gtirloni · 2024-07-16T15:55:16 1721145316

> eventually your edge hardware is going to be able to infer a lot faster than the 50ms+ per call to the cloud.

This is interesting. Is that based on any upcoming technology improvement already in the works?

a_t48 · 2024-07-16T16:40:21 1721148021

GP is likely referring to network latency here. There's a tradeoff between smaller GPUs/etc at home that have no latency to use and beefier hardware in the cloud that have a minimum latency to use.

yjftsjthsd-h · 2024-07-16T19:10:33 1721157033

Sure, but if the model takes multiple seconds to execute, then even 100 milliseconds of network latency seems more or less irrelevant

datameta · 2024-07-16T20:23:16 1721161396

Comms is also the greatest battery drain for a remote edge system. Local inference can allow for longer operation, or operation with no network infra.

sharpshadow · 2024-07-16T19:27:34 1721158054

Excellent points and being able to use available hardware in unison is amazing and I guess we are not far away from botnets utilising this kind of technology like they did with mining coins.

neop1x · 2024-07-17T08:55:09 1721206509

Also hosted models are often censored and refuse talking about various topics.

jumpCastle · 2024-07-16T14:40:19 1721140819

Aren't services like runpod solve half of these concerns?

wokwokwok · 2024-07-16T05:28:14 1721107694

> Sure, I can run some slow Llama3 models on my home network, but why bother when it is so cheap or free to run it on a cloud service?

Obvious answer: because it's not free, and it's not cheap.

If you're playing with a UI library, lets say, QT... would you:

a) install the community version and play with ($0)

b) buy a professional license to play with (3460 €/Year)

Which one do you pick?

Well, the same goes. It turns out, renting a server large enough to run big (useful, > 8B) models is actually quite expensive. The per-api-call costs of real models (like GPT4) adds up very quickly once you're doing non-trivial work.

If you're just messing around with the tech, why would you pay $$$$ just to piss around with it and see what you can do?

Why would you not use a free version running on your old PC / mac / whatever you have lying around?

> I used to be excited about running models locally

That's an easy position to be one once you've already done it and figured out, yes, I really want the pro plan to build my $StartUP App.

If you prefer to pay for an online service and you can afford it, absolutely go for it; but isn't this an enabler for a lot of people to play and explore the tech for $0?

Isn't having more people who understand this stuff and can make meaningful (non-hype) decisions about when and where to use it good?

Isn't it nice that if meta released some 400B llama 4 model, most people can play with it, not just the ones with the $7000 mac studio? ...and keep building the open source ecosystem?

Isn't that great?

I think it's great.

Even if you don't want to play, I do.

jrm4 · 2024-07-16T13:52:06 1721137926

Right, I think people here are vastly underestimating this idea of

"What if I want to play around with really PERSONAL stuff."

I've been keeping a digital journal about my whole life. I plan to throw that thing into an AI to see what happens, and you can be damn sure that it will be local.

monkmartinez · 2024-07-16T18:16:58 1721153818

Yes, I am with you 100% and keep several LLaMA's on my workstation for that reason. I use Openrouter for everything else. Everything that isn't sensitive goes to one of the big kid models because they are just sooooo much better. LLaMA 400b might be the start of running with the big kids, but I know we are not close with the current available models.

itake · 2024-07-16T05:42:23 1721108543

I’m a bit confused. Your reasoning doesn’t align with the data you shared.

The startup costs for just messing around at home are huge: purchasing a server and gpus, paying for electricity, time spent configuring the api.

If you want to just mess around, $100 to call the world’s best api is much cheaper than spending $2-7k Mac Studio.

Even at production level traffic, the ROI on uptime, devops, utilities, etc would take years to recapture the upfront and on-going costs of self-hosting.

Self hosting will have higher latency and lower throughput.

zeta0134 · 2024-07-16T08:06:50 1721117210

You are vastly overestimating the startup cost. For me this week it was literally these commands:

pacman -S ollama

ollama serve

ollama run llama3

My basic laptop with about 16 GB of RAM can run the model just fine. It's not fast, but it's reasonably usable for messing around with the tech. That's the "startup" cost. Everything else is a matter of pushing scale and performance, and yes that can be expensive, but a novice who doesn't know what they need yet doesn't have to spend tons of money to find out. Almost any PC with a reasonable amount of RAM gets the job done.

monkmartinez · 2024-07-16T18:08:42 1721153322

llama3 at 8billion params is weak sauce for anything serious, it just isn't in the same galaxy as Sonnet 3.5 or GPT-4o. The smaller and faster models like Phi are even worse. Once you progress past asking trivial questions to a point where you need to trust the output a bit more, its not worth effort in time, money and/or sweat effort to run a local model to do it.

A novice isn't going to know what they need because they don't know what they don't know. Try asking a question to LLaMA 3 at 8 billion and the same question to LLaMA 3 at 70 billion. There is a night and day difference. Sonnet, Opus and GPT-4o run circles around LLaMA 3 70b. To run LLaMA at 70 billion you need serious horse power as well, likely thousands of dollars in hardware investment. I say it again... the calculus in time, money, and effort isn't favorable to running open models on your own hardware once you pass the novice stage.

I am not ungrateful that the LLaMA's are available for many different reasons, but there is no comparison between quality of output, time, money and effort. The API's are a bargain when you really break down what it takes to run a serious model.

jononor · 2024-07-16T22:13:14 1721167994

Using an LLM as a general purpose knowledge base is only one particular application of an LLM. And on which is probably best served by ChatGPT etc.

A lot of other things are possible with LLMs using the context window and completion, thanks to their "zero shot" learning capabilities. Which is also what RAG builds upon.

Aurornis · 2024-07-16T15:54:26 1721145266

I’m familiar with local models. They’re fine for chatting on unimportant things.

They do not compare to the giant models like Claude Sonnet and GPT4 when it comes to trying to use them for complex things.

I continue to use both local models and the commercial cloud offerings, but I think anyone who suggests that the small local models are on par with the big closed hosted models right now is wishful thinking.

sudohackthenews · 2024-07-16T05:48:10 1721108890

People have gotten manageable results on all sorts of hardware. People have even squeezed a few tokens/second out of Raspberry PIs. The small models are pretty performant- they get good results on consumer gaming hardware. My 2021 laptop with a 3070m (only 8gb vram) runs 8b models faster than I can read, and even the original M1 chips can run the models fine.

monkmartinez · 2024-07-16T18:36:34 1721154994

You are right of course.... IF your metric for manageable/useable is measured only tokens per second (tok/s).

If your metric is quality of output, time, money and tok/s, there is no comparison; Local models just aren't there yet.

LorenDB · 2024-07-16T14:02:17 1721138537

And why would you buy a Mac Studio? You could build a reasonable GPU-accelerated Linux box for well under $1500. For example: https://pcpartpicker.com/guide/BCWG3C/excellent-amd-gamingst...

J_Shelby_J · 2024-07-16T14:16:33 1721139393

Devs that refuse to move off Apple are severely disadvantaged in the LLM era.

jondwillis · 2024-07-16T14:38:04 1721140684

lol tell that to the 3 year old laptop with 64 GB of RAM that I use exclusively for local LLMs while dev’ing on my work laptop with 96 GB of RAM…

wokwokwok · 2024-07-16T06:34:59 1721111699

> The startup costs for just messing around at home are huge

No, they are zero.

Most people have extra hardware lying around at home they're not using. It costs nothing but time to install python.

$100 is not free.

If you can't be bothered, sure thing, slap down that credit card and spend your $100.

...but, maybe not so for some people?

Consider students with no credit card, etc; there are a lot of people with a lot of free time and not a lot of money. Even if you don't want to use it do you do seriously think this project is totally valueless for everyone?

Maybe, it's not for you. Not everything has to be for everyone.

You are, maybe, just not the target audience here?

Aurornis · 2024-07-16T15:57:42 1721145462

> You are, maybe, just not the target audience here?

The difference between an open model running on a $100 computer and the output from GPT4 or Claude Sonnet is huge.

I use local and cloud models. The difference in productivity and accuracy between what I can run locally and what I can get for under $100 of API calls per month is huge once you get past basic playing around with chat. It’s not even close right now.

So I think actually you are not the target audience for what the parent comments are taking about. If you don’t need cutting edge performance then it’s fun to play with local, open, small models. If the goal is to actually use LLMs for productivity in one way or another, spending money on the cloud providers is a far better investment.

Exceptions of course for anything that is privacy-sensitive, but you’re still sacrificing quality by using local models. It’s not really up for debate that the large hosted models are better than what you’d get from running a 7B open model locally.

lynx23 · 2024-07-16T06:41:48 1721112108

And its not entitled to cliam that "Most people have extra hardware lying around at home". Your story doesn't sound plausible at all.

bryanrasmussen · 2024-07-16T09:43:12 1721122992

Most people who would want to be running machine learning models probably have some hardware at home that can handle a slow task for playing around and determining if it is worthwhile to pay out for something more performant.

This is undoubtedly entitled, but thinking to yourself huh, I think it's time to try out some of this machine learning stuff is a pretty inherently entitled thing to do.

wokwokwok · 2024-07-16T06:50:59 1721112659

This project is literally aiming to run on devices like old phones.

I don't think having an old phone is particularly entitled.

I think casually slapping down $100 on whim to play with an API... probably, yeah.

/shrug

itake · 2024-07-16T06:57:26 1721113046

According to this tweet, Llama 3 costs about $0.20 per Million tokens using an M2.

https://x.com/awnihannun/status/1786069640948719956

In comparison, GPT3.5-turbo costs $0.50 per million tokens.

Do you think an old iPhone will less than 2x efficient?

nightski · 2024-07-16T11:46:24 1721130384

FWIW depends on cost of power. Where I live cost of power is less than half the stated average.

nl · 2024-07-16T13:14:46 1721135686

> Well, the same goes. It turns out, renting a server large enough to run big (useful, > 8B) models is actually quite expensive. The per-api-call costs of real models (like GPT4) adds up very quickly once you're doing non-trivial work.

I run my own models, but the truth is most of the time I just use an API provider.

TogetherAI and Groq both have free offers that are generous enough I haven't used them up in 6 months of experimentation and TogetherAI in particular has more models and gets new models up quicker than I can try them myself.

FeepingCreature · 2024-07-16T05:36:00 1721108160

I just prepay $20/mo to openrouter.ai and can instantly play with every model, no further signup required.

Aurornis · 2024-07-16T15:51:47 1721145107

> Why would you not use a free version running on your old PC / mac / whatever you have lying around?

Because the old PC lying around can’t come anywhere near the abilities or performance of the hosted AI compute providers. Orders of magnitudes of difference.

The parent commenter is correct: If you want cutting edge performance, there’s no replacement for the hosted solutions right now.

Running models locally is fun for playing around and experimenting, but there is no comparison between what you can run on an old PC lying around and what you can get from a hosted cluster of cutting edge hardware that offers cheap output priced per API call.

friendly_chap · 2024-07-16T06:46:49 1721112409

We are running smaller models with software we wrote (self plug alert: https://github.com/singulatron/singulatron) with great success. There are obvious mistakes these models make (such as the one in our repo image - haha) sometimes but they can also be surprisingly versatile in areas you don't expect them to be, like coding.

Our demo site uses two NVIDIA GeForce RTX 3090 and our whole team is hammering it all day. The only problem is occasionally high GPU temperature.

I don't think the picture is as bleak as you paint. I actually expect Moore's Law and better AI architectures to bring on a self-hosted AI revolution in the next few years.

dotancohen · 2024-07-16T06:13:46 1721110426

I have found many similarities between home AI and home astronomy. The equipment needed to get really good performance is far beyond that available to the home user, however intellectually satisfying results can be had at home as a hobby. But certainly not professional results.

grugagag · 2024-07-16T11:51:00 1721130660

When learning and experimenting it could make a difference.

Cantinflas · 2024-07-16T05:10:19 1721106619

Why bother running models locally? Privacy, for once, or censorship resistance.

seasonman · 2024-07-16T05:20:34 1721107234

Also customizability. Sure, you can fine-tune the cloud hosted models (to a certain degree of freedom), but it will probably be expensive, inefficient, difficult and unmaintainable.

hanniabu · 2024-07-16T05:35:37 1721108137

And offline access

dsign · 2024-07-16T07:58:09 1721116689

For my advanced spell-checking use-case[^1], local LLMs are, sadly, not state-of-the-art. But their $0 price-point is excellent to analyze lots of sentences and catch the most obvious issues. With some clever hacking, the most difficult cases can be handled by GPT4o and Claude. I'm glad there is a wide variety of options.

[^1] Hey! If you know of spell-checking-tuned LLM models, I'm all ears (eyes).

bruce343434 · 2024-07-16T10:16:50 1721125010

I think the floating point encoding of LLMs is inherently lossy, add to that the way tokenization works. The LLMs I've worked with "ignore" bad spelling and correctly interpret misspelled words. I'm guessing that for spelling LLMs, you'd want tokenization at the character level, rather than a byte pair encoding.

You could probably train any recent LLM to be better than a human at spelling correction though, where "better" might be a vague combination of faster, cheaper, and acceptable loss of accuracy. Or maybe slightly more accurate.

(A lot of people hate on LLMs for not being perfect, I don't get it. LLMs are just a tool with their own set of trade offs, no need to get rabid either for or against them. Often, things just need to be "good enough". Maybe people on this forum have higher standards than average, and can not deal with the frustration of that cognitive dissonance)

Hihowarewetoday · 2024-07-16T07:59:54 1721116794

I'm not sure why you have resigned?

If you don't care about running it locally, just spend it online. Everything is good.

But you can run it locally already. Is it cheap? No. Are we still in the beginning? yes. We are still in a phase were this is a pure luxury and just getting into it by buying a 4090, is still relativly cheap in my opinion.

Why running it locally you ask? I personally think running anythingllm and similiar frameworks on your own local data is interesting.

But im pretty sure in a few years you will be able to buy cheaper ml chips for running models locally fast and cheap.

Btw. aat least i don't know a online service which is uncensored, has a lot of loras as choice and is cost effective. For just playing around with LLMs for sure there are plenty of services.

fouc · 2024-07-24T17:08:54 1721840934

https://x.com/karpathy/status/1814038096218083497

LLMs will start shrinking massively in size soon, without any loss in performance.

bongodongobob · 2024-07-16T05:14:32 1721106872

I have a 2 year old Thinkpad and I wouldn't necessarily call llama3 slow on it. It's not as fast as ChatGPT but certainly serviceable. This should only help.

Not sure why your throwing your hands up because this is a step towards solving your problem.

diego_sandoval · 2024-07-16T16:02:44 1721145764

> why bother when it is so cheap or free to run it on a cloud service?

For the same reasons that we bother to use Open Source software instead of proprietary software.

jrm4 · 2024-07-16T13:48:50 1721137730

What do you mean by useful here?

I'm saying because I've had the exact OPPOSITE thought. The intersection of Moore's Law and the likelihood that these things won't end up as some big unified singularity brain and instead little customized use cases make me think that running at home/office will perhaps be just as appealing.

cess11 · 2024-07-16T19:15:36 1721157336

I don't want people I don't know snooping around in my experiments.

dws · 2024-07-16T15:58:17 1721145497

> Sure, I can run some slow Llama3 models on my home network, but why bother when it is so cheap or free to run it on a cloud service?

Running locally, you can change the system prompt. I have Gemma set up on a spare NUC, and changed the system prompt from "helpful" to "snarky" and "kind, honest" to "brutally honest". Having an LLM that will roll its eyes at you and say "whatever" is refreshing.

nhod · 2024-07-16T05:17:20 1721107040

Is this a hunch, or do you know of some data to back up your reservations?

Copilot+ PC’s, which all run models locally, have the best battery life of any portable PC devices, ever.

These devices have in turn taken a page out of Apple Silicon’s playbook. Apple has the benefit of deep hardware and software integration that no one else has, and is obsessive about battery life.

It is reasonable to think that battery life will not be impacted much.

fragmede · 2024-07-16T05:31:13 1721107873

That doesn't seem totally reasonable. The battery life of an iphone is pretty great if you're not actually using it, but if you're using the device hard, it gets hot to the touch, along with the battery getting drained. playing resource intensive video games, maxing out the *PU won't stop and let the device sleep at all, and has a noticable hit on battery life. Where inference takes a lot of compute to perform, it's hard to imagine inference being totally free, battery-wise. It probably won't be as hard on the device as playing specific video games non-stop, but I get into phone conversations with ChatGPT as it is, so I can imagine that being a concern if you're already low on battery.

aftbit · 2024-07-16T15:03:32 1721142212

What if you want to create transcripts for 100s of hours of private recorded audio? I for one do not want to share that with the cloud providers and have it get used as training data or be subject to warrentless search under the third party doctrine. Or what if you want to run a spicy Stable Diffusion fine-tune that you'd rather not have associated with your name in case the anti-porn fascists take over? I feel like there are dozens of situations where the cost is really not the main reason to prefer a local solution.

hagope · on March 1, 2023

I can confirm it is TOTAL tokens, this from the account/usage page:

gpt-3.5-turbo-0301, 2 requests 28 prompt + 64 completion = 92 tokens

hagope · on Feb 22, 2023

This is really cool! Are you using OpenAI API under the hood or have you trained your own model?

rushingcreek · on Feb 22, 2023

We use both our own models and OpenAI.

hagope · on July 11, 2022

This is cool, but wouldn't creating a constraint using a nullable column be considered a poor design decision? In which scenarios would this be a good idea?

cptn_badass · on July 11, 2022

When an entry can belong to 0 or 1 related object only. Not that I'd put a constraint in such scenario, but I imagine a User can optionally have a Subscription, so subscription_id is either nil or present, and said subscription cannot be associated to any other User.

hagope · on July 11, 2022

in that scenario why not use foreign key? any advantage to using constraint?

Pxtl · on July 11, 2022

You do both.

The foreign key only gaurantees that the other entity exists.

The unique constraint ensures that only one pair of entities has this relationship, preventing a one-to-many binding.

The distinctness of NULL allows you to have multiple entities with the same NULL value without violating the above UNIQUE constraint.

The "NULL is empty" vs "NULL is unknown" is a series of trade-offs of labor-saving. Imho, the wrong trade-offs were made, but once the choice is made it makes sense to continue and be consistent with it. I'd rather be consistently wrong than inconsistently right.

hagope · on July 11, 2022

If the entity doesn't exist, wouldn't it violate the FK and therefore no need for the nullable constraint?

hagope · on Jan 4, 2022

Would be cool if you could enter Yotube URL and you parse out the recipe from the audio/video.

hagope · on May 21, 2020

Also, what if people move every year? are they going to readjust after every move?

Apocryphon · on May 21, 2020

Who in this modern housing market is able to move every year?

typest · on May 21, 2020

Renters.

hagope · on April 10, 2020

Coursera has great content from Industry partners (Google Cloud, Amazon AWS, IBM etc) that teach everything you need to know for hacking in cloud. These skills are not widely taught in University, but skills are highly valued in the Tech industry. Three specializations (a collection of courses) that are hands-on and I would highly recommend 1.) https://www.coursera.org/specializations/aws-fundamentals 2.) https://www.coursera.org/specializations/gcp-data-machine-le... 3.) Anything from deeplearning.ai [disclosure: I work at Coursera]

hagope · on Nov 5, 2019

They should have also renamed the parent company to Face, Inc.

hagope · on Feb 2, 2019

Omar from Coursera. Our mission is to educate and reach as many people as possible. We are hiring across several different job roles mainly in Engineering and Data. Our main offices are in Mountain View, CA (HQ) and Toronto, Canada. Also, there are many non-technical roles and international locations including NYC, London and Abu Dhabi. Please reach out if you have any questions! https://jobs.lever.co/coursera?lever-via=-kpP7dimO_