Having spent quite a bit of time playing around with llama.cpp, alpaca.cpp, lora...

tmountain · on March 29, 2023

This sentence defies lay people:

The biggest deal with this isn't the published lora adapter (which seems limited to llama 7b), but the cleaned training data, which is likely better than the previous data sets used to train the alpaca-inspired loras that have been publicly released so far.

slg · on March 29, 2023

I have casually followed countless different news cycles on various complicated tech topics over my decades long career. I can't recall a single one that has consistently made me feel like an idiot more than how people talk about this recent AI wave. There just seems to be so much more jargon involved in this subject that makes casual perusing of the latest developments impenetrable.

kolinko · on March 29, 2023

I had the same issue, and I just caught up over the weekend. Three books I can recommend to get up to speed:

- NumPy basics pdf - first 2-3 chapters - Deep Learning with PyTorch by Voight Godoy [2] - first 2-3 chapters if you had experience with neural networks, or the whole of it if you didn't.

With the above, you will get the basics to understand this book about transformers, and the architecture of the models, and everything else, from this book:

- Natural Language Processing with Transformers: Building Language Applications with Hugging Face ( https://www.amazon.com/Natural-Language-Processing-Transform... ).

I took a weekend to go through the books in this order, and now I finally understand what people mean with all that jargon :)

1 - https://numpy.org/doc/1.18/numpy-user.pdf - 2 - https://www.amazon.com/Deep-Learning-PyTorch-Step-Step/dp/B0...

rlevy · on March 29, 2023

What’s the third book?

biomcgary · on March 29, 2023

"How to deal with off-by-one errors"

EntrePrescott · on March 29, 2023

reminds me of julia (the language): wanted to give it a try recently, until I read in their documentation: "In Julia, indexing of arrays, strings, etc. is 1-based not 0-based"… which made me wonder for a moment how many off-by-one errors may be caused by mismatches between different programming languages.

pegasus · on March 29, 2023

Look again, two different books are referred on one line, then a third lower down.

kolinko · on March 30, 2023

All of them in the comment. I forgot to do double-newlines, so the formatting is broken, and I can't edit the post any more.

akiselev · on March 29, 2023

Ah, my fellow citizen of the interwebs, fear not! Your intellectual frustrations are but a natural reaction to the tsunami of technological jargon. You see, the AI wave is the epitome of obfuscation, a testament to the labyrinthine lexicon of the digital age. It's as if a group of caffeinated, sleep-deprived tech enthusiasts assembled in the dark of night and decided to create an impenetrable fortress of vernacular, just to keep the uninitiated at bay.

Grothendank · on March 29, 2023

Should jackasses on HN use plain language instead of jargon? Surely.

But AI workers mainly develop and use jargon because it is an easy and natural way to consolidate concepts.

Sure, there is a kind of conspiracy caused by publish or perish. Researchers may use jargon to make their work harder to reject on review; laborious speech and jargon can make statements sound more profound. However, no technical field is immune to this. We'll need to systematically change science before we can eliminate that problem.

Until we manage that, if you care about the concepts enough to want to understand them before there are good plain speech descriptions, just pop the jargon into google scholar and skim read a few papers, and you're good to go. If you don't care about the concepts that much, then don't worry about the jargon. The important concepts will get their own non-technical explanations in time.

As it stands, AI jargon is not that bad. It tends to be pretty fair and easy to understand, compared to jargon in, say, biochemistry or higher math.

kordlessagain · on March 29, 2023

With chatots, explaining everything clearly is an option.

manojlds · on March 31, 2023

Ironically, ChatGPT can help you understand the jargon.

pps · on March 29, 2023

Every human group has its jargon, it's normal, it's how people compress knowledge into smaller chunks to communicate efficiently.

Grothendank · on March 29, 2023

There's a neat trick when you encounter jargon.

1. Identify the jargon terms you don't understand

2. Lookup papers that introduce the jargon terms

3. Skim-read the paper to get the gist of the jargon

If you don't want to do this, then you don't have to feel uneducated. You can simply choose to feel like your time is more important than skimming a dozen AI papers a week.

But for example, here's what I did to understand the parent comment:

1. I had no idea what lora is or how it relates to alpaca.

2. I looked up https://github.com/tloen/alpaca-lora

3. I read the abstract of the Lora paper: https://arxiv.org/pdf/2106.09685.pdf https://github.com/tloen/alpaca-lora

4. Now I know that Lora is just a way of using low rank matrices to reduce finetuning difficulty by a factor of like 10,000 or something ridiculous

5. Since I don't actually care about /how/ Lora does this, that's all I need to know.

6. TLDR; Lora is a way to fine-tune models like Llama while only touching a small fraction of the weights.

You can do this with any jargon term at all. Sure, I introduced more jargon in step 4 - low rank matrices. But if you need to, you can use the same trick again to learn about those. Eventually you'll ground yourself on basic college level linear algebra, which if you don't know, again you should learn.

The sooner you evolve this "dejargonizing" instinct rather than blocking yourself when you see new jargon, the less overwhelmed and uneducated you will feel.

mszcz · on March 29, 2023

> 3. Skim-read the paper to get the gist of the jargon

Or, you know, you could ask ChatGPT to explain it to you... Granted the term was coined 2021>=. Even if it wasn't but the paper is less than 32k tokens... 0.6c for the answer doesn't seem all that steep.

edit: grammar

Grothendank · on March 29, 2023

This actually works!

It works astoundingly well with poorly written technical manuals. Looking at you, CMake reference manual O_O. It also helps translate unix man pages from Neckbeardese into clean and modern speech.

With science papers it's a bit more work. You must copy section by section into GPT4, despite the increased token limit.

But sure. Here's how it can work:

1. Copy relevant sections of the paper

2. As questions about the jargon:

"Explain ____ like I'm 5. What is ____ useful for? Why do we even need it?"

"Ah, now I understand _____. But I'm still confused about _____. Why do you mean when you say _____?"

"I'm starting to get it. One final question. What does it mean when ______?"

"I am now enlightened. Please lay down a sick beat and perform the Understanding Dance with me. Dances"

This actually works surprisingly well.

mszcz · on March 29, 2023

Yeah, I think education is a great use case here. Sure, the knowledge that's built into the model might be inaccurate or wrong but you can feed the model the knowledge you want to learn/processed.

What you get is a teacher that never tires, is infinitely patient, has infinite time, doesn't limit questions, doesn't judge you, really listens and has broad, multidisciplinary knowledge that correct-ish (for when it's needed). I've recently read somewhere that Stanford (?) has almost as many admin workers as they do students. Seems to me that this is a really bad time to be that bloated. Makes you wonder what you really spend your money on, is it worth it (yeah, I know, it's not just education that you get in return) and if you can get the same-ish effect for a lot cheaper and on your timetable.

Not that the models or field, now, are in a state that would produce a good teaching experience. I can however imagine a future not so distant that this would be possible. Recently on a whim I've asked it to produce a options trading curriculum for me. It did a wonderful job. I wouldn't trust it if I didn't know a little bit myself about the subject before but I came off really impressed.

lordswork · on March 29, 2023

No need to pay yourself. Uploaded https://arxiv.org/pdf/2106.09685.pdf to scisummary:

This text discusses various studies and advancements in the field of natural language processing (NLP) and machine learning. One study focuses on parameter-efficient transfer learning, and another examines the efficiency of adapter layers in NLP models. Further studies evaluate specific datasets for evaluating NLP models. The article proposes a method called LoRA (low rank adaptation) for adapting pre-trained neural network models to new tasks with fewer trainable parameters. LoRA allows for partial fine-tuning of pre-trained parameters and reduces VRAM usage. The article provides experimental evidence to support the claims that changing the rank of Delta W can affect the performance of models, and that LoRA outperforms other adaptation methods across different datasets. The authors propose LoRA as a more parameter-efficient approach to adapt pre-trained language models to multiple downstream applications.

seydor · on March 29, 2023

I think the opposite. Any field, from physics to biology tends to have overly opaque jargon. AI grew on its own and quickly shed the idea of being "based on biology" or the rest of science, so its basic jargon is pretty much understandable. Things like Dropout, Attention etc are intuitively named. I think people like me underestimated, however, how fast the field evolved and how big the corpus became, so specific architectures got specific names and more are being created every day. There is no shortcut around that though, because they are in the discovery phase. Once things settle down to a few architecutres they ll make some kind of IUPAC

ginger-hot-tea · on March 29, 2023

I am keeping a glossary page for this reason, maybe this can help others: https://daily.ginger-t.link/glossary

I am trying to be very selective about what to add in there and as concise as possible, but I would welcome any suggestions for format and additional content.

lucidyan · on March 29, 2023

> https://daily.ginger-t.link/glossary

Big thanks for your glossary, find it very useful and overlapping with my personal obsidian notes, hope it continues to receive updates

keyle · on March 29, 2023

I feel you! Did you miss the web 3 wave though? I still can't imagine what that was about.

kolinko · on March 29, 2023

Web3 was about decentralised web - as in, more stuff, like login and data, moving client-side. E.g. instead of having "login in facebook", having Metamask plugin in your browser, that holds your private keys, and allows you to log into a website.

Also, building websites that don't store user data at all. Everything is kept in browser storage. You could say that the chat-gpt interfaces people are building now are web3, because they don't store your api keys, nor your converstation history.

Second part was decentralising as much as possible. Decentralised domain-name systems (ENS), storage, hosting, and money of course. So that you own your data, and your identity.

The last time I checked, the decentralised storage and hosting were the most difficult to solve. That is - we have torrents of course, but if you wanted to pay decentralised web to host and run your scripts indefinitely, it was not feasible.

rablackburn · on March 29, 2023

Web 3 seemed silly enough that I didn’t bother really following it. I know that there is probably some very good work going on around blockchain stuff, but NFTs ain’t it.

LLM assistants is genuinely just moving _very_ fast, so if you don’t pay attention every day you just miss things.

I’m just enjoying having my interactive rubber-duck tbh

actionfromafar · on March 30, 2023

I think NFTs may really be on to something, but probably mundane little things like tickets to concerts, not eye-wateringly expensive collectible monkeys.

Like how SMS is now a thing used for all sorts of little stuff, but nobody thinks much about it.

spyder · on March 29, 2023

Good thing you can ask LLMs about these jargons (preferably Bing because it can search for recent data). I just tried it and the answers to explain OPs comment are not too bad. (I'm not gonna paste it here just because I don't wanna fill HN with AI text. Trying to preserve some the human content until we can :-O )

rexreed · on March 29, 2023

The AI Today glossary and podcast series has been helpful for a grounding on basic concepts: https://www.aidatatoday.com/aitoday/

LeBit · on March 30, 2023

Thanks for that. I too feel like a very old man now.

joshxyz · on March 29, 2023

its ok you are not alone, most of us feel the same way on theiri buzwords

Grothendank · on March 29, 2023

There's a difference between buzzwords and jargon. Buzzwords can start out as jargon, but have their technical meaning stripped by users who are just trying to sound persuasive. Examples include words like synergy, vertical, dynamic, cyber strategy, and NFT.

That's not what's happening in the parent comment. They're talking about projects like

https://github.com/ZrrSkywalker/LLaMA-Adapter

https://github.com/microsoft/LoRA

https://github.com/tloen/alpaca-lora

and specifically the paper: https://arxiv.org/pdf/2106.09685.pdf

Lora is just a way to re-train a network for less effort. Before we had to fiddle with all the weights, but with Lora we're only touching 1 in every 10,000 weights.

The parent comment says GPT4all doesn't give us a way to train the full size Llama model using the new lora technique. We'll have to build that ourselves. But it does give us a very huge and very clean dataset to work with, which will aid us in the quest to create an open source chatGPT killer.

refulgentis · on March 29, 2023

A Lora is a layer on top of a model, the big deal isn’t that this exists (it’s a Lora for the weakest llama), but the fact they shared their dataset. The stronger llamas trained with this data will produce even better Lora’s and better results.

robwwilliams · on March 29, 2023

Here is a good start in “low-rank adaptation” or LoRA module. A way to train/adapt a general purpose LLM to efficiently and iteratively accommodate specialize data types and knowledge. A bolt-on.

https://arxiv.org/pdf/2106.09685.pdf

Not “on top” but more “in parallel” if I understand correctly

axlee · on March 29, 2023

What is a lora or llama? Google gives me nothing.

Majromax · on March 29, 2023

LLaMA is the large language model published by Facebook (https://ai.facebook.com/blog/large-language-model-llama-meta...). In theory the model is private, but the model weights were shared with researchers and quickly leaked to the wider Internet. This is one of the first large language models available to ordinary people, much like Stable Diffusion is an image generation model available to ordinary people in contrast to DALL-E or MidJourney.

With the model's weights open to people, people can do interesting generative stuff. However, it's still hard to train the model to do new things: training large language models is famously expensive because of both their raw size and their structure. Enter...

LoRA is a "low rank adaptation" technique for training large language models, fairly recently published by Microsoft (https://github.com/microsoft/LoRA). In brief, the technique assumes that fine-tuning a model really just involves tweaks to the model parameters that are "small" in some sense, and through math this algorithm confines the fine-tuning to just the small adjustment weights. Rather than asking an ordinary person to re-train 7 billion or 11 billion or 65 billion parameters, LoRA lets users fine-tune a model with about three orders of magnitude fewer adjustment parameters.

Combine these two – publicly-available language model weights and a way to fine tune it – and you get work like the story here, where the language model is turned into something a lot like ChatGPT that can run on a consumer-grade laptop.

chronicler · on March 29, 2023

Thanks for sharing. How do you know this? Can you recommend any papers to read to start learning about LLMs? I have very limited ML/AI knowledge.

xmonkee · on March 29, 2023

Thanks, very helpful. Are llama and chatGPT essentially the same “program”, just with different weights? And is one better than the other (for the same number of parameters) just because it has better weights?

lbotos · on March 29, 2023

My understanding is they are both "LLM" (Large language models). That's the generic term you are looking for.

I don't think you can compare one LLMs weights to another directly, because the weights are a product of the LLM. In theory (I don't know actually) llama and chatGPT may be using different source datasets so you can't compare them like for like.

manojlds · on March 31, 2023

LLaMA and GPT are like Pepsi and Coke.

FooBarWidget · on March 29, 2023

How are the llama weights usable by the public? Even if leaked, doesn't using it count as piracy and thus a violation of either copyright or database laws?

jsnell · on March 29, 2023

It's not at all clear whether weights are copyrightable.

Radim · on March 29, 2023

There's some irony in BigCos using everyone's actual IP freely to train their models, no qualms whatsoever.

And then people being scared to even download said models because of "OMG IP!"

The asymmetry of power (and dare I say, domestication) is astounding.

FooBarWidget · on March 29, 2023

I'm pretty sure they are. If not copyrightable, then at least the database law should apply. One can easily make the case in front of a judge that the situation is similar to databases: the value of weights lies in the amount of work needed to gather the training data, thus weights should be considered a sort of crystallization of a database.

jsnell · on March 29, 2023

But the entire business model of the companies making the models seems to be including copyrighted data into the training set under the guise of fair use. If the weights are considered to be a derived work of the training data as a whole, it seems the weights would also have to be a derived work of the individual items in the training data. So I doubt any of them will be making that argument.

(Except maybe companies that have access to vast amounts of training data with an explicit license, e.g. because the content is created by their users rather than just scraped from the web?)

FooBarWidget · on March 29, 2023

That doesn't matter to database laws. Databases are protected under the premise that collecting the data takes work. How that data is licensed is orthogonal to database law.

jsnell · on March 29, 2023

If I understand correctly your claim was that "the value lies in gathering [a database] of the training data"; that the curation of the training data is what gives the trainer an intellectual property claim on the otherwise mechanical process of creating a model, right? Not that the model itself was a database.

For them to make the argument in court that database rights over the database of training data mean they have rights over the model too, they'd need to argue that the model is a derivative work training data. And then it'd mean their model is also a derived work from all the billions of works they scraped to get that data set. It would destroy the business model of the OpenAIs of the world, there is no chance they try to argue this in court.

nl · on March 30, 2023

> For them to make the argument in court that database rights over the database of training data mean they have rights over the model too, they'd need to argue that the model is a derivative work training data. And then it'd mean their model is also a derived work from all the billions of works they scraped to get that data set. It would destroy the business model of the OpenAIs of the world, there is no chance they try to argue this in court.

This doesn't follow at all.

They can argue they used that work under fair-use and/or that their work was transformative. This is a fairly clear extension of arguments used by search engines that indexing and displaying summaries is not copyright violation and these arguments have been accepted by courts in most circumstances.

jsnell · on March 30, 2023

If the uncreative and automated work of training the model is transformative enough to impact the rights of the original content creators, it would also be transformative enough to impact the rights of the database curator.

The fair use case is much harder to make here than for search engines since the model will be directly competing with the content creators. And again, how could e.g. OpenAI simultaneously claim that their use of the original content to train the model, and then subsequent use the model and the model outputs, while simultaneously claiming that the model could not be used without infringing their DB rights? You can argue fair use for both or neither; trying to argue it for just one of my the two is just incoherent.

And everyone building models needs free access to the training data way more than they need copyright as a means to protect the model.

nl · on March 30, 2023

I don't necessarily disagree, but it's very unclear what a court would find.

I suggest https://arxiv.org/abs/2303.15715 for a complete overview.

jsnell · on March 31, 2023

Agreed! It being unclear was in fact my first message in this discussion :) Thanks for the link, I'll definitely need to read it.

muyuu · on March 29, 2023

yea I do wonder about this, but even Meta are acting as if their releasing it means in effect that the cat is out of the bag

at this point, their not even complaining about it must mean that they accept the data is public now

dpiers · on March 29, 2023

LoRA: https://arxiv.org/pdf/2106.09685.pdf

LLaMA: https://ai.facebook.com/blog/large-language-model-llama-meta...

Both have been the subjects of numerous HN posts in the last month.

corford · on March 29, 2023

And the LLaMA paper from Meta.ai team is here: https://arxiv.org/pdf/2302.13971v1

h11h · on March 29, 2023

LLaMA is Facebook's LLM (large language model, comparable with GPT). It's publicly available (anyone can download the weights and run it themselves), so it's popular here.

LoRA, or Low-Rank Adaptation of Large Language Models, lets people fine tune a LLM (making it perform better for a particular application) using vastly less resources. Paper: https://arxiv.org/pdf/2106.09685.pdf

bestcoder69 · on March 29, 2023

llama: gpt-3 alternative that you can download and run on a toaster

lora: efficient way of fine-tuning a model like llama, where instead of recreating an entire model, you're keeping the base model and generating a fine-tunings file to apply on top of it.

toaster: any machine with like 4GB of RAM available to fit the model

physPop · on March 29, 2023

silly back-ronyms made by academics

jhbadger · on March 29, 2023

As someone who is following this technology while not really an expert (I'm a computational biologist in my day job) LoRA is a way of reducing the number of parameters in a large language model (LLM, the technology behind all these new chatbots) so that it can be run on less powerful hardware (say a laptop or even a phone). The OP is saying that the improvements this chatbot provides isn't so much being more clever about reducing parameters but being trained on text that has been cleaned up rather than the rather messy training sets used in other small LLMs.

f_devd · on March 30, 2023

LoRA (Low-Rank Adapter) is way to customize/finetune the LLM to a new datasets without needing to retrain the entire network (which makes it better (and in theory easier to do). It doesn't not change the speed significantly afaik

throwaway2865 · on March 29, 2023

Less garbage in, less garbage out?

alchemist1e9 · on March 29, 2023

I’ll ask a dumb question. On another of the numerous LLM related posts I was asking if any of the self host-able open model can do code summaries at close to the quality of GPT 3.5 turbo. I was basically told nowhere close yet.

Can this potentially do that?

Ideally I’d like to have it generate descriptions of large amounts of code but would rather not burn tokens and lose privacy via OpenAI api. But I’d gladly keep a high end GPU burning on such task , even if that was actually slightly more expensive.

Edit: To clarify I do this partially now on batches of code via openAI api currently. It’s around 1-3 cents for a typical 400-800 line source code file. And I don’t mean feeding the full code base in as a single input.

2bitencryption · on March 29, 2023

Here's my experience, having used llama+lora 7b, 13b, and 30b, on both cpu and gpu:

On gpu, processing the input prompt, even for huge prompts, is almost instant. Meaning, even if your prompt is huge, it will start generating new tokens after your prompt very quickly. On a rented A6000 gpu, using llama+lora 30b, you can use huge prompts and it will start giving a new output right away.

On cpu (i.e. the project llama.cpu), it takes a very, very long time to process the input prompt, before it begins to generate new tokens. Meaning, if you provide a huge copy/paste of code, it will take a long time to ingest all that input, before it begins outputting new tokens.

Once it finally starts outputting new tokens, the rate is surprisingly fast, not much slower than gpu.

I wish I knew the reason for this, but I'm not an expert :) I've just seen this in practice.

ineedasername · on March 29, 2023

How long is very very long? Am I going to get coffee while it works, going to lunch, doing it right before I go to bed, or hoping it finishes in time to come up with the most perfect epitaph on my tombstone? ;)

alchemist1e9 · on March 29, 2023

That sounds promising but how about the quality of the output.

I’ve been using OpenAI API with chatblade and been giving it code in various languages and it’s quite surprising how well it describes the purpose and code implementation in english. The english description would be useful and relevant for developers trying to quickly familiarize themselves with a code base.

For a typical 400-600 line file it looks like it would cost around 1-3 cents per file. However loss of privacy isn’t so great.

How do you find output quality of llama + lora for such a task?

Code is research code I’d run it on.

londons_explore · on March 29, 2023

That is likely a bug. There is nothing in the maths that should make prompt tokens slower on CPU.

loufe · on March 29, 2023

It is a bug, it's been clearly discussed under the issues of LLaMA.cpp. The wait time is far from atrocious, with a 5600x and the 13B I would wait 15 seconds before input starts for a relatively complex prompt.

stolsvik · on March 30, 2023

As I understand these models, that makes no sense whatsoever. Is it the tokenisation that takes this crazy amount of time? Because each additional token should take exactly the same amount of time as the first.

kolinko · on March 29, 2023

Any tips on setting up llama+lora on 30b? There are so many resources that I can't figure out which models to use, and which projects to use to set everything up.

nickthegreek · on March 29, 2023

text-generation-webui (https://github.com/oobabooga/text-generation-webui)

kolinko · on March 30, 2023

I tried it, but I can't find the right model for llama+lora 30B . Google, nor Bing ;) is not helpful.

nickthegreek · on March 30, 2023

I think this is the one you want: https://huggingface.co/elinas/alpaca-30b-lora-int4

aunty_helen · on March 29, 2023

We're a long long long way off that. So check back in two months.

Jokes aside, the limiting factor will be either a technique to pack all of the code into smaller tokens like semantic search (someone else will be able to comment on this as that's the limit of my understanding) or GPU memory for input tokens.

Buying a "high end GPU" isn't buying a 4090 or even two, it's 250k on a DGX unit and putting it in a datacentre. You will probably be able to find a service that would sign a confidentiality agreement and provide you with this service for less than 250k.

2bitencryption · on March 29, 2023

It's $0.590/hr to rent an A6000 48GB gpu on jarvislabs.ai, which you can do right now, and run the 30b (or 65b with some hard work) model and get incredible results. No confidentiality agreement required :)

Casteil · on March 29, 2023

Do you know how their "spot pricing" works, exactly? "lets you use spare GPUs" doesn't tell us much. Is it granular to how long the duration the model is actively processing your prompt? ..or do you get dinged for an hour immediately and again on the hour?

alchemist1e9 · on March 29, 2023

Are you sure?

This is for inference, not training, and I’d do it on blocks of code with less than 4096 tokens. That’s what I have to do with gpt 3.5 turbo via api.

I can do what I want with the API and it is around 1-2 cents per batch of code I run through it, typically 400 lines of average code is around 4k tokens.

aunty_helen · on March 29, 2023

If you can get it within 4k of tokens then your vram needs will be much less. As the other two have pointed out, there's workarounds to this.

If you're trying to get codebase level introspection you might need to wait a bit for some of these techs to mature.

Exciting space and yea, learning heaps myself day by day.

alchemist1e9 · on March 29, 2023

The approach I was going to take is generating english code descriptions/summaries in related chunks of under 4k tokens then those can be combined and also summarized.

In my opinion the code descriptions gpt 3.5 turbo has been spitting out for me are good quality and concise. I’d argue they are probably better than what many of the developers themselves would write, especially when english isn’t native for the developer.

barking_biscuit · on March 29, 2023

>Buying a "high end GPU" isn't buying a 4090 or even two, it's 250k on a DGX unit and putting it in a datacentre. You will probably be able to find a service that would sign a confidentiality agreement and provide you with this service for less than 250k.

I haven't started to play with LLMs locally in anger yet, but I was under the impression that you could use a 4090 in combination with FlexGen to achieve this rather than having to buy special hardware?

tetete · on March 29, 2023

With 8 bit training you can do ~13B pram LLM on 3090/4090. https://huggingface.co/blog/trl-peft

But it is pretty cheap to rent something at vast.ai or whatever to get 40GB for a final run.

nullsense · on March 29, 2023

Awesome. Between crypto hype in 2017 and AI hype in 2023 I've acquired a collection of 2x 1080ti, an RTX 3060 and an RTX 4090. All together it's a total of 58GB of VRAM. Is there a way I can pool it all across a distributed cluster of 2 machines for doing anything? I'm assuming it would bottleneck on both network speeds and the slowest GPUs if it's possible at all...

all2 · on March 31, 2023

Could you package all that compute into a "virtual" graphics card?

pmoriarty · on March 29, 2023

"Buying a "high end GPU" isn't buying a 4090 or even two, it's 250k on a DGX unit and putting it in a datacentre. You will probably be able to find a service that would sign a confidentiality agreement and provide you with this service for less than 250k."

Does anyone have a projection based on historical GPU cost reductions as to how long we have to wait until a DGX unit costs as little as a 4090 does today?

phh · on March 29, 2023

The issue is largely around available RAM capacity, and sadly RAM price have barely decreased in the last 5-10 years

nathanasmith · on March 29, 2023

There is ChatGLM[0], a 6 billion parameter Chinese/English bilingual model that is gaining a reputation as the leading locally runnable LLM for code generation. Maybe look into that. Demo is here[1].

[0]https://github.com/THUDM/ChatGLM-6B/blob/main/README_en.md [1]https://huggingface.co/spaces/multimodalart/ChatGLM-6B

nmfisher · on March 29, 2023

I tried this the other day for generating basic Chinese conversations - the quality is surprisingly good. While it's still behind the latest GPT, but the gap isn't as large as I thought it would be.

alchemist1e9 · on March 29, 2023

This does seem promising when playing with the demo. That says running on a T4 I believe. This probably means we are only months away from viable self hosted code generation and analysis that could compete with gpt 3.5 turbo.

It seems like working on locally runnable LLM that has been fine tuned to focus on code should be a high priority.

garblegarble · on March 29, 2023

I'm doing the same tasks but on english text, I have not found a local LLM anywhere near the summarisation ability of GPT 3.5 and 4.

That said, there are some summarisation tasks I prefer to run locally even given the massive drop in quality (e.g. using Alpaca), both for privacy reasons and to keep myself up-to-date on local LLMs.

P.S. I'm discounting older local summarisation-specific networks, I have found local LLMs to be a jump in quality over them

mightytravels · on March 29, 2023

Same question here - alpaca does not do well with long inputs :( Ideally i can throw a 100 pages PDF at it and get a summary and a response document :)

alchemist1e9 · on March 29, 2023

Can’t you just break it into chunks and the summarize the summaries?

mightytravels · on March 29, 2023

Think long lawsuits and discovery. Possible to break it up but really not ideal.

alchemist1e9 · on March 29, 2023

Yeah I think you just need a tool built with Langchain and it can do what you want.

stolsvik · on March 30, 2023

Care to share what prompts you run this with? What is the output you desire, and how do you achieve this?

alchemist1e9 · on March 30, 2023

Prompt: You are an expert python programmer. I will give you some python code below. I want you to describe the code and it’s likely purpose.

The answer will be pretty damn good , especially with gpt 4, but it’s still good with gpt 3.5 turbo. Try it out yourself with whatever code you have.

This english description of code can then be searched against. That’s the primary practical application I’m thinking about doing currently.

thelittleone · on March 29, 2023

Could envision some dystopian future where we pay for access to AI with varying tiers of training data.

bioemerl · on March 29, 2023

That's not dystopian, that's already happened and is happening

saurik · on March 29, 2023

To verify, the reason that this is not dystopian is because you are assuming we aren't already in the dystopia?

pmoriarty · on March 29, 2023

It is dystopian when the way humans are using AI is causing inequality to deepen.

teamspirit · on March 29, 2023

I'm sorry, this medical ai model only has a small training set and runs on limited resources. It may only provide a treatment with a 50% chance of survival. If you'd like, you can apply for a loan for our MedAI 3000 that creates custom drugs to target your child's cancer.

We're headed to Elysium and that's dystopian.

Tepix · on March 29, 2023

Yes, i haven't seen any fine-tuned LLaMA-65B model so far unfortunately. I guess the cost is a bit high. Perhaps with LoRa someone will do it.

SilentM68 · on March 30, 2023

Might want to check out these guys as well: https://cocktailpeanut.github.io/dalai/#/

Tepix · on March 30, 2023

That's for CPU (and they only provide 7B and 13B).

Rzor · on March 29, 2023

Keep an eye on these guys: https://github.com/ZrrSkywalker/LLaMA-Adapter/issues/2

saurik · on March 29, 2023

maybe https://huggingface.co/chavinlo/Alpaca-65B/tree/main

via https://github.com/antimatter15/alpaca.cpp/issues/124#issuec...

? (I have not tried this yet.)

m3kw9 · on March 29, 2023

How does training on just 800k pieces of data need 7b parameters?

comex · on March 29, 2023

Because it’s fine-tuning an existing 7B-parameter language model, not training from scratch.

sp332 · on March 29, 2023

Llama 7B was trained on a trillion tokens. The Lora is a small fraction of extra neurons that get integrated into the structure, and those are what get trained on the new data. It's like fine-tuning but takes less RAM and compute than retraining the whole model.

KRAKRISMOTT · on March 29, 2023

The chinchilla formula demands a 20:1 ratio

airstrike · on March 29, 2023

"demands" is great

KRAKRISMOTT · on March 29, 2023

Hey I didn't invent Chinchilla, blame google for that.

adt · on March 29, 2023

DeepMind

(apparently there's some friction between those two!)

pmoriarty · on March 29, 2023

Is it possible to use AI's to clean training data?

bitshiftfaced · on March 29, 2023

The Alpaca folks used GPT to generate training data. Yeah you can also use it to find issues. It's not perfect, though. What's interesting is the idea of training an LLM, using it to improve the training data, train a better LLM with that, and repeat.