More

belladoreai · on Jan 6, 2025

classic neal.fun

belladoreai · on June 9, 2024

Yep, it's super powerful.

I would say that the "more secure way" is to just use ComfyUI without installing any obscure nodes from unknown developers. You can do pretty much anything using just the default nodes and the big node packs.

belladoreai · on June 9, 2024

Well, Docker is great for this as long as you're not one of the unlucky few whose machine is bricked because of Docker. So, mostly yes, I suppose.

lyu07282 · on June 9, 2024

What does that even mean?

belladoreai · on June 9, 2024

"Bricking" is when your electronic device stops working, i.e. becomes a brick. Docker is known to occasionally brick Windows machines.

jiggawatts · on June 9, 2024

Wait… what!?

This is the first I’m hearing of this. Do you have any references?

belladoreai · on June 9, 2024

You can find many references by googling some variations of keywords Docker, Windows, brick

ASalazarMX · on June 9, 2024

Googled that, thanks for not providing clear references to your claims, and found that docker can crash Windows on boot, but not "brick" it. People are still able to safe boot, run system recovery/restore, or even reinstall Windows if they choose.

Besides, bricking software is impossible, bricking refers to physical devices unable to bootstrap anymore.

Timber-6539 · on June 10, 2024

Not exactly. Hard brick is what you are referring to where you need to repair/reset the hardware OEM after corruption.

A soft brick is the actual reference here where you can easily recover from software/re-install.

justinclift · on June 9, 2024

Docker itself doesn't seem to have the best quality control for their official releases, so blindly upgrading Docker will likely bite you in the ass if you do it for a few years. :(

belladoreai · on June 9, 2024

I have not seen a statement from Nullbulge so it's not appropriate to say that they took over the repo.

The author of the repo is claiming that their repo is hacked, but this is an obvious lie, because their very first GitHub commit is the one where they push the malware. Nobody would hack an empty GitHub account.

I don't know if the author of the repo is lying when they say that Nullbulge is behind the attack (perhaps the author is part of Nullbulge, perhaps not).

millzlane · on June 9, 2024

I wouldn't be so sure no one would hack an idle account. I had my Spotify account taken before I even used it. I think in my case they used my account to pump up other lesser known artists.

belladoreai · on June 9, 2024

Okay, sure. But if we have an account which has never had any legitimate activity on it ever - an account that has only ever been used to push malware - then I don't know if it matters much who is the "rightful owner" of the account. Things would be different if the GitHub account had some legitimate activity before the "hack".

millzlane · on June 9, 2024

I agree it doesn't matter much. Could be a noob mistake by the account owner and this is damage control.

janoc · on June 9, 2024

There was also an actively exploited XSS vulnerability on Github in the recent days.

Doesn't mean that this guy was not a malicious actor, only that one shouldn't be so quick to cast stones without evidence.

belladoreai · on June 9, 2024

The person who created the custom node is the same person who "hacked" it. Whether or not the account is technically owned by some unrelated civilian is not important, because there is no other activity on the account.

belladoreai · on June 9, 2024

It was an extension for ComfyUI, which has 37k stars on GitHub. The way ComfyUI is commonly used is that a person shares a "workflow" file, which utilizes various obscure extensions (called "custom nodes") and then the people who want to run the workflow on their own computer will install all these obscure custom nodes that have like 40 stars on GitHub or so.

szundi · on June 9, 2024

Just like an npm install

belladoreai · on May 13, 2024

This is not true at all. I'm active in multiple NSFW AI Discords and Subreddits, and looking at the type of material people engage with, almost all of it is very clearly targeted at heterosexual men. I'm not even aware of any online communities that would have NSFW AI stuff targeting mainly female audience.

astrange · on May 13, 2024

Women aren't in NSFW discords and subreddits - as you probably know, any "topical" social media forum of any kind is mostly men.

They're using Replika and other platforms that aren't social. When they do use a social platform it has more plausible deniability - book fans on TikTok is one, they're actually there for the sex scenes.

belladoreai · on May 14, 2024

"The invisible spaghetti monster exists! It's just invisible so you can't see it!"

Where's the evidence that there's any significant use of NSFW AI by women?

astrange · on May 14, 2024

Replika's PR where they say their userbase is 40% women.

AI_beffr · on May 17, 2024

the guy who strawmans spaghetti is all of a sudden very quiet

belladoreai · on May 22, 2024

Well, I asked for evidence and nobody provided any.

AI_beffr · on May 14, 2024

this comment makes me laugh

belladoreai · on April 22, 2024

If you need to work with multiple LLMs, you probably want to use transformers.js

mrbishalsaha · on April 22, 2024

Isn't it to much for just calculating the number of token?

belladoreai · on April 23, 2024

It's the best option you have if you need to work with multiple LLMs in the browser.

belladoreai · on April 22, 2024

> I'm not sure it's working correctly, I entered the word "what" and it says "4 characters, 3 tokens", I type a space and it says "4 tokens" - shouldn't it just be 1 token? and the space shouldn't count in this case?

When you enter the word "what", the 3 tokens were: start-of-string token, the token "what", and end-of-string token. I made a change now to hide the special start-of-string and end-of-string tokens so that the visualization is a bit simplified.

Adding a space to input changes the tokenization of the input. Sometimes the resulting token count is the same (if the space is merged into some other text), sometimes the resulting token count increases by one (if the space does not get merged).

That part of the tokenizer is working correctly.

> Also occasionally a space appears as a capital G (in Chrome)

Fixed, thanks for reporting! This is a fork of my earlier tokenizer for LLaMA 1 and the demo visualizer had special handling for tokens 0-256 in LLaMA 1. This LLaMA 3 tokenizer doesn't have same special tokens, so some tokens would be visualized in a weird way (like that G thing you reported). I removed that special handling now and it fixed the visualization issue.

> Question: Is there a special ruleset that llama3 follows that other LMs don't as far as what qualifies as a token?

Different models use different tokenization schemes. Most models use some kind of variant of Byte Pair Encoding, trained with their data (the tokenizer itself is also trained, not only the language model).

_akhe · on April 22, 2024

Hm I had not heard of tokenizing like that, typically it's just words or occasionally a word + some adjacent stuff like a punctuation or space. "What " might be a different token than "What" but the total token count shouldn't increment, would just be a different token, right?

> Different models use different tokenization schemes

Curious then why this is called "LLaMA 3 tokenizer" what does it have to do with llama3?

belladoreai · on April 22, 2024

> "What " might be a different token than "What" but the total token count shouldn't increment, would just be a different token, right?

The input string "What" (without trailing space) tokenizes into 1 token. The input string "What " tokenizes into 2 tokens. In theory, one might have a tokenizer that would simply tokenize "What " into a single token, but the actual tokenizers we have will tokenize that into at least 2 tokens.

> Curious then why this is called "LLaMA 3 tokenizer" what does it have to do with llama3?

When you input text into any of the LLaMA 3 models, the first step in the process is tokenizing your input. This library is called "LLaMA 3 tokenizer", because it produces the same tokenization as the official LLaMA 3 repo.

When I said that different models use different tokenization schemes, I am talking in comparison to other models, such as LLaMA 1, or GPT-4. Different models use different tokenizers, so the same text is tokenized into different tokens depending on if you're using GPT-4 or LLaMA 3 or what not.

_akhe · on April 22, 2024

Thanks for clarifying, this is exactly where I was confused.

I just read about how both sentencepiece and tiktoken tokenize.

Thanks for making this (in JavaScript no less!) and putting it online! I'm going to use it in my auto-completion library (here: https://github.com/bennyschmidt/next-token-prediction/blob/m...) instead of just `.split(' ')` as I'm pretty sure it will be more nuanced :)

Awesome work!

_akhe · on April 23, 2024

Well I installed your npm and tried to integrate it, but no matter what every token is always " word" with a leading space, and it's isolating foreign symbols as standalone tokens. I tried different options to strip those or to not include preceding spaces but it's always that way. It's probably how llama3 tokenizes text but I can't get use out of it for my autocomplete library unfortunately. I would need more-or-less the tokens to be words or occasional phrases.

I really love that it is 0 deps and that you provided the npm, and would love to defer this part of my work to an efficient library like this.

belladoreai · on April 23, 2024

I don't think I really understand your use case.

My library solves the following problem: how to tokenize text in a way that is compatible with llama3.

If you don't have any particular constraint (as in "tokenize text in a way that is compatible to model X"), then you can just write your own tokenization that tokenizes the text however you want. It doesn't really make sense to use a complicated tokenization scheme from some LLM model if you don't need to be compatible with that model.

If you really want each word to be its own token, you can easily do that by just splitting on whitespace and punctuation (though that will lead to a huge vocabulary).

belladoreai · on April 21, 2024

GitHub link: https://github.com/belladoreai/llama3-tokenizer-js

belladoreai · on March 16, 2024

If you're an artist who needs to eat food in order to live, this "payment system" does not facilitate your ability to achieve that.