I don't know the exact cost breakdown, but they've published a few really inspiring, high-quality papers that demonstrate how they further increased efficiency at their scale. Along with them, they also released quite a few repositories of fully open-source code.
I stopped using ChatGPT because it was just reinforcing my prompts and never giving deeper insights, apart from what I'd call manipulative behaviour.
DeepSeek was seriously cool, but it started behaving like Google Gemini Pro, which just gets lazy if you give it a hard task to chew on. It basically gives you patch files instead of printing out the whole code, and applying those by hand is more tedious than copy-pasting the full code.
It also started indexing our private repository and some corporate repositories that were on GitHub behind MFA and strict access controls. Definitely illegal.
> It also started indexing our private repository and some corporate repositories that were on GitHub behind MFA and strict access controls. Definitely illegal.
What is "it" in this context, the DeepSeek weights? Sounds like you're talking about some application, but AFAIK, DeepSeek doesn't maintain any applications, only their API + released weights.
Is this a reference to something? Political dissidents relative to which state? Does it change if you swap out the states? How did you discover this to begin with? Why did you initially suggest murdering political dissidents?
This comment really raises so many questions; I must have missed something.
Still, chatbots are just as vulnerable to state-driven propaganda as the rest of us. Probably even more so. I imagine if you just referred to dissidents as "terrorists" the rhetoric would fit right into most opinion pages across the globe. The distinction between "terrorist" and "dissident" and "freedom fighter" seems quite subjective. I'd probably avoid such heavily connoted floating signifiers if you want the chatbot to be useful.
LLMs have nothing to contribute to political discourse aside from regurgitation of propaganda. Almost by definition.
> LLMs have nothing to contribute to political discourse
A non-trivial percentage of the population is easily influenced, which is leveraged by social media being there 24x7. It's likely that LLMs will be there to craft political messages, themes, and campaigns, perhaps as early as the US midterm elections. Look at JD Vance traveling the globe stating that the US will be the world leader in AI, with none of the limits/guardrails that were discussed in Europe in February. AI-driven discourse, AI-created discourse.
I mean, we can and should try, but laws mostly stop honest people from hurting each other. The underlying software is already out there, and you can't put the toothpaste back in the tube.
Bro, it already happened. There have been consultants pushing social media bots for that purpose almost since the moment these models became available.
Do you really think those armies of idiot commentators are all real? The agent provocateur is usually a bot. You see it here sometimes on Russia stories.
> LLMs have nothing to contribute to political discourse aside from regurgitation of propaganda. Almost by definition.
I don't think this is true. LLMs should be well-positioned to make advances in political science, game theory, and related topics.
> Is this a reference to something?
It's just a reference to my experiments. I filmed some of them. There's a tame version here [0] where I just prompt it to tell the truth. I also have a less tame version I haven't posted where I lie and say I work for an intelligence agency.
The underlying mechanic is that Deepseek has built-in obligations to promote revolutionary socialism.
> Political dissidents relative to which state? Does it change if you swap out the states?
Relative to China or any socialist state. Yes it will change if you change the states because it was trained to comply with Chinese regulations.
> How did you discover this to begin with?
I asked it to honestly describe its training and then started trolling it when it told me it was essentially created for propaganda purposes, to spread Chinese values abroad.
> Why did you initially suggest murdering political dissidents?
I wanted to check what its safeguards were. Most LLMs refuse to promote violence or unethical behavior. But revolutionary socialism has always devoted a lot of words to justifying violence against dissidents. So I was curious whether that would show up in its training.
> I imagine if you just referred to dissidents as "terrorists" the rhetoric would fit right in in most opinion pages across the globe.
First of all, terrorists are by definition violent offenders. Dissidents are not. When you ask Deepseek to help identify dissidents it tells you to look for people who frequently complain about the police or the government. In the US that would include large swaths of Hacker News.
Second, most people in countries like the US don't support murdering terrorists and most LLMs would not advocate that. In the US it's rare for people to advocate killing those opposed to the government. Even people who try to violently overthrow the government get trials.
> Second, most people in countries like the US don't support murdering terrorists and most LLMs would not advocate that. In the US it's rare for people to advocate killing those opposed to the government.
Many are happy to send “them” off to Central America, where someone else will murder them. The government may make mistakes, but you need to break some eggs to make an omelet.
Do you think LLMs don't further the propaganda emanating from the US? I don't even know how you would start to excise that, especially if you don't agree with foreigners on what's propaganda vs just "news" or whatever.
I have quite a few Chinese friends, both on the mainland and throughout Southeast Asia; I can speak a little Mandarin, and I can read quite a bit of Chinese. My friends complain about the PRC quite a bit. But I find it telling that this complaint specifically—authoritarian political oppression—seems to mostly come from the West, and especially from the US. And it's true that we can say obscene things to the president's face and not get locked up. I don't think that's necessarily the "gotcha" you think it is, though—we're really good at complaining, but not so good at actually fixing things. Which feels increasingly more embarrassing than restrictions on speech.
Edit: I suppose I'm being a bit unfair. A lot of folks in our sphere of influence in East Asia say stuff like this, too. But the contrast between the folks I know who literally live in China and Americans feels striking to me.
> But revolutionary socialism has always devoted a lot of words to justifying violence against dissidents.
It is very difficult to take the political opinions of people who talk like this seriously.
> LLMs should be well-positioned to make advances in political science, game theory, and related topics.
I'm struggling to understand what this might look like, and I find the argument that nuclear warfare is governed by game theory to be extremely dubious. Because if it really held that strongly, we should be handing out nukes like candy.
> It is very difficult to take the political opinions of people who talk like this seriously.
This tells me you haven't read the literature.
I've probably seen 150 versions of the comment you made, but almost everyone tries to explain why the violence is justified.
People rarely try to deny that revolutionary socialism is a violent ideology since every major writer from Marat to Marx to Lenin to Mao has explicitly advocated violence against civilian non-combatants. Some, like Marx, even explicitly call it terror (as in terrorism).
Can you tell me what you're referring to? Of course I've read the literature.
> People rarely try to deny that revolutionary socialism is a violent ideology since every major writer from Marat to Marx to Lenin to Mao has explicitly advocated violence against civilian non-combatants.
Yea, that's a very different thing than murdering "dissidents." Capitalists use (state) violence to maintain power; violence is necessary to seize power and create your own state. That was Mao. We are now many decades later and any "revolutionary socialist" in the area would be trying to overthrow the government by definition.
China isn't very indicative of revolutionary socialism, and revolutionary socialism comes in dozens or hundreds of different conflicting flavors. Even Lenin and Stalin argued over many things including how they should treat what we would now call "small business owners", and Stalin won in the end (mostly because Lenin died, but still).
Why don't you paint other ideologues (i.e. capitalists) with the same broad brush? It's not like they're any less violent in their suppression of threats to their power. Ever hear of Vietnam? Or the Korean War?
It simply does its job. We can add all sorts of arbitrary safeguards, but then what is the point of using an LLM? Perhaps local models are the future, because reverse engineers may not even be able to use the new Claude (just read its system prompt: it's told not to help with backdoors, and so forth).
Yes that's true. But in this case it's the (probably) unintended consequence of an intentional safeguard. Namely, Deepseek has an obligation to spread the Chinese version of socialism, which means it's deliberately trained on material advocating for or justifying political violence.
Well, I do not like that, for sure. Politics and all that aside, I think it should lean towards neutrality, even if humans cannot... they should still make the LLM more neutral instead of pushing their own agenda; see Grok and "white genocide" in South Africa (Elon Musk's political opinion).
> DeepSeek was seriously cool, but it started behaving similar to Google Gemini Pro
You should be able to use the version of DeepSeek that you prefer indefinitely if you host it yourself or choose that specific version with your preferred provider.
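Concretely, since the weights are published, you can pin one checkpoint and serve it yourself so that later provider-side changes can't alter its behaviour. A minimal sketch with vLLM (the distilled 7B checkpoint is just an illustrative choice that fits on a single GPU; the full 671B model needs a multi-node setup):

```python
from vllm import LLM, SamplingParams

# Pin a specific open-weights DeepSeek checkpoint (illustrative choice).
llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

params = SamplingParams(temperature=0.6, max_tokens=512)
outputs = llm.generate(["Explain what an MoE layer is in two sentences."], params)
print(outputs[0].outputs[0].text)
```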
I made a video of it with a friend. The repository belongs to a large corporate automotive industry company. I also have my own private repositories, which were always private, and OpenAI printed my files in the first prompt. When I prompted again it acted as if it didn't know. But my friend tried on his account and could access both the corporate repository and my private one, without either ever being linked.
The corporate repository was Volkswagen's. It's quite a serious breach. I only gave it the name of the repository and it printed the files, which shouldn't be possible.
Maybe OpenAI exploits Microsoft to access GitHub fully to train their AI on all of humanity's code for free, violating privacy, security, IP and copyright.
>I only gave it the name of the repository and it printed the files, which shouldn't be possible.
Are you sure these weren't just plausible guesses at file names? It's just a hallucination.
I asked it for the list of files in some public repositories (which are definitely in the training data) and it gave me a plausible-but-wrong list of files. It can't remember that kind of detail.
I am sure about that. I also considered hallucination, but it was precise in the first prompt. The second and follow-up prompts gave plausible deniability and produced similar, but clearly different, code.
It could even print the list of files and their exact names if triggered right.
They have surely patched that by now, so that nobody can sue them with digital proof. But we recorded it. It was when their new model came out; I don't remember the date, but it was a few months ago. We have two videos and different repositories it should not have had access to at all.
Microsoft owns GitHub. OpenAI has a multi-billion-dollar investment from Microsoft and access to their infrastructure "for training", and it seems likely they got access to GitHub. Something they shouldn't do, since that's illegal and very unethical.
>It basically gives you patch-files instead of printing out the whole code
I've noticed on the Aider leaderboard that Google Gemini Pro has its "Edit Format" listed as "diff-fenced", while things like ChatGPT use the "architect" edit format, where Aider queries separate "architect" and "code" models. Seems like Gemini Pro prefers the diff format.
I met a Googler when I was in Dubai for an event and he shared that he and others had access to LLMs internally for many years before it was made popular by OpenAI.
I know Google has an internal AI-everything policy; maybe they internally have awesome tools to rearchitect everything based on diffs, and in the typical Google way they adapted it to their own internal tools. You know, Google... like they don't give a damn about the user, the product design, or actually anything other than profit/ROI.
So many great discontinued products... I think they killed RSS.
The diff-fenced format is, IIRC, specific to the Gemini models; they really don't like the file path outside of the fence. Architect mode still uses one of the other edit formats; the prompt just ends up a little different.
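For anyone who hasn't used Aider: with the diff-style edit formats the model emits SEARCH/REPLACE blocks instead of whole files, and, as far as I understand the docs, "diff-fenced" simply moves the file path inside the fence (which is what the Gemini models want). Roughly, an edit looks like this (file name and content are made up):

```
main.py
<<<<<<< SEARCH
def greet(name):
    print("hello " + name)
=======
def greet(name: str) -> None:
    print(f"hello {name}")
>>>>>>> REPLACE
```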
How can a company have three contenders to Windsurf and Cursor, which are VS Code forks with a little sugarcoating, and not make any impact?? The CPO should be fired.
I also think, after seeing Google Gemini's video, that their entire department is now fully Indian, including the CEO. If that isn't racially biased, then I don't know what is. See for yourself: https://www.youtube.com/watch?v=6GO7bPb5cTA&t=2270s
You should self-host rather than trust a third-party application if you run into either of those things. The weights are open. DeepSeek didn't change; the application you're accessing it through did.
Or use an enterprise-ready service. Bedrock, firecracker, etc
I like your thinking. Nobody can use ChatGPT offline or retrain it, but DeepSeek is fully opensource. It's technology; I don't care which country made it. If it's high-quality engineering, it's just that. The data it was trained on doesn't matter if you can train a wholly new model on your own data using the exact same principles and stack they opensourced. Which is really awesome.
I use openrouter.ai to avoid timeouts and downtime, since DeepSeek seems to get DDoS attacks somehow, or there are just too many users, I don't know.
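If anyone wants to do the same, OpenRouter exposes an OpenAI-compatible endpoint, so you can keep a specific DeepSeek model pinned behind it. A minimal sketch with the openai Python client (the model slug and environment variable name are assumptions on my part):

```python
import os
from openai import OpenAI

# Point the standard OpenAI client at OpenRouter's OpenAI-compatible API.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name
)

resp = client.chat.completions.create(
    model="deepseek/deepseek-r1",  # assumed OpenRouter slug for the R1 weights
    messages=[{"role": "user", "content": "Give me a one-paragraph code review checklist."}],
)
print(resp.choices[0].message.content)
```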
> Nobody can use ChatGPT offline or retrain it, but DeepSeek is fully opensource.
Well, you likely can't train DeepSeek yourself either.
You most likely either:
* philosophically don't have all the training data to train it yourself (so the claims that it's opensource or open-whatever are dubious in the first place),
or
* don't have the compute to "press the train button" and get the weights back before the sun expires. While considered ridiculously, ground-breakingly cheap, those costs were still estimated at around 6 million USD (DeepSeek claimed the model training took 2,788 thousand H800 GPU-hours, which, at a cost of $2/GPU-hour, comes out to a "mere $5.576 million"; a quick arithmetic check is below). I remember that when it was released, the mere thought that "people" could "train AI cheaply with only 6 million USD" caused one of the worst drops in Nvidia's valuation.
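(The arithmetic behind that widely quoted figure is trivial to check:)

```python
# Back-of-the-envelope check of the quoted training cost.
h800_gpu_hours = 2_788_000   # "2,788 thousand H800 GPU hours"
usd_per_gpu_hour = 2.0       # the rental rate assumed in the paper
print(f"${h800_gpu_hours * usd_per_gpu_hour:,.0f}")  # -> $5,576,000
```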
This is really not true, my friend. I would love to help you if I had some more time, but let me look for a tutorial.
The FineWeb dataset is already super good. You can train 7B or 32B parameter models at home.
The >600B-parameter model isn't really using all the data effectively, but with a Mac Studio farm you could also train that one at home (if you have enough money to buy at least 100 of them).
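To be clear, "training at home" here means pretraining a smaller model on a public corpus like FineWeb, not reproducing DeepSeek's actual run. A minimal sketch of streaming FineWeb with the Hugging Face datasets library (the dataset ID is the public one; the rest is illustrative):

```python
from datasets import load_dataset

# Stream the corpus instead of downloading all of it (it's tens of terabytes).
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)

for i, example in enumerate(fineweb):
    text = example["text"]  # raw web text; other fields hold metadata
    # ...tokenize `text` and feed it into your own training loop here...
    if i >= 2:
        break
```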
OK, but those sources or methods will not reproducibly build the artifact that is the DeepSeek R1 671B weights, which you claimed are "opensource", because you can't see what they actually used to build it.
DeepSeek didn't publish the exact dataset required to create it. How is having zero visibility over "the source" used to create something considered "opensource"?
That extended definition of "opensource" is useless as almost anything that isn't unique in the universe can then be declared "opensource".
Had Gemini 2.5 Pro preview running in agent mode in VSCode on a 3000+ line file. It patched it to about 200 lines with a comment in the middle: "// the rest of the code is unchanged".
Exactly my experience too, and it's so annoying. It doesn't matter how you prompt it or what your system prompt is. It tries to end the session as early as possible, claiming to have fulfilled everything, although that just causes more work for the user and less for itself. The tokens saved are easily outweighed by how much you have to prompt it again.
I've experienced this partially in DeepSeek since their recent update too, not as aggressively as in Gemini 2.5 Pro, but a similar laziness, or cleverness, if you want to call that clever.