Hacker News | esperent's comments

I'm outside the US, use Gemini models quite a bit, and I've never run into any refusals of any kind. I'm using them for a fairly wide range of things, I'm sure at least as risqué as asking how to transfer a SIM. As a matter of fact, I asked its advice on how to transfer banking apps and auth apps from one phone to another about 3 weeks ago and got decent answers.

It's more dependent on the specific country they are in (and I don't know the specifics). But Google is large enough to have lawyers for every country, and Google is in a never ending whirlwind of national lawsuits/fines, so you end up at the mercy of whatever the lawyers for your country think will not piss off regulators. The EU (and individual states) have pretty heavy AI regulations, and Google even just got fined for an AI overview being incorrect.

It could also just be which way the wind was blowing for OP, since the models are stochastic to some degree, but there is no shortage of complaints from (mostly European) users getting stonewalled.


On X I've seen users in Germany report similar refusals from Claude, when the LLM assumed they were asking for something forbidden about certain topics.

Ultimately I think that in 10 years' time this is what's gonna kill paid consumer LLMs and boost the usage of Chinese LLMs self-hosted at home on your own hardware, which people will torrent via VPNs, as they will also be banned because of "disinformation and misinformation".

So the end winners will be the hardware companies that sell AI chips to consumers after the datacenter bubble pops. Unless of course the EU bans the sale of AI chips that don't have some limitations baked in on which models you're allowed to run (the state-approved ones). Interesting times ahead. I think in 10-20 years' time we'll look back at present-day LLMs the way we look back at the open internet of the '90s-'00s.


The reason I use a VPS is to not use hosted anything, personally. So I guess whether that makes it a better product or not is highly personal.

Speaking for myself, I used $5 DO droplets for quite a while when learning but as soon as I switched to real projects and realized how quickly the price ramped up, I moved to Hetzner and the simplicity of their interface was a breath of fresh air. I saved a ton of time after switching. So to me, Hetzner has the superior product.


For the most part I agree, but poor performance on network block storage is very limiting. Also, there is no object store service available in Ashburn; it's only in the EU. My point is more that DO has a complete modern cloud platform, and Hetzner doesn't quite.

Did you make any attempt at tuning the prompt to reduce false positives? Or did you just say "find bugs"? Because if you tell it to do that, it will.

The point I was trying to make is that there will always be people reporting "critical" bugs.

> they only work if the quaternion's magnitude is exactly one

That's why you always normalize the quaternion first, and the article seems to require the normalized form:

Q.54 How do I convert a quaternion to a rotation matrix?

Assuming that a quaternion has been created in the form:

Q = |X Y Z W|

At least, I would read |X Y Z W| as meaning normalized(X Y Z W)

I don't see this notation explicitly defined when they describe quaternion normalization (Q.52) though, so I agree this leaves much out. It's more a cheat sheet than learning material.

> You need to replace sqrt(1 - a*a) assumptions with actual components, and use atan2 instead of acos

I'm kind of rusty with this, but I think the reason we don't do that is that it's cheaper to normalize then convert rather than use the non-normalized conversion formula. Correct me if I'm wrong.


> That's why you always normalize the quaternion first

Again, that won't solve the problem of floating point components not perfectly adding up to one.

Constantly normalizing quats is an unnecessary performance hit and worsens your accuracy, with no benefit other than being marginally "easier for low-math programmers to reason about". You could instead just work with non-unit quats homogeneously and divide the final result by the quat's squared magnitude just once in the entire pipeline, or not at all, and instead pass the squared magnitude as your w factor, which gets divided on the GPU anyway.
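To make the disagreement above concrete, here is a minimal sketch of converting a possibly non-unit quaternion to a rotation matrix by folding the squared magnitude into a single scale factor, so no square root or separate normalization pass is needed. The function name `quat_to_matrix` and the (x, y, z, w) argument order are assumptions for illustration; this is the common textbook formula, not code taken from the FAQ being discussed.

```python
def quat_to_matrix(x, y, z, w):
    """Return a row-major 3x3 rotation matrix for a quaternion of any nonzero length."""
    n = x * x + y * y + z * z + w * w  # squared magnitude; 1.0 for a unit quaternion
    if n == 0.0:
        raise ValueError("the zero quaternion does not describe a rotation")
    s = 2.0 / n  # a single division stands in for the sqrt-based normalization
    return [
        [1.0 - s * (y * y + z * z), s * (x * y - w * z), s * (x * z + w * y)],
        [s * (x * y + w * z), 1.0 - s * (x * x + z * z), s * (y * z - w * x)],
        [s * (x * z - w * y), s * (y * z + w * x), 1.0 - s * (x * x + y * y)],
    ]
```

For example, `quat_to_matrix(0, 0, 1, 1)` and `quat_to_matrix(0, 0, 2, 2)` both describe a 90 degree rotation about the z axis, even though neither input has unit length.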


I'm still using node/npm and it's... fine. Every so often I read these posts and think I should change but node/npm is already a low friction part of my workflow.

However, what I have seen is that a lot of other libraries I use have switched to Bun. I haven't seen any that switched to Deno, so I've been under the impression that Bun is becoming a strong node replacement candidate while Deno is not, or at least that the community is showing a strong preference for Bun. Anyone have more insight into this?


How much does this limit what a computer can do? E.g. if I converted an Ubuntu desktop into a "secure microkernel", what functionality would be lost?

The Linux programs you run expect access to everything, which is the source of the problem. It's like expecting to be able to draw a megawatt from an outlet in your house.

A capability-based system could emulate that, and then only let the program access the files you allow each time you use the system's "powerbox", which replaces the open/close file dialogs.

From the user's and the programmer's perspective, it works the same. But if the program has a bug, or a virus, or anything else, it won't get access to anything except what you allowed.

We can have easy to use, actually secure, computing for everyone.
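As a rough illustration of the powerbox idea described above, here is a toy sketch in which the program never sees a path or the filesystem, only the handle the user granted. The `Powerbox` class, its `open_for_reading` method, and the in-memory dictionary standing in for the filesystem are all hypothetical; Python itself does not enforce capability discipline, so this only shows the shape of the interface.

```python
import io


class Powerbox:
    """Trusted component: the only code that can reach the (stand-in) filesystem."""

    def __init__(self, files):
        self._files = files  # dict of name -> contents, standing in for real storage

    def open_for_reading(self, user_choice):
        # In a real system the user would pick the file in a trusted dialog.
        # The program receives only a handle to that one file.
        return io.StringIO(self._files[user_choice])


def word_count(handle):
    """Untrusted program: works on a granted handle and never names a path."""
    return len(handle.read().split())
```

If `word_count` were buggy or malicious, the only data it could touch would be the contents behind the handle it was given.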


Everything but the specific usage

But what does that mean? Can I browse a webpage, open a doc, if those are listed as specific usage? And if not, what's the purpose of this and why are people talking about it with such import?

Most people don't do a lot on their machines. They have specific tasks they want to do. The idea is to isolate by default and crack open gaps by policy. You can still do 'anything', but you wouldn't want to enable 'anything' to be possible in the policy.
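"Isolate by default and open gaps by policy" can be sketched as a default-deny allowlist. The app names, resource names, and the `is_allowed` helper below are invented for illustration and do not correspond to any real system's policy format.

```python
# Hypothetical policy: every (app, resource) pair not listed here is denied.
POLICY = frozenset({
    ("browser", "network"),
    ("editor", "documents"),
})


def is_allowed(app, resource, policy=POLICY):
    """Default deny: access is granted only if the pair is explicitly listed."""
    return (app, resource) in policy
```

With this policy, `is_allowed("browser", "network")` is true while `is_allowed("browser", "documents")` is false, because nobody opened that gap.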

Sounds like security through compartmentalization is more user-friendly: You can run whatever you want and how you want it in a dedicated VM, keeping sensitive things safely isolated, without much thinking of what to enable. Case in point: Qubes OS, my daily driver. Btw it already exists and is stable.

> security through compartmentalization is more user-friendly: You can run whatever you want and how you want it in a dedicated VM, keeping sensitive things safely isolated

My brain hurts. How is a system where you can run whatever you want, however you want, but still keep sensitive things safely isolated possible?

Either you have restrictions on what you can run or access (in which case those limit sandboxed capabilities) or you have a hypothetically secure system, the security features of which you never leverage (because sandboxes have absolute freedom).

Unless you were talking about the ability to guarantee a monitor-only hypervisor or resource slice a machine into multiple tenants? (i.e. no/light touch hypervisor situations)


This relies wholly on user skill, which most people will not have. You need extreme tradecraft and opsec to keep really secure. Any little mistaken copy between domains etc. might compromise you.

This is the downside of isolation machines and their upside.

It's hard to make a completely isolated machine for all workflows and keep all data inaccessible to exploits at all times. But because each user has their own ways, it's more likely that 'your particular way of breaking the model' is not known or exploitable (yet).

A lot of holes you open are one-time actions from within a restricted domain.

In Qubes you have cross-domain tools in dom0 for this, which is very hard to reach (but not impossible).

And then supply chain is also hard. Qubes has canaries, but I think most ISOs that people copy into their dom0 and spin VMs off of are not doing such rigorous things (depends on what you use, of course).


> This relies wholly on user skill, which most people will not have. You need extreme tradecraft and opsec to keep really secure.

This depends on the chosen level of compartmentalization. For most people, it might be sufficient to store passwords in a dedicated, offline VM and do everything else in another one. This will already be a huge improvement.


I'm not sure I understand your question. VMs run full operating systems on top of the Xen hypervisor, relying on hardware-assisted virtualization (VT-d or similar). You can run untrusted software in a dedicated VM and keep your sensitive data in another, offline VM.

The dom0 has no network and doesn't manage, e.g., USB devices.


You can't have full general purpose computing on a system and perfect isolation for free.

By definition, the latter implies limits on the former.

Either you have complete freedom to run whatever you want, however you want, or you enforce limits to guarantee system behavior and enforce isolation.

And if you do the latter... then you don't have the former.


Can you elaborate? I'm not a computer scientist. In my understanding, full VMs are practically equivalent to general purpose computers. What are their limitations? Malware escapes?

The last VM escape in VT-d was discovered in 2006 by the Qubes founder, so I really feel safe on Qubes: https://en.wikipedia.org/wiki/Blue_Pill_(software)


We're talking about apples and oranges.

I thought your original point above was that VMs freed you from having to come up with policy-based isolation rules (which have always been a UX weakness of policy-based isolation systems).

The point I was making is that VMs don't provide any security guarantees unless you also use the trusted hypervisor layer to enforce something.

At lightest touch, this might be time-slicing resources and ensuring they're evenly split between VMs, regardless of what individual VMs try to do.

But to provide policy-alike granular security control on VMs, you fundamentally have to generate similar rules. E.g. network can only be used by this VM in this way, etc.

Which gets you right back to having to define policies.

From an architecture security perspective, sure, having a trusted hypervisor enforcing the rules is nice. But it doesn't fundamentally fix the problem of getting policies right... if you're trying to guarantee the same level of control.


QubesOS is not bad indeed. It's not perfect (I think they are looking to replace Xen or make it a much thinner layer). It's definitely the way to go, I think, if you want to retain compatibility with existing OSes/tools.

They are not looking to replace Xen. They plan to add support for KVM without breaking Xen: https://github.com/QubesOS/qubes-issues/issues/7051#issuecom...

They also plan to replace Fedora in dom0 with something minimized https://github.com/QubesOS/qubes-issues/issues/1919#issuecom.... Is this a problem for you?


> Is it fair when the one is heavily subsidized

As a consumer, yes, it's totally fair. All that matters to me is the price I pay at the pump, not whether that price is "real" or not.


On the other hand, I did just leave my pi agent running GPT 5.5 overnight on a clearly defined, long running task. It's been running about 10 hours now and it's mostly done. So this kind of use case is also valid.

Thinking about it, I would say that the majority of agentic work I do, by a long shot, is subagents which are launched from the main session, using a prompt of its choosing. Those could be considered short versions of these fully autonomous tasks.


Yes, part of the reason I chose the one-shot test was really to test long-running tasks. A lot of people seem to be experimenting with this format, for example in the now trending loop-writing workflows. And really I am interested in diving into the murky waters of these novel workflows.

Care to share more about your pi setup? I've recently started using it (after long-time Claude Code work) and was wondering how you'd achieve these long-running tasks. Do you allow it to spawn sub-agents? Thank you!

My pi usage over the past ~5 months went roughly like this:

* Install pi and a bunch of extensions from their package repo

* Realize that all the packages (with a few exceptions) are massively overcomplicated and vibe coded

* Ask pi to rebuild a very simple version of the packages I used. So e.g. subagents - all the default subagent extensions are massively complicated with named agents, recursion, communication. I made one that stripped all that out.

* Then whenever I hit an annoyance, spin up a parallel session and fix it.

It's less work than it appears because I have ~5 extensions: hooks, subagents, background processes, a custom footer, a loop command... Maybe that's it. Within a couple of days you can have a setup pretty close to Claude Code but with a fraction of the base context use. After gradual improvements over a few weeks/months you'll have a system far better, tuned to your exact preference.

Of course, just like with Linux or any other highly tunable system, equally important is having the restraint to not spend all your time tuning it. I've definitely had a couple of days where I was bored with my real work and did that, but whatever, it beats browsing reddit.

As for getting long running tasks, I set a looping message every ~20m and tell the agent to strictly track progress in a session doc, then reread and continue after each compaction.
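The "track progress in a session doc, then reread and continue" habit described above could look something like the following sketch. The checklist file format and the helper names `load_progress`, `save_progress`, and `next_task` are assumptions made up for illustration; they are not part of pi or of the commenter's actual setup.

```python
from pathlib import Path


def load_progress(doc: Path) -> dict:
    """Read a markdown checklist ('- [x] task' / '- [ ] task') into {task: done}."""
    progress = {}
    if doc.exists():
        for line in doc.read_text().splitlines():
            if line.startswith("- [x] "):
                progress[line[6:]] = True
            elif line.startswith("- [ ] "):
                progress[line[6:]] = False
    return progress


def save_progress(doc: Path, progress: dict) -> None:
    """Rewrite the whole checklist so the doc always reflects current state."""
    lines = [f"- [{'x' if done else ' '}] {task}" for task, done in progress.items()]
    doc.write_text("\n".join(lines) + "\n")


def next_task(progress: dict):
    """Return the first unfinished task, or None when everything is done."""
    return next((task for task, done in progress.items() if not done), None)
```

The point of the design is that the file, not the model's context, is the source of truth, so a compaction or restart loses nothing that was written down.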


What type of task are you running for ten hours? Is this a programming task?

I've not come across a programming task that would take an LLM ten hours.


There are quite a few tasks I've found that work like this, although of course most tasks don't and require a much higher degree of interaction. The prime examples are read-only audits of very large codebases, and that's what I was running overnight. One file per subagent, each subagent writes a report with recommendations. Since it's pi and the subagents have very narrow scope, they ranged between 7-40k context use per subagent. I've found codex maxes out at about 50 concurrent subagents before I start getting rate limited, so the coordinating session is instructed to run them in batches of 50. My subagent extension is set up to make this as efficient as possible: the subagents can share a prefix and suffix prompt, followed by a list of name + specific prompt in JSON format.
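The batching scheme described above (a shared prefix and suffix, a JSON list of name plus specific prompt, and at most 50 subagents in flight) might be sketched as follows. The JSON field names and every function here are hypothetical, and `run_subagent` is a placeholder that does no real model call; this is not the commenter's extension.

```python
import asyncio
import json


def build_prompts(spec_json):
    """Expand the shared prefix/suffix around each task's specific prompt."""
    spec = json.loads(spec_json)
    return [
        (task["name"], f'{spec["prefix"]}\n{task["prompt"]}\n{spec["suffix"]}')
        for task in spec["tasks"]
    ]


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def run_subagent(name, prompt):
    # Placeholder: a real version would launch an agent and wait for its report.
    await asyncio.sleep(0)
    return {"name": name, "prompt_chars": len(prompt)}


async def run_all(spec_json, batch_size=50):
    """Run one batch at a time so no more than batch_size subagents run at once."""
    results = []
    for batch in batches(build_prompts(spec_json), batch_size):
        results.extend(await asyncio.gather(*(run_subagent(n, p) for n, p in batch)))
    return results
```

Waiting for a whole batch before starting the next is the simplest way to respect a concurrency cap; a semaphore would keep the pipeline fuller at the cost of slightly more code.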

Overnight it ran ~800 tiny auditors. I then run synthesis on the written audit files, extract bugs, then another round to find which ones have a common source, group them by priority etc.

I've cautiously started doing larger tasks that are not just read-only. For example, I was dealing with a large codebase full of lint and type errors, so I sent out waves of workers with clear instructions to only fix obvious/trivial issues and otherwise to append to a todo file for my review. That worked well and cleared a few thousand issues over several hours.

I don't really want to share any other tasks I've worked on this way because it'll draw out the agentic coding sceptics and I'm not interested in defending my workflow.


I'm not the person you asked, but if they're running on their own local hardware, then it might just be a lot slower than what the big providers run their models on. System RAM is a lot cheaper than VRAM, especially if you bought it last year.

They said it’s GPT 5.5.

I'd like to study your setup. Would you be willing to share? Perhaps a github repo of your 5 extensions or even a pastebin if you would be so inclined. I would be grateful to learn more about this by studying from your success...

I might share it at some point but I think it's quite similar to a lot of others out there, except that it's very specific to my personal projects and goals. If I shared it I'd need to spend at least a while cleaning up and improving docs.

It's one of the reasons I suggest you study the famous setups (oh my pi, or superhuman skills etc.) and convert them to your personal needs.


So what's the solution here? Keep the cache artificially warm for even obscure routes?

There may not be a solution in every case, but it's a reminder that dashboards & metrics are no replacement for actually talking to your users. Metrics are at best a proxy for user experience, don't let them be the tail that wags the dog.

Like the story Bezos told of his execs claiming call wait times were under 1 minute, so he called the service line from the conference room on the spot and made everyone sit there for 10 minutes waiting to get through.


Ok, that's a fun anecdote and I agree it has real world value to think that way. But it doesn't answer my question - this thread is full of people pointing out problems but nobody offering solutions. So I was asking, specifically in the case of an app with a very long tail of cache misses, what's the solution? Do you have to keep potentially millions of routes artificially warm?
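To put a rough number on the question above, here is a back-of-the-envelope sketch of what keeping every route warm would cost, assuming each route has to be re-requested once per cache TTL. The function name and the figures in the example are purely hypothetical and not taken from any real system.

```python
def warm_requests_per_second(num_routes, ttl_seconds):
    """Steady background request rate needed to refresh every route once per TTL."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return num_routes / ttl_seconds
```

For instance, 1,000,000 routes with a 500 second TTL would need `warm_requests_per_second(1_000_000, 500)`, that is 2000 synthetic requests per second, before serving a single real user.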

Apparently I'm either a child voice actor on the cartoon Bluey or an adult film actor. Those were the most interesting anyway. But all of them were hallucinations. The most interesting thing about this, IMO, is that none of the models could simply say that they didn't recognize the name.

I was told I had a role on "The Walking Dead". Interesting that it decided I was an actor, but didn't list the one adult film that somebody with my name was actually in.
