you probably want to look at their 'libmalloc'

"climate change advocate", in contrast to some "climate change opponent"?


Yeah, this sounds like someone who doesn't realize that climate change is a real effect driven by CO2 emissions, regardless of the precise economics of renewable energy.


An advocate/activist is someone who puts politics and policy ahead of science and economics, and it isn't limited to climate change either.

For example, it was gay advocacy/activists, not health sciences professionals, who made sure more money was spent on AIDS/HIV than all childhood diseases combined.


plainly, Free Our Feeds are grifting.

the relay at this point is non-archival and can be spun up trivially. with a small sliding history window for subscriber catchup you can use like 32GB of scratch disk space and keep a few hours; the relay is literally just a subscribeRepos forwarder from PDSes.
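a minimal sketch of that forwarder shape in TypeScript, assuming the ws package (the PDS hostname is illustrative, and a real relay would also validate and resequence):

    import { WebSocket, WebSocketServer } from 'ws';

    // a real relay tails thousands of PDSes; one stands in here
    const upstream = new WebSocket(
      'wss://pds.example.com/xrpc/com.atproto.sync.subscribeRepos'
    );
    const downstream = new WebSocketServer({ port: 8080 });

    // forward every repo event frame to every connected subscriber
    upstream.on('message', (frame) => {
      for (const client of downstream.clients) client.send(frame);
    });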

the AppView is vastly more expensive to run since you need to handle the write volume of all bsky activity. if you build a non-bsky app on atproto this is a non-issue

the issue here really is that nobody writes about the state of things in long form outside the network, so those not engaged with the platform don't really know how fast things move and change


Re: the relay, that depends on your needs. My impression from these sorts of "run a relay on an RPi" projects is that they're only dealing with a subset of the full firehose Bluesky's relay has to deal with - be it a shorter timeline (as you mention) or only concerning themselves with specific accounts (like the relay operator's own "following" feed, in the case of someone running a personal relay) or what have you. Pretty sure even Bluesky's relay doesn't try to drink from the whole firehose (or if it does, it's tolerant of "dribbling" so to speak; I recall a Bluesky dev blog post about how temporarily dropping posts from users' feeds is acceptable if the relay can't keep up).

Re: write traffic, my understanding is that the appviews shift most (if not all) of that burden to the PDSes, no?


- the relay storage volume scales only with the backfill window for consumers that drop briefly
-- the bluesky pbc operated relays let you reconnect up to 24h later and not miss any events, but that requires around 200GB of scratch disk space
-- live tailing an rpi relay without dropping a connection can give you events from the full network span (ie the complete set of the firehose) without requiring any backfill window, but it's nice to use a few tens of gigabytes anyway
-- the full firehose is like 20Mbps at maximum so it's far from hard to serve a few live consumers

- bluesky's feed gen post-dropping is about internal operation of their appview and not anything to do with network sync semantics

- if you're running an AppView for the bsky data you are likely keeping a copy of all bsky posts in a database, since fetching from PDSes on-the-fly is network intensive over a relatively small pipe, which is what i mean by write volume requirements.
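to make the backfill window concrete: a consumer resumes by passing the last seq it processed as a cursor on the subscribeRepos endpoint, and the relay replays from its scratch window. a rough TypeScript sketch, assuming the ws package and an illustrative relay hostname:

    import { WebSocket } from 'ws';

    let cursor = 0; // last seq this consumer fully processed

    function connect() {
      // the cursor asks the relay to replay everything after that seq
      // from its scratch-disk window before resuming the live tail
      const ws = new WebSocket(
        `wss://relay.example.com/xrpc/com.atproto.sync.subscribeRepos?cursor=${cursor}`
      );
      ws.on('message', (frame) => {
        // real code decodes the CBOR frame and reads its seq field;
        // the increment here is a stand-in for that
        cursor += 1;
      });
      // if we drop for a few hours, reconnecting catches us up losslessly
      ws.on('close', () => setTimeout(connect, 1000));
    }
    connect();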


> the relay storage volume scales only with the backfill window for consumers that drop briefly […] bluesky's feed gen post-dropping is about internal operation of their appview and not anything to do with network sync semantics

Gotcha; thanks for the clarifications/corrections. Good to know that the firehose bandwidth is a lot less than I thought (though 20Mbps can certainly add up to some hefty price tags depending on how you're billed for traffic).

> if you're running an AppView for the bsky data you are likely keeping a copy of all bsky posts in a database, since fetching from PDSes on-the-fly is network intensive over a relatively small pipe, which is what i mean by write volume requirements.

Right, but how much of that actually needs to hit the disk? I'd imagine most appviews can readily get away with just keeping posts in RAM, and even if disk storage is desired (e.g. to avoid needing to pull everything from the PDSes if an appview server reboots), it ain't like the writes need to be synchronous or low-latency. A full-blown ACID-compliant DBMS is probably overkill.

It'd also be overkill to cache all posts, rather than subsets (e.g. each user's "Discover" and "Following" feeds), so I reckon that'd also reduce the in-appview caching needs further.
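For illustration, the kind of bounded in-memory cache I have in mind (all names hypothetical):

    // hypothetical bounded cache: recent posts stay in RAM, and anything
    // evicted would be refetched from the owning PDS on demand
    const MAX_POSTS = 1_000_000;
    const posts = new Map<string, unknown>(); // AT-URI -> post record

    function cachePost(uri: string, record: unknown) {
      posts.set(uri, record);
      if (posts.size > MAX_POSTS) {
        // Map iterates in insertion order, so this evicts the oldest entry
        const oldest = posts.keys().next().value!;
        posts.delete(oldest);
      }
    }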


the reference bluesky backend does just keep everything around but this idea has merit!! you're actually reinventing something like AppViewLite right now, which does throw away old data: https://github.com/alnkesq/AppViewLite

bluesky chooses to not refetch data from PDSes all the time so that the load for a PDS stays low (they like it to be possible to run on a home connection)


this is not correct


that's not the appview, that's the client


App View is a bit of a fuzzy term. To me it seems like a combination of frontend, backend, custom lexicon, and supporting services. There isn't really another place in the spec or design where clients or browsers fit in, even though they do in fact provide a view of the network via an app.


"UI" is part of the definition they give in the glossary

https://atproto.com/guides/glossary#app-view


the backend (the AppView) can be found here:

https://github.com/bluesky-social/atproto/tree/main/packages...

there are various supporting services written in Go as well

https://github.com/bluesky-social/indigo


you can run a resequencing relay good enough to feed an atproto appview pretty cheaply - it just needs to subscribe to a few thousand websocket endpoints to get a live tail of the whole network (incoming traffic is well under 50Mbps at peak in my experience)
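the fan-in shape is roughly this (TypeScript sketch with the ws package; hostnames illustrative, and the real thing needs per-host cursoring and validation):

    import { WebSocket } from 'ws';

    const hosts = ['pds-a.example.com', 'pds-b.example.com']; // thousands in practice
    let seq = 0; // the relay's own monotonic sequence

    for (const host of hosts) {
      const ws = new WebSocket(`wss://${host}/xrpc/com.atproto.sync.subscribeRepos`);
      ws.on('message', (frame) => {
        // restamp each upstream event with this relay's sequence number
        // so downstream consumers see one totally ordered stream
        emit(seq++, frame);
      });
    }

    function emit(seq: number, frame: unknown) {
      /* fan out to subscribers, buffer into the backfill window, etc. */
    }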

running an archival mirroring relay is storage-intensive (on the order of tens of TB iirc?) but only serves as an optimization (you can backfill full atproto repositories straight from the relay instead of needing to reach out to the relevant data server)


one cool difference between Bluesky and Mastodon (et al) is that server choice on registration is not a permanent decision: you can seamlessly migrate later by updating your DID document

so the slick registration flow creates less lock-in than if, say, the mainstream Mastodon app were funnelling users onto one megainstance, since you can still leave afterwards without needing people to re-follow you
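for concreteness, the PDS location is just a service entry in your DID document, so migration amounts to repointing one field. roughly (TypeScript-ish sketch of a did:plc document; the identifier and hostnames are made up):

    // roughly the shape of a did:plc document; migration repoints
    // the #atproto_pds serviceEndpoint at the new host
    const didDoc = {
      id: 'did:plc:1234abcd', // hypothetical DID
      alsoKnownAs: ['at://alice.example.com'],
      service: [
        {
          id: '#atproto_pds',
          type: 'AtprotoPersonalDataServer',
          serviceEndpoint: 'https://new-pds.example.com', // updated on migration
        },
      ],
    };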


gay online sociolect


>gay online sociolect

the vibe indeed


What else would "self-hosting all of Bluesky" mean other than a copy of the entire site? If you just want to participate in the network, host a PDS, which only stores your own posts.


Surely there's some middle ground between hosting only your own data (while relying on another site to keep track of your following/followers) and hosting a duplicate copy of the entire network?


For sure. If you just want to host your own data, you can do that. A PDS for you and maybe some friends is very small and cheap to host.


My understanding though is that having a PDS on its own is useless without an AppView to collect the data from the relay? Or am I misunderstanding the architecture here? https://docs.bsky.app/docs/advanced-guides/federation-archit...


I'm talking about the case where you wanted to run your own PDS and use all of the other infrastructure being run by Bluesky.

If you fully want your own copy of everything, then you'd want to run a copy of everything. But you don't have to. It really depends on what your goals are. That's why the post is about the maximal scenario. "Just your own PDS" is the minimalist scenario. But I think it's the one that makes sense for 95% of users who want to self-host.


Right, and I'm saying "surely there must be a middle ground between 'using all of Bluesky's infrastructure' and 'having a 4.5TB copy of every post ever made on the network'"


What exactly would that be?

I feel like the middle ground you're talking about could be just a feed?

A feed is a server that consumes the firehose and decides which posts to store; when loaded in the app, it returns some posts to create a feed

So essentially you only store references to part of the network rather than storing the whole thing
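Sketch of the serving half, assuming Express; app.bsky.feed.getFeedSkeleton is the method the app actually calls when it loads a feed:

    import express from 'express';

    const app = express();
    // AT-URIs of posts this feed chose to keep while consuming the firehose
    const kept: string[] = [];

    // the appview calls this when a user opens the feed; we return
    // references (URIs) only, and the appview hydrates the full posts
    app.get('/xrpc/app.bsky.feed.getFeedSkeleton', (_req, res) => {
      res.json({ feed: kept.slice(0, 50).map((uri) => ({ post: uri })) });
    });

    app.listen(3000);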


consider the nostr protocol


Your following list is stored in your own repo, so it lives on your PDS. You can theoretically have partial replicas of the network but nobody has bothered yet; if you want to make software like that, a good start would be subscribing to the firehose and filtering down to DIDs you care about / supplying the watched DIDs parameter to a Jetstream instance
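A sketch of the Jetstream route, assuming the ws package (the wantedDids query parameter is Jetstream's; the instance hostname here is one of the public ones, so double-check it before relying on it):

    import { WebSocket } from 'ws';

    // only receive events for the repos (DIDs) you care about
    const dids = ['did:plc:1234abcd', 'did:plc:5678efgh']; // hypothetical DIDs
    const url =
      'wss://jetstream1.us-east.bsky.network/subscribe?' +
      dids.map((d) => `wantedDids=${d}`).join('&');

    const ws = new WebSocket(url);
    ws.on('message', (data) => {
      const event = JSON.parse(data.toString()); // Jetstream emits JSON, not CBOR
      console.log(event.did, event.kind);
    });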


The middle ground you're looking for is impossible in the AT protocol, it is however what the Nostr protocol is aiming towards.

