cross-posted from: https://discuss.online/post/5484255

February 22, 2024 Bluesky writes:

Up until now, every user on the network used a Bluesky PDS (Personal Data Server) to host their data. We’ve already federated our own data hosting on the backend, both to help operationally scale our service, and to prove out the technical underpinnings of an openly federated network. But today we’re opening up federation for anyone else to begin connecting with the network.

The PDS, in many ways, fulfills a simple role: it hosts your account and gives you the ability to log in, it holds the signing keys for your data, and it keeps your data online and highly available. Unlike a Mastodon instance, it does not need to function as a full-fledged social media service. We wanted to make atproto data hosting—like web hosting—into a fairly simple commoditized service. The PDS’s role has been limited in scope to achieve this goal. By limiting the scope, the role of a PDS in maintaining an open and fluid data network has become all the more powerful.

We’ve packaged the PDS into a friendly distribution with an installer script that handles much of the complexity of setting up a PDS. After you set up your PDS and join the PDS Admins Discord to submit a request for your PDS to be added to the network, your PDS’s data will get routed to other services in the network (like feed generators and the Bluesky Appview) through our Relay, the firehose provider. Check out our Federation Overview for more information on how data flows through the atproto network.

Read Early Access Federation for Self-Hosters

  • owls@community.yshi.org
    link
    fedilink
    arrow-up
    0
    ·
    9 months ago

    not the OP, but

    i was going to point to the did:plc thing they made up and went with. but since the last time i looked, it looks like they support (and prefer) did:web, so that’s sorted out.

    the “wtf” i have is more to do with actually running a community with atproto. you need a central crawler service that knows about all the PDSes you want to be friends with (this is presumably why you need to sign up in their discord right now, they gotta tell their crawler to look at your PDS)

    but i think the crawler has to grab the entire PDS for everyone it knows about? if you want a large community, it seems pretty resource-intensive since the ceiling is “infinity posts”. bsky’s open source code suggests they have chosen not to deal with this problem.

    with most AP services (e.g. mastodon), you can prune the data and the only consequence is “you don’t get full text search for super old posts received from other services that we pruned”, so there are ways to limit the cost other than “limit the number of users in our community”.

    but this may just be an implementation detail and not an issue with atproto, e.g. git shallow clones are a thing, and the PDS is also storing a big merkel tree. i am not sure if the indexer relies on having the complete history or not (since you do need it for certain operations). bsky’s own code just shrugging suggests maybe limiting it is challenging, i dunno.

    • Elle@lemmy.world
      link
      fedilink
      English
      arrow-up
      1
      ·
      edit-2
      9 months ago

      What you note about the crawler appears to be essentially by design:

      The federation architecture allows anyone to host a Relay, though it’s a fairly resource-demanding service. In all likelihood, there may be a few large full-network providers, and then a long tail of partial-network providers. Small bespoke Relays could also service tightly or well-defined slices of the network, like a specific new application or a small community.

      In their section on so-called “Big World” design:

      As a result, we opted to architect the protocol in a “big world with small world fallbacks” way. With the web, individual computers upload content to the network, and then all of that content is then broadcasted back to other computers. Similarly, with the AT Protocol, we’re sending messages to a much smaller number of big aggregators, which then broadcast that data to personal data servers across the network.

      Emphasis mine.