> Most of the metadata activity is contained within a single shard:
>
> - File creation, same-directory renames, and deletion.
> - Listing directory contents.
> - Getting attributes of files or directories.
I guess this is a trade-off between a file system and an object store? As in S3, ListObjects() is a heavy hitter and there can be potentially billions of objects under any prefix. Scanning only on a single instance won't be sufficient.
It's definitely a different use case but given they haven't had to tap into their follower replicas for scale, it must be pretty efficient and lightweight. I suspect not having ACLs helps. They also cite a minimum 2MB size, so not expecting exabtyes of little bytes.
I wonder if a major difference is listing a prefix in object storage vs performing recursive listings in a file system?
Even in S3, performing very large lists over a prefix is slow and small files will always be slow to work with, so regular compaction and catching file names is usually worthwhile.
I guess this is a trade-off between a file system and an object store? As in S3, ListObjects() is a heavy hitter and there can be potentially billions of objects under any prefix. Scanning only on a single instance won't be sufficient.