Why did S3/object storage succeed while WebDAV apparently failed?

Admiral Patrick@dubvee.org · edited 8 months ago

hperrin@lemmy.world · edited 8 months ago

To give you a real answer, from someone who loves WebDAV and has written a WebDAV server with an S3 backend, object storage is easier/possible to run at scale and serves a different purpose.

Object storage is and always has been based on a key-value model. You put a key and value in, and later you can request that key to get that value. It technically has no concept of hierarchy. WebDAV supports so much more than that. WebDAV has collections (hierarchy), live and dead properties (S3 has something similar to these), methods like MOVE and PROPFIND, and a system of hierarchical locking (depth 0 locking on a single resource or collection, and depth infinity locking on an entire subtree).

This means that in order to build a WebDAV server, you need to know a lot of information about what exists in the data storage. S3 is a lot “dumber” in that regard. The funny thing is S3 has added functionality that essentially reimplements most of WebDAV in a more convoluted form. Whereas on WebDAV you can just PROPFIND a collection with depth 1, on S3 you need to list keys with a prefix and delimiter, then make additional requests for any other props you may need.
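To make the prefix-and-delimiter listing concrete, here is a minimal sketch that uses a plain dict as a stand-in for a bucket. The function name `list_with_delimiter` and the sample keys are made up for illustration. It only imitates the grouping behaviour and is not the S3 API itself.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Group flat keys the way a prefix + delimiter listing does.

    Returns (objects, common_prefixes): keys sitting directly under the
    prefix, and the rolled-up "subdirectory" prefixes below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to and including the first delimiter is rolled up.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical bucket contents: the "folders" exist only inside the key names.
bucket = {
    "photos/2023/a.jpg": b"...",
    "photos/2023/b.jpg": b"...",
    "photos/2024/c.jpg": b"...",
    "photos/index.html": b"...",
    "readme.txt": b"...",
}

objects, prefixes = list_with_delimiter(bucket, prefix="photos/")
print(objects)   # ['photos/index.html']
print(prefixes)  # ['photos/2023/', 'photos/2024/']
```

Note that the listing only returns names. Any other properties of `photos/index.html` would take a further request per object, which is the extra round trip described above.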

Unfortunately, the one thing WebDAV is missing that users of S3 often need is the concept of partial listing. In S3, when you list keys, you tell it how many keys you want back, then it will only give you that many keys max. If there are more keys that it didn’t give you, it will tell you the results are truncated and give you a continuation token. You can use this token in your next request to continue listing keys.
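A rough sketch of that truncation-and-token loop, again against an in-memory list rather than a real bucket. The names `list_page` and `list_all` are hypothetical, and the token here is simply the last key of the previous page. Real S3 continuation tokens are opaque strings, so treat this as an illustration of the flow only.

```python
def list_page(keys, max_keys, continuation_token=None):
    """Return one page of a sorted key listing.

    Assumption for this sketch: the token is the last key already returned.
    """
    if max_keys < 1:
        raise ValueError("max_keys must be at least 1")
    ordered = sorted(keys)
    if continuation_token is not None:
        ordered = [k for k in ordered if k > continuation_token]
    page = ordered[:max_keys]
    truncated = len(ordered) > max_keys
    return {
        "keys": page,
        "is_truncated": truncated,
        "next_token": page[-1] if truncated else None,
    }


def list_all(keys, max_keys):
    """Client-side loop: keep asking until the listing is no longer truncated."""
    token, pages = None, []
    while True:
        result = list_page(keys, max_keys, token)
        pages.append(result["keys"])
        if not result["is_truncated"]:
            return pages
        token = result["next_token"]


keys = [f"obj-{i:03d}" for i in range(7)]
for page in list_all(keys, max_keys=3):
    print(page)
```

With seven keys and a page size of three, the client makes three requests and never holds more than three keys from any single response.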

This is where the “at scale” thing comes in. If you have hundreds of millions of keys in a bucket, getting them all back at once would certainly break your system, and probably would tax the server unnecessarily. So basically the answer is S3 is designed for scale.

That being said, S3 is not really designed for humans to interact with. This is where the “different purpose” thing comes in. It doesn’t have a real concept of hierarchy, just common prefixes and delimiters. So something like renaming a directory would require copying every object with that prefix to a new key, then deleting the originals (which is what my S3 adapter does for my WebDAV server). S3 is more meant to be used with something like UUIDs or hashes for keys. Keys that don’t change. WebDAV is designed more like a file system.
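The copy-then-delete rename can be sketched like this, with a dict standing in for the bucket. `rename_prefix` is a made-up helper, not the code from the adapter mentioned above, and it assumes the old and new prefixes do not overlap.

```python
def rename_prefix(bucket, old_prefix, new_prefix):
    """Emulate renaming a "directory" on a flat key-value store.

    There is no directory object to rename, so every object under
    old_prefix is copied to a new key and the original is then deleted.
    Assumes old_prefix and new_prefix do not overlap.
    Returns the number of objects moved.
    """
    to_move = [k for k in bucket if k.startswith(old_prefix)]
    for old_key in to_move:
        new_key = new_prefix + old_key[len(old_prefix):]
        bucket[new_key] = bucket[old_key]  # copy to the new key
    for old_key in to_move:
        del bucket[old_key]                # then delete the original
    return len(to_move)


bucket = {"docs/a.txt": b"A", "docs/sub/b.txt": b"B", "other.txt": b"C"}
moved = rename_prefix(bucket, "docs/", "archive/")
print(moved)           # 2
print(sorted(bucket))  # ['archive/a.txt', 'archive/sub/b.txt', 'other.txt']
```

The cost grows with the number of objects under the prefix, and against a real object store each copy and each delete is a separate request, so the rename is not atomic.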

I hope that explains it well.

PS: Two minor corrections: WebDAV itself does not support random writes. That’s a separate RFC that’s not part of WebDAV, but is perfectly compatible, and many WebDAV servers offer that functionality. Also, S3 does support random read requests via the Range header.
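For the random-read point, a small sketch of what a byte range request looks like. `range_header` and `serve_range` are hypothetical helpers. The second one only imitates what a server does with a single simple range and ignores open-ended and multi-part ranges.

```python
import re


def range_header(offset, length):
    """Header for reading `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the -1.
    """
    if offset < 0 or length < 1:
        raise ValueError("offset must be >= 0 and length >= 1")
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def serve_range(data, header_value):
    """Simplified server side: handle a single 'bytes=first-last' range."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", header_value)
    if match is None:
        raise ValueError("only simple 'bytes=first-last' ranges are handled here")
    first, last = int(match.group(1)), int(match.group(2))
    return data[first:last + 1]


blob = b"0123456789abcdef"
headers = range_header(offset=10, length=4)
print(headers)                              # {'Range': 'bytes=10-13'}
print(serve_range(blob, headers["Range"]))  # b'abcd'
```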

hperrin@lemmy.world · 8 months ago

An additional point is that CardDAV and CalDAV are both extensions of the WebDAV spec, and are widely used by a number of products, so WebDAV is definitely not a legacy spec. It’s the foundation of two very popular specs supported by billions of devices.

Björn Tantau@swg-empire.de · 8 months ago

Wait, so when I want a directory listing from WebDAV and the directory contains 1000 files, I always have to wait for the whole thing? That explains so much.

hperrin@lemmy.world · 8 months ago

Yep. PROPFIND only has a Depth option.
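For reference, a sketch of what such a request looks like when built with Python's standard library. The URL is a placeholder and nothing is sent over the network. The point is that `Depth` is the only knob on the request, with no page size or offset.

```python
import urllib.request

# Ask for a few common DAV: properties of each member of the collection.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop>
    <D:displayname/>
    <D:getcontentlength/>
    <D:getlastmodified/>
  </D:prop>
</D:propfind>"""


def build_propfind(url, depth="1"):
    """Build (but do not send) a PROPFIND request for a collection."""
    if depth not in ("0", "1", "infinity"):
        raise ValueError("Depth must be 0, 1 or infinity")
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY,
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


# Placeholder URL, assumed for illustration only.
req = build_propfind("https://dav.example.com/files/")
print(req.get_method(), req.get_header("Depth"))
```

A server answering this with depth 1 returns one multistatus response covering every member of the collection, which is why a directory with 1000 files comes back all at once.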

Admiral Patrick@dubvee.org · 8 months ago

Thanks for the detailed reply. That pretty much answers it.

I definitely agree on the different purposes, but sadly that doesn’t help where object storage is used where it really doesn’t make sense (my org replaced their fileserver with object storage and a client sync app - grr).

“WebDAV itself does not support random writes. That’s a separate RFC that’s not part of WebDAV, but is perfectly compatible, and many WebDAV servers offer that functionality.”

Ah, true. I was looking at SabreDAV specifically which does support it and made a leap that it was part of the spec.

Also, I am definitely going to check out your Nephele Serve project. Thanks for mentioning that.

maynarkh · 8 months ago

I don’t know much about the history, but I would guess that adoption was driven by the actual service that was provided, not how good the protocol was. AWS did their own thing instead of adopting WebDAV, who knows why. Then people started using S3 and building stuff on it since it was cheap. Now people build S3-compatible services so that the stuff built on S3 can be migrated to them.

This is all just an educated guess though.

redcalcium@lemmy.institute · edited 8 months ago

When S3 was released, the huge draw was its pay-as-you-go model, not its new protocol. If Amazon had used WebDAV instead of making their own protocol, I bet it still would have gotten popular.

maynarkh · 8 months ago

Yeah, that was kinda my point. Economics drove adoption, not technological brilliance or even ease-of-use.

key@lemmy.keychat.org · 8 months ago

S3 succeeded due to the scaling capabilities and the ability to abstract completely away from a server or disk. The straightforward key/value nature of the S3 API was a big help in achieving the scaling and adoptability.

Comparing it to WebDAV seems like comparing apples and… an orange smoothie.

blakemiller@lemmy.world · 8 months ago

Couldn’t say for sure but WebDAV probably would be clunky if fronted by a distributed database. The beauty of S3 is you add more servers, add more disks, and bam you’ve got more S3. That happens most easily when the metadata system sitting in the front can expand easily. I don’t know how easy that would be to plumb up with WebDAV. Whether or not one was better here, S3 ultimately won because it’s a primitive API that was essentially impossible to fuck up.

litchralee@sh.itjust.works · 8 months ago

I’m only cursorily familiar with WebDAV, but I think the needs of cloud storage aligned much better with the object storage model than with WebDAV’s file/directory structure. For example, in a distributed cloud across continents, referencing a file in WebDAV might have a canonical path, but object storage would just need a key or hash. And by using a hash of the content as the key, automatic deduplication is achieved, since the same object should hash to the same key. File paths necessarily imply context, but the point of clouds is to be homogeneous. If paths need to be world-unique but locally-cached, then the path is just a unique identifier and we slowly end up with the database-like semantics of object storage anyway.

Phrased another way, a file/directory structure imparts an organization to the contents of those files. Cloud doesn’t need that organization, so throwing stuff in the junk drawer is perfectly reasonable.
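The deduplication argument can be sketched with a toy content-addressed store. The class `ContentAddressedStore` is made up for illustration, and the choice of SHA-256 is an assumption. This shows the hashing idea only. S3 itself lets the client pick whatever key it likes.

```python
import hashlib


class ContentAddressedStore:
    """Toy object store where the key is the SHA-256 of the content."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is stored once.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


store = ContentAddressedStore()
key_a = store.put(b"same bytes")
key_b = store.put(b"same bytes")       # uploaded again, e.g. from another region
key_c = store.put(b"different bytes")

print(key_a == key_b)  # True
print(len(store))      # 2
```

The key carries no path or context at all, which is the junk-drawer property: the store never needs to know where an object "lives", only what it hashes to.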

theit8514@lemmy.world · 8 months ago

This is funny because most object storage systems now use keys that represent a path. For example, you can host a website on S3 with folders for js/css/etc and it “just works”.

Admiral Patrick@dubvee.org · 8 months ago

Thanks. So content-based addressing is the draw then? I guess I can see that. Unfortunately, that’s one of the things I really dislike about it (and why it feels like throwing files in a junk drawer lol).

Baggins [he/him]@lemmy.ca · 8 months ago

Dunno, but I remember trying WebDAV back in the day when my webhost offered it as an alternative to FTP, and it didn’t work very well for that.