It would be helpful if there were an instance that migrated all of this to Lemmy so that we could access it from any other instance, instead of having to download it for local browsing.
This is Pushshift data from before Reddit nuked it in March. You can find this torrent (called “Reddit comments/submissions 2005-06 to 2022-12”) and others, including 2023-01 and 2023-02, on https://academictorrents.com, uploaded by user Watchful1.
What’s the context and background here? It would be nice to know what’s in some of these 4GB compressed files before downloading them.
It has JSON files with every post written on Reddit.
Agreed, what’s in these? Raw text? Image metadata?
Nothin’ but JSON compressed with zstd. You can also grab individual subreddits at https://the-eye.eu/redarcs/
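If you want to peek inside one of these files without unpacking it to disk, something like the sketch below might work. It rests on a few assumptions that are not stated in this thread: that each file holds one JSON object per line, that the third-party `zstandard` package is installed, and that the files were compressed with a large window (hence `max_window_size`). The helper names `open_zst_text`, `iter_records`, and `peek` are made up for illustration.

```python
import io
import json


def open_zst_text(path):
    """Open a .zst file as a UTF-8 text stream, decompressing on the fly."""
    # Third-party dependency, imported lazily so the parsing helpers
    # below still work when it is not installed.
    import zstandard

    fh = open(path, "rb")
    # Assumption: the dumps use a large compression window, so the
    # default limit has to be raised for decompression to succeed.
    dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
    return io.TextIOWrapper(dctx.stream_reader(fh), encoding="utf-8", errors="replace")


def iter_records(lines):
    """Yield one dict per line, skipping blank lines and lines that are not valid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def peek(lines, n=3):
    """Return the sorted field names of the first n records."""
    seen = []
    for record in iter_records(lines):
        seen.append(sorted(record.keys()))
        if len(seen) >= n:
            break
    return seen
```

Calling `peek(open_zst_text("some_dump.zst"))` (the filename is a placeholder) should list which fields the first few records carry, which answers the "raw text or image metadata?" question for whichever file you picked up.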