It’s a bit of a weird shower thought, but I was wondering, hypothetically, whether it would be possible to take data from a social media site like Reddit, rank the most commonly used words starting at 1, and use a separate application to translate text back and forth between words and numbers.

So if the word “because” was number 100, it would be stored as three characters instead of seven.

There could also be additions for suffixes, so “gardening” could be 5000+1, or a word like “hoped” could be 2000-2 because the “e” is already present.

Would this result in any kind of space savings if you were using larger amounts of text like a book series?
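
To make the idea concrete, here is a rough Python sketch of the word-to-rank mapping, not a real compression tool. The names `build_codebook`, `encode`, `decode` and `sizes` are made up for illustration, and it assumes lowercase words only, so punctuation and capitalization are lost. It also ignores the suffix idea and the space needed to store the word list itself.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")  # assumption: words are runs of letters/apostrophes


def build_codebook(corpus):
    """Rank words by frequency; the most common word gets number 1."""
    words = WORD.findall(corpus.lower())
    ranked = [word for word, _count in Counter(words).most_common()]
    return {word: rank for rank, word in enumerate(ranked, start=1)}


def encode(text, codebook):
    """Replace each known word with its rank (unknown words raise KeyError)."""
    return [codebook[word] for word in WORD.findall(text.lower())]


def decode(codes, codebook):
    """Turn ranks back into words, separated by single spaces."""
    reverse = {rank: word for word, rank in codebook.items()}
    return " ".join(reverse[code] for code in codes)


def sizes(text, codebook):
    """Compare character counts: original text vs. space-separated numbers."""
    coded = " ".join(str(code) for code in encode(text, codebook))
    return len(text), len(coded)


corpus = "the cat sat on the mat because the cat was tired"
codebook = build_codebook(corpus)
codes = encode("the cat sat", codebook)
print(codes, decode(codes, codebook), sizes("the cat sat", codebook))
```

Storing the numbers as decimal digits, as above, is the simplest version of the scheme. Any real saving would depend on how the numbers are packed and on whether both sides already have the word list.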

  • number6
    10 months ago

    There’s a website, I can’t remember the link, but any text you search for turns out to have already been written on one of its pages.

    It’s The Library of Babel

    You can type up to 3200 characters in lowercase. With a short sentence, though, the “title” and page number may be longer than your original text!

    Edit: Fixed url title