• affiliate@lemmy.world · 4 hours ago

    from the article:

    Robots.txt is a line of code that publishers can put into a website that, while not legally binding in any way, is supposed to signal to scraper bots that they cannot take that website’s data.

    i do understand that robots.txt is a very minor part of the article, but i think that’s a pretty rough explanation of it

      • ma1w4re@lemm.ee · 4 hours ago

        List of files/pages that a website owner doesn’t want bots to crawl. Or something like that.

        • NiHaDuncan@lemmy.world · edited · 3 hours ago

          Websites actually just list broad areas, as listing every file/page would be far too verbose for many websites and impossible for any website that has dynamic/user-generated content.
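          For instance, a minimal robots.txt that blocks broad areas rather than individual pages might look like this (the paths here are made up for illustration):

```
User-agent: *
Disallow: /search
Disallow: /private/
Allow: /
```

          A single `Disallow: /private/` rule covers everything under that path prefix, no matter how many dynamic or user-generated pages live there.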

          You can view examples by going to almost any website’s base URL and adding /robots.txt to the end of it.

          For example www.google.com/robots.txt
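          If you want to see how a well-behaved bot interprets those rules, Python’s standard-library urllib.robotparser can check a URL against a ruleset. A small sketch, using a made-up robots.txt and example.com URLs (no network access involved):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration
robots_txt = """\
User-agent: *
Disallow: /search
Disallow: /private/
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Blocked: falls under the /private/ path prefix
print(rp.can_fetch("*", "https://example.com/private/data.html"))  # False

# Allowed: matches no Disallow rule
print(rp.can_fetch("*", "https://example.com/about"))  # True
```

          Note that, as the article says, nothing forces a scraper to consult this at all; can_fetch only tells a bot what the site owner asked for.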