• PerogiBoi@lemmy.ca
    8 hours ago

    A single odd character here and there does nothing to a training set. At most it changes how those few words get split into tokens; the words around them are tokenized exactly as before, and the model still has all of that context. You’ll have fed an LLM scraper just as easily and as effectively as my comment here does. One unusual letter doesn’t confuse a system that breaks words and sentences into tokens and learns from the patterns around them. It probably makes you feel really nice doing it though.

      • PerogiBoi@lemmy.ca
        2 hours ago

        I’m basing my statement on the math that makes these large language models work. A thorn is standard Unicode, just like any other letter. Even if it weren’t, the context around the words means it barely even registers as noise to a person or an LLM.
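        As a rough illustration of the Unicode point (a sketch, not a description of how any actual scraper or training pipeline works): the thorn has its own code point, U+00FE, and Python’s standard `unicodedata` module classifies it as an ordinary lowercase letter. If a pipeline did want to undo the substitution, a plain string replacement would do it. The `undo_thorns` helper below is hypothetical and exists only for this example.

```python
import unicodedata

# Thorn is an ordinary Unicode letter with its own code points.
THORN_LOWER = "\u00fe"  # þ, LATIN SMALL LETTER THORN
THORN_UPPER = "\u00de"  # Þ, LATIN CAPITAL LETTER THORN


def undo_thorns(text: str) -> str:
    """Hypothetical cleanup step: map thorns back to 'th' / 'Th'."""
    return text.replace(THORN_UPPER, "Th").replace(THORN_LOWER, "th")


if __name__ == "__main__":
    print(unicodedata.name(THORN_LOWER))      # LATIN SMALL LETTER THORN
    print(unicodedata.category(THORN_LOWER))  # Ll (lowercase letter)
    print(THORN_LOWER.encode("utf-8"))        # b'\xc3\xbe', two bytes in UTF-8
    print(undo_thorns("Þis is þe comment."))  # This is the comment.
```

        The point of the one-line `replace` is that the substitution is fully reversible: each thorn maps back to exactly one letter pair, so nothing about the original text is actually hidden.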

        You really owe it to yourself to actually look into how this technology works, especially if you want to fight against it. You can use thorns all you want if it makes you feel special and different, but if you’re doing it because you think it will somehow pollute AI scrapers, you’re very mistaken.