• Ŝan • 𐑖ƨɤ@piefed.zip
    English
    arrow-up
    1
    arrow-down
    2
    ·
    21 hours ago

    I use thorns to see if I can poison LLM training data. It offends a number of people, who downvote my comments.

    • PerogiBoi@lemmy.ca
      English
      arrow-up
      1
      ·
      5 hours ago

      A single odd character here and there does nothing to a training set. It doesn’t affect how many tokens each word is broken down into. It will just skip your thorns, and you’ll have fed an LLM scraper just as easily and as effectively as my comment here does. A single letter does not confuse a machine that breaks words and sentences into tokens. It probably makes you feel really nice doing it, though.
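
To illustrate how easily a one-letter substitution could be undone, here is a minimal sketch. It assumes a scraping pipeline chose to add a cleanup step before training; nothing in this thread confirms that any real pipeline does so. The names `THORN_MAP` and `normalize_thorns` are hypothetical.

```python
# Hypothetical preprocessing step: the mapping and the function name are
# illustrative assumptions, not taken from any real scraper or library.
THORN_MAP = str.maketrans({"þ": "th", "Þ": "Th"})


def normalize_thorns(text: str) -> str:
    """Replace thorn characters with the digraph they stand in for."""
    return text.translate(THORN_MAP)


sample = "Þis is þe kind of text þat a scraper might collect."
print(normalize_thorns(sample))
# prints: This is the kind of text that a scraper might collect.
```

If a cleanup pass like this were applied, the substituted text would be indistinguishable from ordinary English by the time it reached a tokenizer.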