Researchers published a massive database of more than 2 billion Discord messages that they say they scraped using Discord’s public API. The data was pulled from 3,167 servers and covers posts made between 2015 and 2024, the entire time Discord has been active.

Though the researchers claim they’ve anonymized the data, it’s hard to imagine anyone is comfortable with almost a decade of their Discord messages sitting in a public JSON file online. Separately, a different programmer released a Discord tool called “Searchcord” based on a different data set that shows non-anonymized chat histories.

  • FaceDeer
    41
    2 months ago

    If they aren’t comfortable with their Discord messages being public, perhaps they shouldn’t have posted those messages in a public forum that the public can access.

  • unalivejoy
    12
    2 months ago

    Public data should be accessible anonymously. You can’t change my mind.

  • @[email protected]
    6
    2 months ago

    Every time you post, you’re posting so that Meta, Google, Reddit and every known retail store like Walmart, Target, Kroger, etc. can see it because they bought that info or harvested it themselves. I think these are great announcements so people can see who sees and manipulates you with your own contributions of data.

  • @[email protected]
    15
    2 months ago

    “Anonymized,” sure. I highly doubt they read every message. I’m sure there is lots of de-anonymizing information in the messages themselves.

    For example–

    Anon1: “hey jeff, wanna play Minecraft?”

    Anon2: “sure”

    Thus we know Anon2’s name is Jeff. I imagine there’s a lot of this.

    • Mose13
      1
      2 months ago

      Shit. My name is Jeff. Now they know

  • @[email protected]
    256
    2 months ago

    Probably our only chance to find solutions to problems with open source software that uses Discord as their forum

    • nawa
      14
      2 months ago

      Lol, I’ve read this headline and thought “thank fuck, probably the only option to have Discord’s content readable”, I like how universal this opinion is

    • @[email protected]
      140
      2 months ago

      Seriously. It’s beyond painful when some open source project only uses Discord for communication. You have to hope that you post your question at a time when the right people are online, and that there’s not a more interesting conversation going on, otherwise it just gets lost. Index that whole dataset.

      • Ulrich
        2
        2 months ago

        “Index that whole dataset”

        I’ve seen a few projects doing just that with answeroverflow.com and they have come up in my web searches. Not really a solution but at least a stopgap.

        • The Quuuuuill
          54
          2 months ago

          there’s a difference between using irc for livetime troubleshooting and not having a forum at all and directing everyone to your livechat discord. i’m sure some sicko out there has run an OSS project on only IRC, but their project likely got no traction because a history of problemsolving posts is important in open source. generally speaking, you need:

          • a wiki
          • a static indexable searchable forum
          • a live chat place for real time communication for novel problems

          too many projects these days only have that last one in the form of discord

        • @[email protected]
          12
          2 months ago

          That would be equally annoying. Probably a better signal to noise ratio on IRC though; Discord descends into memes almost instantly.

        • AugustWest
          9
          2 months ago

          For projects I am involved with, all IRC chats are archived and searchable. Nothing is private, no registration is needed, and everything is searchable.

          Quite a bit different.

    • Leon
      18
      2 months ago

      I spent nearly three hours today between discord and matrix trying to figure out how to get these two pieces of software to talk using a certain protocol.

      Imagine if there were online indexable platforms where people could publish this information so it’s easily accessible rather than having to scour through message logs hoping to find the right keywords. Such a technology surely doesn’t exist already, right?

      I hate discord.

      • @[email protected]
        4
        2 months ago

        Yeah, but then you get something like what happened when people deleted their Reddit history in protest. That’s fine as a protest tactic, but it leaves a hole where your specific question once came up and now there’s nothing there.

      • dual_sport_dork 🐧🗡️
        37
        2 months ago

        I don’t hate Discord, I simply hate that so many projects and companies have unanimously decided to use it as the wrong tool for the wrong job.

        It’s fine for its intended use case, which is bickering with my friends about video games and fiction, and spamming each other with .gifs and meme images.

        • @[email protected]
          21
          2 months ago

          Discord is genuinely a great tool for what I used to use Skype for. Talking to my friends, and sharing dumb memes with them in a groupchat format. Companies need to learn that using it as a forum, a Q&A service, a wiki or any other information sharing purpose, is simply fucking retarded.

              • @[email protected]
                1
                2 months ago

                You can say whatever you want but people are going to judge you for it.

                Most people here prefer to use inclusive language that doesn’t make anyone feel unwelcome.

                People with actual mental deficiencies exist on Lemmy and it may or may not hurt them to use words like this.

                If it hurts even one good person then it’s probably just worth increasing your vocabulary to avoid that.

                Not to mention imo suggesting someone is actually “retarded” means they should be treated with understanding and patience. People who are deliberately ignorant don’t deserve that patience.

      • @[email protected]
        1
        2 months ago

        you get it to work? i didnt have time to get it working in both directions. matrix to discord worked fine but not the other way.

        • Leon
          1
          2 months ago

          I’m not entirely sure what you’re asking here. I do not use any bridge between the two, but rather searched in separate communities for my answer. Would’ve been lovely if I could just use a search engine to search indexed forums or so, but since for some reason chat clients have taken the place of forums that’s just not doable.

          I’d like to move away from Discord but sadly a bunch of friends still use it. I haven’t read up enough about the bridge thing to figure out if it actually serves a purpose I’d be interested in or not.

  • @[email protected]
    26
    2 months ago

    So this is:

    “Uh guys, Discord chats leaked…”

    For… what, just literally everyone who used Discord between 2015 and 2017, everyone who was an early adopter?

    Dear fucking god.

    I used to say ‘someday, people will learn’, but fucking no obviously not, no they won’t, almost everyone is an idiot and/or truly doesn’t care.

    … I guess this’ll be fodder for a whole bunch of dramatubers / pedohunters for the next year or so…

    • @[email protected]
      1
      2 months ago

      It wasn’t the chats though. It was public servers that can be found through the discovery tab. I would love to be up in arms about this and convince people to switch but… Looking at it objectively, this isn’t terribly different from if they’d archived public subreddits and their posts.

    • @[email protected]
      37
      2 months ago

      The disappearance of forum public discussion to unsearchable, unpreserved, discord semi-private discussion chambers is probably the largest informational catastrophe of the internet so far.

      • @[email protected]
        10
        2 months ago

        I potentially agree, but as a possible competitor, I submit:

        Everything DOGE has done in the last 3 months.

        • @[email protected]
          3
          2 months ago

          That information was at least available to capture.

          With discord it was EULA-walled and anti-scraper locked

          • @[email protected]
            3
            2 months ago

            … No no no it all wasn’t.

            The DOGE goons made up multiple logins to multiple US Gov databases that are not open to the public… including the DoD’s SIPRNet…

            … and we know at least some of these logins were also used from utterly unsecure personal devices, remotely, not onsite, and that they’ve been getting used by IP addresses from all over the place, all over the world, meaning said login creds have either outright been given away, or been compromised by other nation-states’ hackers, or just total rando hackers.