@Inca:
@awemany I was thinking last night about the best way to uncover alts. It is probably possible to mathematically compare the collected reddit messages of two users to see if there is a match.
I did such a thing, and so did someone else, I think about a year back. I made a reddit submission about it AFAIK, look at my posting history on reddit.
Back then, I wasn't really convinced that there are many sockpuppets around after my analysis, and I am still doubtful that it is the case now.
Note that there are many, many ways to calculate such similarity scores, and that there is no good way to calibrate the algorithm without knowing real identities, so you cannot attach proper probabilities to the result of your analysis. That's why I refrained from saying 'user X writes like user Y': I knew the data was basically BS without the calibration. The other guy went and said so anyways, but admitted that it is basically BS after I poked him on that. Lots of people complained in his analysis submission that 'no way this other guy is me'.
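To make the calibration problem concrete, here is a minimal sketch of one of the many possible scores: cosine similarity between word-count vectors. This is not the method used in either analysis; the function names and the tokenizer are my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lowercase word tokens (assumed tokenizer: letters and apostrophes)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors, between 0.0 and 1.0."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as matching nothing
    return dot / (norm_a * norm_b)
```

The function returns a number for any pair of users, but nothing in it says which value means 'same person'. Picking that threshold would need pairs of accounts with known real identities, which is exactly the missing piece.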
So what I suspect could have happened in the meantime, or even back then (hard to say, because I can't think of any conclusive data that would show it), is that some or many of the fresh accounts are sockpuppets.
However, I believe that some of our most dangerous enemies, such as Greg, have a very unique writing style, and I believe the mental effort it would take to create and keep up a separate sockpuppet identity for more than maybe a couple of unlinkable posts is too demanding, and not effective either; he might as well write as himself. I'd certainly consider it too demanding myself. Note also that my recent post on reddit shows that he has some very peculiar things he does and says. Any account with a post frequency similar to Greg's (he's second on /r/btc, right after /u/jstolfi AFAIR) would have been spotted intuitively.
Writing a Python script to collect each user's posts would be trivial.
That data already exists; you can download it e.g. from files.pushshift.io, and I believe some people specifically crawled the Bitcoin subreddits. That's what I did: I just filtered it down to Bitcoin data, because I don't need the rest.
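The filtering step could look roughly like this. It is a sketch that assumes the dump is newline-delimited JSON with `subreddit`, `author` and `body` fields; the function name `collect_posts` is made up for illustration, and this is not the script I actually used.

```python
import json
from collections import defaultdict


def collect_posts(lines, subreddits):
    """Group comment bodies by author, keeping only the given subreddits.

    Assumes one JSON object per line with 'subreddit', 'author' and
    'body' fields; lines that do not fit that shape are skipped.
    """
    wanted = {name.lower() for name in subreddits}
    posts = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip damaged lines instead of aborting the whole run
        if not isinstance(record, dict):
            continue
        if str(record.get("subreddit", "")).lower() not in wanted:
            continue
        author = record.get("author")
        body = record.get("body")
        if author and body and author != "[deleted]":
            posts[author].append(body)
    return posts
```

Since it takes any iterable of lines, it can be fed from an open file handle, e.g. `bz2.open(path, "rt", encoding="utf-8")` or `lzma.open(...)`, depending on how the dump at hand is compressed.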
Comparing words used, frequency, common turns of phrase or patterns would be relatively easy. It may be that certain people use a rare word or phrase that could be identified in the alt account. Punctuation styles, grammatical peculiarities and sentence structure could also be assessed but would be more difficult. Sources of info such as IRC logs could generate a very large written data set to mine.
Yes. But as I said above, the calibration is missing and the frequent posters are (I believe) mostly unique. And the suspected random, often replaced, fresh sockpuppets appear and disappear so quickly (I guess negative karma is the point where they are abandoned) that it is hard to put the pieces of the puzzle together there.
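For what it's worth, the rare-word idea from the quote above is easy to sketch. The names here (`rare_shared_words`, `max_users`) are hypothetical, and the cutoff for what counts as 'rare' is an arbitrary assumption, which is the calibration problem all over again.

```python
import re
from collections import Counter


def vocabulary(texts):
    """Set of lowercase words used anywhere in a list of posts."""
    words = set()
    for text in texts:
        words.update(re.findall(r"[a-z']+", text.lower()))
    return words


def rare_shared_words(posts_by_user, user_a, user_b, max_users=2):
    """Words used by both accounts and by at most max_users accounts overall."""
    vocab = {user: vocabulary(texts) for user, texts in posts_by_user.items()}
    # For each word, count how many distinct accounts ever used it.
    users_per_word = Counter(w for words in vocab.values() for w in words)
    shared = vocab[user_a] & vocab[user_b]
    return sorted(w for w in shared if users_per_word[w] <= max_users)
```

A short-lived account with a handful of posts has a tiny vocabulary, so this will mostly return an empty list for exactly the accounts one is most curious about.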
This could all be cross-referenced with posting times. Taking things a step further, one could see if the suspected alts post in the same threads as the suspected original account, or in threads or in response to messages which use the original author's name.
I did that for quite a few guys, manually, who were suspected to be alts of each other. I haven't found anything even close to suspicious yet.
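I did my checks by hand, but an automated version of the thread and posting-time cross-check could be sketched like this. It assumes you have, per account, a list of thread IDs and a list of Unix timestamps (in the dumps these would presumably come from fields like `link_id` and `created_utc`, which is an assumption on my part); the function names are made up.

```python
from collections import Counter
from datetime import datetime, timezone


def thread_overlap(threads_a, threads_b):
    """Jaccard overlap of the sets of threads two accounts posted in."""
    a, b = set(threads_a), set(threads_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def hour_histogram(timestamps):
    """Fraction of posts in each UTC hour of the day (list of 24 floats)."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    total = sum(counts.values())
    return [counts[h] / total if total else 0.0 for h in range(24)]
```

Two people in the same timezone who both care about the same drama will also overlap a lot on both measures, so a high value here is a hint to look closer and nothing more.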