UPDATE 2011-06-15: this issue has been added as bug #1124 on the Diaspora Issue Tracker. Once it is fixed, I will remove the .txt.gz file containing (only) the names of Diaspora users. By then it will have served its purpose of exposing a potential gap between an expectation of privacy (e.g. “my name should not be shown to Diaspora members I don’t know”) and actual privacy (e.g. “my name is shown to any Diaspora member”).
Here is the list of all ~55k users of the Diaspora social network at JoinDiaspora.com, retrieved without committing cybercrime: 20110610_diasporaUsersOn20110329-csv.txt.gz
Why publish it here? For the same reason as collecting 35M Google Profiles and blogging about it. Like any social network, Diaspora (software for decentralized social networking) is designed to hand out this kind of information. In this particular case, I believe that demonstrating what unknown third parties (like me) can gather with hardly any effort will, in the end, do more good than harm. But I do understand objections, and I encourage you to explain them in the comments below; I will be happy to read them and respond where applicable.
In this case, three minor privacy invasions(*) might occur:
- Given a known real or fictitious person’s name, ‘reveal’ (hint at) a login/username;
- Given a known login/username, ‘reveal’ (hint at) a real or fictitious person’s name;
- Given either, ‘reveal’ (hint at) X being a user of the Diaspora network at JoinDiaspora.com.
Some thoughts on social media in general:
- Do users really ‘know’/expect that their data can and probably will be collected by unknown third parties, companies and governments, outside their control?
- Do users really ‘know’/expect that all data of the network can and probably will be collected by unknown third parties, companies and governments, outside their control? (The ‘network effect’ also holds true for copies of social network data. A network that is easy to harvest completely is more likely to be harvested completely, and will be a more attractive ‘starting point’ for, e.g., filtering out interesting targets than networks that are difficult to harvest.)
- Do users have sufficient risk assessment skills and information to decide what chance × impact to associate with their personal data and communication being copied into rogue databases outside their control?
Although social media providers can enforce restrictions at their servers (e.g. only allow search by name, only allow 1000 queries per day from the same IP address), searching, matching and linking are completely unrestricted for unknown third parties once they have gathered a rogue copy of the data. The data can be collected slowly, over a longer period of time and from varying IP addresses/ranges, ‘staying under the radar’ of monitoring and restricting measures. In this particular case there is no ‘real’ problem and publishing the data is mostly harmless; otherwise I wouldn’t have published it. To repeat from my blogpost about Google Profiles: my activities are directed at inciting, or stirring up, debate about privacy.
Mind you, I’m a happy Diaspora user myself and support the work of the developers. They’re doing good work!
(*) Whether or not these would be perceived as a privacy invasion depends on each of the 55k individual users – because in the end, privacy is an individual experience.
It would be a great help to the developers if you removed that file from your post.
@srinivas I’m always willing to change my behavior if reason convinces me to. I carefully deliberated whether or not to publish the file. It is not my intention to cause damage; my intention is to inspire debate about unforeseen (ab)uses of social media by unknown third parties, companies and governments. Investigating the ‘harvestability’ of social media, and backing the accompanying claims (“it is possible to enumerate all Diaspora users”) with evidence (the file), supports that intention.
How would removing the file help the developers?
Two of my accounts are on that list. And I’m totally fine with your action. In fact, even if you published my passwords, I’d be fine. Why? Because Diaspora is alpha software, and stuff like this happens to alpha software. On the other hand, I’m sure this will be great motivation for the Diaspora guys 🙂
Anyway, bravo for doing this.
@dijxtra Thanks for commenting. It’s encouraging to see ‘harvestability’ (in this case: getting a list of all usernames + real or fictitious names) attributed to the software being alpha, implying an expectation that it will/should become more difficult to harvest as software development progresses.
In the case of Google Profiles, not everyone agrees that there is, generally speaking, a “reasonable expectation of privacy”: read Google’s response [1] and the post by Forbes blogger Kashmir Hill [2].
It’s encouraging that you have higher expectations of Diaspora than that. I (too?) personally expect more privacy protection from Diaspora(’s developers) than from Google.
[1] http://www.theregister.co.uk/2011/05/25/google_profiles_database_dump/
[2] http://blogs.forbes.com/kashmirhill/2011/05/26/google-profiles-is-easy-aggregation-an-invasion-of-privacy/