June 2011 – Matthijs R. Koot's notebook

UPDATE 2011-06-15: this issue has been added as bug #1124 on the Diaspora Issue Tracker. If it is fixed, I will remove the .txt.gz file containing (only) the names of Diaspora users. It will then have served its purpose of exposing a potential gap between an expectation of privacy (e.g. “my name should not be shown to Diaspora-members I don’t know”) and actual privacy (e.g. “my name is shown to any Diaspora-member”).

Here is the list of all ~55k users of the Diaspora social network at JoinDiaspora.com, retrieved without committing cybercrime: 20110610_diasporaUsersOn20110329-csv.txt.gz

Why publish it here? For the same reason I collected 35M Google Profiles and blogged about it. Like any social network, Diaspora (software for decentralized social networking) is designed to hand out this kind of information. In this particular case, I believe that demonstrating what can be gathered by unknown third parties (like me) with hardly any effort will, in the end, do more good than harm. But I do understand objections, and I encourage you to explain them in the comments below; I will be happy to read them and respond where applicable.

In this case, three minor privacy invasions(*) might occur:

Given a known real or fictitious person name, ‘reveal’ (hint at) a login/username;
Given a known login/username, ‘reveal’ (hint at) a real or fictitious person name;
Given either, ‘reveal’ (hint at) X being a user of the Diaspora network at JoinDiaspora.com.
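The three inferences above amount to simple lookups once a copy of the list exists. The following is a minimal sketch using fictitious records; the column names `username` and `name` are assumptions for illustration, since the layout of the published file is not described here.

```python
import csv
import io
from collections import defaultdict

# Fictitious sample data; the real file's column layout is an assumption.
SAMPLE = (
    "username,name\n"
    "alice01,Alice Example\n"
    "bob_x,Bob Example\n"
    "alice_alt,Alice Example\n"
)


def build_indexes(csv_text):
    """Build both lookup directions from a username/name listing."""
    by_username = {}
    by_name = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_username[row["username"]] = row["name"]
        by_name[row["name"]].append(row["username"])
    return by_username, dict(by_name)


by_username, by_name = build_indexes(SAMPLE)

# 1. Known name -> candidate usernames (a hint, since names are not unique).
print(by_name.get("Alice Example", []))
# 2. Known username -> displayed name.
print(by_username.get("bob_x"))
# 3. Presence in either index hints at membership of the network.
print("carol" in by_username)
```

Each result is only a hint: several accounts can share one name, and a name may be fictitious.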

Some thoughts on social media in general:

Do users really ‘know’/expect that their data can and probably will be collected by unknown third parties, companies and governments, outside their control?
Do users really ‘know’/expect that all data of the network can and probably will be collected by unknown third parties, companies and governments, outside their control? (The ‘network effect’ also holds true for copies of social network data. A network that is easy to harvest completely is more likely to be harvested completely, and will be a more attractive ‘starting point’ for, e.g., filtering out interesting targets than networks that are difficult to harvest.)
Do users have sufficient risk assessment skills and information to decide what chance × impact to associate with their personal data and communication being copied into rogue databases outside their control?

Although social media providers can enforce restrictions at their servers (e.g. only allow search by name, or only allow 1000 queries per day from the same IP address), searching, matching and linking are completely unrestricted for unknown third parties once they have gathered a rogue copy of the data. The data can be collected slowly, over a longer period of time and from varying IP addresses/ranges, ‘staying under the radar’ of monitoring and restricting measures. In this particular case there is no ‘real’ problem and publishing the data is mostly harmless; otherwise I wouldn’t have published it. To repeat from my blogpost about Google Profiles: my activities are directed at inciting, or stirring up, debate about privacy.
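To make the point about per-IP restrictions concrete, here is a small simulation sketch. It does not describe how Diaspora or any real provider limits queries; the limiter class, the `harvest` function and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical. Only the 1000 queries per day figure and the ~55k total come from the text.

```python
from collections import defaultdict

DAILY_LIMIT = 1000  # per-IP quota, taken from the example in the text


class PerIpDailyLimiter:
    """Hypothetical server-side limiter: at most `limit` queries per IP per day."""

    def __init__(self, limit=DAILY_LIMIT):
        self.limit = limit
        self.counts = defaultdict(int)  # (ip, day) -> queries allowed so far

    def allow(self, ip, day):
        key = (ip, day)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True


def harvest(limiter, total_records, ips, per_ip_per_day):
    """Simulate a slow harvest; return (days_needed, blocked_requests)."""
    if not ips or per_ip_per_day <= 0:
        raise ValueError("need at least one IP and a positive request rate")
    fetched = blocked = day = 0
    while fetched < total_records:
        for ip in ips:
            for _ in range(per_ip_per_day):
                if fetched >= total_records:
                    break
                if limiter.allow(ip, day):
                    fetched += 1
                else:
                    blocked += 1
        day += 1
    return day, blocked


# Ten addresses, each staying at half the quota: nothing is ever blocked.
ips = ["192.0.2.%d" % i for i in range(1, 11)]
print(harvest(PerIpDailyLimiter(), 55000, ips, 500))
```

In this simulation the limiter only changes how long the collection takes; it never refuses a single request, because no individual address exceeds its quota.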

Mind you, I’m a happy Diaspora user myself and support the work of the developers. They’re doing good work!

(*) Whether or not these would be perceived as a privacy invasion depends on each of the 55k individual users – because in the end, privacy is an individual experience.


Diaspora/JoinDiaspora.com 55k User Enumeration