Public Figures and Their Personal Data

Politicians are public figures and therefore have a reduced reasonable expectation of privacy. The Dutch House of Representatives provides information about all 150 representatives in a single XML file: http://www.tweedekamer.nl/xml/kamerleden.xml (a mirror of today's copy is available; the file is also in Google's cache, but not on archive.org). Among the personal information it contains (not every field is present for every representative):

full name
gender
date of birth
place of birth
home town
education
work experience
work e-mail (@tweedekamer.nl)
travels
personal website
personal statement
(past) affiliations with foundations and associations
political affiliation
photo
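To give an idea of how little effort it takes to turn such a file into structured records, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names (`member`, `name`, `gender`, `birth_date`, `home_town`) and the sample data are hypothetical placeholders, not the actual schema of kamerleden.xml; fields missing for a representative come back as `None`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical sample in an ASSUMED layout; the real kamerleden.xml uses its own element names.
SAMPLE_XML = """<members>
  <member>
    <name>A. Example</name>
    <gender>female</gender>
    <birth_date>1970-01-01</birth_date>
    <home_town>Utrecht</home_town>
  </member>
  <member>
    <name>B. Example</name>
    <gender>male</gender>
    <home_town>Leiden</home_town>
  </member>
</members>"""

# Hypothetical field names to extract per member.
FIELDS = ("name", "gender", "birth_date", "home_town")


def parse_members(xml_text: str) -> list[dict[str, Optional[str]]]:
    """Return one dict per member; missing or empty fields map to None."""
    root = ET.fromstring(xml_text)
    members = []
    for member in root.iter("member"):
        record = {}
        for field in FIELDS:
            # findtext returns None if the child element is absent, "" if it is empty
            value = member.findtext(field)
            record[field] = value.strip() if value else None
        members.append(record)
    return members


if __name__ == "__main__":
    for m in parse_members(SAMPLE_XML):
        print(m)
```

Pointing the same loop at the real file would only require swapping in its actual element names.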

When I stumbled upon that file, the following thoughts came to mind:

I hope these public figures don't use that information as a password, or as the answer to a security question, in their private lives.
With personal data being so readily available, surely these high-profile targets must already have been victims of password-guessing and social engineering attacks (although they may not be aware of it)?
If they aren’t, is that…

…because nobody cared to target them?
…because this particular knowledge does not pose a threat?

…because their personal subscriptions and service usage are unknown?

E.g. you don't know whether they use Gmail, or which bank, insurer or webshops they use.

…because their personal logins/names are unknown?

E.g. you know they are a customer/employee/student at X, but you don't know their username for logging in to X.

…because this personal info was not used as password or answer to a security question?

E.g. you know <username>@gmail.com, but can't guess the password.

…because this personal info is, by itself, insufficient to compromise accounts?

E.g. more information is needed (SSN, bank account number), or multifactor authentication requires possession of a token.

…because of something else?
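On the defensive side, a representative (or anyone else with a public profile) could at least check whether a password contains the values that are published about them. The sketch below is a minimal illustration, assuming a plain substring match after normalization and a small, hypothetical set of birth-date formats; it is not exhaustive, and real password checkers are considerably more thorough. The name, date and towns in the usage example are made up.

```python
from datetime import date


def normalize(text: str) -> str:
    # Case-fold and keep only letters/digits, so "Den Haag" becomes "denhaag"
    return "".join(ch for ch in text.casefold() if ch.isalnum())


def date_variants(d: date) -> set[str]:
    """A few common ways a birth date might be embedded in a password (illustrative only)."""
    return {
        d.strftime("%Y%m%d"),  # e.g. 19700131
        d.strftime("%d%m%Y"),  # e.g. 31011970
        d.strftime("%d%m%y"),  # e.g. 310170
        d.strftime("%Y"),      # e.g. 1970
    }


def public_tokens(name: str, birth_date: date, towns: list[str]) -> set[str]:
    """Collect normalized tokens derived from publicly disclosed personal data."""
    tokens = {normalize(part) for part in name.split()}
    tokens |= {normalize(town) for town in towns}
    tokens |= date_variants(birth_date)
    # Drop very short tokens (initials etc.) that would match almost anything
    return {t for t in tokens if len(t) >= 3}


def leaked_tokens(password: str, tokens: set[str]) -> set[str]:
    """Return the public tokens that occur as substrings of the normalized password."""
    pw = normalize(password)
    return {t for t in tokens if t in pw}


if __name__ == "__main__":
    tokens = public_tokens("Jan Jansen", date(1970, 1, 31), ["Utrecht", "Den Haag"])
    print(sorted(leaked_tokens("Jansen1970!", tokens)))  # ['1970', 'jan', 'jansen']
```

A substring match is deliberately crude: it catches the obvious "name plus birth year" pattern, but says nothing about answers to security questions, which are usually stored server-side and cannot be checked this way.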

In a sense, our representatives function as guinea pigs for testing assumptions about the risk associated with disclosing personal data (or rather, at least with disclosing this particular personal data). Disclosing SSNs, bank account numbers, credit card numbers and DigiD credentials probably remains a bad idea.

UPDATE 2011-04-23: I just realized that "A Study on the Re-Identifiability of Dutch Citizens" (PDF), presented at HotPETS 2010, is relevant here. Guido van 't Noordende, Cees de Laat and I studied registry office (GBA) data of 2.7 million Dutch citizens (~16% of the total population) to explore how identifiable they are by various quasi-identifiers consisting of partial or full postal code, partial or full date of birth, and gender. One of the quasi-identifiers we included is this one (Tables 2 and 3 in the paper):

QID = { town + date-of-birth + gender }

The median anonymity set size was 2, meaning that half of the combinations of town + date of birth + gender in our data set either unambiguously identified an individual (Dutch citizen) or narrowed it down to a group of only 2 individuals. The numbers vary with town size, but for ~37% of the Dutch citizens in our set, that QID narrows them down to a group of 5 or fewer individuals. As the list above shows, the disclosed personal information may include both the quasi-identifier value and the real identity of each representative. I thought this was worth mentioning.
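The two statistics above can be computed with a simple group-by on the QID. The sketch below assumes each record is a hypothetical `(town, date_of_birth, gender)` tuple; the anonymity set size of a QID value is the number of records sharing it. Following the wording above, the median is taken over distinct combinations, while the "5 or fewer" figure is a share of individuals. The toy records are made up and have nothing to do with the GBA data; the paper's exact methodology may differ in details.

```python
from collections import Counter
from statistics import median

# Hypothetical record layout: (town, date_of_birth, gender)
Record = tuple[str, str, str]

# Made-up toy data, for illustration only
TOY_RECORDS: list[Record] = [
    ("Utrecht", "1970-01-31", "F"),
    ("Utrecht", "1970-01-31", "F"),
    ("Utrecht", "1970-01-31", "M"),
    ("Leiden", "1985-06-15", "F"),
    ("Leiden", "1985-06-15", "F"),
    ("Leiden", "1985-06-15", "F"),
]


def anonymity_set_sizes(records: list[Record]) -> Counter:
    """Map each distinct QID value to the number of records sharing it."""
    return Counter(records)


def median_set_size(sizes: Counter) -> float:
    """Median anonymity set size over distinct QID combinations."""
    return median(sizes.values())


def fraction_within(records: list[Record], sizes: Counter, k: int) -> float:
    """Share of individuals whose QID value is shared by at most k records."""
    hits = sum(1 for r in records if sizes[r] <= k)
    return hits / len(records)


if __name__ == "__main__":
    sizes = anonymity_set_sizes(TOY_RECORDS)
    print(median_set_size(sizes))                  # 2
    print(fraction_within(TOY_RECORDS, sizes, 2))  # 0.5
```

Note that the two views can differ noticeably: a handful of large groups contributes few combinations to the median but many individuals to the fraction.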

Since the data is publicly available anyway: here is the list of all representatives and their quasi-identifier values.

Matthijs R. Koot's notebook

Personal blog. Hobbies: IT, security, privacy, democracy.
