Public Figures and Their Personal Data

Politicians are public figures and therefore have a reduced reasonable expectation of privacy. The Dutch House of Representatives provides information about all 150 representatives in a single XML file (mirror of today’s copy; also in Google cache, but not …). Some of the personal information it contains (not all values are present for all representatives):

  1. full name
  2. gender
  3. date of birth
  4. place of birth
  5. home town
  6. education
  7. work experience
  8. work e-mail
  9. travels 
  10. personal website
  11. personal statement
  12. (past) affiliations with foundations and associations
  13. political affiliation
  14. photo 

When I stumbled upon that file, the following thoughts came to mind:

  • I hope these public figures don’t use that information as a password or as the answer to a security question in their private lives.
  • With this personal data readily available, surely these high-profile targets must already have been victims of password-guessing and social engineering attacks, perhaps without being aware of it?
  • If they haven’t, is that…
    • …because nobody cared to target them?
    • …because this particular knowledge does not pose a threat?
      • …because their personal subscriptions and service usage are unknown?
        • E.g. you don’t know whether they use Gmail, or which bank, insurer or webshops they use.
      • …because their personal logins/usernames are unknown?
        • E.g. you know they are a customer/employee/student at X, but you don’t know their username for logging in to X.
      • …because this personal info was not used as a password or as the answer to a security question?
        • E.g. you know <username> but can’t guess the password.
      • …because this personal info is, by itself, insufficient to compromise accounts?
        • E.g. more information is needed (SSN, bank account number), or multifactor authentication requires possession of a token.
    • …because of something else?

In a sense, our representatives function as guinea pigs for testing assumptions about the risk associated with disclosing personal data — or rather, at least with disclosing this particular personal data. Disclosing SSN, bank account numbers, credit card numbers and DigiD credentials probably remains a bad idea.

UPDATE 2011-04-23: I suddenly realized that A Study on the Re-Identifiability of Dutch Citizens (.pdf), presented at HotPETS 2010, is relevant here. Guido van ’t Noordende, Cees de Laat and I studied registry office (GBA) data of 2.7 million Dutch citizens (~16% of the total population) to explore their identifiability by various quasi-identifiers consisting of partial or full postal code, partial or full date of birth, and gender. We also included this one (tables 2 and 3 in the paper):

QID = { town + date-of-birth + gender }

The median anonymity set size was 2, meaning that half of the combinations of town + date of birth + gender in our data set identified either a single individual (Dutch citizen) unambiguously or a group of only 2 individuals. The numbers vary depending on town size, but for ~37% of the Dutch citizens in our set, that QID narrows them down to a group of 5 or fewer individuals. As the list above shows, the disclosed personal information possibly includes both the quasi-identifier value and the real identity of each representative. I thought this was worth mentioning.
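
To make the notion of an anonymity set concrete, here is a minimal Python sketch of how such numbers can be computed. It is not the code used in the study: the field names (`town`, `date_of_birth`, `gender`), the function names and the toy records are all made up for illustration. The median is taken over QID combinations, while the share is taken over individuals, matching the two statements above.

```python
from collections import Counter
from statistics import median


def anonymity_set_sizes(records):
    """Count how many records share each (town, date_of_birth, gender) combination."""
    # The field names are hypothetical; adapt them to the actual data set.
    return Counter((r["town"], r["date_of_birth"], r["gender"]) for r in records)


def median_set_size(sizes):
    """Median anonymity set size, taken over QID combinations (not over individuals)."""
    return median(sizes.values())


def share_in_small_sets(sizes, k):
    """Fraction of individuals whose anonymity set contains k or fewer individuals."""
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    return sum(count for count in sizes.values() if count <= k) / total


# Toy records, invented for illustration only.
records = [
    {"town": "Utrecht", "date_of_birth": "1970-01-01", "gender": "F"},
    {"town": "Utrecht", "date_of_birth": "1970-01-01", "gender": "F"},
    {"town": "Utrecht", "date_of_birth": "1970-01-01", "gender": "M"},
    {"town": "Delft", "date_of_birth": "1982-05-17", "gender": "M"},
    {"town": "Delft", "date_of_birth": "1982-05-17", "gender": "M"},
    {"town": "Delft", "date_of_birth": "1982-05-17", "gender": "M"},
]

sizes = anonymity_set_sizes(records)
print(median_set_size(sizes))         # median over the three combinations
print(share_in_small_sets(sizes, 2))  # share of individuals in sets of size <= 2
```

In this toy set there are three combinations with set sizes 2, 1 and 3. A set size of 1 means the combination points at exactly one person.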

Since the data is publicly available anyway, here is the list of all representatives and their quasi-identifier values.
