Month: April 2011

Meta-Data in Public Documents, Cont’d

For fun, I extracted metadata from most of the documents publicly available at these websites:

aivd.nl
belastingdienst.nl
cia.gov
ctivd.nl
defensie.nl
eerstekamer.nl
europol.eu
fbi.gov
gchq.gov.uk
minbuza.nl
mindef.nl
nsa.gov
officielebekendmakingen.nl
om.nl
overheid.nl
politie.nl 
rijksbegroting.nl
rijksoverheid.nl
sis.gov.uk
tno.nl
tweedekamer.nl

Here is a count, per domain, of the e-mail addresses I found in Tag_AuthorEmail and Bytes:

1    accor.com
1    aesn.fr
1    agentschapnl.nl
1    atech-acoustictechnologies.com
1    bda.amsterdam.nl
1    bieleveldvanhoek.nl
1    bletchleypark.org.uk
1    brgm.fr
2    cbs.nl
2    cesg.gsi.gov.uk
1    coe.int
1    CvT.nl
1    diplomatie.gouv.fr
3    ec.europa.eu
2    ecologie.gouv.fr
4    eerstekamer.nl
1    europolhq.net
7    fbinet.fbi  <— internal FBI mail
1    gakushikai.jp
2    gchq.gsi.gov.uk
1    gmail.com
2    hotmail.com
1    hydro.nl
6    ic.fbi.gov
1    inro.tno.nl
1    isc-cie.com
1    iwiweb.nl
1    kabinets-formatie.nl
1    klpd.politie.nl
7    leo.gov
1    let.ru.nl
1    mail.ing.unibo.it
3    militairefondsen.nl
2    minbuza.nl
14    minbzk.nl
1    mindef.nl
7    minfin.nl
22    minjus.nl
3    minlnv.nl
8    MINSZW.NL
1    minvws.nl
1    mma.es
1    mrw.wallonie.be
2    noord-holland.nl
1    oieau.fr
1    olemiss.edu
1    prv.gelderland.nl
1    ross.nl
1    sdu.nl
1    SMOJMD.USDOJ.gov
5    sp.nl
1    sp.se
1    steunfondsofficieren.nl
2    tg.nl
3    tk.parlement.nl
1    tmleuven.be
2    tno.nl
146    tweedekamer.nl
2    unesco.org
2    uwv.nl
2    wereldschool.nl
1    wwi.minbzk.nl
1    wxs.nl
1    xs4all.nl
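
A per-domain count like the one above can be produced with a few lines of Python. This is only a minimal sketch, not the script I used: the regular expression is a deliberately simple approximation of an e-mail address, and the sample strings are hypothetical stand-ins for extracted metadata values.

```python
import re
from collections import Counter

# Simple approximation of an e-mail address (an assumption, not a full
# RFC 5322 parser). The capture group is the domain part.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")

def count_email_domains(metadata_values):
    """Count the domain of every e-mail address found in the given strings."""
    counts = Counter()
    for value in metadata_values:
        # findall returns only the captured domain for each match
        for domain in EMAIL_RE.findall(value):
            counts[domain] += 1
    return counts

# Hypothetical sample values standing in for extracted metadata strings.
sample = [
    "j.doe@example.org",
    "Author: a.smith@example.org, b.jones@mail.example.com",
    "no address here",
]
domain_counts = count_email_domains(sample)
for domain, n in sorted(domain_counts.items()):
    print(n, domain)
```

Domains are counted exactly as written, so differently cased spellings of the same domain would be counted separately unless they are lowercased first.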

Furthermore, these are some network/directory paths found in Title and Hyperlinks tags:

http://cd0.bistro.ro.minjus/cgi1frnt.exe
Sggv12fkdbbTemplates GMODCDC-kl+DB 2.jpg
VAF0002groups03$COAlHDPDPLMAPMPmm100 Projecten190 Business Proces Redesign296Fase 11a-04-Digitaliseren formulierenPi Digitale formulierenkastLogo defensie.gif
tante-eshome$LienekeSdatapdfHeffingsverordening Marktgelden Zeeburg 2009, tabel 2009.d…
sk1ntdata03homedir$MRoosDesktopTekening deel 1.xps
U:wp51wp51verlof tbsgesteldeverlof tbs gestelde 7-7-2010.wpd
N:HDP AIMPF GO4 Processen99. Financiële werkinstructiesWerkmap Gerard3TekentjesLogo defensie.gif
T:_PPentaBP Badge.jpg
F:dataProjectenCivTecGroenBomenbeleidsplanBomenbeleidsplan 25-10-2010 Totaal (1)
G:Realisatie en BeheerTeam Vastgoed en ProjectenStedenbouwKimAlgemeenkomgrenzenkomgrenzen Layout1 (1)
sfgvp12FEBCOBiaProjectenZeusZeuswerkODPPwerkChrisLogo´sdefensie.wmf, sfgvp12FEBCOBiaProjectenZeusZeuswerkODPPwerkChrisLogo´sdefensie.wmf, sfgvp12FEBCOBiaProjectenZeusZeuswerkODPPwerkChrisLogo´sdefensie.wmf
V:SHAREDNICS SHAREDEDASDRAFTSKisnerOPS 2008FINAL OPs 2008Copy of 2008 NICS OPERATIONS REPORT PDF.wpd
V:SHAREDNICS SHAREDEDASDRAFTSKisnerOPS 20072007 Operations Report PDF.wpd
H:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA2006 NICS Operations Report PDF.wpd
N:AutocadMedewerker TVDBStrooiroutes Berkenwoude (1)
/cgi-bin/pdcsns.cgi?user=%26dir=/9202000/g/%26filename=/9202000/t5.sns%26via=direct%26v01=25910
C:WpdocWORD97LogoConsLogoCons.jpg
C:Documents and Settingsu0072s1Local SettingsTemporary Internet Filesu0072s1Local Settingsu00a110Local SettingsLocal Settingsu00g5m0Local SettingsTemporary Internet FilesOLK12wetsuwior1tog00.htm

Public Figures and Their Personal Data

Politicians are public figures and therefore have reduced reasonable expectations of privacy. The Dutch House of Representatives provides information about all 150 representatives in a single XML file: http://www.tweedekamer.nl/xml/kamerleden.xml (mirror of today’s copy; also in Google-cache, but not archive.org). Some of the personal information it contains (not all values are present for all representatives):

  1. full name
  2. gender
  3. date of birth
  4. place of birth
  5. home town
  6. education
  7. work experience
  8. work e-mail (@tweedekamer.nl)
  9. travels 
  10. personal website
  11. personal statement
  12. (past) affiliations w/foundations, associations
  13. political affiliation
  14. photo 

When stumbling upon that file, the following thoughts came to mind:

  • I hope these public figures don’t use that information as a password or as the answer to a security question in their private lives.
  • With this personal data readily available, surely these high-profile targets have already been victims of password-guessing and social engineering attacks (perhaps without being aware of it)?
  • If they haven’t, is that…
    • …because nobody cared to target them?
    • …because this particular knowledge does not pose a threat?
      • …because their personal subscriptions/service-usage is unknown?
        • E.g. you don’t know whether they use Gmail, or which bank, insurer or webshops they use.
      • …because their personal logins/names are unknown?
        • E.g. you know they are a customer/employee/student at X but you don’t know their username for logging in to X
      • …because this personal info was not used as password or answer to a security question?
        • E.g. you know <username>@gmail.com but can’t guess the password
      • …because this personal info is, by itself, insufficient to compromise accounts?
        • E.g. more information is needed (SSN, bank account number), or multifactor authentication requires possession of token
    • …because of something else?

In a sense, our representatives function as guinea pigs for testing assumptions about the risk associated with disclosing personal data, or at least with disclosing this particular personal data. Disclosing SSNs, bank account numbers, credit card numbers and DigiD credentials probably remains a bad idea.

UPDATE 2011-04-23: I suddenly realize that A Study on the Re-Identifiability of Dutch Citizens (.pdf) presented at HotPETS 2010 is relevant here. Guido van ‘t Noordende, Cees de Laat and I studied registry office (GBA) data of 2.7 million Dutch citizens (~16% of the total population) to explore their identifiability by various quasi-identifiers consisting of partial or full postal code, partial or full date of birth and gender. We also included this one (tables 2 and 3 in the paper):

QID = { town + date-of-birth + gender }

The median anonymity set size was 2, meaning that half of the combinations of town + date of birth + gender in our data set either unambiguously identified an individual (Dutch citizen) or identified a group of only 2 individuals. The numbers vary depending on town size, but for ~37% of Dutch citizens in our set that QID narrows them down to a group of 5 or fewer individuals. As the list above shows, the disclosed personal information possibly includes the quasi-identifier value plus the real identity of each representative. I thought this was worth mentioning.
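
To make the notion of anonymity set size concrete, here is a minimal Python sketch. It is not the code used for the paper: the record layout (keys `town`, `dob`, `gender`) and the toy records are hypothetical, and the median is taken over the distinct QID combinations.

```python
from collections import Counter
from statistics import median

def anonymity_set_sizes(records):
    """Map each (town, date of birth, gender) combination to the number of records sharing it."""
    return Counter((r["town"], r["dob"], r["gender"]) for r in records)

def share_in_small_sets(sizes, k):
    """Fraction of individuals whose anonymity set has at most k members."""
    total = sum(sizes.values())
    return sum(n for n in sizes.values() if n <= k) / total

# Hypothetical toy records; not taken from the GBA data set.
records = [
    {"town": "A", "dob": "1970-01-01", "gender": "F"},
    {"town": "A", "dob": "1970-01-01", "gender": "F"},
    {"town": "A", "dob": "1980-05-05", "gender": "M"},
    {"town": "B", "dob": "1990-02-02", "gender": "F"},
    {"town": "B", "dob": "1990-02-02", "gender": "F"},
    {"town": "B", "dob": "1990-02-02", "gender": "F"},
]
sizes = anonymity_set_sizes(records)
median_size = median(sizes.values())          # median over QID combinations
share_k2 = share_in_small_sets(sizes, k=2)    # share of people in sets of size <= 2
print(median_size, share_k2)
```

In this toy data the three combinations have set sizes 2, 1 and 3, so the person with the unique combination is identified unambiguously by the QID alone.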

Since the data is publicly available anyway: here is the list of all representatives and their quasi-identifier value.

U.S.-Owned Trackers on Dutch Govt Websites

I used Firebug and manual code inspection to puzzle out which Dutch govt websites have which (ad)trackers, such as Google Analytics and Nedstat (bought by comScore in Q3/2010). Some reflection is desirable, IMHO, on whether or not to disclose which (Dutch) IP address accessed what (Dutch govt) content to foreign-owned companies whose government may require/force them to hand it over. Note: I only looked at the homepage of each site.

First the good (tracking-free –> kudos!):

Then the bad:

I don’t know which data the various trackers do and do not collect, and I lack the time to carry out that analysis. If you feel like it, please do so; I will be more than happy to link to your results or post them on this blog on your behalf.
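
The homepage check I did by hand with Firebug could be roughly automated. The Python sketch below is a different, much cruder technique: it only searches the homepage HTML for known tracker hostnames, so it misses trackers loaded indirectly by other scripts. The hostname table is an assumption: `google-analytics.com` belongs to Google Analytics, and `sitestat.com` is assumed here to be the Nedstat measurement domain.

```python
# Tracker hostnames to look for (assumptions, see the text above).
TRACKER_HOSTS = {
    "google-analytics.com": "Google Analytics",
    "sitestat.com": "Nedstat (comScore)",
}

def find_trackers(html):
    """Return the sorted names of known trackers whose hostname occurs in the HTML."""
    lowered = html.lower()
    # Crude substring match; no HTML parsing or script execution is attempted.
    return sorted(name for host, name in TRACKER_HOSTS.items() if host in lowered)

# Hypothetical homepage snippets for illustration.
tracked_page = '<script src="http://www.google-analytics.com/ga.js"></script>'
clean_page = "<html><body><p>No trackers here.</p></body></html>"

print(find_trackers(tracked_page))
print(find_trackers(clean_page))
```

Fetching each homepage and passing its HTML to `find_trackers` would give a first approximation of the good/bad split, which should then be verified by hand.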

Easter Egg on Dutch Govt Recruitment Website. Intended or Not?

It’s almost Easter, and I stumbled upon an early Easter egg: the Dutch govt recruitment website http://www.werkenvoornederland.nl/ includes, at least on the homepage, /static_shared/pd/scripts/egg.js, which contains a Konami code. Visit the site and press Up, Up, Down, Down, Left, Right, Left, Right, B, A. The browser then fetches and executes remote JavaScript: http://kottke.org/plus/misc/asteroids.js. I don’t know whether this egg, which dates back to at least 2009, is known and intended to be on this government website, or whether its presence indicates sloppy copy/paste development. Fortunately, it seems that egg.js is not included in the part of the website where applicants manage their resume and personal data, and the remote code stays remote unless the 10-key sequence is typed. Still, that would be a weak argument for accepting this unnecessary exposure.

2012 = Year of Alan Turing! Kick-Off: EUR 100 Code-Breaking Challenge

UPDATE 2013-12-24: today, Queen Elizabeth II pardoned Alan Turing and remitted Turing’s sentence! See the document here (.pdf).
UPDATE 2013-07-20: UK government has signaled its intent to support a bill that would issue a posthumous pardon to Alan Turing.
UPDATE 2012-02-25: this challenge is still up. Also: Nature News Special: Alan Turing at 100

UPDATE 2012-01-03: this challenge is still up. Happy Alan Turing Year!
Statue of Alan Turing at Bletchley Park

This photo was taken on October 28th 2010 at Bletchley Park. It shows the commemorative statue of Alan Mathison Turing (1912-1954), which was donated to Bletchley Park by the American billionaire and philanthropist Sidney E. Frank and unveiled on June 19th 2007. Alan Turing’s centennial is on June 23rd 2012, and 2012 will be the “Year of Alan Turing”. Why? Because of:

1. his contributions to mathematics and logic and, as it turned out, philosophy (excellent book: The Annotated Turing);
2. his contributions to breaking the Enigma code during WW2, standing on the shoulders of these Polish mathematicians;
3. the embarrassingly late apology, in 2009, by the British government for the inhumane way it treated Turing for being gay.

As a small kick-off, I have a EUR 100 code-breaking challenge for you! Using Niels Provos’ outguess-0.2, which dates back to 2001, I hid the IMEI of the iPhone used to take the above photo inside the photo itself. Stegdetect and stegbreak provide a starting point, but a simple stegbreak alanturing+imei.jpg won’t provide the answer… I will pay EUR 100 to the first successful code-breaker, provided you allow me to post your method on this blog. You get full credit (as well as EUR 100). Send your answer to mrkoot at gmail dot com, or leave a comment on this blog.

Consider celebrating 2012 as the “Year of Alan Turing”, e.g. by supporting the ongoing efforts by the United Kingdom Mathematics Trust at http://www.turingcentenary.eu/, already backed by ACM, Wolfram and others.