Month: May 2012

Measuring and Predicting Anonymity (PhD thesis)

UPDATE 2012-07-20: govt answers to the Parliamentary question by Dutch MP Jeroen Recourt (PvdA).

UPDATE 2012-07-03: Webwereld article “PvdA: staatssecretaris omzeilt privacy-vraagstuk”.

UPDATE 2012-06-29: govt answers to the parliamentary questions.

UPDATE 2012-06-25: Dutch MP Jeroen Recourt (PvdA) sent parliamentary questions to the Ministry of Security and Justice. Recourt mistakenly believes that the 2.7 million citizen records I collected were gathered via some data leak. I in fact collected the data via official means, as explained in my dissertation. Recourt did not contact the University of Amsterdam, nor me personally, to verify that belief, and decided to jump to asking Parliamentary questions instead.

UPDATE 2012-06-24: webpage about June 27th 2012 by prof. Cees de Laat (one of my supervisors)

UPDATE 2012-06-21: press release by University of Amsterdam (in Dutch), article on (in Dutch), article on (in Dutch), radio interview at Q-Music (.mp3, in Dutch). I’m happily surprised.

UPDATE 2012-06-15: news article on Computable (in Dutch)


====== START OF ORIGINAL BLOGPOST FROM 2012-05-22 ======

I finished my PhD thesis, entitled Measuring and Predicting Anonymity (.pdf, 2.8MB), and will publicly defend it in Amsterdam on June 27th 2012. The thesis is about data anonymity and contributes novel probabilistic methods for the analysis of anonymity.


In our increasingly computer-networked world, more and more personal data is collected, linked and shared. This raises questions about privacy, i.e. about the feeling and reality of enjoying a private life in terms of being able to exercise control over the disclosure of information about oneself. In an attempt to provide privacy, databases containing personal data are sometimes de-identified, meaning that obvious identifiers such as Social Security Numbers, names, addresses and phone numbers are removed. In microdata, where each record maps to a single individual, de-identification might however leave columns that, combined, can be used to re-identify the de-identified data. Such combinations of columns are commonly referred to as Quasi-IDentifiers (QIDs).

Sweeney’s model of k-anonymity addresses this problem by requiring that each QID value, i.e., a combination of values of multiple columns, present in a data set occurs at least k times in that data set, asserting that each record in that set maps to at least k individuals, hence making records and individuals unlinkable. Many extensions to k-anonymity have been proposed, but they always address the situation in which data has already been collected and must be de-identified afterwards. The question remains: can we predict what information will turn out to be identifiable, so that we may decide what (not) to collect beforehand?
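As a minimal sketch of the k-anonymity requirement (not code from the thesis; the records and column names below are made up for illustration), the following Python counts how often each QID value occurs and reports the smallest count, which is the largest k for which the table is k-anonymous:

```python
from collections import Counter

def qid_counts(records, qid_columns):
    """Count how often each combination of QID column values occurs."""
    return Counter(tuple(rec[col] for col in qid_columns) for rec in records)

def k_of(records, qid_columns):
    """Largest k for which the records are k-anonymous w.r.t. the given QID."""
    counts = qid_counts(records, qid_columns)
    return min(counts.values()) if counts else 0

# Hypothetical de-identified microdata (illustrative values only).
records = [
    {"postal_code": "1012", "birth_year": 1980, "gender": "F"},
    {"postal_code": "1012", "birth_year": 1980, "gender": "F"},
    {"postal_code": "1012", "birth_year": 1975, "gender": "M"},
    {"postal_code": "1098", "birth_year": 1975, "gender": "M"},
]

print(k_of(records, ["birth_year", "gender"]))                 # 2
print(k_of(records, ["postal_code", "birth_year", "gender"]))  # 1
```

Adding a column to the QID can only keep k the same or lower it, which is why the choice of columns matters so much.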

To build a case we first inquired into the (re-)identifiability of hospital intake data and welfare fraud data about Dutch citizens, using large amounts of data collected from municipal registry offices. We show the large differences in (empirical) privacy, depending on where a person lives. Next, we develop a range of novel techniques to predict aspects of anonymity, building on probabilistic theory, and specifically birthday-problem theory and large-deviations theory.

Anonymity can be quantified as the probability that each member of a group can be uniquely identified using a QID. Estimating this uniqueness probability is straightforward when all possible values of a quasi-identifier are equally likely, i.e., when the underlying variable distribution is homogeneous. We present an approach to estimate anonymity for the more realistic case where the variables composing a QID follow a non-uniform distribution. We present an efficient and accurate approximation of the uniqueness probability using the group size and a measure of heterogeneity called the Kullback-Leibler distance. The approach is thoroughly validated by comparing the approximation with results from a simulation using the real demographic information we collected in the Netherlands.
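To make the quantities in this paragraph concrete, here is a rough Python sketch. It computes the exact uniqueness probability for the homogeneous case (the classic birthday-problem product), the Kullback-Leibler distance of a distribution from the uniform one, and a plain Monte Carlo estimate for a non-uniform distribution. The thesis's closed-form approximation is not reproduced here; the 365 values, the group size of 23 and the skewed weights are assumptions chosen purely for illustration.

```python
import math
import random

def uniqueness_probability_uniform(k, n):
    """P(all k group members have distinct values) for n equally likely values."""
    if k > n:
        return 0.0
    # Product of (1 - i/n) for i = 0..k-1, computed in log space.
    return math.exp(sum(math.log1p(-i / n) for i in range(k)))

def kl_distance_from_uniform(p):
    """Kullback-Leibler distance D(p || uniform), in nats."""
    n = len(p)
    return sum(pi * math.log(pi * n) for pi in p if pi > 0)

def uniqueness_probability_mc(k, p, trials=20000, seed=1):
    """Monte Carlo estimate of P(all distinct) when values follow distribution p."""
    rng = random.Random(seed)
    values = range(len(p))
    hits = 0
    for _ in range(trials):
        sample = rng.choices(values, weights=p, k=k)
        hits += len(set(sample)) == k
    return hits / trials

# Illustrative setting: 365 possible values, group of 23 people.
n, k = 365, 23
weights = [3.0] * 100 + [1.0] * 265   # assumed skew: 100 values are 3x as likely
total = sum(weights)
skewed = [w / total for w in weights]

uniform_exact = uniqueness_probability_uniform(k, n)
skewed_estimate = uniqueness_probability_mc(k, skewed)
skewed_kl = kl_distance_from_uniform(skewed)
print("uniform, exact:      ", round(uniform_exact, 4))
print("skewed, simulated:   ", round(skewed_estimate, 4))
print("KL distance (skewed):", round(skewed_kl, 4))
```

In this toy setting the skewed distribution gives a noticeably lower uniqueness probability than the uniform one, in line with the idea that heterogeneity matters and can be summarized by a single number.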

We further describe novel techniques for characterizing the number of singletons, i.e., the number of persons who have 1-anonymity and are unambiguously (re-)identifiable, in the setting of the generalized birthday problem, that is, the birthday problem in which the birthdays are non-uniformly distributed over the year. Approximations for the mean and variance are presented that explicitly indicate the impact of the heterogeneity, expressed in terms of the Kullback-Leibler distance with respect to the homogeneous distribution. An iterative scheme is presented for determining the distribution of the number of singletons. Here, our formulas are experimentally validated using demographic data that is publicly available (allowing our results to be replicated/reproduced by others).
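For intuition about singletons, the mean can be written down exactly by linearity of expectation: a person with value i is a singleton when none of the other k-1 persons has value i, so the expected number of singletons is k · Σ p_i (1 - p_i)^(k-1). The Python sketch below evaluates this expression and checks it against a simple simulation. It is not the thesis's approximation in terms of the Kullback-Leibler distance, nor its iterative scheme for the full distribution; the 365 values, the group size and the skewed weights are illustrative assumptions.

```python
import random
from collections import Counter

def expected_singletons(k, p):
    """Exact mean number of singletons among k people drawn from distribution p."""
    return k * sum(pi * (1.0 - pi) ** (k - 1) for pi in p)

def simulate_singletons(k, p, trials=5000, seed=1):
    """Average number of values that occur exactly once, over random groups."""
    rng = random.Random(seed)
    values = range(len(p))
    total = 0
    for _ in range(trials):
        counts = Counter(rng.choices(values, weights=p, k=k))
        total += sum(1 for c in counts.values() if c == 1)
    return total / trials

# Illustrative setting: 365 possible birthdays, group of 23 people.
k = 23
uniform = [1.0 / 365] * 365
weights = [3.0] * 100 + [1.0] * 265   # assumed skew, for illustration only
skewed = [w / sum(weights) for w in weights]

mean_uniform = expected_singletons(k, uniform)
mean_skewed = expected_singletons(k, skewed)
sim_uniform = simulate_singletons(k, uniform)
print("uniform: formula", round(mean_uniform, 3), "simulation", round(sim_uniform, 3))
print("skewed:  formula", round(mean_skewed, 3))
```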

Next, we study in detail three specific issues in singletons analysis. First, we assess the effect on identifiability of non-uniformity of the possible outcomes. Suppose one has the ages of the members of the group; what is the effect on the identifiability that some ages occur more frequently than others? Again, it turns out that the non-uniformity can be captured well by a single number, the Kullback-Leibler distance, and that the formulas we propose for approximation produce accurate results. Second, we analyze the effect of the granularity chosen in a series of experiments. Clearly, revealing age in months rather than years will result in a higher identifiability. We present a technique to quantify this effect, explicitly in terms of the interval width. Third, we study the effect of correlation between the quantities revealed by the individuals; the leading example is height and weight, which are positively correlated. For the approximation of the identifiability level we present an explicit formula that incorporates the correlation coefficient. We experimentally validate our formulas using publicly available data and, in one case, using the non-public data we collected in the early phase of our study.
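The granularity effect can be illustrated with a deliberately simplified Python sketch. It assumes ages are uniformly distributed over an 80-year span (an assumption made here only to keep the arithmetic simple; the thesis treats the realistic non-uniform case) and compares the expected fraction of singletons when age is revealed in years versus in months.

```python
def singleton_fraction_uniform(k, n_values):
    """Expected fraction of k people who are singletons, n_values equally likely."""
    return (1.0 - 1.0 / n_values) ** (k - 1)

AGE_SPAN_YEARS = 80   # assumed age range, illustrative
GROUP_SIZE = 100      # assumed group size, illustrative

by_year = singleton_fraction_uniform(GROUP_SIZE, AGE_SPAN_YEARS)
by_month = singleton_fraction_uniform(GROUP_SIZE, AGE_SPAN_YEARS * 12)

print("age in years: ", round(by_year, 3))
print("age in months:", round(by_month, 3))
```

Narrowing the interval from a year to a month multiplies the number of possible values by twelve, and in this toy setting the share of uniquely identifiable persons rises sharply.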

Lastly, we give preliminary ideas for applying our techniques in real life. We hope these are suitable and useful input to the privacy debate; practical application will depend on competence and willingness of data holders and policy makers to correctly identify quasi-identifiers. In the end, it remains a matter of policy what value of k can be considered sufficiently strong anonymity for particular personal information.


Notes on Electromagnetic Pulse (EMP) in US, UK, NL

UPDATE 2016-03-19: comment from Winn Schwartau in response to a YouTube-video on EMP posted on LinkedIn: “There is actually a reasonable solution to terrestrial effects of CME that every decent EE should intuitively understand. It’s a fundamental analogue outgrowth of Time Based Security, applying the math and Just Fricking Doing It. Oy. Satellites are toast, but we CAN keep the majority of the lights on. When someone really gives an IT… let me know. (No offense… but so tired of the ignorance, apathy and arrogance that was as is still endemic to The Entire Security Industry.”

UPDATE 2015-06-10: Ex-CIA Director: We’re Not Doing Nearly Enough To Protect Against the EMP Threat (Slashdot)

UPDATE 2015-02-xx: Electromagnetic Pulses (EMPs): Myths vs. Facts (.pdf, factsheet by the Edison Electrical Institute)

UPDATE 2014-10-22: Countering Electromagnetic Pulse (EMP) Threats (.pdf, slides by US Ambassador Henry F. Cooper, Chairman of High Frontier)

UPDATE 2014-09-15: EMP, Debunked: The Jolt That Could Fry The Cloud (John Barnes, article in Information Week)

UPDATE 2014-05-08: Electromagnetic Pulse: Threat to Critical Infrastructure (.pdf, testimony by dr. Peter Vincent Pry given before the US House Committee on Homeland Security, Subcommittee on Cybersecurity, Infrastructure Protection and Security Technologies. Pry is executive director of the Task Force on National and Homeland Security, a Congressional advisory board.)

UPDATE 2014-04-xx: Electromagnetic Pulse (EMP): An Overview of Threats and Mitigation Solutions for Operations Centers and Substations (.pdf, slides by Michael A. Caruso presented at 2014 Int’l Conference of Doble Clients)

UPDATE 2013-08-27: Protecting America Against “Permanent Continental Shutdown” From Electro-Magnetic Pulse Events (.pdf, slides by Chuck Manto of the InfraGard National Electromagnetic Pulse Special Interest Group, presented at Idaho National Laboratories)

UPDATE 2013-10-xx: Terminal Blackout: Critical Electric Infrastructure Vulnerabilities and Civil-Military Resiliency (.pdf, paper by Ayers & Chrosniak, US Army War College CSLD)

UPDATE 2012-07-30: EU FP7 project, 2012-2015: STRUCTURES, Strategies for Improvement of Critical Infrastructure Resilience to EM Attacks

UPDATE 2012-07-21: 1975 Introduction to Explosive Magnetic Flux Compression Generators by Los Alamos (.pdf, document from the Los Alamos Scientific Laboratory. Via Cryptocomb).

I decided to gather some information on Electromagnetic Pulse (EMP) threats; here it is.

In 1990, the Engineering and Design – Electromagnetic Pulse (EMP) and Tempest Protection for Facilities document was published. It focuses on USG facilities.

Between 2001 and 2010 (and still?), the U.S. had an EMP Commission (excellent resource).

In 2006, the Washington State Department of Health published a factsheet about EMP.

In 2008, the Congressional Research Service published the report “High Altitude Electromagnetic Pulse (HEMP) and High Power Microwave (HPM) Devices: Threat Assessment” (.pdf) (recommended read).

In 2009, there was a discussion on a forum for pilots about a New Scientist article that argued that a commercial aircraft could be brought down by DIY EMP bombs. Also in 2009, the U.S. Patent Application for an Electromagnetic pulse (EMP) hardened information infrastructure was filed.

In 2010, Business Insider had an article “Gauging The Threat Of An Electro-Magnetic Pulse Attack In The US”.

In 2011, some items appeared about Newt Gingrich’s interest in EMP: this blogpost by Dick Destiny (some profanity there) and this post on

In February 2012, the U.K. Defence Committee published the report Developing Threats: Electro-Magnetic Pulses (EMP). It refers to statements made by the U.S. EMP Commission.

In April 2012, the U.K. report, or rather this Telegraph news article about it, led to Parliamentary questions (.pdf) in the Netherlands. In response to those questions, Dutch Secretary of Defense Hans Hillen stated that he sees the EMP threat as “low” for the Netherlands. Here is my (unofficial) translation of the actual questions & answers:

  1. Are you aware of the article “Britain at risk from ‘GoldenEye’ electromagnetic pulse attack from space, MPs warn”?
     Yes.
  2. Do you still stand by the downplaying view of the EMP threat that you expressed during the debate on the policy letter “Defence after the credit crisis: a smaller force in a troubled world” on June 6th 2011, in which you suggested that EMP is a remnant of the Cold War, that the EMP instrument is not practically applicable and that the threat can be considered to be low for the Netherlands?
     Yes, I consider this threat to be low. Also see the answers to questions 4 and 5.
  3. If so, how do you interpret the warning from the British Defence Committee, which contradicts your view, about the big risks for British national security? Are you aware that the U.S. EMP Commission and several leading U.S. politicians have also warned earlier of the great dangers of an EMP attack?
     I have taken note of the report of the British Defence Committee and the references therein to statements of the U.S. EMP Commission and U.S. politicians. The information that is available to me gives me no reason to change my position. Also see the answers to questions 4 and 5.
  4. How do you assess the specific comments of the chairman of the British Defence Committee, James Arbuthnot, about the probability of an EMP attack, considering that it is a convenient way to use a small number of nuclear weapons to create a large devastating effect?
     An electromagnetic pulse caused by a nuclear explosion can disrupt or destroy unprotected electronic systems by burning out electronic circuits. To create a nuclear EMP attack that has the greatest possible effect, an explosion of a nuclear weapon at an altitude of several hundred kilometers is necessary. This requires a launch vehicle that is only at the disposal of States. The Dutch intelligence services assess the likelihood of a nuclear EMP attack as low.
  5. Do you, like the British parliamentarians, see major risks in the possibility for terrorists to build a primitive non-nuclear EMP weapon that is devastating on a smaller scale? If not, why?
     It is possible to build small, improvised non-nuclear EMP weapons using commercially available components. The area in which such a weapon can cause damage, however, is small. The impact of a terrorist attack using an improvised EMP weapon is, therefore, comparable to that of an attack using a conventional explosive. The main objective of such terrorist attacks is to frighten the population, more than to cause damage as such. Prevention is the appropriate protection against such attacks.
  6. What do you think of the criticism of the British Defence Committee that the British Ministry of Defence is unwilling to take these threats seriously? Do you see a similar situation in the Netherlands? If not, why?
  7. What do you think of the advice of the British Defence Committee that the U.K. ought to immediately protect its critical infrastructure against EMP attacks?
     I abstain from commenting on the specific British situation. The Dutch intelligence services monitor the proliferation of nuclear weapons. In addition, the terrorist threat is monitored by the National Coordinator for Counterterrorism (NCTV). Parliament is informed quarterly about developments in this area through the Terrorist Threat Assessment Netherlands.
  8. Can you support with financial data your earlier claims that protection of critical infrastructure against EMP carries “enormous costs” with it? If not, why?
     Given the number of electronic systems, their applications and the scope of potential measures, the costs of protection will be very high. Considering the answers to the previous questions, I foresee that establishing a detailed estimate will require a disproportionate effort.
  9. Are you willing to promote the formation of an interdepartmental working group to take inventory of the dangers of EMP for the Netherlands and to advise on the possibilities to protect the Dutch critical infrastructure against the consequences of EMP? If not, why?
     I don’t see the need for this.



Facebook “Like” Button = Privacy Violation + Security Risk

If you walk into a store, would you appreciate it if the store owner phoned a random stranger to tell him/her that you are at their store? Probably not. Because it’s weird. Because it serves no purpose to you. Because you feel it could, in fact, be harmful to you. Or simply because you feel it is none of their frickin’ business. To put it more eloquently, it intuitively constitutes a violation of contextual integrity.

Yet, that is exactly what happens when you visit many websites.

To me, Facebook is equivalent to a random stranger. And every time I visit a website that has a Facebook `Like’-button, that website makes my browser disclose that visit to Facebook, despite the fact that I do not have a Facebook profile. When I visit Dutch online bookstore, their website makes my browser send the following HTTP request to

GET /plugins/likebox.php? HTTP/1.1
User-Agent: Mozilla/5.0 (X11; OpenBSD i386; rv:5.0) Gecko/20100101 Firefox/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive

The Referer-header discloses to Facebook which website I’m visiting. Chances are that if Facebook wanted to, they could easily identify me by matching my IP address + HTTP headers to data collected by themselves or (other) private intelligence agencies (.pdf) during my prior (non-anonymous) online purchases and my (non-anonymous) social media activity.

When I visit Dutch take-away food ordering webshop, my browser fetches a page from Facebook, Twitter, Google and Hyves (Hyves is a Dutch/Belgian social network):

So, effectively, this webshop makes my browser tell four random strangers my identity and that I’m interested in take-away dinners.

In the case of this webshop there is another subtlety. Whenever I visit the website, I have to fill in my postal code:

When clicking the `Search’-button, my browser opens :

…that URL contains the four digits of my postal code at the end. Indeed, that page too makes my browser fetch content from Google’s systems. Now, thanks to the Referer-header, the postal code I provided is disclosed to Google as well. Specifically, it is disclosed via the following three requests:

GET /pagead/conversion/1071768439/?random=1337601791571&cv=7&fst=1337601791571&num=1&fmt=3&label=HMtdCNrcuAEQ98aH_wM&bg=666666&hl=en&guid=ON&u_h=1080&u_w=1920&u_ah=1080&u_aw=1920&u_cd=24&u_his=6&u_tz=120&u_java=true&u_nplug=8&u_nmime=81&ref=http%3A// HTTP/1.1
User-Agent: Mozilla/5.0 (X11; OpenBSD i386; rv:5.0) Gecko/20100101 Firefox/5.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive

GET /__utm.gif?utmwv=5.3.1&utms=4&utmn=1587224412&×1080&utmvp=1024×605&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=11.2%20r202& HTTP/1.1
User-Agent: Mozilla/5.0 (X11; OpenBSD i386; rv:5.0) Gecko/20100101 Firefox/5.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive

GET /pagead/viewthroughconversion/1071768439/?random=1337601791571&cv=7&fst=1337601791571&num=1&fmt=3&label=HMtdCNrcuAEQ98aH_wM&bg=666666&hl=en&guid=ON&u_h=1080&u_w=1920&u_ah=1080&u_aw=1920&u_cd=24&u_his=6&u_tz=120&u_java=true&u_nplug=8&u_nmime=81&ref=http%3A// HTTP/1.1
User-Agent: Mozilla/5.0 (X11; OpenBSD i386; rv:5.0) Gecko/20100101 Firefox/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Proxy-Connection: keep-alive
Cookie: id=ccfc97b450000c1||t=1337591014|et=730|cs=002213fd4815288209299939c3

(Yes, GeoIP services may already reveal the geographical location of an IP address with more precision and accuracy, but that is beside the point.)
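To show how little effort the receiving side needs, here is a small Python sketch of what a third party could read from a Referer value. The URL below is hypothetical (the real webshop URL is not reproduced here), as is the assumption that the postal code is the last path segment.

```python
from urllib.parse import urlsplit

def referer_disclosure(referer):
    """Return what a third party can read from a Referer header value."""
    parts = urlsplit(referer)
    segments = [s for s in parts.path.split("/") if s]
    return {
        "site": parts.hostname,       # which website the visitor is on
        "path_segments": segments,    # which page, including any data in the path
        "query": parts.query,         # any data passed as query parameters
    }

# Hypothetical Referer value, for illustration only.
info = referer_disclosure("http://takeaway.example/restaurants/1234")
print(info["site"])               # takeaway.example
print(info["path_segments"][-1])  # 1234
```

Anything a website puts in its own URLs, whether in the path or in the query string, travels along to every third party the page loads content from.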

Information disclosure via these types of web bugs is old and well-known. In fact, EFF’s The Web Bug FAQ dates back to 1999. But the problem is becoming more relevant now that those third parties are used by 100M+ people and more and more personal data is collected and sold in the market.

Besides a violation of your visitors’ privacy, loading external content may also pose a security risk to your visitors: every system that your website requires your visitors’ browser to load content from can get compromised and serve malware. That also holds for Google, Facebook and Twitter. The more systems you make your visitors’ browser load content from, the more risk you expose your visitors to.
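If you run a website and want a quick impression of how many external systems you make your visitors’ browsers contact, something like the following Python sketch may help. It is a coarse check using only the standard library: it looks at a fixed set of tags that normally trigger an automatic fetch and ignores content loaded by scripts or stylesheets. All host names in the example are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class ExternalResourceFinder(HTMLParser):
    """Collect hosts of resources the browser would fetch automatically."""

    # tag -> attribute whose URL is fetched without any user action
    # (coarse: e.g. not every <link href> is actually fetched)
    FETCHING = {"script": "src", "img": "src", "iframe": "src", "link": "href"}

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.external_hosts = set()

    def handle_starttag(self, tag, attrs):
        wanted = self.FETCHING.get(tag)
        if wanted is None:
            return  # e.g. <a href>: a plain hyperlink is only fetched on click
        for name, value in attrs:
            if name == wanted and value:
                host = urlsplit(urljoin(self.page_url, value)).hostname
                if host and host != self.page_host:
                    self.external_hosts.add(host)

# Hypothetical page, for illustration only.
html = """
<html><body>
<img src="/logo.png">
<iframe src="http://social.example/plugins/likebox.php"></iframe>
<script src="//stats.example/tracker.js"></script>
<a href="http://social.example/mypage">Like us</a>
</body></html>
"""

finder = ExternalResourceFinder("http://shop.example/index.html")
finder.feed(html)
print(sorted(finder.external_hosts))  # ['social.example', 'stats.example']
```

Note that the plain hyperlink does not show up in the result: the browser contacts that host only when the visitor chooses to click.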

“Browser-Reflected Information Disclosure” might be an appropriate label for these types of privacy violations. (If you have a better suggestion, please comment.)

The solution is very simple: instead of including a `Like’-button e.g. via an IFRAME that loads likebox.php from Facebook’s systems, put up a hyperlink to the Facebook page you want your visitors to `Like’. Instead of including a `+1′-button, put up a hyperlink to your Google Plus page. Instead of including a PayPal `Donate’-button from PayPal’s systems, make a local copy of that button image and link to that local copy in your <img>-tags.

Dutch MoD Innovation Competition 2012: “CYBER Operations 2.0”

UPDATE 2012-11-09: and the winner is…. Dutch technology start-up BusinessForensics that submitted a solution for in-memory big data analysis (Dutch). Congrats!

The Dutch Ministry of Defense (MoD) annually holds a “Defense Innovation Competition”, intended to get input from, and foster relations with, Dutch industry and SMEs. This year’s theme is “CYBER Operations 2.0”. The project document (.pdf, in Dutch) describes it as follows:

For operations and command, the Dutch MoD relies on radio and satellite connections and the internet. But developments such as WiFi, smartphones and tablets will eventually make their appearance in the armed forces. The difference between military radio networks and the internet is therefore becoming more diffuse. And cyber is thereby definitively added to the domain of Defense.

Guided by the Dutch National Cyber Security Strategy, the government, industry and knowledge institutions join forces. Externally, the MoD closely cooperates with these other players in the cyber security chain. But internally, the MoD must guarantee the integrity of its own information provisioning, networks and IT infrastructure. Therefore, the MoD is actively pursuing enhanced digital defensibility and the development of cyber as an operational capability. Regarding cyber, the MoD is expeditious and innovative; under the motto “Cyber, more than Defense!”, the MoD must be able to operate in the same way it does in other dimensions (land, sea, air, space). In other words, the MoD must also defend, delay, maneuver, attack and gather intelligence in the cyber dimension. Cyber Security thus entails more than Cyber Defence: for the MoD, it means Cyber Operations. To this end, the MoD founded the Taskforce Cyber in January 2012.

In order to guarantee its future military capability (power) in the cyber domain, the MoD is in need of new technologies and innovations. With the Defense Innovation Competition 2012, the MoD is challenging SMEs and Dutch industry. Use your innovation, your creativity and technological ingenuity to make a tangible contribution to the future of cyber operations.

The proposals are judged by the following criteria:

  1. Applicability/implementability
  2. Innovation
  3. Feasibility
  4. Quality (in terms of language, argumentation)
  5. Competence, reputation of submitting entity
  6. Risk analysis of follow-up phase

The MoD has reserved EUR 200k to turn the winning idea into reality. The deadline for submitting proposals is August 22nd 2012. Participation is restricted to Dutch industry and SMEs.