Monday, May 23, 2011

Google Profiles Exposes Millions of Usernames, Gmails

Please also read the follow-up post I published on May 24th 2011. It contains a better description of my motivation and fewer technical details.
-- Matthijs R. Koot, 2011-06-01




UPDATE 2011-05-23 #1: I'm currently writing a paper about the topic discussed below. The activities are performed as part of my research on anonymity/privacy in the System & Network Engineering research group at the University of Amsterdam. A tweet on May 20th 2011 by Mikko Hypponen, as described here, urged me to post a bit prematurely. Google has been informed.

UPDATE 2011-05-23 #2: here is code that can convert most of the data in your Google Profile into a single SQL statement: http://cyberwar.nl/GProfile2SQL.js . When a profile is accessed in a browser, the profile data (names, profession, education, ...) is stored in a single multidimensional JavaScript array named OZ_initData[][][...]. Install SpiderMonkey for its C-based JavaScript engine js, download your own profile and save it as e.g. mrkoot.html. Then execute something like the following to get an INSERT statement:

sed -n '/var OZ_initData = /,/^;window/{ s/.*var OZ_initData = /var OZ_initData = /g; s/^;window.*//g; p; }' mrkoot.html | tee tmpjs | js -f tmpjs -e 'print(OZ_initData[5]);' | js -f tmpjs -f GProfile2SQL.js

Optimizations are left as an exercise to the reader; you can figure out the table structure from the JavaScript code and extend everything as you wish.
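
To see what the sed range in that command selects without installing SpiderMonkey, here is a minimal sketch run against a tiny made-up page. The helper name extract_initdata and the sample file contents are assumptions for illustration; only the "var OZ_initData = " start marker and the ";window" end marker come from the command itself, and a real profile page is far larger.

```shell
#!/bin/sh
# Hypothetical helper: print the OZ_initData assignment found in a saved
# profile page. The sed expression is the one from the pipeline above.
extract_initdata() {
    sed -n '/var OZ_initData = /,/^;window/{ s/.*var OZ_initData = /var OZ_initData = /g; s/^;window.*//g; p; }' "$1"
}

# Made-up stand-in for a saved profile page (NOT real profile data).
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<html><script>var OZ_initData = {"5":["example"]}
;window.placeholder = 1;
</script></html>
EOF

# Prints the assignment line, then an empty line where ";window..." was blanked.
extract_initdata "$sample"
```

The range starts at the line containing the assignment and ends at the first later line that begins with ";window"; everything before the assignment on the first line is stripped so the result is a standalone JavaScript statement.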

====== START OF ORIGINAL BLOGPOST FROM 2011-05-23 ======
The existence of Google's profiles-sitemap.xml has been known outside Google since at least 2008. The XML file, last updated March 16th 2011, points to 7000+ sitemap-NNN(N).txt files that each contain 5000 hyperlinks to Google profiles; 35M links in total. Snippet from sitemap-000.txt:

https://profiles.google.com/117135902571938793602
https://profiles.google.com/112006952710949332145
https://profiles.google.com/105382462492606983441
https://profiles.google.com/109299750146769054739
https://profiles.google.com/104555562341640123846
https://profiles.google.com/112956845518767535694
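
As a rough sketch of how such sitemap entries can be handled, the snippet below strips the URL prefix to get the numeric profile identifiers and redoes the link-count arithmetic. The function name profile_ids is made up for illustration, and 7000 is the lower bound from the text ("7000+"), so the total is a floor rather than an exact count.

```shell
#!/bin/sh
# Hypothetical helper: read sitemap lines on stdin and print only the
# numeric profile identifier of each https://profiles.google.com/<id> link.
# Lines that do not match that shape are dropped.
profile_ids() {
    sed -n 's|^https://profiles\.google\.com/\([0-9][0-9]*\)$|\1|p'
}

# Three of the links quoted above from sitemap-000.txt.
printf '%s\n' \
    'https://profiles.google.com/117135902571938793602' \
    'https://profiles.google.com/112006952710949332145' \
    'https://profiles.google.com/105382462492606983441' | profile_ids

# 7000 sitemap files x 5000 links each = 35000000 links at minimum.
echo "$((7000 * 5000)) links"
```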

Google Profiles lets users choose whether to use their username in the profile URL, to make it easier to find and remember.

The text explicitly warns the user about possible exposure (bold emphasis added):
"To make it easier for people to find your profile, you can customize your URL with your Google email username. (Note this can make your Google email address publicly discoverable.)" 
Selecting the second option (the customized URL) gives a URL like https://profiles.google.com/USERNAME. Accessing profiles using the identifiers found in the sitemaps indeed reveals the Google username, and therefore the @gmail.com address. For example, for my username "mrkoot":

irbaboon:be monkey$ curl -i -X HEAD http://www.google.com/profiles/115572197788225218471
HTTP/1.1 301 Moved Permanently
Location: /profiles/mrkoot
Content-Type: text/html; charset=UTF-8
Date: Mon, 23 May 2011 14:00:31 GMT
Expires: Mon, 23 May 2011 14:00:31 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked
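
A minimal sketch of pulling the username out of such a response: it reads raw response headers on stdin and prints the last path segment of the Location header, but only when that segment is not purely numeric. The function name username_from_headers is an assumption for illustration, not part of any tool used in this post.

```shell
#!/bin/sh
# Hypothetical helper: read HTTP response headers on stdin and print the
# username if the Location header points to /profiles/<username>.
# Prints nothing when there is no such header or the target is a numeric ID.
username_from_headers() {
    tr -d '\r' |
    sed -n 's|^[Ll]ocation: .*/profiles/\([^/]*\)$|\1|p' |
    { grep -v '^[0-9]*$' || true; }
}

# Headers as in the transcript above (abbreviated); prints: mrkoot
printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: /profiles/mrkoot\r\nServer: GSE\r\n\r\n' |
    username_from_headers
```

Against a live server the input would come from something like curl -sI with the profile URL; the -I option issues the HEAD request directly.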


Note that the HTTP 301 redirect discloses the username before any HTML is requested. During February 2011 I checked ALL 35 MILLION LINKS (my connection did NOT get blocked after any number of connections) and found that ~40% of the Google Profiles expose their owner's username, and hence @gmail.com address, in this way. That totals ~15 MILLION exposed usernames / @gmail.com addresses(*). With no apparent download restriction in place for connections to https://profiles.google.com, and with Google users disclosing their profession, employer, education, location, links to their Twitter account, Picasa photo albums, LinkedIn accounts et cetera, this seems like a large-scale spear phishing attack waiting to happen.(**) But hey, the users HAVE been warned.
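
The ~40% figure is a simple ratio: profiles whose redirect target is a username, divided by all profiles checked. A sketch of that tally, assuming a hypothetical input with one path segment per line (the redirect target, or the numeric ID where no username is exposed). The function name tally_exposed and the sample values are made up; no real profile data is used.

```shell
#!/bin/sh
# Hypothetical helper: read one profile path segment per line on stdin and
# print "<exposed> <total> <percent>", where a segment counts as exposed
# when it is not purely numeric (a username rather than a profile ID).
tally_exposed() {
    awk '
        NF == 0          { next }        # skip blank lines
                         { total++ }
        $0 !~ /^[0-9]+$/ { exposed++ }
        END {
            pct = total ? int(100 * exposed / total) : 0
            print exposed + 0, total + 0, pct
        }'
}

# Made-up sample: 2 usernames among 5 profiles; prints: 2 5 40
printf '%s\n' alice 100000000000000000001 bob \
    100000000000000000002 100000000000000000003 | tally_exposed
```

Scaled up, 40% of 35 million is 14 million, the same order as the ~15 million reported above; both inputs are rounded figures.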

This blog runs on Google Blogger. I sincerely hope my account "mrkoot" and blog.cyberwar.nl will not be blocked or banned - I do NOT publish any usernames or other profile data and did not violate any policy I am aware of.

(*) I can provide proof if necessary.
(**) Pardon the alarmist tone.

5 comments:

  1. Good morning,
    What struck me recently was that when I entered my name in Google (initials + surname), my Amazon.com profile showed up fairly high in the results (name, photo, age, "tagged" books etc.). Amazon told me there was nothing they could do about it, but that I could of course adjust my profile myself so that less information was visible. They also referred me to Google if I wanted to change anything about this! I wonder whether that would work.
    Kind regards, jlb

  2. hello,
    i am from Dokuz Eylul University at Turkey,
    thanks for sharing, it will be really useful for writing sthg academic. but someone can make spam :(
    i think you should have to give the tip for who request via an academic mail address.
    have a nice day,
    twitter @mustafaturan

  3. The key point here is that you are free to choose what personal information to add to your Google profile.

  4. If you post public information, you should expect to have it viewed.

    Regardless of whether you view one public profile or scrape a million, the information you will receive is only what the person behind the profile has allowed you to see.

    Simply put, if you're afraid of your information being seen by another person or aggregated into a database by someone who thinks they could make money out of your information, it's best you don't publish anything you wouldn't want people seeing.
