Meta Stack Overflow: Is Gravatar a privacy risk?
[+91] [11] matt wilkie
[2010-03-31 17:01:24]
[ discussion gravatar privacy ]
[ http://meta.stackoverflow.com/questions/44717/is-gravatar-a-privacy-risk ] [DELETED]

I ran across the comment below at http://onemansblog.com/2007/02/02/protect-your-privacy-delete-internet-usage-tracks/#comment-58200 about Gravatar [1]. I'm particularly curious about Meta Stack Overflow's opinions on points 4 and 6, though the others may be of interest too. Are these concerns real, and if so, what defensive measures might be used?

Comment by AL 2009-02-18 00:03:55

I’m a lawyer specialising in internet and privacy issues at a Fortune 100 company and I personally think that Gravatar is easily the worst service available in terms of your data security and privacy. I generally don’t comment on any blogs that are Gravatar-enabled (this being an exception), for the following reasons:

  1. The entire reason Gravatar offers their service is to collect internet usage data across multiple sites. It is not offered free out of the goodness of their heart. The entire purpose of the service is to analyse the way YOU navigate the internet.

  2. Gravatar has clear plans to monetise this data. Whether they are successful or not is another story.

  3. It is unlikely that Gravatar would ever disclose an individual user’s personal information, but it is not impossible. The Chinese government has often asked these kinds of information aggregators to disclose data for the prosecution of political dissidents – and very often these requests are met, resulting in bloggers being jailed (see Yahoo!’s experiences in China). For example, if I leave a number of comments promoting democracy and criticising the PRC government on various blogs, it is entirely possible that the Chinese government could use legal authority to request the holder of information to disclose that to them. By retaining this information and preventing you from stopping its collection, Gravatar is putting both bloggers and commenters at risk. This is not just in China. The Patriot Act and many other new pieces of post-9/11 legislation in Western countries convey similar powers to government.

  4. The most egregious part of Gravatar’s service is the inability to stop them from collecting your data. I have in the past tried to cancel a Gravatar registration. Gravatar does not allow this and will continue to track your e-mail address for the rest of time.

  5. Gravatar does not provide any details about how they use your personal information and does not respond to any queries relating to privacy issues.

  6. I do not believe Gravatar is an opt-in service. Obviously they will not display an avatar unless you register, but if a blog is Gravatar-enabled, every time you comment on it, your e-mail address is sent to Gravatar. Even if they do not retain this address (and it is quite possible that they do – their Privacy Policy is silent on this point and they have not responded to any of my enquiries on this point), it is VERY likely that your internet usage is still tracked in an anonymous fashion. That is, if I use the same e-mail address to comment on 5 different blogs, even if I am not a registered Gravatar user the fact that a user has accessed those 5 blogs is very likely retained by Gravatar.

Much is made of facebook and Google Chrome’s use of personal information, but Gravatar is far and away the worst popular internet service I have encountered in terms of user (and non-user) personal information.

As a lawyer, I strongly urge all blog authors and users who are concerned about their privacy to avoid Gravatar.

 

Related: Is using Gravatar a security risk? [2]

See also some thoughts on privacy at meta.stackoverflow.com/questions/4553/non-gravatar-avatar/… - Arjan
(2) @Arjan: the other talked mainly (only?) about ways for a malicious user to obtain our email addresses from the MD5 hash. This talks about gravatar itself being evil. - Andreas Bonini
As for 3), that's really bad. On many sites I could use any email address when leaving a comment, even addresses that are not mine at all. I'd rather have some government act on that. :-( (More on that Yahoo! incident at news.bbc.co.uk/2/hi/4221538.stm) - Arjan
(13) Not to minimize the problem, but using the internet is a privacy risk. - tvanfosson
(4) #6 is wrong. Gravatar only gets an MD5 digest of your email address. - Brad Gilbert
(1) Here is Automattic's privacy policy: automattic.com/privacy - Jon Seigel
@Brad: Does the MD5 change between postings? - mmyers
@mmyers if you are an unregistered user changing the email, yes. - perbert
(1) @Downvoter - You lose. Besides, how are Gravatar privacy concerns not on-topic on meta? - Adam Davis
I see we've been switched to Unicornify. But that's probably because it's now April 1 in UTC. - mmyers
@mmyers - it was switched an hour or two before midnight UTC. Jeff probably couldn't contain himself. I can imagine him giggling spuriously now. - Adam Davis
@Polly: I think you meant "furiously". Spuriously means without grounds or merit. - Lawrence Dol
Beware of following that link to onemansblog. My antivirus blocked a trojan horse there called JS:Redirector-MA[Trj] - DOK
(2) Seems the issue is far worse now than it was when this question was posted originally. arstechnica.com/security/2013/07/… - GordonM
(1) to summarize the Arstechnica article "Got an account on a site like Github? Hackers may know your e-mail address" found by @GordonM, above: It's getting easier and easier to extract a person's email address from their Gravatar hash (45% from one forum in a day or so). If having your email address known is risky for you, then Gravatar is risky. - matt wilkie
[+22] [2010-03-31 17:58:36] Adam Davis [ACCEPTED]

Is gravatar a privacy risk?

Yes.

Is it as great a risk as DoubleClick [1]/Google?

No. Notably these sites use Google Analytics [2]. Just like Gravatar, they don't have access to personally identifiable information (that is, the email is hashed before they get their hot mitts on it).

Is it a small risk?

Yes. If you don't like someone noting that an unidentifiable user (that's you) visited two different websites - well, they have that information now. That aggregate data can, in theory, be mined (as was the "anonymous" AOL search data [3] of yore) to identify you.

Should we give up on the gravatar service?

No. It's a useful service for many people, and many of them accept the cost for this "free" service.

Who is laughing at us right now?

The Amish [4].

Actually, no, they don't even care [5].

[1] https://en.wikipedia.org/wiki/DoubleClick
[2] http://en.wikipedia.org/wiki/Google_Analytics
[3] https://en.wikipedia.org/wiki/AOL_search_data_leak
[4] https://en.wikipedia.org/wiki/Amish
[5] http://failblog.org/2009/10/27/amish-fail/

(2) @json - lol! I remember that site. While the idea and front page were funny, the rest of it was hilarious - someone put a lot of time into that one! - Adam Davis
[+14] [2010-04-01 00:05:47] Steve Jessop

IMO sites need to use gravatar sensibly.

If you say you don't publish your users' email addresses, that should mean you don't publish an MD5sum of their email addresses either. Hashing sensitive data without a salt is a schoolboy error: web developers should know better. Publishing the hash of some private data is a breach of privacy if the data is subject to a dictionary attack, which email addresses are.

Just replace address@domain with address+salt@domain.

SO in effect allows you to do this manually, by setting the email address for your account. It doesn't use the address for anything other than gravatar unless you ask it to, so it doesn't have to really be your email address.

I'm pretty sure this is an accident, though, not a security feature, since SO also uses your IP address in the absence of an email address. IP addresses are even more subject to dictionary attack than email addresses.

Of course for the salt to be effective in preventing gravatar tracking you across sites, gravatar has to not know the email address behind it (since if it did know, it could merge the records of address+*@domain). This means that (a) you must live with a random icon, and therefore (b) the user should be able to specify whether they want the salt added or not. If your email provider doesn't support +salt, and you want the site to be able to send you email without publishing the hash of your email address, then you're generally out of luck: you can have one or the other.

In fact I'd say that ideally sites should default to just generating a random "md5sum" for each user, and only use the email address to generate a gravatar URL with permission. For users with no interest in uploading an image to gravatar, there's no earthly reason why any site should use a gravatar URL based on supposedly-private data. Unless you count ignorance of basic security principles as a "reason" ;-)
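
A rough sketch of the two options above, assuming the classic Gravatar scheme (MD5 hex digest of the trimmed, lower-cased address). The helper names `plus_salted` and `random_avatar_id` are made up for illustration and are not part of any Gravatar or Stack Overflow API:

```python
import hashlib
import secrets


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string, the shape of identifier Gravatar URLs use."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def plus_salted(email: str, salt: str) -> str:
    """Hypothetical helper: turn address@domain into address+salt@domain."""
    local, _, domain = email.strip().lower().rpartition("@")
    return f"{local}+{salt}@{domain}"


def random_avatar_id() -> str:
    """Hypothetical helper: a random 32-hex-character identifier.

    It has the same shape as an MD5 digest but is not derived from any
    private data, so there is nothing to reverse.
    """
    return secrets.token_hex(16)


email = "address@example.com"

# Option 1: hash the plus-salted address instead of the real one.
salted_id = md5_hex(plus_salted(email, "somesite"))

# Option 2: store a random identifier per user and use it in the avatar URL.
stored_id = random_avatar_id()

print(salted_id != md5_hex(email))  # the salted hash differs from the plain one
print(len(stored_id))               # 32, like an MD5 hex digest
```

With option 2 the site has to store the identifier per user, since it cannot be recomputed from the email address. That is the point: it only switches to an email-derived hash once the user asks for it.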


ideally sites should default to just generating a random "md5sum" -- nice! Maybe a less secure alternative, as a quick workaround, might be to automatically add a well-known salt (like +stackoverflow) and then tell users to use an email provider that supports plus-addressing (en.wikipedia.org/wiki/E-mail_address#Sub-addressing) and then register a Gravatar for that "salted" address, if they want to customize the avatar. Not perfect, but no need for database and interface changes. - Arjan
Still, the third-party avatar site can track any visitor's whereabouts (which is not the same as user activity) on sites that use their avatar (though Gravatar tells a browser to use a 5 minute cache, so currently won't find fine-grained data in their logs). - Arjan
(1) @Arjan: Sure, it's still a web bug, just with salt it's harder to deduce links across sites, and impossible to reverse. Any out-of-site content is a web bug: twitter feed widgets, advertising content, etc. If a user wants to avoid the bugging, they can block image loading from that site or redirect it through TOR. Maybe sites which say or imply as part of their privacy T&C that they won't sell their weblogs to the highest bidder, should asterisk that with a note that they effectively give all their weblogs to gravatar for free via icons. - Steve Jessop
Somebody who has enough reputation to do so, please up-vote this answer. It speaks directly to one of the core questions and suggests a reasonable counter-measure (albeit only applicable to web developers). Thanks. :) - matt wilkie
Using a salt here would serve no purpose at all. In order for the gravatar "service" to work, the identifier it gets needs to be the same from all of the various sites on the net, so the same salt would have to be used in each case, and it could hardly be kept a secret from gravatar.com, or for that matter the NSA or the chinese. - mc0e
(1) @mc0e: that's not correct. Gravatar is used for two distinct but related purposes: (1) to generate pseudo-random avatars, (2) to use the same avatar across multiple sites. For sites that respect their users' privacy, the first is sufficient until the user requests otherwise. That is why I say "For users with no interest in uploading an image to gravatar, there's no earthly reason why any site should use a gravatar URL based on supposedly-private data." - Steve Jessop
@Steve Jessop: Point taken, but in that case why use the email address at all in constructing the identifier string for gravatar? - mc0e
(1) @mc0e: why indeed? This is why my answer says, "In fact I'd say that ideally sites should default to just generating a random "md5sum" for each user, and only use the email address to generate a gravatar URL with permission" - Steve Jessop
[+12] [2010-03-31 17:40:15] balpha

As someone who provides a similar service (however on a scale that's a tiny bit smaller), I have to say that I myself am sometimes concerned what kind of information I could pull from the access logs if I wanted to.

On the other hand, whenever I comment on some blog, join a forum or whatever, it's my own choice to provide my personal email address. If I'm concerned about that, I can either a) not join at all, b) not provide an email address (if the site allows that, as SO does), or c) create an extra email address for this purpose.

So my view is: It's no bigger or smaller problem than any other privacy concerns resulting from data collection, be it PayPal knowing where you shop, myOpenId knowing where you log in, or Google knowing... well, everything.

That's not to say it's something that can be ignored, but I don't think Gravatar is a special case.


(3) For PayPal and OpenID one explicitly chooses to use it; I don't think there's any PayPal or OpenID-enabled site that sends hashed information about its users when just viewing a page. But indeed for Google Analytics et al such details are sent without explicit consent. However, for Gravatar et al, a web site owner can decide any moment to start using it, and then send hashed details of existing and new users (not necessarily the current visitor) to that third party... - Arjan
@balpha - You're just trying to get SOFU to switch to your service! Consulting - If you're not part of the solution, there's a great deal of money to be made prolonging the problem. - Adam Davis
Hmmm, so you teamed up with Jeff et al to get more data today? How's the quota at Google Apps doing today? ;-) - Arjan
(5) "it's my own choice to provide my personal email address." The problem is that you make that choice with an insufficient data basis. Many websites claim that your email will never be published, but still use gravatar. - CodesInChaos
"It's no bigger or smaller problem than any other privacy concerns resulting from data collection", except for Stack Exchange making the list of MD5 hashes public in the data dump. Which, according to an answer on "Someone contacted me by email but my email is not public", might only take 3 days to get all email addresses with 10 or less characters from major email providers. - Arjan
@Arjan That is true (and is something we've recently started addressing), but isn't related to the topic of this very question. - balpha
True, Balpha, and thanks for that reference. - Arjan
[+11] [2011-02-18 19:54:47] Arjan

In December 2009, somebody tested getting email addresses from some of the Stack Overflow users, by assuming the display name might be related to an email account at some of the major providers. According to Gravatars: why publishing your email's hash is not a good idea [1] that assumption is true for about 10% of the SO users:

Running my program on a list of 80871 users I was able to extract 8597 email addresses, associated to their users. This means that for a bit more than 10% of the users, the username and the gravatar URL are enough to deduce the email address they used to register to the website.

(Apart from this, I also dislike the web bug nature [2] of Gravatar and the like.)
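
To illustrate the kind of test described in that article, here is a minimal sketch. It is not the article's actual program. It assumes the published identifier is the MD5 hex digest of the trimmed, lower-cased address; the provider list and the `guess_email` helper are my own assumptions:

```python
import hashlib
from typing import Optional

# Assumed list of "major providers"; the article's real list may differ.
PROVIDERS = ("gmail.com", "yahoo.com", "hotmail.com")


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lower-cased address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def guess_email(display_name: str, published_hash: str,
                providers=PROVIDERS) -> Optional[str]:
    """Hypothetical helper: try display_name@provider against a published hash."""
    local = display_name.strip().lower().replace(" ", "")
    for domain in providers:
        candidate = f"{local}@{domain}"
        if gravatar_hash(candidate) == published_hash:
            return candidate  # the hash confirms the guess without sending any mail
    return None


# Demo with a made-up user whose address happens to match the display name.
published = gravatar_hash("jdoe@yahoo.com")
print(guess_email("JDoe", published))
```

Each guess costs one MD5 computation, so trying a handful of providers per user over tens of thousands of users is cheap.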

[1] http://www.developer.it/post/gravatars-why-publishing-your-email-s-hash-is-not-a-good-idea
[2] http://meta.stackoverflow.com/questions/4553/can-we-use-non-gravatar-avatars/5658#5658

(1) I don't think that Gravatar had much impact there. The bigger issue is that the username for an account was identical to the username used for another site. You could easily test if I had a valid yahoo|google|hotmail account and skip the md5 verification against Gravatar step. I would bet you would find far more email accounts by just sending email to any single word username. - Zoredache
@Zoredache, to me spam is not related to privacy (unless it's very personally targeted spam, maybe). - Arjan
(1) Still I am not sure I understand the point. If I can guess that you have username@gmail.com, and I want to test it, why wouldn't I just send you a message? I could learn the same information by sending a message as I can learn by trying a md5(username@gmail.com). - Zoredache
(1) Good point, @Zoredache. It would leave some trail when sending the messages, or kind of warn the user(s) if the email was delivered, or might have a slightly lower success rate if arjan@gmail is not actually me (and it isn't). But for many/most hits that wouldn't matter a lot indeed. - Arjan
(1) Anyway, the article is valid, I just don't think it really means much. If I use zoredache as an account name someone can easily guess that I may use zoredache everywhere. The Gravatar thing just lets a person test that without having to create a fake email account to test from. - Zoredache
(1) An update: this attack was recently used in Sweden, where the attackers managed to de-anonymize 45% of accounts based solely upon this weakness in Gravatar. - D.W.
[+7] [2011-03-25 09:22:39] CodesInChaos

One problem with Gravatar isn't solved by blocking the server.

The website you use publishes the hash of your email address. At a minimum, this makes it possible to find other websites where you used the same email address.
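
A small sketch of why the identifier is linkable across sites, assuming the classic Gravatar scheme (MD5 hex digest of the trimmed, lower-cased address, appended to the avatar URL). The example address is made up:

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lower-cased address."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def gravatar_url(email: str) -> str:
    """Avatar URL a Gravatar-enabled page would embed for this address."""
    return "https://www.gravatar.com/avatar/" + gravatar_hash(email)


# Two unrelated sites hashing the same address publish the same identifier,
# regardless of case or stray whitespace in what the user typed.
site_a = gravatar_hash("Jane.Doe@example.com ")
site_b = gravatar_hash("jane.doe@example.com")
print(site_a == site_b)  # True
```

Because the hash is deterministic and unsalted, anyone who collects avatar URLs from several sites can join them on the hash alone.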

Looking at the Jan 2011 Stack Exchange data dump:

  • 105k unique Gravatar hashes
  • 10k are IPv4 [1] based, the IP address can be trivially found
  • At least 27k of these are so simple they can be guessed. This does not require a relation between nick and address, but just that the email address is built using a common pattern, such as firstname.lastname@gmail.com.
  • Stack Exchange does not validate email addresses, so some of them are invalid. Since fake addresses are harder to guess (gdsfgsdf.sdfadf.com etc.), I assume that if we use valid addresses as a basis, the percentage of guessable addresses is even larger.

All of this applies even to users who have not registered an account with Gravatar.
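
To show why the IPv4-based hashes are trivial to reverse, here is a sketch that enumerates a small address range. It assumes, purely for illustration, that the identifier is the MD5 hex digest of the dotted-quad string; how the IP-based identifiers in the dump were actually derived is not stated here. The helper `recover_ip` and the example addresses are made up:

```python
import hashlib
import ipaddress
from typing import Optional


def md5_hex(text: str) -> str:
    """Hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def recover_ip(published_hash: str, network: str) -> Optional[str]:
    """Hypothetical helper: brute-force every address in a network."""
    for addr in ipaddress.ip_network(network):
        if md5_hex(str(addr)) == published_hash:
            return str(addr)
    return None


# Demo on a documentation-only /24 range so it runs instantly.
target = md5_hex("192.0.2.77")
print(recover_ip(target, "192.0.2.0/24"))
```

The whole IPv4 space is only 2^32 (about 4.3 billion) addresses, so the same loop over every address is a fixed, modest amount of work, and it only needs to be done once to build a lookup table for all published hashes.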

Many websites (including Stack Overflow) promise not to publish your email address, but at the same time use Gravatar and thus leak information about the email address they promised to keep secret. If websites insist on using Gravatar, they should at least tell the user that the email address gets published, instead of lying to their users.

[1] http://en.wikipedia.org/wiki/IPv4

Like referenced in my answer: 10% of Stack Overflow email addresses could be harvested (or be validated to be correct), if the user name is related to the email address. - Arjan
You can also try combinations of two words and some other dictionary attacks. With that you get more than 10%. - CodesInChaos
(3) IP-based gravatars are now salted, so that's one problem resolved. - Jeremy Banks
(1) Care to expand a bit on the 27k firstname.lastname@gmail.com users? Like: are those people who use Firstname Lastname as their display name here, and happen to have a matching email address? Or was the *.*@gmail.com pattern enough to get those 27k addresses from the hashes? (In the first case, a spammer probably wouldn't even care to validate if the address matches the hash, but just assume the address exists, but that's not about privacy of course. I'm still happy I'm not arjan@gmail... And: nice overview, thanks!) - Arjan
(3) I tried several patterns using different wordlists, lists of common first-/surnames, the SO usernames, random characters, numbers, and combinations thereof on common email providers. And obviously all IPv4 addresses. Getting the first 20k is pretty easy, after that there is a lot of overlap to emails found earlier, and search becomes increasingly expensive. - CodesInChaos
(1) One interesting tidbit: <gmane.org> doesn't expose the hashes; it uses an opaque URL (with a rather small numeric component) on gmane.org itself, rather than using anything that embeds the Gravatar hash. (Granted, the images weren't actually working last I checked, but that's just an implementation problem.) - SamB
[+7] [2012-08-05 20:43:39] coding

Gravataring an email address ( MD5 [1] hashing) is effectively making a single ID that identifies you publicly. Even if you never signed up with Gravatar, they're still tracking your ID that comes in - and the site using Gravatar is providing that ID to all users. This ID can be found on other sites, so if someone does a full transitive search across all public Internet forums, they could see what the same ID has posted. Where privacy breaks down is that if all of these sites know your real email address and have publicly given everyone your Gravatar ID, then the Chinese government (for example) could harass not just a single entity, but ANY one of those websites that published your Gravatar ID.

What makes more sense to me is if Gravatar simply "GAVE OUT" an identifier (to you) if you sign up. And when you want to use your Gravatar, you simply give that same ID to sites that use Gravatar.

Sites that use Gravatar without your consent are the ones to blame. Gravatar's user base grows because of this principle (it's their business model - people that want a picture associated with their latest post).

I wish people would try to understand this concept first. Another poster here is generally correct in that if someone REALLY wanted to find out your real identity, they probably could, but Grav IDs make it even easier. You'll see when the equivalent of rainbow tables [2] comes along for Grav IDs (like a site that lets you enter a gravid and it will tell you all URLs that use that GravID.)

I just thought of another way to secure gravatars- they could have used PKI [3] rather than hashing. With PKI with some time based or per-instance/site salt, the ID would not be normalizable to an outsider. In fact, shame on Gravatar for not thinking of this. They're going to claim it has something to do with access to encryption APIs in the languages they are supporting, but I'm pre-emptively calling bullshit right now.

[1] http://en.wikipedia.org/wiki/MD5
[2] https://en.wikipedia.org/wiki/Rainbow_table
[3] http://en.wikipedia.org/wiki/Public-key_infrastructure

[+5] [2010-03-31 17:57:54] Jon Seigel

There's a guy and a girl in a bar, and the guy says: "I'm not a lawyer but I give advice as one on the internet."


(14) It's immaterial to me whether the poster is a lawyer or a dog. It's the statements themselves I wanted reflection on not their source. - matt wilkie
@mattwilkie Yaw callin' me a dawg? - muntoo
(3) @MatB, replacing the image with an imgur.com version that lacks the copyright notice, and linking to the source, is not satisfying that copyright, I guess? (But it does make it easier for the author to find this site.) Jon, that website, mtncartoons.com, claims "Typically, $100 newsletter, $50 Web use." I guess we should delete this answer? - Arjan
Oh, that should have read: @SamB. (And the previous version, being a deep link to that site, was as easy to find for the author I guess. Still, removing the copyright notice makes things worse, I feel.) - Arjan
@Arjan: I'm not sure how this makes things worse, given that the link goes right to a page with the other version, and that the notice is no longer a prerequisite for any of the rights? Regardless, I could (try to) change it to the copy of this version of the image that is actually on the author's server (the path is pretty strange, though), or you could change it to use the other version, if that makes you feel better. - SamB
[+3] [2010-03-31 17:48:31] jmfsg

As a lawyer, I strongly urge all blog authors and users who are concerned about their privacy to avoid Gravatar.

What does being a lawyer have to do with any of that rant?


(7) He is giving legal advice. Which, of course, lawyers never do, so my guess is that by saying this line he is actually proving that he is not a lawyer. - Adam Davis
(4) I don't think any of that qualifies as "legal" advice @pol - jmfsg
Giving legal advice stating that you are a lawyer when you are not is illegal in virtually every country in the world. Anyway I agree with @Downvoter; none of it is legal advice - Andreas Bonini
(5) @Downvoter - Have you ever talked with a lawyer? They are very careful to point out that everything that comes forth from their mouth is "not intended to be legal advice" in seemingly every situation where they identify themselves as a lawyer and say anything related to legal/ethical/privacy/freedom/liberty/death. Only the inexperienced lawyers appear eager to share their great wisdom and power without disclaiming liability - and those lawyers aren't usually worth trusting... - Adam Davis
So, no, technically he's not "giving legal advice" but he could be liable for what he said, and it's possible that someone could successfully sue him for following his advice if the outcome was not the desired result. In that sense, he is giving legal advice. - Adam Davis
(3) That one sentence alone is full of fail if this guy is an actual lawyer. Pollyanna's got it covered, no competent lawyer would so expertly paint himself into a corner like this, it goes against everything taught in law school. - David
@David - Yes, especially since they can be disbarred if it is found that they are giving bad legal advice. If nothing else, the annoyance of a complaint to the relevant bar association is enough to keep most lawyers on their feet about what they say in their official capacity as lawyers. - Adam Davis
Agreed, the lawyer part is tangential. I included the whole post to be faithful to the source although personally I don't care if the poster was a lawyer or a smart dog. Can answers be merged? If so this should be combined with the cartoon. - matt wilkie
[+3] [2010-03-31 17:51:15] Adam Davis

Note that DoubleClick [1] (now Google) has for a very long time collected such usage data.

If this is a concern, use your browser or your hosts file [2] to block all accesses to Gravatar's servers. This will disable their ability to follow you.

[1] https://en.wikipedia.org/wiki/DoubleClick
[2] http://en.wikipedia.org/wiki/Hosts_%28file%29

If he has a Unix-like desktop or any kind of proxy he could even strip all the referrers to gravatar.com and still see the pretty pictures. en.wikipedia.org/wiki/HTTP_referrer#Referrer_hiding - perbert
(1) Or replace it with the unicorns URL =) *waits for the china unicorns to knock on his door* - Andreas Bonini
@Kop - Hey, jeff followed your idea! - Adam Davis
(6) How would this help? If I, as an individual user, block my computer from accessing gravatars servers, aren't they still getting my email address or hash from the website-using-gravatar that I'm leaving a comment on? - matt wilkie
[+3] [2010-04-01 00:36:08] Adam Davis

Given the recent switch to Unicornify, I believe this question can be closed as status-completed.


Not to mention that now Google and balpha know which avatars are downloaded by which IP address... - Arjan
@Arjan: SO already has Google analytics on it, so Google already knew who hit what pages. If they wanted to know which icons were on which page, they can just scrape the pages, none of them are private. So balpha is the big winner here :-) - Steve Jessop
@Steve, no Analytics allowed here, but the sudden Unicornify got me. ;-) Too bad balpha has set the cache to one day! - Arjan
"no Analytics allowed here" - that's OK, I'm sure Google can send ninjas to break into SO's server room and steal the logs to fill in the blanks. If they ever need to know. - Steve Jessop
Could you please expand on how/why Unicornify, whatever that is, brings this to status-completed for SO? Also the Q is still relevant in a broader non-SO context . - matt wilkie
[0] [2012-05-30 05:32:18] Wombat_RW

Is Gravatar a privacy risk? WT* [1]

I realize these posts are quite old, but I have been dismayed at the level of paranoia shown by some of the posters on this and similar threads.

Surely, in this day and age the cautious user consciously obtains and maintains a few email accounts which matter little and are serviced even less, just for all those sign-up-and-forget situations. So who cares if someone without a life and with huge amounts of computer power (and hacker software) eventually figures out an address from a hash?

If the name part of the free (probably) mail account(s) is mostly gobbledygook then that makes hash parsing / matching even harder. And if you use simple, real words, names or numbers for passwords to boot you have to accept you are really stupid.

But at the end of the day, a lot more worrisome for the paranoid (and so-called "lawyers" with a bone to pick) are the plethora of form submission checks (such as the excellent Akismet) and general RBL [2] access blockers / redirects used (quite rightly so) on so many web sites, which you won't even know about unless you are actually a spammer of some sort.

All of those receive our email addresses and IP addresses and rely on them, wholly or in part. Is an upfront service like Gravatar to be trusted less when those others could track FAR more than a few sites here and there displaying an avatar pic? I think not.

And while we're at it, get the Ghostery [3] add-on for your browser and see just how many months it can take to get on top of (blocking) the huge amount of third-party tracking cookies we are barraged with. That's really scary too.

Discovery of a mere email address one consciously supplies has nothing on all those occurrences I mentioned, any of which if harvested can also be packaged for others at a profit. Any monetized site especially SHOULD declare it all in a readily accessed privacy declaration; does yours?

[1] https://en.wiktionary.org/wiki/WTF#Initialism
[2] https://en.wikipedia.org/wiki/DNSBL
[3] https://en.wikipedia.org/wiki/Ghostery
