Advocate (Dr.) Prashant Mali's Blog: Google User Search Logs – Is it Personal Data or Information as per LAW?

Sunday, June 16, 2013

Google User Search Logs – Is it Personal Data or Information as per LAW?


Privacy concerns relate to personally identifiable information, or personal data. The IT Rules under the IT Act, 2000 define it as follows: "Personal information" means any information that relates to a natural person, which, either directly or indirectly, in combination with other information available or likely to be available with a body corporate, is capable of identifying such person. In other words, it is information which can be used to uniquely identify, contact, or locate a specific individual. In the United States, federal privacy legislation protects personal data in a number of contexts, such as health information, financial data, or credit reports. Similarly, the European data protection framework applies to "personal data," defined as "any information relating to an identified or identifiable natural person ('data subject'); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity."

Information that cannot be linked to an individual person is not problematic from a privacy standpoint. Imagine we have highly revealing data about Sify user 200805, but we do not know, nor can we find out, who the user is. Or suppose I tell you that X is a drug addict who searches for teen sex and earns Rs. 80,000 a month, half of which is spent on online porn. Absent any indication as to the identity of X, this information is meaningless from a privacy perspective.

Do users' search logs constitute "personal data"?

Can the data in search logs be traced to specific individuals?

I show that they do, and that they therefore raise serious privacy problems. First, search engines log a user's queries under that user's IP address. An IP address is a unique string of numbers assigned to a user's computer by the user's Internet Service Provider (ISP) so that other machines on the network can communicate with that computer. Simply put, it is the cyberspace equivalent of a real-space street address or phone number. An IP address may be dynamic, meaning a different address is assigned to a user each time he or she logs on to the network; or static, meaning it is assigned to a computer by an ISP as its permanent Internet address. The question of whether an IP address constitutes "personal data" has been much debated in the EU. It is equivalent to asking whether "Plot no. 435, Malabar Hill, Mumbai" or "919821763157" constitutes personal data. The answer depends on whether the address might be linked to an "identified or identifiable natural person" through reasonable means. Clearly, a static address is more "personal" than a dynamic address; and in either case, an address is more "personal" in the possession of an ISP, which has the capacity to link it to a specific user's registration information, than in the hands of other parties. The European data protection watchdog, the Article 29 Working Party, has already opined that even dynamic IP addresses constitute "personal data." It stated that "unless the ISP is in a position to distinguish with absolute certainty that the data correspond to users that cannot be identified, it will have to treat all IP information as personal data, to be on the safe side."

Consequently, even if Google could not link an IP address (and therefore the search log recorded under it) to a specific individual, the fact that ISPs have such capability, and that the government may order them to use it, renders search logs "personal data" for privacy purposes. It is the capacity to link, not the actual linking, that makes the data personal.

Second, to overcome the difficulty of profiling users who access search engines using a dynamic IP address, search engines set "cookies" which tag users' browsers with unique identifying numbers. Such cookies enable search engines to recognize a user as a recurring visitor to the site and amass her search history, even if she connects to the Internet via a different IP address. As a result of pressure by EU data protection regulators, Google has already shortened the duration of its cookie, which was initially set to expire in 2038, to a period of two years after a user's last Google search. The privacy benefits of such a move are doubtful, however, since as long as Google remains the Internet's leading search engine, users are bound to renew the two-year period on a daily basis.
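To make the difference between IP-based and cookie-based logging concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the log entries, IP addresses and queries are invented, the helper `group_by` is hypothetical, and nothing here describes how Google actually stores its logs. It shows that grouping by IP address splits one browser's history across several addresses, while grouping by cookie ID reassembles it.

```python
from collections import defaultdict

# Hypothetical log entries: (ip_address, cookie_id, query). All values are invented.
LOG = [
    ("203.0.113.5", "740674ce2123e969", "flights to goa"),
    ("198.51.100.7", "740674ce2123e969", "hotels in goa"),
    ("203.0.113.5", "1f3a9c0d5e7b2a46", "cricket scores"),
    ("192.0.2.44", "740674ce2123e969", "goa weather"),
]


def group_by(entries, key_index):
    """Group queries by the chosen column (0 = IP address, 1 = cookie ID)."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry[key_index]].append(entry[2])
    return dict(grouped)


# Keyed by dynamic IP address, one browser's history is scattered
# and even mixed with another browser that reused the same address.
by_ip = group_by(LOG, 0)

# Keyed by cookie ID, the same browser's queries come back together.
by_cookie = group_by(LOG, 1)

if __name__ == "__main__":
    print(by_ip)
    print(by_cookie)
```

In this toy log the browser tagged "740674ce2123e969" appears under three different IP addresses, yet its three queries form a single history once the cookie is used as the key.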

The Google privacy policy states: "When you use our services or view content provided by Google, we may automatically collect and store certain information in server logs. This may include: cookies that may uniquely identify your browser or your Google Account.

We use various technologies to collect and store information when you visit a Google service, and this may include sending one or more cookies or anonymous identifiers to your device. We also use cookies and anonymous identifiers when you interact with services we offer to our partners, such as advertising services or Google features that may appear on other sites. You may also set your browser to block all cookies, including cookies associated with our services, or to indicate when a cookie is being set by us. However, it's important to remember that many of our services may not function properly if your cookies are disabled. For example, we may not remember your language preferences." See Privacy Policy of Google. As a matter of fact, few users change their browser's default settings to reject cookies.

One of the major weaknesses of a cookie as a tracking device is the fact that it is accessible only by the web server that placed it on a user's computer. In other words, the Times of India cookie is read by the Times of India web site, but not by Yahoo or Wikipedia. You might therefore think of a cookie as a device that helps a host snoop on a guest inside the host's own house, but not in neighboring houses or public areas. However, this weakness has been overcome by Google in its takeover of advertising powerhouse DoubleClick. DoubleClick was the leading provider of Internet-based advertising, tracking users' behavior across cyberspace and placing advertising banners on web sites. The company is a long-time nemesis of privacy advocates. In February 2000, EPIC filed a complaint with the FTC alleging that DoubleClick was unlawfully tracking the online activities of Internet users and combining surfing records with detailed personal profiles contained in a national marketing database. The case ended in a settlement, pursuant to which DoubleClick undertook a series of commitments to improve its data collection practices, increase transparency and provide users with opt-out options. DoubleClick continues to utilize third-party cookies as well as its "DART" (Dynamic Advertising, Reporting, and Targeting) technology to track user activity across multiple web sites.

In its complaint to the FTC about the Google-DoubleClick merger, EPIC had alleged that by purchasing DoubleClick, Google expanded its ability to pervasively monitor users not only on its web site but also on cyberspace as a whole.

Third, much like IP addresses, cookies are arguably not "personal data" because they identify a specific browser (typically, a computer) as opposed to an individual person. Yet, if a cookie and related search log could be cross-referenced with an individual's name, the cookie itself would become personal data. Think of the cookie as a label on a "box of personal data" of an unnamed person, who is under investigation by an Investigating Officer. Typically, the label says something like "740674ce2123e969," and thus does not implicate anyone's privacy. Yet, once the Investigating Officer comes across the person's name, she immediately affixes it to the label, rendering the contents of the box "personal data." The box of personal data is of course analogous to a user's search log, and the Investigating Officer to Google. And there are plenty of instances in which Google comes across a user's real name. In addition to its search engine, Google provides users with a wide array of online services, many of which require registration using real name and e-mail address credentials. First and foremost is Gmail, the ubiquitous web-based e-mail service launched in April 2004 as a private beta release by invitation only and opened to the public in February 2007.

Gmail gained its prominence and notoriety by offering users a simple bargain: get an unprecedented amount of online storage space, and in return give Google the opportunity to scan your e-mails' contents and add context-sensitive advertisements to them. (When Gmail was initially launched in 2004 with 1GB of storage space, Hotmail, its leading competitor, provided users with 2MB, that is, about 0.2% of what Gmail gave.) The launch of Gmail turned out to be one of the most controversial product launches in the history of the Internet and placed Google at the center of a fierce privacy debate.

Privacy advocates criticized the precedent set by Google of eliminating a person's expectation of privacy in the contents of her communications, as well as the consequential violation of non-subscribers' privacy interests in their correspondence.

This blog post does not address the serious privacy issues raised by Gmail itself, but rather the synergetic privacy risk created by cross-referencing user search logs with information collected by Gmail as part of the registration process. In other words, registration to Gmail or additional Google services such as Google Talk (instant messaging service), Google Reader (RSS feeds), Google Calendar (a user's schedule), or Google Wallet (credit card/payment information for use on other sites) places the missing "name tag" on a user's search log, thereby rendering its contents highly combustible from a privacy perspective. Notice that cross-referencing user search logs with registration information is distinct from Google correlating search logs with users' e-mail contents, the prospect of which is an additional cause of concern for privacy advocates. It simply means Google can pick the name of a user off his or her registration form and attach it to a cookie, which serves as the key to that user's search log.

In other words, because Google uses the same cookie to maintain a particular user's search history and to identify her when she logs on to her Gmail account, the anonymous nature of the cookie is lost and the search log becomes sensitive personal data.
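The "name tag" step can be sketched in a few lines of Python. This is a hypothetical illustration only: the data, the name "A. Kumar" and the function `attach_names` are assumptions made for the example, not a description of Google's systems. The point is that once a cookie ID seen at sign-in also keys a search log, a simple lookup turns a pseudonymous log into a named one.

```python
# Hypothetical search logs keyed by cookie ID (all values invented).
search_logs = {
    "740674ce2123e969": ["goa weather", "tax lawyer mumbai"],
    "1f3a9c0d5e7b2a46": ["cricket scores"],
}

# Hypothetical registration records: the cookie ID seen when the user
# signed in, mapped to the name given on the registration form.
registrations = {
    "740674ce2123e969": "A. Kumar",
}


def attach_names(logs, names_by_cookie):
    """Split logs into those that can be named and those that stay pseudonymous."""
    named, unnamed = {}, {}
    for cookie_id, queries in logs.items():
        name = names_by_cookie.get(cookie_id)
        if name is None:
            unnamed[cookie_id] = queries
        else:
            named[name] = queries
    return named, unnamed


named_logs, unnamed_logs = attach_names(search_logs, registrations)

if __name__ == "__main__":
    print(named_logs)
    print(unnamed_logs)
```

Only the browser that signed in to a registered service gets a name attached; the other log stays labelled by its cookie ID alone, which is exactly the difference between the labelled and unlabelled box in the analogy above.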

Finally, even thoroughly anonymized search logs can be traced back to their originating user. This can be done by combing search queries for personal identifiers, such as PAN numbers or credit card details. It is made simpler still by the tendency of users to run "ego searches" (also known as "vanity searches" or "egosurfing"), the practice of searching for one's own name on Google (once, twice, or many times per day). In fact, in its effort to quash the government subpoena issued in Gonzales v. Google, Google itself posited that "search query contents can disclose identities and personally identifiable information such as user-initiated searches for their own social security or credit card numbers, or their mistakenly pasted but revealing text."
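How little effort such "combing" takes can be shown with a rough Python sketch. It rests on assumptions of my own: a PAN is matched as five letters, four digits and one letter; a card number is taken to be 13 to 19 digits that pass the Luhn checksum; and the sample queries and the function names `luhn_valid` and `find_identifiers` are invented for the illustration. A real scan would need more care, but the idea is the same.

```python
import re

# Assumed formats: PAN = 5 letters, 4 digits, 1 letter;
# card number = 13 to 19 digits, optionally separated by spaces or hyphens.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b", re.IGNORECASE)
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_identifiers(query):
    """List (kind, value) pairs for identifier-like strings found in a query."""
    hits = [("pan", m.group()) for m in PAN_PATTERN.finditer(query)]
    for m in CARD_PATTERN.finditer(query):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append(("card", digits))
    return hits


if __name__ == "__main__":
    # Invented sample queries; 4111 1111 1111 1111 is a widely used test number.
    for q in ["status of pan ABCDE1234F",
              "is 4111 1111 1111 1111 a valid card",
              "goa weather"]:
        print(q, "->", find_identifiers(q))
```

A single hit of this kind inside an otherwise "anonymous" log is enough to tie every other query in that log to one person.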

There is also Google Web History, of course, which provides consenting users with a personalized search experience linked to a personal account. Hence, Google Web History explicitly de-anonymizes one's search log.

While it is true that users may register for services such as Gmail with a false or pseudonymous name, I suspect few do. I use Gmail as my main e-mail account due to its geographic and chronological versatility (you do not have to change e-mail addresses each time you relocate or switch jobs) and storage space. I use my real name, since I would not want colleagues or friends to receive e-mails from "ADV" or "Prashant197" and have to guess that I am the sender.

To sum up, the contents of user search logs are clearly personal in nature. The question is whether such contents may be traced to a specific user. Google's ability to combine IP addresses, persistent cookies and user registration information renders the data in search logs not only personal but also personally identifiable.

I suggest that this be added to the definition of "personal data" in The Indian Privacy (Protection) Act, 2013, and also to the definition of "Personal Information" under The IT Act, 2000.

Adv. Prashant Mali is a renowned cyber law and cyber security expert lawyer based in Mumbai.