Google User Search Logs – Is it Personal Data or Information as per LAW?
Privacy concerns relate to personal information or personal data,
as defined in the IT Rules under the IT Act, 2000: “Personal information” means any information that relates
to a natural person, which, either directly or indirectly, in combination with
other information available or likely to be available with a body corporate, is
capable of identifying such person. Put simply, it is information which can be used to uniquely identify, contact, or
locate a specific individual. In the United States, federal privacy legislation protects
personal data in a number of contexts, such as health information, financial
data, or credit reports. Similarly, the European data protection framework
applies to "personal data," defined as "any
information relating to an identified or identifiable natural
person ('data subject'); an identifiable person is one who can be identified,
directly or indirectly, in particular by reference to an identification number
or to one or more factors specific to his physical, physiological, mental,
economic, cultural or social identity."
Information that cannot be linked to an individual person is not
problematic from a privacy standpoint. Imagine we have highly revealing data
about Sify user 200805, but we do not know, nor can we find out, who the user is.
Or suppose I tell you that X is a drug addict who searches for teen sex and
earns Rs. 80,000 a month, half of which is spent on online porn. Absent any
indication as to the identity of X, this information is meaningless from a
privacy perspective.
Do users' search logs constitute "personal data"?
Can the data in search logs be traced to specific individuals?
I show that they do and that it can, and that search logs therefore raise serious privacy problems.
First, as noted above, search engines log a user's queries under the user's IP
address. An IP address is a unique string of numbers assigned to a user's
computer by his or her Internet Service Provider (ISP) so that the computer can
communicate on the network. Simply put, it is the cyberspace equivalent
of a real space street address or phone number. An IP address may be dynamic,
meaning a different address is assigned to a user each time he or she logs on to
the network; or static, that is, assigned to a computer by an ISP to be its
permanent Internet address. The question of whether an IP address constitutes
"personal data" has been much debated in the EU. It is equivalent to
asking whether "Plot no. 435, Malabar Hill, Mumbai" or "919821763157"
constitutes personal data. The answer depends on whether the address might be
linked to an "identified or identifiable natural person" through reasonable
means. Clearly, a static address is more "personal" than a dynamic address;
and in either case, an address is more "personal" in the possession
of an ISP, which has the capacity to link it to a specific user's registration
information, than in the hands of other parties. The European data protection
watchdog, the Article 29 Working Party, has already opined that even dynamic IP
addresses constitute "personal data." It stated that "unless the
ISP is in a position to distinguish with absolute certainty that the data
correspond to users that cannot be identified, it will
have to treat all IP information as personal data, to be on the
safe side."
Consequently, even if Google could not link an IP address (and
therefore the associated search log) to a specific individual, the fact that ISPs have
such capability and that the government may order them to do so renders search
logs "personal data" for privacy purposes. It is the capacity to
link, not the actual linking, that makes the data personal.
Second, to overcome the difficulty of profiling users who access
search engines using a dynamic IP address, search engines set
"cookies" which tag users' browsers with unique identifying numbers. Such
cookies enable search engines to recognize a user as a recurring visitor to the
site and amass her search history, even if she connects to the Internet via a
different IP address. As a result of pressure by EU data protection
regulators, Google has already shortened the duration of its cookie,
which was initially set to expire in 2038, to a period of two
years after a user's last Google search. The privacy benefits of such a move
are doubtful, however, since as long as Google remains the Internet's leading
search engine, users are bound to renew the two-year period on a daily basis.
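To make the cookie mechanism concrete, here is a minimal sketch in Python. The log entries, IP addresses and field layout are invented for illustration and say nothing about how Google actually stores its logs. The sketch shows two things: queries sent from different IP addresses collapse into one history once they share a cookie identifier, and an expiry counted from the last search keeps moving forward with every new search.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical log entries: (timestamp, ip_address, cookie_id, query).
# The IP addresses are documentation-range placeholders, not real users.
LOG = [
    (datetime(2013, 1, 5), "203.0.113.7", "740674ce2123e969", "cheap flights mumbai"),
    (datetime(2013, 1, 6), "198.51.100.22", "740674ce2123e969", "visa requirements"),
    (datetime(2013, 1, 6), "198.51.100.22", "1f3a9c0d5b7e2468", "cricket scores"),
]

# Assumed lifetime: two years counted from the user's last search.
COOKIE_LIFETIME = timedelta(days=730)


def history_by_cookie(log):
    """Group queries by cookie ID, ignoring which IP address sent them."""
    histories = defaultdict(list)
    for _, _, cookie_id, query in log:
        histories[cookie_id].append(query)
    return dict(histories)


def cookie_expiry(log, cookie_id):
    """Expiry is measured from the most recent search, so it slides forward."""
    last_seen = max(ts for ts, _, cid, _ in log if cid == cookie_id)
    return last_seen + COOKIE_LIFETIME


if __name__ == "__main__":
    print(history_by_cookie(LOG)["740674ce2123e969"])
    print(cookie_expiry(LOG, "740674ce2123e969"))
```

The first cookie accumulates both queries even though they arrived from two different IP addresses, and its expiry date is recomputed from the latest search rather than from the date the cookie was first set.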
The Google privacy policy states:
"When you use our services or
view content provided by Google, we may automatically collect and store certain
information in server logs. This may include: cookies that may uniquely
identify your browser or your Google Account. We use
various technologies to collect and store information when you visit a Google
service, and this may include sending one or more cookies or anonymous
identifiers to your device. We also use cookies and anonymous identifiers
when you interact with services we offer to our partners, such as advertising
services or Google features that may appear on other sites. You may also set your browser to block all cookies, including
cookies associated with our services, or to indicate when a cookie is being set
by us. However, it’s important to remember that many of our services may not
function properly if your cookies are disabled. For example, we may not
remember your language preferences." See
Privacy Policy of Google. As a matter of fact, few users change their browser's default
settings to reject cookies.
One of the major weaknesses of a cookie as a tracking device is
the fact that it is accessible only by the web server that placed it on a
user's computer. In other words, the Times of India cookie is read by the Times
of India web site, but not by Yahoo or Wikipedia. You might therefore think of
a cookie as a device that helps a host snoop on a guest in the host's own house,
but not in neighboring houses or public areas. However, this weakness has been
overcome by Google through its takeover of advertising powerhouse DoubleClick. DoubleClick
was the leading provider of Internet-based advertising, tracking users'
behavior across cyberspace and placing advertising banners on web sites. The
company is a long-time nemesis of privacy advocates. In February 2000, EPIC
filed a complaint with the FTC alleging that DoubleClick was unlawfully
tracking the online activities of Internet users and combining surfing records
with detailed personal profiles contained in a national marketing database. The
case ended in a settlement, pursuant to which DoubleClick undertook a line of
commitments to improve its data collection practices, increase transparency and
provide users with opt-out options. DoubleClick continues to utilize
third-party cookies as well as its "DART" (Dynamic Advertising, Reporting,
and Targeting) technology to track user activity across multiple web sites.
In its complaint to the FTC about the Google-DoubleClick merger, EPIC
alleged that by purchasing DoubleClick, Google expanded its
ability to pervasively monitor users not only on its web site but also across
cyberspace as a whole.
Third, much like IP addresses, cookies are arguably not
"personal data" because they identify a specific browser (typically,
a computer) as opposed to an individual person. Yet, if a cookie and related
search log could be cross-referenced with an individual's name, the cookie
itself would become personal data. Think of the cookie as a label on
a "box of personal data" of an unnamed person, who is
under investigation by an Investigating Officer. Typically, the label says
something like "740674ce2123e969," and thus does not implicate
anyone's privacy. Yet, once the Investigating Officer comes across the person's
name, she immediately affixes it to the label, rendering the
contents of the box "personal data." The box of personal
data is of course analogous to a user's search log and Google to the Investigating
Officer. And there are plenty of instances in which Google comes across a
user's real name. In addition to its search engine, Google provides users with
a wide array of online services, many of which require registration using real
name and e-mail address credentials. First and foremost
is Gmail, the ubiquitous web-based e-mail service
launched in April 2004 as a private beta release by invitation only and opened
to the public in February 2007.
Gmail gained its prominence and notoriety by offering users a simple
bargain: get an unprecedented amount of online storage space, and in return
give Google the opportunity to scan your e-mails' contents and add to
them context-sensitive advertisements. [When
Gmail was initially launched in 2004 with 1GB of storage space, Hotmail, its
leading competitor, provided users with 2MB (that is, 0.2% of what Gmail gave).]
The launch of Gmail turned out to be one
of the most controversial product launches in the history of the Internet and
placed Google at the center of a fierce privacy debate.
Privacy advocates criticized the precedent set by Google of
eliminating a person's expectation of privacy in the contents of her
communications, as well as the consequential violation of non-subscribers'
privacy interests in their correspondence.
This post does not address the serious privacy issues raised by
Gmail itself, but rather the synergetic privacy risk created by
cross-referencing user search logs with information collected by Gmail as part
of the registration process. In other words, registration for Gmail or additional
Google services such as Google Talk (instant messaging), Google Reader
(RSS feeds), Google Calendar (a user's schedule),
or Google Wallet (credit card and payment information for use on other
sites) places the missing "name tag" on a user's search log, thereby
rendering its contents highly combustible from a privacy perspective. Notice that
cross-referencing user search logs with registration information is distinct
from Google correlating search logs with users' e-mail contents, the prospect
of which is an additional cause of concern for privacy advocates. It simply
means Google can pick the name of a user off his or her registration form and
attach it to a cookie, which serves as the key to the user's search log.
In other words, because Google uses the same cookie to maintain a
particular user's search history and to identify her when she logs on to her
Gmail account, the anonymous nature of the cookie is lost and the search log
becomes sensitive personal data.
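The "name tag" step can be sketched as a simple join on the cookie identifier. Everything below is hypothetical: the record layout, the sample queries, and the name "A. User" are made up for illustration, and nothing here describes Google's internal systems. The point is only that once a registration record carries the same cookie ID as a search log, attaching a name takes one lookup.

```python
# Hypothetical search logs keyed by cookie ID (the "unlabelled boxes").
SEARCH_LOGS = {
    "740674ce2123e969": ["symptoms of diabetes", "divorce lawyer mumbai"],
    "1f3a9c0d5b7e2468": ["cricket scores"],
}

# Hypothetical registration records that happen to carry the same cookie ID.
REGISTRATIONS = [
    {"cookie_id": "740674ce2123e969", "name": "A. User", "email": "a.user@example.com"},
]


def attach_names(search_logs, registrations):
    """Return search histories keyed by real name wherever a cookie ID matches."""
    named = {}
    for reg in registrations:
        queries = search_logs.get(reg["cookie_id"])
        if queries is not None:
            named[reg["name"]] = list(queries)
    return named


if __name__ == "__main__":
    print(attach_names(SEARCH_LOGS, REGISTRATIONS))
```

The second cookie stays anonymous in this sketch only because no registration record refers to it; the moment one does, that log acquires a name as well.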
Finally, even thoroughly anonymized search logs can be traced back
to their originating user. This can be done by combing search queries for
personal identifiers, such as PAN numbers or credit card details. It
is made simpler still by the tendency of users to run "ego
searches" (also known as "vanity searches" or
"egosurfing"), the practice of searching for one's own name on Google
(once, twice, or many times per day). In fact, in its effort to quash the
government subpoena issued in Gonzales v. Google, Google itself posited that "search query contents can
disclose identities and personally identifiable information such as
user-initiated searches for their own social security or credit card
numbers, or their mistakenly pasted but revealing text."
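Combing queries for identifiers of this kind requires very little machinery. The sketch below is an illustration under stated assumptions, not a description of any tool actually used on search logs: it assumes a PAN has the form of five letters, four digits and one letter, and it treats a run of 13 to 19 digits as a card number only if it passes the Luhn checksum. The function names are my own, and the card number in the usage example is a widely published test number, not a real account.

```python
import re

# Assumed PAN shape: five letters, four digits, one letter (e.g. ABCDE1234F).
PAN_RE = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Luhn checksum: double every second digit counting from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_identifiers(query):
    """Return (kind, value) pairs for PAN-like and card-like strings in a query."""
    hits = [("pan", m.group()) for m in PAN_RE.finditer(query.upper())]
    for m in CARD_RE.finditer(query):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append(("card", digits))
    return hits


if __name__ == "__main__":
    print(find_identifiers("pan status abcde1234f"))
    print(find_identifiers("is 4111 1111 1111 1111 a valid card"))
```

A single hit of this kind in an otherwise "anonymized" log is enough to tie every other query under the same cookie to one person.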
There is also Google Web History, of course, which provides
consenting users with a personalized search experience linked to a personal
account. Hence, Google Web History explicitly de-anonymizes one's search log.
While it is true that users may register for services such as
Gmail with a false or pseudonymous name, I suspect few do. I use Gmail as my
main e-mail account due to its geographic and chronological versatility (you do
not have to change e-mail addresses each time you relocate or switch jobs) and storage
space. I use my real name, since I would not want colleagues or friends to
receive e-mails from "ADV" or "Prashant197" and have to guess that
I am the sender.
To sum up, the contents of user search logs are clearly personal in
nature. The question is whether such contents may be traced to a specific user.
Google's ability to combine IP addresses, persistent cookies and user
registration information renders the data in search logs not only personal but
also personally identifiable.
I suggest that search logs be added to the definition of “personal data”
in The Indian Privacy (Protection) Act, 2013 and also to the definition of “Personal
Information” under The IT Act, 2000.
Adv. Prashant Mali is a renowned cyber law and cyber security expert lawyer based in Mumbai.