There’s nothing wrong with saying that you “own” data. Public Knowledge has supported data ownership as a colloquialism that reflects an intuition: Data about us provides information regarding the intimacies of our very identity and existence. Speaking in this way, we should certainly “own” or have control over that data to protect our fundamental right to privacy.
But it’s a different matter to say that the law should treat data as property, as a thing to be owned, in the same sense as a car, a bag of chips, or copyright. Such a data ownership regime is not practicable, and would not protect individual privacy as effectively as any number of other approaches.
The idea of data ownership is not new, but it has recently seen some airtime in the debate around federal privacy legislation. On Capitol Hill, senators have expressed interest in data ownership as a way to give individuals control over their personal data and to compensate them for commercial uses of that data. This conception of data ownership is not a figure of speech where “ownership” is used to express the principle of user control. Rather, the issue here is an individual’s right to literally own personal data in the same way that an individual can own other forms of personal property. While ownership is an increasingly foreign concept in the world of consumer goods, where you are much more likely to license the thing you “buy” from a vendor, many commercial data practices already exchange data as if it were some kind of personal property.
Treating data as “property” in the legal sense raises some puzzling questions. For example, you might not want Facebook to use certain data it knows about you in various ways, and a property right in that data might be a way to accomplish this. But what if your spouse knows the same things about you? And what if he or she tells your mother-in-law? Is that trespassing? And can you use the traditional tools of property rights enforcement to put a stop to it? Does it infringe on your property rights for third parties to learn something about you by happenstance, or as a result of your interactions with them, or is it just the use of data about you that you are concerned with?
It is the nature of property rights that they apply against everyone. If there is some right you want that applies to entities like Facebook or AT&T, but not to your cousin or anyone else, it’s not “property” as traditionally understood. Property is widely understood to be an exclusive right given to individuals against everyone, not one that conditionally applies sometimes against some other parties but not other times and not against everyone. (That’s not to say that a property owner can’t choose who to give permission to, and under what circumstances.) The actual goals of most privacy advocates and ordinary people who want greater privacy protections have little to do with the legal rights and tools that “property” systems provide.
The purpose of this blog post is to illustrate in more depth a few reasons why it is problematic to use data ownership as the foundation of comprehensive federal privacy legislation. To the extent that data ownership even addresses the privacy problem, a tenuous connection, it should not be grounded in copyright law, and new (sui generis) data ownership rights are likely to create a practical and legal mess that will not meaningfully protect consumer privacy. Privacy is a basic consumer protection issue best resolved through comprehensive federal privacy legislation. To achieve the worthy goal of data sharing to promote competition or scientific research, lawmakers should instead look at imposing data portability and interoperability mandates on certain online platforms to give users true choice and control over what to do with their data.
What Is Personal Data Anyway?[1]
Before we dive in, it might be useful to think about what types of data exist. Or, to put it another way, if we had the right(s) to own our own data, what would we own? In his book, “The Data Revolution,” Rob Kitchin notes that data can be broadly categorized as representative data (like a person’s name and age), implied data (data that exists in the absence of other data), and derived data (such as data that is created through artificial intelligence or other algorithm-driven processing). These are important distinctions because personal privacy in the online world is threatened by a host of data processing activities that are driven by automated decision making, most of which takes place “behind the scenes” of the user experience.
While personal data will typically fall into the representative category, implied and derived data also provides economically valuable information about a person, often at the expense of individual privacy, and it certainly should fall within the scope of personal data. It’s important to note that any data that is related to individuals and groups will reflect our inherently flawed human behavior, including biases and conflicting norms. We’re seeing this play out with artificial intelligence (AI), where predictive analytics has generated outcomes that are biased against marginalized and minority communities.
Further, personal data has some important characteristics that impact potential ownership frameworks. First, unlike physical goods, data is non-rivalrous, meaning that more than one entity can possess the same data at the same time. Second, it is generally non-excludable, meaning that it is easily shared, so restrictions and limitations on sharing must be imposed to avoid widespread dissemination of data. This is particularly relevant with digital data on the internet. Third, data can be reproduced with often negligible or zero marginal cost, meaning it’s usually very cheap to create copies of data.
What Does This Have to Do With Privacy?
Proponents of data ownership argue that property rights in personal information allow consumers to retain control over how information about them is used because negotiating such rights through private contracts could (in theory) allow an individual to limit corporate (and government) uses of personal data. They also argue that contracting could facilitate compensation arrangements under which consumers may sell or lease their data for commercial uses, instead of the current system where you receive ad-supported services in exchange for use of your data.
This overlooks the fact that the asymmetric information and power imbalances that plague the current data ecosystem would persist under a data ownership regime. Individuals would not have the information to understand what they are selling, or the bargaining power to get a fair price. Aside from the means of compensation, it’s hard to see how this is any different from the current failed “notice and choice” privacy regime. Consumers are already faced with the impossible task of reading and understanding countless privacy policies (read: contracts) that outline the scope of how their information is used by the companies that profit off of data, many of which we don’t have any direct contact with. Note that individuals have zero leverage to negotiate these privacy policies and terms of use. Would this change under a data ownership regime? More on this later…but the short answer is, almost certainly, no.
If the goal of data ownership is simply to get paid in money for your data rather than with a service, this looks a lot more like some kind of federal statutory personality or publicity right (a mechanism whereby you may monetize your identity) than a privacy right. Discussing the merits of creating a federal right of publicity, for example through the Lanham Act, which governs trademark protection, is outside the scope of this post. In general, however, Congress should not be creating incentives for individuals to accept payment in exchange for signing over their personal data, which could include incredibly privacy-invasive information (such as biometric, health, and precise geolocation data) as well as seemingly non-sensitive information that could be used by trained algorithms to infer intimate information. Such arrangements could lead to disparate impacts on members of low-income and other marginalized communities, who might not have the privilege of selling or leasing their data sparingly. We can be confident that pay-for-surveillance will be popular among data-hungry businesses. Companies have been more than happy to pay users, including teens, to collect data on them, and they have the leverage to change the terms of the contracts at their whim, almost always to the detriment of users.
Keep Copyright Law out of This
Copyright law often gets implicated in discussions surrounding data ownership, typically because creative works share the non-rivalrous and non-excludable characteristics of data. But even if personal data were covered under copyright (which it is not), the policy goals of copyright differ from those of privacy in important ways, making copyright the wrong approach to a privacy law based on data ownership.
To be clear, you likely do not, and definitely should not, have ownership in your personal data under copyright law. Under the Copyright Act, copyright protection exists, “in original works of authorship fixed in any tangible medium of expression,” but, “in no case does copyright protection for an original work of authorship extend to any idea, procedure, process, system, method of operation, concept, principle, or discovery.” Common sense tells us that facts about a person in the form of data points are not original works of authorship, and indeed it’s well-settled as a matter of law that facts are in the public domain.
Note also that copyright protection doesn’t extend to procedures, processes, or systems. But what about the implied or derived data that is created by those systems? Even assuming that the algorithm or AI can be an author, a dubious proposition to say the least, is the data or data set that is created sufficiently original to receive copyright protection? Courts that have grappled with these or similar questions have held that data created by committee or machine is not eligible for copyright protection. This is how it should be. In no way should the law be changed to take facts out of the public domain, even if those facts are generated by AI. A robust public domain is critical to provide raw materials for the creation of new knowledge and to fulfill the constitutional purpose of copyright, “to promote the progress of science and useful arts.”
Despite the shared characteristics of data and creative works as outlined above, the policy goal of copyright is to incentivize artists and creators to create new works and to give them control over the commercialization of their work. Contrast this goal with that of privacy, which is to protect individual privacy rights from the panoply of harms that arise through unauthorized uses of personal information. The goal is not to incentivize the creation of more data; it is to protect an individual’s privacy interests in personal data that already exists or could be created through implication or inference later. This fundamental distinction makes copyright the wrong regime for privacy protection.
A New Personal Data Property Right Would Do Little (If Anything) to Change the Status Quo
A personal data economy already operates in which data is treated in a manner similar to personal property like tradable financial assets. In an article published last year in the Boston College Law Review, Professor Stacy-Ann Elvy illustrated how both privacy policies and financial frameworks like Article 9 of the Uniform Commercial Code and the Bankruptcy Code commodify consumer data with detrimental effects. For example, an Internet of Things (IoT) device company that collects a vast trove of biometric and precise location data from its customers can use such a database as collateral to finance its operations, and in the event of a default, a lender can generally do what it wants with that database to satisfy the debtor’s obligations. Such aggregate data is valuable on the secondary market and could readily find its way into the hands of bad actors like predatory lenders or stalkers. It might also be used to train AI to identify new members of a group based on seemingly unrelated information. Even if the data is de-identified, such data (including metadata) may be re-identified.
This personal data economy could be organized around data ownership. While property rights in personal data don’t exist under U.S. law, lawmakers have the power to create them from whole cloth. In fact, the European Union’s General Data Protection Regulation and the California Consumer Privacy Act provide a right to data portability, which operates as a quasi-ownership right. Despite this, a number of reasons cut against adopting data ownership to protect user privacy.
First, data ownership presents thorny questions of law, not the least of which is determining questions of ownership — particularly when multiple potential authors are implicated. Take, for example, biometric data related to a person who has had reconstructive surgery on his face. If a company seeks to purchase this data for AI training on facial recognition, who owns the data? Is the doctor a co-owner? When Cambridge Analytica created psychographic profiles of users, would they have partial ownership over those profiles? How does the ownership get divvied up? Questions of ownership are complex and highly contentious in all areas of the law, but if you need a related example, look no further than copyright law where the issue has been highly litigated. Any such issues in a data ownership regime would be significantly amplified in our connected world of IoT where over 2.5 quintillion bytes of data are generated each day.
Second, the information and transaction costs involved in administering the data ownership economy would be staggering. Imagine having to negotiate individual agreements for every exchange of data that affects you online. Do I have to obtain a license from an athlete to use his home run and batting average statistics in my fantasy baseball league? How could I be expected to oversee every microtransaction that involves my data? And how could I ensure that further transactions weren’t taking place among third, fourth, and nth parties? This is where the non-rivalrous and non-excludable characteristics of data really rear their ugly heads. It’s an unworkable situation that borders on the comical when we fold in the reality of multiple data owners described above.
More importantly, these negotiations would often entail the same significant information asymmetries and power imbalances that leave consumers with no choice but to “consent” to the sharing of their data in our current online ecosystem. The data ownership contracts would probably look a lot like this license “agreement” that Amazon’s Ring doorbell users enter into, signing away the copyright they have in their images:
You hereby grant Ring and its licensees an unlimited, irrevocable, fully paid and royalty-free, perpetual, worldwide right to re-use, distribute, store, delete, translate, copy, modify, display, sell, create derivative works from and otherwise exploit such Shared Content for any purpose and in any media formats in any media channels without compensation to you. You shall not use […]
Further, the actual value of data is an open question, and determining such value (or getting such value wrong) for purposes of negotiation imposes costs on the contracting parties. Care must be taken when addressing questions of data valuation because it risks transforming privacy, a human rights issue, into an economic exercise. Senators Mark Warner and Josh Hawley have recently introduced the DASHBOARD Act, which requires commercial data operators to file an annual report on the aggregate value of user data that they have collected. Such transparency reporting requirements can help consumers and regulators better understand the value proposition that data operators offer to their users, but they should not be used to justify a data ownership regime, or its mirror image: a pay-for-privacy regime.
Data is important and can be used in myriad ways that benefit society and the public interest. To achieve the worthy goal of data sharing to promote competition or scientific research, data portability and interoperability mandates on certain online platforms are the best policy solution. Such mandates must, however, be integrated into a robust consumer protection privacy regime, starting with comprehensive federal privacy legislation.
[1] The author is aware that “data” is a plural noun, but he just can’t use it that way—just can’t. Apologies to the grammar purists.