Get to Know Me: Protecting Privacy and Autonomy under Big Data's Penetrating Gaze
Sheri B. Pan
TABLE OF CONTENTS

I. INTRODUCTION
II. CURRENT CONCEPTIONS OF PERSONAL INFORMATION
   A. Privacy Theories
   B. Privacy Laws
   C. Privacy Policies
III. CHARACTERISTICS OF BIG DATA
   A. Data Collection Is Constant and Imperceptible
   B. New Insights Are Generated
   C. Inferred Information Is Often Sensitive
   D. Discovered Correlations Are Unexpected
IV. PRIVACY AND AUTONOMY HARMS FROM BIG DATA
   A. Use Harms
   B. Non-Use Harms
      1. Learning Private Information
      2. Limiting Autonomy
      3. Impeding Anonymity
      4. Eroding Belief in Human Agency
V. ASSESSING ALGORITHMS AND HARMS
VI. CONCLUSION

I. INTRODUCTION

Big data, the storage and analysis of large datasets, now affects everyday life. (1) It personalizes ads, calculates criminal sentences, and predicts criminal activity or, recast in a different light, constructs filter bubbles, (2) violates rights of procedural due process, and enables police departments to target communities on a discriminatory basis. (3) Both the benefits and the dangers of big data applications have been widely discussed in popular discourse and legal literature. (4) But before big data can be used by companies and governments to provide services or make decisions, it must first derive inferences about the people within datasets. It compiles, analyzes, evaluates, and predicts a person's actions and attributes, all before those conclusions are used for a business or state purpose. Current privacy discussions are predominantly concerned with how inferred information is used. (5) This Note, however, proposes that the process of analyzing data to infer information about people also threatens their privacy and autonomy interests.
This Note proceeds in four parts: Part II summarizes current academic, legal, and industry conceptions of informational privacy and argues that they have failed to consider the harm potentially posed by big data's capability of inferring new personal information; Part III considers the novel and unique characteristics of big data collection and analytics; Part IV discusses how big data threatens privacy and autonomy interests by making inferential conclusions about people's attributes and conduct, even if those conclusions are never used; and Part V proposes a framework to differentiate between innocuous and harmful data analysis. The framework states that a data-mining algorithm violates privacy and autonomy interests if: (1) it relies on an unexpected correlation between data points, (2) it infers personal information of a particularly sensitive nature, and (3) generating the inference breaches contextual integrity.

II. CURRENT CONCEPTIONS OF PERSONAL INFORMATION

Privacy has traditionally been difficult to define and regulate. Despite disagreement over how best to treat the issue, privacy theories, privacy law, and privacy policies share a characteristic in common: conceptualizing personal information as static pieces of knowledge about someone. Part II makes this observation by examining theories of privacy, privacy laws, and privacy policies.

A. Privacy Theories

A fundamental theory of privacy defines privacy as control over personal information. In his seminal book on privacy, privacy scholar Alan Westin articulates the control theory as "the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others." (6) Legal scholar Arthur Miller writes that privacy is "the individual's ability to control the circulation of information relating to him."
(7) In other words, the privacy-as-control perspective concludes that a person maintains privacy when she can decide how her information is collected, shared, used, retained, or otherwise manipulated. Before big data, maintaining control over the data one shared with others necessarily meant controlling one's personal information. If a viewer voluntarily gave Netflix her ratings of certain movies and decided how Netflix could share, use, and retain those ratings, she maintained control over her information. …