add intro before differential privacy

fria 2025-07-07 10:39:55 -05:00 committed by GitHub
parent 4a6a15d213
commit 9a4f49a2b3
GPG key ID: B5690EEEBB952194


@ -29,6 +29,12 @@ Obviously, being able to identify individuals based on publicly available data i
### Before Differential Privacy
Being able to collect aggregate data is essential for research. It's what the U.S. Census does every 10 years.
Usually we're more interested in the data as a whole than in the data of individual people, since the aggregate can show trends and overall patterns in groups of people. However, in order to get that data, we must collect it from individuals.
It was thought at first that simply removing names and other obviously identifying details from the data was enough to prevent re-identification, but [Latanya Sweeney](https://latanyasweeney.org/JLME.pdf) (a name that will pop up a few more times) proved in 1997 that even without names, a significant portion of individuals can be re-identified from a dataset by cross-referencing external data.
Previous attempts at anonymizing data have been highly vulnerable to reidentification attacks.
#### AOL Search Log Release
@ -43,6 +49,8 @@ Analyst [Nathan Ruser](https://x.com/Nrg8000/status/957318498102865920) indicate
It was also possible to [deanonymize](https://steveloughran.blogspot.com/2018/01/advanced-denanonymization-through-strava.html) individual users in some circumstances.
####
### Dawn of Differential Privacy
Most of the concepts I write about seem to come from the '70s and '80s, but differential privacy is a relatively new concept. It was first introduced in a 2006 paper called [*Calibrating Noise to Sensitivity in Private Data Analysis*](https://desfontain.es/PDFs/PhD/CalibratingNoiseToSensitivityInPrivateDataAnalysis.pdf).