— Cathy Hills (@fleurhills) February 22, 2017
In this discussion and report on privacy, anonymity and big data gathered from MOOC participation, the authors state that
It is impossible to anonymize identifiable data without the possibility of affecting some future analysis in some way.
De-identifying or anonymizing student data compromises the integrity of the resulting dataset. The authors suggest two strategies for resolution: either ‘differential privacy’, which ‘hides’ individual contributions in a database that can still be interrogated statistically, or leaving the original data intact and controlling its use:
Realizing the potential of open data in social science requires a new paradigm for the protection of student privacy: either a technological solution such as differential privacy, which separates analysis from possession of the data, or a policy-based solution that allows open access to possibly re-identifiable data while policing the uses of the data.
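To make the first option concrete, here is a minimal sketch of the core idea behind differential privacy, using the standard Laplace mechanism on a counting query. The dataset, the `private_count` helper and the epsilon value are all hypothetical illustrations, not anything from the report; the point is only that an analyst receives a statistically useful answer without possessing the raw records.

```python
import math
import random

def laplace_sample(scale):
    """Draw one sample from a Laplace(0, scale) distribution via inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (adding or removing one student changes it
    by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon)

# Hypothetical course-log records: (student_id, completed_course).
# Exactly 100 of the 300 students completed the course.
records = [(i, i % 3 == 0) for i in range(300)]

# The analyst asks "how many students completed?" and gets a noisy answer
# close to 100, while no individual record is ever released.
noisy_answer = private_count(records, lambda r: r[1], epsilon=0.5)
```

The trade-off the authors describe is visible here: a smaller epsilon gives stronger privacy but noisier answers, which is exactly the sense in which anonymization can "affect some future analysis".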