Parallel Algorithms and Applications in Data Science
School of Computer Science
Carleton University, Ottawa, Canada
Privacy-Preserving Data Publishing (PPDP) is an active field of research concerned with de-identifying data, with minimal information loss, so that it can be shared for secondary uses such as analytics and health care research.
Balancing data utility against data privacy is a challenging problem. This research project intends to analyze the implementation of the Top-Down Specialization anonymization algorithm on a Spark™ cluster. Top-Down Specialization is a technique that starts with every value at its most general level and specializes values step by step, stopping just before a further specialization would violate k-anonymity.
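The specialization loop described above can be illustrated with a minimal single-machine sketch in Python. It is not the Spark implementation analyzed in this project. The taxonomy, the attribute values, and all function names are hypothetical, and the sketch accepts the first specialization that keeps the data k-anonymous rather than ranking candidates by a score, as a full Top-Down Specialization implementation would typically do.

```python
from collections import Counter

# Hypothetical taxonomy for one quasi-identifying attribute: parent -> children.
TAXONOMY = {
    "Any": ["Professional", "Manual"],
    "Professional": ["Engineer", "Lawyer"],
    "Manual": ["Plumber", "Welder"],
}

def leaves_under(node):
    """Return the set of leaf values covered by a taxonomy node."""
    children = TAXONOMY.get(node)
    if not children:
        return {node}
    covered = set()
    for child in children:
        covered |= leaves_under(child)
    return covered

def generalize(values, cut):
    """Map each raw value to the node in `cut` that covers it."""
    lookup = {leaf: node for node in cut for leaf in leaves_under(node)}
    return [lookup[v] for v in values]

def is_k_anonymous(generalized, k):
    """Every generalized value must be shared by at least k records."""
    return all(count >= k for count in Counter(generalized).values())

def top_down_specialize(values, k, root="Any"):
    """Specialize from the root for as long as the result stays k-anonymous."""
    cut = [root]
    changed = True
    while changed:
        changed = False
        for node in cut:
            children = TAXONOMY.get(node)
            if not children:
                continue  # already a leaf, nothing left to specialize
            candidate = [n for n in cut if n != node] + children
            if is_k_anonymous(generalize(values, candidate), k):
                cut = candidate  # accept the specialization and rescan
                changed = True
                break
    return sorted(cut)

# Hypothetical data: 3 engineers, 1 lawyer, 2 plumbers, 2 welders.
jobs = ["Engineer"] * 3 + ["Lawyer"] + ["Plumber"] * 2 + ["Welder"] * 2
print(top_down_specialize(jobs, k=2))
```

With k = 2, "Manual" can be split into "Plumber" and "Welder", but "Professional" cannot be split because the single "Lawyer" record would fall into a group smaller than k.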
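Top-Down Specialization is one of the methods recommended for anonymizing datasets that must satisfy a large value of k. The larger the value of k, the more anonymous the dataset is, because each record is then indistinguishable from at least k - 1 other records on its quasi-identifying attributes.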
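As a small illustration of what the value of k measures, the following Python sketch computes the largest k that a table satisfies: the size of the smallest group of records sharing the same combination of quasi-identifying attribute values. The function name, column names, and records are hypothetical and chosen only for illustration.

```python
from collections import Counter

def anonymity_level(records, quasi_identifiers):
    """Return the largest k for which `records` is k-anonymous."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    # The smallest group bounds how well any single record is hidden.
    return min(groups.values()) if groups else 0

# Hypothetical, already generalized table.
table = [
    {"age": "30-39", "job": "Professional", "diagnosis": "flu"},
    {"age": "30-39", "job": "Professional", "diagnosis": "asthma"},
    {"age": "30-39", "job": "Professional", "diagnosis": "flu"},
    {"age": "40-49", "job": "Manual", "diagnosis": "diabetes"},
    {"age": "40-49", "job": "Manual", "diagnosis": "flu"},
]
print(anonymity_level(table, ["age", "job"]))
```

Here the smallest group contains two records, so the table is 2-anonymous but not 3-anonymous with respect to "age" and "job".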