Project Title: Top-Down Specialization on Apache Spark™

Name: Macarious Abadeer

School of Computer Science
Carleton University, Ottawa, Canada

Project Outline

Privacy-Preserving Data Publishing (PPDP) is an active field of research concerned with de-identifying data so that it can be shared for secondary uses, such as analytics and health care research, while minimizing information loss.
Balancing data utility against data privacy is a challenging problem. This research project intends to analyze the implementation of the Top-Down Specialization anonymization algorithm on a Spark™ cluster. Top-Down Specialization starts with every value generalized to the most general level of its taxonomy and iteratively specializes values, stopping when any further specialization would violate k-anonymity.
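Top-Down Specialization is one of the methods recommended for anonymizing datasets that require a large k. A dataset is k-anonymous when each record is indistinguishable from at least k - 1 other records on its quasi-identifying attributes, so a larger k gives stronger anonymity, typically at the cost of greater information loss.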
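The idea can be illustrated with a minimal single-machine sketch in plain Python. It is not the Spark implementation this project will analyze, and it is not the algorithm from the reference paper: the taxonomies, the records, and all function names are hypothetical, and the sketch applies the first specialization that keeps the table k-anonymous rather than ranking candidates by a score.

```python
from collections import Counter

# Hypothetical taxonomies, stored as child -> parent. "Any" is the root.
PARENT = {
    "job": {"Engineer": "Technical", "Analyst": "Technical",
            "Nurse": "Health", "Doctor": "Health",
            "Technical": "Any", "Health": "Any"},
    "age": {"25": "Under40", "31": "Under40",
            "47": "40Plus", "52": "40Plus",
            "Under40": "Any", "40Plus": "Any"},
}

# Hypothetical raw records (quasi-identifiers only).
RECORDS = [
    {"job": "Engineer", "age": "25"},
    {"job": "Analyst", "age": "31"},
    {"job": "Nurse", "age": "47"},
    {"job": "Doctor", "age": "52"},
]


def children(attr, node):
    """Return the direct children of a taxonomy node."""
    return [c for c, p in PARENT[attr].items() if p == node]


def generalize(attr, value, cut):
    """Walk up the taxonomy until the value reaches a node in the cut."""
    while value not in cut:
        value = PARENT[attr][value]
    return value


def anonymized_view(records, cuts):
    """Generalize every record to the current cuts (attributes in sorted order)."""
    return [tuple(generalize(a, r[a], cuts[a]) for a in sorted(cuts))
            for r in records]


def is_k_anonymous(rows, k):
    """Every combination of generalized values must occur at least k times."""
    return all(count >= k for count in Counter(rows).values())


def top_down_specialize(records, k):
    """Start fully generalized; specialize while k-anonymity still holds."""
    cuts = {attr: {"Any"} for attr in PARENT}
    progress = True
    while progress:
        progress = False
        for attr in sorted(cuts):
            for node in sorted(cuts[attr]):
                kids = children(attr, node)
                if not kids:
                    continue  # leaf value, cannot be specialized further
                trial = dict(cuts)
                trial[attr] = (cuts[attr] - {node}) | set(kids)
                if is_k_anonymous(anonymized_view(records, trial), k):
                    cuts = trial  # accept the first valid specialization
                    progress = True
                    break
            if progress:
                break
    return cuts, anonymized_view(records, cuts)


if __name__ == "__main__":
    for k in (2, 3):
        _, rows = top_down_specialize(RECORDS, k)
        print(k, rows)
```

With these four records, k = 2 allows both attributes to be specialized one level, while k = 3 leaves every value at "Any", which shows how a larger k trades utility for anonymity.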

Startup Reference Paper(s)

U. Sopaoglu and O. Abul. A top-down k-anonymization implementation for Apache Spark. In 2017 IEEE International Conference on Big Data (Big Data), pages 4513–4521, December 2017.

Deliverables

Slide Presentation (*.pdf | *.pptx)
Final Paper
Code and Data (*.zip)

COMP 5704