Finding the balance between privacy protection and data sharing is one of the main challenges in managing human genomic data. Novel privacy-enhancing technologies are required to address the known disclosure threats to sensitive personal genomic data without precluding data sharing. In this paper, we propose a method that systematically detects privacy-sensitive DNA segments coming directly from an input stream, using as reference a knowledge database of known privacy-sensitive nucleic and amino acid sequences. We show that adding our detection method to standard security techniques provides a robust, efficient privacy-preserving solution that neutralizes threats related to recently published attacks on genome privacy based on short tandem repeats, disease-related genes, and genomic variations. Current global knowledge of human genomes shows that a comprehensive database can be obtained immediately with our approach, and that this database can evolve automatically to address future attacks as new privacy-sensitive sequences are identified. Additionally, we validate that the detection method can be fitted inline with the next-generation sequencing (NGS) production cycle by using Bloom filters and scaling out to faster sequencing machines.
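The Bloom-filter detection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the k-mer length `k`, the filter size, the number of hash functions, the SHA-256 double-hashing scheme, and the names `BloomFilter`, `build_filter`, and `is_sensitive` are all assumptions chosen for illustration. The idea is to index every k-mer of the known sensitive sequences in the filter, then flag any read from the input stream that contains an indexed k-mer.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; bit positions come from double hashing a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filter(sensitive_sequences, k, num_bits=1 << 16, num_hashes=4):
    """Index every k-mer of the known sensitive sequences (sizes are illustrative)."""
    bf = BloomFilter(num_bits, num_hashes)
    for seq in sensitive_sequences:
        for i in range(len(seq) - k + 1):
            bf.add(seq[i:i + k])
    return bf


def is_sensitive(read: str, bf: BloomFilter, k: int) -> bool:
    """Flag a read if any of its k-mers may be in the sensitive set.

    False positives are possible; false negatives are not.
    """
    return any(read[i:i + k] in bf for i in range(len(read) - k + 1))


# Hypothetical example: a short tandem repeat treated as a sensitive sequence.
K = 12
bf = build_filter(["GATAGATAGATAGATA"], K)
flagged = is_sensitive("CCGATAGATAGATAGATACC", bf, K)  # True: the read contains indexed k-mers
```

A Bloom filter never misses an indexed k-mer but may occasionally flag a harmless read, so in a privacy setting its errors fall on the conservative side: some non-sensitive reads may be protected unnecessarily, while known sensitive ones are not let through.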
|Original language||English (US)|
|Title of host publication||WPES 2015 - Proceedings of the 2015 ACM Workshop on Privacy in the Electronic Society, co-located with CCS 2015|
|Publisher||Association for Computing Machinery, Inc|
|Number of pages||10|
|State||Published - Oct 12 2015|