Challenges in the Multivariate Analysis of Mass Cytometry Data: The Effect of Randomization

Georgios Papoutsoglou, Vincenzo Lagani, Angelika Schmidt, Konstantinos Tsirlis, David Gomez-Cabrero, Jesper Tegner, Ioannis Tsamardinos

Research output: Contribution to journal, Article, peer-review

2 Scopus citations

Abstract

Cytometry by time-of-flight (CyTOF) has emerged as a high-throughput single-cell technology able to provide large samples of protein readouts. Already, there exists a large pool of advanced high-dimensional analysis algorithms that explore the observed heterogeneous distributions, making intriguing biological inferences. A fact largely overlooked by these methods, however, is the effect of the established data preprocessing pipeline on the distributions of the measured quantities. In this article, we focus on randomization, a transformation used for improving data visualization, which can negatively affect multivariate data analysis methods such as dimensionality reduction, clustering, and network reconstruction algorithms. Our results indicate that randomization should be used only for visualization purposes, but not in conjunction with high-dimensional analytical tools.

Original language: English (US)
Pages (from-to): 1178-1190
Number of pages: 13
Journal: Cytometry Part A
Volume: 95
Issue number: 11
State: Published - Nov 6 2019
