TY - JOUR
T1 - Impact of data preprocessing on cell-type clustering based on single-cell RNA-seq data
AU - Wang, Chunxiang
AU - Gao, Xin
AU - Liu, Juntao
N1 - KAUST Repository Item: Exported on 2020-10-19
Acknowledged KAUST grant number(s): BAS/1/1624, FCC/1/1976-18-01, FCC/1/1976-23-01, FCC/1/1976-25-01, FCC/1/1976-26-01
Acknowledgements: This work was supported by the National Natural Science Foundation of China (61801265 and 11931008) and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award Nos. BAS/1/1624-01, FCC/1/1976-18-01, FCC/1/1976-23-01, FCC/1/1976-25-01, FCC/1/1976-26-01, REI/1/0018-01-01, REI/1/4216-01-01, and URF/1/4098-01-01. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
PY - 2020/10/8
Y1 - 2020/10/8
N2 - BACKGROUND: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. RESULTS: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. CONCLUSION: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.
AB - BACKGROUND: Advances in single-cell RNA-seq technology have led to great opportunities for the quantitative characterization of cell types, and many clustering algorithms have been developed based on single-cell gene expression. However, we found that different data preprocessing methods show quite different effects on clustering algorithms. Moreover, there is no specific preprocessing method that is applicable to all clustering algorithms, and even for the same clustering algorithm, the best preprocessing method depends on the input data. RESULTS: We designed a graph-based algorithm, SC3-e, specifically for discriminating the best data preprocessing method for SC3, which is currently the most widely used clustering algorithm for single-cell clustering. When tested on eight frequently used single-cell RNA-seq data sets, SC3-e always accurately selects the best data preprocessing method for SC3 and therefore greatly enhances the clustering performance of SC3. CONCLUSION: The SC3-e algorithm is practically powerful for discriminating the best data preprocessing method, and therefore largely enhances the performance of cell-type clustering of SC3. It is expected to play a crucial role in the related studies of single-cell clustering, such as the studies of human complex diseases and discoveries of new cell types.
UR - http://hdl.handle.net/10754/665618
UR - https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-03797-8
U2 - 10.1186/s12859-020-03797-8
DO - 10.1186/s12859-020-03797-8
M3 - Article
C2 - 33028196
VL - 21
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - 1
ER -