An approach for classification of highly imbalanced data using weighting and undersampling

Ashish Anand, Pugalenthi Ganesan, Gary B. Fogel, P. N. Suganthan

    Research output: Contribution to journal › Article › peer-review

    76 Scopus citations

    Abstract

    Real-world datasets are commonly imbalanced, and learning in the presence of such imbalance presents a great challenge to machine learning. Techniques such as support-vector machines (SVMs) perform excellently on balanced data but may fail when applied to imbalanced datasets. Several approaches exist for handling such data, including weighting, subsampling, and data modeling. In this paper, we propose a new undersampling technique for selecting instances from the majority class. We evaluated its performance on several real-world imbalanced biological datasets, in which the ratio of negative to positive samples ranges from ~9:1 to ~100:1. Useful classifiers have both high sensitivity and high specificity. Our results demonstrate that the proposed selection technique improves sensitivity compared with a weighted support-vector machine and with results reported in the literature for the same datasets.
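    The abstract does not spell out the proposed selection rule, so the sketch below is only an illustrative baseline and not the authors' method. It assumes binary labels with 1 = positive (minority) and 0 = negative (majority), and shows three generic ingredients the abstract refers to: plain random undersampling of the majority class, the "balanced" class-weight heuristic often used for weighted SVMs (n_samples / (n_classes * class_count), the rule scikit-learn applies for class_weight="balanced"), and sensitivity/specificity computed from predictions. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def random_undersample(X: Sequence, y: Sequence[int], ratio: float = 1.0,
                       seed: int = 0) -> Tuple[List, List[int]]:
    """Hypothetical baseline: keep every positive (label 1) and a random subset
    of negatives (label 0) so that negatives:positives is at most ratio:1.
    This is plain random undersampling, NOT the paper's selection technique."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [X[i] for i in kept], [y[i] for i in kept]


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Per-class weights inversely proportional to class frequency, one common
    way to set the per-class misclassification penalty in a weighted SVM."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def sensitivity_specificity(y_true: Sequence[int],
                            y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


if __name__ == "__main__":
    # Made-up toy problem at ~100:1: 10 positives, 1000 negatives.
    y = [1] * 10 + [0] * 1000
    X = list(range(len(y)))  # placeholder "features"
    # A trivial classifier that always predicts the majority class.
    sens, spec = sensitivity_specificity(y, [0] * len(y))
    print(f"all-negative: sensitivity={sens:.2f}, specificity={spec:.2f}")
    print("balanced weights:", balanced_class_weights(y))
    Xs, ys = random_undersample(X, y, ratio=1.0)
    print("undersampled size:", len(ys), "positives:", sum(ys))
```

    On this toy set the all-negative predictor reaches about 99% accuracy yet has zero sensitivity, which is why imbalanced problems are judged by sensitivity and specificity rather than accuracy alone.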

    Original language: English (US)
    Pages (from-to): 1385-1391
    Number of pages: 7
    Journal: Amino Acids
    Volume: 39
    Issue number: 5
    State: Published - Nov 1 2010

    Keywords

    • Imbalanced datasets
    • SVM
    • Undersampling technique

    ASJC Scopus subject areas

    • Biochemistry
    • Clinical Biochemistry
    • Organic Chemistry
