Statistical Classification of Immunosignatures under Significant Reduction of the Feature Space Dimensions for Early Diagnosis of Diseases

Andryushchenko V.S., Uglov A.S., Zamyatin A.V.
Key words: early disease diagnosis; immunosignature; feature space dimensionality reduction; immunosignature classification; selection of informative features; informative criterion.
2018, volume 10, issue 3, page 14.

The aim of the study is to explore ways of substantially reducing the feature space of immunosignatures by selecting the most informative features while maintaining an acceptable quality of human disease classification.

Materials and Methods. Immunosignature technology is based on peptide microchips, in which peptides with random amino acid sequences are used for diagnostic purposes. Such peptides are partially or completely similar to antigen epitopes. The diagnosis is made using classification algorithms built from a reduced sample of immunosignature data from patients with known diagnoses.

The data. The experiments used immunosignature data obtained from high-resolution peptide microchips containing about ten thousand peptide cells. The digitized data used to compose the samples were obtained from the public NCBI database (identified as GSE52580).

Searching for informative parameters. To reduce the dimensionality of the data space, we conducted a search for the most informative peptides. For this purpose, we tested various statistical criteria and group discriminators (such as the Student’s t-test, the Mann–Whitney–Wilcoxon U test, the Kolmogorov–Smirnov test, and the Jeffries–Matusita distance) for their applicability to this search.
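The search described above can be illustrated with a minimal sketch that ranks features by each of the four criteria. It is not the authors' implementation: the function name rank_peptides, the synthetic data, and the group sizes are assumptions made for illustration, and the Jeffries–Matusita distance is computed here in one common form, 2(1 − exp(−B)), with the Bhattacharyya distance B evaluated under a univariate Gaussian assumption.

```python
import numpy as np
from scipy import stats


def rank_peptides(group_a, group_b, top_k):
    """Return, per criterion, the indices of the top_k most discriminating features.

    group_a, group_b: arrays of shape (samples, features), one per diagnosis group.
    """
    cols = range(group_a.shape[1])

    # Hypothesis tests: a smaller p-value indicates a stronger group difference.
    t_p = stats.ttest_ind(group_a, group_b, axis=0, equal_var=False).pvalue
    u_p = np.array([
        stats.mannwhitneyu(group_a[:, j], group_b[:, j], alternative="two-sided").pvalue
        for j in cols
    ])
    ks_p = np.array([stats.ks_2samp(group_a[:, j], group_b[:, j]).pvalue for j in cols])

    # Jeffries-Matusita distance (assumed Gaussian per feature): larger is better.
    m1, m2 = group_a.mean(axis=0), group_b.mean(axis=0)
    v1, v2 = group_a.var(axis=0, ddof=1), group_b.var(axis=0, ddof=1)
    bhatt = 0.25 * (m1 - m2) ** 2 / (v1 + v2) \
        + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2)))
    jm = 2.0 * (1.0 - np.exp(-bhatt))

    return {
        "t_test": np.argsort(t_p)[:top_k],
        "mann_whitney": np.argsort(u_p)[:top_k],
        "kolmogorov_smirnov": np.argsort(ks_p)[:top_k],
        "jeffries_matusita": np.argsort(-jm)[:top_k],
    }


# Synthetic stand-in for peptide intensities (hypothetical, not GSE52580 data):
# 200 features, of which only the first 5 differ between the two groups.
rng = np.random.default_rng(0)
healthy = rng.normal(size=(30, 200))
disease = rng.normal(size=(30, 200))
disease[:, :5] += 3.0

selected = rank_peptides(healthy, disease, top_k=5)
for name, idx in selected.items():
    print(name, sorted(idx.tolist()))
```

Keeping only the selected columns is what reduces the feature space before classification; how many features to keep (top_k here) is a free parameter of this sketch.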

Classification methods. Classifiers based on various mathematical models were used, namely the support vector machine, the naive Bayes classifier, the random forest, and gradient boosting.
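As a rough illustration of comparing these four classifier families, the sketch below uses scikit-learn on synthetic data. The library choice, the hyperparameters, the 5-fold cross-validation, and the data are all assumptions for illustration; the abstract does not state which implementations or settings were used.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic reduced feature space (hypothetical): 60 samples, 20 features,
# two classes that differ in the first 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = np.repeat([0, 1], 30)
X[y == 1, :5] += 3.0

classifiers = {
    "support_vector_machine": SVC(),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Mean accuracy over 5 stratified folds for each classifier.
scores = {
    name: cross_val_score(clf, X, y, cv=5).mean()
    for name, clf in classifiers.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```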

Evaluation of the quality of classification. Accuracy (the proportion of correctly classified samples) was used to evaluate both binary and multiclass classification.
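The same measure applies unchanged to two or more classes, as the following minimal sketch shows. The function name and the label values are hypothetical.

```python
def accuracy(true_labels, predicted_labels):
    """Proportion of samples whose predicted label equals the true label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Binary case: 3 of 4 predictions are correct.
binary = accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# Multiclass case: 4 of 5 predictions are correct.
multiclass = accuracy(["A", "B", "C", "A", "B"], ["A", "B", "C", "B", "B"])

print(binary, multiclass)
```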

Results. The study demonstrates that selecting informative peptides makes it possible to reduce the feature space 240-fold and the classification time by a factor of 16 to 1625 without compromising the quality of classification. All tested classifiers proved equally successful in solving the immunosignature classification problem.

Conclusion. The results support the proposed approach of reducing the initial feature space of immunosignature data in order to accelerate classification without reducing its accuracy.

