Sci Rep. 2021 Apr 7;11(1):7567.
doi: 10.1038/s41598-021-87171-5.

Development of machine learning model for diagnostic disease prediction based on laboratory tests

Dong Jin Park et al. Sci Rep.

Abstract

The use of deep learning and machine learning (ML) in medical science is increasing, particularly in the visual, audio, and language data fields. We aimed to build a new optimized ensemble model by blending a deep neural network (DNN) model with two ML models for disease prediction using laboratory test results. Eighty-six attributes (laboratory tests) were selected from datasets based on value counts, clinical importance-related features, and missing values. We collected sample datasets on 5145 cases, including 326,686 laboratory test results. We investigated a total of 39 specific diseases based on the International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to construct light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) ML models and a DNN model using TensorFlow. The optimized ensemble model achieved an F1-score of 81% and prediction accuracy of 92% for the five most common diseases. The deep learning and ML models showed differences in predictive power and disease classification patterns. We used a confusion matrix and analyzed feature importance using the SHAP value method. Our new ML model predicted diseases with high efficiency through disease classification. This study will be useful in the prediction and diagnosis of diseases.
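
As a rough illustration of the blending idea in the abstract, the sketch below averages per-class probabilities from three models with fixed weights and then ranks the candidate diseases. The abstract does not say how the DNN, LightGBM, and XGBoost outputs were combined, so weighted averaging is an assumption, and the function names, weights, and probability values are hypothetical.

```python
from typing import List, Sequence


def blend_probabilities(
    model_probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of per-class probability vectors, one vector per model."""
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total_weight = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for probs, w in zip(model_probs, weights)) / total_weight
        for c in range(n_classes)
    ]


def top_k(probs: Sequence[float], k: int) -> List[int]:
    """Indices of the k classes with the highest blended probability."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]


# Hypothetical outputs for one patient over three disease classes.
dnn_probs = [0.7, 0.2, 0.1]
lightgbm_probs = [0.5, 0.3, 0.2]
xgboost_probs = [0.6, 0.1, 0.3]

# Assumed weights; in practice they would be tuned on validation data.
blended = blend_probabilities([dnn_probs, lightgbm_probs, xgboost_probs], [2, 1, 1])
ranked = top_k(blended, 2)
```

Here `ranked[0]` plays the role of a single most likely disease, and a longer ranked list corresponds to reporting several candidate diseases.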

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Confusion matrix of the ensemble model (optimal accuracy model): (A) predictive power (accuracy) of the TOP1 result (the single most likely disease); (B) the TOP5 result (the five most likely diseases).
Figure 2
Mean SHAP values for each parameter across disease classifications in LightGBM.
Figure 3
Mean SHAP values for each parameter across disease classifications in XGBoost.
Figure 4
Overall framework of DPMLT. The development of the DPMLT methodology involved three major steps: (1) data collection and preprocessing; (2) model selection, training, and ensemble modeling; (3) performance evaluation.
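
The performance evaluation step named in these captions (TOP1 and TOP5 accuracy, a confusion matrix, and an F1-score) might be computed along the following lines. This is a minimal sketch on made-up labels, not the authors' evaluation code; all function names and data are hypothetical.

```python
from typing import List, Sequence


def top_k_accuracy(
    true_labels: Sequence[int], ranked_predictions: Sequence[Sequence[int]], k: int
) -> float:
    """Fraction of cases whose true label is among the k highest-ranked classes."""
    hits = sum(
        1 for y, ranked in zip(true_labels, ranked_predictions) if y in ranked[:k]
    )
    return hits / len(true_labels)


def confusion_matrix(
    true_labels: Sequence[int], predicted_labels: Sequence[int], n_classes: int
) -> List[List[int]]:
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def f1_per_class(matrix: Sequence[Sequence[int]]) -> List[float]:
    """F1 = 2*TP / (2*TP + FP + FN) per class; 0.0 when the denominator is zero."""
    scores = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(row[c] for row in matrix) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Hypothetical data: four cases, three classes, predictions ranked best-first.
y_true = [0, 1, 2, 1]
y_ranked = [[0, 1, 2], [2, 1, 0], [2, 0, 1], [0, 2, 1]]

top1 = top_k_accuracy(y_true, y_ranked, 1)
top2 = top_k_accuracy(y_true, y_ranked, 2)
cm = confusion_matrix(y_true, [r[0] for r in y_ranked], 3)
f1 = f1_per_class(cm)
macro_f1 = sum(f1) / len(f1)
```

Widening k can only keep or raise the accuracy, since a case counted as a hit at TOP1 is still a hit at TOP5.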
