Development of machine learning model for diagnostic disease prediction based on laboratory tests

Sci Rep. 2021 Apr 7;11(1):7567.

Dong Jin Park¹, Min Woo Park², Homin Lee³, Young-Jin Kim⁴, Yeongsic Kim⁵, Young Hoon Park⁶

Affiliations

¹ Department of Laboratory Medicine, College of Medicine, Ewha Womans University of Korea, Seoul, South Korea.
² Department of Laboratory Medicine, St. Vincent's Hospital, The Catholic University of Korea, Seoul, South Korea.
³ Department of Research, Future Lab, Seoul, South Korea.
⁴ Finance, Fishery, Manufacture Industrial Mathematics Center on Big Data, Pusan National University, Pusan, South Korea.
⁵ Department of Laboratory Medicine, College of Medicine, The Catholic University of Korea, Seoul, South Korea.
⁶ Division of Hematology, Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul, South Korea. carrox2yh@gmail.com.

PMID: 33828178
PMCID: PMC8026627
DOI: 10.1038/s41598-021-87171-5

Dong Jin Park et al. Sci Rep. 2021.

Abstract

The use of deep learning and machine learning (ML) in medical science is increasing, particularly for visual, audio, and language data. We aimed to build a new optimized ensemble model by blending a deep neural network (DNN) model with two ML models to predict diseases from laboratory test results. Eighty-six attributes (laboratory tests) were selected from the datasets based on value counts, clinical importance, and missing values. We collected datasets covering 5145 cases, comprising 326,686 laboratory test results, and investigated a total of 39 specific diseases defined by International Classification of Diseases, 10th revision (ICD-10) codes. These datasets were used to construct light gradient boosting machine (LightGBM) and extreme gradient boosting (XGBoost) ML models and a DNN model using TensorFlow. The optimized ensemble model achieved an F1-score of 81% and a prediction accuracy of 92% for the five most common diseases. The deep learning and ML models differed in predictive power and in their disease classification patterns. We evaluated the models with a confusion matrix and analyzed feature importance using SHAP (SHapley Additive exPlanations) values. Our new ML model achieved highly efficient disease prediction through disease classification. This study will be useful in the prediction and diagnosis of diseases.
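The abstract describes blending a DNN with LightGBM and XGBoost but does not state the blending rule. One common way to blend classifiers is weighted soft voting: take a weighted average of each model's per-class probability vectors and predict the highest-scoring class. The minimal sketch below assumes that rule; the model names, weights, and toy probabilities are hypothetical and are not values from the study.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = cases, columns = disease classes


def blend_probabilities(model_probs: Dict[str, Matrix], weights: Dict[str, float]) -> List[List[float]]:
    """Weighted soft voting: a weighted average of per-class probability matrices.

    Weights are normalized to sum to 1, so blended rows still sum to 1
    whenever every input row does.
    """
    names = list(model_probs)
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("blending weights must sum to a positive value")
    n_cases = len(model_probs[names[0]])
    n_classes = len(model_probs[names[0]][0])
    blended = [[0.0] * n_classes for _ in range(n_cases)]
    for name in names:
        w = weights[name] / total
        for i, row in enumerate(model_probs[name]):
            for k, p in enumerate(row):
                blended[i][k] += w * p
    return blended


def top1_predictions(probs: Matrix) -> List[int]:
    """Index of the most probable class for each case."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in probs]


# Hypothetical outputs of three already-trained models: two cases, three classes.
lightgbm_probs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
xgboost_probs = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
dnn_probs = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4]]

blended = blend_probabilities(
    {"lightgbm": lightgbm_probs, "xgboost": xgboost_probs, "dnn": dnn_probs},
    weights={"lightgbm": 0.4, "xgboost": 0.4, "dnn": 0.2},  # hypothetical weights
)
print(top1_predictions(blended))  # [0, 2]
```

In practice the probability matrices would come from each trained model (for example, the `predict_proba` method of the scikit-learn-style LightGBM and XGBoost wrappers, or a softmax output layer in the DNN), and an "optimized" ensemble would typically tune the weights on held-out data rather than fixing them by hand.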

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Confusion matrix of the ensemble model (the model with optimal accuracy): (A) predictive power (accuracy) for the TOP1 result (the single most likely disease); (B) predictive power for the TOP5 result (the five most likely diseases).
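The TOP1 and TOP5 results in Figure 1 can be illustrated with a short sketch that works from a matrix of predicted class probabilities: TOP-k accuracy counts a case as correct when the true disease is among the k highest-probability classes, and the confusion matrix tabulates TOP1 predictions against true labels. The abstract does not say how its F1-score was averaged across classes, so the macro average used here is an assumption; the toy data are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = cases, columns = disease classes


def top1_predictions(probs: Matrix) -> List[int]:
    """Index of the most probable class for each case (the TOP1 result)."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in probs]


def top_k_accuracy(probs: Matrix, y_true: Sequence[int], k: int) -> float:
    """Fraction of cases whose true class is among the k most probable classes."""
    hits = 0
    for row, label in zip(probs, y_true):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(y_true)


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Counts with rows = true class and columns = predicted class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp  # actually c, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical probabilities for four cases and three disease classes.
probs = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3], [0.4, 0.35, 0.25], [0.2, 0.3, 0.5]]
y_true = [0, 1, 1, 2]
cm = confusion_matrix(y_true, top1_predictions(probs), n_classes=3)
print(top_k_accuracy(probs, y_true, k=1))  # 0.75
print(top_k_accuracy(probs, y_true, k=2))  # 1.0
print(cm)  # [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(round(macro_f1(cm), 4))  # 0.7778
```

scikit-learn provides equivalent utilities (`top_k_accuracy_score`, `confusion_matrix`, and `f1_score` with an `average` parameter); the plain-Python version is shown only to make each computation explicit.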

**Figure 2**
Mean SHAP values relating each parameter to the disease classifications in the LightGBM model.

**Figure 3**
Mean SHAP values relating each parameter to the disease classifications in the XGBoost model.
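Figures 2 and 3 summarize mean SHAP values per parameter and disease class. Assuming the per-case SHAP values have already been computed (for tree models such as LightGBM and XGBoost, the shap package's `TreeExplainer` is commonly used, and its multiclass output layout differs between shap versions), the sketch below shows one plausible aggregation behind such a summary: the mean absolute SHAP value of each feature within each class, with features ranked by the sum over classes. The `[class][case][feature]` layout, the feature names, and the numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout: shap_values[class][case][feature]
ShapArray = Sequence[Sequence[Sequence[float]]]


def mean_abs_shap(shap_values: ShapArray, feature_names: Sequence[str]) -> Dict[str, List[float]]:
    """Mean |SHAP| of each feature within each class: {feature: [class 0, class 1, ...]}."""
    result: Dict[str, List[float]] = {name: [] for name in feature_names}
    for class_values in shap_values:
        n_cases = len(class_values)
        for j, name in enumerate(feature_names):
            result[name].append(sum(abs(case[j]) for case in class_values) / n_cases)
    return result


def rank_features(per_class: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Features sorted by mean |SHAP| summed over classes, largest first."""
    totals = [(name, sum(values)) for name, values in per_class.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values: two classes, two cases, two laboratory tests.
shap_values = [
    [[0.2, -0.1], [-0.4, 0.3]],  # class 0
    [[0.0, 0.5], [0.1, -0.1]],   # class 1
]
features = ["hemoglobin", "creatinine"]  # hypothetical feature names
per_class = mean_abs_shap(shap_values, features)
print([name for name, _ in rank_features(per_class)])  # ['creatinine', 'hemoglobin']
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so the ranking reflects how strongly a test influences predictions rather than the direction of its effect.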

**Figure 4**
Overall framework of DPMLT. Development of the DPMLT methodology involved three major steps: (1) data collection and preprocessing, (2) model selection, training, and ensemble modeling, and (3) performance evaluation.
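Step (1), together with the abstract's note that the 86 attributes were chosen by value counts, clinical importance, and missing values, can be illustrated with a simple filter. In this sketch, a laboratory test is kept if it was measured in at least `min_count` cases and its missing-value ratio does not exceed `max_missing_ratio`, or if it appears on a clinically motivated keep-list. The thresholds, the keep-list, the record format, and the test names are all hypothetical; the study's actual criteria are not given in this abstract.

```python
from typing import Dict, Iterable, List, Optional, Sequence

# One case: laboratory test name -> result value (None when the test was not performed).
Record = Dict[str, Optional[float]]


def select_attributes(
    records: Sequence[Record],
    min_count: int,
    max_missing_ratio: float,
    always_keep: Iterable[str] = (),
) -> List[str]:
    """Return the sorted names of tests that pass the count/missing-value filter.

    Tests listed in always_keep are retained regardless of the thresholds,
    provided they occur in at least one record.
    """
    n_cases = len(records)
    if n_cases == 0:
        return []
    keep = set(always_keep)
    all_tests = sorted({name for rec in records for name in rec})
    selected = []
    for name in all_tests:
        # A test absent from a record's keys counts as missing, like an explicit None.
        count = sum(1 for rec in records if rec.get(name) is not None)
        missing_ratio = 1 - count / n_cases
        if name in keep or (count >= min_count and missing_ratio <= max_missing_ratio):
            selected.append(name)
    return selected


# Hypothetical toy records; not data from the study.
records = [
    {"hemoglobin": 13.5, "creatinine": 0.9, "ferritin": None},
    {"hemoglobin": 12.1, "creatinine": None},
    {"hemoglobin": 14.0, "creatinine": 1.1, "ferritin": 250.0},
    {"hemoglobin": None, "creatinine": 1.0},
]
print(select_attributes(records, min_count=2, max_missing_ratio=0.5))
# ['creatinine', 'hemoglobin']
print(select_attributes(records, min_count=2, max_missing_ratio=0.5, always_keep=["ferritin"]))
# ['creatinine', 'ferritin', 'hemoglobin']
```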

Cited by

Identification and Prediction of Chronic Diseases Using Machine Learning Approach.
Alanazi R. J Healthc Eng. 2022 Feb 25;2022:2826127. doi: 10.1155/2022/2826127. eCollection 2022. PMID: 35251563. Free PMC article.
Development of Machine-Learning Model to Predict COVID-19 Mortality: Application of Ensemble Model and Regarding Feature Impacts.
Baik SM, Lee M, Hong KS, Park DJ. Diagnostics (Basel). 2022 Jun 14;12(6):1464. doi: 10.3390/diagnostics12061464. PMID: 35741274. Free PMC article.
Machine Learning for Patient-Based Real-Time Quality Control (PBRTQC), Analytical and Preanalytical Error Detection in Clinical Laboratory.
Lorde N, Mahapatra S, Kalaria T. Diagnostics (Basel). 2024 Aug 20;14(16):1808. doi: 10.3390/diagnostics14161808. PMID: 39202296. Free PMC article. Review.
Development of blood demand prediction model using artificial intelligence based on national public big data.
Kwon HJ, Park S, Park YH, Baik SM, Park DJ. Digit Health. 2024 Jan 17;10:20552076231224245. doi: 10.1177/20552076231224245. eCollection 2024 Jan-Dec. PMID: 38250146. Free PMC article.
Developing an AI-based prediction model for anaphylactic shock from injection drugs using Japanese real-world data and chemical structure-based analysis.
Enokiya T, Ozaki K. Daru. 2024 Jun;32(1):253-262. doi: 10.1007/s40199-024-00511-4. Epub 2024 Apr 5. PMID: 38580799.

