A novel and fully automated platform for synthetic tabular data generation and validation
- PMID: 39375404
- PMCID: PMC11458594
- DOI: 10.1038/s41598-024-73608-0
Abstract
Healthcare data accessibility for machine learning (ML) is encumbered by a range of stringent regulations and limitations. Using synthetic data that mirrors the underlying properties in the real data is emerging as a promising solution to overcome these barriers. We propose a fully automated synthetic tabular neural generator (STNG), which comprises multiple synthetic data generators and integrates an Auto-ML module to validate and comprehensively compare the synthetic datasets generated from different approaches. An empirical study was conducted to demonstrate the performance of STNG using twelve different datasets. The results highlight STNG's robustness and its pivotal role in enhancing the accessibility of validated synthetic healthcare data, thereby offering a promising solution to a critical barrier in ML applications in healthcare.
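The comparison step described in the abstract, in which several generators each produce a synthetic dataset and an automated module scores them, can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from STNG: the two toy generators (`gaussian_generator`, `jitter_generator`), the nearest-centroid classifier, the toy data, and the "train on synthetic, test on real" scoring rule are hypothetical stand-ins, since the abstract does not specify the generators, models, or metrics used.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable


def make_real(n):
    """Toy 'real' table: rows of (feature_1, feature_2, label)."""
    rows = []
    for _ in range(n):
        label = random.randint(0, 1)
        center = 2.0 if label else -2.0
        rows.append((random.gauss(center, 1.0), random.gauss(center, 1.0), label))
    return rows


def gaussian_generator(train, n):
    """Hypothetical generator 1: sample each feature from a per-class Gaussian fit."""
    out = []
    for label in (0, 1):
        features = [r[:2] for r in train if r[2] == label]
        # One (mean, stdev) pair per feature column.
        stats = [(statistics.mean(c), statistics.stdev(c)) for c in zip(*features)]
        for _ in range(n // 2):
            out.append(tuple(random.gauss(m, s) for m, s in stats) + (label,))
    return out


def jitter_generator(train, n):
    """Hypothetical generator 2: resample real rows and add small noise."""
    out = []
    for _ in range(n):
        x1, x2, label = random.choice(train)
        out.append((x1 + random.gauss(0, 0.1), x2 + random.gauss(0, 0.1), label))
    return out


def fit_centroids(rows):
    """Stand-in for an Auto-ML model: one mean point per class."""
    centroids = {}
    for label in {r[2] for r in rows}:
        features = [r[:2] for r in rows if r[2] == label]
        centroids[label] = tuple(statistics.mean(c) for c in zip(*features))
    return centroids


def accuracy(centroids, rows):
    """Fraction of rows whose nearest centroid carries the true label."""
    correct = 0
    for x1, x2, label in rows:
        pred = min(
            centroids,
            key=lambda k: (x1 - centroids[k][0]) ** 2 + (x2 - centroids[k][1]) ** 2,
        )
        correct += pred == label
    return correct / len(rows)


real = make_real(400)
train, test = real[:300], real[300:]  # held-out real rows are used only for scoring

generators = {"gaussian": gaussian_generator, "jitter": jitter_generator}

# Train on each synthetic dataset, then test on the real held-out rows.
scores = {
    name: accuracy(fit_centroids(gen(train, 300)), test)
    for name, gen in generators.items()
}
baseline = accuracy(fit_centroids(train), test)  # model trained on real data
best = max(scores, key=scores.get)

print("real-data baseline:", baseline)
print("synthetic-data scores:", scores)
print("best generator in this toy run:", best)
```

Scoring every generator against the same held-out real rows keeps the comparison like for like, and the real-data baseline shows how much utility each synthetic dataset gives up.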
© 2024. The Author(s).
Conflict of interest statement
The authors declare no competing interests.
Similar articles
- Assessment of differentially private synthetic data for utility and fairness in end-to-end machine learning pipelines for tabular data. PLoS One. 2024 Feb 5;19(2):e0297271. doi: 10.1371/journal.pone.0297271. PMID: 38315667. Free PMC article.
- A method for machine learning generation of realistic synthetic datasets for validating healthcare applications. Health Informatics J. 2022 Apr-Jun;28(2):14604582221077000. doi: 10.1177/14604582221077000. PMID: 35414269.
- MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation. PLoS One. 2024 Apr 17;19(4):e0302271. doi: 10.1371/journal.pone.0302271. PMID: 38630664. Free PMC article.
- Systematic reviews of machine learning in healthcare: a literature review. Expert Rev Pharmacoecon Outcomes Res. 2024 Jan;24(1):63-115. doi: 10.1080/14737167.2023.2279107. Epub 2024 Jan 18. PMID: 37955147. Review.
- Synthetic data generation methods in healthcare: A review on open-source tools and methods. Comput Struct Biotechnol J. 2024 Jul 9;23:2892-2910. doi: 10.1016/j.csbj.2024.07.005. PMID: 39108677. Free PMC article. Review.