Front Physiol. 2021 Sep 1:12:713118.
doi: 10.3389/fphys.2021.713118. eCollection 2021.

Clinically-Driven Virtual Patient Cohorts Generation: An Application to Aorta

Pau Romero et al. Front Physiol. 2021.

Abstract

The combination of machine learning methods with computational modeling and simulation of the cardiovascular system makes it possible to obtain valuable information about new therapies or clinical devices through in-silico experiments. However, applying machine learning methods demands access to large cohorts of patients. As an alternative to medical data acquisition and processing, which often requires some degree of manual intervention, the generation of virtual cohorts made of synthetic patients can be automated. Even so, generating a synthetic sample can be computationally demanding if it must be clinically meaningful and reflect enough inter-patient variability. This paper addresses the problem of generating virtual patient cohorts of thoracic aorta geometries that can be used for in-silico trials. In particular, we focus on generating a cohort of patients that meet a particular clinical criterion, regardless of whether a reference sample of that phenotype is available. We formalize the problem of clinically-driven sampling and assess several sampling strategies with two goals: sampling efficiency, i.e., that the generated individuals actually belong to the target population, and control over the statistical properties of the cohort. Our results show that generative adversarial networks can produce reliable, clinically-driven cohorts of thoracic aortas with good efficiency. Moreover, non-linear predictors can serve as an efficient alternative to the sometimes expensive evaluation of anatomical or functional parameters of the organ of interest.
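
The idea of sampling efficiency can be illustrated with a minimal rejection-sampling sketch. Everything in it is an assumption made for illustration: the generator `sample_individual`, the two biomarker names, the Gaussian model, and the numeric bounds are hypothetical and do not come from the paper. Efficiency is taken here as the fraction of drawn individuals that the acceptance function keeps.

```python
import random

# Hypothetical acceptance bounds (mm) for two illustrative biomarkers.
BOUNDS = {"diameter": (25.0, 40.0), "length": (80.0, 120.0)}


def sample_individual(rng):
    """Draw one synthetic individual (illustrative Gaussian model, not the paper's)."""
    return {
        "diameter": rng.gauss(32.0, 5.0),
        "length": rng.gauss(100.0, 15.0),
    }


def accept(individual, bounds=BOUNDS):
    """Acceptance function: True if every biomarker lies within its bounds."""
    return all(lo <= individual[name] <= hi for name, (lo, hi) in bounds.items())


def generate_cohort(n_target, rng, max_draws=100_000):
    """Rejection sampling: draw until n_target individuals are accepted."""
    cohort, draws = [], 0
    while len(cohort) < n_target and draws < max_draws:
        candidate = sample_individual(rng)
        draws += 1
        if accept(candidate):
            cohort.append(candidate)
    # Efficiency: accepted individuals divided by the total number of draws.
    efficiency = len(cohort) / draws if draws else 0.0
    return cohort, efficiency


rng = random.Random(0)
cohort, efficiency = generate_cohort(200, rng)
print(f"accepted {len(cohort)} individuals, efficiency = {efficiency:.2f}")
```

A sampler whose draws already concentrate in the target region wastes fewer evaluations of the acceptance function, which matters when that evaluation is expensive.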

Keywords: clinically-driven sampling; digital twin; generative adversarial network; in-silico trials; support vector machine; synthetic population; thoracic-aorta; virtual cohort.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Geometric biomarkers and phenotypes used in the study. Left, graphical representation of some of the considered anatomical biomarkers superimposed on the anatomy of an aorta. Right, the three phenotypes defined by Schaefer et al. (2008), which are used in the clinically-driven cohort generation. The reader can refer to Table 1 for the detailed meaning of the acronyms.
Figure 2
A sample with the three aortic root phenotypes (labeled as N, A, and E) defined in Schaefer et al. (2008) represented in the biomarkers space (left) and in the feature space (right). Each point represents an aorta. In the biomarkers representation, the coordinates correspond to the three biomarkers involved in the phenotype definition, in millimeters (refer to Figure 1 and Table 1 for acronym meanings and phenotype definitions). In the feature space representation, the coordinates are the coefficients of the three deformation modes, c3, c6 and c9, that are most discriminant in this problem. Phenotype N is represented in red, phenotype A in green and phenotype E in blue. While in the biomarkers space the three phenotypes are clearly separable, the region occupied by a particular group in the feature space is much harder to identify and exploit for cohort synthesis.
Figure 3
A scheme of the workflow followed in our experimental setup. The left track shows the data-driven cohort generation scenario: the reference cohort C0 is characterized using PCA to generate the new samples CB, CG, CU, and CGAN, which are assessed with the acceptance functions AX. The middle track, representing the clinically-driven experiments, starts by splitting the samples of the bootstrapping cohort CB, generated in the previous scenario, into the three target phenotypes N, A, and E; new cohorts CBX, CGX, CUX, and CGANX are then generated and, again, assessed by the corresponding acceptance functions AX. Finally, the rightmost blocks represent the development of machine learning surrogates to predict the acceptance functions. The synthetic cohort CB is used to train two SVM models, Pp and Pdμ (PD in the chart), that predict the outcome of Aμ and the aorta phenotype, respectively. The models are evaluated with CU, which was not used during training. For any item in the picture, a purple frame means data-driven and an orange frame means clinically-driven. The reader can refer to the text for further detail.
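
The data-driven track of this workflow can be sketched as three simple samplers that operate on PCA coefficient vectors. This is only an illustration under stated assumptions: the function names are hypothetical, the bootstrap is plain resampling with replacement, the Gaussian sampler treats coefficients as independent, and the uniform sampler uses the observed range. The paper's actual procedures may differ, and the GAN-based sampler is omitted.

```python
import random
import statistics


def bootstrap_sample(reference, n, rng):
    """Resample whole coefficient vectors from the reference cohort with replacement."""
    return [list(rng.choice(reference)) for _ in range(n)]


def gaussian_sample(reference, n, rng):
    """Draw each coefficient independently from a normal fitted to the reference."""
    columns = list(zip(*reference))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]
    return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]


def uniform_sample(reference, n, rng):
    """Draw each coefficient uniformly within the range observed in the reference."""
    columns = list(zip(*reference))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in range(n)]


rng = random.Random(42)
# Toy reference cohort: 50 individuals with 3 PCA coefficients each (made-up values).
reference = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]

cohort_b = bootstrap_sample(reference, 100, rng)
cohort_g = gaussian_sample(reference, 100, rng)
cohort_u = uniform_sample(reference, 100, rng)
print(len(cohort_b), len(cohort_g), len(cohort_u))
```

Each generated vector would then be mapped back to a geometry and passed to an acceptance function before entering the cohort.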
Figure 4
Amount of shape variation explained by each of the components of the feature vector in the space defined by the PCA, in order of importance. Importance is computed by training a Random Forest model and computing the decrease of impurity of each subtree. The first n = 16 features are capable of explaining 95% of the shape variation.
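
The 95% threshold mentioned in this caption can be illustrated with a cumulative explained-variance computation. The sketch assumes the per-component variances are available as a plain list; the function name and the variance values are made up, and this is not the Random Forest importance computation the caption describes.

```python
from itertools import accumulate


def components_for_variance(variances, threshold=0.95):
    """Return the smallest n whose leading components explain >= threshold of the variance."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    for n, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return n
    return len(ordered)


# Made-up, geometrically decaying variances for illustration only.
variances = [0.5 ** k for k in range(10)]
n_components = components_for_variance(variances)
print(n_components)
```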
Figure 5
Violin plots of the distribution of biomarkers in the original samples, alongside those generated using the proposed methods: Bootstrap (Bts), Gaussian (Gau), Uniform (Unf), and generative adversarial network (GAN). Horizontal lines mark the bounds for the different acceptance criteria defined in section 2.5: Ar with dotted lines, Aμ with dash-dotted lines, and AM with dashed lines. The vertical axis is in millimeters, except for biomarker k, which is expressed in mm⁻¹, and h/w and tor, which are dimensionless.
Figure 6
Examples of four synthetic aortas with decreasing feasibility of the biomarkers according to the acceptance functions. From left to right: the first one is accepted by all of the criteria; the second one is rejected by Ar, but not by the other acceptance functions; the third one is only accepted by AM and the last one is rejected by all the criteria.
Figure 7
Distributions of the three biomarkers that define the target phenotypes (SoV, PA, and MA) in the set of aortas that actually belong to each of the three classes. All values are in millimeters.
Figure 8
Confusion matrices obtained for the Pdμ and Pp models.
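
For readers unfamiliar with the representation, a confusion matrix for a three-class phenotype classifier can be built as in the following sketch. The labels and predictions are made up for illustration and are not results from the paper; rows index the true phenotype and columns the predicted one.

```python
from collections import Counter

LABELS = ["N", "A", "E"]


def confusion_matrix(true_labels, predicted_labels, labels=LABELS):
    """Count (true, predicted) pairs; rows are true classes, columns predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e., correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels for illustration only.
y_true = ["N", "N", "A", "A", "E", "E", "E", "N"]
y_pred = ["N", "A", "A", "A", "E", "N", "E", "N"]

matrix = confusion_matrix(y_true, y_pred)
for label, row in zip(LABELS, matrix):
    print(label, row)
print("accuracy:", accuracy(matrix))
```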
Figure 9
Graphic representation of the confusion matrix of Pp. Examples of aortas of phenotypes N, A, and E are arranged row-wise, with the predicted phenotype arranged column-wise.
Figure 10
Left: Feature importance. Right: Subpopulation classification accuracy as the number of features increases.

References

Allen R., Rieger T., Musante C. (2016). Efficient generation and selection of virtual populations in quantitative systems pharmacology models. CPT Pharmacometrics Syst. Pharmacol. 5, 140–146. doi: 10.1002/psp4.12063
Amidan B., Ferryman T., Cooley S. (2005). “Data outlier detection using the Chebyshev theorem,” in 2005 IEEE Aerospace Conference (Big Sky, MT: IEEE), 3814–3819.
Bishop C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin; Heidelberg: Springer-Verlag.
Bratt A., Kim J., Pollie M., Beecy A. N., Tehrani N. H., Codella N., et al. (2019). Machine learning derived segmentation of phase velocity encoded cardiovascular magnetic resonance for fully automated aortic flow quantification. J. Cardiovasc. Magn. Reson. 21, 1. doi: 10.1186/s12968-018-0509-0
Britton O. J., Bueno-Orovio A., Van Ammel K., Lu H. R., Towart R., Gallacher D. J., et al. (2013). Experimentally calibrated population of models predicts and explains intersubject variability in cardiac cellular electrophysiology. Proc. Natl. Acad. Sci. U.S.A. 110, E2098–E2105. doi: 10.1073/pnas.1304382110