Front Genet. 2021 Feb 24;12:635863.
doi: 10.3389/fgene.2021.635863. eCollection 2021.

Identification and Validation of a Novel DNA Damage and DNA Repair Related Genes Based Signature for Colon Cancer Prognosis


Xue-Quan Wang et al. Front Genet.

Abstract

Background: Colorectal cancer (CRC) has a high incidence and the third highest mortality among tumors. DNA damage and repair influence a variety of tumors; however, the role of DNA damage and repair related genes in colon cancer prognosis has been less systematically investigated. Here, we aimed to establish a corresponding prognostic signature that provides new therapeutic opportunities for CRC.

Methods: After related genes were collected from GSEA gene sets, univariate Cox regression was performed on the TCGA-COAD dataset to evaluate each gene's prognostic relevance. Stepwise Cox regression was used to establish a risk prediction model in training sets randomly separated from the TCGA cohort, and the model was validated in the remaining testing sets and in two GEO datasets (GSE17538 and GSE38832). A signature based on 12 DNA damage and repair related genes, able to classify COAD patients into high- and low-risk groups, was developed. The predictive ability of the risk model and the nomogram was evaluated by several bioinformatics methods. Gene functional enrichment analysis was performed on the genes co-expressed with the signature genes.

Results: A 12-gene prognostic signature, selected from 160 significant survival-related genes in the DNA damage and repair related gene sets, performed well, with a 5-year ROC AUC of 0.80 in the TCGA-COAD dataset. The signature includes CCNB3, ISY1, CDC25C, SMC1B, MC1R, LSP1P4, RIN2, TPM1, ELL3, POLG, CD36, and NEK4. Kaplan-Meier survival curves showed that risk status separated prognosis more significantly than the T, M, N, and stage parameters. A nomogram was constructed by LASSO regression analysis with T, M, N, age, and risk as prognostic parameters. ROC curve, C-index, calibration, and decision curve analyses showed that the risk model and the nomogram performed best at 1, 3, and 5 years. KEGG, GO, and GSEA enrichment analyses suggest that risk status is associated with a variety of important biological processes and well-known cancer-related pathways. These differences may be key factors affecting the final prognosis.

Conclusion: The established gene signature for CRC prognosis provides a new molecular tool for clinical prognostic evaluation, individualized diagnosis, and treatment. Therapies targeting DNA damage and repair mechanisms may lead to more sensitive and promising chemotherapy regimens, thereby expanding treatment options and potentially improving the clinical outcomes of CRC patients.

Keywords: DNA damage; DNA repair; colon cancer; mRNA signature; prediction; prognosis.
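The Methods describe scoring patients with a Cox-derived gene signature and splitting them into high- and low-risk groups. The sketch below shows the usual form of such a score, a linear predictor that sums each gene's Cox coefficient multiplied by its expression, followed by a split at a cutoff. It is an illustration under stated assumptions, not the authors' code: `HYPOTHETICAL_BETAS` holds placeholder coefficients for only four of the twelve genes (the abstract does not report fitted values), and splitting at the cohort median is an assumed choice of cutoff.

```python
from statistics import median

# Hypothetical Cox coefficients (beta) for four of the twelve signature genes.
# The abstract does not report fitted values; these numbers are placeholders.
HYPOTHETICAL_BETAS = {"CCNB3": 0.42, "CDC25C": 0.31, "CD36": -0.25, "TPM1": 0.18}


def risk_score(expression: dict[str, float], betas: dict[str, float]) -> float:
    """Cox linear predictor: sum of beta_i * expression_i over the signature genes."""
    return sum(beta * expression[gene] for gene, beta in betas.items())


def assign_risk_groups(
    cohort: dict[str, dict[str, float]], betas: dict[str, float]
) -> dict[str, str]:
    """Score every patient, then label scores above the cohort median as high risk.

    The median cutoff is an assumption made for this sketch.
    """
    scores = {pid: risk_score(expr, betas) for pid, expr in cohort.items()}
    cutoff = median(scores.values())
    return {pid: "high" if s > cutoff else "low" for pid, s in scores.items()}
```

A positive coefficient raises the score when the gene is more highly expressed (a "harmful" gene in the sense of Figure 1A), while a negative coefficient lowers it (a "protective" gene).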


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Volcano plot of DNA damage and repair related genes and forest plots of the multivariate Cox regression analysis in TCGA cohorts. (A) Volcano plot of DNA damage and repair related genes: blue indicates protective genes, red indicates harmful genes, and black indicates non-significant genes. (B) Forest plot of the multivariate Cox regression analysis of OS for the 12 genes. (C) Forest plot of the multivariate Cox regression analysis of OS for clinical factors and the risk score. (D) Forest plot of the multivariate Cox regression analysis of OS for clinical factors and the 12 genes. Beta values represent the regression coefficient β for each gene and clinical factor.
Figure 2
Distribution of risk scores, gene expression heatmaps, Kaplan-Meier analysis, and ROC analysis of the 12-gene signature in the training TCGA set, total TCGA set, and testing set. (A) Distribution of risk scores and the cutoff point. (B–D) Gene expression heatmaps in the training TCGA cohort (B), total TCGA cohort (C), and testing TCGA cohort (D); blue indicates the low-risk group and red the high-risk group. (E–G) Correlation between the prognostic signature and the OS of patients in the training TCGA cohort (E), total TCGA cohort (F), and testing TCGA cohort (G). (H–J) Kaplan-Meier survival analysis of low- and high-risk patients in the training TCGA cohort (H), total TCGA cohort (I), and testing TCGA cohort (J). (K–M) ROC curve analysis with AUC values for 1-, 3-, 5-, and 10-year survival in the training TCGA cohort (K), total TCGA cohort (L), and testing TCGA cohort (M).
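Figure 2 (H–J) compares Kaplan-Meier survival curves between the low- and high-risk groups. The minimal pure-Python estimator below sketches how such a curve is built from follow-up times and censoring indicators; the function name and input format are illustrative assumptions, not taken from the article. Testing whether two groups' curves differ would additionally require a test such as the log-rank test, which is not shown.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (time, survival probability) at each distinct observed event time.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Patients censored at time t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve
```

In practice a dedicated survival-analysis library would be used, which also provides confidence intervals and group comparisons.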
Figure 3
Kaplan-Meier survival analysis of OS in the high- and low-risk groups across clinical subgroups, and ROC curve analysis of T, N, M, and stage in the total TCGA cohort. (A) Subgroups stratified by T1, T2, T3, and T4. (B) Subgroups stratified by N0, N1, and N2. (C) Subgroups stratified by M0, M1, and MX. (D) Subgroups stratified by stage I, stage II, stage III, and stage IV. (E–H) ROC curve analysis of T, N, M, and stage with AUC values for 1-, 3-, 5-, and 10-year survival in the total TCGA cohort.
Figure 4
Kaplan-Meier survival and ROC curves of the 12-gene signature, grade, and stage in the two GEO sets. (A) Correlation between the 12-gene signature and the overall survival of patients in the GSE17538 set. (B,C) Kaplan-Meier survival for OS in the high- and low-risk groups of different subgroups in the GSE17538 set, stratified by stage I, stage II, stage III, and stage IV (B) and by grade MD, grade PD, and grade WD (C). (D–F) ROC curve analysis of risk score, stage, and grade with AUC values for 1-, 3-, 5-, and 10-year survival in the GSE17538 set. (G) Correlation between the 12-gene signature and the disease-specific survival of patients in the GSE38832 set. (H) Kaplan-Meier analysis of disease-specific survival in the high- and low-risk groups of the stage 1, 2, and 3 subgroups in the GSE38832 set. (I) ROC curve analysis of risk score with AUC values for 1-, 3-, and 5-year disease-specific survival in the GSE38832 set. (J) Correlation between the 12-gene signature and the disease-free survival of patients in the GSE38832 set. (K) Kaplan-Meier analysis of disease-free survival in the high- and low-risk groups of the stage 1, 2, 3, and 4 subgroups in the GSE38832 set. (L) ROC curve analysis of risk score with AUC values for 1-, 3-, and 5-year disease-free survival in the GSE38832 set.
Figure 5
Nomogram construction based on the 12-gene signature and prognostic value of the 12 genes. (A) Nomogram for predicting the proportion of patients with 1-, 3-, or 5-year OS. (B) Calibration plots of the nomogram. (C,D) LASSO regression analysis using 10-fold cross-validation via the maximum criteria. (E) C-index of the nomogram. (F) Decision curve analysis of the nomogram predicting 1-, 3-, and 5-year OS of COAD, compared with age, stage, the risk score, pathologic T, pathologic N, and pathologic M. (G) Time-dependent ROC analysis of the nomogram predicting 1-, 3-, and 5-year OS of COAD.
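Figure 5E reports a C-index for the nomogram. The sketch below computes Harrell's concordance index, a common form of the C-index, from follow-up times, event indicators, and predicted risk scores. The article does not state which C-index variant was used, and skipping pairs with tied follow-up times is a simplifying assumption of this sketch.

```python
def harrell_c_index(
    times: list[float], events: list[int], risk_scores: list[float]
) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter follow-up than patient j. The pair is concordant when
    patient i also has the higher risk score; tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering of patients and 1.0 to perfect agreement between predicted risk and observed survival order.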
Figure 6
Biological pathways in two different risk groups by GSEA analysis. (A) Enriched pathways in the low-risk group. (B) Enriched pathways in the high-risk group.
Figure 7
Biological functions and pathways of co-expressed genes. (A) Venn diagram of overlapping genes among the normal, low-risk, and high-risk groups. (B) Topological overlap heatmap of the gene co-expression network; dark colors indicate high topological overlap and light colors indicate low topological overlap. (C) Co-expressed genes selected by R2 > 0.7. (D) The top 10 most significant KEGG results. (E–G) GO enrichment analysis of co-expressed genes, including cellular component (CC; E), molecular function (MF; F), and biological process (BP; G).
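Figure 7C selects co-expressed genes with a threshold of R2 > 0.7. The sketch below is one possible reading of that filter, assuming "R2" means the squared Pearson correlation between a signature gene's expression profile and each candidate gene's profile. The function names and the handling of constant profiles are hypothetical choices for illustration, and this simple correlation filter does not reproduce the topological-overlap network of panel B. If the threshold was applied to the correlation coefficient itself, compare `abs(r)` instead of `r ** 2`.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a constant profile as uncorrelated
    return sxy / sqrt(sxx * syy)


def coexpressed_genes(
    hub_profile: list[float],
    candidates: dict[str, list[float]],
    threshold: float = 0.7,
) -> list[str]:
    """Return candidate genes whose squared correlation with the hub gene exceeds threshold."""
    return sorted(
        gene
        for gene, profile in candidates.items()
        if pearson_r(hub_profile, profile) ** 2 > threshold
    )
```

Squaring the correlation keeps strongly anti-correlated genes as well as positively correlated ones, which is one reason an R2 threshold might be preferred over a one-sided cutoff on r.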
