Table 1.
Variable | Simple linear regression: significance (p value) | Simple linear regression: % variability explained | Lasso regression: coefficient value in the most regularised modelᵇ,ᶜ |
---|---|---|---|
H3K4me3 enrichment, p10 ChIP-seq | 2.34 × 10⁻⁶ | 4.8 | N/A |
H3K4me2 enrichment, p10 ChIP-seq | 2.88 × 10⁻¹³ | 11.2 | −0.132 |
H3K36me3 enrichment, p10 ChIP-seq | 8.21 × 10⁻⁸ | 6.2 | 0.050 |
KDM1A dependence | 1.1 × 10⁻¹¹ | 9.7 | −0.132 |
KDM1B dependence | 1.5 × 10⁻¹² | 10.5 | −0.106 |
CpG density | 0.000767 | 2.5 | N/A |
%GC content | 0.169199 | 0.9 | N/A |
Transcription level (log-transformed) | 0.000297 | 2.9 | N/A |
Enriched motif occurrences (CBCCGCC, CCCMAM, CBCCGGGᵃ) | 3.97 × 10⁻⁸ | 6.5 | −0.019 |
Simple linear regressions (variables tested individually) and multiple linear regression (Lasso; variables tested together) modelling the relationship between explanatory variables and DNA methylation level at CGIs in 60–65 µm oocytes. The outcome of the model is presented as the percentage of the variability in DNA methylation level at CGIs in 60–65 µm oocytes explained by the variables.
ᵃ See Fig. 6a for motif details. These three motifs were selected because they represent binding sites of known proteins.
ᵇ Coefficients of the variables in the most regularised model, as selected by the software's cross-validation of models. These coefficients correspond to the values on the y axis in Fig. 8. N/A marks variables that are not included in the model.
ᶜ The Lasso regression model including the five variables indicated in the column accounts for 18.5% of the variation.
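
The N/A entries in the Lasso column reflect a general property of the L1 penalty: coefficients of weakly informative variables are shrunk to exactly zero, which drops those variables from the model. The sketch below illustrates this on synthetic data only. It is not the analysis behind Table 1: the data, the penalty value `lam`, and all function names are hypothetical, and the penalty is fixed by hand rather than chosen by cross-validation.

```python
import random

def standardise(values):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def soft_threshold(z, lam):
    """Shrink z towards zero by lam; anything within [-lam, lam] becomes exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, n_sweeps=100):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    Assumes every column is standardised and y is centred (no intercept needed).
    """
    n = len(y)
    beta = [0.0] * len(columns)
    residual = list(y)
    for _ in range(n_sweeps):
        for j, col in enumerate(columns):
            # Covariance of column j with the partial residual
            # (column j's own contribution added back in).
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            updated = soft_threshold(rho, lam)
            delta = updated - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = updated
    return beta

def r_squared(y, fitted):
    """Proportion of the variability in y explained by the fitted values."""
    mean = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, fitted))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data (hypothetical): x1 and x2 drive the outcome, x3 is pure noise.
random.seed(0)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
y_raw = [-2.0 * a + 1.0 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

columns = [standardise(x) for x in (x1, x2, x3)]
mean_y = sum(y_raw) / n
y = [v - mean_y for v in y_raw]

beta = lasso_coordinate_descent(columns, y, lam=0.3)
fitted = [sum(b * col[i] for b, col in zip(beta, columns)) for i in range(n)]
r2 = r_squared(y, fitted)

print("coefficients:", [round(b, 3) for b in beta])
print("% variability explained:", round(100 * r2, 1))
```

In this toy setting the coefficient of the uninformative variable is set to zero, which corresponds to an N/A entry, while the retained coefficients are smaller in magnitude than the values used to simulate the data. A larger penalty gives a more regularised model with fewer non-zero coefficients.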