- Split View
-
Views
-
Cite
Cite
A. Kel, T. Konovalova, T. Waleev, E. Cheremushkin, O. Kel-Margoulis, E. Wingender, Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations, Bioinformatics, Volume 22, Issue 10, May 2006, Pages 1190–1197, https://doi.org/10.1093/bioinformatics/btl041
- Share Icon Share
Abstract
Motivation: Functionally related genes involved in the same molecular-genetic, biochemical or physiological process are often regulated coordinately. Such regulation is provided by precisely organized binding of a multiplicity of special proteins [transcription factors (TFs)] to their _target sites (cis-elements) in regulatory regions of genes. Cis-element combinations provide a structural basis for the generation of unique patterns of gene expression.
Results: Here we present a new approach for defining promoter models based on the composition of TF binding sites and their pairs. We utilize a multicomponent fitness function for selection of the promoter model that fits best to the observed gene expression profile. We demonstrate examples of successful application of the fitness function with the help of a genetic algorithm for the analysis of functionally related or co-expressed genes as well as testing on simulated and permutated data.
Availability: The CMA program is freely available for non-commercial users. URL Author Webpage. It is also a part of the commercial system ExPlain™ (Author Webpage) designed for causal analysis of gene expression data.
Contact: [email protected]
1 INTRODUCTION
Massive application of microarray measurements of gene expression is a common route in the current studies of disease mechanisms. Hundreds of genes are revealed whose change of expression is associated with the disease. However, changed expression of these genes often represents just an ‘echo’ of the real molecular processes in the cells. Still, the available means of analysis and interpretation of these mechanisms are very limited.
Regulation of gene expression is accomplished through binding of transcription factors (TFs) to distinct regions of DNA. It is clear by now that combinations of TFs rather than single a TF drive gene transcription and define its specificity. At the level of DNA, the blueprints for assembling such variable TF complexes on promoter regions may be seen as specific combinations of TF binding sites located in close proximity to each other. We call such structures ‘composite regulatory modules’.
Known functional combinations of TF binding sites were used before in a number of approaches of promoter analysis: for the identification of muscle-specific promoters (Frech et al., 1998; Wasserman and Fickett, 1998), promoters of liver-enriched genes (Tronche et al., 1997), of yeast genes (Brazma et al., 1997), of immune-specific genes (Boehlk et al., 2000; Fessele et al., 2001; Kel et al., 1999), promoters of genes regulated during cell cycle (Kel et al., 2001) or antibacterial defense response (Shelest and Wingender, 2005). In the cases when the functional site composition is not known, a number of new predictive approaches identifying composite motifs can be used: BioProspector (Liu et al., 2001), Co-Bind (GuhaThakurta and Stormo, 2001), MITRA (Eskin and Pevzner, 2002), dyad search (van Helden et al., 2000). These programs help to discover new regulatory sites for yet unknown TFs, but an ‘ab initio’ motif finding method is limited in their ability to distinguish signal from noise in the long regulatory regions (10 Kb and more) of genes of human and other higher eukaryotic organisms. Novel methods have been developed that utilize information on known and potential binding sites in promoters (ClusterScan: Kel-Margoulis et al., 2002b; TOUCAN system: Aerts et al., 2003; probabilistic methods: Sinha et al., 2003). But still analysis of long regulatory regions remains a challenge for these approaches.
In this paper we are describing a novel tool, Composite Module Analyst (CMA), which has been created to make the next step in applying combinatorial approach to the analysis of regulatory regions of functionally related or co-expressed genes. This tool applies a novel approach for defining the promoter model based on composition of single TF binding sites as well as their pairs located inside local regulatory domains (corresponding to enchancer/silencer subregions). We utilize a multicomponent fitness function for selection of the promoter model, which fits best to the observed gene expression profile. We demonstrate some successful applications of the fitness function with the help of a genetic algorithm for the analysis of functionally related or co-expressed genes as well as testing of the tool on simulated data.
2 COMPOSITE MODULES
2.1 Databases and tools for identification of TF binding sites
In this study we used three databases: (1) TRANSFAC® is a database on gene regulation (Matys et al., 2006). It provides data on TFs and their binding sites in promoters and enhancers of eukaryotic genes as well as a library of position weight matrices (PWMs). This work has been done with release 8.4. (2) TRANSCompel® (rel.6.4) which contains known composite regulatory elements in mammalian genes (Kel-Margoulis et al., 2002a). (3) TRANSPRO™ is a database on promoters (Chen et al., 2006). Positions of transcription start sites (TSS) in genomes (human, mouse, rat) are collected from EPD, DBTSS and Ensembl, providing the possibility to retrieve about 13 000 human and about 10 000 murine promoters. Potential binding sites for TFs were identified by the TRANSFAC® tool Match™ (Kel et al., 2003) that uses a library of about 500 PWMs for vertebrate TFs (TRANSFAC® release 8.4).
2.2 Definition of composite promoter model
w i, the length of the window in promoter sequences that covers all sites included in the CM;
Ki, the number of individual PWMs in the module,
Ri, the number of pairs of PWMs,
cut-off value , relative impact values ϕ(k), maximal number of best matches κ(k) are assigned to every individual weight matrix k (k = 1, Ki),
cut-off value , relative impact values ϕ(r), orientation values o(r) and maximal and minimal distances and maximal number of best pairs of matches κ(r) are assigned to every matrix pair r (r = 1,Ri) in the CM.
2.3 Fitness function
Based on the assumption that promoters of co-regulated genes should share a common composition of TF binding sites we propose an approach which allows defining a composite promoter model for sets of co-regulated genes. We have constructed a fitness function that provides a means to direct the search in a wide space of various parameters and to find the optimal promoter model settings. In this approach we take as input a set of promoters of potentially co-regulated genes (set A) and a background set of promoters (set B) and design a fitness function, which is based on several properties of distributions of the θ function and cmsi values in these two promoter sets. As an alternative input we consider a single set of promoters with assigned expression values.
The components of our fitness function are the following:
R, regression value;
T, Student t-test value;
E, specificity and sensitivity value;
N, normality index;
P, penalty on the complexity of the model.
2.4 Regression value
This component shows how well the fuzzy promoter scores fps(n) of the promoters (n = 1, NP-total number of promoters) fit the expression values ξ(n) assigned to the corresponding genes. The ξ(n) values can be assigned either as 1/0 values to the promoters of two corresponding sets of genes A and B, or as any continuous function of the observed differential expression value (for instance as the log ratio of gene expression provided by Affymetrix software). The linear regression is done by fitting the fps(n) to the curve αξ(n) + β. We compute then the regression value R2 for the promoter model and normalize it to the maximal possible value. It is taken then as the component R in Equation (4) which shows the fit of the considered composite modules in the promoters to the differential expression of the corresponding genes.
2.5 Student's t-test value calculation using fuzzy logic
This component is equal to the value of the two-sided Student's t-test with different variances. Here we consider two alternative promoter sets, A and B, described above or, in the case of the alternative input of a single set of promoters with assigned expression values, the promoter set is divided into two groups A and B: with high and low differential expression values (in this case the value P0 dividing these two groups is specified by the user).
2.6 Specificity and sensitivity value
2.7 Normality index of the distribution
2.8 Penalty on the complexity of the model
2.9 Genetic algorithm implementation
In order to identify a composite promoter model best fitting the given data on gene expression we use the fitness function described above and apply an optimization strategy based on a genetic algorithm. The algorithm takes as an input two sets of promoters (the set of promoters of differentially expressed genes, A group; and a set of promoters of genes whose expression does not differ significantly between experiment and control, B group) or a set of promoters and the relative expression values assigned to the corresponding genes. On the first step, the program Match™ is used to identify all potential TF binding sites in these two sets of promoters. It uses a predefined subset of PWMs from the TRANSFAC® database, usually with the minFN cut-offs (minimizing the false-negative rate; see Kel et al., 2003 for details) that allow to identify most of the sites in the sequences. The genetic algorithm works then further with the binding sites found by Match™ and identifies the promoter model best fitting to the expression data.
Each model is characterized by its own set of parameters described in Subsection 2.2 and is represented in the genetic algorithm by 2 × m different ‘chromosomes’—2 chromosomes per composite module cmi (i = 1, 2, … , m). One chromosome to encode Ki individual matrices is associated with their cut-off values , relative impact values ϕ(k) and maximal number of best matches κ(k) (k = 1, Ki). And the other chromosome to encode Ri pairs of PWMs is associated with their cut-off values , relative impact values ϕ(r), (r) orientation values o(r) and maximal and minimal distances, and maximal number of best pairs of matches κ(r) (r = 1, Ri).
Initially, the program CMA generates a set of random models according to the predefined limits of the maximal and minimal number of matrices, pairs, distances between sites, matrix cut-offs and other parameters. These models are considered as a population of ‘organisms’ subjected to mutations, recombination, selection and multiplication. Mutations are done by random exchange of matrices in the model into any other matrices from the library and by random small increase or decrease of the associated parameters such as cut-off values, impact values and others. Recombination is done by random exchange of the parts of chromosomes between different models. During a number of iterations the program tries to find the ‘best’ model that is characterized by the highest value of fitness function. It identifies the set of matrices, optimizes their cut-offs, the relative impact values, maximum number of best matches in promoters and some other optional parameters such as site orientation and distance range and in pairs. The output of the CMA tool is the best discriminative promoter model with the optimized parameters.
More details of the encoding schema of the models in the genetic algorithm and definition of the mutation and recombination operators as well as the selection and multiplication rules and other options of CMA program are described in the documentation of CMA (see e.g. Author Webpage).
3 RESULTS
3.1 Testing on the simulated data
First, we did initial testing of the ability of the program to reveal combinations of single matrices. To test this we performed a series of simulation experiments. We generated sets of 1000 random nucleotide sequences of a length of 1000 bp each, ‘implanted’ a certain number of site combinations in random positions of these sequences based on a predefined composite module and let the program ‘blindly’ reveal the composite module back or at least some parts of it.
First of all, we observed that with increase of the size of the matrix library the accuracy of recognition (the percentage of correctly identified matrices as components of implanted composite modules) falls linearly (data not shown), but this can be generally recovered by increasing the number of iterations of the genetic algorithm. We also observed that the algorithm is able to recapture correctly not only CMs containing unique matrices but also modules comprising matrices that are similar to other matrices for other TFs characterizing by similar DNA binding patterns (e.g. V$GATA3_03 and V$MYOD_01). We found that the optimal parameters of the algorithm are the ratio of the population size to the number of iterations should be between 0.1 and 2; the optimal selection pressure is ∼40% (so the best 40% of the population goes into the next generation) and the optimal mutation level is ∼30% of population and recombination level ∼60%. With these optimal parameters we performed further testing. The results of the testing are shown in Tables 1–3.
Site scores . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
High | +++ | +++ | +++ |
Low | +++ | ++− | ++− |
Site scores . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
High | +++ | +++ | +++ |
Low | +++ | ++− | ++− |
We implanted sites into 30% of the sequences (set A); other sequences in the set are taken as background (set B). Scores of the implanted sites are high (optimum to reduce false positives) or low (optimum to reduce false negatives). Sites were implanted probabilistically, so the probability that a site is implanted in a given sequence (P) varies from 0.3 to 0.9. +++ indicates that all implanted matrices were correctly identified back, ++− that only two were identified correctly and one was wrongly identified, in 100 iterations; population size was 200.
Site scores . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
High | +++ | +++ | +++ |
Low | +++ | ++− | ++− |
Site scores . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
High | +++ | +++ | +++ |
Low | +++ | ++− | ++− |
We implanted sites into 30% of the sequences (set A); other sequences in the set are taken as background (set B). Scores of the implanted sites are high (optimum to reduce false positives) or low (optimum to reduce false negatives). Sites were implanted probabilistically, so the probability that a site is implanted in a given sequence (P) varies from 0.3 to 0.9. +++ indicates that all implanted matrices were correctly identified back, ++− that only two were identified correctly and one was wrongly identified, in 100 iterations; population size was 200.
The result of this simulation shows that the CMA program is able to determine implanted matrices correctly in most cases, even if just a few sequences of set A contain all of the sites of the module (e.g. for P = 0.3, only about 8 sequences out of 300 contain all 3 sites).
Microarray data usually contain a high level of noise and reveal no clear differentiation between ‘changed’ and ‘unchanged’ genes. The expression values vary considerably. In order to test the ability of our method to deal with such kind of noisy data we generated test data with various ratios of differential expression values (e) and the random variance of the expression value Δe (as a measure of noise). The result of this simulation (Table 2) shows that the CMA was able to reveal correct CM in cases of Δe < e and Δe ∼ e, although the fitness is decreasing in the second case. In case of Δe > e the method was able to reveal correctly only two matrices used for the sites implantation.
Expression parameters . | Fitness (Z) . | Random fitness . | CM . |
---|---|---|---|
Δe < e (x = 5) | 0.6 | 0.0045 ± 0.0035 | +++ |
Δe ∼ e (x = 10) | 0.4 | 0.0029 ± 0.036 | +++ |
Δe > e (x = 15) | 0.3 | 0.0032 ± 0.0028 | ++− |
Expression parameters . | Fitness (Z) . | Random fitness . | CM . |
---|---|---|---|
Δe < e (x = 5) | 0.6 | 0.0045 ± 0.0035 | +++ |
Δe ∼ e (x = 10) | 0.4 | 0.0029 ± 0.036 | +++ |
Δe > e (x = 15) | 0.3 | 0.0032 ± 0.0028 | ++− |
‘Changed genes’ comprised 30% of the random sequences. Sites were implanted in these sequences only (P = 0.9). Matrices were the same as in the Table 1. High cut-offs were used for implantation. We assigned to each ‘changed gene’ an expression value randomly generated from the interval [10, 10 + x]. Other ‘non-changed’ genes got the lower expression value from the interval [0,x]. By varying x = 5, 10, 15 we simulated 3 variants of the data with gradually increased noise. ‘Random fitness’—values of fitness function obtained in the shuffling experiments.
Expression parameters . | Fitness (Z) . | Random fitness . | CM . |
---|---|---|---|
Δe < e (x = 5) | 0.6 | 0.0045 ± 0.0035 | +++ |
Δe ∼ e (x = 10) | 0.4 | 0.0029 ± 0.036 | +++ |
Δe > e (x = 15) | 0.3 | 0.0032 ± 0.0028 | ++− |
Expression parameters . | Fitness (Z) . | Random fitness . | CM . |
---|---|---|---|
Δe < e (x = 5) | 0.6 | 0.0045 ± 0.0035 | +++ |
Δe ∼ e (x = 10) | 0.4 | 0.0029 ± 0.036 | +++ |
Δe > e (x = 15) | 0.3 | 0.0032 ± 0.0028 | ++− |
‘Changed genes’ comprised 30% of the random sequences. Sites were implanted in these sequences only (P = 0.9). Matrices were the same as in the Table 1. High cut-offs were used for implantation. We assigned to each ‘changed gene’ an expression value randomly generated from the interval [10, 10 + x]. Other ‘non-changed’ genes got the lower expression value from the interval [0,x]. By varying x = 5, 10, 15 we simulated 3 variants of the data with gradually increased noise. ‘Random fitness’—values of fitness function obtained in the shuffling experiments.
In order to estimate the significance of the fitness values obtained in the analysis we have performed multiple shuffling experiments. In each such experiment we took all the expression values in the set and reassigned them to randomly chosen sequences. After that, we applied CMA to these shuffled samples and computed the average and standard deviation of the observed fitness values (Table 2, ‘random fitness’).
Next, we tested the functionality of the program on revealing pairs of matrices that reflect composite elements composed of two closely situated sites. One pair of sites was implanted into the 30% of random sequences (pair: V$AP4_01/V$MEF2_01; distance vary: 5–25; cut-offs ‘high’). After that, we tried to reveal this CM back by varying the parameters of the search (Table 3).
Parameters of the search of the CM structure . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
1 pair | (++) | (++) | (++) |
1–3 pair | (++) | (++) | (++) |
0–1 pair, 0–2 single matrices | (++) | ++ | ++ |
Parameters of the search of the CM structure . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
1 pair | (++) | (++) | (++) |
1–3 pair | (++) | (++) | (++) |
0–1 pair, 0–2 single matrices | (++) | ++ | ++ |
(++) indicates that the correct pair was found, ++ that two single matrices were revealed and they are correct components of the pair.
Parameters of the search of the CM structure . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
1 pair | (++) | (++) | (++) |
1–3 pair | (++) | (++) | (++) |
0–1 pair, 0–2 single matrices | (++) | ++ | ++ |
Parameters of the search of the CM structure . | Probability of site insertion . | ||
---|---|---|---|
. | 0.9 . | 0.6 . | 0.3 . |
1 pair | (++) | (++) | (++) |
1–3 pair | (++) | (++) | (++) |
0–1 pair, 0–2 single matrices | (++) | ++ | ++ |
(++) indicates that the correct pair was found, ++ that two single matrices were revealed and they are correct components of the pair.
When the frequency of the pairs in the sequences is high (P = 0.9), the program is able to reveal the correct matrix pair under a wide range of search parameters. If the frequency is low, it becomes more difficult to predict the correct structure of the CM without sufficient knowledge on the expected structure which can help to set optimal parameters of the search. Anyway, in most of the cases the program is able to predict correctly the components of the CM even if its fine structure is not correctly predicted.
3.2 Analysis of promoters of co-regulated genes with the help of CMA
3.2.1 Test 1: T-cell specific genes
In order to check the ability of CMA to reveal known composite promoter modules we analyzed a set of promoters of T-cell specific genes that have been shown to be regulated by a very specific type of composite elements: NF-AT/AP-1. The set includes genes for several interleukins/(IL-1,4,5,8); their receptors, signaling molecules such as TNF-alpha, IFN and others. It is based on the collection of experimentally proven NF-AT/AP-1 composite elements in TRANSCompel database (Kel-Margoulis et al., 2002a). In our earlier work (Kel et al., 1999) we showed that the promoters of these genes contain many copies of the NF-AT/AP-1 composite elements. Here, we would like to check if CMA would be able to reveal these composite elements without having explicit knowledge about their composition. Results of this test are given in Figure 1.
The (V$AP1_Q4@0.872000<*V$NFAT_Q6@0.791000,3,1..14) line in the output describes a pair of matrices found by the system with all the parameters defined by the genetic algorithm. The values after the sign ‘@’ give the cut-offs for the corresponding matrices; ‘<*’ means that the orientation of the matches of the AP-1 matrix should have a definite orientation relative to the matches of the second matrix, whereas the orientation of the NF-AT is not fixed; values ‘3,1..14’ mean that the distance between matches of two matrices should be in the range from 1 to 14 bp and the maximal number of considered pairs in one window is limited to three pairs.
All these values found by the genetic algorithm correspond well to the known nature of NF-AT/AP-1 composite elements in T-cell specific promoters (Kel et al., 1999). The orientation of the matches was found correctly as well, since the crystal structure of the ternary complex of NF-ATp/AP-1-DNA reveals only one valid orientation of NF-ATp factor in this complex (Chen et al., 1998).
Another pair of matrices (V$PU1_Q6 − V$AP1_Q4) that was found by the program, in fact, corresponds to known composite elements PU.1/AP-1 described in promoters of several genes regulating their expression in immune cells—macrophages (see TRANSCompel acc: C00251 in the promoter of mouse macrosialin gene). Interesting to note that consensus of NF-AT binding sites (AGGAAA) is similar to that of PU.1 binding sites (AGGAAC).
In Figure 2 we show the histogram of the fps score of the revealed composite module in promoters of T-cell specific genes in comparison with the randomly generated sequences. We see clear differentiation of these two sets of sequences. Although not in all promoters we can find sites for both matrix pairs, we still are able to identify the main components of the specific gene regulatory modules in the considered immune cells.
3.2.2 Test 2: 11 Kb upstream regions: T-cell specific genes versus housekeeping genes
In order to demonstrate the ability of the CMA algorithm to analyze long regulatory regions we compiled 5′ regulatory regions of the T-cell specific genes used in the test above. We extracted 11 Kb regions around TSS (10 Kb upstream and 1 Kb downstream). To test the algorithm on the real genomic data we have prepared a background set of 11 Kb upstream regions of 100 housekeeping genes from the list derived from an analysis of public gene expression data (Eisenberg and Levanon, 2003).
By running CMA we again revealed correctly the NFAT/AP-1 composite elements [with parameters of two modules slightly different from the test above: ([email protected]<<[email protected],1,1..16) ∣ ([email protected]**[email protected],2,1..16)]. In Figure 3 we show two histograms of the fuzzy promoter scores of the upstream regions of T-cell specific and housekeeping genes, respectively. The two distributions differ significantly, giving rise to the high fitness value = 0.738.
3.2.3 Permutation test
Since the method introduced here relies on the optimization of parameters of the model with the genetic algorithm there is a risk of overfitting. To evaluate the potential overfitting effect we performed a permutation test. We did 100 random shuffling of the grouping labels of the data used in the Test 2 (11 Kb upstream regions: 20 T-cell specific genes versus 100 housekeeping genes). Running the CMA algorithm 100 times with the randomized data resulted in the distribution of the obtained values of the fitness function that is shown in Figure 4.
As one can see from the distribution, the value of fitness function obtained in the analysis of T-cell specific promoters (0.738; see above) is much higher than expected by random chance. This provides a good evidence against the risk of overfitting and in favor of the realness of the promoter models revealed by the analysis.
3.2.4 Test 3: yeast cell cycle
In another example we performed an analysis of gene expression data on yeast cell cycle taken from Spellman et al. (1998). We selected five sets of genes according to their expression in different cell cycle phases: genes specific for G1, S phases and S/G2, M/G1, G2/M transitional states. We retrieved promoters of the length 1100 bp (−1000, +100) for these genes and applied CMA program to find cell cycle phase-specific composite promoter models.
To estimate whether the results are consistent with the known facts we compared them with the experimental data on ChIP (chromatin immunoprecipitation) analysis and microarray gene expression data summarized in a recent publication (Kato et al., 2004). The parameters of the search were set to reveal a model, which consists of a single CM containing sites for 3–4 single matrices co-localized in a window of 200 bp. The total number of yeast matrices considered was 31 including all yeast matrices from TRANSFAC and several matrices derived from the DNA binding consensi for the corresponding factors published in Kato et al. (2004). In Table 4 we present the list of TFs whose weight matrices have been revealed by CMA as components of the best promoter models and compared these factors with the factors associated with the corresponding cell cycle phase in the mentioned paper (Kato et al., 2004).
Factors found by CMA for specific cell cycle phases are marked with ‘+’. Gray color of the table cells indicates this factor was shown experimentally to be involved in regulating genes in the corresponding cell cycle phases. Factors FKH1 and FKH2 are very similar to each other in their DNA pattern. That is why the results for these two factors are summarized together. One can see a very good agreement of the predictions made by CMA with the experimental knowledge.
4 CONCLUSION
In this paper we describe a novel method for the analysis and interpretation of gene regulatory regions. The method identifies composite modules—stable combinations of TF binding sites that are shared by the most of the co-regulated promoters. It is generally accepted that such modules are responsible for a function-specific regulation of transcription.
In comparison to most of the previously published approaches that consider combinatorics of TF binding sites, the method described here has several advantages, such as (1) capability to work with data of microarray experiments; (2) optimization not only of the matrix sets, but also of cut-off values for each matrix; (3) analysis of large regulatory regions; (4) search for pairs of matrices, selecting best distance and orientation.
Testing our method on simulated and real data has shown: (1) it is able to correctly reveal CMs that are overrepresented in the set of sequences; (2) it can be used to analyze data and propose factor combinations that are playing key roles in transcriptional regulation in the given biological context. In our previous work we have demonstrated that the combinatorial approach described here allows increasing significantly the precision of the computational prediction of _target genes for TFs (Kel et al., 2004a; Kel et al., 2004b). Application of this approach to the analysis of microarray gene expression data is very promising. The Composite Module Analyst is implemented now as a part of the commercial software system ExPlain™ that provides a wide range of tools and databases for causal interpretation of gene expression data.
The authors would like to thank all co-workers of the BIOBASE GmbH who have been contributing to the integration of CMA into the ExPlain™ system. Parts of the work were funded by grant of the German Ministry of Education and Research (BMBF) together with BioRegioN GmbH ‘BioProfil’, Grant No. 0313092; ‘Intergenomics’ Project, Grant No. 031U110C/031U210C; by European Commission under FP6-‘Life sciences, genomics and biotechnology for health’ contract LSHG-CT-2004-503568 ‘COMBIO’; by European Commission under ‘Marie Curie research training networks’ contract MRTN-CT-2004-512285 ‘TRANSISTOR’ and by INTAS Grant No. 03-51-5218. Funding to pay the Open Access publication charges for this article was provided by BIOBASE GmbH.
Conflict of Interest: none declared.
REFERENCES
Author notes
Associate Editor: Thomas Lengauer