Anmol M. Kiran, John J. O'Mahony, Komal Sanjeev, Pavel V. Baranov, Darned in 2013: inclusion of model organisms and linking with Wikipedia, Nucleic Acids Research, Volume 41, Issue D1, 1 January 2013, Pages D258–D261, https://doi.org/10.1093/nar/gks961
Abstract
DARNED (DAtabase of RNa EDiting, available at http://darned.ucc.ie) is a centralized repository of reference genome coordinates corresponding to RNA nucleotides having altered templated identities in the process of RNA editing. The data in DARNED are derived from published datasets of RNA editing events. RNA editing instances have been identified with various methods, such as bioinformatics screenings, deep sequencing and/or biochemical techniques. Here we report our current progress in the development and expansion of DARNED. In addition to novel database features, this update describes the inclusion of Drosophila melanogaster and Mus musculus RNA editing events and the launch of a community-based annotation in the RNA WikiProject.
INTRODUCTION
The Database of RNA editing (DARNED) was created in 2010 as an effort to provide a centralized repository for observable RNA editing events in the Homo sapiens transcriptome. The initial release of the database provided access to ∼42 000 coordinates in the human reference genome that have been identified or predicted to undergo RNA editing upon transcription (1).
RNA editing results in localized alterations in RNA sequences relative to their genomic templates. It occurs in almost all known life forms (2–8). A-to-I RNA editing (Deamination of Adenosine to Inosine) is the most common type of editing in organisms with a developed central nervous system (CNS) (2,9–13). Inosine is recognized as Guanosine by ribosomes and reverse transcriptase due to its base pairing with Cytidine (14). In H. sapiens, A-to-I editing dominates in transcripts from the repetitive regions of the genome, especially from Alu elements (15–17). However, it also occurs in non-repetitive regions and affects sequences of small non-coding RNAs (18–20) and pre-mRNAs. Editing of pre-mRNAs can generate alternative transcript variants by affecting splice junctions (21,22) or produce alternative protein isoforms through non-synonymous codon substitutions in coding regions of mRNA (23–25). In addition to A-to-I editing, a few examples of C-to-U RNA editing (Deamination of Cytidine to Uridine) are also well-established in humans (26–28). Recently the existence of all other 10 possible discrepancies between RNA and DNA has also been reported (28). This study generated a major controversy, as many follow-up analyses demonstrated that the dataset contains a large number of false positives (29–32). Therefore, these data are not included in the current update of DARNED. Nonetheless, RNA:DNA discrepancies (other than A-to-G and C-to-U) can be found within more confident datasets (33–35), which are included in the update. The absence of plausible mechanisms for RNA editing corresponding to these discrepancies, combined with the presence of false positives even in high-confidence datasets, suggests that the corresponding editing events are spurious. However, because we cannot exclude the possibility that these events are real, we felt that their inclusion in DARNED is important and appropriate. We hope that this will stimulate exploration of the true nature of these discrepancies.
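The count of discrepancy types mentioned above can be checked with a short sketch. It enumerates the ordered pairs of differing bases and labels the two canonical classes as they would appear in cDNA (inosine read as G, uridine read as T). The function names and the label strings are illustrative assumptions, not part of DARNED, and both bases are assumed to be given on the transcribed strand.

```python
from itertools import permutations

BASES = "ACGT"

# In cDNA, inosine is read as G and uridine as T, so the two well-established
# editing types show up as A-to-G and C-to-T mismatches against the genome.
CANONICAL = {("A", "G"): "A-to-I", ("C", "T"): "C-to-U"}


def discrepancy_types():
    """Return every ordered (genome_base, cdna_base) pair of differing bases."""
    return list(permutations(BASES, 2))


def classify(genome_base, cdna_base):
    """Label one genome/cDNA comparison (hypothetical labels)."""
    if genome_base == cdna_base:
        return "match"
    return CANONICAL.get((genome_base, cdna_base), "non-canonical")


all_types = discrepancy_types()
non_canonical = [pair for pair in all_types if pair not in CANONICAL]
print(len(all_types), len(non_canonical))  # 12 10
```

Four bases give 4 × 3 = 12 ordered mismatches; removing A-to-G and C-to-U leaves the 10 other discrepancy types referred to in the text.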
In addition to improving the DARNED interface, we expanded the database by including RNA editing data from mice and flies (36,37). Differences in the functional significance of particular RNA editing events prompted us to start detailed annotation of events whose functions are well-established. Wikipedia was chosen as a platform for this purpose due to its popularity and convenience for community-based annotations (38). Integration of Wikipedia with DARNED through bidirectional hyperlinking is another novel feature of DARNED.
NOVEL DATABASE INTERFACE
The user can now choose whether to search RNA editing in H. sapiens, in Drosophila melanogaster or in Mus musculus. As in the original version of the database, the search can be performed either by specifying the genomic coordinate range or the RefGene/RefSeq IDs (1). To support the new sequence-based search, we integrated the BLAST engine (39), which allows the retrieval of genomic regions based on their similarity to a user-specified query sequence. The BLAST output is parsed and combined with the data on known RNA editing sites (Figure 1). We presume that this feature is particularly useful for researchers who are interested in exploring RNA editing in species closely related to those that are in DARNED. The sequence-based search is also useful when a user explores known RNA editing sites in transcripts whose genomic templates are unknown or difficult to map, e.g. when the sequence shares similarity with multiple regions of the genome.
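The idea of combining parsed BLAST hits with known editing sites can be illustrated with a small sketch. It is not the DARNED implementation: the class name, the tuple layout of sites and hits, and the example coordinates are all assumptions made for illustration. The sketch only shows one way to look up sites that fall inside a coordinate range, which also covers the coordinate-based search.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class EditingSiteIndex:
    """Per-chromosome sorted positions of known editing sites (hypothetical)."""

    def __init__(self, sites):
        # sites: iterable of (chromosome, 1-based position) tuples
        self._by_chrom = defaultdict(list)
        for chrom, pos in sites:
            self._by_chrom[chrom].append(pos)
        for positions in self._by_chrom.values():
            positions.sort()

    def in_range(self, chrom, start, end):
        """Return site positions with start <= position <= end."""
        positions = self._by_chrom.get(chrom, [])
        lo = bisect_left(positions, start)
        hi = bisect_right(positions, end)
        return positions[lo:hi]


def annotate_hits(index, hits):
    """Attach known sites to each hit given as (chromosome, sstart, send).

    In BLAST tabular output, sstart can exceed send for minus-strand hits,
    so the interval is normalized before the lookup.
    """
    annotated = []
    for chrom, sstart, send in hits:
        lo, hi = min(sstart, send), max(sstart, send)
        annotated.append(((chrom, sstart, send), index.in_range(chrom, lo, hi)))
    return annotated


# Made-up coordinates, for illustration only.
index = EditingSiteIndex([("chr1", 100), ("chr1", 250), ("chr2", 50)])
for hit, sites in annotate_hits(index, [("chr1", 90, 200), ("chr1", 300, 200)]):
    print(hit, sites)
```

Keeping positions sorted per chromosome makes each range query a pair of binary searches, which is adequate for a sketch; a production database would more likely delegate the range query to an indexed SQL table.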
The DARNED database is linked to the UCSC Genome Browser (40) and ENSEMBL (except for hg18 and mm9 assemblies) (41), so that each RNA editing event can be explored in conjunction with annotation tracks within these genome browsers. The DARNED data from the previous update are also directly available in the UCSC browser. ENSEMBL receives up-to-date DARNED information through a dedicated Distributed Annotation System (DAS) server (42) for all assemblies (see the AVAILABILITY AND LICENSE section).
Past experience has demonstrated that available EST data are not strongly associated with RNA editing events and are often misleading; for most coordinates in DARNED, EST data are not available. ESTs derived from ultra-edited RNAs could not be aligned to the genome without special pre-processing (43). Not surprisingly, no correlation was found between the occurrence of bona fide RNA editing sites and mismatches in EST alignments (44). Our aim is to provide users with the most relevant information in the simplest possible form; therefore, EST data have been removed. For this reason, EST ID-based searches cannot be performed in the current version of DARNED. However, users can still obtain information on potential RNA editing sites matching particular EST sequences by performing a sequence-based search in DARNED.
NEW DATA
The newly available human-related data were collected from literature published since the original launch of the DARNED database (33–36,43–48). As described in the next paragraphs, the data were generated with a variety of methods. At present, we do not provide any information on the reliability of the methods; the current version of DARNED provides access to the reported sites without an estimate of their trustworthiness.
Sakurai et al. (44) developed a new technique named Inosine Chemical Erasing (ICE), which converts Inosines to N1-cyanoethylinosines by treating RNA with acrylonitrile in a Michael addition reaction. This modification inhibits Watson–Crick base pairing of Inosine with Cytidine, which leads to truncated cDNA synthesis from edited RNA by reverse transcriptase. Comparison of the chromatograms for cDNA from the acrylonitrile-treated RNA pool with those from untreated RNA provides visual evidence of RNA editing. He et al. (45) used a novel computational approach to predict possible A-to-I editing by finding discrepancies between mRNA/EST sequences and the reference genome in humans for selected tissues. Carmi et al. (43) also used a novel computational approach to predict possible RNA editing cases. They used high-quality mRNA/EST sequences longer than 250 bp that were not aligned to the human reference genome. These unaligned sequences were realigned with the genomic sequence after masking both the EST and genomic sequences (by converting all A to G) using MEGABLAST for all possible strand combinations. Original sequences were recovered for masked ESTs with good alignment, and mismatches were examined with statistical filters (43). Ramaswami et al. (35) applied a computational pipeline to predict RNA editing using RNA and DNA deep sequencing data from the same samples. Close to 150 000 RNA editing sites were predicted for the lymphoblastoid cell line GM12878, and close to 500 000 RNA editing sites were predicted as a result of re-examination of the data of Peng et al. (34), who originally reported about 22 000 cases. Graveley et al. (36) and Rodriguez et al. (49) reported ∼2300 RNA editing sites in D. melanogaster. In mice, several studies (37,48,50,51) reported ∼8500 sites, combined.
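The masking idea described for ultra-edited sequences can be sketched in a few lines. This is a simplified illustration and not the pipeline of Carmi et al.: it assumes two ungapped sequences of equal length on the same strand, replaces MEGABLAST with a plain equality check, and omits the statistical filters. All function names and the toy sequences are hypothetical.

```python
def mask_a_to_g(seq):
    """Replace every A with G so that A-to-G mismatches no longer count."""
    return seq.upper().replace("A", "G")


def mismatches(genome, est):
    """List (index, genome_base, est_base) for an ungapped comparison."""
    if len(genome) != len(est):
        raise ValueError("this sketch assumes equal-length, ungapped sequences")
    pairs = zip(genome.upper(), est.upper())
    return [(i, g, e) for i, (g, e) in enumerate(pairs) if g != e]


def masked_match_and_edits(genome, est):
    """Return (masked sequences identical?, A-to-G positions, total mismatches).

    The comparison is done on masked copies; the original sequences are then
    used to recover which positions actually differ.
    """
    identical_when_masked = mask_a_to_g(genome) == mask_a_to_g(est)
    all_mismatches = mismatches(genome, est)
    a_to_g = [i for i, g, e in all_mismatches if g == "A" and e == "G"]
    return identical_when_masked, a_to_g, len(all_mismatches)


# Toy sequences: three of the four genomic A positions read as G in the EST.
genome = "ACATAGAC"
est = "GCGTGGAC"
print(masked_match_and_edits(genome, est))  # (True, [0, 2, 4], 3)
```

Masking also hides G-to-A differences, so the recovered mismatches still need to be inspected on the original sequences, which is the role of the recovery and filtering steps described above.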
LAUNCH OF A COMMUNITY-BASED ANNOTATION
Like a mutation, an RNA editing event may have different effects on the fitness of the organism. In H. sapiens, most of the known RNA editing events occur in repetitive regions of the genome and presumably have little or no effect on fitness. Such sites are often termed promiscuous RNA editing sites (13,52). However, a number of RNA editing events are known to play distinctive functional roles by generating protein isoforms that cannot be directly derived from the genomic sequences by means of standard transcription, processing and translation (53,54). The apparent difference in the significance of promiscuous and recoding RNA editing sites justifies unequal attention to these sites by researchers and consequently requires differential annotation of such edits. Recoding RNA editing demands highly elaborate annotations. However, the incorporation of such annotations directly into DARNED would lead to substantial non-uniformity. To solve this problem, we decided to use Wikipedia (38) as an external medium for the annotation of recoding RNA editing sites, with bidirectional links to DARNED. We have generated Wikipedia subsection entries on RNA editing for 16 genes and for Alu repeats (see Table 1). Given the success of community-based annotation in other databases, such as Rfam (55), and the overall growth of the RNA WikiProject (56), we hope that RNA editing representation on Wikipedia will expand and improve.
IMPLEMENTATION
The updated DARNED is built with the Python-based Django web framework, uses MySQL as the back-end database and is served by the Apache web server.
FUTURE PLANS
RNA editing datasets are highly heterogeneous due to the diversity of the methods used for the identification of RNA editing sites. The degree of confidence for particular RNA editing events varies not only between datasets, but also within datasets, e.g. due to differences in sequencing depth. Besides the incorporation of novel data, the most important future goal of DARNED is the development of a grading scheme that measures the confidence of RNA editing sites and displays a confidence rating for each site. Other plans include annotating functional RNA editing cases through Wikipedia and engaging the scientific community in this process.
AVAILABILITY AND LICENSE
DARNED usage and redistribution are governed by the Creative Commons Attribution-NonCommercial-ShareAlike License. The database is freely accessible at http://darned.ucc.ie. DARNED DAS servers for all species and assemblies are available at http://darned.ucc.ie:8000/.
FUNDING
Science Foundation Ireland Principal Investigator Award [06/IN.1/B81 to P.V.B.]; Wellcome Trust [094423 to P.V.B.]. Funding for open access charge: Wellcome Trust [094423 to P.V.B.].
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
We are grateful to Paul Gardner for introducing us to WikiProject RNA and to all DARNED users for their critical suggestions and encouragement. We also thank Audrey Michel and Patrick O’Connor for their help during preparation of this manuscript. The development of the database was supported by Science Foundation Ireland.