Genome Res. 2001 Sep;11(9):1520-6.
doi: 10.1101/gr.190501.

Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data


E Beaudoing et al. Genome Res. 2001 Sep.

Abstract

Alternate polyadenylation affects a large fraction of higher eukaryote mRNAs, producing mature transcripts with 3' ends of variable length. This variation is poorly represented in the current transcript catalogs derived from whole genome sequences, mostly because such posttranscriptional events are not directly detectable at the DNA level. Alternate polyadenylation of an mRNA is better understood by comparison to EST databases. Comparing ESTs to mRNAs, however, is a difficult task subject to the pitfalls of internal priming, presence of intron sequences, repeated elements, chimeric ESTs, or matches with ESTs from paralogous genes. We present here a computer program that addresses these problems and displays EST matches to a query mRNA sequence to predict alternate polyadenylation and to suggest library-specific forms. The output highlights effective polyadenylation signals and possible sources of artifacts, such as A-rich stretches in the mRNA sequences, and allows for a direct visualization of EST libraries using color codes. Statistical biases in the distribution of alternative mRNA forms among EST libraries were systematically sought. About 1450 human and 200 mouse mRNAs displayed such biases, suggesting in each case a tissue- or disease-specific regulation of polyadenylation.
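
As an illustration of how a bias in the distribution of mRNA forms among EST libraries might be tested, the sketch below computes a two-sided Fisher exact p-value for a 2x2 table of EST counts (one library versus all others, one 3' end form versus another). The abstract does not say which test the authors used, so the choice of Fisher's exact test, the function name and the counts are all assumptions made for this example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout (an assumption, not taken from the paper):
        rows    = library of interest / all other libraries
        columns = ESTs ending at the short form / at the long form
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance for floating-point ties).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Invented counts: 12 of 15 ESTs from one library end at the short
    # form, against 20 of 65 ESTs from all other libraries.
    print(fisher_exact_two_sided(12, 3, 20, 45))
```

A small p-value would flag the mRNA as a candidate for library-specific polyadenylation. A systematic search over many mRNAs and libraries would also need a correction for multiple testing, which this sketch leaves out.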


Figures

Figure 1
EST-parser output for the 3′ untranslated region of a zinc-finger DNA-binding protein mRNA (EMBL accession no. D45132, Muraosa et al. 1996). The red line on top represents the query sequence. Potential poly(A) signals are shown with colored boxes: blue, AAUAAA signals; orange, AUUAAA signals; green, other alternate signals. The next line indicates regions masked for their nonspecific content (low complexity, vectors, mammalian repeats) using a thickened line, and potential internal priming sites (adenine stretches) are indicated by open circles. Vertical broken lines indicate putative poly(A) sites. When a signal is present, the vertical line has the same color as the signal box; otherwise, the line is grey. Each EST is then represented by a horizontal line incorporating information by means of a color code. EST coloring is made according to the organ system of the EST library (see Table 1). Color coding is as follows: olive, cell line; lime, central nervous system; fuchsia, connective tissues; orange, digestive system; green, endocrine glands; dark slate blue, exocrine glands; blue, immune system; purple, mixed tissues; yellow, peripheral nervous system; aqua, respiratory system; maroon, skeletal; pink, skin; grey, unknown; navy, uro-genital; red, vascular system. The EST line also shows dangling ends of 20 nt or more (dots at extremities); 5′ to 3′ direction of EST sequence (arrow at extremity); and possible evidence of library-specific 3′ end (black box around EST line). Asterisks indicate ESTs from normalized or subtracted libraries. In the Web interface, additional library information is available by sliding the mouse over any EST in the chart. The organ name and library ID appear in a pop-up box (using Microsoft Internet Explorer) or at the bottom of the window (using Netscape), along with various information on the EST match, such as: GenBank ID of EST, dbEST library ID, tissue name, disease/normal state, EST length, percent identity with query sequence, coordinates for query and EST, signal type, signal position on query, and presence or absence of A/T tail on EST.
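
The two annotation tracks described in this legend, poly(A) signal boxes and adenine stretches flagged as potential internal priming sites, can be approximated with a simple sequence scan. The following is a minimal sketch and not the EST-parser code: the function names are hypothetical, only the two hexamers named in the legend are searched, and the 8-nt minimum length for an adenine stretch is an arbitrary assumption.

```python
import re

# Only the two hexamers named in the legend, in the DNA alphabet.
# The labels are descriptive choices made for this sketch.
SIGNALS = {"AATAAA": "canonical", "ATTAAA": "common variant"}


def find_polya_signals(seq):
    """Return sorted (position, hexamer, label) tuples, 0-based."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for hexamer, label in SIGNALS.items():
        # A lookahead lets overlapping occurrences be reported too.
        for match in re.finditer(f"(?={hexamer})", seq):
            hits.append((match.start(), hexamer, label))
    return sorted(hits)


def find_a_stretches(seq, min_len=8):
    """Return (start, end) spans of runs of at least min_len adenines.

    min_len=8 is an assumed threshold, not a value from the paper.
    """
    seq = seq.upper()
    pattern = f"A{{{min_len},}}"
    return [(m.start(), m.end()) for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    utr = "GGAAUAAACCATTAAAGG" + "A" * 10 + "CC"
    print(find_polya_signals(utr))
    print(find_a_stretches(utr))
```

An EST whose 3′ end falls just downstream of a reported adenine stretch, with no signal hexamer nearby, would be a candidate internal priming artifact rather than evidence of a genuine poly(A) site.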
Figure 2
EST-parser output for the 3′ untranslated region of mRNA for KIAA0764 protein (EMBL entry AB018307). See Figure 1 legend for color codes.
