Adenosine‐to‐inosine (A‐to‐I) RNA editing increases the complexity of the human transcriptome and is essential for maintenance of normal life in mammals. Most A‐to‐I substitutions occur within repetitive elements in the genome, mainly in Alu repeats. The phenomenon of A‐to‐I editing is far less abundant in mice, rats, chickens and flies than in humans, which correlates with the relative under‐representation of Alu repeats in these non‐primate genomes. Here, we review the recent results of bioinformatic and laboratory approaches that have estimated the extent of the editing phenomenon. We discuss the possible biological relevance of the editing pathway, its possible interaction with other cellular pathways that respond to double‐stranded RNA and its possible contribution to the accelerated evolution of primates.
RNA editing markedly increases the complexity of the human transcriptome. The essence of this widespread phenomenon is the enzymatic alteration of single or multiple nucleotides, either in coding or non‐coding sequences of RNA, which can occur concomitantly with transcription and splicing processes.
Adenosine‐to‐inosine (A‐to‐I) nucleoside modification is performed by the adenosine deaminases that act on RNA (ADAR) family of proteins and, until recently, it was believed to occur in only a small number of genes. The outcome of this editing is usually amino‐acid substitution in the resulting protein or the formation of a new transcript through the alteration of splice signals. In some cases, editing is functionally important and, accordingly, the disruption of ADAR genes in animal models is incompatible with normal life (Higuchi et al, 2000; Wang et al, 2000; Hartner et al, 2004). Several groups have recently used computational approaches to predict that A‐to‐I editing is far more abundant than previously thought and in fact affects thousands of human genes. It mostly occurs in non‐coding regions, that is, introns and untranslated regions (UTRs), and is preferentially clustered in Alu repetitive elements (Athanasiadis et al, 2004; Blow et al, 2004; Kim et al, 2004; Levanon et al, 2004). A‐to‐I substitutions are significantly less abundant in mice, rats, flies and chickens than in humans, mainly owing to the low representation of Alu repeats in these genomes (Kim et al, 2004; Eisenberg et al, 2005). This review summarizes our current knowledge about A‐to‐I editing within Alu repeats, possible clues to the biological function of the human ‘editosome’, and the directions of future research efforts in this area.
The human editosome
There are two known families of RNA‐editing enzymes in humans: the ADAR family, as mentioned above, and the apoB mRNA‐editing catalytic peptide (APOBEC) family, which includes the related activation‐induced deaminases (AID) family and induces the cytosine (C) to uracil (U) transformation in RNA and DNA. Both enzyme families have a highly homologous zinc‐coordinating catalytic domain, which emerged from a common cytidine deaminase during evolution. Interestingly, the APOBEC family underwent marked expansion in the primate genome owing to duplication events: 11 APOBEC enzymes were identified in humans compared with only four in mice (Jarmuz et al, 2002; Wedekind et al, 2003). C‐to‐U editing is involved in several important biological functions: the creation of two variants from the same gene, such as in the human apoB and NF1 (Skuse et al, 1996) mRNA; protection against retrotransposition (Esnault et al, 2005); antiviral activity, including protection against human immunodeficiency virus (HIV; Sheehy et al, 2002) and hepatitis B virus (HBV; Turelli et al, 2004); and the diversification of the vertebrate antibody repertoire (Nussenzweig & Alt, 2004).
The ADAR family acts on double‐stranded RNA (dsRNA), which is formed by the coupling of complementary regions within a single transcript. Adar1‐knockout mice show an embryonic lethal phenotype (Wang et al, 2000; Hartner et al, 2004), whereas Adar2‐knockout mice suffer from convulsions and die prematurely (Higuchi et al, 2000). A third enzyme, ADAR3, contains both single‐ and double‐stranded RNA‐binding domains, although its deaminating activity has not yet been proven (Chen et al, 2000).
The nucleoside inosine (I) is interpreted as guanosine (G) by the translation and splicing machinery; A‐to‐I substitutions can therefore have several biological consequences: a change of codon, which leads to amino‐acid substitution (as exemplified by the glutamate receptor GluR‐B; reviewed in Seeburg et al, 1998); the insertion or elimination of a splice site (Rueter et al, 1999); or, potentially, the disruption of a stop codon. In addition to these recoding events, A‐to‐I editing might have other regulatory effects. For example, interferon‐induced ADAR1 is active on viral RNA (Patterson & Samuel, 1995). Although it has been proposed that viruses benefit from A‐to‐I substitutions (Polson et al, 1996), massive hyper‐editing of viral transcripts could be a cellular protection mechanism (Scadden & Smith, 1997). Recent works have also emphasized the multitude of A‐to‐I editing events in repetitive elements in the genome, the meaning of which has not yet been elucidated.
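The recoding consequences described above can be illustrated with a minimal Python sketch. It assumes only that an edited adenosine is read as guanosine and uses the standard genetic code; the helper name `read_edited_codon` is hypothetical and is not taken from any of the cited studies.

```python
from itertools import product

# Standard genetic code, RNA alphabet, codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_edited_codon(codon, edited_pos):
    """Return (amino acid before editing, amino acid after editing).

    Hypothetical helper: the adenosine at index `edited_pos` is deaminated
    to inosine, which the translation machinery reads as guanosine.
    """
    if codon[edited_pos] != "A":
        raise ValueError("only adenosines can be edited to inosine")
    as_read = codon[:edited_pos] + "G" + codon[edited_pos + 1:]
    return CODON_TABLE[codon], CODON_TABLE[as_read]


print(read_edited_codon("CAG", 1))  # glutamine codon read as CGG (arginine)
print(read_edited_codon("UAG", 1))  # stop codon read as UGG (tryptophan)
```

The first call mirrors the kind of glutamine‐to‐arginine change known from GluR‐B; the second shows how editing could, in principle, disrupt a stop codon.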
Repetitive elements in the human genome
Almost half of the human genome comprises mobile repetitive elements that were introduced by retrotransposition through RNA intermediates (Lander et al, 2001). These repetitive elements encompass autonomous long terminal repeats (LTRs), long interspersed elements (LINEs), L1 elements, processed pseudogenes and, most abundantly, non‐autonomous short interspersed elements (SINEs). SINEs depend on the open reading frame of LINEs for amplification, as they have no protein coding ability (Dewannieux et al, 2003). By far the most common SINE in humans is the primate‐specific Alu sequence, which, at more than one million copies, accounts for greater than 10% of the human genome. The Alu element is about 280 bp long and is composed of two similar monomers connected by an A‐rich region.
During the early steps of primate evolution, the rate of Alu amplification was about one for every primate birth (Shen et al, 1991) but this rate has now decreased by two orders of magnitude (Deininger & Batzer, 1999). Indeed, several thousand Alu elements have integrated into the human genome since the divergence of humans from the African apes (Roy et al, 1999), although few are potentially still active (Deininger et al, 1992).
Being clustered in gene‐rich regions, Alu repeats are able to modify genes through the insertion of mutations, gene conversion and recombination. They can also disrupt promoter regions, change methylation status, insert new regulatory features, interfere with alternative splicing and regulate the initiation of translation (Deininger & Batzer, 1999). Alu elements can also be incorporated into exons (Lev‐Maor et al, 2003) and therefore directly influence the open reading frame in a mature mRNA. Intriguingly, A‐to‐I editing of Alu repeats in the primate genome is a unique phenomenon, which implies that it is connected to the evolution of this mammalian branch.
Large‐scale survey of editing in human transcripts
Until recently, despite the detection of high levels of inosine in mRNAs (Paul & Bass, 1998), only a few edited transcripts had been characterized. Using an I‐specific cleavage reaction, Morse et al (2002) were the first to conduct a targeted search for additional A‐to‐I substitutions and revealed clusters of editing sites in 19 mRNAs derived from human brain. Of the clusters, 15 out of 19 occurred in repetitive elements, mainly in Alu sequences, within introns and UTRs. In the same year, clusters of predominantly A‐to‐G substitutions within full‐length mRNAs were also reported by the human unidentified gene‐encoded (HUGE) protein database (Kikuno et al, 2002).
Three independent groups recently launched systematic searches using computational algorithms, and these corroborated the existence of abundant A‐to‐I substitutions, mainly in Alu repetitive elements (Fig 1). All of these methods are based on the alignment of RNA transcripts with the genome, the identification of mismatches and the elimination of sequencing errors, single‐nucleotide polymorphisms (SNPs), mutations and other ‘contaminating mismatches’. The main difference between the three approaches is their use of either the database of more than five million expressed sequence tags (ESTs), which has up to 3% sequencing errors (Hillier et al, 1996), or the more accurate, but smaller, set of longer mRNAs (about 100,000 transcripts).
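The shared logic of these searches (align, collect mismatches, discard known polymorphisms) can be sketched in a few lines of Python. This is a simplified illustration and not the pipeline of any of the cited groups: it assumes the transcript is already aligned to the genome without gaps, that inosine appears as G in the sequenced cDNA, and that SNP positions are supplied by the caller. The function names and toy sequences are invented for the example.

```python
from collections import Counter


def call_mismatches(genome, transcript, known_snps=frozenset()):
    """List (position, genomic base, transcript base) for each mismatch.

    Assumes `genome` and `transcript` are already aligned, equal-length,
    gap-free strings on the same strand; positions in `known_snps`
    (0-based) are discarded as polymorphisms rather than editing.
    """
    if len(genome) != len(transcript):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (pos, g, t)
        for pos, (g, t) in enumerate(zip(genome, transcript))
        if g != t and pos not in known_snps
    ]


def mismatch_spectrum(mismatches):
    """Count each of the 12 possible mismatch types, e.g. 'A>G'."""
    return Counter(f"{g}>{t}" for _, g, t in mismatches)


# Toy data, invented for illustration only.
genome = "GATTACAGGCATAAGCCA"
transcript = "GATTGCGGGCGTAAGCTA"
hits = call_mismatches(genome, transcript, known_snps={16})
print(hits)
print(mismatch_spectrum(hits))
```

A real analysis would also have to deal with sequencing errors, alignment gaps and strand assignment, which is where the published methods differ most.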
One group extracted 429,000 candidate dsRNA regions from 14,512 different genes (Levanon et al, 2004). Laborious cleaning procedures yielded 12,723 A‐to‐I substitutions in 1,637 human genes. A second method (Kim et al, 2004) revealed 30,085 A‐to‐I substitutions in 2,674 human transcripts. The third study (Athanasiadis et al, 2004) screened 103,723 human mRNAs for clusters of A‐to‐I substitutions, and searched for Alu repeats in the edited transcripts. They identified 1,445 human mRNAs that are subject to RNA editing at more than 14,500 sites. These discrepancies might have arisen because of differences in fidelity between the methods used in each search. In another study, Blow et al (2004) conducted a survey of RNA editing by comparing sequences from a library of human brain cDNA clones that cover a 3 Mb genomic locus with the reference human genome sequence and genomic DNA of the same individual.
In all four studies, A‐to‐I substitutions formed more than 80% of the 12 possible types of mismatch in the selected set of transcripts. Virtually all editing events occurred within Alu repetitive elements, which formed only 20% of the total length of the tested transcripts. More than 50% of the substitutions occurred in 3′‐UTR regions, 12% in 5′‐UTRs and 33% in introns. Many of the edited exons showed non‐conformist characteristics, such as intron retention, extension beyond the known UTR and a marked deviation from those listed in RefSeq, all of which are related to defective splicing (Kim et al, 2004).
The characterization of a typical editing site is not straightforward as there is no consensus sequence to which the editing complex binds. Examining nucleotide triplets centred around the edited A, it is possible to identify more favourable sites: C and T are significantly overrepresented one nucleotide 5′ to the edited A, whereas G is underrepresented; at the 3′ nucleotide, G is significantly overrepresented. This finding is consistent with the previously described pattern of action of both ADAR1 and ADAR2 (Lehmann & Bass, 2000). Focusing on the Alu sequence, there are several hotspots that are prone to editing. In particular, the A nucleotides at positions 27, 28, 136 and 162 of the consensus Alu‐J sequence account for 25% of the editing events within Alu repeats (Levanon et al, 2004). The internal poly‐A region is also prone to editing.
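The triplet analysis described above amounts to tallying the bases that flank each edited adenosine. The following Python sketch shows that bookkeeping under simple assumptions: the sequence is the unedited genomic sequence in transcript orientation, and the edited positions are already known. The function name and the toy data are hypothetical.

```python
from collections import Counter


def neighbour_counts(sequence, edited_positions):
    """Tally the bases immediately 5' and 3' of each edited adenosine.

    `sequence` is the genomic (unedited) sequence in transcript orientation;
    `edited_positions` are 0-based indices of edited A's. Sites at the
    sequence ends, which lack one neighbour, are skipped.
    """
    upstream, downstream = Counter(), Counter()
    for pos in edited_positions:
        if sequence[pos] != "A":
            raise ValueError(f"position {pos} is not an adenosine")
        if 0 < pos < len(sequence) - 1:
            upstream[sequence[pos - 1]] += 1
            downstream[sequence[pos + 1]] += 1
    return upstream, downstream


# Toy sequence and sites, invented for illustration only.
seq = "GCAGTTAGCCAGGAT"
sites = [2, 6, 10]
up, down = neighbour_counts(seq, sites)
print(up)
print(down)
```

To judge over‐ or underrepresentation, these counts would need to be compared with the neighbour frequencies around all adenosines in the same sequences, edited or not.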
The level of editing depends strongly on tissue type. Higher levels of editing have been reported in different brain tissues and in the thymus, whereas lower editing levels were generally found in transformed cell lines (Athanasiadis et al, 2004; Kim et al, 2004; Levanon et al, 2004; Eisenberg et al, 2005).
An absolute prerequisite for the editing process is the temporary existence of a dsRNA. Therefore, two groups have investigated the correlation between the proximity of an inverted Alu repeat and the rate of A‐to‐I substitution (Athanasiadis et al, 2004; Blow et al, 2004). They found that the distance from an Alu sequence to an adjacent inverted Alu is indeed shorter for edited Alu repeats than for unedited ones. The shorter the distance (up to 2 kb), the more probable is the editing of the Alu element. Moreover, the editing frequency correlates with the number of Alu sequences in opposite orientation and close proximity.
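The distance measure used in this kind of analysis can be sketched as follows. The Python snippet assumes a list of Alu annotations on a single transcript, each given as a (start, end, strand) tuple with half‐open coordinates; this representation, the function name and the toy coordinates are assumptions made for illustration, not the format used in the cited studies.

```python
def nearest_inverted_distance(alus):
    """For each Alu, return the gap (bp) to the closest Alu on the other strand.

    `alus` is a list of (start, end, strand) tuples on one transcript, with
    half-open coordinates and strand given as '+' or '-'. The result is None
    for an element with no oppositely oriented partner.
    """
    distances = []
    for start, end, strand in alus:
        gaps = [
            # Gap between the two intervals; 0 if they touch or overlap.
            max(other_start - end, start - other_end, 0)
            for other_start, other_end, other_strand in alus
            if other_strand != strand
        ]
        distances.append(min(gaps) if gaps else None)
    return distances


# Toy annotation, invented for illustration only.
alus = [(100, 380, "+"), (900, 1180, "-"), (5000, 5280, "+"), (9000, 9280, "+")]
print(nearest_inverted_distance(alus))
```

In this toy case only the first two elements have an inverted partner within 2 kb, the range mentioned in the text, so they would be the more plausible candidates for forming an editable duplex.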
Nearly all A‐to‐I substitutions change the stability of the dsRNAs formed by Alu sequences. About 80% resulted in the destabilization of an A‐U pair (changed to I‐U) and 20% in the stabilization of an A‐C pair (changed to I‐C; Blow et al, 2004; Levanon et al, 2004). Nevertheless, the proportion of editing events that ‘correct a mismatch’ within dsRNA structures significantly exceeds the prevalence of such mismatches, a fact that substantiates previous claims about the stabilizing effect of editing (Wong et al, 2001).
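The classification used here depends only on the base that opposes the edited adenosine in the duplex. A minimal Python sketch of that rule is given below; it covers just the two cases named in the text, and the function name and the toy list of opposing bases are invented for the example.

```python
from collections import Counter


def pairing_effect(opposing_base):
    """Classify how editing A to I changes one position of a duplex.

    Simplified rule taken from the text: an A-U pair becomes the weaker
    I-U pair, whereas an A-C mismatch becomes an I-C pair. Other opposing
    bases are not covered by this sketch.
    """
    effects = {
        "U": "destabilizing (A-U -> I-U)",
        "C": "stabilizing (A-C -> I-C)",
    }
    return effects.get(opposing_base, "not classified")


# Toy list of bases opposite edited adenosines, invented for illustration.
opposing = ["U", "U", "C", "U", "A"]
tally = Counter(pairing_effect(base) for base in opposing)
print(tally)
```

Applied to real data, the tally would be compared with the frequency of A‐C mismatches among unedited adenosines to test whether mismatch‐correcting events are enriched.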
Despite the use of different computational algorithms to estimate their prevalence, all studies conclude that A‐to‐I substitutions within Alu repeats are widespread. Validation of the results by concomitant sequencing of both DNA and RNA of candidate genes predicted to have a cluster of editing sites also adds to the degree of confidence in the computational results. However, the prevalence of the editing phenomenon is still probably underestimated owing to the use of strict ‘cleaning’ procedures of the sequences, as demonstrated by the only partial overlap between various editing data sets; a failure to predict already known editing sites (within coding regions); and the existence of more editing sites in highly edited sequences than predicted. Direct sequencing of 3 Mb of human brain cDNA has shown that the average editing rate within intronic and intergenic regions is approximately 1:1,000 bp (Blow et al, 2004). Furthermore, the prevalence of Alu elements in human genes suggests that most human gene transcripts are subject to A‐to‐I RNA editing (Athanasiadis et al, 2004).
It is worth noting that in addition to the A‐to‐I substitutions that occur in poorly evolutionarily conserved Alu repeats, five novel editing targets were recently identified by computational methods that targeted the most evolutionarily conserved regions of the genome across various species (Hoopengardner et al, 2003; Clutterbuck et al, 2005; Levanon et al, 2005). Thus, editing is found in both extremes of the evolutionary conservation curve.
Primate specificity of editing
All mammalian genomes contain a high proportion of retro‐elements, although their character and number vary significantly. The dominance of Alu among the repetitive elements in the human genome is unique. All mammalian genomes sequenced so far have some LTRs or LINEs to support propagation of non‐autonomous SINEs; however, it is possible that some species have not created suitably efficient retroelements or acquired potent means for negative selection against highly active elements. The mouse genome, for instance, contains four types of SINE besides Alu repeats, and their total number is similar to that of Alu elements alone in humans (Waterston et al, 2002). Hence, the formation of dsRNA by two adjacent, identical and inverted SINEs within a single transcript is far more likely to occur in the human genome.
A natural extension of this is to examine whether the editing phenomenon is primate specific. The computational algorithms that were used to explore A‐to‐I editing in humans were applied to the mouse, rat, chicken and fly transcriptome databases to estimate the abundance of editing in Alu‐deficient organisms. Indeed, there are about 40‐fold fewer A‐to‐I substitutions in the mouse data set of 4 million mRNAs and ESTs (Eisenberg et al, 2005). By comparing human and mouse mRNA sequences with their genomes, a 35‐fold increase in editing events in humans was found (Fig 2). Furthermore, the number of multi‐edited transcripts in humans is estimated to be about 30‐fold higher than in mice (Kim et al, 2004; Eisenberg et al, 2005). Therefore, the available data imply that abundant editing might be primate specific and strongly interlinked with Alu sequences (Fig 3).
The role of editing
Altered A‐to‐I editing has been associated with several pathological conditions, mainly central nervous system‐related abnormalities such as amyotrophic lateral sclerosis (ALS; Kawahara et al, 2004), epilepsy (Brusa et al, 1995), major depressive disorder (MDD; Gurevich et al, 2002) and glioblastoma multiforme (GBM; Maas et al, 2001). Editing could therefore have physiological significance, but its precise role is still speculative.
Editing in coding regions can result in amino‐acid substitutions, but what might be the effects of editing in noncoding parts of the transcript? First, heavily edited transcripts are often retained in the nucleus. This might be interpreted as a means of protection against abnormal transcripts (Zhang & Carmichael, 2001). The fact that clusters of editing sites are abundant in splicing‐defective transcripts also supports this assumption (Kim et al, 2004). Furthermore, I‐specific cleavage of RNAs can lead to the selective destruction of edited RNAs (Scadden & Smith, 2001). Second, non‐viral dsRNA can enter the RNA interference (RNAi) pathways for either cleavage and degradation or RNA‐mediated gene silencing at the heterochromatin level. As RNA editing alters the stability of dsRNA complexes, it can augment or counteract these RNAi mechanisms (Knight & Bass, 2002). Third, a direct role for editing in gene silencing was recently suggested from the observation of heterochromatin localization of complexes that contain the Vigilin protein, I‐rich transcripts and ADAR1 enzymes (Wang et al, 2005). Fourth, editing can also be viewed as an anti‐retroelement mechanism. In fact, the abundance of editing in repetitive elements such as Alu and the evidence for the involvement of APOBEC3G (Esnault et al, 2005) in anti‐retrotransposition activity raise the possibility that A‐to‐I editing defeated the enormous deleterious spread of Alu in the human genome. Finally, editing in 3′ UTRs and introns might alter the stability, transport and handling of RNA transcripts by the translation machinery.
Other aspects of dsRNA biology suggest an additional potential source of dsRNA: putative RNA complexes resulting from the abundant sense‐antisense transcript pairs (Yelin et al, 2003). However, almost all known A‐to‐I substitutions have occurred in dsRNA that has arisen from intra‐molecular pairing within the transcript, leaving this option as anecdotal at most (Athanasiadis et al, 2004). Yet another source of dsRNA is primary and precursor microRNA molecules, which are underrepresented in RNA databases. Indeed, there is emerging evidence for post‐transcriptional editing of these regulatory RNA transcripts (Luciano et al, 2004).
As editing controls the structure and stability of the dsRNA, it could cause a transition between distinct secondary structure conformations of a transcript. The increased frequency of editing in Alu elements with more than one inverted neighbouring Alu supports this idea.
Current studies are aimed at revealing the roles and mechanisms of RNA editing. Which ADAR enzyme is responsible for editing the Alu repeats? Is there any cross talk between different pathways that involve dsRNA? Can RNA editing change the stability of dsRNA in a manner that affects its further processing? How relevant is the mechanism of editing to the understanding of disease? What is the exact role of editing as an anti‐retroviral and anti‐retroelement tool? Does the remarkable diversity of the transcriptome as a result of editing have any meaning at all? And, maybe most intriguingly, did editing of primate repetitive elements contribute to the evolution of mankind?
We thank R. Sorek for helpful discussions, and M.F. Jantsch for critical reading of the manuscript. E.E. is supported by an Alon fellowship at Tel Aviv University.
Copyright © 2005 European Molecular Biology Organization