The retroviral integrase superfamily (RISF) comprises numerous important nucleic acid‐processing enzymes, including transposases, integrases and various nucleases. These enzymes are involved in a wide range of processes such as transposition, replication and repair of DNA, homologous recombination, and RNA‐mediated gene silencing. Two out of the four enzymes that are encoded by the human immunodeficiency virus—RNase H1 and integrase—are members of this superfamily. RISF enzymes act on various substrates, and yet show remarkable mechanistic and structural similarities. All share a common fold of the catalytic core and the active site, which is composed primarily of carboxylate residues. Here, I present RISF proteins from a structural perspective, describing the individual members and the common and divergent elements of their structures, as well as the mechanistic insights gained from the structures of RNase H1 enzyme complexes with RNA/DNA hybrids.
See Glossary for abbreviations used in this article.
RNase H1 was the first retroviral integrase superfamily (RISF) enzyme for which a three‐dimensional structure was solved (Table 1; Katayanagi et al, 1990; Yang et al, 1990). Since then, the characteristic fold of the catalytic cores of RISF proteins has been known as the ‘RNase H fold’. RNases H bind to RNA/DNA hybrids in a sequence‐nonspecific manner and degrade the RNA strand. Two groups of RNases H have now been identified: type 1 (RNase H1 or HI) and type 2 (RNase H2 or HII) enzymes. RNase H1 enzymes are present in all forms of life from bacteria to animals, as well as in retroviruses in which they constitute a domain of reverse transcriptases. RNase H1‐null mice die during embryonic development because the enzyme is essential for mitochondrial DNA replication (Cerritelli et al, 2003). RNase H1 enzymes have also been implicated in the removal of the RNA primers that are used to start the synthesis of Okazaki fragments during DNA replication (Kogoma & Foster, 1998), although they are not essential for this process. The RNase H activity of retroviral reverse transcriptases—in particular that of HIV—is essential for viral replication, where it has a crucial role in the conversion of viral genomic RNA into double‐stranded DNA, which is subsequently integrated into the host genome (Schultz & Champoux, 2008). The structures of viral, bacterial and human RNase H1 enzymes have now been solved, including those in complex with RNA/DNA hybrid substrates (Table 1; Nowotny et al, 2005; Nowotny et al, 2007).
RNase H2 enzymes differ from type 1 enzymes in substrate specificity and biochemical properties (Ohtani et al, 1999). Most importantly, type 2 enzymes cleave preferentially at the 5′ end of RNA in chimeric DNA–RNA–DNA/DNA hybrids. They can hydrolyse this substrate even if it contains only a single ribonucleotide, whereas type 1 enzymes require at least four ribonucleotides (Ohtani et al, 1999). Therefore, RNase H2 enzymes are thought to be involved in the removal of single ribonucleotides that have been misincorporated into DNA (Rydberg & Game, 2002). Mutations of the human enzyme lead to Aicardi–Goutières syndrome, which is an autosomal recessive genetic disorder that severely affects the nervous system and has symptoms that are reminiscent of those caused by in utero viral infection (Crow et al, 2006). Eukaryotic RNase H2 enzymes contain three subunits (Jeong et al, 2004), whereas bacterial and archaeal RNase H2 enzymes are monomeric (Chapados et al, 2001; Lai et al, 2000).
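The difference in ribonucleotide requirements between the two RNase H types can be illustrated with a minimal sketch. The string encoding of the cleaved strand ('R' for a ribonucleotide, 'D' for a deoxyribonucleotide) and the function names are hypothetical, and treating the thresholds as runs of consecutive ribonucleotides is an assumption of the sketch rather than a statement from the cited work.

```python
import re

# Minimum ribonucleotide counts as stated in the text (Ohtani et al, 1999).
# Assumption of this sketch: the counts refer to consecutive ribonucleotides.
MIN_RIBO_RUN = {"H1": 4, "H2": 1}


def longest_ribo_run(strand: str) -> int:
    """Length of the longest run of 'R' in a strand written as 'R'/'D' letters."""
    runs = re.findall(r"R+", strand)
    return max((len(run) for run in runs), default=0)


def can_cleave(strand: str, enzyme: str) -> bool:
    """True if the strand meets the minimum ribonucleotide run for the enzyme type."""
    return longest_ribo_run(strand) >= MIN_RIBO_RUN[enzyme]
```

Under these assumptions, a DNA strand with a single embedded ribonucleotide ("DDDRDDD") qualifies as a substrate for the type 2 enzyme but not for the type 1 enzyme.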
Bacterial RuvC resolvase is another member of the RISF. It cleaves the four‐way DNA structures called Holliday junctions that are intermediates of the homologous‐recombination process (Bennett et al, 1993). RuvC is a dimeric enzyme that cleaves the Holliday junction at two strands of the same polarity (Dunderdale et al, 1991). Substrate binding does not depend on the DNA sequence, but the cleavage is specific and occurs only at an (A/T)TT↓(G/C) cognate sequence (Shida et al, 1996).
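As an illustration of the cognate sequence, the following minimal sketch scans a single DNA strand for (A/T)TT↓(G/C) sites with a regular expression. The function name and the convention for reporting the cut position are hypothetical; the sketch considers sequence only and ignores the requirement for a Holliday junction structure.

```python
import re
from typing import List

# (A/T)TT followed by (G/C); the lookahead allows overlapping sites to be found.
COGNATE = re.compile(r"(?=[AT]TT[GC])")


def ruvc_cut_sites(seq: str) -> List[int]:
    """Return 0-based indices i such that the cut lies between seq[i-1] and seq[i]."""
    # The cut follows the three-nucleotide (A/T)TT prefix of each match.
    return [match.start() + 3 for match in COGNATE.finditer(seq.upper())]
```

For the toy sequence "GGATTGCC", the only cognate site is ATT↓G, so the function reports a cut before index 5.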
The transposases of the DDE family are also members of the RISF (Polard & Chandler, 1995), and catalyse the movement of DNA fragments called transposons from one location to another within or across genomes using a single active site. This multi‐step reaction begins with the hydrolysis of both transposon ends, which generates free 3′‐OH groups; the strategies that are used to produce this intermediate vary (Curcio & Derbyshire, 2003). Transposases such as MuA and Tn3 nick the transposon ends and join them with the target sequence forming a branched intermediate, which is later resolved by DNA replication. Another strategy that is often used by bacterial transposases such as Tn5 is the formation of hairpins on the ends of the transposon (Fig 1A). In this case, the transposon end is initially nicked in a hydrolysis reaction and the liberated 3′‐OH groups perform a nucleophilic attack on the phosphate of the other strand, thereby forming a hairpin. The hairpin is resolved by a repetition of the first hydrolysis step and the product is a linear excised transposon with free 3′‐OH groups, which is then joined with the target DNA by a nucleophilic attack of the 3′‐OH groups on a phosphate group of the target DNA.
The structure of the bacteriophage MuA transposase was the first to show that some transposases contain the RNase H fold (Rice & Mizuuchi, 1995); since then, several other structures of prokaryotic and eukaryotic transposases have been reported (Table 1). The related RAG1 protein—responsible for reshuffling the V(D)J segments during antigen–receptor gene assembly—is also thought to belong to the RISF on the basis of secondary‐structure prediction and the identification of active‐site residues (Kim et al, 1999; Landree et al, 1999).
Integrases catalyse the insertion of reverse‐transcribed retroviral DNA into the host genome (Chiu & Davies, 2004). This reaction usually involves two steps: 3′‐end processing and strand transfer. For example, in the case of HIV‐1 integrase, the end processing consists of the removal of a terminal GT dinucleotide to produce a 5′ overhang and a 3′ end with a free OH group. In the strand‐transfer reaction, the target DNA (the genomic DNA of the host cell) is cleaved and the 3′ ends of the DNA that will become integrated are joined with the target. Integrases were found to be related to RNases H only when the structure of the HIV integrase was solved (Table 1; Dyda et al, 1994).
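The 3′‐end processing step described for HIV‐1 integrase can be written as a very small sketch. The function name and the toy input sequence are hypothetical, and the strand is assumed to be written 5′ to 3′; the sketch models only the loss of the terminal dinucleotide, not the chemistry.

```python
def process_3prime_end(strand: str) -> str:
    """Remove the terminal GT dinucleotide from the 3' end of a strand written 5'->3'."""
    if not strand.upper().endswith("GT"):
        raise ValueError("expected a strand ending in GT at its 3' end")
    # The shortened strand now ends with the free 3'-OH used in strand transfer.
    return strand[:-2]
```

Applied to the toy strand "AACAGT", the function returns "AACA"; because the complementary strand is left untouched, the result is the 5′ overhang mentioned above.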
A recent addition to the RISF is Argonaute, the nuclease component of the RISC, which is a complex responsible for gene silencing by small‐interfering RNAs (Mello & Conte, 2004). The RISC also contains a 20–24 nucleotide RNA that acts to select and capture a complementary messenger RNA target, which, if the complementarity is perfect, is then cleaved by Argonaute. The first crystal structure of Argonaute revealed that one of its domains—known as the PIWI domain—contains an RNase H‐like segment (Table 1; Song et al, 2004). This structure, in combination with biochemical data, confirmed that Argonaute is the nuclease component of the RISC (Rivas et al, 2005).
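The requirement for perfect complementarity between the small RNA and its messenger RNA target can be sketched as follows. The function names are hypothetical, only Watson–Crick pairs are considered (G·U wobble pairs are ignored, which is a simplifying assumption), and the toy sequences used with it are much shorter than the 20–24 nucleotides of a real guide RNA.

```python
# Watson-Crick partners for RNA bases (assumption: no wobble pairing).
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(rna.upper()))


def is_cleavable(guide: str, target_site: str) -> bool:
    """True only if the target site pairs perfectly with the guide RNA."""
    return target_site.upper() == reverse_complement(guide)
```

In this simplified picture, a single mismatch between guide and target is enough for `is_cleavable` to return `False`.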
UvrC is one of the crucial elements of the nucleotide‐excision DNA repair pathway in bacteria. This pathway removes a wide range of DNA lesions (Truglio et al, 2006), and the role of UvrC is to cleave the DNA on both sides of the damage so that the fragment harbouring the lesion can be removed by a helicase. Each cleavage is carried out by one of two nuclease domains in UvrC, one of which is related to homing endonucleases and the other of which was recently shown by protein crystallography to adopt an RNase H fold (Table 1; Karakas et al, 2007).
Prp8 is the largest protein component of the spliceosome and is considered to be its main regulator. Recent structures of a domain from its carboxy‐terminal region showed that it adopts the RNase H fold (Table 1; Pena et al, 2008; Yang et al, 2008). Prp8 is not a typical member of the RISF, however, as the carboxylate‐rich active site is not conserved in its RNase H‐like domain and there is no evidence of metal‐ion binding to this domain. Whether the RNase H‐like domain of Prp8 has catalytic activity remains to be seen.
The RNase H fold
In RISF members, the RNase H‐like domain is usually linked to other functional domains that are responsible for nucleic‐acid binding, protein–protein interactions or additional enzymatic activities (Fig 1B). Some RISF proteins function as dimers when required by their substrate. One such example is RuvC, which, as mentioned earlier, cleaves the two strands of the Holliday junctions. Also dimeric are the Tn5 transposase and integrases, which process the two ends of transposons and reverse‐transcribed DNA, respectively. The modes of dimerization are different; for example, RuvC dimerizes through interactions that are mediated by two helices located on the side of its central β‐sheet that is opposite to the active site (Ariyoshi et al, 1994). The Tn5 dimer contains two DNA molecules representing the transposon ends (Davies et al, 2000). Each DNA interacts with both protein molecules, and this bridging by nucleic acid is mainly responsible for dimer stabilization. Smaller contributions are made by the interactions between the C‐terminal domains of Tn5. For the Hermes transposase, a hexameric active form has been proposed based on modelling and biochemical studies, and on electron‐microscopy imaging (Hickman et al, 2005).
Although there is no detectable homology between the members of the RISF, the structure of their catalytic RNase H‐like domains is remarkably conserved (Yang & Steitz, 1995). The central and most invariant element of this domain is a five‐stranded β‐sheet (Fig 2). The first three strands are anti‐parallel and usually run without any insertions; the two exceptions are RNase H2, which has two α‐helices between strands 2 and 3, and Tn5, which has a short helix between strands 1 and 2. The shorter fourth and fifth strands run parallel to the first strand. In addition to the central β‐sheet, the fold contains α‐helices of variable position and arrangement. The most conserved α‐helix is located after strand 3. It is adjacent to one face of the β‐sheet and runs across it, probably stabilizing and reinforcing the central β‐sheet (Fig 2). The RNase H fold can be disrupted by the insertions of various structures, which most often occur after strand 5 and before the last catalytic residue (Figs 1A,2). In UvrC, Tn5 transposase and Argonaute, this insertion predominantly consists of β‐strands, whereas a large insertion in Hermes contains only α‐helices.
Catalysis and the active sites
The enzymes of the RISF catalyse two general types of reaction: the hydrolysis of the phosphate group of a nucleic acid that leads to the formation of products containing 5′‐phosphate and 3′‐OH groups; and strand‐transfer reactions in which a 3′‐OH group of one DNA molecule attacks a phosphate group of another DNA molecule to join the two (Fig 1A). Stereochemical studies show that reactions catalysed by RISF members occur by a one‐step SN2‐like mechanism that includes the generation of a pentacovalent intermediate and the inversion of the phosphate stereo configuration (Kennedy et al, 2000; Krakowiak et al, 2002).
Divalent metal ions are essential for catalysis; the preferred ion of RISF enzymes is Mg2+, but Mn2+ also supports catalysis. Ca2+ inhibits hydrolysis, but can support strand transfer in transposition (Savilahti et al, 1995). The crystal structures solved in the presence of nucleic acid, namely those of Tn5 (Davies et al, 2000) and RNase H1 enzymes (Nowotny et al, 2005, 2007), point to the involvement of two ions and a general two‐metal ion mechanism (Steitz & Steitz, 1993; Yang et al, 2006). In this mechanism, metal ions are located on two sides of the scissile phosphate: the A‐site Mg2+ coordinates, positions and activates the nucleophile; and the B‐site Mg2+ stabilizes the transition state and the leaving group (Fig 3). The active sites of RISF members are composed predominantly of negatively charged carboxylate residues that coordinate the metal ions. Two residues, which are invariantly aspartates except in the case of RuvC in which one is replaced by a glutamate, are particularly important in this process and are well conserved. The position of these two residues relative to the rest of the fold is also conserved: the first is in the middle of the first β‐strand and the second is at the end of the fourth strand, adjacent to strand 1 (Figs 1B,2). The first carboxylate is the only one that directly coordinates both metal ions and is located at the heart of the active site. The second residue coordinates metal ion B, and in some structures metal ion A, through a water molecule (Fig 3; Nowotny et al, 2005).
For most RISF proteins, mutations in the active site do not inhibit substrate binding, and in some cases they can even enhance it (Nowotny et al, 2005). However, even conservative mutations of either of the two crucial carboxylates to amides render RISF enzymes inactive (Chapados et al, 2001; Ichiyanagi et al, 1998; Kanaya, 1998; Peterson & Reznikoff, 2003). For Argonaute, only aspartate to alanine mutations have been reported, and they completely abolished catalytic activity (Rivas et al, 2005). These mutational data further emphasize the essential role of the two carboxylates in catalysis.
The third residue of the active site, the most C‐terminal of the three, coordinates metal ion A. This residue is comparatively variable and can be an aspartate (in RNases H and RuvC), a glutamate (in Tn5, Hermes and integrases) or a histidine (in UvrC). In many Argonautes this residue is a histidine, but there are also active Argonautes with a lysine or aspartate in this position (Joshua‐Tor, 2006; Wang et al, 2008). In the RISF, this residue is always located in a less conserved part of the core structure, after strand 5 of the central β‐sheet. This region usually forms an α‐helix that is adjacent to the β‐sheet and runs roughly from strand 4 towards strand 3 (Fig 2). However, in RuvC, the last residue is located in a helix that runs in the opposite direction, and in RNase H2 enzymes it is located in a loop before the C‐terminal α‐helix. This last active‐site residue is also relatively tolerant to mutation; for example, in RNase H1 enzymes it can be replaced by asparagine or histidine without significant loss of activity (Kanaya, 1998), and in RNase H2 an aspartic acid to asparagine substitution leads to only a partial loss of activity of the enzyme (Chapados et al, 2001).
In RNases H, there is an additional active‐site residue, which in type 1 enzymes is a glutamate that coordinates metal ion B and forms contacts with the 2′‐OH group of the ribonucleotide adjacent to the scissile phosphate (Fig 3A). It probably acts as an additional specificity check, relaying the information about the presence of the 2′‐OH in the substrate to the active site (Nowotny et al, 2005). A similarly positioned glutamate is also present in RNase H2 enzymes; however, it comes from a different part of the fold and whether it has a similar function in the recognition of 2′‐OH groups remains to be seen. Nevertheless, this glutamate is essential for the activity of both types of RNase H (Chapados et al, 2001; Kanaya, 1998).
Substrate binding and catalysis
The structures of two RISF members in complex with nucleic acids have been described. The first structure to be solved was that of the Tn5 transposase with DNA forming a nicked hairpin, which represents the complex formed after the complete cleavage of the transposon from the flanking DNA (Davies et al, 2000). The other known structures are those of bacterial and human RNase H1 enzymes bound to RNA/DNA hybrids (Nowotny et al, 2005; Nowotny et al, 2007). In all cases, the nucleic acids were seen to interact with the active site; however, the positions of the nucleic acids relative to the RNase H fold are different (Fig 4). The Tn5 structures have been reviewed elsewhere (Steiniger‐White et al, 2004); therefore, I focus here on the RNase H1 enzyme structures.
Both bacterial and human RNase H1 enzymes contain two grooves on their surface (Fig 4A,B), each of which accommodates one strand of the RNA/DNA hybrid. The RNA‐strand groove contains the active site, and the RNA is specifically recognized by interactions with 2′‐OH groups. The key to DNA recognition is the conformation of the nucleic acid: one of the phosphate groups of the non‐cleaved strand binds to a tight phosphate‐binding pocket on the surface of the protein, and this interaction requires the nucleic acid to adopt a B‐form conformation. As only DNA can adopt such a conformation, the non‐cleaved strand must be DNA. In human RNase H1 enzymes, there is an additional element, known as the basic protrusion, which introduces further deformations to the DNA strand and can accommodate only 2′‐deoxynucleotides, which leads to more stringent discrimination against RNA.
Two metal ions have been observed at the active sites of both bacterial and human RNase H1 enzymes (Fig 3A), which are coordinated not only by carboxylates from the active site but also by the backbone of the RNA. The presence of the correct nucleic acid is essential for the binding of both metal ions, which is probably the reason why often only one metal ion has been observed in the apo structures. This interdependence of metal ion and nucleic‐acid binding ensures that the catalysis occurs only when the correct substrate is bound, thereby enhancing the specificity of the enzyme. In Bacillus halodurans RNase H1 structures—solved at high resolution—the metal ion A coordinates a water molecule that is clearly positioned as the nucleophile attacking the phosphorus of the scissile phosphate (Fig 3A). Such a configuration of the active site indicates that the catalysis occurs through a two‐metal ion mechanism. Subsequent structures of B. halodurans RNase H1–RNA/DNA complexes allowed the reconstruction of the course of the catalytic reaction. The structure in complex with a non‐phosphorylated nick at the active site was used to model the transition state, and the structure with a 5′‐phosphorylated nick at the active site showed its configuration after the completion of the reaction (Fig 3B,C; Nowotny & Yang, 2006). On the basis of these structures, it has been proposed that the two metal ions move closer together to promote formation of the transition state. After the completion of the hydrolysis, the 5′‐phosphate is displaced from the active site.
The conservation of the RNase H1 active site and its geometry makes it possible to formulate predictions for the catalytic mechanism of other members of the RISF. A comparison of RNase H1 and Tn5 complex structures reveals that the same fold and two‐metal ion catalysis can be used to carry out a multi‐step reaction of DNA transposition (Nowotny et al, 2005). When the two structures are superimposed, the metal ions occupy similar positions, and the nucleophilic water in the RNase H1 structures can almost be superimposed with the 3′‐OH at the transposon end, which is the nucleophile for the strand‐transfer reaction (Fig 4D–F). This 3′‐OH group is generated by the attack of a water molecule on the phosphate from the opposite side of the 3′‐OH. It has been proposed that the symmetrically coordinated metal ions in the Tn5 transposase can alternately activate a water molecule and the 3′‐OH in successive chemical reactions, and that the 3′‐OH at the transposon end remains coordinated to the same metal ion throughout the course of transposition (Fig 1A; Nowotny et al, 2005). In the last step of the reaction, the 3′‐OH attacks the target DNA. These analyses show that the findings from one RISF enzyme can potentially be used to illuminate the details of the reaction catalysed by another enzyme.
The molecular details of substrate binding by RISF proteins are more difficult to predict because nucleic‐acid binding often triggers conformational changes of the protein. For example, the domains of Argonautes—in particular the MID and PAZ domains—are relatively mobile (Wang et al, 2008). Superposition of RNase H1–RNA/DNA complexes with Argonaute proteins confirms that the nucleic‐acid duplex should be placed in the central cavity of Argonaute to interact with the RNase H‐like active site. However, in the model, the nucleic acid clashes with other domains, and in the actual complex these clashes are probably alleviated by the movements of individual domains of Argonaute.
The RISF is fascinating and has diverse functions in nucleic‐acid metabolism. The structural studies of these enzymes provide an unparalleled insight into the molecular details of their mechanism of action. One clear goal for the near future is to obtain additional crystal structures of RISF members in complex with the nucleic acids that they act upon (Sidebar A). Together with biochemical data, these structures will provide mechanistic insights into the important biological processes that depend on members of this family of enzymes.
Sidebar A | In need of answers
Structures in complex with nucleic acid are not available for most enzymes of the retroviral integrase superfamily. These structures would help to answer the following questions: How is the target DNA captured by integrases and transposases? How does the strand transfer occur? Which protein conformational changes accompany this process? For the crossover junction endodeoxyribonuclease RuvC, the details of sequence‐specific Holliday junction recognition need to be elucidated.
New members of the retroviral integrase superfamily need to be identified through sequence alignment and structural studies. In particular, the structures of the RAG1 and endonuclease V proteins would resolve whether they belong to the retroviral integrase superfamily.
I apologize to all those whose work has not been cited owing to the space constraints. I thank W. Yang and J. Bujnicki for critical reading of the manuscript. This work was supported by a European Molecular Biology Organization Installation Grant and a Wellcome Trust Senior Research Fellowship.
Glossary
- DDE: aspartate, aspartate, glutamate catalytic triad
- HIV: human immunodeficiency virus
- MID: middle domain (in Argonautes)
- PAZ: Piwi/Argonaute/Zwille (protein domain)
- PIWI: protein domain homologous to piwi proteins (encoded by the ‘P‐element induced wimpy testis’ class of genes in Drosophila)
- Prp8: precursor RNA‐processing protein 8
- RAG1: recombination‐activating gene 1
- RISC: RNA‐induced silencing complex
- SN2: bimolecular nucleophilic substitution
- Tn3: transposon 3
- Tn5: transposon 5
- V(D)J: variable (V), diversity (D) and joining (J) immunoglobulin and T‐cell receptor gene segments
Copyright © 2009 European Molecular Biology Organization