The RBMY (RNA‐binding motif gene on Y chromosome) protein encoded by the human Y chromosome is important for normal sperm development. Although its precise molecular RNA targets are currently unknown, human RBMY (hRBMY) has been suggested to participate in splicing in the testis. Using systematic evolution of ligands by exponential enrichment (SELEX), we found that RNA stem–loops capped by a CA/UCAA pentaloop are high‐affinity binding targets for hRBMY. Subsequent nuclear magnetic resonance (NMR) structure determination of the hRBMY RNA recognition motif (RRM) in complex with a high‐affinity target revealed two distinct modes of RNA recognition. First, the RRM β‐sheet surface binds to the RNA loop in a sequence‐specific fashion. Second, the β2–β3 loop of the hRBMY RRM inserts into the major groove of the RNA stem. The first binding mode might be conserved in the paralogous protein heterogeneous nuclear RNP (hnRNP) G, whereas the second mode of binding is found only in hRBMY. This structural difference could underlie the function of RBMY in spermatogenesis.
Testes have a highly specialized gene expression program, which is necessary to carry out the complex differentiation of spermatogonia into mature spermatozoa. In particular, alternative splicing of specific pre‐mRNAs is prevalent in the testis, although little is known about how these events are regulated at the molecular level (Venables, 2002). Among the potential testis‐specific regulators of alternative splicing, the human RBMY (hRBMY; RNA‐binding motif gene on Y chromosome) gene family was identified as a candidate for the azoospermia factor (AZF; Elliott, 2004). Human RBMY is expressed specifically in the nuclei of adult male germ cells throughout all transcriptionally active stages of spermatogenesis, and deletion of the functional copies of RBMY is associated with an arrest of meiotic division I during spermatogenesis (Elliott et al, 1997). Human RBMY has an amino‐terminal RNA recognition motif (RRM) and a carboxy‐terminal domain composed of four repeats of a Ser‐Arg‐Gly‐Tyr tetrapeptide motif (SRGY box; Ma et al, 1993). RBMY is found on the Y chromosome of all mammals (Mahadevaiah et al, 1998). Mouse RBMY (mRBMY) contains an RRM with 74% similarity to hRBMY, followed by only one SRGY box. RBMY‐deficient mice do not show the same phenotype as humans lacking RBMY: they have abnormal sperm development but are not sterile (Mahadevaiah et al, 1998).
Human and mouse RBMY have an X‐chromosome‐located paralogue (RBMX), which encodes the widely expressed heterogeneous nuclear ribonucleoprotein G (hnRNP G; Delbridge et al, 1999). Human hnRNP G contains an N‐terminal RRM with 88% similarity to hRBMY, followed by only one SRGY box. A third human member of this family, the retrogene‐encoded hnRNP G‐T, is expressed only in the testis; it contains an N‐terminal RRM with 84% similarity to hRBMY but no SRGY box (Elliott et al, 2000b).
Although nothing is known about the RNA targets of these proteins, several studies suggest that they have a role in the regulation of RNA processing (Venables et al, 1999, 2000; Elliott et al, 2000a). Human RBMY was shown to interact with Sam68 and the closely related T‐STAR protein (signal transduction and RNA binding), which are considered to be molecular transducers between cell signalling and splicing regulation (Elliott, 2004). Furthermore, hRBMY can affect splicing through its interaction with SRp20 or Tra2β (Elliott et al, 2000a; Venables et al, 2000), which belong to the family of SR‐rich pre‐mRNA splicing regulators (Bourgeois et al, 2004).
To better understand the biological function of this family of proteins, we carried out systematic evolution of ligands by exponential enrichment (SELEX) with the hRBMY RRM. We found that the RRM binds with high specificity to stem–loop RNAs containing a CA/UCAA consensus sequence in the loop. We determined the solution structure of the hRBMY RRM in complex with one of its RNA targets. The structure shows that the hRBMY RRM recognizes not only the loop, in a sequence‐specific manner, but also the shape of the RNA, as the β2–β3 loop of the RRM inserts into the major groove of the RNA stem.
Identification of human RBMY RNA targets by SELEX
Human RBMY contains an RRM with unknown RNA‐binding properties; we therefore used the SELEX approach to determine its RNA‐binding specificity (Cavaloc et al, 1999). The glutathione S‐transferase (GST) fusion protein included the 108 N‐terminal residues of hRBMY, which contain the RRM. After six cycles of selection, the results were fully consistent: the same consensus motif was obtained with two distinct 20‐nucleotide (nt) randomized matrices (supplementary Fig S1 online). Interestingly, the selected motifs (Fig 1) form a hairpin consisting of an invariant pentaloop (CA/UCAA) and a stem of variable length (between 4 and 11 perfect base pairs), in which a C–G base pair is predominantly adjacent to the loop. Altogether, 51% of the selected clones had a GUC–loop–GAY structure (Y = C or U). However, the selection of other stem sequences indicates that hRBMY can recognize stems with different base‐pair compositions. Electrophoretic mobility shift assays (EMSA) confirmed that the interaction occurs with high affinity (Kd ≈ 10⁻⁹ M; see below).
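The consensus described above can be written as a simple pattern check. The Python function below is a minimal sketch for illustration, not the analysis used in this study. It assumes that the input is already trimmed to stem + 5‐nt loop + stem, it treats only Watson–Crick pairs as ‘perfect’ base pairs (G–U wobbles are rejected), and the function name classify_hairpin and the 21‐nt example sequence are hypothetical (the example is not the S1A sequence).

```python
import re

# Assumption: "perfect" base pairs are Watson-Crick only; G-U wobbles are rejected.
WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
PENTALOOP = re.compile(r"C[AU]CAA")  # CA/UCAA consensus loop
CLOSING_3P = re.compile(r"GA[CU]")   # GAY, with Y = C or U


def classify_hairpin(rna: str, min_stem: int = 4, max_stem: int = 11) -> dict:
    """Check a trimmed hairpin (stem + 5-nt loop + stem) against the SELEX consensus.

    Hypothetical helper for illustration only.
    """
    rna = rna.upper().replace("T", "U")
    stem_len, odd = divmod(len(rna) - 5, 2)
    if odd or not (min_stem <= stem_len <= max_stem):
        return {"consensus": False, "guc_gay": False}
    left = rna[:stem_len]
    loop = rna[stem_len:stem_len + 5]
    right = rna[stem_len + 5:]
    # The i-th base from the 5' end must pair with the i-th base from the 3' end.
    perfect_stem = all((a, b) in WC_PAIRS for a, b in zip(left, reversed(right)))
    consensus = perfect_stem and PENTALOOP.fullmatch(loop) is not None
    # GUC-loop-GAY: last three 5'-side stem bases and first three 3'-side stem bases.
    guc_gay = (consensus and left.endswith("GUC")
               and CLOSING_3P.fullmatch(right[:3]) is not None)
    return {"consensus": consensus, "guc_gay": guc_gay}


# Hypothetical 21-nt hairpin: 8-bp stem (GCAUAGUC / GACUAUGC) capped by CACAA.
print(classify_hairpin("GCAUAGUC" + "CACAA" + "GACUAUGC"))
# -> {'consensus': True, 'guc_gay': True}
# An A13G-like change in the loop (CACAA -> CACAG) breaks the consensus.
print(classify_hairpin("GCAUAGUC" + "CACAG" + "GACUAUGC"))
# -> {'consensus': False, 'guc_gay': False}
```

A consensus check of this kind is binary: a C11U loop (CAUAA), which weakens binding only about fourfold in the mutagenesis experiments, would simply be rejected, so such a filter is best read as a description of the selected population rather than as a predictor of affinity.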
Structure of RBMY RRM in complex with S1A RNA
To understand the molecular basis of the recognition of the selected RNA stem–loops by the hRBMY RRM, we investigated the structure of one complex using nuclear magnetic resonance (NMR; supplementary Fig S2 online). The chosen RNA (‘S1A’) was derived from the S1 sequence. S1A is 21 nt long and contains eight base pairs capped by a CACAA loop (supplementary Fig S2A online). The stem contains the most common SELEX sequence, GUC–loop–GAC.
The structure of the complex was determined with very high precision, owing to the large number of nuclear Overhauser effect (NOE) distance constraints used in the calculation (1,879, including 124 intermolecular), supplemented by 54 angular constraints from residual dipolar couplings (RDCs) in both the RNA and the protein (supplementary Table SI online; Fig 2A). The human RBMY RRM adopts the expected βαββαβ topology, with two α‐helices packed against one side of the four‐stranded β‐sheet (Maris et al, 2005).
Human RBMY RRM specifically recognizes a CAA triplet
C11, A12 and A13 (nucleotides are written in italics to distinguish them from amino acids) protrude from the loop and are spread across the RRM β‐sheet surface (Fig 2B). The three nucleotides are stabilized by contacts with the protein main chain and several side chains, but not by intra‐RNA interactions (Fig 3). A12 adopts an unusual syn conformation, and the sugar puckers of C11 and A13 are both C2′‐endo. Specific contacts with A12 and A13 discriminate for adenines at these positions. A13 is recognized by the K84 main‐chain oxygen and by the K9 (β1) and L38 (β2) side chains (Fig 3C). Similarly, A12 is specifically recognized by the K84 main‐chain amide (Fig 3B), the Q82 (β4) main‐chain oxygen and the E81 (β4) side chain (the K84 amide experiences a large chemical shift change on complex formation; supplementary Fig S2B online). C11 is recognized through a contact with the K79 (β4) side chain (Fig 3A). From the structure, it appears that a U could also be accommodated at this position.
The β2–β3 loop inserts into the RNA major groove
An unexpected structural feature of the complex is the interaction of the RRM β2–β3 loop with the major groove of the RNA helix, from the second base pair to the last base pair of the stem (Figs 2C,D, 4A). The seven amino acids of the β2–β3 loop (D42–R48) form a β‐hairpin that inserts into the deep major groove of the RNA helix. Most of the intermolecular interactions are not sequence specific and involve side‐chain and main‐chain contacts with phosphate groups of the RNA (Fig 4A). All five side chains from D42 to K46 are involved in the recognition of the RNA major groove. The R43 side chain forms salt bridges with the A3 and C4 phosphates, whereas its main‐chain amide contacts the A15 phosphate on the opposite strand of the helix. The K46 side chain forms a salt bridge with the C16 phosphate (Fig 4A). The R43 and K46 side chains, located on opposite strands of the β2–β3 hairpin, cross each other to contact phosphate oxygens on opposite strands of the RNA helix (Fig 4B). Contacts by T44 to the G2 phosphate, and possibly by S45, further stabilize the interaction. Finally, the side chain of D42 forms two hydrogen bonds, one with the main‐chain amide of R48 and the other with the A15 amino group (Fig 4A).
RBMY RRM stabilizes the 5′ end of the pentaloop
C9 and A10 extend the RNA helical stack, as C9 stacks over C8 and A10 over C9. In addition, C9 contacts the C8 phosphate (Fig 4C), and the R17 side chain (β1–α1 loop) stacks over A10 and contacts the C11 phosphate. The C11 phosphate is also contacted by K76 (α2–β4 loop; Fig 4C). The selectivity of hRBMY for C9 and A10 seems to be indirect: C9 might be preferred over other nucleotides because it avoids forming a base pair with A13 in the free RNA, and A10 might be preferred over G because a G amino group would clash sterically with the protein backbone. Finally, R48 further stabilizes the complex by contacting all three RNA elements (Fig 4C): the stem (G14), the CAA triplet (phosphate of A12), and C9 or A10.
Mutagenesis studies confirm the recognition mode
We carried out EMSA experiments using RNA sequences representative of the two sets of sequences identified by SELEX. Sequences 2, 4 and 6 from set I (data not shown), S1 and S2 from set II, as well as S1A, all bound efficiently to the GST‐fused hRBMY RRM (Fig 5A). The apparent Kd values were 0.6 and 0.9 nM for S1 and S2, respectively. Mutating A12 or A13 to G resulted in a complete loss of binding, showing that the identity of these residues is crucial for high affinity (Fig 5B). By contrast, the replacement of C11 by U (C11U) was better tolerated, resulting in an approximately fourfold decrease in affinity. This agrees well with the structure, as the identity of C11 is not recognized as strongly as that of A12 or A13. Changing the size of the pentaloop also had a strong effect: removal of the A/U residue (A10) resulted in very weak binding, and insertion of an additional A (+A13) resulted in a complete loss of binding (Fig 5B). These experiments confirm that a loop size of five nucleotides is optimal for hRBMY binding. Finally, we tested whether the hydrogen bond between D42 and A15 was crucial by replacing the U7–A15 base pair with a C7–G15 base pair (Fig 5B). Surprisingly, this change did not alter the affinity. A slight rearrangement might take place in the mutant RNA, allowing D42 to interact with C7 instead of A15 as in the wild type. This result indicates that the contact mediated by D42 is not sequence specific. Thus, the interactions mediated by the β2–β3 loop would constitute a shape‐specific recognition of the RNA major groove.
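The Kd values above can be related to fractional binding with a simple single‐site (1:1) model. The sketch below assumes that the labelled RNA is at trace concentration, so that free protein is approximately equal to total protein and the fraction of RNA bound is θ = [P]/(Kd + [P]); under that assumption the apparent Kd is the protein concentration at which half of the RNA is shifted. The model, the function name fraction_bound and the fourfold‐weaker Kd (derived from the S1 value only to illustrate a loss of affinity of the size reported for C11U) are illustrative assumptions, not the fitting procedure used here.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of labelled RNA bound for a single-site (1:1) binding model.

    Assumes the RNA is at trace concentration, so free protein ~ total protein
    and theta = [P] / (Kd + [P]). Hypothetical helper for illustration.
    """
    if protein_nM < 0 or kd_nM <= 0:
        raise ValueError("protein concentration must be >= 0 and Kd > 0")
    return protein_nM / (kd_nM + protein_nM)


KD_S1_NM = 0.6              # apparent Kd reported for S1
KD_4FOLD_NM = 4 * KD_S1_NM  # illustrative ~fourfold weaker binder (assumption)

for p_nM in (0.6, 2.4, 10.0):
    print(f"[P] = {p_nM:4.1f} nM: bound S1-like = {fraction_bound(p_nM, KD_S1_NM):.2f}, "
          f"bound 4x-weaker = {fraction_bound(p_nM, KD_4FOLD_NM):.2f}")
```

At 2.4 nM protein, for example, this model predicts that about 80% of an S1‐like RNA but only 50% of a fourfold‐weaker RNA would be shifted, which is why a fourfold change is readily visible in an EMSA titration while still leaving substantial binding.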
The structure shows that the β2–β3 loop fits into the major groove of the RNA and that this insertion is subject to steric constraints. Interestingly, the RRM of mRBMY, as well as those of human hnRNP G and G‐T, shows two changes in the β2–β3 loop compared with that of hRBMY (Fig 1A): all three RRMs contain an additional E between R43 and T44, and S45 of hRBMY is replaced by K in mRBMY or by N in hnRNP G and G‐T. To study the role of these residues, two hRBMY RRM mutants (mut1 and mut2; Fig 1A) were made, and their affinity was compared with that of the wild‐type RRM (Fig 5A, left panel). Strikingly, the simple insertion of an E in the β2–β3 loop resulted in a more than tenfold decrease in binding. Furthermore, when the S to K mutation was additionally introduced (to make a mouse‐like β2–β3 loop), no interaction was detected (Fig 5A). This confirms the crucial role of the β2–β3 loop of the hRBMY RRM in binding RNA stem–loops. We also showed that the RRMs of mRBMY, human hnRNP G and G‐T could not efficiently bind to the S2 stem–loop (Fig 5A, right panel).
A dual mode of RNA recognition by human RBMY
Using SELEX, we have identified an unusual RNA‐binding consensus sequence for the hRBMY RRM. The high‐affinity sites are RNA stem–loops with a CA/UCAA loop and a GUC–loop–GAY consensus in the last three base pairs of the stem (Fig 1B). The structure of the RRM complexed with a stem–loop containing a CACAA loop showed that the recognition of the RNA is both sequence‐ and shape‐specific (Stefl et al, 2005). The structure explains how C9, A12, A13 and G14 are recognized by the RRM in a sequence‐specific manner (Figs 3, 4), confirming and explaining the SELEX consensus in the pentaloop and at the first base pair. However, from a structural point of view, it remains unclear why an A/U and a C are preferred at the second and third positions of the loop, respectively, in the SELEX sequences. Other bases at these two positions might alter the accessibility of the last pentaloop triplet or induce an alternative RNA fold that prevents protein binding or lowers the binding affinity. More surprisingly, we found that the RNA stem is recognized by the RRM β2–β3 loop, which forms a β‐hairpin that inserts into the major groove of the RNA helix. This recognition is shape specific, as it is the complementary shape and charge of the β2–β3 loop and the RNA major groove that dictate this intermolecular interaction (Fig 2). When this β2–β3 loop is elongated or mutated, RNA binding is weakened or lost, confirming the importance of this interaction for complex formation. Both modes of binding result from an ‘induced fit’ of one of the binding partners. In the sequence‐specific mode, the unstructured RNA pentaloop folds on binding to the rigid β‐sheet and becomes ordered. In the shape‐recognition mode, the flexible β2–β3 loop inserts into the major groove of the rigid RNA stem and becomes ordered (L.S. & F.A., unpublished data).
This mode of recognition is unprecedented among RRMs, once again confirming the remarkable plasticity of this RNA recognition motif (see also the supplementary information online).
RNA recognition by other hnRNP‐G family members
We have shown that the RRMs of human hnRNP G, hnRNP G‐T and mRBMY are unable to bind with high affinity to the hRBMY‐specific hairpin structure. This agrees well with our structural data, given the role of the β2–β3 loop in strengthening the interaction between hRBMY and the RNA (Figs 4, 5). However, as K9, L38 and E81 are conserved between hnRNP G and hRBMY, the hnRNP G RRM might similarly recognize CAA in a sequence‐specific manner. Interestingly, one RNA sequence identified as a potential target for hnRNP G contains a CAA triplet (Nasim et al, 2003). By contrast, the replacement of E81 in hnRNP G‐T and mRBMY (by A and K, respectively) is likely to impair binding to CAA. So far, few data are available on the RNA‐recognition properties of RBMY and related hnRNP proteins. A previous study indicated that hRBMY and hnRNP G are nonspecific RNA‐binding proteins (Hofmann & Wirth, 2002). Our results suggest that this might not be the case, but further analyses are necessary to better understand the properties of all these related RRMs.
Finally, the particular RNA‐binding properties of hRBMY could have an important role in the function of the protein in human testes. The infertility caused by deletions in this RRM‐encoding gene (Elliott et al, 1997) indicates that crucial RNA‐processing pathways are disrupted. Protein–protein interaction and subnuclear localization experiments suggest a role for RBMY in splicing, although its precise function is still not clear. The testis is one of the tissues in which alternative splicing is most widely used (Xu et al, 2002; Yeo et al, 2004), and alternative splicing is needed to establish the specific pattern of gene expression that occurs throughout the different stages of spermatogenesis (Venables, 2002). A recent study of 52 different tissues and more than 10,000 genes showed that the testis has the highest rate of divergence in alternative splicing events between human and mouse (Kan et al, 2005). Our findings might explain why mRBMY seems to have a function different from that of hRBMY (Szot et al, 2003), as the two proteins might have different RNA targets. However, we cannot rule out that the natural hRBMY target sequences differ significantly from the stem–loop motifs identified by SELEX. As a first step towards identifying the biological RNA targets of hRBMY in vivo, we screened an alternative exon database using an algorithm based on the conservation of the stem–loop structure characterized by SELEX (supplementary information online). This screen showed that putative hRBMY targets do exist within or near exons that are alternatively spliced in the testis (supplementary Table S2 online). Further work is necessary to determine whether these RNA sequences are evolutionarily conserved in mammals and whether they are functionally relevant.
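The screening algorithm itself is described in the supplementary information and is not reproduced here. As a rough illustration of the simplest form of such a search, the sketch below scans a single RNA or DNA sequence for CA/UCAA pentaloops closed by at least four contiguous Watson–Crick pairs (capped at 11, the longest stem observed by SELEX). It is a hypothetical helper (scan_hairpins), not the method used for the screen: it ignores cross‐species conservation, G–U wobbles, bulges and thermodynamic stability, and the example input is an invented sequence.

```python
import re
from typing import List, NamedTuple

WC_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# Zero-width lookahead so that overlapping CA/UCAA loop candidates are all examined.
LOOP = re.compile(r"(?=C[AU]CAA)")


class Hit(NamedTuple):
    loop_start: int  # 0-based index of the first loop nucleotide
    stem_len: int    # contiguous Watson-Crick pairs closing the loop


def scan_hairpins(seq: str, min_stem: int = 4, max_stem: int = 11) -> List[Hit]:
    """Hypothetical scanner: CA/UCAA pentaloops closed by >= min_stem Watson-Crick pairs."""
    seq = seq.upper().replace("T", "U")  # accept DNA (e.g. genomic) input
    hits: List[Hit] = []
    for match in LOOP.finditer(seq):
        start = match.start()
        i, j, n = start - 1, start + 5, 0  # bases immediately flanking the loop
        # Extend the stem outwards while the flanking bases keep pairing.
        while (i >= 0 and j < len(seq) and n < max_stem
               and (seq[i], seq[j]) in WC_PAIRS):
            i, j, n = i - 1, j + 1, n + 1
        if n >= min_stem:
            hits.append(Hit(start, n))
    return hits


# Hypothetical DNA-style input: an 8-bp hairpin capped by CACAA, flanked by A runs.
print(scan_hairpins("AAAA" + "GCATAGTC" + "CACAA" + "GACTATGC" + "AAAA"))
# -> [Hit(loop_start=12, stem_len=8)]
```

Each hit reports only a sequence‐level candidate; in a real screen, candidates of this kind would still have to be filtered, for example by conservation of the stem–loop between species, before being considered putative targets.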
SELEX and EMSA. SELEX was carried out as described previously (Cavaloc et al, 1999), with minor modifications, using the GST‐fused RRM of hRBMY (amino acids 1–108; for details, see the supplementary information online).
For EMSA, we used the GST‐fused RRMs of hRBMY, mRBMY, hnRNP G or hnRNP G‐T, as well as the two mutants of hRBMY. [32P]RNA was transcribed in vitro with T7 RNA polymerase and incubated with the different proteins as described previously (Cavaloc et al, 1999).
Cloning, expression and purification of hRBMY RRM for NMR. The N‐terminal RRM (amino acids 1–108) of hRBMY was subcloned into pET30a+ (Invitrogen Corp., Carlsbad, CA, USA). The construct included a two‐residue linker (L‐E) between the RRM and the 6 × His tag. For 15N and 15N,13C labelling, Escherichia coli BL21(DE3)pLysS cells were used. Expression was carried out in M9 medium containing [15N]NH4Cl and [13C]glucose. The human RBMY RRM was purified by Ni affinity and cation exchange chromatography. The protein solution was concentrated to 1 mM, as measured by UV spectroscopy at 205 nm.
RNA transcription and complex formation for NMR analysis. S1A RNA was prepared in vitro using T7 RNA polymerase and purified by anion exchange chromatography. RNA samples were dissolved in 25 mM NaH2PO4/NaOH at pH 7.0.
The hRBMY RRM–S1A complex was studied at 1 mM, at a 1:1 protein:RNA ratio, in 25 mM NaH2PO4/NaOH, 25 mM NaCl buffer (pH 7.0; see the supplementary information online for the resonance assignment strategy).
Structure determination. In total, 124 intermolecular NOEs between the RRM and S1A were assigned. Residual dipolar couplings (RDCs) for the hRBMY RRM–S1A complex were measured using Pf1 phage as the alignment medium.
The structure determination was carried out as described previously (Oberstrass et al, 2006). The 17 final conformers with the lowest total energy or with the lowest alignment tensor energy were selected to form the final ensemble of conformers.
Structural data. All restraints used in structure determination and the 17 final structures have been deposited at the Protein Data Bank under the accession code 2FY1.
Supplementary information is available at EMBO reports online (http://www.emboreports.org).
This investigation was supported by the Swiss National Science Foundation–National Center of Competence in Research (SNF‐NCCR) Structural Biology, by the Roche Research Fund for Biology at the Eidgenössische Technische Hochschule Zürich (ETH Zürich) and by the ETH Zürich (TH‐Fonds Nr. 0‐20960‐01) to F.H.T.A.; by grants from Inserm, the Centre National de la Recherche Scientifique (CNRS), the European Union Network of Excellence on Alternative Splicing (EURASNET, 6th Framework Program) and the Association pour la Recherche sur le Cancer to J.S.; and by a Human Frontier Science Program (HFSP) postdoctoral fellowship to R.S. F.H.T.A. is an EMBO Young Investigator.
- Copyright © 2007 European Molecular Biology Organization