The CBF3 complex is an essential core component of the budding yeast kinetochore and is required for the centromeric localization of all other kinetochore proteins. We determined the crystal structure of a large section of the protein Cep3 from CBF3, which is the only component with obvious DNA‐binding motifs. The protein adopts a roughly bilobal shape, with an extended dimerization interface. The dimer has a large central channel that is sufficient to accommodate duplex B‐form DNA. The zinc‐finger domains emerge at the edges of the channel, and could bind to the DNA in a pseudo‐symmetrical manner at degenerate half‐sites in the centromeric sequence. We propose a mechanism for the modulation of DNA affinity by an acidic activator domain, which could be applicable to a wider family of transcription factors.
The kinetochore is a large proteinaceous complex that is responsible for linking chromatids to spindle microtubules during mitosis (Westermann et al, 2007). It transmits force from the mitotic spindle to the chromatids, generates motion through associated motor proteins and creates a checkpoint signal to prevent premature entry into anaphase. The kinetochore shows a high degree of functional conservation between species, although there is considerable sequence variability at the protein level (Meraldi et al, 2006). More than 65 proteins have been associated with the yeast kinetochore, of which many have been shown to form relatively stable sub‐complexes (De Wulf et al, 2003). These associate in a hierarchical and tightly controlled manner on a specialized region of the chromosome known as the centromere. Despite recent advances (Wei et al, 2005), there are still relatively few high‐resolution structural data available for kinetochore proteins. One area of interest is how the inner layer of the kinetochore binds to the appropriate centromeric DNA sequence. In budding yeast, the minimal essential DNA (CEN) required for kinetochore formation is only approximately 125 bp in length and has a defined sequence, which is both necessary and sufficient for kinetochore assembly (Clarke & Carbon, 1980). The centromere consists of three elements: CDEI, CDEII and CDEIII (Fitzgerald‐Hayes et al, 1982; Fig 1A). The central CDEII element (78–86 bp) has a degenerate A/T‐rich sequence, whereas CDEI (8 bp) and CDEIII (25 bp) are highly conserved. Deletions in either CDEI or CDEII degrade the fidelity of chromosome segregation but do not prevent it (Carbon & Clarke, 1984); by contrast, deletion of CDEIII entirely abolishes centromere activity (Ng & Carbon, 1987). The sequence of the CDEIII element shows partial dyad symmetry and contains a totally conserved CCG motif, in which point mutations cause a complete loss of CEN function (Jehn et al, 1991).
Various proteins have been localized to the different CDE elements. The CDEIII element is bound by the CBF3 complex (Lechner & Carbon, 1991). This contains four proteins: Cep3 (Lechner, 1994; Strunnikov et al, 1995), Ctf13 (Doheny et al, 1993), Ndc10 (Goh & Kilmartin, 1993; Jiang et al, 1993) and Skp1 (Connelly & Hieter, 1996; Stemmann & Lechner, 1996). All four proteins are essential and formation of the complex is necessary for recruitment of all other kinetochore components (Sorger et al, 1994; He et al, 2001).
Biochemical (Espelin et al, 1997) and atomic force microscopy (AFM; Pietrasanta et al, 1999) studies suggest that the functional CBF3 complex has a stoichiometry of 2:2:1:1 Cep3:Ndc10:Ctf13:Skp1, giving an overall molecular mass of approximately 450 kDa (Fig 1B). AFM images show that the complex adopts a dumb‐bell shape on DNA, with the binding causing bending of the DNA. In common with higher eukaryotic centromeres, the presence of nucleosomes containing the histone H3 derivative Cse4 (CENP‐A) is a conserved feature in budding yeast. In this regard, the exact role of CBF3 remains unclear. It is possible that it represents the basic DNA‐binding element, on which the rest of the kinetochore assembles (Meluh et al, 1998); alternatively, its main function might be to correctly position the Cse4‐containing nucleosomes in an ordered array on which other, as yet unknown, factors might then assemble (McAinsh et al, 2003).
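The approximate mass quoted above for the 2:2:1:1 complex can be checked with simple arithmetic. The sketch below is illustrative only: the Cep3 mass (71 kDa) is taken from the text, whereas the masses used for Ndc10, Ctf13 and Skp1 are assumed nominal values that are not given in this report.

```python
# Nominal subunit masses in kDa. Only the Cep3 value (71 kDa) comes from the
# text; the other three are assumed round figures used for illustration.
SUBUNIT_MASS_KDA = {
    "Cep3": 71.0,
    "Ndc10": 110.0,  # assumption, not stated in the text
    "Ctf13": 58.0,   # assumption, not stated in the text
    "Skp1": 23.0,    # assumption, not stated in the text
}

# Stoichiometry proposed in the text: 2:2:1:1 Cep3:Ndc10:Ctf13:Skp1.
STOICHIOMETRY = {"Cep3": 2, "Ndc10": 2, "Ctf13": 1, "Skp1": 1}


def complex_mass_kda(masses, copies):
    """Sum the subunit masses, each weighted by its copy number."""
    return sum(masses[name] * n for name, n in copies.items())


total = complex_mass_kda(SUBUNIT_MASS_KDA, STOICHIOMETRY)
print(f"Estimated CBF3 mass: {total:.0f} kDa")
```

With these assumed subunit masses the sum is 443 kDa, which is in line with the figure of approximately 450 kDa.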
The only kinetochore protein with a recognisable sequence‐specific DNA‐binding motif in CBF3, and indeed the whole budding yeast kinetochore, is Cep3. The short amino‐terminal domain of this 71 kDa protein shows a high level of sequence identity to the Zn2Cys6 family of transcription factors (Schjerling & Holmberg, 1996; Fig 1C,D). This family of proteins includes a large number of transcriptional regulatory proteins of which Gal4 is the best‐known example. Although these proteins bind to differing recognition sites, a characteristic of their cognate DNA recognition sequence is a CGG/CCG triplet, which is in contact with the protein in the major groove. This triplet is usually present in a pair, which might be in a tandem or inverted configuration, and might be separated by up to 18 bp. Commonly, each monomer of the transcription factor binds to one DNA half‐site, thus the dyad is bound by a dimer, which might or might not be symmetrical (King et al, 1999). The DNA‐binding domains of the GAL4 family contain a binuclear zinc cluster that is linked to the rest of the protein by a flexible linker. Sequence analysis (Schjerling & Holmberg, 1996) and structural data (Marmorstein et al, 1992) have shown that the linker is usually directly adjacent to a dimerization element, of which all examples so far are coiled‐coils. Many members of the family also have a highly acidic carboxy‐terminal patch, which is involved in transcriptional activation.
To better understand the structure and assembly of the CBF3 complex, and how the budding yeast kinetochore binds to DNA, we determined the crystal structure of a truncated version of the Cep3 protein (Cep3Δ). Attempts to crystallise the full‐length protein were unsuccessful, presumably owing to the highly mobile nature of the zinc‐cluster domains and linkers. Therefore, we designed a truncation starting at residue 48 in the proposed linker region, using limited proteolysis data (Russell et al, 1999). The N‐terminus of the truncated protein partly overlaps the homologous region of the structurally characterised Gal4 binuclear zinc cluster, which allows an approximate mapping of the DNA‐binding domains.
Results and Discussion
The crystal structure was solved using a combination of single‐wavelength anomalous diffraction (SAD) from a selenomethionine‐substituted protein, together with heavy‐atom derivatives at a resolution of 2.5 Å. The refined structure shows good electron density for most of the structure with the exception of two disordered loops.
The overall structure of the protein is predominantly α‐helical with dimensions of approximately 60 × 70 × 30 Å that comprise a roughly bilobal shape (Fig 2A). The monomer is divided into two domains, with domain 1 (residues 48–335) consisting of a helical barrel surrounding a central three‐helix bundle, and domain 2 (residues 336–608) forming a larger, more open surface.
We compared the fold of the intact structure and of the individual domains with entries in the Protein Data Bank. No extended regions of similarity to any other protein were seen; however, small areas of structural homology were found with a wide variety of proteins containing helical bundles. The largest single section of homology (r.m.s.d. of 3.7 Å over 135 residues) was seen between domain 2 and HEAT repeat domains (Fig 2B). HEAT repeats are often involved in protein–protein interaction surfaces (Andrade & Bork, 1995), suggesting that this domain is involved in contacts within the CBF3 complex.
The structure gives us a new insight into the multimeric state of Cep3. Hydrodynamic data (Russell et al, 1999) and considerations based on the overall mass of the complex as determined by AFM (Pietrasanta et al, 1999) suggest that CBF3 contains two copies of Cep3. Many other members of the GAL4 family of transcription factors form functional dimers. However, the peculiar fact that there is only one CCG half‐site in the CDEIII DNA consensus sequence argues that a homodimer might not be the actual state in the CBF3 complex (Lechner, 1994), and that only a single zinc‐cluster domain binds to a single triplet, as seen in the zinc‐cluster repressor, ARGR2 (De Rijcke et al, 1992). Conversely, mutational studies (Jehn et al, 1991) have shown that a second site in the CDEIII sequence, 5′ of the core CCG triplet, has a significant effect on the fidelity of chromosome transmission.
Gel filtration experiments with both the full‐length and truncated protein clearly show a stable dimer in solution (Fig 2C). In the crystal structure, a symmetrical dimer is present around a crystallographic twofold axis, primarily involving domain 1 (Fig 3A). This is in contrast to the classic coiled‐coil dimerization interface found in members of the GAL4 family (Marmorstein et al, 1992) and does not involve the heptad‐repeat sequence located in the centre of the protein (Lechner, 1994; Strunnikov et al, 1995). The buried surface area of the Cep3 dimer interface is approximately 3,240 Å², which is indicative of a biologically relevant interface (Jones & Thornton, 1996). We suggest that the active form of the protein is indeed a homodimer, with the requisite asymmetry only present at the level of the protein–DNA contacts. Precedents for this are seen in structures such as the E47 transcription factor (Ellenberger et al, 1994) in which a symmetrical dimer binds to an asymmetrical recognition sequence.
The dimer has a crescent shape with the concave surface forming a groove with an approximate diameter of 30 Å. The N‐termini of each monomer exit the structure at the opposite corners of this groove. Despite a short region of sequence that overlaps with the structure of Gal4 (residues 48–60), this region is highly mobile so we were unable to model exactly the location of the zinc cluster using direct superimposition. It is likely that the domains have considerable freedom of movement, but their global position must be constrained by the length of the linkers.
A model for Cep3–DNA interactions
In members of the GAL4 family with known structures, dimerization occurs through a coiled‐coil interaction, immediately after the zinc‐cluster domains and linkers (Marmorstein et al, 1992). The termini of the coiled‐coils lie in the centre of the dyad site, and the linkers run outwards towards the half‐sites. By contrast, in Cep3Δ, the N‐termini of the protein are at opposite corners of the dimer and the linker regions must run inwards to ensure that the zinc clusters can reach the appropriate DNA half‐site. On the basis of the shape of the protein, surface electrostatics (Fig 3B) and constraints on the linker length, we propose that the DNA lies along the central channel of the dimer as shown in Fig 4A. This would allow zinc clusters to reach their cognate DNA sites and bind in the major groove in the same way as Gal4. The centre of the pseudo‐dyad would lie exposed at the centre of the protein, and would therefore be able to make contact with other components of CBF3. Although our model is by nature approximate, it is consistent with that proposed on the basis of cross‐linking studies (Espelin et al, 1997), which show that Cep3 makes contact with CDEIII at the canonical CCG triplet and a second TGT triplet, 12 bp CDEII‐proximal, whereas Ctf13 is shown to make contact with the centre of the pseudo‐dyad. This model suggests that the main interaction between Cep3 and DNA occurs through the zinc‐cluster domains. To test this, we carried out DNA‐binding studies with full‐length Cep3 and Cep3Δ (Fig 4D,E). Electrophoretic mobility‐shift assays (EMSAs) show that full‐length Cep3 is able to shift a native CDEIII sequence, whereas Cep3Δ shows no detectable binding, confirming that the zinc clusters are the main DNA‐binding sites. The binding can be outcompeted by unlabelled native sequence, but not by a sequence carrying mutations in the CCG motif (site 1). However, a sequence carrying mutations in the TGT motif (site 2) is still able to compete against the native sequence, showing that site 1 is the main contributor to the protein–DNA affinity. This is consistent with genetic data (Jehn et al, 1991) showing that mutations in site 2 have a far lower rate of chromosome loss than those in the core CCG triplet.
The conserved nature of the recognition triplet for the GAL4 family means that recognition by the zinc clusters alone does not confer specificity for a particular DNA sequence. Instead, this is achieved by variation in the spacing between the half‐sites (Marmorstein et al, 1992), as specified by the linker section of the protein (Reece & Ptashne, 1993). Assuming that the TGT triplet at –12 relative to the CCG represents the second binding site, a similar mode of recognition could occur in Cep3. The ‘outside‐in’ orientation of the linkers in our model would confer specificity for the half‐site spacing, and they might also make direct contacts with the DNA outside the triplet pair. The location of the binding sites relative to the DNA places the entire Cep3 dimer within the CDEIII element, as predicted from footprinting results of the intact CBF3 complex (Lechner & Carbon, 1991). The relatively weak electrostatic interaction between the dimer channel and the DNA backbone suggests that there is very little non‐specific DNA affinity, as the binding experiments with the truncated protein confirm.
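The geometric constraints discussed above can be put into rough numbers. The following is a minimal sketch that assumes ideal B‐form DNA; the helical parameters are textbook averages rather than values measured in this study, and CDEIII is known to be bent by CBF3, so the results are only indicative. The 12 bp half‐site spacing, the 25 bp length of CDEIII and the approximate 30 Å channel diameter are taken from the text.

```python
# Canonical B-form DNA parameters (textbook averages; assumptions, not
# values determined in this study).
RISE_PER_BP_A = 3.4        # axial rise per base pair, in angstroms
BP_PER_TURN = 10.5         # base pairs per helical turn
DUPLEX_DIAMETER_A = 20.0   # approximate duplex diameter, in angstroms

# Values taken from the text.
HALF_SITE_SPACING_BP = 12  # TGT triplet at -12 relative to the CCG triplet
CDEIII_LENGTH_BP = 25      # length of the CDEIII element
CHANNEL_DIAMETER_A = 30.0  # approximate diameter of the dimer groove


def axial_distance_a(n_bp):
    """Distance along the helix axis spanned by n_bp base-pair steps."""
    return n_bp * RISE_PER_BP_A


def helical_rotation_deg(n_bp):
    """Rotation about the helix axis over n_bp steps, reduced to 0-360 degrees."""
    return (n_bp * 360.0 / BP_PER_TURN) % 360.0


spacing_a = axial_distance_a(HALF_SITE_SPACING_BP)
offset_deg = helical_rotation_deg(HALF_SITE_SPACING_BP)
cdeiii_a = axial_distance_a(CDEIII_LENGTH_BP)
clearance_a = CHANNEL_DIAMETER_A - DUPLEX_DIAMETER_A

print(f"Half-site separation along the axis: {spacing_a:.1f} A")
print(f"Angular offset between half-sites:   {offset_deg:.1f} deg")
print(f"Length of CDEIII:                    {cdeiii_a:.1f} A")
print(f"Channel minus duplex diameter:       {clearance_a:.1f} A")
```

Under these assumptions the two triplets lie about 41 Å apart along the helix axis and are offset by roughly 51° around it, and a 30 Å channel leaves about 10 Å of clearance around a 20 Å duplex.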
Protein–protein interactions and CBF3 assembly
We have analysed the structure of the dimer to look for potential protein–protein interaction surfaces. Previous work (Espelin et al, 1997) proposes that Cep3, Ndc10 and Ctf13 make direct contact with the CEN DNA. One model for the assembly of the CBF3 complex (Russell et al, 1999) requires that an initial association of Ctf13 and Skp1 occur, followed by binding to the Cep3 dimer, and subsequently Ndc10 and centromeric DNA. The interaction between Cep3 and Ctf13 seems to be essential to stabilize Ctf13, which is otherwise rapidly degraded. Previous work (Espelin et al, 1997) has shown that Ctf13 makes contact with the base exactly halfway between the two half‐sites on the bottom strand. If this interaction were to occur in our model, Ctf13 would make contact with the opposite face of the duplex to Cep3, and so totally encircle the DNA (Fig 4B). This toroidal architecture would suggest an extremely low off‐rate for the complex, and indeed measurements on CBF3 (Espelin et al, 1997) show a half‐life on DNA of approximately 2 h. A simple model for the assembly of the complex would involve an initial binding of the Cep3 dimer to CDEIII, which is then stabilized by subsequent binding of Ctf13/Skp1 and Ndc10. Our biochemical data show that the binding of Cep3 alone is relatively weak, but we would expect this to be enhanced in the intact complex.
Although this topological interface would be extremely stable, it seems unlikely to be the sole connection between the spindle and chromosome. Additional interactions between the kinetochore and chromatin are likely, probably mediated by Cse4‐containing nucleosomes (McAinsh et al, 2003).
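The half‐life of approximately 2 h quoted above for CBF3 on DNA can be converted into a dissociation rate constant. The sketch below assumes simple first‐order dissociation from a single bound state, which is an idealization and not something established in this report; the function names are illustrative.

```python
import math

# Half-life of CBF3 on DNA, approximately 2 h (from the text), in seconds.
HALF_LIFE_S = 2 * 3600.0


def off_rate_per_s(half_life_s):
    """First-order dissociation rate constant: k_off = ln(2) / t_half."""
    return math.log(2) / half_life_s


def fraction_bound(t_s, half_life_s):
    """Fraction of complexes still bound after t_s seconds, assuming
    first-order decay and no rebinding."""
    return math.exp(-off_rate_per_s(half_life_s) * t_s)


k_off = off_rate_per_s(HALF_LIFE_S)
print(f"k_off = {k_off:.2e} per second")
print(f"Fraction bound after 30 min: {fraction_bound(30 * 60, HALF_LIFE_S):.2f}")
```

Under this assumption the off‐rate is of the order of 10⁻⁴ s⁻¹, and about 84% of complexes would remain bound after 30 min.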
Acidic activation domain
Members of the GAL4 family often contain short sequences enriched in acidic residues near the C‐terminus (Schjerling & Holmberg, 1996). The function of these domains is to modulate transcription, which is achieved by the recruitment of other proteins (Ma & Ptashne, 1987). Cep3 also has a strongly acidic patch near the C‐terminus, the location and overall charge of which are similar to those of other members of the GAL4 family (Fig 1C).
The structure of the acidic patch in transcription factors remains unknown. The motif has been described as an amphipathic α‐helix (Giniger & Ptashne, 1987), a β‐strand (Van Hoy et al, 1993) or totally disordered (Sigler, 1988), although it is possible that the particular structure of the motif might be protein or context specific. One difficulty with studying this has been the lack of structural information for full‐length transcription factors. This is probably due to the high degree of disorder found in these proteins (Liu et al, 2006), which reflects their diverse regulatory roles. The acidic domain in Cep3 has been shown to be essential for CBF3 activity (Lechner, 1994). Analysis of the Cep3 structure shows that this motif forms a disordered loop between residues 570 and 587, the base of which is directly adjacent to the N‐terminus of the protein, where the linker and zinc cluster emerge (Fig 5A). Therefore, it is possible for a direct interaction to occur between the acidic domain and the DNA‐binding elements. It seems likely that this loop becomes ordered on binding by another protein, for example one of the other members of CBF3. The ordering of this loop could then directly affect the conformation of the zinc‐cluster domain, with associated modulation of the DNA affinity (Fig 5B). We suggest that this mechanism could apply more generally to transcription factors with an acidic domain and that, in addition to a recruitment role, the motif might participate in intra‐molecular interactions.
Our data provide a new insight into the mechanism of DNA‐binding by the Cep3 protein and suggest how the intact CBF3 complex might form. The homology to transcription factors provides an unexpected insight into the role of acidic activator domains, which might be of wider relevance. Structural data from the intact CBF3–CEN complex would undoubtedly provide further surprises.
The structure of recombinant Cep3Δ was solved using SAD and multiple isomorphous replacement (MIR) methods on selenomethionine‐substituted protein combined with heavy‐atom data from mercury and silver derivatives. The final model of Cep3Δ contains 4,254 protein atoms, 73 water molecules, 1 β‐mercaptoethanol and 1 cacodylate molecule and has an R‐factor of 22.0% (Rfree 24.6%). DNA‐binding assays were performed using the EMSA Accessory Kit (Novagen, Darmstadt, Germany) with a FAM6 fluorescently labelled CDEIII dsDNA (33 base pairs) probe to observe DNA–protein binding in the presence of full‐length Cep3 (residues 1–608), Cep3Δ and unlabelled competitor DNA. Full details of the structural determination and DNA‐binding assays are given in supplementary information online. The coordinates and structure factors have been submitted to the Protein Data Bank (PDB code 2veq).
Supplementary information is available at EMBO reports online (http://www.emboreports.org).
We thank N. McDonald, H. Walden and D. Wigley for useful discussions. This work was funded by Cancer Research UK.
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
Copyright © 2008 European Molecular Biology Organization