Meiotic recombination in yeast is initiated by DNA double‐strand breaks (DSBs) that occur at preferred sites, distributed along the chromosomes. These DSB sites undergo changes in chromatin structure early in meiosis, but their common features at the level of DNA sequence have not been defined until now. Alignment of 1 kb sequences flanking six well‐mapped DSBs has allowed us to define a flexible sequence motif, the CoHR profile, which predicts the great majority of meiotic DSB locations. The 50 bp profile contains a poly(A) tract in its centre and may have several gaps of unrelated sequences over a total length of up to 250 bp. The major exceptions to the correlation between CoHRs and preferred DSB sites are at telomeric regions, where DSBs do not occur. The CoHR sequence may provide the basis for understanding meiosis‐induced chromatin changes that enable DSBs to occur at defined chromosomal sites.
Meiotic recombination in the yeast Saccharomyces cerevisiae is initiated by double‐strand breaks (DSBs) (Smith and Nicolas, 1998). The distribution of DSBs along the chromosomes is not random: each chromosome shows a unique pattern of meiotic DSBs (Zenvirth et al., 1992; Klein et al., 1996). Most meiotic DSBs are found in promoter regions (Baudat and Nicolas, 1997), where the chromatin appears to be sensitive to DNase I (Wu and Lichten, 1994). In the major DSB regions, the chromatin becomes sensitive to micrococcal nuclease early in meiosis (Ohta et al., 1994), at a stage just before the appearance of breaks (Ohta et al., 1998). It is not clear, however, what makes a region prone to meiotic double‐strand breakage, or what characterizes the chromatin that becomes sensitive to micrococcal nuclease. Precise analysis of five meiotic DSB sites at the DNA sequence level (de Massy et al., 1995; Liu et al., 1995; Xu and Kleckner, 1995; Xu and Petes, 1996) has shown that individual breaks can occur at many sites within a region of 100–250 bp. Spo11p, a member of the DNA topoisomerase VI group of proteins, has been implicated as the meiotic endonuclease responsible for these breaks (Bergerat et al., 1997; Keeney et al., 1997). As for other topoisomerases (Wang, 1996), no defined recognition sites in DNA could be identified for Spo11p.
Here we identify a common DNA motif that is associated with the regions that undergo DSBs during yeast meiosis. The motif has a flexible profile that was constructed from the comparison of six well‐characterized DSB regions. The locations of these profiles on the three small chromosomes of S. cerevisiae agree well with the positions of previously mapped DSBs (Klein et al., 1996). Furthermore, we show that meiotic DSBs near ARG4, which were altered by deletions in this region (de Massy and Nicolas, 1993), can be explained by changes in the DNA profiles.
Construction of a DNA profile
To identify a common DNA motif that would characterize regions that undergo double‐strand breakage during yeast meiosis, we chose six well‐known regions with precisely mapped DSBs: between YCR47c and YCR48w on chromosome III (Goldway et al., 1993; Liu et al., 1995), the promoter of the gene CYS3 on chromosome I (de Massy et al., 1995), the promoter of ARG4 on chromosome VIII (Liu et al., 1995), the 5′‐region of ADE1 on chromosome I (Klein et al., 1996), the 5′‐region and coding sequence of HIS2 on chromosome VI (Klein et al., 1996) and 5′ of HIS4 on chromosome III (Zenvirth et al., 1992; Fan et al., 1995). From each region we extracted 1 kb of sequence, 0.5 kb on each side of the DSB. We then searched for regions of homology common to all six DSB regions by comparing each of these sequences with the other five using FastA. One common motif was obtained, which we termed CoHR (common homology region) (Figure 1A).
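The extraction and comparison step can be illustrated with a minimal Python sketch. It assumes 0‐based DSB coordinates and replaces FastA with a much cruder test that simply collects the exact k‐mers shared by every sequence; FastA itself scores and extends k‐tuple matches into local alignments. The function names and the choice of k are hypothetical.

```python
from __future__ import annotations


def flank_window(chrom_seq: str, dsb_pos: int, half: int = 500) -> str:
    """Return up to `half` bp on each side of a DSB (0-based), clipped at chromosome ends."""
    start = max(0, dsb_pos - half)
    return chrom_seq[start:dsb_pos + half]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seqs: list[str], k: int = 8) -> set[str]:
    """k-mers present in every input sequence.

    A crude stand-in for the all-against-all FastA comparison: instead of scoring
    pairwise k-tuple matches, keep only exact words common to all sequences.
    """
    common = kmers(seqs[0].upper(), k)
    for seq in seqs[1:]:
        common &= kmers(seq.upper(), k)
    return common


# Usage: two toy "DSB regions" that share only a poly(A) word of length 10.
regions = ["GGG" + "A" * 10 + "C", "TT" + "A" * 10 + "GG"]
print(shared_kmers(regions, k=10))  # {'AAAAAAAAAA'}
```

With six real 1 kb windows, the same loop would intersect six k‐mer sets; a shared word is only a seed, not evidence of a full homology region.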
The CoHR sequences from all six DSB regions were aligned (using the GCG programs PileUp and Pretty, plurality = 3), revealing a consensus sequence (Figure 1A). We searched for this consensus sequence in the full sequences of chromosomes III and VI (with the program FindPatterns, allowing 10 mismatches), on which meiotic DSBs had previously been mapped experimentally (Zenvirth et al., 1992; Klein et al., 1996). The consensus sequence was found at six locations on chromosome III and five locations on chromosome VI. Only two of the six locations on chromosome III and three of the five locations on chromosome VI corresponded to experimentally mapped preferred DSB sites. We suspected that this poor correspondence might reflect the rigid nature of the consensus. To create a more flexible profile from the aligned CoHR sequences, we used the program ProfileMake. This program produces a position‐dependent scoring matrix (a profile), which gives a best‐likelihood estimate of the correspondence between a given sequence and a known family of related sequences. A simplified presentation of this profile matrix, omitting the individual weights given to each nucleotide at each position, is shown in Figure 1B. The profile is unidirectional, presented for one DNA strand from 5′ to 3′, and consists of three parts: a poly(A) stretch (at least ten As) in the middle, flanked by two characteristic regions of ∼20 bp each.
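A position‐dependent scoring matrix of this general kind can be sketched as a log‐odds matrix built from aligned sequences. This is a generic illustration, not GCG ProfileMake, which uses its own profile method and weighting; the pseudocount, the uniform background frequencies and all function names are assumptions. The sketch scores ungapped windows only, so it cannot reproduce the gapped matches that ProfileGap reports.

```python
from __future__ import annotations

import math

BASES = "ACGT"


def build_pssm(aligned: list[str], pseudocount: float = 0.5,
               background: dict[str, float] | None = None) -> list[dict[str, float]]:
    """Log2-odds matrix (one row per alignment column) from equal-length aligned sequences.

    Gap characters, or anything other than A/C/G/T, are ignored when counting a column.
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    bg = background or {b: 0.25 for b in BASES}
    matrix = []
    for i in range(length):
        column = [s[i].upper() for s in aligned if s[i].upper() in BASES]
        total = len(column) + pseudocount * len(BASES)
        matrix.append({b: math.log2((column.count(b) + pseudocount) / total / bg[b])
                       for b in BASES})
    return matrix


def score_window(matrix: list[dict[str, float]], window: str) -> float:
    """Ungapped score of a window of the matrix length; non-ACGT characters score 0."""
    return sum(row.get(base, 0.0) for row, base in zip(matrix, window.upper()))


def best_hit(matrix: list[dict[str, float]], seq: str) -> tuple[float, int]:
    """Slide the matrix along seq and return (best score, 0-based start)."""
    width = len(matrix)
    return max((score_window(matrix, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


# Usage with a toy alignment (real CoHR columns would be ~50 positions wide).
pssm = build_pssm(["AAAA", "AAAT", "AAAA"])
print(best_hit(pssm, "CCAAAACC")[1])  # 2
```

The contrast with the rigid consensus search is that every position contributes a graded score, so a candidate can differ from the consensus at many positions and still rank highly.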
Occurrence of CoHR profile on chromosomes I, III and VI
We examined the occurrence of the CoHR profile on the sequences of the forward and reverse strands of chromosomes I, III and VI, by running the program ProfileGap (see Methods). In general, ProfileGap finds one best match for every stretch of 32 kb. Each ProfileGap match gets a quality score according to the correspondence between the candidate and profile sequences, and the number and length of gaps that were generated during alignment. Fifteen profile locations were found on the sequence of chromosome I, 19 on chromosome III and 18 on chromosome VI. The quality scores of these profiles ranged from 11.45 to 13.3 (mean = 11.96 ± 0.075).
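The "one best match per 32 kb" behaviour and the two‐strand search can be mimicked as below, assuming the reverse strand is taken as the reverse complement. The scoring function is passed in as a parameter (any function returning the best score and its offset within a chunk); the toy scorer and all names are hypothetical, and the sketch says nothing about how ProfileGap itself scores a match.

```python
from __future__ import annotations

import re
from typing import Callable, Tuple

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# A scorer takes a chunk of sequence and returns (best score, 0-based offset in the chunk).
Scorer = Callable[[str], Tuple[float, int]]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def best_per_window(seq: str, find_best: Scorer, window: int = 32_000) -> list[tuple[float, int]]:
    """Keep only the single best hit in each consecutive, non-overlapping window."""
    hits = []
    for start in range(0, len(seq), window):
        score, offset = find_best(seq[start:start + window])
        hits.append((score, start + offset))
    return hits


def scan_both_strands(seq: str, find_best: Scorer, window: int = 32_000):
    """Best hit per window on the forward strand and on the reverse complement.

    Reverse-strand positions are in reverse-complement coordinates; a hit of length L
    at position p maps back to forward position len(seq) - p - L.
    """
    forward = [("F", s, p) for s, p in best_per_window(seq, find_best, window)]
    reverse = [("R", s, p) for s, p in best_per_window(reverse_complement(seq), find_best, window)]
    return forward + reverse


def longest_a_run(chunk: str) -> tuple[float, int]:
    """Toy placeholder scorer: length and offset of the longest run of A."""
    runs = [(float(len(m.group())), m.start()) for m in re.finditer("A+", chunk)]
    return max(runs) if runs else (0.0, 0)


toy = "C" * 10 + "A" * 5 + "C" * 10 + "A" * 3 + "C" * 2
print(best_per_window(toy, longest_a_run, window=15))  # [(5.0, 10), (3.0, 25)]
```

Because each window reports exactly one hit, two strong matches that fall in the same window cannot both be reported.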
To see whether the distribution of these quality scores was meaningful and deviated from randomness, we reshuffled the sequences of all three tested chromosomes and ran ProfileGap again. The quality scores of profiles located on the reshuffled chromosomes ranged from 11.24 to 12.15 (mean = 11.6 ± 0.058). Comparisons of the quality scores of real and reshuffled chromosomes showed that they differed significantly for each of the three chromosomes [t = 3.5 (27 degrees of freedom, d.f.) for chromosome I, t = 3.4 (37 d.f.) for chromosome III and t = 5.9 (33 d.f.) for chromosome VI].
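The randomization control and the statistical comparison can be sketched as follows. The sketch assumes a simple mononucleotide shuffle (base composition preserved, order randomized) as an approximation of what GCG Shuffle does, and Student's t‐test with pooled variance, as stated in Methods. With pooled variance, t = (m1 − m2) / (s_p × sqrt(1/n1 + 1/n2)) with n1 + n2 − 2 degrees of freedom; for example, the 52 real and 51 reshuffled profiles counted in the next paragraph give 101 d.f.

```python
from __future__ import annotations

import math
import random
from statistics import mean, variance


def shuffle_sequence(seq: str, seed: int | None = None) -> str:
    """Mononucleotide shuffle: keeps base composition, randomizes order."""
    rng = random.Random(seed)
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic with pooled (equal) variances; returns (t, df)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Usage with made-up numbers (not the scores reported here):
t, df = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # -3.674 4
```

A mononucleotide shuffle destroys any poly(A) run longer than expected by chance while keeping the A content, which is why it is a natural null model for a profile centred on a poly(A) tract.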
As expected from the intrinsic flexibility of the search, gaps were found in most of the profiles emerging from the real chromosome sequences. In the poly(A) stretch, however, there were usually no gaps or only a single gap. Only six of 52 (11.5%) profiles located on the three real chromosomes showed two gaps in the poly(A) stretch. In contrast, 31 of 51 (60.8%) profiles located on the reshuffled chromosomes showed more than one gap in the poly(A) stretch. In calculating quality scores, ProfileGap applies the same penalty to all gaps, regardless of their location. To improve the predictive value of the ProfileGap searches, we applied an additional penalty of 0.25 for each gap in the poly(A) stretch beyond the first one. With this modification, the quality scores of profiles located on real chromosomes I, III and VI ranged from 11.43 to 13.3 (mean = 11.94 ± 0.08), and those of profiles located on the reshuffled chromosomes ranged from 10.78 to 12.08 (mean = 11.41 ± 0.10). The two groups of scores differ significantly (t = 8.8, 101 d.f.).
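The additional poly(A) penalty is simple arithmetic on top of the ProfileGap score. The sketch below assumes that gaps appear as '.' or '-' characters in the aligned candidate and that the caller already knows which slice of the alignment covers the poly(A) stretch; both are hypothetical conventions, not the ProfileGap output format.

```python
POLYA_GAP_PENALTY = 0.25  # extra penalty per poly(A) gap beyond the first


def count_gap_runs(aligned_segment: str, gap_chars: str = ".-") -> int:
    """Number of separate gaps (maximal runs of gap characters) in an aligned segment."""
    runs, in_gap = 0, False
    for ch in aligned_segment:
        is_gap = ch in gap_chars
        if is_gap and not in_gap:
            runs += 1
        in_gap = is_gap
    return runs


def adjusted_score(raw_score: float, polya_gaps: int, penalty: float = POLYA_GAP_PENALTY) -> float:
    """Subtract `penalty` for every gap in the poly(A) stretch after the first one."""
    return raw_score - penalty * max(0, polya_gaps - 1)


segment = "AAA..AA-AAA.AA"  # hypothetical aligned poly(A) stretch with three gaps
print(adjusted_score(12.0, count_gap_runs(segment)))  # 11.5
```

Counting separate gaps rather than gapped positions matches the rule as stated: the penalty is per gap, and a profile with zero or one poly(A) gap keeps its original score.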
Correspondence between CoHR profile sites and mapped meiotic DSBs
Plots of quality‐score distributions of real and reshuffled chromosomes I, III and VI partially overlap (Figure 2) and intersect at a score of 11.7. Only 9.6% of the profiles located on the real chromosomes showed quality values <11.7, and in almost all cases these did not correspond to mapped DSBs. Therefore, we regard only profiles with quality scores >11.7 as meaningful profiles that may predict DSB formation.
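One way to locate such a crossing point from two score samples, offered as an illustrative sketch rather than the procedure used to draw Figure 2, is to pick the cutoff that maximizes the difference between the empirical cumulative distributions of reshuffled and real scores; for smooth, roughly unimodal distributions that difference peaks where the two densities are equal. Real‐chromosome scores lie higher than reshuffled ones (mean 11.94 versus 11.41 after the poly(A) penalty), so the sketch keeps scores above the cutoff. The toy values and names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right


def ecdf(sorted_values: list[float], x: float) -> float:
    """Fraction of values <= x (values must be sorted)."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def crossing_threshold(real: list[float], shuffled: list[float]) -> float:
    """Cutoff maximizing F_shuffled(t) - F_real(t) over the observed scores.

    For smooth, roughly unimodal distributions this difference peaks where the two
    densities cross; ties are resolved in favour of the lowest cutoff.
    """
    r, s = sorted(real), sorted(shuffled)
    candidates = sorted(set(r) | set(s))
    return max(candidates, key=lambda t: ecdf(s, t) - ecdf(r, t))


def meaningful(scores: list[float], cutoff: float) -> list[float]:
    """Keep scores above the cutoff."""
    return [x for x in scores if x > cutoff]


real = [11.8, 12.0, 12.2, 12.5]      # toy values, not the data in Figure 2
shuffled = [11.2, 11.4, 11.5, 11.6]
cut = crossing_threshold(real, shuffled)
print(cut, meaningful(real + shuffled, cut))  # 11.6 [11.8, 12.0, 12.2, 12.5]
```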
If two high‐scoring profiles are <32 kb apart and happen to fall into the same 32 kb segment, ProfileGap displays only one of them, the one with the higher score. To overcome this limitation, we subdivided each chromosome sequence into overlapping 32 kb regions by repeatedly shifting the starting point of the serial partition in 5 kb steps, from 5 to 30 kb. This was done on the forward and reverse strands of chromosomes I, III and VI. The final scores (Table 1) include the extra penalties for opening more than one gap in the poly(A) tract.
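The overlapping partitions can be generated as below: shift 0 is the original serial partition and shifts of 5 to 30 kb give the six offset ones, each still covering the whole chromosome (the stretch before the first shifted start becomes a short leading window). Best hits from all partitions are then pooled, keeping one entry per strand and position. Names and the (strand, position, score) layout are hypothetical.

```python
from __future__ import annotations


def partitions(seq_len: int, window: int = 32_000, step: int = 5_000,
               max_shift: int = 30_000) -> dict[int, list[tuple[int, int]]]:
    """Half-open (start, end) windows for the original partition (shift 0) and each shifted one.

    For a shift > 0 the stretch before the first shifted start becomes a short
    leading window, so every partition still covers the whole sequence.
    """
    result = {}
    for shift in range(0, max_shift + 1, step):
        starts = sorted({0, *range(shift, seq_len, window)})
        ends = starts[1:] + [seq_len]
        result[shift] = list(zip(starts, ends))
    return result


def pool_hits(hit_lists: list[list[tuple[str, int, float]]]) -> list[tuple[str, int, float]]:
    """Merge (strand, position, score) hits from all partitions, one entry per strand+position."""
    best: dict[tuple[str, int], float] = {}
    for hits in hit_lists:
        for strand, pos, score in hits:
            if score > best.get((strand, pos), float("-inf")):
                best[(strand, pos)] = score
    return sorted((strand, pos, score) for (strand, pos), score in best.items())


parts = partitions(100_000)
print(len(parts), parts[5_000][:2])  # 7 [(0, 5000), (5000, 37000)]
```

Two profiles hidden in one window of the original partition will usually be separated by at least one of the shifted partitions, so pooling recovers both.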
To align the CoHR profile locations with the sites of observed meiotic DSBs on chromosomes I, III and VI (Zenvirth et al., 1992; Klein et al., 1996; Baudat and Nicolas, 1997), we took into account length polymorphisms between strains: meiotic DSBs had been mapped in SK1 strains, whereas the sequences in SGD (S. cerevisiae Genome Database) correspond to another strain, S288C. High‐quality profiles were found to be associated with all 10 ‘preferred’ DSBs mapped on chromosome I, all 13 preferred DSBs mapped on chromosome III and 10 of the 13 DSBs mapped on chromosome VI (Figure 3).
Meiotic DSBs on chromosome III were mapped in our laboratory at a resolution of at least ±5 kb (Zenvirth et al., 1992), and at a higher resolution by Baudat and Nicolas (1997). In Figure 3, we have narrowed the locations of preferred DSBs (‘diamonds’) on chromosome III, based on the study by Baudat and Nicolas (1997). Data from these studies were used to align profile locations with DSB sites on this chromosome. Profile R7 is located 1.8 kb from a preferred DSB, which precisely coincides with F6. Profile R4 is found in the ‘cold’ region according to Baudat and Nicolas (1997) and at the edge of a DSB site imprecisely mapped by Zenvirth et al. (1992). Profiles F1 and F10 are found within 20 kb of the telomeres. Profiles R10 and R11 are located 1.5 kb from DSBs identified by Baudat and Nicolas (1997), but not recognized by Zenvirth et al. (1992) as ‘preferred’ DSB sites. R3 and R8 coincide precisely with a DSB mapped by Baudat and Nicolas (1997). The positions of all other profiles on chromosome III coincide precisely with positions of preferred DSBs (Zenvirth et al., 1992).
On chromosome VI, most profiles coincide well with mapped DSBs (Klein et al., 1996), except for F2, R2, F3, F4, R4, R7, R8 and R12 and the telomeric profiles F1, F12 and R14. For chromosome I, profiles F4, F8 and R3 are located within 2–4 kb of mapped DSBs (Klein et al., 1996). The other profiles on chromosome I correspond accurately to mapped DSBs, except profiles R6, R7, F11 and the five telomeric profiles: R1, R2, F1, F13 and R9.
Thus, there is good correspondence between sequence profile locations and experimentally mapped meiotic DSBs (Figure 3). The correspondence is considerably better for chromosome III than for the other two chromosomes, owing to the high‐resolution mapping of DSBs on chromosome III by Baudat and Nicolas (1997). Of the 69 high‐quality CoHR profiles on chromosomes I, III and VI, 42 correspond to ‘preferred’ DSB sites, 10 occur in telomeric regions and 17 non‐telomeric profiles do not correspond to ‘preferred’ DSBs. Four of the latter, on chromosome III, correspond to DSBs identified by Baudat and Nicolas (1997), but not by Zenvirth et al. (1992). Thirty‐three of the 36 ‘preferred’ DSB sites on the three chromosomes are associated with good CoHR profiles.
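The bookkeeping behind these counts can be sketched as a simple classification of profile positions. The 5 kb tolerance (the resolution quoted for the earlier chromosome III map) and the 20 kb telomeric zone are illustrative parameters chosen here, not a published rule; the positions in the usage example are invented, and the sketch ignores the S288C/SK1 length polymorphisms that had to be taken into account.

```python
from __future__ import annotations

from collections import Counter


def classify_profiles(profiles: dict[str, int], dsbs: list[int], chrom_len: int,
                      tolerance: int = 5_000, telomere_zone: int = 20_000) -> dict[str, str]:
    """Label each profile (name -> position) as 'DSB', 'telomeric' or 'unmatched'."""
    labels = {}
    for name, pos in profiles.items():
        if any(abs(pos - d) <= tolerance for d in dsbs):
            labels[name] = "DSB"
        elif pos < telomere_zone or pos > chrom_len - telomere_zone:
            labels[name] = "telomeric"
        else:
            labels[name] = "unmatched"
    return labels


def dsbs_with_profile(profiles: dict[str, int], dsbs: list[int], tolerance: int = 5_000) -> int:
    """Number of DSB sites with at least one profile within `tolerance`."""
    return sum(any(abs(p - d) <= tolerance for p in profiles.values()) for d in dsbs)


# Toy chromosome of 300 kb with two DSB sites (positions are invented).
profiles = {"F1": 5_000, "F2": 101_000, "R1": 150_000, "R2": 295_000}
labels = classify_profiles(profiles, dsbs=[100_000, 200_000], chrom_len=300_000)
print(Counter(labels.values()))
```

Counting from the profile side and from the DSB side gives different totals (42 profiles versus 33 DSB sites above) because several profiles can lie near the same DSB.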
Structural modifications of the ARG4 promoter region
de Massy and Nicolas (1993) generated several deletions and insertions in the ARG4 promoter region on chromosome VIII and mapped meiotic DSBs in strains carrying these alterations. The DSBs were located between nucleotides 141 575 and 141 620 in SGD. We retrieved from SGD an 8 kb sequence containing the ARG4 gene, between nucleotides 137 863 and 145 863, and ran ProfileGap on this sequence. The analysis revealed two good profile locations, one at 141 627–141 671 on the forward strand (profile F in Figure 4, upper line) and the other at 141 535–141 579 on the reverse strand (profile R), with quality scores of 13.54 and 11.81, respectively. The mapped DSBs in this region (de Massy and Nicolas, 1993) are located between the two profiles, 40–50 bp from both poly(A) tracts.
We re‐assembled the sequences in the altered regions, based on the information given by de Massy and Nicolas (1993), and ran ProfileGap. In deletions 1, 2 and 3 (Figure 4), the two sequence profiles remained intact and DSBs occurred in this region. Both profiles were deleted in deletions 4, 5 and 6, and no DSBs were found. In deletion 9, only profile R was deleted; DSBs occurred in two regions, one at the original location and the other 40–50 bp away, next to the poly(A) stretch of the remaining profile F. DSBs in both regions were weaker and less focused than in the parental strain. The weak DSB signals near deletion 9 may result from a diffuse DSB site (Liu et al., 1995). The existence of two close profiles may lead to strong, focused DSBs between them. Deletion 8 and replacement 10 abolished profile F and the left part of profile R, leaving the poly(A) stretch of R intact. We found that new CoHR profiles were formed at the sequence junctions in these two constructs (Figure 4). The quality scores of the profiles in constructs 8 and 10 are only 11.5 and 11.4, respectively, which may explain the weakness of the DSBs that were detected (de Massy and Nicolas, 1993). We have no explanation for the presence of a weak DSB signal in deletion 7, where we could not identify a CoHR profile in the altered sequence.
Thus, alterations within the two CoHR profiles in the ARG4 region can explain all but one of the changes in the occurrence of meiotic DSBs reported by de Massy and Nicolas (1993).
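Whether a given alteration removes a profile is an interval‐overlap question, and the altered sequence that is re‐scanned can be rebuilt by splicing out the deleted segment. In this sketch the profile coordinates are those of the two ARG4 profiles given above (1‐based, inclusive, SGD numbering); the deletion coordinates in the usage example are hypothetical, because the endpoints of the de Massy and Nicolas (1993) constructs are not listed here.

```python
from __future__ import annotations

ARG4_PROFILES = {"F": (141_627, 141_671), "R": (141_535, 141_579)}  # coordinates from the text


def profile_status(profile: tuple[int, int], deletion: tuple[int, int]) -> str:
    """'intact', 'deleted' or 'partial' for inclusive (start, end) intervals."""
    p_start, p_end = profile
    d_start, d_end = deletion
    if d_end < p_start or d_start > p_end:
        return "intact"
    if d_start <= p_start and d_end >= p_end:
        return "deleted"
    return "partial"


def apply_deletion(seq: str, start: int, end: int, seq_start: int = 1) -> str:
    """Remove 1-based inclusive [start, end] from seq, whose first base is at seq_start.

    The new junction created here is the stretch to re-scan for newly formed profiles.
    """
    i, j = start - seq_start, end - seq_start + 1
    return seq[:i] + seq[j:]


hypothetical_deletion = (141_500, 141_600)  # NOT one of the published constructs
print({name: profile_status(iv, hypothetical_deletion) for name, iv in ARG4_PROFILES.items()})
# {'F': 'intact', 'R': 'deleted'}
```

A 'partial' status is the interesting case: as with constructs 8 and 10, part of a profile can survive and recombine with new flanking sequence, which is why the spliced sequence has to be re‐scanned rather than judged by overlap alone.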
Relationship between the CoHR profiles and DSBs mapped at single‐nucleotide resolution level
In three regions, individual meiotic DSBs have been mapped at single‐nucleotide resolution. These are the promoters of ARG4 (Liu et al., 1995) and CYS3 (de Massy et al., 1995), and the region between YCR47c and YCR48w (Liu et al., 1995). The locations of CoHR profiles relative to the individual DSB sites in these regions are shown in Figure 5.
The two profiles near ARG4, between nucleotides 141 627 and 141 671 and between 141 535 and 141 579, represent two types of profile: one with a very high quality score (13.54) and no gaps, and the other with a lower quality score (11.81) and several gaps (Figure 5). Individual DSBs are located in and between these two profiles, but not in the poly(A) stretches.
In the YCR47c–YCR48w region, a single, high‐quality profile (13.3) is located at nucleotides 210 439–210 489. Individual break sites in this region flank the poly(A) tract of the profile, but do not occur within it.
The meiotic breaks detected in the CYS3 promoter region (de Massy et al., 1995) were in or next to the CoHR profile at nucleotides 130 599–130 740 (quality score 11.85). The other profile in this region, at 130 688–130 843, with a score of 11.84, partly overlaps the former and may reinforce it. Again, none of the breaks was found in poly(A) tracts.
Thus, in the three regions examined, individual meiotic DSBs were found in or next to the CoHR sequence profile, on one side of the profile or on both sides. No breaks were found in the poly(A) tracts (Figure 5).
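Checking that break sites avoid poly(A) tracts amounts to locating runs of A (or of T, for a reverse‐strand profile read in forward‐strand coordinates) and testing each break coordinate against them. The ten‐nucleotide minimum follows the profile description; the function names and the treatment of a break as a single nucleotide coordinate are assumptions.

```python
from __future__ import annotations

import re


def homopolymer_tracts(seq: str, base: str = "A", min_len: int = 10) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) of maximal runs of `base` at least min_len long."""
    pattern = re.escape(base.upper()) + "{%d,}" % min_len
    return [(m.start() + 1, m.end()) for m in re.finditer(pattern, seq.upper())]


def breaks_in_tracts(breaks: list[int], tracts: list[tuple[int, int]]) -> list[int]:
    """Break coordinates that fall inside any tract (expected to be empty for CoHR regions)."""
    return [b for b in breaks if any(s <= b <= e for s, e in tracts)]


toy = "CG" + "A" * 12 + "CT" + "A" * 5
tracts = homopolymer_tracts(toy)
print(tracts, breaks_in_tracts([1, 10, 15], tracts))  # [(3, 14)] [10]
```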
Although DSBs at preferred sites in chromosomal DNA have been shown to initiate meiotic recombination in S. cerevisiae (Smith and Nicolas, 1998), what determines these sites has so far not been defined. DSBs are formed in chromosomal regions that have a more open chromatin structure, as seen by their sensitivity to micrococcal nuclease (Ohta et al., 1998). But what determines that a given site will have an open chromatin structure compared with others? In this study we describe for the first time DNA sequence features that characterize the meiotic DSB regions.
We have constructed a flexible sequence motif that is common to six well‐mapped meiotic DSBs and called it the CoHR profile. The profile is 50 bp long and contains a poly(A) tract in its centre. It may contain several gaps of unrelated sequences, usually not in the poly(A) tract, and may therefore comprise up to 250 bp of DNA.
The correspondence between CoHR profiles and DSB sites is good. Among the profiles that are not associated with DSBs, almost half are within 25 kb of the telomeres, where we had previously noticed the absence of meiotic DSBs (Klein et al., 1996). Telomeric regions of S. cerevisiae chromosomes show transcriptional silencing, probably due to a unique chromatin structure (Grunstein, 1997; Guarente, 1999). The proteins Rap1, Sir3 and Sir4 are involved in this structure and are, at least partly, responsible for the silencing. Silencing by these proteins, however, is limited to the distal 3 kb of the chromosomes, whereas the absence of DSBs is observed over longer regions. A recent report on histone H4 repression of transcription in telomeric regions (Wyrick et al., 1999) suggests domains of 20 kb, closer to the lengths of the DSB‐free sequences in our studies. The few remaining profiles without DSBs may lie in locally ‘cold’ regions, whose existence on chromosome III has been suggested by Baudat and Nicolas (1997) and Borde et al. (1999). Moreover, high‐resolution DSB mapping (Baudat and Nicolas, 1997) has reduced the number of non‐telomeric CoHR profiles without DSBs on chromosome III to one, and could similarly improve the correspondence between profiles and DSBs on the other chromosomes. Some CoHR profiles without DSBs might also be explained by sequence differences between strains: sequence information is available for strain S288C, whereas DSBs were characterized in strain SK1, which may differ from S288C at some CoHR profiles.
Only three DSB sites, on chromosome VI, were found without a nearby CoHR profile (Figure 3). Meiotic DSBs in these regions need to be explained by DNA features other than the CoHR profile. Another known case of meiotic DSBs in yeast that we found not to be associated with the CoHR profile is the HIS4::LEU2 construct (Alani et al., 1990), which contains some bacterial DNA. The bacterial sequences may be responsible for the prominent DSBs in this region, as are pBR322 DNA inserts into yeast chromosomes (Wu and Lichten, 1995).
Individual DSB events were mapped at single‐nucleotide accuracy in three DSB regions (de Massy et al., 1995; Liu et al., 1995). Positioning the CoHR profile relative to these breaks (Figure 5) indicates that the latter can occur in or around the profile, but not in the poly(A) tract. Near ARG4 and CYS3, we find two profiles in the same region, on both DNA strands (the two profiles near CYS3 overlap). Such pairs of profiles may reinforce each other regarding DSBs, although for ARG4, when one profile was removed, DSBs still occurred (Figure 4).
What is the meaning of the CoHR profile in relation to the mechanism of meiotic double‐strand breakage? DSBs are generated by a large set of proteins, of which Spo11p is an important member (Bergerat et al., 1997; Keeney et al., 1997). One possibility is that the profile is a recognition site for the endonucleolytic activity of Spo11p. The distribution of individual breaks in and around the CoHR profile does not, however, support sequence‐specific recognition by the enzyme. A more likely explanation for the association between the profile and meiotic DSBs is that the profile confers a unique chromatin conformation that opens up early in meiosis and is then accessible to Spo11p and its companion proteins. In the accessible region, individual DSBs can occur at numerous sites, except in the poly(A) tracts (see Figure 5). Poly(A) tracts are often found in open chromatin regions such as promoters, but for unknown reasons they are not available for meiotic breakage. Because openness of the DSB chromatin is induced in meiosis before DSBs are made (Ohta et al., 1998), there should be meiotic protein(s) that recognize CoHR chromatin and mediate its opening and accessibility to the Spo11p complex. Such proteins would recognize the unique chromatin structure prescribed by the CoHR profile.
Methods
Sequences of the yeast S. cerevisiae, including whole chromosomes I, III and VI, were retrieved from the SGD (http://genome-www.stanford.edu/Saccharomyces) and converted to GCG format using the program Reformat.
GenBank (Release 97.0, October 1996) and EMBL (Release 48.0, September 1996) were used as sequence resources within GCG. We refer to forward and reverse strands as they appear in the SGD.
Programs used for the analysis of DNA sequences were taken from the Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, WI. All programs were run with their default parameters, unless indicated. Sequences were compared using FastA. The reverse strands of given sequences were created by the program Reverse. Sequence reshuffling was done using Shuffle. FindPatterns was used for searching long chromosomal sequences for defined motifs.
Multiple sequence analyses were done with programs Assemble, PileUp, Pretty, ProfileMake (version 4.40) and ProfileGap, which are included in the GCG package. The penalty parameters used in the computer runs of ProfileGap were 4.5 for gap creation and 0.05 for gap extension. The default window size for ProfileGap searches was 32 kb. Smaller window sizes were also used, with essentially similar results.
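Assuming the usual GCG convention that a gap costs the creation penalty plus the extension penalty times its length, the parameters above translate into the costs below. With these values, extending a gap is cheap relative to opening one: a single 20 bp gap costs 5.5, whereas two 10 bp gaps cost 10.0. How ProfileGap folds these penalties into its reported quality score is not reproduced here.

```python
GAP_CREATION = 4.5    # ProfileGap gap-creation penalty used in this study
GAP_EXTENSION = 0.05  # ProfileGap gap-extension penalty used in this study


def gap_penalty(length: int, creation: float = GAP_CREATION,
                extension: float = GAP_EXTENSION) -> float:
    """Affine cost of one gap, assuming creation + extension * length (GCG-style)."""
    if length <= 0:
        return 0.0
    return creation + extension * length


def total_gap_penalty(gap_lengths) -> float:
    """Sum of affine costs over all gaps in one alignment."""
    return sum(gap_penalty(n) for n in gap_lengths)


print(round(total_gap_penalty([20]), 2), round(total_gap_penalty([10, 10]), 2))  # 5.5 10.0
```

Under this scheme a few long insertions are tolerated far better than many short ones, which is consistent with a 50 bp profile stretching over up to 250 bp of DNA.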
Statistical comparisons of quality scores were done by Student's t‐test, assuming equal variances.
We thank Gila Peleg for technical help, and Hanah Margalit, Alain Viari, David Klein, Michael Blumental, Ayelet Arbel and Haim Cohen for advice and discussions at various stages of this project. This research was supported by the US–Israel Binational Science Foundation (BSF).
Copyright © 2000 European Molecular Biology Organization