Satellite DNAs represent a fast‐evolving portion of the eukaryotic genome whose evolution is proposed to be driven by the stochastic process of molecular drive. Recent results indicate that satellite DNAs are subject to certain structural constraints, which are probably related to their interaction with proteins involved in the establishment of specific chromatin structures. The evolutionary persistence and high sequence conservation of some satellites, as well as the presence of stage‐ or tissue‐specific, differentially expressed transcripts in several species, are consistent with the hypothesis that satellite DNA could have a regulatory role in eukaryotic organisms. Although the role of most transcripts is not known, some act as precursors of small interfering RNAs, which are now recognized as having an important role in chromatin modulation and the control of gene expression. Furthermore, some transcripts are involved in the cellular response to stress.
Satellite DNAs are tandemly repeated sequences that are present as long uninterrupted arrays in genetically silent heterochromatic regions (Charlesworth et al, 1994). Basic repeat units of satellite DNAs usually have distinct complex sequences, such as the 171‐bp‐long monomer of the human α‐satellite, which represents a main structural element of centromeric and pericentromeric regions (Schueler et al, 2001). The most abundant mouse pericentromeric γ‐satellite also belongs to this group as it is composed of 234‐bp monomers of specific sequence (Rudert et al, 1995). However, other satellites are composed of short simple repeats, such as human satellite III with a 5‐bp‐long monomer, as well as many of the Drosophila satellites (Borstnik et al, 1994). Several different satellite DNAs are usually present in a species and are subject to the influence of gene conversion and unequal crossing over. These recombinational mechanisms are responsible for the rapid horizontal spread of mutations among monomers in a genome. The mutations are subsequently fixed in reproductive populations through the stochastic process of molecular drive (Dover, 1986). Copy number change and loss of the satellite DNAs from the genome are also the result of unequal crossing over. The outcome of the recombinational mechanisms and molecular drive is a high turnover of this part of the eukaryotic genome. Therefore, satellite DNAs show significant rearrangements and sequence divergence as well as changes in copy number, even between closely related species (Ugarkovic & Plohl, 2002).
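The two recombinational mechanisms described above can be illustrated with a toy simulation. The following is only a minimal sketch under strong simplifying assumptions: monomers are reduced to variant labels, donor and acceptor monomers and crossover breakpoints are chosen uniformly at random, and no new mutations arise. The function names gene_conversion and unequal_crossover are hypothetical and are not taken from any cited study.

```python
import random


def gene_conversion(array, events, rng):
    """Return a copy of `array` after `events` gene-conversion steps.

    Each step copies one randomly chosen monomer (donor) over another
    randomly chosen monomer (acceptor). No new mutations are introduced,
    so variants can only spread or be lost; array length is unchanged.
    """
    monomers = list(array)
    for _ in range(events):
        donor = rng.randrange(len(monomers))
        acceptor = rng.randrange(len(monomers))
        monomers[acceptor] = monomers[donor]
    return monomers


def unequal_crossover(array, rng):
    """Return the two reciprocal products of one misaligned exchange.

    Two identical copies of `array` break at independent positions i and j
    (assumption: uniform breakpoints). One product gains the monomers that
    the other loses, so copy number changes while the total is conserved.
    """
    i = rng.randrange(len(array) + 1)
    j = rng.randrange(len(array) + 1)
    product_1 = list(array[:i]) + list(array[j:])
    product_2 = list(array[:j]) + list(array[i:])
    return product_1, product_2


if __name__ == "__main__":
    rng = random.Random(0)
    start = ["A"] * 5 + ["B"] * 5  # two monomer variants, five copies each
    print("after conversion:", "".join(gene_conversion(start, 200, rng)))
    long_product, short_product = unequal_crossover(start, rng)
    print("copy numbers after crossover:", len(long_product), len(short_product))
```

In this sketch gene conversion alone changes variant frequencies without changing array length, whereas unequal crossing over changes copy number, including possible loss of the whole array from one product.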
Despite their structural divergence and general lack of sequence conservation across species, satellite DNAs are the main constituents of heterochromatin and act as a centromere‐building element. They associate strongly with several proteins that form unique centromeric heterochromatin (Henikoff & Dalal, 2005). Satellite DNAs are not, however, a prerequisite for centromere establishment and are instead proposed to drive the adaptive evolution of specific centromeric histones (Cooper & Henikoff, 2004). In addition, transcripts of satellite DNAs in the form of small interfering RNAs (siRNA) participate in the epigenetic process of chromatin remodelling and heterochromatin formation (Volpe et al, 2002; Verdel et al, 2004). Recent data indicate that the evolution of satellite DNA sequences is not only shaped by molecular drive, but also influenced by selective constraints (Hall et al, 2003; Mravinac et al, 2005). This conclusion is based on the extreme sequence preservation and wide evolutionary distribution of some satellite DNAs, as well as on the conservation of particular structural motifs. Selective constraints on satellite sequence are probably related to their interaction with specific proteins necessary for heterochromatin formation and to their role in controlling gene expression. This review summarizes recent studies on the presence of different structural elements of functional importance found in diverse sequences of satellite DNA. In particular, the transcription of satellite DNA, which seems to be a general phenomenon, and the processing of transcripts into different functional RNA molecules are discussed.
Common sequence motifs and structural features
Selection was first thought to influence satellite DNA sequences after the observation of constant and variable regions in Arabidopsis thaliana and human α‐satellite DNA (Romanova et al, 1996; Heslop‐Harrison et al, 1999). A significant difference in the mutation rate of distinct nucleotide positions is also characteristic of human satellite III pentameric repeats (Borstnik et al, 1994). Analyses of the variability of A. thaliana satellite DNA and of satellites from the insect genus Tribolium, based on comparisons of more than 1,000 monomers, show that the rate of evolution is not uniform along the sequence (Hall et al, 2003; Mravinac et al, 2004). Such a non‐random pattern of variability indicates possible functional constraints on satellite DNA sequence that may be related to specific protein‐binding sites.
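One simple way to visualize such a non‐uniform pattern is to score each position of a set of aligned monomers for variability. The sketch below uses Shannon entropy per alignment column, which is an illustrative choice and not necessarily the measure used in the cited analyses; the function names and the four toy monomers are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the nucleotides in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def positional_variability(aligned_monomers):
    """Return one entropy value per position of equally long, aligned monomers.

    A value of 0 marks a fully conserved position; higher values mark
    positions at which the monomers differ more often.
    """
    lengths = {len(m) for m in aligned_monomers}
    if len(lengths) != 1:
        raise ValueError("monomers must be aligned to the same length")
    return [column_entropy(col) for col in zip(*aligned_monomers)]


if __name__ == "__main__":
    # Toy data (assumption): three conserved positions, one variable position.
    monomers = ["ACGT", "ACGA", "ACGC", "ACGG"]
    for position, score in enumerate(positional_variability(monomers), start=1):
        print(position, round(score, 3))
```

Runs of low‐entropy positions in such a profile would correspond to the constant regions described above, and runs of high‐entropy positions to the variable regions.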
The best‐characterized satellite DNA‐binding protein is human centromere protein B (CENP‐B), which binds to a 17‐bp motif in human α‐satellite DNA known as the CENP‐B box (Masumoto et al, 1989). Proteins homologous to CENP‐B have been found in many eukaryotes, including the fission yeast Schizosaccharomyces pombe, and motifs that are 60%–70% similar to the CENP‐B box have been detected in diverse centromeric repeats of mammals and insects (Kipling & Warburton, 1997; Mravinac et al, 2004). Although only 23% of repeats in human α‐satellite DNA have a functional CENP‐B box, it seems to be essential for the assembly of centromere‐specific chromatin (Fig 1; Ohzeki et al, 2002; Basu et al, 2005). Analysis of long higher‐order α‐satellite arrays revealed that CENP‐B binding sites are located in alternating repeat monomers (Ikeno et al, 1994). This supports the model that only a subset of α‐satellite monomers binds CENP‐B, which results in protein phasing and higher‐order chromatin structure (Yoda et al, 1998). In addition to the CENP‐B box, other relatively conserved oligonucleotide motifs have been identified in different satellites. In avian centromeric satellites, short oligonucleotide tracts are conserved in sequence and position whereas the rest of the sequence is highly divergent (Madsen et al, 1994). Clustering of A or T and regular phasing of tracts of three or more A + T, combined with the presence of dyad structures, have been reported for many different satellite DNAs (Martinez‐Balbas et al, 1990; Ugarkovic et al, 1996). Periodic distribution of AT tracts usually leads to curvature of the DNA helix axis, which is characteristic of approximately 50% of satellite DNAs (Fitzgerald et al, 1994). As a result of this curvature, a superhelical tertiary structure is formed that is thought to be important for the tight packing of DNA and proteins in heterochromatin (Fig 1; Ugarkovic et al, 1992; Fitzgerald et al, 1994).
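Both kinds of feature mentioned here, motifs that are partly similar to a known binding site and phased tracts of three or more A or T, can be located with very simple scans. The following is a minimal sketch: similarity is taken as ungapped per‐position identity, the 0.6 default threshold mirrors the 60% lower bound quoted above, and the motif and sequences in the demo are placeholders rather than the real CENP‐B box or any real satellite. The function names are hypothetical.

```python
import re


def motif_matches(sequence, motif, min_identity=0.6):
    """Return (start, identity) for each window at least `min_identity` similar.

    Identity is the fraction of positions at which the window and the motif
    carry the same base (no gaps, no degenerate symbols).
    """
    width = len(motif)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        identity = sum(a == b for a, b in zip(window, motif)) / width
        if identity >= min_identity:
            hits.append((start, identity))
    return hits


def at_tract_starts(sequence, min_length=3):
    """Return 0-based start positions of runs of at least `min_length` A/T."""
    pattern = re.compile(r"[AT]{%d,}" % min_length)
    return [match.start() for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    placeholder_motif = "ACGTACGT"  # hypothetical motif, not the CENP-B box
    print(motif_matches("TTTTACGTACGTTTTT", placeholder_motif, min_identity=1.0))
    starts = at_tract_starts("GCAAAGCGCTTTTGCGCAT")
    spacing = [b - a for a, b in zip(starts, starts[1:])]
    print(starts, spacing)
```

Regular spacing between successive tract starts would be the signature of the phasing associated with DNA curvature.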
Besides these conserved motifs, other more variable regions in satellite DNAs may also be functionally important owing to their interaction with rapidly evolving proteins. An example is the centromere‐specific histone, CENH3, which replaces histone H3 in centromeric nucleosomes and is required for proper chromosome distribution during cell division (Henikoff & Dalal, 2005). Unlike the highly conserved histone H3, CENH3 is divergent and subject to the influence of positive selection, which particularly affects the sites that potentially interact with satellite DNA (Cooper & Henikoff, 2004). This suggests that variable regions are potential drivers of CENH3 adaptive evolution and that they have a role in the evolution of the centromere.
Ultra‐conserved satellite sequences
Although, in general, satellite DNAs belong to the fast‐evolving portion of the eukaryotic genome, some are preserved for long evolutionary periods and have a highly conserved monomer sequence. Extreme sequence conservation of two satellite DNAs that represent main pericentromeric repeats in the coleopteran insect species Palorus ratzeburgii and Palorus subdepressus has been reported (Mravinac et al, 2002, 2005). These satellites are present in many coleopteran species at a low copy number and their sequences have remained unchanged for 60 million years. This remarkable antiquity and sequence conservation are also characteristic of human α‐satellite DNA, which has been detected as a rare, highly conserved repeat in evolutionarily distant species such as chicken and zebrafish (Li & Kirby, 2003). This complete sequence conservation and the wide evolutionary distribution of some satellite sequences have led to the assumption that in addition to participating in centromere formation, they might also act as cis‐regulatory elements of gene expression, as is observed for some highly conserved, mammalian non‐coding elements (Frazer et al, 2004).
Satellite DNA transcripts
Transcripts of satellite DNAs have been reported in several organisms including vertebrates, invertebrates and plants. In most species, satellite DNAs are temporally transcribed at particular developmental stages or are differentially expressed in some cell types, tissues or organs. Transcription has been detected during embryogenesis in the newts Triturus cristatus carnifex (Varley et al, 1980) and Notophthalmus viridescens (Diaz et al, 1981). Transcripts of an α‐like satellite repeat detected during early embryogenesis in chick and zebrafish were limited to the cardiac neural crest, the head and the heart (Li & Kirby, 2003). Two types of transcript were identified; one that corresponds to α‐repeat RNA and another group of mRNAs that contain an α‐like satellite sequence in the 5′ and 3′ untranslated regions. Mouse γ‐satellite DNA is differentially expressed during development of the central nervous system, as well as in the adult liver and testis (Rudert et al, 1995). Transcripts have also been detected in hymenopteran insect species (Rouleux‐Bonnin et al, 1996), as well as on Y‐chromosome loops in primary spermatocyte nuclei of Drosophila melanogaster (Bonaccorsi et al, 1990) and Drosophila hydei (Trapitz et al, 1988).
Most transcripts are present as polyadenylated RNA in the cytoplasm but some are found exclusively in the nucleus, such as those associated with the Y chromosome of D. melanogaster and D. hydei (Trapitz et al, 1988; Bonaccorsi et al, 1990). Transcripts are usually heterogeneous in size and are in some cases strand‐specific, as in the mouse (Rudert et al, 1995), whereas in others transcription proceeds from both DNA strands, as in Hymenoptera (Rouleux‐Bonnin et al, 1996). The developmental, stage‐ and tissue‐specific expression of satellite DNAs in several species suggests that they have a regulatory role, although for most transcripts this role is still elusive and hypothetical. Taking into account the extreme sequence diversity of satellite DNAs and their transcripts, several sequence‐specific regulatory signals might reside within them. Targets of these signals could be other RNAs, DNA or proteins. Regulatory interactions may also involve secondary or tertiary structures of RNA and RNA‐mediated catalysis.
Recently, it has been shown that heat shock induces the transcription of a subset of satellite III, which is located within pericentromeric heterochromatic regions of specific human chromosomes (Jolly et al, 2004; Rizzi et al, 2004). Long, single‐stranded polyadenylated transcripts of satellite III are involved directly in the recruitment of splicing factors to nuclear stress granules (Metz et al, 2004; Chiodi et al, 2004). It has been proposed that sequestration of splicing factors within the granules through their association with nuclear satellite III transcripts could regulate splicing function during stress (Fig 1; Valgardsdottir et al, 2005). The satellite DNA transcripts from salamanders (Epstein & Gall, 1987), schistosomes (Ferbeyre et al, 1998) and crickets (Rojas et al, 2000) are expressed as long multimeric precursor RNAs that have the ability to adopt hammerhead‐like secondary structures and can function as ribozymes with self‐cleavage activity. The products of self‐cleavage are satellite monomers that can act as ribozymes either in cis or in trans (Fig 1). However, the physiological role of these ribozymes is unknown.
Promoters and transcription‐factor binding sites
The characteristic sequence structure of some satellite DNAs is based on simple repeats, which led to the proposal that they are transcribed by read‐through from upstream genes or transposable element promoters (Diaz et al, 1981). However, potential regulatory elements for RNA polymerase (pol) II and RNA pol III have been predicted in some satellite sequences (Renault et al, 1999). In schistosome satellite DNA, which encodes an active ribozyme, a functional RNA pol III promoter is present (Ferbeyre et al, 1998). The sequence of satellite 2 found in the newts Notophthalmus viridescens and Triturus vulgaris meridionalis contains a functional analogue of the vertebrate small nuclear RNA (snRNA) promoter that is responsible for RNA pol II transcription (Coats et al, 1994). Human satellite III, which is specifically expressed under stress, has a binding motif for the heat‐shock transcription factor 1 that drives RNA pol II transcription (Metz et al, 2004). Transcription factor YY1 associates with γ‐satellite DNA, which is located pericentromerically on all murine chromosomes (Shestakova et al, 2004). YY1 belongs to the Polycomb group of proteins that are involved in the repression of homeotic genes, and its interaction with heterochromatin suggests a link between these two silencing states.
Short interfering RNAs cognate to satellite DNAs
Recently, it has been shown that transcripts derived from tandemly repeated centromeric DNA of the fission yeast S. pombe exist in the form of small 20–25 bp‐long RNAs that are involved in RNA interference (RNAi)‐mediated heterochromatin assembly (Fig 1; Volpe et al, 2002). This chromatin‐silencing mechanism is initiated by long double‐stranded RNA (dsRNA) that arises from bidirectional transcription of repeated DNA and is further processed by the RNase III‐like ribonuclease Dicer into siRNAs. siRNAs are then loaded into the RNA‐induced transcriptional silencing complex (RITS) through their association with the Argonaute protein (Verdel et al, 2004). RITS also interacts with the RNA‐directed RNA polymerase complex (RDRC), which is required for the production of secondary dsRNA and amplification of the silencing signal (Motamedi et al, 2004). Both RITS and RDRC associate with the nascent noncoding centromeric RNA transcript and binding to RITS is probably achieved through the base‐pairing of siRNA molecules with nascent RNA and by direct contact with the RNA pol II elongation complex (Schramke et al, 2005; Kato et al, 2005). In addition to siRNAs, the association of RITS with chromatin also requires a histone methyltransferase. Histone H3 methylation at lysine 9 (Lys 9) is essential for the recruitment of the chromodomain protein Swi6, a homologue of mammalian heterochromatin protein 1 (HP1). This represents an initial step in the formation of heterochromatin. Methylation of H3‐Lys 9 also creates binding sites for RITS and therefore enables further recruitment of heterochromatin assembly factors (Noma et al, 2004). These studies show that the low level of expression of satellite DNA is necessary for establishing the transcriptionally silent heterochromatic state. 
However, it is not clear whether this low level of expression is required only to initiate the heterochromatic state, which can be further propagated by epigenetic mechanisms such as histone methylation, or whether it is also required for its maintenance.
The siRNAs cognate to satellite DNAs are also involved in the epigenetic process of chromatin modification in D. melanogaster as well as in vertebrate cells (Pal‐Bhadra et al, 2004; Fukagawa et al, 2004). In D. melanogaster, their expression is developmentally regulated and is most intense in testes and in early embryos. This is probably related to the marked changes in heterochromatin that occur during these stages (Aravin et al, 2003).
Satellite DNA‐derived siRNAs may be involved in post‐transcriptional gene regulation through the action of the RNA‐induced silencing complex (RISC). This complex contains the Argonaute protein that binds siRNA and mediates complementary mRNA recognition and inactivation (Fig 1; Hammond et al, 2001). The presence of several coding mRNAs in human and chick embryos that contain α‐like satellite repeats as part of their 5′ or 3′ untranslated regions indicates that their expression could be controlled by siRNAs derived from α‐satellite repeats (Li & Kirby, 2003). This is in accordance with a model proposed by Davidson & Britten (1979) for the coordinated expression of repeat sequences and structural genes that contain repeat‐complementary regions. It is possible that siRNAs originating from satellite repeats have a more extensive role in gene expression and affect both heterochromatin formation on tandemly repeated noncoding regions and the expression of particular genes with embedded satellite repeats (Fig 1).
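The recognition step in this model rests on base complementarity between an siRNA and a site in the target mRNA. As a rough illustration only, the sketch below searches a transcript for sites that are perfectly complementary to a given siRNA strand. Requiring perfect, full‐length complementarity is a simplifying assumption, the sequences are invented, and the function names are hypothetical.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_target_sites(sirna, transcript):
    """Return 0-based starts of sites in `transcript` that pair with `sirna`.

    A site is a stretch of the transcript identical to the reverse
    complement of the siRNA strand (assumption: perfect complementarity).
    Overlapping sites are reported.
    """
    site = reverse_complement(sirna)
    positions = []
    start = transcript.find(site)
    while start != -1:
        positions.append(start)
        start = transcript.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Invented toy sequences: a short "siRNA" and a transcript with two sites.
    print(reverse_complement("AACG"))
    print(find_target_sites("AACG", "GGCGUUGGCGUU"))
```

A transcript carrying a satellite‐like repeat in an untranslated region would, under this simplification, return one or more sites for an siRNA derived from the complementary strand of the same repeat.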
Evidence is accumulating on the functional significance of satellite DNA sequences, in particular the uneven distribution of mutations in satellite repeats that results in conserved and variable segments and the extreme sequence conservation and evolutionary preservation of some satellites. Widespread transcriptional activity together with the presence of active promoters and binding sites for different transcription factors found in some satellite repeats further support the functional importance of these diverse sequences. The role of most transcripts is not known, but some are important for epigenetic chromatin modifications and might control the expression of satellite repeat‐tagged genes. Others show ribozyme activity, whereas human satellite III transcripts are involved in the recruitment of splicing factors during stress. These examples suggest an active role for satellite transcripts in several regulatory layers from chromatin modulation, transcription and RNA maturation to translation. As the transcription of most satellites is either developmentally and temporally regulated or restricted to particular tissues and organs, it is plausible that the transcripts are responsible for fine‐tuning gene expression. However, the precise biochemical mechanisms of action for different satellites and their transcripts still need to be determined.
I am grateful to M. Sopta for critical reading of the manuscript. I apologize to those whose work could not be cited owing to space restrictions. This work was supported by grant 0098074 from the Croatian Ministry of Science.
Copyright © 2005 European Molecular Biology Organization