The kangaroo genome is a rich and unique resource for comparative genomics. Marsupial genetics and cytology have made significant contributions to the understanding of gene function and evolution, and increasing the availability of kangaroo DNA sequence information would provide these benefits on a genomic scale. Here we summarize the contributions from cytogenetic and genetic studies of marsupials, describe the genomic resources currently available and those being developed, and explore the benefits of a kangaroo genome project.
Comparative genomics is a powerful tool for identifying the features and dissecting the functions of genomes. The approach rests on the principle that selection acting on a gene or regulatory region constrains the evolution of its sequence, so that functional elements remain recognizably similar between species while unconstrained sequence diverges. Comparison with other genomes has become an integral part of the analysis of the human genome sequence and is one of the most effective methods for identifying genes (Batzoglou et al., 2000; Roest Crollius et al., 2000). As the human genome project progresses beyond sequencing into its new phase of detailed annotation, attention is being focused on the contribution that comparative genomics can make to understanding the function and evolutionary history of the human genome.
Current targets for whole‐genome sequencing include chimpanzee, cattle, pig, rat, chicken and fish species. These will all provide unique and valuable comparative information. However, a quick glance at their phylogenetic relationships (Fig. 1) reveals a gaping hole in the phylogeny: the 250 million years between the divergence of chicken and the eutherian mammals. We propose that sequencing the kangaroo genome would fill this gap and provide a powerful new dimension to comparative genomic analysis.
Marsupials for comparative genomics
Choosing the species that will be most worthwhile for evolutionary comparisons is a balancing act between random noise and the ability to align the sequences unambiguously. In closely related species, conservation of random sequences due to chance rather than function creates false signals (noise). In very distantly related species, sequence divergence combined with inversion and deletion events hampers the unambiguous alignment of sequences outside highly conserved coding regions, and can result in true conservation being overlooked owing to the comparison being made with non‐homologous sequences. These conflicting effects depend on the strength of the selective forces constraining the sequence being analysed, as well as the evolutionary distance between the two organisms.
Marsupials diverged from the eutherian (‘placental’) lineage 130–180 million years ago, providing a middle ground for the analysis of many regions of the genome. Comparisons between marsupials and eutherians allow relatively straightforward alignment, and display a high ratio of conservation signal to random noise, reducing the extent and degree of conservation required to infer functionally conserved sequences. This reduced noise level is of particular value in regions of the genome that are changing more rapidly than coding regions but still have selective pressures acting on them, such as 5′ and 3′ untranslated regions and introns (Fig. 2).
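The trade‐off between conservation signal and background noise can be illustrated with a sliding‐window identity scan over a pairwise alignment. The following is a minimal sketch only: the function names, window size, threshold and toy sequences are all assumptions made for illustration, and it does not reproduce the method used in any of the studies cited here.

```python
def window_identity(seq_a, seq_b, window=10, step=5):
    """Score each window of a pairwise alignment by its fraction of identical columns.

    seq_a and seq_b are assumed to be already aligned (equal length, '-' for gaps).
    Columns containing a gap are never counted as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, step):
        columns = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in columns if a == b and a != "-")
        scores.append((start, matches / window))
    return scores


def conserved_windows(scores, threshold=0.8):
    """Return the start positions of windows at or above the identity threshold."""
    return [start for start, identity in scores if identity >= threshold]


# Toy alignment (invented): a conserved block followed by a diverged block.
eutherian_like = "ACGTACGTAC" + "GGGGGGGGGG"
marsupial_like = "ACGTACGTAC" + "TTTTTTTTTT"

toy_scores = window_identity(eutherian_like, marsupial_like, window=10, step=10)
print(toy_scores)
print(conserved_windows(toy_scores))
```

With closely related species most windows would pass any reasonable threshold, whereas at marsupial–eutherian distances only constrained windows would be expected to do so; the threshold of 0.8 used here is arbitrary.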
Now that the human genome sequence has been unravelled, it has become apparent that many more control regions than expected lie within the introns of genes or are contained in an antisense transcript. Some estimates have even suggested that more than 90% of the human genome is contained in the introns of overlapping genes (Wong et al., 2000). This will make the increased discriminatory power of marsupial–eutherian comparisons valuable for the analysis of a large number of regions.
In the first large‐scale comparison of marsupial and eutherian sequences, Chapman et al. (2003) analysed the genomic region surrounding the lymphoblastic‐leukaemia‐derived sequence 1 (LYL1) gene. Comparisons between mouse and human showed high conservation over the region. When the marsupial sequence was compared with human and mouse, non‐coding homology was reduced and all promoters and exons could be readily identified. The marsupial sequence also revealed putative transcription‐factor‐binding sites consistent with those of the better studied paralogue, the stem‐cell leukaemia gene, SCL.
Marsupial biology and reproduction
Marsupials are mammals and, unlike chickens, share many uniquely mammalian features that are important to our understanding of human biology and health. The tammar wallaby (Macropus eugenii) is a small member of the kangaroo family (Macropodidae). At a conference of marsupial geneticists this species was adopted as the primary animal for genetic, reproductive and physiological studies on marsupials (Hinds et al., 1990). The tammar wallaby was chosen over other kangaroo species because it is small, can be bred in captivity and is relatively easy to handle.
The reproduction of kangaroos is highly adapted to a harsh and variable environment. The embryo develops in utero for only 21 days and is born when just 6–10 mm long and weighing less than 400 mg, at a developmental stage roughly equivalent to a 35–45‐day human or a 13–18‐day mouse embryo (Tyndale‐Biscoe & Renfree, 1987). The jellybean‐sized newborn has the bare essentials: a mouth and gut for feeding in the pouch, forelimbs to climb the fur, and an incompletely developed set of lungs and circulatory system. Much of the development occurs in the pouch, including that of the hindlimbs, eyes, gonads and a significant portion of the brain (Mark & Marotte, 1992; Pask & Renfree, 2001; Reynolds et al., 1985; Tyndale‐Biscoe & Janssens, 1988). This mode of development minimizes maternal investment, allowing the mother to respond to altered environmental conditions (Tyndale‐Biscoe, 2001; Tyndale‐Biscoe & Janssens, 1988). The early postnatal development provides a unique experimental environment, with ready experimental access to developmental stages that are much more difficult to manipulate in eutherian mammals. This access has been used for studies of sex hormones, such as the minimally invasive administration of oestrogen to male embryos by using a feeding tube alongside the nipple, inducing gonadal sex reversal (Coveney et al., 2001). In many species of kangaroo, including the tammar wallaby, the mother conceives again immediately after birth but the embryo is held in diapause as long as the young in the pouch continues to provide suckling stimuli. This allows early‐stage embryos to be accurately timed through the removal of the pouch young (Tyndale‐Biscoe & Renfree, 1987).
The marsupial genome is estimated to be 3.3 billion base pairs in size, similar to eutherian mammal genomes, but is packaged into a small number of large chromosomes. Marsupial karyotypes are very stable across diverse lineages, which are separated by relatively few simple rearrangements. G‐banding and, more recently, chromosome painting between marsupial groups have confirmed the original deduction that there is an ancestral marsupial karyotype with 12 autosomes and a pair of sex chromosomes. In the macropodid ancestor that gave rise to the present‐day kangaroo family, this increased to 22 chromosomes by centromeric fissions. The modern macropodid karyotypes are derived by Robertsonian and tandem fusions (Toder et al., 1997; De Leo et al., 1999; Rens et al., 1999). The infrequency of major inversions also predicts few large‐scale rearrangements within the chromosome arms, allowing the extrapolation of a cytogenetic map from a model marsupial to other marsupials of interest.
Although chromosome painting of different marsupial species has been of significant value, this technique is insufficiently sensitive to detect chromosome homologies between marsupials and eutherians. It has therefore been necessary to map conserved coding genes through family studies, somatic cell genetics and in situ hybridization. A high‐density cytogenetic map has been generated for genes on the X and Y chromosomes, and there is patchy coverage of the autosomes. The major limiting factor in mapping coding genes is the identification and verification of orthologous genes for use as probes, and we expect rapid progress in cytological mapping as genomic‐scale technologies are adopted.
Comparison of marsupial and human maps indicates that large chromosomal regions have been conserved (Samollow & Graves, 1998). This conservation will be valuable for identifying orthologous members of complex gene families by their location in conserved blocks of genes that have fewer sequence homologies in the genome.
Even chromosomal location can be a powerful indicator of gene function. Famously, this provided a critical test for the Y‐borne testis‐determining gene. The first candidate, ZFY, was ruled out as the universal mammalian testis‐determining gene when it was discovered to be autosomal in marsupials (Sinclair et al., 1988). The subsequently identified SRY gene was found to be on the Y chromosome in marsupials, and SOX3, the X‐linked gene from which it is believed to have evolved, was also first identified in these species (Foster et al., 1992; Foster & Graves, 1994).
Knowledge of the evolutionary history of a gene or chromosome region can also help to explain its activity or function. For instance, comparative mapping of genes on the human X chromosome revealed that it is composed of a conserved region (also present on the X chromosome in marsupials) and a recently added region that is autosomal in marsupials (lying on tammar chromosome 5) and was added to the eutherian X after the marsupial/eutherian divergence but before the eutherian radiation (Graves, 1995; Pask & Graves, 2001). The conserved and recently added regions of the human X chromosome were strikingly demonstrated by painting the tammar wallaby X chromosome DNA onto human chromosomes (Glas et al., 1999). This evolutionary history explains why many genes on the short arm of the human X chromosome escape X‐chromosome inactivation (XCI; Carrel et al., 1999): genes within this region were recently part of a paired autosomal region that did not require dosage compensation and therefore have yet to be recruited into the X‐inactivation system.
Marsupials are particularly valuable because they share many mammal‐specific regulatory systems with eutherians, such as XCI and genomic imprinting.
Eutherian XCI is a complex, multistage process that acts in cis on an entire chromosome. XCI is controlled by the XIST gene, which is transcribed into an untranslated RNA that coats the inactive X chromosome and represses transcription, by a mechanism that is not fully understood, through histone deacetylation, DNA methylation and chromatin remodelling (Brown et al., 1992). XIST expression is under the control of a complex and incompletely understood system that includes an antisense transcript, TSIX (Boumil & Lee, 2001).
In contrast, marsupial XCI is paternal rather than random, is less complete than in eutherians, and is tissue‐specific. Histone deacetylation is involved but methylation of CpG islands in promoter regions is not, suggesting that DNA methylation is a recently evolved repression mechanism (Gartler et al., 1985; Piper et al., 1993; Wakefield et al., 1997). The absence of the highly stable DNA methylation component of the inactivation system in marsupials exposes other layers of the mechanism, providing a unique model system for the study of the chromatin components of X inactivation.
Cloning of the marsupial XIST and X‐inactivation centre will provide valuable sequence comparisons for the identification of important control elements involved in random inactivation and X chromosome choice in eutherians, versus paternal inactivation in marsupials. Analysis of the XIST RNA transcribed region will also provide a method of identifying conserved domains that interact with downstream components of the inactivation pathway.
Genomic imprinting is defined as the specific expression of small clusters of autosomal genes dependent upon the parent from which they were derived. It is similar to X inactivation in that it seems to involve chromatin remodelling, antisense RNAs and DNA methylation. Initial work has established that IGF2, which is imprinted in human and mouse, is also imprinted in marsupials, but not in chicken or monotremes (O'Neill et al., 2000; Killian et al., 2001). As with the X‐inactivation centre, comparative sequencing of the imprinted regions provides an ideal opportunity to identify conserved regulatory elements in the genomic regions around the imprinted clusters. Because the maternal investment in the marsupial embryo before birth is minimal, marsupials will be crucial for testing the exciting ‘parental tug‐of‐war’ theory. This theory proposes that the male and female genomes are in competition for the resources of the mother, with paternal genes trying to maximize maternal resource utilization and the maternal genes attempting to restrict this drain to improve maternal survival (Moore & Haig, 1991). Because maternal resources are delivered to marsupials over a longer period and mainly through lactation rather than development in utero, significant differences in the imprinting pattern of genes in marsupials would be predicted by this theory.
Marsupial molecular genetics is a powerful tool for dissecting the evolutionary relationships and functions of mammalian genes, producing results that have not been predicted by other evolutionary comparisons.
For example, the globin genes, which encode molecules responsible for oxygen transportation, are a central development in vertebrate evolution. Modern vertebrates have multiple developmentally regulated globin genes that are derived from the duplication of an ancestral gene. Eutherian mammals have three ϵ‐globin class genes (ϵ, γ and η) and two β‐globin class genes (β and δ) in a single cluster, and a group of α‐like globins in a separate locus. Genes with sequence similarity to both the ϵ and β classes are also present in birds, and marsupials have a globin cluster that includes a single ϵ‐globin and a single β‐globin gene. As in eutherian mammals, the ϵ‐globin of marsupials is used in neonatal development and the β form in adults.
From physiological and biochemical studies, it was determined that marsupials possess a third globin gene, ω‐globin. Cloning of the gene encoding ω‐globin revealed it to be an atypical β‐globin‐like gene, closer in sequence to the avian β‐globin than to eutherian globins. Surprisingly, this gene was also found to be separated from the ϵβ‐globin gene cluster and to map with the α‐globin cluster. The sequence similarity and mapping of this gene suggest a model of globin gene evolution differing significantly from that suggested by the avian and eutherian data alone. The evolution of the globins is most parsimoniously explained by two globin genes in the vertebrate ancestor, one of which gave rise to the marsupial ϵ‐ and β‐globins and all the eutherian globins, and a second gene that gave rise to all the avian globins and the marsupial ω‐globin (Wheeler et al., 2001). Significantly, such a model would mean that the avian and eutherian β‐globin clusters are not orthologous but are independently derived paralogues. Other complex multigene families might have similar evolutionary histories, making marsupials a valuable tool for dissecting such complex relationships between genes.
Marsupials hold surprises even with regard to fundamental processes such as the control of genetic recombination. In most mammals, the male has a lower recombination rate than the female, so that genetic map distances between markers are larger in females than in males. However, in marsupials the female exhibits lower recombination rates for large sections of the genome (Bennett et al., 1986; van Oorschot et al., 1992; Zenger et al., 2002). This difference in recombination rates is not consistent with the prevailing theory that sex‐related differences in recombination rates are due to heterogamety, with the heterogametic sex showing the lower rate (Haldane, 1922). Marsupials provide a novel system by which to dissect the factors that affect recombination, and such investigations would be made possible by genome‐scale data.
The kangaroo genome project
An international project to achieve full draft‐quality sequencing of the tammar wallaby genome within 5 years is currently being initiated. As the kangaroo genome is unlikely to receive the concentration of resources required to make shotgun sequencing viable, the community is promoting a bacterial artificial chromosome (BAC) tiling approach to the genome. This will allow projects focusing on specific regions of the genome to be readily undertaken as appropriate resources become available, and will provide researchers outside the genome sequencing community with access to clones for their region of interest.
The small number of large, readily identifiable chromosomes in the tammar karyotype has facilitated the development of a skeleton cytogenetic map, which currently has a higher coverage of the sex chromosomes. A linkage map with complete genome coverage has recently been developed by matings between the Kangaroo Island and Garden Island subspecies of tammar wallaby at Macquarie University (Zenger et al., 2002). A project to integrate the linkage map with the cytogenetic map is currently in development.
Kangaroo–rodent cell hybrids represent another valuable mapping resource. Most rodent–marsupial cell hybrids contain only fragments of the marsupial genome and can be used to order genes in the same way as for radiation hybrids (in fact, their analysis pre‐dated radiation hybrid mapping) (Dobrovic & Graves, 1986; Donald & Hope, 1981). These hybrids are being characterized by painting their DNA onto normal tammar wallaby chromosomes to identify the regions of the tammar genome that they contain. The bank of hybrids will allow rapid mapping of expressed sequence tag (EST) and BAC end sequence markers onto the existing linkage map, and integration of the physical, cytogenetic and genetic maps. These mapping resources will greatly enhance our ability to generate, interpret, annotate and use kangaroo genome data.
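The logic of ordering genes with a panel of fragment‐containing hybrids can be sketched as follows: markers that lie close together tend to be retained or lost together across hybrid lines, so their retention patterns agree more often than those of distant markers. The example below is a simplified illustration under stated assumptions. The marker names and retention data are invented, and a plain concordance fraction stands in for the statistical scoring that a real mapping analysis would use.

```python
from itertools import combinations


def concordance(pattern_a, pattern_b):
    """Fraction of hybrid lines in which two markers are both present or both absent."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("retention patterns must cover the same hybrid lines")
    agree = sum(1 for a, b in zip(pattern_a, pattern_b) if a == b)
    return agree / len(pattern_a)


def pairwise_concordance(panel):
    """Compute concordance for every pair of markers in a hybrid panel.

    panel maps a marker name to a list of 1/0 values (retained/absent),
    one value per hybrid cell line, in the same line order for all markers.
    """
    return {
        (m1, m2): concordance(panel[m1], panel[m2])
        for m1, m2 in combinations(sorted(panel), 2)
    }


# Hypothetical panel of six hybrid lines (data invented for illustration).
toy_panel = {
    "marker_A": [1, 1, 0, 0, 1, 0],
    "marker_B": [1, 1, 0, 0, 0, 0],
    "marker_C": [0, 1, 1, 0, 0, 1],
}

results = pairwise_concordance(toy_panel)
closest_pair = max(results, key=results.get)
print(results)
print(closest_pair)
```

In this toy panel the pair with the highest concordance would be taken as the most closely linked. A real analysis would also need to allow for differences in retention frequency between hybrids and for chance co‐retention.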
An important component of the analysis of the marsupial genome will be large‐scale comparative expression analysis. Expression data will allow the identification of genes for which temporal and spatial expression patterns have been conserved, giving a powerful insight into function. The current focus of the community has been on the development of EST and microarray resources for reproductive and developmental studies, but these facilities will quickly find broader applications (K. Nicholas, personal communication).
In addition, the small marsupial Y chromosome will facilitate identification of the minimal set of genes required for male development and function (Toder et al., 2000), and sequence conservation of these genes across all mammals will highlight those that are particularly important. Furthermore, both the X‐inactivation control region and imprinted domains contain complex and enigmatic controls of gene expression and involve unknown interactions of functional RNA molecules and chromatin. A comparative analysis of such complex regions will highlight conserved features for direct experimental investigation.
These pilot projects will be useful, not only in their own right, but also in providing an opportunity to fine‐tune bioinformatic approaches to gain maximal benefit from the kangaroo data being generated.
The kangaroo genome is a treasure trove of comparative genomics data. The analysis of individual genes, and of gene arrangement, has already contributed significantly to our understanding of human biology and genetics. Analysis on a genome‐wide scale will provide sequences for identifying conserved genes, functional domains and regulatory elements. The unique value of marsupial sequences for such comparisons will motivate the complete sequencing of the kangaroo genome.
Copyright © 2003 European Molecular Biology Organization