Functional divergence of gene duplicates through ectopic recombination

Joaquin F Christiaens, Sebastiaan E Van Mulders, Jorge Duitama, Chris A Brown, Maarten G Ghequire, Luc De Meester, Jan Michiels, Tom Wenseleers, Karin Voordeckers, Kevin J Verstrepen

Author Affiliations

  1. Joaquin F Christiaens 1,2,†
  2. Sebastiaan E Van Mulders 3,†
  3. Jorge Duitama 1,2,†
  4. Chris A Brown 1,2,4,5
  5. Maarten G Ghequire 1
  6. Luc De Meester 6
  7. Jan Michiels 1
  8. Tom Wenseleers 6
  9. Karin Voordeckers 1,2
  10. Kevin J Verstrepen *,1,2

  1 Department of Microbial and Molecular Systems, Centre of Microbial and Plant Genetics (CMPG), KU Leuven, Kasteelpark Arenberg 22, B‐3001, Leuven (Heverlee), Belgium
  2 VIB Laboratory of Systems Biology, KU Leuven, Kasteelpark Arenberg 22, B‐3001, Leuven (Heverlee), Belgium
  3 Department of Microbial and Molecular Systems, Centre for Malting and Brewing Science, Faculty of Bioscience Engineering, KU Leuven, Kasteelpark Arenberg 22, B‐3001, Leuven (Heverlee), Belgium
  4 Faculty of Arts and Sciences Center for Systems Biology, Harvard University, Cambridge, Massachusetts, 02138, USA
  5 Fathom Information Design, Boston, Massachusetts, 02114, USA
  6 Department of Biology, Animal Ecology and Systematics Section, B‐3000, Leuven, Belgium

  * Corresponding author. Tel: +32 (0) 16 75 13 90; Fax: +32 (0) 16 75 13 91; E-mail: kevin.verstrepen{at}
  † These authors contributed equally to this paper.


Gene duplication stimulates evolutionary innovation as the resulting paralogs acquire mutations that lead to sub‐ or neofunctionalization. A comprehensive in silico analysis of paralogs in Saccharomyces cerevisiae reveals that duplicates of cell‐surface and subtelomeric genes also undergo ectopic recombination, which leads to new chimaeric alleles. Mimicking such intergenic recombination events in the FLO (flocculation) family of cell‐surface genes shows that chimaeric FLO alleles confer different adhesion phenotypes than the parental genes. Our results indicate that intergenic recombination between paralogs can generate a large set of new alleles, thereby providing the raw material for evolutionary adaptation and innovation.


How organisms evolve and adapt to new environments remains a central question in biology. Gene duplication events have a crucial role in evolutionary processes, especially in the rapid development of new functions [[1], [2]]. Duplication might yield adaptive benefits through increased dosage of the parental gene. In addition, gene duplication can also stimulate evolutionary innovation as mutations in one or both gene duplicates can lead to subfunctionalization or neofunctionalization [[3], [4], [5], [6], [7], [8]].

Gene duplications are not spread evenly over the genome, occurring much more frequently in the subtelomeres, the regions directly adjacent to the telomeres [[9], [10], [11]]. As a result, subtelomeric gene families are often large, with some families carrying as many as several hundred paralogs [[9], [11]]. In Trypanosoma and Plasmodium species, variable expression of subtelomeric variants of a cell‐surface antigen allows these pathogens to elude the host immune system. Subtelomeric gene families in the more compact genome of Saccharomyces cerevisiae are generally smaller and are enriched for cell‐surface genes, as well as genes involved in nutrient transport and metabolism [[9], [12], [13], [14], [15], [16], [17], [18]].

Recombination between paralogs could generate more sequence diversity in duplicated genes [[19]]. However, apart from anecdotal examples in specific gene families such as the VAR and VSG cell‐surface genes of Plasmodia and Trypanosomes and the major histocompatibility complex (MHC) class genes in vertebrates, the occurrence and biological relevance of ectopic recombination between gene duplicates have not been systematically investigated [[16], [17], [20]].

Here, we present the results of a comprehensive in silico analysis of ectopic recombination in all paralog gene families in the model eukaryote S. cerevisiae. Our results show that intergenic recombinations occur predominantly in gene families that are located at the subtelomeres and/or encode cell‐surface genes. To verify whether these intergenic recombination events could lead to altered phenotypes, we mimicked the intergenic recombination events that shaped the FLO (flocculation) adhesin gene family. Phenotypic analyses of these artificial FLO chimaera revealed that they were functional and conferred phenotypes that differed from their parental adhesins.


Ectopic recombination in gene families

To investigate the occurrence of intergenic recombination events in S. cerevisiae, we first identified paralogous genes by BLASTing the reference strain S288c's proteome against itself. Next, we used the MCL clustering algorithm and manual curation to define a list of 210 gene families, each comprising at least two paralogs (see methods and supplementary information online for details) [[21]]. For all genes in these families we collected the known GO categories as well as chromosomal locations. We attributed these characteristics to gene families if at least two genes in a family shared the annotation (the rationale being that it takes at least two genes to create a chimaeric allele).
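The attribution rule described above (a family inherits an annotation only if at least two of its genes share it) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `family_annotations`, the gene names and the labels are all hypothetical.

```python
from collections import Counter

def family_annotations(gene_annotations, min_genes=2):
    """Return the annotations shared by at least `min_genes` genes in a family.

    gene_annotations: dict mapping gene name -> set of annotation labels
    (for example GO categories or a 'subtelomeric' location flag).
    """
    counts = Counter()
    for labels in gene_annotations.values():
        counts.update(set(labels))  # count each label at most once per gene
    return {label for label, n in counts.items() if n >= min_genes}

# Hypothetical toy family; the labels are illustrative, not taken from the study.
toy_family = {
    "geneA": {"cell surface", "subtelomeric"},
    "geneB": {"cell surface"},
    "geneC": {"subtelomeric", "transport"},
}
print(sorted(family_annotations(toy_family)))
```

In this toy family, "transport" is carried by a single gene and is therefore not attributed to the family, in line with the rationale that it takes at least two genes to create a chimaeric allele.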

Next, we BLAST‐searched each family against a database containing 24 recently published high‐quality S. cerevisiae genomes (supplementary Table S1 online). For each family, we performed several in silico tests to check for evidence of intergenic recombination in the family. First, we used the SplitsTree4 programme [[22]] to produce reticulate phylogenetic trees of each family. In the absence of recombination, this procedure results in a classic, unrooted phylogenetic tree. Alternatively, when a recombination event took place, this is represented in the reticulate tree as a closed rectangle. To further analyse these families and provide statistically significant proof for the occurrence of intergenic recombination, we used the Pairwise Homoplasy Index (PHI) test [[23]]. The null hypothesis for this test is that the observed sequence differences are due to convergent mutations, which implies that all PHI values are similar for all pairs (i.e., the PHI values do not vary with the physical distance between residues). Alternatively, in the presence of recombination, distant sites show low PHI values. These calculations then result in a single PHI value for the paralog gene family. For further details about these procedures, please refer to the supplementary information online.
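To make the idea behind a homoplasy-based recombination test more concrete, the sketch below implements a heavily simplified variant. It is not the SplitsTree4 implementation and not necessarily the exact statistic of Bruen et al: it assumes binary sites only, scores each pair of sites with the four-gamete test, averages the scores of sites lying within a small window, and obtains a P-value by permuting the site order. All function names and the toy alignment are assumptions made for illustration.

```python
import random
from itertools import combinations

def incompatible(site_a, site_b):
    """Four-gamete test for two binary sites: the pair is incompatible when
    all four allele combinations occur, which a single tree cannot explain
    without recombination or recurrent mutation."""
    return len(set(zip(site_a, site_b))) == 4

def phi_statistic(sites, window):
    """Mean incompatibility over pairs of sites at most `window` columns apart."""
    scores = [
        incompatible(sites[i], sites[j])
        for i, j in combinations(range(len(sites)), 2)
        if j - i <= window
    ]
    return sum(scores) / len(scores) if scores else 0.0

def phi_permutation_test(sites, window=2, n_perm=999, seed=0):
    """Shuffle the site order and count how often the permuted statistic is
    at least as small as the observed one (one-sided permutation P-value)."""
    rng = random.Random(seed)
    observed = phi_statistic(sites, window)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_statistic(shuffled, window) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy mosaic alignment of four sequences, stored column by column: the first
# five sites support one grouping, the last five support a conflicting one.
toy_sites = ["0011"] * 5 + ["0101"] * 5
observed, p_value = phi_permutation_test(toy_sites)
print(observed, p_value)
```

In this toy case neighbouring sites are far more compatible than randomly ordered sites would be, which is the signal a test of this kind reports as evidence of recombination.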

The distribution of PHI values indicates the presence of certain highly recombinogenic families (Fig 1). Interestingly, almost all chimaeric sequences are part of families that contain either subtelomeric or cell‐surface genes, or both (Table 1). Statistical analyses of these distributions with a Kolmogorov–Smirnov test revealed a statistically significant enrichment for subtelomeric gene families (P‐value 3.48 × 10⁻⁶) and for cell‐surface gene families (P‐value 5.5 × 10⁻³). The distribution of a combination of all subtelomeric and cell‐surface gene families was also significantly enriched for chimaeric alleles. Moreover, when all cell‐surface gene families are removed from the analysis, we still find a statistically higher occurrence of chimaeric subtelomeric genes and vice versa. We were unable to find other subgroups that are enriched or depleted for ectopic recombination (e.g., the P‐value for the comparison between families containing intragenic tandem repeats and those without is 0.38; supplementary Fig S1 online).
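The comparison of two distributions with a two-sample Kolmogorov–Smirnov test can be sketched in a few lines. The authors ran the test in R; the pure-Python version below is only an illustration, and it uses the asymptotic Kolmogorov series with a commonly used small-sample correction, so its P-values will not match exact small-sample results. The function name and the input values are hypothetical.

```python
import math

def ks_two_sample(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic P-value."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:  # advance past all values equal to v
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))  # gap between the two empirical CDFs
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.2:  # the series does not converge well here; P is essentially 1
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical per-family test values for two groups of gene families.
group_a = [0.01, 0.02, 0.03, 0.04]
group_b = [0.50, 0.60, 0.70, 0.80]
d_stat, p_val = ks_two_sample(group_a, group_b)
print(d_stat, p_val)
```

With completely separated samples, as in this toy input, D equals 1, the largest possible value.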

Figure 1.

Analyses of paralog gene families indicate intergenic recombination. Distribution of the PHI values for all paralog gene families in the S. cerevisiae genome. Most gene families cluster towards the left hand side of the graph, with high PHI values that are indicative of the absence of intergenic recombination. However, several gene families on the far right hand side (Table 1) show inter‐paralog recombination. Note that this group consists almost exclusively of subtelomeric and/or cell‐surface gene families. For more information on the calculation of PHI values, see Bruen et al [[23]] and the methods (supplementary information online). PHI, pairwise homoplasy index.

Table 1. S. cerevisiae gene families showing ectopic recombination

Several recombinations shaped the adhesin gene family

To investigate the functional implications of ectopic recombination events, we focused on the subtelomeric FLO gene family, which encodes lectin‐like cell‐surface adhesins that confer adhesion to abiotic surfaces and/or other yeast cells [[24], [25], [26], [27], [28], [29]]. These two phenotypes are biologically important for yeast cells and are relatively easy to measure and quantify [[24], [29], [30], [31], [32], [33]]. The amino terminus of these rod‐shaped proteins is a lectin‐like globular domain that contains a pentapeptide involved in adhesion to specific carbohydrate residues present at the surface of other yeast cells or host tissues [[34], [35]]. The central adhesin domain is formed by a repetitive pattern of a heavily glycosylated serine/threonine‐rich peptide, which is thought to act as a variable rod‐like spacer that helps to display the N‐terminal domain to the environment [[36]].

To perform an in‐depth analysis of recombination events between FLO paralogs, we gathered 58 more FLO sequences (including a few partial sequences or pseudogenes) from both NCBI and ENA (supplementary Table S2 online). Phylogenetic analysis revealed three subclades that cluster with the FLO1, FLO10 and FLO11 genes found in the reference S288c strain. In line with previous findings, we found extensive variation in tandem repeat length, even among adhesins from the same class [[36], [37]].

We first verified whether the full set of FLO genes shows signs of intergenic recombination across the repeat region. Unrooted phylogenetic trees of the N‐ (Fig 2A) and carboxy‐terminal (Fig 2B) domains revealed that several alleles might originate from recombination across the repeat region (Figs 2A–C). The PHI test further confirmed the occurrence of several intergenic recombination events in the complete FLO open reading frames (P‐value <10⁻¹⁶). Whereas this shows that ectopic recombination between intragenic tandem repeats does occur, analyses using all S. cerevisiae families do not show a significant enrichment of chimaera among genes that contain internal tandem repeats (see above).
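The logic of comparing N‐terminal and C‐terminal trees can be illustrated with a much cruder proxy: a sequence whose closest relative differs between the two domains is a candidate chimaera. The sketch below uses Hamming distances on pre-aligned, equal-length segments rather than the phylogenetic trees and reticulate networks used in the study, and all names and sequences are invented for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two aligned, equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def nearest(name, segments):
    """Closest other sequence by Hamming distance; ties are broken by name."""
    others = [n for n in segments if n != name]
    return min(others, key=lambda n: (hamming(segments[name], segments[n]), n))

def discordant_sequences(n_term, c_term):
    """Sequences whose nearest neighbour differs between the two domains."""
    return [name for name in n_term
            if nearest(name, n_term) != nearest(name, c_term)]

# Invented toy segments: X carries an A-like N-terminus and a B-like C-terminus.
n_term = {"A1": "AAAAAAAA", "A2": "AAAAAAAT",
          "B1": "CCCCCCCC", "B2": "CCCCCCCG", "X": "AAAAAATT"}
c_term = {"A1": "GGGGGGGG", "A2": "GGGGGGGA",
          "B1": "TTTTTTTT", "B2": "TTTTTTTC", "X": "TTTTTTCC"}
print(discordant_sequences(n_term, c_term))
```

Only X changes its closest relative between the two domains here, which mirrors how a recombinant allele appears in different places in the N‐terminal and C‐terminal trees.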

Figure 2.

Phylogenetic analyses of FLO genes reveal extensive intergenic recombination. A–C show the principle of reticulate analysis. Shown in blue are sequences used as representative haplotypes in supplementary Fig S2 online. (A) Phylogenetic tree of nucleotide sequences coding for N‐terminal domains of all complete FLO1‐like sequences. (B) Phylogenetic tree of nucleotide sequences coding for C‐terminal domains of the same subset of FLO1‐like sequences. (C) Trees of (A) and (B) are combined in an unrooted reticulate network (i.e., the overlay of trees shown in A and B). Such a phylogenetic reticulate allows recombination events between either current or ancestral sequences to be represented and provides a more precise visualization of the evolutionary history of the FLO genes. Sequence (27) appears in different places in (A) and (B), thus placing it at an extension of the corner of a closed square in the reticulate tree. Such closed squares are generated by a predicted recombination event. (D) Reticulate displaying recombination events in the N‐terminal domain of all FLO1‐like sequences. Note that all these analyses were performed using all available FLO sequences but, for the sake of clarity, only a subset of sequences was used to generate these figures. FLO, flocculation.

As repeat regions can cause artifacts in alignments, we repeated the analyses to specifically search for recombination events in the N‐terminal domains of the FLO genes. These domains contain several sites important for the recognition and binding of specific carbohydrates [[34], [35]], which is important for pathogenicity in Candida strains [[38], [39]], and for FLO characteristics in brewing strains [[28], [40]]. The altered carbohydrate‐binding properties due to intergenic recombination could therefore have significant phenotypic consequences. These analyses revealed extensive recombination between the N‐terminal domains (P‐value <10⁻¹⁶), especially in the FLO1‐like N‐terminal domains. The reticulate with the N‐terminal domains of the FLO1‐like subgroup (Fig 2D) confirms several recombination events between different N‐terminal FLO1 gene domains (P‐value <10⁻¹⁶).

The high frequency of recombination and lack of ancestral sequences or non‐recombined outgroups prevent identification of the ancestral FLO sequences that recombined into the present‐day alleles. Nevertheless, whereas it is impossible to discriminate between recombined and ancestral FLO genes, it is possible to identify groups of genes comprising the parents and the product of a recombination event, and to examine the recombination breakpoints (supplementary Fig S2 online). These analyses revealed several chimaeric adhesin sequences, including the previously described LgFLO1 gene. Further, two types of recombination events appear to occur between adhesins, both with the potential to have strong phenotypic effects. In the first type, recombination events occur outside the repeat regions, across small regions of microhomology in the N‐ or C‐terminal domain (supplementary Fig S3 online). Detailed analysis of these genes revealed that many recombination events in the N‐terminal domain occurred in between regions important for substrate binding [[35]]. Such events could subtly alter the strength and preference of substrate binding in the N‐terminal domain and therefore influence FLO. In the second type, recombination occurred across the central repeat domain. These recombination events lead to variation in the length and sequence of the repeats, which in turn can also result in new combinations of functional FLO domains.

Chimaeric adhesins also occur in pathogens

Previous studies in yeasts such as Candida albicans and Candida glabrata have speculated that the rapid phenotypic variation produced by chimaeric adhesins could contribute to pathogenicity in these species by avoiding the host immune response [[39], [41]]. To investigate this, we performed a similar in silico analysis on adhesins of pathogenic yeasts by collecting sequences both for the EPA (epithelial adhesins) genes of C. glabrata (11 sequences) and the ALS (agglutinin‐like sequences; 14 sequences) genes of C. albicans. Both cases revealed significant evidence for intergenic recombination (P‐value <10⁻¹⁶), suggesting that recombination between adhesins is a common occurrence across yeast strains.

Engineered chimaeric adhesins confer distinct phenotypes

To assess the functional implications of recombination events between paralogs, we mimicked recombination by constructing several chimaeric FLO genes, each combining the 5′ end of one FLO gene with the 3′ end of another (supplementary Table S3 online). We used three genes (FLO1, FLO10 and FLO11) from S288c that represent FLO gene classes with distinct FLO characteristics as parents for the chimaeric genes. The protein products of the engineered chimaera retain the traditional three‐domain structure of fungal adhesins while differing in N‐terminal domains, total length, number and positions of tandem repeats and glycosylation sites.
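The construction principle (the 5′ end of one parent joined to the 3′ end of another) can be expressed as a simple string operation. The sketch below is purely illustrative: the sequences are placeholder strings rather than real FLO sequences, the breakpoints are arbitrary, the construct names are hypothetical, and enumerating all ordered parent pairs is an assumption; the constructs actually built are those listed in supplementary Table S3.

```python
from itertools import permutations

def make_chimaera(five_prime_parent, three_prime_parent, breakpoint_5, breakpoint_3):
    """Join the 5' part of one gene to the 3' part of another.

    breakpoint_5: number of bases kept from the start of the 5' parent.
    breakpoint_3: index in the 3' parent from which bases are kept.
    """
    return five_prime_parent[:breakpoint_5] + three_prime_parent[breakpoint_3:]

# Placeholder strings, NOT real FLO sequences; the breakpoints are arbitrary.
parents = {
    "FLO1": "AAAAAAGGGGGG",
    "FLO10": "CCCCCCTTTTTT",
    "FLO11": "ACACACGTGTGT",
}

# Every ordered pair of distinct parents gives one candidate chimaera.
chimaeras = {
    f"{five}N-{three}C": make_chimaera(parents[five], parents[three], 6, 6)
    for five, three in permutations(parents, 2)
}
for name, sequence in chimaeras.items():
    print(name, sequence)
```

Three parents give six ordered combinations, each pairing the N‐terminal part of one gene with the central and C‐terminal part of another.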

To assess the functionality of these engineered chimaera, we expressed each construct separately in the non‐flocculent and non‐adherent S288c strain. Expression levels (determined by quantitative PCR) were comparable for each of the constructs, with an average expression level around 75% of the ACT1 transcription level, which is comparable to the natural FLO gene expression in flocculent strains derived from the feral yeast EM93 [[33]].

We then determined flocculation strength, adhesion to diverse surfaces and cell wall hydrophobicity conferred by each of the chimaeric Flo proteins. Our results demonstrate that different domains contribute to different phenotypes and that the chimaeric adhesins confer phenotypes that differ from their parental adhesins (Fig 3; Table 2; supplementary Table S4 online for details). This demonstrates that recombination events between paralogs can generate new alleles that can display new combinations of phenotypes and/or variation in the degree of adhesion.

Figure 3.

Chimaeric adhesins confer new cell–cell adhesion phenotypes. To investigate the FLO (cell–cell adhesion) and agar‐adhesion properties of chimaeric adhesins, three natural adhesin genes (FLO1, FLO10 and FLO11) as well as several chimaeric adhesins consisting of the N‐terminal part of one natural adhesin and the C‐terminal part of another were overexpressed in a strain that does not express any other adhesin gene (see Table 2; supplementary information online for details). (A) Ectopic recombination between FLO genes generates new chimaeric alleles that display a wide array of cell–cell adhesion (FLO) phenotypes. Strains expressing the natural FLO1 adhesin or any chimaeric protein carrying the N‐terminal part of FLO1 all show strong FLO. Strains displaying adhesins with a FLO10 N‐terminal domain show a broad range of FLO that depends on the nature of the central‐ and C‐terminal domain, whereas adhesins with a FLO11 N‐terminal domain confer weak or no FLO (for more information about the chimaeric adhesins see the supplementary Table S3 online). (B) The cell‐surface adhesion of natural and chimaeric adhesin genes was measured by expressing these genes in cells that do not express any other adhesin gene. The resulting transformants were grown for 6 days and subsequently washed under a gentle stream of water to estimate their propensity to stick to the agar surface. Strains displaying adhesins containing the central and C‐terminal part of FLO11 show strong adhesion and therefore resist washing with water (see methods for details), whereas expression of other adhesins results in a wide array of intermediate or weak cell‐surface adhesion as measured with the plate‐washing assay. FLO, flocculation; WT, wild‐type.

Table 2. Overview of adhesin phenotypes


Our results reveal an elegant mechanism by which gene duplicates might be used as a molecular toolbox for the generation of a large array of different new alleles and phenotypes. These findings also demonstrate that paralogs do not necessarily evolve completely independently through mutation. Instead, ectopic recombination events also contribute to the generation of sequence divergence in paralogs, which in turn propels evolutionary innovation [[1], [2], [3], [4], [5], [6], [7], [8]].

Intergenic recombination events are observed predominantly in subtelomeric gene families, which are known to harbour lifestyle‐specific genes, or in cell‐surface gene families, which are involved in interactions with the cell's environment. Hence, this mechanism could provide yeasts with an ever‐changing reservoir of genes to quickly adapt and tune the way they interact with specific conditions and opportunities. Our results also confirm speculation that C. albicans and C. glabrata might have chimaeric adhesins [[39], [42]], suggesting that this allows pathogens to adapt to the host. This mechanism is also similar to those of some pathogenic protozoans, such as Plasmodium and Trypanosoma spp., where modular cell coat proteins continuously recombine to form new variants and avoid host immune system recognition [[16], [17], [43]]. Hence, recombination of cell‐surface and subtelomeric genes seems to be a common theme in (eukaryotic) microorganisms, yielding a substantial source of phenotypic variability with only a few genes. Moreover, as recombinations are clearly not limited to yeasts but have also been observed in the MHC class genes of vertebrates, it seems likely that similar mechanisms also exist in higher eukaryotes, including humans [[20]].

Our results show that in yeast, two distinct categories of gene families show enrichment for chimaeric alleles: families containing genes located near subtelomeres, and gene families encoding cell‐surface proteins. Importantly, this enrichment remains significant for both categories, even if all families belonging to the other category are removed from the analysis, indicating that the observed enrichment is independent. One main unresolved question is whether intergenic recombination occurs more frequently between subtelomeric and cell‐surface genes, or whether all paralog families show similar rates of recombination but show differences in the adaptive advantage of such recombination events. It is, for example, possible that ectopic recombination between non‐subtelomeric and non‐cell‐surface genes often leads to dysfunctional chimaera. Although it seems likely that both mechanisms might contribute to the enrichment of chimaeric alleles of cell‐surface and subtelomeric genes, further research is needed to investigate the underlying mechanism(s).


Paralog gene families were defined on the basis of relative pairwise BLAST scores between all known S288c proteins using the MCL algorithm (supplementary Dataset S1 online) [[21]]. For each family, the S288c sequences it comprises were BLAST‐searched against a database containing 24 high‐quality Saccharomyces genomes (supplementary Table S1 online). Resulting sequences were retained if they were no more than 200 bp shorter than the shortest member of the family and were at least 600 bp long. More (partial) coding sequences for FLO genes in the Saccharomyces sensu stricto group were gathered from the publicly available NCBI nucleotide database and the ENA database of EBI (supplementary Table S2 online). Sequences in a family were aligned using the ClustalW server provided by EBI [[44]]. For the FLO genes, N‐ and C‐terminal sequence fragments were aligned separately. The alignments were analysed using SplitsTree4 [[22]]. We chose to ignore gaps and used the neighbour‐joining algorithm to produce the trees [[45]]. The reticulate was generated using the NeighborNet algorithm [[46]]. Tests for significant evidence of recombination were performed using the PHI test provided by SplitsTree4 [[46]]. To check for statistically significant differences between the distributions of the PHI values, we performed a Kolmogorov–Smirnov test in R.

All yeast strains used in this study are listed in supplementary Table S5 online with primers used for construction, verification and quantitative PCR listed in supplementary Table S6 online. Culturing conditions for different experiments are described in supplementary information online. Strains and primers used for constructing reshuffled FLO genes are listed in supplementary Table S6 online.

FLO tests were performed using the method of D'Hautcourt and Smart [[47]] with some modifications. Adhesive and invasive growth on agar were assessed by the plate‐washing assay [[48]]. Adhesion to plastics was tested in 96‐well flat‐bottom plates (Microtest™ Flat bottom 96‐well plate, Becton Dickinson Labware) according to Reynolds and Fink [[29]] with minor modifications. Hydrophobicity of cells expressing the different (chimaeric) adhesins was measured by an adaptation of the method of Rosenberg [[49]].
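The length criteria used to retain BLAST hits can be written as a small filter. This sketch assumes the 200 bp criterion means that a hit may be at most 200 bp shorter than the shortest reference member of its family; the function name and parameter names are hypothetical.

```python
def keep_hit(hit_length, family_lengths, max_shortfall=200, min_length=600):
    """Decide whether a BLAST hit is retained for a gene family.

    hit_length: length of the retrieved sequence in bp.
    family_lengths: lengths (bp) of the reference family members.
    Assumption: a hit may be at most `max_shortfall` bp shorter than the
    shortest family member and must be at least `min_length` bp long.
    """
    shortest = min(family_lengths)
    return hit_length >= min_length and hit_length >= shortest - max_shortfall

# Toy lengths in bp, chosen only to exercise both criteria.
print(keep_hit(750, [900, 1200]))  # long enough on both counts
print(keep_hit(650, [900, 1200]))  # more than 200 bp shorter than 900
print(keep_hit(550, [700]))        # below the 600 bp minimum
```

Both conditions must hold, so a 650 bp hit is rejected for a family whose shortest member is 900 bp even though it exceeds the 600 bp minimum.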

Supplementary information is available at EMBO reports online.

Conflict of Interest

The authors declare that they have no conflict of interest.

Supplementary Information

Supplementary Information [embor2012157-sup-0001.pdf]

Supplementary Dataset S1 [embor2012157-sup-0002.xls]


The authors thank Verstrepen Lab and Centre of Microbial and Plant Genetics members for valuable advice and feedback. This research was supported by ERC Young Investigator grant 241426, KU Leuven IDO program, the Belgian Federal Science Policy Office and European Space Agency PRODEX program and the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT‐Vlaanderen, Belgium). KJV acknowledges support from ERC Stg 241426, VIB, KU Leuven, FWO Vlaanderen, the Odysseus program, EMBO YIP program and the AB InBev Baillet‐Latour foundation.

Author contributions: J.F.C. and S.E.V.M. designed, performed and analysed (in silico) experiments and wrote the manuscript. J.D. and C.A.B. performed in silico analyses and wrote the manuscript. M.G.G. performed experiments. L.D.M., J.M., T.W., K.V. helped with design and interpretation of results and wrote the manuscript. K.J.V. led the study, designed and interpreted experiments and wrote the manuscript.


This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.