Fermentation innovation through complex hybridization of wild and domesticated yeasts

The most common fermented beverage, lager beer, is produced by interspecies hybrids of the brewing yeast Saccharomyces cerevisiae and its wild relative S. eubayanus. Lager-brewing yeasts are not the only example of hybrid vigour or heterosis in yeasts, but the full breadth of interspecies hybrids associated with human fermentations has received less attention. Here we present a comprehensive genomic analysis of 122 Saccharomyces hybrids and introgressed strains. These strains arose from hybridization events between two to four species. Hybrids with S. cerevisiae contributions originated from three lineages of domesticated S. cerevisiae, including the major wine-making lineage and two distinct brewing lineages. In contrast, the undomesticated parents of these interspecies hybrids were all from wild Holarctic or European lineages. Most hybrids have inherited a mitochondrial genome from a parent other than S. cerevisiae, which recent functional studies suggest could confer adaptation to colder temperatures. A subset of hybrids associated with crisp flavour profiles, including both lineages of lager-brewing yeasts, have inherited inactivated S. cerevisiae alleles of critical phenolic off-flavour genes and/or lost functional copies from the wild parent through multiple genetic mechanisms. These complex hybrids shed light on the convergent and divergent evolutionary trajectories of interspecies hybrids and their impact on innovation in lager brewing and other diverse fermentation industries. Genomic analysis of Saccharomyces hybrids shows complex hybridization in strains used in beer fermentation and genetic changes associated with adaptation to cold temperature and the crisp flavour of lager beer.

H umans have been producing and consuming fermented beverages for thousands of years 1 . During this process, they have unwittingly shaped the evolutionary history of the microbes that are responsible for fermented products. The star of fermented beverage production is often Saccharomyces cerevisiae. Many studies have investigated the evolutionary impact of domestication in fermentation environments on the genomes of different lineages of this species [2][3][4][5][6][7][8][9][10][11][12][13] . These human-associated fermentation environments have also led to innovation through the hybridization of distantly related species.
Lager beers are made with hybrids between the distantly related species S. cerevisiae and S. eubayanus [14][15][16] . These hybrids combine unique properties from each: S. cerevisiae's carbon use and fermentation capabilities combined with S. eubayanus's cryotolerance to produce yeasts that could ferment well in the cold [17][18][19][20][21][22] . Other interspecies hybrids of Saccharomyces have been associated, both favourably and unfavourably, with diverse fermentations. S. cerevisiae × S. kudriavzevii hybrids are prized for their unique flavour profiles in beer and wine 23 . Conversely, hybrids and introgressed strains with large genomic contributions from S. eubayanus and S. uvarum, are viewed as contaminants in breweries due to the production of offflavours, while other strains have been associated with sparkling wine and cider fermentation 16,[24][25][26] . Although these previous studies have hinted at the complexity of fermentation hybrids, their focus on a handful of strains or a handful of loci has only given us a fleeting glimpse of the diversity of Saccharomyces hybrids, their total genomic compositions and their evolution.
Here we identified, sequenced and analysed the genomes of 122 interspecies hybrids and introgressed strains in the genus Saccharomyces to understand their origins and evolutionary innovations. This collection contains pairwise hybrids, as well as more complex hybrids and introgressed strains with three or four parent species. We show that all genomic contributions from S. cerevisiae have arisen out of three domesticated lineages of S. cerevisiae, while all other parents belonged to Holarctic or European wild lineages of their respective species. We also analysed inheritance of the mitochondrial genome (mtDNA) and the genetic events generating functional diversity in genes relevant to fermented beverages. The genomic complexity of these hybrids provides insight into their origins and evolutionary successes in human-associated fermentation environments.

Summary of interspecies hybrid types.
Here, we analysed the genome sequences of 122 interspecies hybrids and introgressed strains of Saccharomyces, 63 strains of which are newly sequenced here, more than doubling the number of previously published hybrid genomes (Supplementary Dataset 1). Collectively, industrial settings dominated the isolation origins of all hybrids; 86% (n = 105) were from beer, wine, cider, a distillery or other beverages ( . The lager-like hybrids were almost exclusively associated with beer ( Fig. 1b) and have genomic contributions that were consistent with previous observations in the two lineages, Saaz and Frohberg 27 . The S. cerevisiae × S. kudriavzevii strains were associated with beer and wine (Fig. 1b). They had considerable differences in S. kudriavzevii genomic content, suggesting that these hybrids are of variable ages and evolutionary histories. The S. eubayanus × S. uvarum hybrids and introgressed strains were the most variable, both in isolation environment and genomic contributions ( Fig. 1 and Supplementary Dataset 1). The wide range in genomic contributions in these strains was probably influenced by their ability to backcross due to the low but non-zero, spore viability of hybrids of these sister species 16 . These S. eubayanus × S. uvarum strains had the most total translocations (χ 2 = 1,250.1, adjusted P = 2.64 × 10 -15 ), as well as the most translocations shared with other hybrid types (χ 2 = 15.964, adjusted P = 0.0138) ( Supplementary Fig. 2). The shared nature of some of these translocations in hybrids with more than two parents suggests that S. eubayanus × S. uvarum introgressed strains further hybridized to produce some of the complex three or four parent species hybrids. Thus, these four types of hybrids each show unique dynamics in genome evolution and are used for different products that range from several regional niche beverages to the globally dominant beer style, lagers.
Wild parent populations. Three out of four of the species contributing to these hybrids (S. kudriavzevii, S. uvarum and S. eubayanus) have primarily been isolated from wild settings and have global distributions with populations that reflect their geography 28,29 . We used these established populations and phylogenomic and principal component analysis (PCA) approaches to evaluate the origins of these hybrids (Supplementary Text).
S. kudriavzevii has been isolated in Europe and Asia and consists of three described populations: Asia A, Asia B and Europe 23,30,31 . The S. kudriavzevii subgenomes of the hybrids all clustered with the European population as a monophyletic clade (Fig. 2a, Supplementary Fig. 3, Supplementary Datasets 2 and 3 and Supplementary Text). These findings show that these hybrids were drawn from a closely related lineage of the European population of S. kudriavzevii.
In S. eubayanus, analysis of both large and small contributions, showed that these hybrids and introgressed strains clustered with the Holarctic lineage of S. eubayanus (Fig. 2b, Supplementary  Fig. 5, Supplementary Datasets 2 and 5 and Supplementary Text). Our vastly expanded dataset suggests that the Holarctic lineage is the closest known relative of all industrially relevant S. eubayanus hybrids and introgressed strains. The array of hybrids observed here requires that multiple hybridization events occurred between this lineage and other species. We also analysed genetic diversity of the S. eubayanus contributions to industrial hybrids and introgressed strains (Supplementary Text). We found low nucleotide diversity in lager-like hybrids that shows that these widely used interspecies hybrids arose out of a narrow swath of S. eubayanus diversity, while the less frequently used hybrids and introgressed strains retained more nucleotide diversity.
S. uvarum has a parallel population structure to S. eubayanus 26,32 , with the exception of its increased isolation frequency in the Northern Hemisphere and the presence of pure strains isolated from Europe. Here we found that all contributions from S. uvarum arose out of the S. uvarum Holarctic lineage 26 . In contrast to our S. eubayanus findings, the S. uvarum subgenomes of these hybrids and introgressed strains were interspersed with pure wild strains (Fig. 2c and 8 and Supplementary Datasets 2, 7 and 8). These findings suggest that there have been multiple hybridization events and extensive backcrossing with wild lineages of S. uvarum, integrating wild diversity into these hybrids and leading to a diverse set of introgressed strains. Domesticated S. cerevisiae parent lineages. Of the species contributing to domesticated interspecies hybrids, S. cerevisiae has the most extensive datasets, including industrial yeasts 5,[8][9][10][11] . We recapitulated the previously described domesticated S. cerevisiae clades 8,9 and our 81 interspecies hybrids with S. cerevisiae contributions fell into three domesticated lineages: Wine, Ale/Beer1 and Beer2 (Fig. 2d, Supplementary Fig. 9 and Supplementary Datasets 2 and 9).
The S. cerevisiae × S. kudriavzevii hybrids grouped with both Beer2 and Wine. Strains with contributions from three or four parent species fell into both clades (Beer2 and Wine), suggesting that these complex hybrids originated stepwise through iterative hybridization (Supplementary Text).
Interestingly, the only hybrids we detected in the Ale/Beer1 group were the lager-brewing yeasts (Fig. 2d). The S. cerevisiae subgenomes of the Saaz and Frohberg lager-brewing lineages formed distinct clades and, although we identified more Frohberg strains, Frohberg genetic diversity was lower (Supplementary Text). To determine if there was a particular clade of Ale/Beer1 that was the closest known relative to lager-brewing hybrids, we performed a targeted analysis of just the Ale/Beer1 S. cerevisiae strains and lagerbrewing hybrids, (Supplementary Figs. 10 and 11, Supplementary Datasets 2 and 10 and Supplementary Text). Our concatenated phylogenomic analyses did not strongly support any recognized geographical clade of Ale/Beer1 S. cerevisiae strains as the closest outgroup to the lager-brewing yeasts. Our PCA analyses, which make no assumptions about consistent genome-wide signals, suggested several Stout beer, Wheat beer and mosaic strains as sharing the most ancestry with lager-brewing yeasts, rather than any clade affiliated with a geographic style ( Supplementary Fig. 9). Overall,   with 612 phased (for strains with >20,000 heterozygous sites) or unphased haplotypes and 21,222 SNPs from across the genome, rooted with the Taiwanese strain EN14S01 (removed for clarity). Previously identified wild lineages from West Africa, Malaysia, North America, Japan and the Philippines are included in the Wild Misc group 11,74 . The other lineages are named in a similar manner to previous studies on ale-brewing and Mediterranean Oak (MedOak) strains 8,9,71 . Bottom: Principal component projection for PC1 and PC2 (including EN14S01, which groups with Sake/Asian).
our analyses clearly show that lager strains belong to the Ale/Beer1 lineage of S. cerevisiae and suggest affinity with a new set of diverse beer yeasts but they do not support any known extant strain as the sole closest relative.
Collectively, our data and analyses conclusively show that there have been multiple interspecies hybridization events between different domesticated lineages of S. cerevisiae and wild strains from three other Saccharomyces species (Fig. 2d). The sheer number and diversity of hybrids analysed here shows that evolutionary and industrial innovation through hybridization has happened on a scale and with a complexity beyond what previous smaller scale studies have suggested. In these diverse hybrids, the domesticated S. cerevisiae subgenomes were probably preadapted with general industrial fermentation traits, while the wild parent probably contributed one or more traits advantageous in the specific new industrial fermentation niche being explored.
Mitochondrial genome inheritance. The classic example of yeast hybrid vigour comes from the cryotolerance of lager-brewing yeasts. S. eubayanus, S. kudriavzevii and S. uvarum are all known to tolerate much colder temperatures 33,34 and recent functional experiments have shown that the mtDNA plays a pivotal role in the cryotolerance of interspecies hybrids 17,35 . Strikingly, in our comprehensive dataset, most (94%) of the hybrids inherited mtDNA from another species, rather than the S. cerevisiae mtDNA (Fig. 3a).
We tested if the parent that donated the mtDNA was also the parent that contributed the most nuclear gene content. We used a logistic regression to determine if the same parent species contributed both the mtDNA and the most complete set of orthologues.
We found that this trend was generally true (P = 8.0 × 10 -6 ; Akaike information criterion, AIC = 83.75) but there were informative outliers (Fig. 3b). In particular, more than half of the hybrids with S. kudriavzevii nuclear contributions inherited the S. kudriavzevii mtDNA, despite the fact that the S. kudriavzevii nuclear contribution was never in the majority. This discrepancy could be due to a fitness advantage conferred by the S. kudriavzevii mtDNA in colder fermentations or it could be due to a fitness advantage conferred by the S. cerevisiae or other nuclear genomes 36,37 . Indeed, all outliers in our logistic regression analysis were in the direction of inheriting a cryotolerant parent's mtDNA. These findings suggest that the inheritance of a cryotolerant mtDNA allowed these hybrids to thrive in colder environments where pure S. cerevisiae strains struggle, providing evolutionary and genetic innovation that enabled new fermentation techniques, such as lager brewing.
Hundreds of nuclear-encoded proteins localize to the mitochondria 38 . This interaction can be a source of genetic incompatibilities between the nuclear and mtDNAs, several of which have been characterized in Saccharomyces interspecies hybrids [39][40][41] . Therefore, we tested whether mitochondrially localized, nuclear-encoded genes were retained more often than other genes encoded in the nuclear genome matching the mtDNA parent. We found that mitochondrially localized genes were retained in the same ratio as all other orthologues (P = 0.8612, odds ratio = 0.9653) (Supplementary Dataset 11 and Fig. 3c). Although these results suggest that mitochondrial localization is not the main cause of the correlation between nuclear and mtDNA content, some nuance is warranted. First, only a few mitochondrially localized genes have been implicated in mito-nuclear incompatibilities 39 Fig. 1a. b, Analysis of concordance between which mtDNA was inherited and which parent contributed the most complete set of orthologous genes. 'True' includes hybrids that inherited the most nuclear gene content from the same species as the mtDNA. 'False' includes hybrids with mtDNA that did not match the species that contributed the most nuclear gene content. Colours represent the mtDNA parent and shapes represent the largest nuclear genome contributor. The middle of the box plot corresponds to the median, the upper and lower limits are the 75th and 25th percentiles respectively and the whiskers extend to the largest or smallest value no greater than 1.5× the differences between the 75th and 25th percentiles. There was a significant correlation between the mtDNA parent and the largest nuclear genomic contributor (logistic regression P = 3.58 × 10 -8 , AIC = 118.21). Notably, the S. eubayanus × S. uvarum hybrids, which have often undergone many backcrossing events, follow this trend and are both cryotolerant species. c, Linear relationship of the number of 1:1:1:1 orthologues versus the number of nuclear-encoded, mitochondrially localized genes present in the subgenome that matches the mtDNA (linear regression P = 2.0 × 10 -16 , AIC = 1151.5). The inset shows the mean proportion of mitochondrially localized versus all other nuclear genes present in the subgenome that matches the mitochondrial parent (P = 0.8612, odds ratio = 0.9653). Here, the middle of the box plot corresponds to the mean, the upper and lower limits are one standard error from the mean and the whiskers extend to the maximum and minimum values.
not rely on protein localization could also play a role (for example, metabolite exchange between the mitochondria and cytoplasm). Perhaps more importantly, these hybrids have often lost whole chromosomes or regions containing hundreds of genes at a time through chromosome mis-segregation or mitotic recombination events 15 ; this restriction imposed by genetic linkage may prevent fine-scale retention or loss and obscure any signal driven by specific genes. Overall, from this dataset, we conclude that there is a strong correlation between the amount of nuclear and mitochondrial DNA contributed by each parent species but mitochondrially localized genes are not more affected than other genes.
Pan-genome analyses. In analysis of the core genome, we found that retention of 1:1:1:1 orthologues from the contributing parent genomes ranged from as few as 12 genes to almost complete sets of orthologues from all their parents (Supplementary Fig. 12 and Supplementary Dataset 12). On average, hybrids retained 56.2% of orthologues from the parent who contributed the least genomic material.
We performed de novo genome assemblies to analyse the genomic content that was not present in the parent reference genomes ( Supplementary Fig. 13). On average, hybrids had 47.7 kilobases (kb) of novel genomic content; the minimum was 2.2 kb and the maximum was 363.3 kb. In addition to novel content that may come from the pan-genomes of other the Saccharomyces species, we detected previously characterized content from prior S. cerevisiae pan-genome analyses, including horizontally transferred genes (Supplementary Text) 5,12,42 . When we searched this material for Saccharomyces-like genes for which we could assign a function, we found an enrichment in genes associated with sugar transport, including the Gene Ontology 43,44 terms: transporter activity (corrected P = 4.67 × 10 -8 ), sugar:proton symporter activity (corrected P = 6.04 × 10 -8 ), cation:sugar symporter activity (corrected P = 6.04 × 10 -8 ) and sugar transmembrane transporter activity (corrected P = 6.04 × 10 -8 ) (Supplementary Dataset 7). The enrichment of sugar transport genes in the novel content of these hybrids and introgressed strains is consistent with strong selection for these activities in industrial fermentation environments.   Fig. 4 | Hybrid inheritance and functionality of genes responsible for 4VG production. a-c, Retention of the regions where the adjacent PAD1 and FDC1 genes, which are both required for 4VG production, are located in each parent species, shown as 10-kb windows of ploidy estimates over last 100-kb of the chromosome. Gene locations are represented by black dotted lines. Higher opacity represents higher ploidy. Species colours are that same as in Fig. 1a. S.par, S. paradoxus; S.mik, S. mikatae. a, S.cer × S.kud hybrids: all strains inherited versions of both PAD1 and FDC1 from S.cer that are predicted to be functional, +|+ but they have lost the S.kud alleles. b, S.uva × S.eub hybrids: all strains inherited versions of PAD1 and FDC1, from either S.uva or S.eub, that are predicted to be functional, +|+. c, All lager strains have completely lost the region in the S.eub genome where these genes reside. Additionally, all Saaz strains have also completely lost the S.cer versions of these genes, Δ | Δ. All but two Frohberg strains have retained versions of PAD1 from S.cer that are predicted to be functional but inherited S.cer alleles of FDC1 that are predicted to be inactive due to a frameshift mutation, +| Ψ. d,e, Haplotype networks were built for the amino acid sequences for Fdc1 (d) and Pad1 (e). Coloured pies correspond to S.cer lineages, hybrids or wild species with size representing the number of strains with that haplotype. Non-S.cer nodes or groups of nodes are labelled by the species to which they correspond. Coloured clouds correspond to communities: red is mostly S.cer, blue is mostly non-S.cer (including S.eub and S.uva), yellow is mostly S.par and S.mik, green is mostly S.kud and grey is mostly loss-of-function alleles. Pseudogenes are marked as Ψ with additional information about the loss-of-function nucleotide and amino-acid changes. Dotted connections represent >100 amino acid differences.
Genes for maltotriose use. We took a more detailed look at the genes for maltotriose use because maltotriose is generally the second most abundant sugar in beer wort or malt extract and Saccharomyces strains that use it are relatively rare outside of domesticated ale-brewing strains [45][46][47][48] . Our analyses of lager-brewing yeasts suggest that both S. cerevisiae and S. eubayanus contributed genes encoding functional maltotriose transporters to the hybrids, including alleles of S. cerevisiae MTT1 and S. eubayanus AGT1 previously shown to be functional 18 (Fig. 5b and ). b, Summary of how lager-brewing yeasts acquired their unique trait profile. The two lager-brewing lineages, Saaz and Frohberg, arose out of hybridizations between domesticated S.cer ale strains and wild S.eub strains. The S.cer strains could use maltotriose (+), did not produce phenolic off-flavours (POF − ) and preferred warmer temperatures (Sun symbol), while the S.eub strains tolerated colder temperatures (snowflake symbol), could not use maltotriose (−) and produced phenolic off-flavours (POF + ). The two lager-brewing lineages inherited the S.eub mitochondrial genome (pink circle), which partly conferred cryotolerance. Both lineages also inherited maltotriose transporter genes from both parents (MTT1 from S.cer and SeAGT1 from S.eub). Finally, both lineages convergently became POF − through multiple distinct mechanisms, including preadaptation in the S. cerevisiae ale-brewing parent due to a mutated pseudogene (PAD1|fdc1Ψ in red), aneuploidy removing functional S. eubayanus genes (pad1Δ|fdc1Δ in pink) and translocations in all Saaz strains and some Frohberg strains (pad1Δ|fdc1Δ in red).
homologues in other interspecies hybrids and their parent species, which have yet to be explored functionally (Supplementary Dataset 14). We conclude that the complexity and diversity of maltose transporter genes across Saccharomyces species is extensive and may have provided a source of functional diversity to fermentation hybrids.
Phenolic off-flavour genes. The introduction of genes from wild strains, especially the mitochondrial genome and S. eubayanus AGT1, may have been key to cold fermentations but other genes probably negatively affected products. The aromatic 4-vinyl guaiacol (4VG) is perceived as a clove-like, phenolic or smoky flavour and considered an undesirable off-flavour in most beers. Lager beers are known for their crisp flavour profiles that lack appreciable 4VG, while wild strains of S. eubayanus and other species produce 4VG 49 . Two genes, PAD1 and FDC1, are essential for the production of 4VG 50 . Studies in ale-brewing yeast show that this trait is under strong domestication selection (Supplementary Text) but the genotypes of PAD1 and FDC1 across diverse interspecies hybrids already in use by industry have not been investigated, nor have the evolutionary genetic events leading to these genotypes. In our large hybrid dataset, we analysed both retention and predicted functionality of PAD1 and FDC1 alleles from their parent species (Fig. 4).
In both S. cerevisiae × S. kudriavzevii and S. eubayanus × S. uvarum hybrids and introgressed strains, we found both FDC1 and PAD1 alleles that were predicted to be functional (Supplementary Text). These findings may reflect selection for diverse flavours, which are desirable in niche Trappist-style beers made with S. cerevisiae × S. kudriavzevii. In contrast, S. eubayanus × S. uvarum are often viewed as contaminants in industrial brewing environments and production of 4VG could contribute to this perception.
In the lager-brewing hybrids, we found that all strains have lost the ability to produce 4VG but the mechanism of this loss differed between Saaz and Frohberg (Supplementary Text). The Frohberg lager strains probably inherited a loss-of-function FDC1 allele from their domesticated S. cerevisiae parent and functional PAD1 and FDC1 alleles from their S. eubayanus parent. These functional wild alleles were then lost through translocations, probably due to break-induced replication. In contrast, the Saaz lineage has completely lost both the S. cerevisiae and S. eubayanus alleles of these genes through aneuploidy, an evolutionary trajectory facilitated by the fact that these subtelomeric genes reside on different chromosomes in these two species. The end result is that both Saaz and Frohberg lagers lack substantial phenolic off-flavours and have a crisp flavour profile. Even though Saaz and Frohberg strains evolved this trait through different final mutations that removed functional S. eubayanus alleles, the pre-adaptation of the domesticated S. cerevisiae parent, which already lacked functional genes, played a critical role by limiting the number of mutations needed. The contrast between Saaz and Frohberg strains highlights that there are many potential evolutionary trajectories open to interspecies hybrids to achieve a domestication trait.

Conclusions
Here, we characterized the genomes of 122 interspecies yeast hybrids and introgressed strains, the largest dataset of its kind to date. These hybrids have complex genomes with contributions from two to four species: S. cerevisiae, S. kudriavzevii, S. uvarum and S. eubayanus (Fig. 5a). The hybrids with S. cerevisiae contributions all arose out of three domesticated S. cerevisiae lineages: the wine lineage and two distinct beer clades. In contrast, all the S. kudriavzevii, S. uvarum and S. eubayanus parents belonged to Holarctic or European wild lineages. Our results show how hybrid vigour also applies to microbes, with the domesticated S. cerevisiae parents providing genes and traits preadapted for industrial fermentations and the divergent species of Saccharomyces contributing new genes and traits that led to the successes of these hybrids in specific products.
First, the frequent retention of mitochondrial genomes from cryotolerant parents probably conferred a fitness advantage during cold fermentation (Fig. 5b). Second, although the S. cerevisiae genome is required for maltotriose use by hybrids, both S. eubayanus and S. cerevisiae contributed functional maltotriose transporter genes to lager-brewing yeasts. Third, phenolic off-flavour genes have been inactivated or eliminated from lager-brewing yeasts by multiple types of mutations (Fig. 5b), while these genes have been retained in yeasts that ferment products where phenolic off-flavour is prized.
Hundreds of years ago, an S. cerevisiae strain meeting an S. eubayanus strain sparked the cold-brewing revolution and crisp refreshing lagers eventually overtook the global beer market. This extensive genomic dataset reveals the genetic mechanisms and distinct evolutionary trajectories followed by hybrids and introgressed strains associated with fermentation products. These diverse hybrids and introgressed strains highlight how dynamic and complex fermentation innovation has cascaded down divergent and convergent evolutionary trajectories.

Methods
Strain selection and sequencing. The strains newly published here are from wild or beverage isolations, the Agricultural Research Service (ARS) NRRL collection (https://nrrl.ncaur.usda.gov) and commercially available sources. Supplementary Dataset 15 contains the full metadata for strains. Whole-genome Illlumina paired-end sequencing was done as previously described using either 2 × 100 or 2 × 250 reads 32,51 . These short-read data are available through the NCBI SRA database under the accession number PRJNA522928. Short-read data for published genomes were downloaded from NCBI; Supplementary Dataset 16 contains a full list of accession numbers and citations 8,9,11,16,26,30,32,42, .
Hybrid identification. We used sppIDer 73 , a hybrid detection and analysis pipeline, to identify new hybrids, pure species and reconfirm the species and hybrid identities of published data. For sppIDer, we used a combination reference genome that included all published genomes for all the Saccharomyces species 63,72,74,75 (https://www.yeastgenome.org and www.saccharomycessensustricto.org). For S. kudriavzevii, we used the genome from the Portuguese strain ZP591. As previously noted 72 , the published S. uvarum genome has the labels for chromosomes X and XII swapped, so we manually corrected them. We ran sppIDer with parameters set to identify genomic contributions >1% of the total genome. As sppIDer is reference genome-based, inheritance of regions not in the reference genome was not analysed. Therefore, interspecies hybrids with only minor or subtelomeric introgressions were missed with this method. We also detected some smaller introgressions through the pan-genome analyses (see below).
Hybrid isolation environment was classified on the basis of marketed product type for commercial strains; for published strains or strains from the ARS NRRL collection, we used available metadata supplied by the authors or depositors. Full details of hybrid isolation environment classification can be found in Supplementary Dataset 1. To determine if there was an association between hybrid type and isolation environment, we completed χ 2 analyses of hybrid by environment and of environment by hybrid with a Bonferroni multiple test correction in R. We limited this test to our most common (n > 15) hybrid types (S. cerevisiae × S. eubayanus, S. cerevisiae × S. kudriavzevii and S. eubayanus × S. uvarum) and the most common (n > 8) origins (beer, wine and fruit).
Whole-genome sequence assembly pipeline. Alinment and single nucleotide polymorphism (SNP) calling were done as described previously 32 . Briefly, short reads were mapped with bwa 'mem' to a concatenated reference genome of just the contributing parents. Reference genomes used for concatenation were the same as used for sppIDer. Samtools 'view' and 'sort' were then used to prepare the mapped reads with a mapping quality greater than 20 for SNP calling. PCR duplicates were removed with picard 'MarkDuplicates' and read groups were set with picard ' AddOrReplaceReadGroups' . SNPs were called with GATK's haplotype caller. Genome coverage per base pair was assessed with bedtools 'genomeCoverageBed' . Strain-specific FASTA files were created by replacing called SNPs in repeat-masked concatenated reference genomes. Variants called as indels were replaced with Ns. Regions of extremely high coverage (the 99.9th percentile of genome-wide coverage) were masked as Ns. Regions that do not exist in hybrids were masked as Ns and regions at low coverage (between 3× and 10×, depending on where the 10th percentile of the distribution of depth of coverage across the concatenated genomes fell) were masked as Ns. The strain-specific FASTAs for hybrid genomes were split into their component subgenomes to be analysed with pure strains.
Genomic completeness was estimated as the percentage of the reference genome with coverage above the low-coverage masking threshold. Ploidy was estimated across the combination genome in 10-kb windows. We used the R package modes (v.0.7.0) to analyse the distribution of depth of coverage and determine the antimodes, which correspond to a change in ploidy state. Some manual curation was needed for strains with 'smiley patterns' , a pattern of increased coverage at chromosome ends that has been noted in other depth-ofcoverage analyses 8,76 and may be due to chromatin structure 77 . For these strains, we used only the coverages that fell below the 95th percentile to estimate the antimodes and then assigned the distal ends to the largest ploidy estimated. We also visually checked and corrected rare instances when a 'smiley pattern' lowered the ploidy estimate for the middle of the chromosome. From this antimode analysis, we were able to assign each 10-kb window a ploidy value. The total DNA base-pair content contributed by each parent could then be estimated as the sum of each ploidy value multiplied by 10,000 and the number of windows with that ploidy value. Correcting this total DNA content per species by the total sum of all contributing species gave us a measure of total genomic content per species. Genomic contribution to a hybrid genome can be viewed as genomic content and genomic completeness. To estimate genomic completeness, we determined what percentage of a total parent subgenome had at least one haploid copy. To estimate genomic content, we took into account both completeness and ploidy across the combination of subgenomes. Full details on hybrid genome contributions can be found in Supplementary Dataset 1. For visualizations, we clustered the strains on the basis of ploidy estimated across the combination genome using Ward's method in the R package pvclust (v.2.0-0; ref. 78 ).
For each strain, we calculated the number of sites called as heterozygous with GATK for each subgenome. Strains with more than 20,000 heterozygous sites in any subgenome were phased with GATK's 'ReadBackedPhasing' command 79 , which can phase short regions of the genome on the basis of overlapping reads. We then split the output into two phases, one that retains more reference variants and one that contains more alternative variants in phased regions. This pseudo-phasing allowed us to investigate regions that are less similar to the published reference. We converted these phases into two strain-specific FASTA files and masked them for coverage as above. Both phases were included in all downstream analyses involving phased genomes, which are noted as 'strainID 1' or 'strainID 2' . 1:1:1:1 orthologues. To characterize the core genome of these hybrids, we first analysed the retention of 1:1:1:1 orthologues conserved in all four parent species and determined which parents contributed the least and most coding sequences to each hybrid. We identified genes that are orthologous across all parent genomes based on the annotations in the published gff files for each reference genome, which yielded a list of 3,856 genes. We used the coordinates to determine the coverage for each orthologue. Gene presence was noted if the mean coverage for that orthologue was >3×.
De novo genome assembly and pan-genome analyses. We assembled the hybrid genomes with the meta-assembler iWGS 80 and chose the best assembly based on the largest N50 score. All hybrids, except DBVPG6257, were successfully assembled and are available under GenBank BioProject PRJNA522928.
We mapped the short-read data back to these assembled genomes and used the sppIDer output to classify to which parent reference genome each short-read mapped. With this analysis, we determined which reads did not map to a parent reference genome but did assemble de novo into a contig of 1.5 kb or greater. We classified these regions as 'unmapped' and used a tBLASTx to search for S. cerevisiae-like genes using S288C open reading frames and retaining hits with e-value < 10 −10 . To determine if this set of genes identified in these novel assembled regions were enriched for any functions, we used GO Term Finder (v.0. 86; refs. 43,44 ). To determine the potential origin of these novel regions, we used a BLASTn search of the NCBI nucleotide database (v.5). The output of this was then parsed for number of hits with an e-value < 10 −10 . To determine the number of hits to different species, we completed χ 2 analyses with a Bonferroni multiple test correction in R.

Translocation identification.
To detect shared breakpoints and translocations, we use LUMPY 81 with the mapped short-read data. We masked for repetitive regions by excluding regions with coverage above twice the genome-wide mean. Each breakpoint call had to be supported by at least four reads to be included in downstream analyses. We parsed this output for species subgenome, hybrid type and the species pair between which the translocation was detected. We calculated the total number of called breakpoints, breakpoints that were shared in at least two hybrids of the same type and breakpoints that were shared in multiple hybrid types. We compared these different categories with χ 2 analyses and a Bonferroni multiple test correction in R.
We also identified translocations from the de novo assemblies. For this analysis, we used sppIDer results to assign regions of the de novo assemblies to a parent species. Some regions were unmapped with sppIDer, as noted above. Additionally, some regions had high coverage from multiple parents in the de novo assembly, where the donor species could not be unambiguously assigned; these regions are probably repetitive and difficult to assemble. Translocations were identified when regions that were >2 kb came from different donor species and were assembled with <100 base pairs (bp) of unmapped or ambiguous data separating them. On average, we identified 17 translocations per strain. From this output, we counted the number of translocations identified in each hybrid type, the donor species and the pair of species between which the translocations occurred. We compared hybrid type, species pair and individual species with a χ 2 analyses with a Bonferroni multiple test correction in R.
Mitochondrial genome analysis pipeline. We use mitoSppIDer 73 to determine the mtDNA parent for the hybrids. This analysis was done in a similar manner to the whole-genome sppIDer analysis, except that mtDNAs for each Saccharomyces species were used 72,82,83 , except S. jurei. GenBank accessions lacking full manuscripts included S. mikatae (KX707788) and S. kudriavzevii (KX707787).
To determine if the mtDNA parent was associated with retention of the nuclear genes, we performed a logistic regression in R. We used the set of 1:1:1:1 orthologues to determine which parent contributed the most complete set of orthologous genes. To determine if there was an enrichment for the retention of nuclear-encoded, mitochondrially interacting proteins, we used the set of genes products identified as localized to the mitochondria through the Yeast GFP Fusion Localization Database 38 . When we filtered for genes that were also 1:1:1:1 orthologues, our final list consisted of 459 genes. To determine if there was a linear relationship between retention of mitochondrially localized genes and all other orthologues, we performed a linear regression and, to determine if there were more mitochondrially localized genes retained compared to all other genes, we used a Fisher's exact test with a Bonferroni correction. Tests were performed in R.
Since past work has shown that reticulate evolution, introgression and horizontal gene transfers are widespread in Saccharomyces mtDNAs 84 , we wanted to explore the inheritance of mitochondrially encoded genes in more depth. Due in part to their high AT content (~85%), mtDNAs are often poorly covered using Illumina sequencing. In particular, intergenic regions and coding sequences with transposable elements (introns, homing endonucleases and GC clusters) can be difficult to assemble. To explore the phylogenetic relationships of these mtDNAs, we used a bait-prey bioinformatic method to pull out the read sequences of coding sequences. We used HybPiper 85 to pull out reads from the hybrid Illumina libraries that mapped to those mitochondrial genes using gene sequences from reference strains used in mitoSppIDer as baits. These extracted Illumina reads were aligned to the reference genes in Geneious (v.6.1.6; ref. 86 ) and manually assembled. We successfully covered six mitochondrial genes (COX2, COX3, ATP6, ATP8, ATP9 and 15S ribosomal RNA), which were used to construct the mitochondrial phylogenetic haplotype network. This unique set of unambiguously completed genes was concatenated (4.7 kb) by strain to produce the haplotype for each pure Saccharomyces or hybrid strain ( Supplementary Fig. 14). Haplotypes and haplotype frequencies for each strain were encoded as a nexus-formatted file for PopART v.1.7.2 (ref. 87 ). The haplotype network was reconstructed using the TCS method 88 . Strains were assigned to each haplotype using DnaSP v.5 (ref. 89 ). For some strains, we could not assemble the 15S rRNA gene because of low-coverage data. For these strains, we inferred their haplotype designation based on an analysis where we removed the 15S rRNA gene. This information is not included in Supplementary Fig. 14 but can be found in Supplementary Dataset 17.

Genes of functional interest analysis pipeline.
To assemble the sequences of genes relevant to brewing, we again used HybPiper 85 . To be included for further analyses, the assembled length had to be at least as long as the bait gene and had to have a minimum 10× depth of coverage. For the baits, we used either gene sequences from the S. cerevisiae strain S288C found on the Saccharomyces Genome Database (https://www.yeastgenome.org); from the S. eubayanus type strain, CBS12357 T (ref. 72 ); or the lager strain W34/70 (ref. 90 ). For the PAD1 analysis in S. eubayanus × S. uvarum hybrids, we used the PAD1 gene sequence from the S. uvarum reference genome, CBS7001 (ref. 63 ). To get precise gene locations for PAD1 and FDC1, we used a tBLASTn search of the S. eubayanus, S. kudriavzevii and S. uvarum reference genomes with the S. cerevisiae sequences for these genes as the query.
The assembled genes were aligned with MAFFT v.7 (ref. 91 ), allowing for reverse complementation. The alignments were manually trimmed to the protein-coding sequences. For PAD1 and FDC1, the alignments were conceptually translated to amino acid sequences and haplotype networks were built with a modified minimum-spanning network and visualized with iGraph 92 in R. The haplotype networks were split into communities as previously described 93 .
Pairwise distances between sequences were calculated using the trimmed MAFFT nucleotide sequence alignments and the P-distance method as implemented in MEGA-X 94 with the following parameters: substitutions to include Transitions+Transversions, assuming uniform rates among sites and using pairwise deletion of gaps. The percentage identity of hits to the bait sequence was organized by species and hybrid status was recorded in Supplementary Dataset 14, along with the origin of the bait gene and tallies of sequences whose translations were visually identified as being incomplete or containing premature stop codons.
Phylogenomic and population structure analyses. We masked regions with no coverage as Ns, which is interpreted as missing data by most tools; therefore, for downstream whole-genome analyses, we only included subgenomes that were >50% complete (major contributions). To include the minor contribution hybrids in the non-S. cerevisiae analyses, we used reduced genomes that were concatenations of the regions of the genome that existed in at least one minor hybrid (Supplementary Dataset 18). This procedure allowed us include strains with minor introgressions and only use regions of the genome that had been contributed by the minor parent. To balance some of our analyses for Saaz and Frohberg lager strains, we used a random subset of Frohberg strains to match the number of Saaz strains.
Phylogenomic trees were built with RAxML v.8.1 (ref. 95 ) using SNPs from the whole genome for the major analyses or the reduced genome for the minor analyses. Trees were visualized with iTOL 96 . The PCA analyses were done with the adegenet package in R 97 and visualized with ggPlot2 (ref. 98  To determine the location of a gene in a reference genome, we used tBLASTn. To align functionally relevant genes, we used MAFFT (v.7), to compare pairwise differences of genes we used MEGA-X. To make haplotype networks for PAD1 and FDC1, we used a custom R script and visualized with iGraph (1.2.3). We assembled the hybrid genomes with the meta-assembler iWGS (v1.1). To determine if this set of genes identified in these novel assembled regions were enriched for any functions, we used GO Term Finder (Version 0.86). To detect shared breakpoints and translocations, we use LUMPY. The versions of R used were 3.3.0, 3.5.0, and 3.5.2; the version of python used was 2.6.6, and we used BioPython (1.66). Custom R and python scripts have been deposited on gitHub.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
nature research | reporting summary

October 2018
Data Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability References and accession numbers for the published data used can be found in Supplementary File 16. Short-read data newly published here is available through the NCBI SRA database under the BioProject accession number PRJNA522928. Assembled genomes published here are available under GenBank BioProject PRJNA522928. Custom R and Python scripts used for this publication can be found on GitHub (https://github.com/qlangdon/hybrid-ferment-invent).

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. All studies must disclose on these points even when the disclosure is negative.

Study description
Population and phylogneomic study of Saccharomcyes yeast hybrids from fermentation and wild sources.

Research sample
Whole genome data for pure Saccharomces cerevisiae, Saccharomyces kudriavzevii, Saccharomyces uvarum, and Saccharomcyes eubayanus and hybrids of these species. Yeast strains came from commercial sources, strain collections, and from isolations of fermented products. We analyzed both newly sequenced whole genome data and existing whole genome data for these species of hybrids of these species. With this dataset we recapitulated the know population structure of these species.

Sampling strategy
For new strains, we targeted commercial strains that are used for fermentations associated with known hybrids, including lagers and Belgian-style beers. For strains collections, we acquired any strains which were classified just to the genus level or with a hybrid species name "Saccharomcyes bayanus" and "Saccharomcyes pastorianus". We included all strains we acquired for this study; the sample size represents all available whole genome data for hybrids of these species.

Data collection N/A
Timing and spatial scale N/A

Data exclusions
In a subset of our analyses, we included fewer Frohberg lager strains because the low diversity in this group biased the PCA approaches. However, we also present the analyses with the full data set.