## Abstract

The rate of adaptive evolution depends on the rate at which beneficial mutations are introduced into a population and the fitness effects of those mutations. The rate of beneficial mutations and their expected fitness effects are often difficult to empirically quantify. As these 2 parameters determine the pace of evolutionary change in a population, the dynamics of adaptive evolution may enable inference of their values. Copy number variants (CNVs) are a pervasive source of heritable variation that can facilitate rapid adaptive evolution. Previously, we developed a locus-specific fluorescent CNV reporter to quantify CNV dynamics in evolving populations maintained in nutrient-limiting conditions using chemostats. Here, we use CNV adaptation dynamics to estimate the rate at which beneficial CNVs are introduced through de novo mutation and their fitness effects using simulation-based likelihood-free inference approaches. We tested the suitability of 2 evolutionary models: a standard Wright–Fisher model and a chemostat model. We evaluated 2 likelihood-free inference algorithms: the well-established Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) algorithm, and the recently developed Neural Posterior Estimation (NPE) algorithm, which applies an artificial neural network to directly estimate the posterior distribution. By systematically evaluating the suitability of different inference methods and models, we show that NPE has several advantages over ABC-SMC and that a Wright–Fisher evolutionary model suffices in most cases. Using our validated inference framework, we estimate the CNV formation rate at the *GAP1* locus in the yeast *Saccharomyces cerevisiae* to be 10^{−4.7} to 10^{−4} CNVs per cell division and a fitness coefficient of 0.04 to 0.1 per generation for *GAP1* CNVs in glutamine-limited chemostats. We experimentally validated our inference-based estimates using 2 distinct experimental methods (barcode lineage tracking and pairwise fitness assays), which provide independent confirmation of the accuracy of our approach. Our results are consistent with a beneficial CNV supply rate that is 10-fold greater than the estimated rates of beneficial single-nucleotide mutations, explaining the outsized importance of CNVs in rapid adaptive evolution. More generally, our study demonstrates the utility of novel neural network–based likelihood-free inference methods for inferring the rates and effects of evolutionary processes from empirical data, with possible applications ranging from tumor to viral evolution.

**Citation:** Avecilla G, Chuong JN, Li F, Sherlock G, Gresham D, Ram Y (2022) Neural networks enable efficient and accurate simulation-based inference of evolutionary parameters from adaptation dynamics. PLoS Biol 20(5): e3001633. https://doi.org/10.1371/journal.pbio.3001633

**Academic Editor:** J. Arjan G. M. de Visser, Wageningen University, NETHERLANDS

**Received:** September 30, 2021; **Accepted:** April 14, 2022; **Published:** May 27, 2022

**Copyright:** © 2022 Avecilla et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All source code for performing the analyses and reproducing the figures is available at https://github.com/graceave/cnv_sims_inference. All of the data can be found at https://osf.io/e9d5x/.

**Funding:** This work was supported in part by grants from the Israel Science Foundation (552/19) and Minerva Stiftung Center for Lab Evolution (YR), from the NIH (R01 GM134066 and R01 GM107466) (DG) and NSF (MCB1818234) (DG), from the NIH (R35 GM131824 and R01 AI136992) (GS), NSF GRFP (DGE1342536) (GA) and (DGE1839302) (JC), and NIH (T32 GM132037) (JC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

**Abbreviations:** ABC, Approximate Bayesian Computation; ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; AIC, Akaike information criterion; CNV, copy number variant; DFE, distribution of fitness effects; HDI, highest density interval; HDR, highest density region; KDE, kernel density estimate; MAF, masked autoregressive flow; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; NSF, neural spline flow; RMSE, root mean square error; WAIC, widely applicable information criterion

## Introduction

Evolutionary dynamics are determined by the supply rate of beneficial mutations and their associated fitness effect. As the combination of these 2 parameters determines the overall rate of adaptive evolution, experimental methods are required for separately estimating them. The fitness effects of beneficial mutations can be determined using competition assays [1,2], and mutation rates are typically estimated using mutation accumulation or Luria–Delbrück fluctuation assays [1,3]. An alternative approach to estimating both the rate and effect of beneficial mutations involves quantifying the dynamics of adaptive evolution and using statistical inference methods to find parameter values that are consistent with the dynamics [4–7]. Approaches to measure the dynamics of adaptive evolution, quantified as changes in the frequencies of beneficial alleles, have become increasingly accessible using either phenotypic markers [8] or high-throughput DNA sequencing [9]. Thus, inference methods using adaptation dynamics data hold great promise for determining the underlying evolutionary parameters.

Fitness effects of beneficial mutations comprise a portion of a distribution of fitness effects (DFE). Determining the parameters of the DFE in a given condition is a central goal of evolutionary biology. Typically, beneficial mutations can occur at multiple loci and thus variance in the DFE reflects genetic heterogeneity. However, in some scenarios, a single locus is the dominant gene in which beneficial mutations occur, such as the case of mutations in the *β*-lactamase gene underlying *β*-lactam antibiotic resistance or in *rpoB* underlying rifampicin resistance in bacteria [10,11]. In this case, different mutations at the same locus confer differential beneficial effects, resulting in a locus-specific DFE. Typically, a DFE of beneficial mutations encompasses both allelic and locus heterogeneity.

Copy number variants (CNVs) are defined as deletions or amplifications of genomic sequences. Due to their high rate of formation and strong fitness effects, they can underlie rapid adaptive evolution in diverse scenarios ranging from niche adaptation to speciation [12–16]. In the short term, CNVs may provide immediate fitness benefits by altering gene dosage. Over longer evolutionary timescales, CNVs can provide the raw material for the generation of evolutionary novelty through diversification of different gene copies [17]. As a result, CNVs are common in human populations [18–20], domesticated and wild populations of animals and plants [21–23], pathogenic and nonpathogenic microbes [24–27], and viruses [28–30]. CNVs can be both a driver and a consequence of cancers (reviewed in [31]).

Although CNVs are critically important to adaptive evolution, our understanding of their dynamics and reproducibility in adaptive evolution is poor. Specifically, key evolutionary properties of CNVs, including their rate of formation and fitness effects, are largely unknown. As with other classes of genomic variation, CNV formation is a relatively rare event, occurring at sufficiently low frequencies to make experimental measurement challenging. Estimates of de novo CNV rates are derived from indirect and imprecise methods, and even when genome-wide mutation rates are directly quantified by mutation accumulation studies and whole-genome sequencing, estimates depend on both genotype and condition [3] and vary by orders of magnitude [32–39].

Fitness effects of CNVs vary depending on gene content, genetic background, and the environment. In evolution experiments in many systems, CNVs arise repeatedly in response to strong selection [40–47], consistent with strong beneficial fitness effects. Several of these studies measured fitness of clonal isolates containing CNVs and reported selection coefficients ranging from −0.11 to 0.6 [40,47,48]. However, the fitness of lineages containing CNVs varies between isolates even within studies, which could be due to additional heritable variation or to differences in fitness between different types of CNVs (e.g., aneuploidy versus single-gene amplification).

Due to the challenge of empirically measuring rates and effects of beneficial mutations across many genetic backgrounds, conditions, and types of mutations, researchers have attempted to infer these parameters from population-level data using evolutionary models and Bayesian inference [5,6,49]. This approach has several advantages. First, model-based inference provides estimates of interpretable parameters and the opportunity to compare multiple models. Second, the degree of uncertainty associated with a point estimate can be quantified. Third, a posterior distribution over model parameters allows exploration of parameter combinations that are consistent with the observed data, and posterior distributions can provide insight into relationships between parameters [50]. Fourth, posterior predictions can be generated using the model and either compared to the data or used to predict the outcome of differing scenarios.

Standard Bayesian inference requires a likelihood function, which gives the probability of obtaining the observed data given some values of the model parameters. However, for many evolutionary models, such as the Wright–Fisher model, the likelihood function is analytically and/or computationally intractable. Likelihood-free simulation-based Bayesian inference methods that bypass the likelihood function, such as Approximate Bayesian Computation (ABC; [51]), have been developed and used extensively in population genetics [52,53], ecology and epidemiology [54,55], cosmology [56], as well as experimental evolution [4,6,57–59]. The simplest form of likelihood-free inference is rejection ABC [60,61], in which model parameter proposals are sampled from a prior distribution, simulations are generated based on those parameter proposals, and simulated data are compared to empirical observations using summary statistics and a distance function. Proposals that generate simulated data with a distance less than a defined tolerance threshold are considered samples from the posterior distribution and can therefore be used for its estimation. Efficient sampling methods have been introduced, namely Markov chain Monte Carlo [62] and Sequential Monte Carlo (SMC) [63], which iteratively select proposals based on previous parameter samples so that regions of the parameter space with higher posterior density are explored more often. A shortcoming of ABC is that it requires summary statistics and a distance function, which may be difficult to choose appropriately and compute efficiently, especially when using high-dimensional or multimodal data, although methods have been developed to address this challenge [52,64,65].
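The rejection ABC procedure described above can be illustrated with a minimal sketch. The toy simulator (a noisy logistic rise of a single beneficial allele), the prior range, the Euclidean distance on the raw trajectory, and the tolerance are all assumptions chosen for illustration; none of them is the model or the settings used in this study.

```python
import numpy as np

def rejection_abc(simulate, observed, sample_prior, n_proposals, tolerance, rng):
    """Keep the proposals whose simulated data lie within `tolerance`
    (Euclidean distance) of the observed data."""
    accepted = []
    for _ in range(n_proposals):
        theta = sample_prior(rng)          # draw a proposal from the prior
        simulated = simulate(theta, rng)   # run the model at that proposal
        distance = np.linalg.norm(simulated - observed)
        if distance < tolerance:
            accepted.append(theta)         # treated as a posterior sample
    return np.array(accepted)

# Hypothetical toy model: logistic increase of an allele with selection
# coefficient s, observed every 10 generations with Gaussian noise.
GENERATIONS = np.arange(0, 200, 10)

def toy_simulate(s, rng, p0=0.01):
    odds = p0 / (1 - p0) * np.exp(s * GENERATIONS)
    return odds / (1 + odds) + rng.normal(0, 0.01, GENERATIONS.size)

def toy_prior(rng):
    return 10 ** rng.uniform(-3, -0.5)     # log-uniform prior on s (assumed range)

rng = np.random.default_rng(0)
observed = toy_simulate(0.05, rng)         # synthetic observation, true s = 0.05
posterior_samples = rejection_abc(toy_simulate, observed, toy_prior,
                                  n_proposals=5000, tolerance=0.3, rng=rng)
```

Most proposals are rejected, which is why the sequential and neural approaches discussed in this section aim to use simulations more efficiently.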

Recently, new inference methods have been introduced that directly approximate the likelihood or the posterior density function using deep neural density estimators, that is, artificial neural networks that approximate density functions. These methods, which have recently been applied in neuroscience [50], population genetics [66], and cosmology [67], forego the summary and distance functions, can use data with higher dimensionality, and perform inference more efficiently [50,67,68].

Despite being originally developed to analyze population genetic data, e.g., to infer parameters of the coalescent model [60–63], likelihood-free methods have only been applied in a small number of experimental evolution studies. Hegreness and colleagues [5] estimated the rate and mean fitness effect of beneficial mutations in *Escherichia coli*. They performed 72 replicates of a serial dilution evolution experiment, starting with equal frequencies of 2 strains that differ only in a fluorescent marker in a putatively neutral location, and allowed them to evolve over 300 generations. Following the marker frequencies, they estimated from each experimental replicate 2 summary statistics: the time when a beneficial mutation starts to spread in the population and the rate at which its frequency increases. They then ran 500 simulations of an evolutionary model using a grid of model parameters to produce a theoretical distribution of summary statistics. Finally, they used the one-dimensional Kolmogorov–Smirnov distance between the empirical and theoretical summary statistic distributions to assess the inferred parameters. Barrick and colleagues [6] also inferred the rate and mean fitness effect from similar serial dilution experiments using a different evolutionary model implemented with a τ-leap stochastic simulation algorithm. They used the same summary statistics but applied the two-dimensional Kolmogorov–Smirnov distance function to better account for dependence between the summary statistics. de Sousa and colleagues [69] also focused on evolutionary experiments with 2 neutral markers. Their model included 3 parameters: the beneficial mutation rate and the 2 parameters of a Gamma distribution for the fitness effects of beneficial mutations. They introduced a new summary statistic that uses both the marker frequency trajectories and the population mean fitness trajectories (measured using competition assays). They summarized these data by creating histograms of the frequency values and fitness values for each of 6 time points. This resulted in 66 summary statistics, necessitating the application of a regression-based method to reduce the dimensionality of the summary statistics and achieve greater efficiency [65,69]. A simpler approach was taken by Harari and colleagues [49], who used a rejection ABC approach to estimate a single model parameter, the endoreduplication rate, from evolutionary experiments with yeast. They used the frequency dynamics of 3 genotypes (haploid and diploid homozygous and heterozygous at the *MAT* locus) without a summary statistic. The distance between the empirical results and 100 simulations was computed as the mean absolute error. Recently, Schenk and colleagues [69] inferred the mean mutation rate and fitness effect for 3 classes of mutations from serial dilution experiments at 2 different population sizes, which they sequenced at the end of the experiment. They used a Wright–Fisher model to simulate the frequency of fixed mutations in each class and used a neural network approach to estimate the parameters that best fit their data. These prior studies point to the potential of simulation-based inference.

Previously, we developed a fluorescent CNV reporter system in the budding yeast, *Saccharomyces cerevisiae*, to quantify the dynamics of de novo CNVs during adaptive evolution [48]. Using this system, we quantified CNV dynamics at the *GAP1* locus, which encodes a general amino acid permease, in nitrogen-limited chemostats for over 250 generations in multiple populations. We found that *GAP1* CNVs reproducibly arise early and sweep through the population. By combining the *GAP1* CNV reporter with barcode lineage tracking and whole-genome sequencing, we found that 10^{2} to 10^{4} independent CNV-containing lineages comprising diverse structures compete within populations.

In this study, we estimate the formation rate and fitness effect of *GAP1* CNVs. We tested both ABC-SMC [70] and a neural density estimation method, Neural Posterior Estimation (NPE) [71], using a classical Wright–Fisher model [72] and a chemostat model [73]. Using simulated data, we tested the utility of the different evolutionary models and inference methods. We find that NPE has better performance than ABC-SMC. Although a more complex model has improved performance, the simpler and more computationally efficient Wright–Fisher model is appropriate in most scenarios. We validated our approach by comparison to 2 different experimental methods: lineage tracking and pairwise fitness assays. We estimate that in glutamine-limited chemostats, beneficial *GAP1* CNVs are introduced at a rate of 10^{−4.7} to 10^{−4} per cell division and have a selection coefficient of 0.04 to 0.1 per generation. NPE is likely to be a useful method for inferring evolutionary parameters across a variety of scenarios, including tumor and viral evolution, providing a powerful approach for combining experimental and computational methods.

## Results

In a previous experimental evolution study, we quantified the dynamics of de novo CNVs in 9 populations using a prototrophic yeast strain containing a fluorescent *GAP1* CNV reporter [48]. Populations were maintained in glutamine-limited chemostats for over 250 generations and sampled every 8 to 20 generations (25 time points in total) to determine the proportion of cells containing a *GAP1* CNV using flow cytometry (populations gln_01-gln_09 in **Fig 1A**). In the same study, we also performed 2 replicate evolution experiments using the fluorescent *GAP1* CNV reporter and lineage-tracking barcodes, quantifying the proportion of the population with a *GAP1* CNV at 32 time points (populations bc01-bc02 in **Fig 1A**) [48]. We used interpolation to match time points between these 2 experiments (**S1 Fig**), resulting in a dataset comprising the proportion of the population with a *GAP1* CNV at 25 time points in 11 replicate evolution experiments. In this study, we tested whether the observed dynamics of CNV-mediated evolution provide a means of inferring the underlying evolutionary parameters.

Fig 1. Empirical data and evolutionary models.

**(A)** Estimates of the proportion of cells with *GAP1* CNVs for 11 *S*. *cerevisiae* populations containing either a fluorescent *GAP1* CNV reporter (gln_01 to gln_09) or a fluorescent *GAP1* CNV reporter and lineage tracking barcodes (bc01 and bc02) evolving in glutamine-limited chemostats, from [48]. **(B)** In our models, cells with the ancestral genotype (*X*_{A}) can give rise to cells with a *GAP1* CNV (*X*_{C}) or other beneficial mutation (*X*_{B}) at rates δ_{C} and δ_{B}, respectively. **(C)** The WF model has discrete, nonoverlapping generations and a constant population size. Allele frequencies in the next generation change from the previous generation due to mutation, selection, and drift. **(D)** In the chemostat model, medium containing a defined concentration of a growth-limiting nutrient (S_{0}) is added to the culture at a constant rate. The culture, containing cells and medium, is removed by continuous dilution at rate *D*. Upon inoculation, the number of cells in the growth vessel increases and the limiting-nutrient concentration decreases until a steady state is reached (red and blue curves in inset). Within the growth vessel, cells grow in continuous, overlapping generations undergoing mutation, selection, and drift. Data and code required to generate **A** can be found at https://doi.org/10.17605/OSF.IO/E9D5X. CNV, copy number variant; WF, Wright–Fisher.

### Overview of evolutionary models

We tested 2 models of evolution: the classical Wright–Fisher model [72] and a specialized chemostat model [73]. Previously, it has been shown that a single effective selection coefficient may be sufficient to model evolutionary dynamics in populations undergoing adaptation [5]. Therefore, we focus on beneficial mutations and assume a single selection coefficient for each class of mutation. In both models, we start with an isogenic population in which *GAP1* CNV mutations occur at a rate δ_{C} and other beneficial mutations occur at rate δ_{B} (**Fig 1B**). In our simulations, cells can acquire only a single beneficial mutation, either a CNV at *GAP1* or some other beneficial mutation (i.e., single nucleotide variant, transposition, diploidization, or CNV at another locus). In all simulations (except for sensitivity analysis, see the “Inference from empirical evolutionary dynamics” section), the formation rate of beneficial mutations other than *GAP1* CNVs was fixed at δ_{B} = 10^{−5} per genome per cell division, and the selection coefficient was fixed at *s*_{B} = 0.001, based on estimates from previous experiments using yeast in several conditions [74–76]. Our goal was to infer the *GAP1* CNV formation rate, δ_{C}, and *GAP1* CNV selection coefficient, *s*_{C}.

The 2 evolutionary models have several unique features. In the Wright–Fisher model, the population size is constant, and each generation is discrete. Therefore, genetic drift is efficiently modeled using multinomial sampling (**Fig 1C**). In the chemostat model [73], fresh medium is added to the growth vessel at a constant rate, and medium and cells are removed from the growth vessel at the same rate, resulting in continuous dilution of the culture (**Fig 1D**). Individuals are randomly removed from the population through the dilution process, regardless of fitness, in a manner analogous to genetic drift. In the chemostat model, we start with a small initial population size and a high initial concentration of the growth-limiting nutrient. Following inoculation, the population size increases and the growth-limiting nutrient concentration decreases until a steady state is attained that persists throughout the experiment. As generations are continuous and overlapping in the chemostat model, we use the Gillespie algorithm with τ-leaping [77] to simulate the population dynamics. Growth parameters in the chemostat are based on experimental conditions during the evolution experiments [48] or taken from the literature (**Table 1**).
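A Wright–Fisher simulation with the three genotypes of Fig 1B can be sketched as follows. This is an illustrative sketch and not the authors' implementation: the function name, the population size, the number of generations, and the ordering of selection before mutation within a generation are assumptions. Only the fixed values δ_{B} = 10^{−5} and *s*_{B} = 0.001 and the use of multinomial sampling for drift come from the text.

```python
import numpy as np

def simulate_wright_fisher(delta_c, s_c, delta_b=1e-5, s_b=0.001,
                           pop_size=3_300_000, generations=250, rng=None):
    """Return the GAP1 CNV frequency at generations 0..generations.

    Genotype order: ancestral (A), other beneficial mutation (B), CNV (C).
    pop_size and generations are placeholder values (assumptions).
    """
    rng = np.random.default_rng() if rng is None else rng
    fitness = np.array([1.0, 1.0 + s_b, 1.0 + s_c])
    # Column j gives the genotype distribution of offspring of genotype j;
    # only ancestral cells mutate, and each cell gains at most one mutation.
    mutation = np.array([
        [1.0 - delta_b - delta_c, 0.0, 0.0],
        [delta_b,                 1.0, 0.0],
        [delta_c,                 0.0, 1.0],
    ])
    freqs = np.array([1.0, 0.0, 0.0])      # isogenic ancestral population
    cnv_freq = np.empty(generations + 1)
    cnv_freq[0] = freqs[2]
    for t in range(1, generations + 1):
        selected = fitness * freqs
        selected /= selected.sum()                     # selection
        expected = mutation @ selected                 # mutation
        counts = rng.multinomial(pop_size, expected)   # drift
        freqs = counts / pop_size
        cnv_freq[t] = freqs[2]
    return cnv_freq

trajectory = simulate_wright_fisher(delta_c=10**-4.5, s_c=0.08,
                                    rng=np.random.default_rng(1))
```

Each generation costs a single multinomial draw regardless of population size, which is what makes this model cheap to simulate many times.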

### Overview of inference methods

We tested 2 likelihood-free Bayesian methods for joint inference of the *GAP1* CNV formation rate and the *GAP1* CNV fitness effect: Approximate Bayesian Computation with Sequential Monte Carlo (ABC-SMC) [63] and NPE [78–80]. We used the proportion of the population with a *GAP1* CNV at 25 time points as the observed data (**Fig 1A**). For both methods, we defined a log-uniform prior distribution for the CNV formation rate ranging from 10^{−12} to 10^{−3} and a log-uniform prior distribution for the selection coefficient ranging from 10^{−4} to 0.4.
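Sampling from these log-uniform priors amounts to drawing the base-10 exponent uniformly and then exponentiating. A minimal sketch, in which the helper name `sample_log_uniform` and the sample size are hypothetical:

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    """Draw values whose log10 is uniform between log10(low) and log10(high)."""
    return 10 ** rng.uniform(np.log10(low), np.log10(high), size)

rng = np.random.default_rng(42)
delta_c_prior = sample_log_uniform(1e-12, 1e-3, 10_000, rng)  # CNV formation rate
s_c_prior = sample_log_uniform(1e-4, 0.4, 10_000, rng)        # selection coefficient
```

Under this prior every order of magnitude is equally likely a priori, which suits parameters whose plausible values span many decades.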

We applied ABC-SMC (**Fig 2A**), implemented in the Python package *pyABC* [70]. We used an adaptively weighted Euclidean distance function to compare simulated data to observed data. Thus, the distance function adapts over the course of the inference process based on the amount of variance at each time point [81]. The number of samples drawn from the proposal distribution (and therefore the number of simulations) is changed at each iteration of the ABC-SMC algorithm using the adaptive population strategy, which is based on the shape of the current posterior distribution [82]. We applied bounds on the maximum number of samples used to approximate the posterior in each iteration; however, the total number of samples (simulations) used in each iteration is greater because not all simulations are accepted for posterior estimation (see **Methods**). For each observation, we performed ABC-SMC with multiple iterations until either the acceptance threshold (ε = 0.002) was reached or until 10 iterations had been completed. We performed inference on each observation independently 3 times. Although we refer to different observations belonging to the same “training set,” a different ABC-SMC procedure must be performed for each observation.

Fig 2. Inference methods and performance assessment.

**(A)** When using ABC-SMC, in the first iteration, a proposal for the parameters δ_{C} (*GAP1* CNV formation rate) and *s*_{C} (*GAP1* CNV selection coefficient) is sampled from the prior distribution. Simulated data are generated using either a WF or chemostat model and the current parameter proposal. The distance between the simulated data and the observed data is computed, and the proposed parameters are weighted by this distance. These weighted parameters are used to sample the proposed parameters in the next iteration. Over many iterations, the weighted parameter proposals provide an increasingly better approximation of the posterior distribution of δ_{C} and *s*_{C} (adapted from [68]). **(B)** In NPE, simulated data are generated using parameters sampled from the prior distribution. From the simulated data and parameters, a density-estimating neural network learns the joint density of the model parameters and simulated data (the “amortized posterior”). The network then evaluates the conditional density of model parameters given the observed data, thus providing an approximation of the posterior distribution of δ_{C} and *s*_{C} (adapted from [50,68]). **(C)** Assessment of inference performance. The 50% and 95% HDRs are shown on the joint posterior distribution with the true parameters and the MAP parameter estimates. We compare the true parameters to the estimates by their log ratio. We also generate posterior predictions (sampling 50 parameters from the joint posterior distribution and using them to simulate frequency trajectories, ⍴_{i}), which we compare to the observation, o_{i}, using the RMSE and the correlation coefficient. ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; CNV, copy number variant; HDR, highest density region; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; RMSE, root mean square error; WF, Wright–Fisher.

We applied NPE (**Fig 2B**), implemented in the Python package *sbi* [71], and tested 2 specialized normalizing flows as density estimators: a masked autoregressive flow (MAF) [83] and a neural spline flow (NSF) [84]. The normalizing flow is used as a density estimator to “learn” an amortized posterior distribution, which can then be evaluated for specific observations. Thus, amortization allows for evaluation of the posterior for each new observation without the need to retrain the neural network. To test the sensitivity of our inference results to the set of simulations used to learn the amortized posterior, we trained 3 independent amortized networks with different sets of simulations generated from the prior distribution and compared the resulting posterior distributions for each observation. We refer to inferences made with the same amortized network as having the same “training set.”

### NPE outperforms ABC-SMC

To test the performance of each inference method and evolutionary model, we generated 20 simulated synthetic observations for each model (Wright–Fisher or chemostat) over 4 combinations of CNV formation rates and selection coefficients, resulting in 40 synthetic observations (i.e., 5 simulated observations per combination of model, δ_{C}, and *s*_{C}). We refer to the parameters that generated the synthetic observation as the “true” parameters. For each synthetic observation, we performed inference using each method 3 times. Inference was performed using the same evolutionary model as that used to generate the observation. We found that NPE using NSF as the density estimator was superior to NPE using MAF, and, therefore, we report results using NSF in the main text (results using MAF are in **S2 Fig**).

For each inference method, we plotted the joint posterior distribution with the 50% and 95% highest density regions (HDR) [85] demarcated (**Fig 2C**, **S1 Data** in https://doi.org/10.17605/OSF.IO/E9D5X). The true parameters are expected to be covered by these HDRs at least 50% and 95% of the time, respectively. We also computed the marginal 95% highest density intervals (HDIs) [85] using the marginal posterior distributions for the *GAP1* CNV selection coefficient and *GAP1* CNV formation rate. We found that the true parameters were within the 50% HDR in half or more of the tests (averaged over 3 training sets) across a range of parameter values, with the exception of ABC-SMC applied to the Wright–Fisher model when the *GAP1* CNV formation rate (δ_{C} = 10^{−7}) and selection coefficient (*s*_{C} = 0.001) were both low (**Fig 3A**). The true parameters were within the 95% HDR in 100% of tests (**S1 Data** in https://doi.org/10.17605/OSF.IO/E9D5X). The width of the HDI is informative about the degree of uncertainty associated with the parameter estimation. The HDIs for both fitness effect and formation rate tend to be smaller when inferring with NPE compared to ABC-SMC, and this advantage of NPE is more pronounced when the CNV formation rate is high (δ_{C} = 10^{−5}) (**Fig 3B and 3C**).
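An interval of this kind can be estimated from posterior samples as the narrowest window that contains the requested probability mass. The sketch below is a generic sample-based estimator and an assumption, not the procedure used for the figures: the Fig 3 caption describes interval widths computed as the difference between the 97.5 and 2.5 percentiles, which matches the HDI only for symmetric unimodal posteriors.

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing a fraction `mass` of the samples."""
    x = np.sort(np.asarray(samples))
    n = x.size
    k = int(np.ceil(mass * n))            # number of samples inside the interval
    widths = x[k - 1:] - x[:n - k + 1]    # width of every window of k samples
    i = int(np.argmin(widths))            # start index of the narrowest window
    return x[i], x[i + k - 1]

def percentile_width(samples, lower=2.5, upper=97.5):
    """Width between two percentiles, as described in the Fig 3 caption."""
    lo, hi = np.percentile(samples, [lower, upper])
    return hi - lo

rng = np.random.default_rng(7)
toy_posterior = rng.normal(0.0, 1.0, 20_000)   # stand-in for marginal posterior samples
low, high = hdi(toy_posterior)
```

For a skewed marginal posterior, the HDI is narrower than the equal-tailed percentile interval and shifts toward the mode.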

Fig 3. Performance assessment of inference methods using simulated synthetic observations.

The figure shows the results of inference on 5 simulated synthetic observations using either the WF or chemostat (Chemo) model per combination of fitness effect *s*_{C} and formation rate δ_{C}. Simulations and inference were performed using the same model. For NPE, each training set corresponds to an independently amortized posterior distribution trained on a different set of 100,000 simulations, with which each synthetic observation was evaluated to produce a separate posterior distribution. For ABC-SMC, each training set corresponds to independent inference procedures on each observation with a maximum of 10,000 total simulations accepted for each inference procedure and a stopping criterion of 10 iterations or ε ≤ 0.002, whichever occurs first. **(A)** The percent of true parameters covered by the 50% HDR of the inferred posterior distribution. The bar height shows the average of 3 training sets. Horizontal line marks 50%. **(B, C)** Distribution of widths of 95% HDI of the posterior distribution of the fitness effect *s*_{C} (B) and CNV formation rate δ_{C} (C), calculated as the difference between the 97.5 percentile and 2.5 percentile, for each separately inferred posterior distribution. **(D)** Log ratio of MAP estimate to true parameter for *s*_{C} and δ_{C}. Note the different y-axis ranges. Gray horizontal line represents a log ratio of zero, indicating an accurate MAP estimate. **(E)** Mean and 95% confidence interval of RMSE of 50 posterior predictions compared to the synthetic observation from which the posterior was inferred. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; CNV, copy number variant; HDI, highest density interval; HDR, highest density region; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; RMSE, root mean square error; WF, Wright–Fisher.

We computed the maximum a posteriori (MAP) estimate of the *GAP1* CNV formation rate and selection coefficient by determining the mode (i.e., argmax) of the joint posterior distribution, and computed the log ratio of the MAP relative to the true parameters. We find that the MAP estimate is close to the true parameter (i.e., the log ratio is close to zero) when the selection coefficient is high (*s*_{C} = 0.1), regardless of the model or method, and much of the error is due to the formation rate estimation error (**Fig 3D**). Generally, the MAP estimate is within an order of magnitude of the true parameter (i.e., the log ratio is less than 1), except when the formation rate and selection coefficient are both low (δ_{C} = 10^{−7}, *s*_{C} = 0.001); in this case, the formation rate was underestimated up to 4-fold, and the selection coefficient was slightly overestimated (**Fig 3D**). In some cases, there are substantial differences in log ratio between training sets using NPE; however, this variation in log ratio is usually less than the variation in the log ratio when performing inference with ABC-SMC. Overall, the log ratio tends to be closer to zero (i.e., estimate close to true parameter) when using NPE (**Fig 3D**).
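As an illustration of the MAP and log-ratio calculation, the following minimal sketch approximates the mode of a joint posterior from samples by taking the center of the most populated two-dimensional histogram bin. The stand-in Gaussian "posterior", the bin width, and the function names are assumptions made for this example; they are not taken from the analysis code, which may locate the mode differently.

```python
import math
import random
from collections import Counter

def map_from_samples(samples, bin_width=0.1):
    """Approximate the joint mode as the center of the fullest 2D histogram bin.

    samples: list of (log10_delta, log10_s) pairs drawn from a posterior.
    """
    counts = Counter(
        (math.floor(d / bin_width), math.floor(s / bin_width)) for d, s in samples
    )
    (i, j), _ = counts.most_common(1)[0]
    return ((i + 0.5) * bin_width, (j + 0.5) * bin_width)

def log_ratio(estimate_log10, true_value):
    """log10(estimate / truth); zero means the estimate equals the true value."""
    return estimate_log10 - math.log10(true_value)

rng = random.Random(0)
# Stand-in posterior samples (hypothetical): centered on delta_C = 1e-5, s_C = 0.1
samples = [(rng.gauss(-5.0, 0.3), rng.gauss(-1.0, 0.05)) for _ in range(20000)]

map_delta, map_s = map_from_samples(samples)
ratio_delta = log_ratio(map_delta, 1e-5)
ratio_s = log_ratio(map_s, 0.1)
```

A log ratio with absolute value below 1 corresponds to an estimate within one order of magnitude of the true parameter.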

We performed posterior predictive checks by simulating *GAP1* CNV dynamics using the MAP estimates as well as 50 parameter values sampled from the posterior distribution (**S1 Data** in https://doi.org/10.17605/OSF.IO/E9D5X). We computed both the root mean square error (RMSE) and the correlation coefficient between posterior predictions and the observation to measure the prediction accuracy (**Fig 3E, S3 Fig**). We find that the RMSE posterior predictive accuracy of NPE is similar to, or better than, that of ABC-SMC (**Fig 3E**). The predictive accuracy quantified using correlation was close to 1 for all cases except when *GAP1* CNV formation rate and selection coefficient are both low (*s*_{C} = 0.001 and δ_{C} = 10^{−7}) (**S3 Fig**).
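The two accuracy measures can be sketched as follows. The observed and predicted CNV proportions are hypothetical values chosen for illustration, and the correlation shown is the Pearson coefficient, which is an assumption, since the text does not state which correlation coefficient was used.

```python
import math

def rmse(pred, obs):
    """Root mean square error between a prediction and an observation."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson(pred, obs):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(obs)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

# Hypothetical proportions of cells with a CNV at successive time points
observed = [0.0, 0.01, 0.05, 0.20, 0.55, 0.80, 0.90]
predicted = [0.0, 0.02, 0.08, 0.25, 0.50, 0.75, 0.88]

error = rmse(predicted, observed)
correlation = pearson(predicted, observed)
```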

We performed model comparison using both Akaike information criterion (AIC), computed using the MAP estimate, and widely applicable information criterion (WAIC), computed over the entire posterior distribution [86]. Lower values imply higher predictive accuracy and a difference of 2 is considered significant (**S4 Fig**) [87]. We find similar results for both criteria: NPE with either model has similar values, although the value for Wright–Fisher is usually slightly lower than the value for the chemostat model. When *s*_{C} = 0.1, the value for NPE is consistently and significantly lower than for ABC-SMC. When δ_{C} = 10^{−5} and *s*_{C} = 0.001, the value for NPE with the Wright–Fisher model is significantly lower than that for ABC-SMC, whereas the value for NPE with the chemostat model is not. The difference between any combination of model and method was insignificant for δ_{C} = 10^{−7} and *s*_{C} = 0.001. Therefore, NPE is similar to or better than ABC-SMC using either evolutionary model and for all tested combinations of *GAP1* CNV formation rate and selection coefficient, and we further confirmed the generality of this trend using the Wright–Fisher model and 8 additional parameter combinations (**S5 Fig**).

We performed NPE using 10,000 or 100,000 simulations to train the neural network and found that increasing the number of simulations did not substantially reduce the MAP estimation error, but did tend to decrease the width of the 95% HDIs for both parameters (**S6 Fig**). Similarly, we performed ABC-SMC with a per-observation maximum number of accepted parameter samples (i.e., “particles” or “population size”) of 10,000 and 100,000, which corresponds to an increasing number of simulations per inference procedure, and found that increasing the budget decreases the widths of the 95% HDIs for both parameters (**S6 Fig**). Overall, amortization with NPE allowed for more accurate inference using fewer simulations, corresponding to less computation time (**S7 Fig**).

### The Wright–Fisher model is suitable for inference using chemostat dynamics

While the chemostat model is a more precise description of our evolution experiments, both the model itself and its computational implementation have some drawbacks. First, the model is a stochastic continuous time model implemented using the τ-leap method [77]. In this method, time is incremented in discrete steps and the number of stochastic events that occur within that time step is sampled based on the rate of events and the system state at the previous time step. For accurate stochastic simulation, event rates and probabilities must be computed at each time step, and time steps must be sufficiently small. This incurs a heavy computational cost as time steps are considerably smaller than one generation, which is the time step used in the simpler Wright–Fisher model. Moreover, the chemostat model itself has additional parameters compared to the Wright–Fisher model, which must be experimentally measured or estimated.

The Wright–Fisher model is more general and more computationally efficient than the chemostat model (**S1 Table**). Therefore, we investigated if it can be used to perform accurate inference with NPE on synthetic observations generated by the chemostat model. By assessing how often the true parameters were covered by the HDRs, we found that the Wright–Fisher model is a good enough approximation of the full chemostat dynamics when selection is weak (*s*_{C} = 0.001) (**S8 Fig**), and it performs similarly to the chemostat model in parameter estimation accuracy (**Fig 4A and 4B**). The Wright–Fisher model is less suitable when selection is strong (*s*_{C} = 0.1), as the true parameters are not covered by the 50% or 95% HDR (**S8 Fig**). Nevertheless, estimation of the selection coefficient remains accurate, and the difference in estimation of the formation rate is less than an order of magnitude, with a 3- to 5-fold overestimation (MAP log ratio between 0.5 and 0.7) (**Fig 4C and 4D**).

Fig 4. Inference with WF model from chemostat dynamics.

The figure shows results of inference using NPE and either the WF or chemostat (Chemo) model on 5 simulated synthetic observations generated using the chemostat model for different combinations of fitness effect *s*_{C} and formation rate δ_{C}. Boxplots and markers show the log ratio of MAP estimate to true parameters for *s*_{C} and δ_{C}. Horizontal solid line represents a log ratio of zero, indicating an accurate MAP estimate; dotted lines indicate an order of magnitude difference between the MAP estimate and the true parameter. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. MAP, maximum a posteriori; NPE, Neural Posterior Estimation; WF, Wright–Fisher.

### Inference using a set of observations

Our empirical dataset includes 11 biological replicates of the same evolution experiment. Differences in the dynamics between independent replicates may be explained by an underlying distribution of fitness effects (DFE) rather than a single constant selection coefficient. It is possible to infer the DFE using all experiments simultaneously. However, inference of distributions from multiple experiments presents several challenges, common to other mixed-effects or hierarchical models [88]. Alternatively, individual values inferred from individual experiments could provide an approximation of the underlying DFE.

To test these 2 alternative strategies for inferring the DFE, we performed simulations in which we allowed for variation in the selection coefficient of *GAP1* CNVs for each population in a set of observations. We sampled 11 selection coefficients from a Gamma distribution with shape and scale parameters *α* and *β*, respectively, and an expected value *E*(*s*) = *αβ* [69], and then simulated a single observation for each sampled selection coefficient. As the Wright–Fisher model is a suitable approximation of the chemostat model (**Fig 4**), we used the Wright–Fisher model both for generating our observation sets and for parameter inference.
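The sampling step can be sketched with the Python standard library, whose `random.gammavariate(alpha, beta)` takes a shape and a scale parameter, so its mean is *αβ*. The particular values of *α* and *E*(*s*) below are illustrative choices, and the function name is hypothetical.

```python
import random

def sample_selection_coefficients(alpha, beta, n=11, seed=0):
    """Draw n selection coefficients from Gamma(shape=alpha, scale=beta)."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, beta) for _ in range(n)]

# Illustrative DFE: exponential (alpha = 1) with expected value E(s) = 0.1
alpha, expected_s = 1.0, 0.1
beta = expected_s / alpha          # because E(s) = alpha * beta
observation_set_s = sample_selection_coefficients(alpha, beta)

# Sanity check on the parameterization with a large sample and alpha = 10
large_sample = sample_selection_coefficients(10.0, 0.01, n=20000, seed=1)
large_sample_mean = sum(large_sample) / len(large_sample)
```

Each of the 11 sampled coefficients would then be passed to the evolutionary model to simulate one observation.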

For the observation sets, we used NPE to either infer a single selection coefficient for each observation or to directly infer the Gamma distribution parameters *α* and *β* from all 11 observations. When inferring 11 selection coefficients, one for each observation in the observation set, we fit a Gamma distribution to 8 of the 11 inferred values (**Fig 5**, green lines). When directly inferring the DFE, we used a uniform prior for *α* from 0.5 to 15 and a log-uniform prior for *β* from 10^{−3} to 0.8. We held out 3 experiments from the set of 11 and used a 3-layer neural network to reduce the remaining 8 observations to a 5-feature summary statistic vector, which we then used as an embedding net [71] with NPE to infer the joint posterior distribution of *α*, *β*, and δ_{C} (**Fig 5**, blue lines). For each observation set, we performed each inference method 3 times, using different sets of 8 experiments to infer the underlying DFE.

Fig 5. Inference of the DFE.

A set of 11 simulated synthetic observations was generated from a WF model with CNV selection coefficients sampled from an exponential (Gamma with *α* = 1) DFE (true DFE; black curve). The MAP DFEs (observation set DFE, green curves) were directly inferred using 3 different subsets of 8 out of 11 synthetic observations. We also inferred the selection coefficient for each individual observation in the set of 11 individually and fit a Gamma distribution (single observation DFE, blue curves) to sets of 8 inferred selection coefficients. All inferences were performed with NPE using the same amortized network to infer a posterior for each set of 8 synthetic observations or each single observation. **(A)** weak selection, high formation rate, **(B)** weak selection, low formation rate, **(C)** strong selection, high formation rate, **(D)** strong selection, low formation rate. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. CNV, copy number variant; DFE, distribution of fitness effects; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; WF, Wright–Fisher.

We used Kullback–Leibler divergence to measure the difference between the true DFE and inferred DFE and find that the inferred selection coefficients from the single experiments capture the underlying DFE as well or better than direct inference of the DFE from a set of observations for both *α* = 1 (an exponential distribution) and *α* = 10 (sum of 10 exponentials) (**Fig 5, S9 Fig**). The only exception we found is when *α* = 10, *E*(*s*) = 0.001, and δ_{C} = 10^{−5} (**S9 Fig, S2 Table**). We assessed the performance of inference from a set of observations using out-of-sample posterior predictive accuracy [86] and found that inferring *α* and *β* from a set of observations results in lower posterior predictive accuracy compared to inferring *s*_{C} from a single observation (**S10 Fig**). Therefore, we conclude that estimating the DFE through inference of individual selection coefficients from each observation is superior to inference of the distribution from multiple observations.
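A minimal sketch of the Kullback–Leibler divergence between two Gamma-distributed DFEs, computed by midpoint-rule numerical integration, is shown below. The integration scheme, the integration limit, and the example parameter values are assumptions for illustration; the direction of the divergence (true relative to inferred, or the reverse) is not specified in the text.

```python
import math

def gamma_logpdf(x, alpha, beta):
    """Log density of Gamma(shape=alpha, scale=beta) at x > 0."""
    return ((alpha - 1) * math.log(x) - x / beta
            - math.lgamma(alpha) - alpha * math.log(beta))

def kl_gamma(alpha_p, beta_p, alpha_q, beta_q, steps=200_000):
    """KL(p || q) for two Gamma densities via the midpoint rule."""
    # Integrate far into the tail: 40 times the larger of the two means
    upper = 40 * max(alpha_p * beta_p, alpha_q * beta_q)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        log_p = gamma_logpdf(x, alpha_p, beta_p)
        log_q = gamma_logpdf(x, alpha_q, beta_q)
        total += math.exp(log_p) * (log_p - log_q) * h
    return total

# Hypothetical example: "true" exponential DFE with mean 0.05 versus an
# "inferred" exponential DFE with mean 0.1 (alpha = 1 in both cases)
kl_example = kl_gamma(1.0, 0.05, 1.0, 0.1)
```

For two exponential distributions the divergence has the closed form ln(β_q/β_p) + β_p/β_q − 1, which gives a convenient check on the numerical result.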

### Inference from empirical evolutionary dynamics

To apply our approach to empirical data we inferred *GAP1* CNV selection coefficients and formation rates using 11 replicated evolutionary experiments in glutamine-limited chemostats [48] (**Fig 1A**) using NPE with both evolution models. We performed posterior predictive checks, drawing parameter values from the posterior distribution, and found that *GAP1* CNVs were predicted to increase in frequency earlier and more gradually than is observed in our experimental populations (**S11 Fig**). This discrepancy is especially apparent in experimental populations that appear to experience clonal interference with other beneficial lineages (i.e., gln07, gln09). Therefore, we excluded data after generation 116, by which point CNVs have reached high frequency in the populations but do not yet exhibit the nonmonotonic and variable dynamics observed in later time points, and performed inference. The resulting posterior predictions are more similar to the observations in initial generations (average MAP RMSE for the 11 observations up to generation 116 is 0.06 when inference excludes late time points versus 0.13 when inference includes all time points). Furthermore, the overall RMSE (for observations up to generation 267) was not significantly different (average MAP RMSE is 0.129 and 0.126 when excluding or including late time points, respectively; **S12 Fig**). Restricting the analysis to early time points did not dramatically affect estimates of *GAP1* CNV selection coefficient and formation rate, but it did result in less variability in estimates between populations (i.e., independent observations) and some reordering of populations’ selection coefficients and formation rates relative to one another (**S13 Fig**). Thus, we focused on inference using data prior to generation 116.

The inferred *GAP1* CNV selection coefficients were similar regardless of model, with the range of MAP estimates for all populations between 0.04 and 0.1, whereas the range of inferred *GAP1* CNV formation rates was somewhat higher when using the Wright–Fisher model, 10^{−4.1} to 10^{−3.4}, compared to the chemostat model, 10^{−4.7} to 10^{−4} (**Fig 6A and 6B**). While there is variation in inferred parameters due to the training set, variation between observations (replicate evolution experiments) is higher than variation between training sets (**Fig 6A–6C**). Posterior predictions using the chemostat model, a fuller depiction of the evolution experiments, tend to have slightly lower RMSE than predictions using the Wright–Fisher model (**Fig 6C**). However, predictions using both models recapitulate actual *GAP1* CNV dynamics, especially in early generations (**Fig 6D**).

Fig 6. Inference of CNV formation rate and fitness effect from empirical evolutionary dynamics.

The inferred MAP estimate and 95% HDIs for fitness effect *s*_{C} and formation rate δ_{C}, using the **(A)** WF or **(B)** chemostat (Chemo) model and NPE for each experimental population from [48]. Inference was performed with data up to generation 116, and each training set (marker shape) corresponds to an independent amortized posterior distribution estimated with 100,000 simulations. **(C)** Mean and 95% confidence interval for RMSE of 50 posterior predictions compared to empirical observations up to generation 116. **(D)** Proportion of the population with a *GAP1* CNV in the experimental observations (solid lines) and in posterior predictions using the MAP estimate from one of the training sets shown in panels A and B with either the WF (dotted line) or chemostat (dashed line) model. Formation rate and fitness effect of other beneficial mutations were set to 10^{−5} and 10^{−3}, respectively. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. CNV, copy number variant; HDI, highest density interval; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; RMSE, root mean square error; WF, Wright–Fisher.

To test the sensitivity of these estimates, we also inferred the *GAP1* CNV selection coefficient and formation rate using the Wright–Fisher model in the absence of other beneficial mutations (δ_{B} = 0), and for 9 additional combinations of other beneficial mutation selection coefficient *s*_{B} and formation rate δ_{B} (**S14 Fig**). In general, perturbations to the rate and selection coefficient of other beneficial mutations did not alter the inferred *GAP1* CNV selection coefficient or formation rate. We found a single exception: When both the formation rate and fitness effect of other beneficial mutations are high (*s*_{B} = 0.1 and δ_{B} = 10^{−5}), the *GAP1* CNV selection coefficient was approximately 1.6-fold higher and the formation rate was approximately 2-fold lower (**S14 Fig**); however, posterior predictions were poor for this set of parameter values (**S15 Fig**), suggesting that these values are inappropriate.

### Experimental confirmation of fitness effects inferred from adaptive dynamics

To experimentally validate the inferred selection coefficients, we used lineage tracking to estimate the DFE [7,89,90]. We performed barseq on the entire evolving population at multiple time points and identified lineages that did and did not contain *GAP1* CNVs (**Fig 7A**). Using barcode trajectories to estimate fitness effects ([89]; see **Methods**), we identified 1,569 out of 80,751 lineages (1.94%) as adaptive in the bc01 population. A total of 1,513 (96.4%) adaptive lineages have a *GAP1* CNV (**Fig 7A**).

Fig 7. Comparison of DFE inferred using NPE, lineage-tracking barcodes, and competition assays.

**(A)** Barcode-based lineage frequency trajectories in experimental population bc01. Lineages with (green) and without (gray) *GAP1* CNVs are shown. **(B)** Two replicates of a pairwise competition assay for a single *GAP1* CNV containing lineage isolated from an evolving population. The selection coefficient for the clone is estimated from the slope of the linear model (blue line) and 95% CI (gray). **(C)** The DFE for all beneficial *GAP1* CNVs inferred from 11 populations using NPE and the WF (purple) and chemostat (Chemo; green) models compared with the DFE inferred from barcode frequency trajectories in the bc01 population (light blue) and the DFE inferred using pairwise competition assays with different *GAP1* CNV containing clones (gray). Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. CNV, copy number variant; DFE, distribution of fitness effects; NPE, Neural Posterior Estimation; WF, Wright–Fisher.

As a complementary experimental approach, selection coefficients can be directly measured using competition assays by fitting a linear model to the log ratio of the *GAP1* CNV strain and ancestral strain frequencies over time (**Fig 7B**). Therefore, we isolated *GAP1* CNV containing clones from populations bc01 and bc02, determined their fitness (**Methods**), and combined these estimates with previously reported selection coefficients for *GAP1* CNV containing clones isolated from populations gln01-gln09 [48] to define the DFE.
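The slope-based estimate can be sketched as an ordinary least-squares fit of the log frequency ratio against time in generations. The use of the natural logarithm, the synthetic frequencies, and the function name are assumptions for this example; in a two-strain competition the ancestral frequency is taken as one minus the CNV strain frequency.

```python
import math

def selection_coefficient(generations, cnv_freqs):
    """Least-squares slope of ln(CNV frequency / ancestral frequency) over time."""
    y = [math.log(f / (1 - f)) for f in cnv_freqs]
    n = len(y)
    mean_x = sum(generations) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in generations)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(generations, y))
    return sxy / sxx

# Synthetic assay (hypothetical): the ratio grows as exp(s * t) with s = 0.08
true_s = 0.08
generations = [0, 3, 6, 9, 12]
ratios = [math.exp(true_s * t) for t in generations]
cnv_freqs = [r / (1 + r) for r in ratios]

estimated_s = selection_coefficient(generations, cnv_freqs)
```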

The DFE for adaptive *GAP1* CNV lineages in bc01 inferred using lineage-tracking barcodes and the DFE from pairwise competition assays share similar properties with the distribution inferred using NPE from all experimental populations (**Fig 7C**). Thus, our inference framework using CNV adaptation dynamics provides a reliable estimate of the DFE, consistent with estimates from laborious experimental methods that are gold standards in the field.

## Discussion

In this study, we tested the application of simulation-based inference for determining key evolutionary parameters from observed adaptive dynamics in evolution experiments. We focused on the role of CNVs in adaptive evolution using experimental data in which we quantified the population frequency of de novo CNVs at a single locus using a fluorescent CNV reporter. The goal of our study was to test a new computational framework for simulation-based, likelihood-free inference, compare it to the state-of-the-art method, and apply it to estimate the *GAP1* CNV selection coefficient and formation rates in experimental evolution using glutamine-limited chemostats.

Our study yielded several important methodological findings. Using synthetic data, we tested 2 different algorithms for joint inference of evolutionary parameters, the effect of different evolutionary models on inference performance, and how best to determine a DFE using multiple experiments. We find that the neural network–based algorithm NPE outperforms ABC-SMC regardless of evolutionary model. Although a more complex evolutionary model better describes the evolution experiments performed in chemostats, we find that a standard Wright–Fisher model can be a sufficient approximation for inference using NPE. However, the inferred *GAP1* CNV formation rate under the Wright–Fisher model is higher than under the chemostat model (**Fig 6A and 6B**), which is consistent with the overestimation of formation rates using the Wright–Fisher model for inference when an observation is generated by the chemostat model and selection coefficients are high (**Fig 4C and 4D**). This suggests that Wright–Fisher is not the best suited model to use in all real-world cases, especially if many beneficial CNVs turn out to have strong selection coefficients. Finally, although it is possible to perform joint inference on multiple independent experimental observations to infer a DFE, we find that inference performed on individual experiments and post facto estimation of the distribution more accurately captures the underlying DFE.

Previous studies that applied likelihood-free inference to results of evolutionary experiments differ from our study in various ways [5,6,49]. First, they used serial dilution rather than chemostat experiments. Second, most focused on all beneficial mutations, whereas we categorize beneficial mutations into 2 categories: *GAP1* CNVs and all other beneficial mutations; thus, they used an evolutionary model with a single process generating genetic variation, whereas our study includes 2 such processes, but focuses inference on our mutation type of interest. Third, we used 2 different evolutionary models: the Wright–Fisher model, a standard model in evolutionary genetics, and a chemostat model. The latter is more realistic but also more computationally demanding. Fourth and importantly, previous studies applied relatively simple rejection ABC methods [5,6,49,69]. We applied 2 modern approaches: ABC with sequential Monte Carlo sampling [63], which is a computationally efficient algorithm for Bayesian inference, using an adaptive distance function [81]; and NPE [78–80] with NSF [84]. NPE approximates an amortized posterior distribution from simulations. Thus, it is more efficient than ABC-SMC, as it can estimate a posterior distribution for new observations without requiring additional training. This feature is especially useful when a more computationally demanding model is better (e.g., the chemostat model when selection coefficients are high). Our study is the first, to our knowledge, to use neural density estimation to apply likelihood-free inference to experimental evolution data.

Our application of simulation-based inference yielded new insights into the role of CNVs in adaptive evolution. Using a chemostat model we estimated *GAP1* CNV formation rate and selection coefficient from empirical population-level adaptive evolution dynamics and found that *GAP1* CNVs form at a rate of 10^{−4.7} to 10^{−4.0} per generation (approximately 1 in 10,000 cell divisions) and have selection coefficients of 0.04 to 0.1 per generation. We experimentally validated our inferred fitness estimates using barcode lineage tracking and pairwise competition assays and showed that simulation-based inference is in good agreement with the 2 different experimental methods. The formation rate that we have determined for *GAP1* CNVs is remarkably high. Locus-specific CNV formation rates are extremely difficult to determine and fluctuation assays have yielded estimates ranging from 10^{−12} to 10^{−6} [91–95]. Mutation accumulation studies have yielded genome-wide CNV rates of about 10^{−5} [32,37,38], which is an order of magnitude lower than our locus-specific formation rate. We posit 2 possible explanations for this high rate: (1) CNVs at the *GAP1* locus may be deleterious in most conditions, including the putative nonselective conditions used for mutation-selection experiments, and therefore underestimated in mutation accumulation assays due to negative selection; and (2) under nitrogen-limiting selective conditions, in which *GAP1* expression levels are extremely high, a mechanism of induced CNV formation may operate that increases the rate at which they are generated, as has been shown at other loci in the yeast genome [96,97]. The inferred rate of *GAP1* CNV formation in nitrogen-limiting conditions requires experimental confirmation.

This simulation-based inference approach can be readily extended to other evolution experiments. In this study, we performed inference of parameters for a single type of mutation. This approach could be extended to infer the rates and effects of multiple types of mutations simultaneously. For example, instead of assuming a rate and selection coefficient for other beneficial mutations and performing ex post facto analyses looking at the sensitivity of inference of *GAP1* CNV parameters in other beneficial mutation regimes, one could simultaneously infer parameters for both of these types of mutations. As shown using our barcode-sequencing data, many CNVs arise during adaptive evolution, and previous studies have shown that CNVs have different structures and mechanisms of formation [48,98]. Inferring a single effective selection coefficient and formation rate is a current limitation of our study that could be overcome by inferring rates and effects for different classes of CNVs (e.g., aneuploidy versus tandem duplication). Inspecting conditional correlations in posterior distributions involving multiple types of mutations has the potential to provide insights into how interactions between different classes of mutations shape evolutionary dynamics.

The approach could also be applied to CNV dynamics at other loci, in different genetic backgrounds, or in different media conditions. Ploidy and diverse molecular mechanisms likely impact CNV formation rates. For example, rates of aneuploidy, which result from nondisjunction errors, are higher in diploid yeast than haploid yeast, and chromosome gains are more frequent than chromosome losses [37]. There is considerable evidence for heterogeneity in the CNV rate between loci, as factors including local sequence features, transcriptional activity, genetic background, and the external environment may impact the mutation spectrum. For example, there is evidence that CNVs occur at a higher rate near certain genomic features, such as repetitive elements [42], tRNA genes [99], origins of replication [100], and replication fork barriers [101].

Furthermore, this approach could be used to infer formation rates and selection coefficients for other types of mutations in different asexually reproducing populations; the empirical data required is only the proportion of the population with a given mutation type over time, which can efficiently be determined using a phenotypic marker, or similar quantitative data such as whole-genome whole-population sequencing. Evolutionary models could be extended to more complex evolutionary scenarios including changing population sizes, fluctuating selection, and changing ploidy and reproductive strategy, with an ultimate goal of inferring their impact on a variety of evolutionary parameters and predicting evolutionary dynamics in complex environments and populations. Applications to tumor evolution and viral evolution are related problems that are likely amenable to this approach.

## Methods

All source code and data for performing the analyses and reproducing the figures are available at https://doi.org/10.17605/OSF.IO/E9D5X. Code is also available at https://github.com/graceave/cnv_sims_inference.

### Evolutionary models

We modeled the adaptive evolution from an isogenic asexual population with frequencies X_{A} of the ancestral (or wild type) genotype, X_{C} of cells with a *GAP1* CNV, and X_{B} of cells with a different type of beneficial mutation. Ancestral cells can gain a *GAP1* CNV or another beneficial mutation at rates δ_{C} and δ_{B}, respectively. Therefore, the frequencies of cells of different genotypes after mutation are

For simplicity, this model neglects cells with multiple mutations, which is reasonable for short timescales, such as those considered here.

In the discrete time Wright–Fisher model, the change in frequency due to natural selection is modeled by

where w_{i} is the relative fitness of cells with genotype i, and w̄ is the population mean fitness relative to the ancestral type. Relative fitness is related to the selection coefficient by

The change in frequency due to random genetic drift is given by

where *N* is the population size. In our simulations *N* = 3.3 × 10^{8}, the effective population size in the chemostat populations in our experiment (see the “Determining the effective population size in the chemostat” section).
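The mutation and selection steps described above can be sketched as a deterministic recursion on expected genotype frequencies. This is an illustration under stated assumptions, not the simulation code used for inference: relative fitness is assumed to be w = 1 + s, the function names are hypothetical, and the drift step (resampling *N* individuals each generation) is omitted, so the output is an expected trajectory rather than a stochastic one. The default values for other beneficial mutations follow those given in the Fig 6 legend.

```python
def wf_expected_step(x, s_c, s_b, delta_c, delta_b):
    """One generation of mutation then selection on (x_A, x_C, x_B)."""
    x_a, x_c, x_b = x
    # Mutation: ancestral cells gain a CNV or another beneficial mutation
    m_a = x_a * (1.0 - delta_c - delta_b)
    m_c = x_c + delta_c * x_a
    m_b = x_b + delta_b * x_a
    # Selection: weight by relative fitness (ASSUMPTION: w = 1 + s)
    w_a, w_c, w_b = 1.0, 1.0 + s_c, 1.0 + s_b
    w_bar = w_a * m_a + w_c * m_c + w_b * m_b   # population mean fitness
    return (w_a * m_a / w_bar, w_c * m_c / w_bar, w_b * m_b / w_bar)

def cnv_trajectory(generations, s_c, delta_c, s_b=1e-3, delta_b=1e-5):
    """Expected proportion of cells with a CNV, starting from all ancestral."""
    x = (1.0, 0.0, 0.0)
    trajectory = [x[1]]
    for _ in range(generations):
        x = wf_expected_step(x, s_c, s_b, delta_c, delta_b)
        trajectory.append(x[1])
    return trajectory

# Illustrative parameter values within the ranges discussed in the text
trajectory = cnv_trajectory(116, s_c=0.1, delta_c=1e-4)
```

A stochastic version would add a multinomial draw of *N* individuals from the post-selection frequencies in each generation.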

The chemostat model begins with a population size of 1.5 × 10^{7} and the concentration of the limiting nutrient in the growth vessel, *S*, is equal to the concentration of that nutrient in the fresh media, *S*_{0}. During continuous culture, the chemostat is continuously diluted as fresh media flows in and culture media and cells are removed at rate *D*. During the initial phase of growth, the population size grows, and the limiting nutrient concentration is reduced until a steady state is attained at which the population size and limiting nutrient concentration are maintained indefinitely. We extended the model for competition between 2 haploid clonal populations for a single growth-limiting resource in a chemostat from [73] to 3 populations such that

*Y*_{i} is the culture yield of strain i per mole of limiting nutrient. *r*_{A} is the Malthusian parameter, or intrinsic rate of increase, for the ancestral strain, and in the chemostat literature is frequently referred to as *μ*_{max}, the maximal growth rate. The growth rate in the chemostat, *μ*, depends on the concentration of the limiting nutrient with saturating kinetics. *k*_{i} is the substrate concentration at half-maximal *μ*. *r*_{C} and *r*_{B} are the Malthusian parameters for strains with a CNV and strains with another beneficial mutation, respectively, and are related to the ancestral Malthusian parameter and selection coefficient by [102]

The values for the parameters used in the chemostat model are in Table 1.

We simulated continuous time in the chemostat using the Gillespie algorithm with *τ*-leaping. Briefly, we calculate the rates of ancestral growth, ancestral dilution, CNV growth, CNV dilution, other mutant growth, other mutant dilution, mutation from ancestral to CNV, and mutation from ancestral to other mutant. For the next time interval *τ*, we calculate the number of times each event occurs during the interval using the Poisson distribution. The limiting substrate concentration is then adjusted accordingly. These steps repeat until the desired number of generations is reached.
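The *τ*-leaping loop can be illustrated with a deliberately reduced toy model containing a single genotype with only growth and dilution events. Everything specific in this sketch is an assumption: the Monod form of the growth rate, the deterministic nutrient update, the parameter values (which are hypothetical and not those of Table 1), and the Poisson sampler, which switches to a rounded normal approximation for large means. Additional genotypes and mutation events would contribute further Poisson-distributed event counts in the same way.

```python
import math
import random

def poisson(rng, lam):
    """Poisson draw: Knuth's method for small means, normal approximation otherwise."""
    if lam <= 0:
        return 0
    if lam < 30:
        limit, k, p = math.exp(-lam), 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        return k
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))

def tau_leap_chemostat(hours, tau=0.01, seed=0):
    """Tau-leap simulation of one strain in a chemostat (toy parameters)."""
    rng = random.Random(seed)
    D = 0.12     # dilution rate (1/h), hypothetical
    r = 0.35     # maximal growth rate (1/h), hypothetical
    k = 0.1      # nutrient concentration at half-maximal growth, hypothetical
    Y = 1e5      # cells produced per unit of nutrient, hypothetical
    S0 = 0.8     # nutrient concentration in fresh medium, hypothetical
    n, S, t = 1000, S0, 0.0
    while t < hours:
        mu = r * S / (k + S)                      # saturating growth rate
        births = poisson(rng, mu * n * tau)       # growth events in [t, t + tau)
        deaths = min(n + births, poisson(rng, D * n * tau))  # dilution events
        n += births - deaths
        # Nutrient: inflow/outflow by dilution, minus consumption by new cells
        S = max(0.0, S + D * (S0 - S) * tau - births / Y)
        t += tau
    return n, S

n_final, s_final = tau_leap_chemostat(200)
```

With these toy values the deterministic steady state is *S* = *kD*/(*r* − *D*) ≈ 0.052 and a population of about 7.5 × 10^{4} cells, which the stochastic trajectory should approach.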

For the chemostat model, we began counting generations after 48 hours, which is approximately the amount of time required for the chemostat to reach steady state and is the point at which we began recording generations in [48].

### Determining the effective population size in the chemostat

To determine the effective population size in the chemostat, and thus the population size to use with the Wright–Fisher model, we determined the conditional variance of the allele frequency in the next generation *p’* given the frequency in the current generation *p* in the chemostat. To do this, we simulated a chemostat population with 2 neutral alleles with frequencies *p* and *q* (*p + q = 1*), which begin at equal frequencies, *p = q*. We allowed the simulation to run for 1,000 generations, recording the frequency *p* at every generation, excluding the first 100 generations to ensure the population is at steady state. We then computed the conditional variance *Var(p’|p)* in each generation and estimated the effective population size as (where *t = 900* is the total number of generations) [103]:

The estimated effective population size in our chemostat conditions is 3.3 × 10^{8}, which is approximately two-thirds of the census population size *N* when the chemostat is at steady state.
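The idea behind this estimate can be illustrated with a short sketch. It relies on the haploid Wright–Fisher identity Var(*p’*|*p*) = *p*(1 − *p*)/*N*_{e} and uses the squared one-generation change in frequency as a single-sample proxy for the conditional variance. The averaging form below and the function name `estimate_ne` are assumptions for illustration, and the exact estimator from [103] may differ. For a self-contained check, the trajectory comes from a Wright–Fisher simulation with known size rather than from the chemostat model.

```python
import numpy as np

def estimate_ne(p):
    """Estimate N_e from a neutral allele-frequency trajectory p[0], p[1], ..., p[t]."""
    p = np.asarray(p, dtype=float)
    p_now, p_next = p[:-1], p[1:]
    # Under neutral haploid drift, E[(p' - p)^2 | p] = p (1 - p) / N_e,
    # so the mean of the scaled squared changes estimates 1 / N_e.
    inv_ne = np.mean((p_next - p_now) ** 2 / (p_now * (1.0 - p_now)))
    return 1.0 / inv_ne

# Neutral two-allele trajectory starting at p = q = 0.5 in a population of known size.
rng = np.random.default_rng(1)
N = 10_000
p = [0.5]
for _ in range(1000):
    p.append(rng.binomial(N, p[-1]) / N)

ne_hat = estimate_ne(p[100:])  # discard the first 100 generations, leaving t = 900
```

Because the test trajectory is generated with a known *N*, the estimate should land near that value, which gives a quick sanity check before applying the same function to frequencies recorded from a chemostat simulation.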

### Inference methods

For inference using single observations, we used the proportion of the population with a *GAP1* CNV at 25 time points as our summary statistics and defined a log-uniform prior for the formation rate ranging from 10^{−12} to 10^{−3} and a log-uniform prior for the selection coefficient from 10^{−4} to 0.4.
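A log-uniform prior places equal probability on each order of magnitude within its range. The following minimal sketch draws from the two priors stated above; the helper name `sample_log_uniform` is hypothetical and is not part of *pyABC* or *sbi*.

```python
import math
import random

def sample_log_uniform(low, high, rng):
    """Draw x such that log10(x) is uniform on [log10(low), log10(high)]."""
    return 10.0 ** rng.uniform(math.log10(low), math.log10(high))

rng = random.Random(0)
draws = [
    (
        sample_log_uniform(1e-12, 1e-3, rng),  # CNV formation rate
        sample_log_uniform(1e-4, 0.4, rng),    # CNV selection coefficient
    )
    for _ in range(1000)
]
```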

For inference using sets of observations, we used a uniform prior for *α* from 0.5 to 15, a log-uniform prior for *β* from 10^{−3} to 0.8, and a log-uniform prior for the formation rate ranging from 10^{−12} to 10^{−3}. For use with NPE, we used a 3-layer sequential neural network with linear transformations in each layer and rectified linear units as the activation functions to encode the observation set into 5 summary statistics, which we then used as an embedding net with NPE.

We applied ABC-SMC implemented in the Python package *pyABC* [70]. For inference using single observations, we used an adaptively weighted Euclidean distance function with the root mean square deviation as the scale function. For inference using a set of observations, we used the squared Euclidean distance as our distance metric. We used 100 samples from the prior for initial calibration before the first round, and a maximum acceptance rate of either 10,000 or 100,000 for both single observations and observation sets (i.e., 10,000 single observations or 10,000 sets of 11 observations). For the acceptance rate of 10,000, we started inference with 100 samples, had a maximum of 1,000 accepted samples per round, and a maximum of 10 rounds. For the acceptance rate of 100,000, we started inference with 1,000 samples, had a maximum of 10,000 accepted samples per round, and a maximum of 10 rounds. The exact number of samples from the proposal distribution during each round of sampling was adaptively determined based on the shape of the current posterior distribution [82]. For inference of the posterior for each observation, we performed multiple rounds of sampling until either we reached the acceptance threshold ε ≤ 0.002 or 10 rounds were performed.
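To make the accept/reject logic behind ABC concrete, the sketch below implements plain rejection ABC, which is a deliberately simpler technique than the ABC-SMC procedure in *pyABC*: there are no sequential rounds, no adaptive distance, and no proposal adaptation. The deterministic mutation-selection recursion used as the simulator, the evenly spaced time points, and all function names are assumptions for illustration only.

```python
import math
import random

# 25 time points over 267 generations (even spacing is an illustrative assumption).
GENERATIONS = [round(i * 267 / 24) for i in range(25)]

def simulate(s, delta, generations=GENERATIONS):
    """Toy deterministic stand-in for the simulator: CNV frequency over time."""
    p, t, out = 0.0, 0, []
    for g in generations:
        while t < g:
            p = p + delta * (1.0 - p)          # new CNVs arise in non-CNV cells
            p = p * (1.0 + s) / (1.0 + s * p)  # selection favors CNV cells
            t += 1
        out.append(p)
    return out

def distance(a, b):
    """Root mean square difference between two trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def log_uniform(low, high, rng):
    return 10.0 ** rng.uniform(math.log10(low), math.log10(high))

def abc_rejection(observed, n_draws=2000, keep=20, seed=0):
    """Keep the `keep` prior draws whose simulations are closest to the observation."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_draws):
        s = log_uniform(1e-4, 0.4, rng)
        delta = log_uniform(1e-12, 1e-3, rng)
        scored.append((distance(simulate(s, delta), observed), s, delta))
    scored.sort()  # smallest distance first
    return scored[:keep]

observed = simulate(0.1, 1e-5)       # synthetic observation with known parameters
accepted = abc_rejection(observed)   # list of (distance, s, delta) tuples
```

ABC-SMC refines this scheme by running several rounds, each proposing from the previously accepted samples and tightening the acceptance threshold, which is why its simulation cost grows with every new observation.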

We applied NPE implemented in the Python package *sbi* [71] using a MAF [83] or a NSF [84] as a conditional density estimator that learns an amortized posterior density for single observations. We used either 10,000 or 100,000 simulations to train the network. To test the dependence of our results on the set of simulations used to learn the posterior, we trained 3 independent amortized networks with different sets of simulations generated from the prior and compared our resulting posterior distributions for each observation.

### Assessment of performance of each method with each model

To test each method, we simulated 5 populations for each combination of the following CNV formation rates and fitness effects: *s*_{C} = 0.001 and δ_{C} = 10^{−5}; *s*_{C} = 0.1 and δ_{C} = 10^{−5}; *s*_{C} = 0.001 and δ_{C} = 10^{−7}; *s*_{C} = 0.1 and δ_{C} = 10^{−7}, for both the Wright–Fisher model and the chemostat model, resulting in 40 total simulated observations. We independently inferred the CNV fitness effect and formation rate for each simulated observation 3 times.

We calculated the MAP estimate by first estimating a Gaussian kernel density estimate (KDE) using *SciPy* (*scipy*.*stats*.*gaussian_kde*) [104] with at least 1,000 parameter combinations and their weights drawn from the posterior distribution. We then found the maximum of the KDE (using *scipy*.*optimize*.*minimize* with the Nelder–Mead solver). We calculated the 95% HDIs for the MAP estimate of each parameter using *pyABC* (*pyabc*.*visualization*.*credible*.*compute_credible_interval*) [70].
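The MAP procedure described above can be sketched as follows. The wrapper `map_estimate` and the choice of starting the optimizer at the highest-density sample are assumptions for illustration; only `scipy.stats.gaussian_kde` and `scipy.optimize.minimize` with the Nelder–Mead solver come from the text. The samples here are synthetic stand-ins for posterior draws of (log10 δ_{C}, log10 *s*_{C}).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gaussian_kde

def map_estimate(samples, weights=None):
    """Approximate the posterior mode from samples of shape (n_samples, n_params)."""
    samples = np.asarray(samples, dtype=float)
    kde = gaussian_kde(samples.T, weights=weights)   # gaussian_kde expects (n_params, n_samples)
    start = samples[np.argmax(kde(samples.T))]       # highest-density sample as initial guess
    # Maximize the KDE by minimizing its negative with the Nelder-Mead solver.
    result = minimize(lambda x: -kde(x)[0], start, method="Nelder-Mead")
    return result.x

# Synthetic stand-in for posterior samples (illustrative values only).
rng = np.random.default_rng(0)
samples = rng.normal(loc=[-5.0, -1.0], scale=[0.3, 0.1], size=(2000, 2))
theta_map = map_estimate(samples)
```

Working in log10 space keeps the two parameters on comparable scales, which helps both the bandwidth selection of the KDE and the simplex search.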

We performed posterior predictive checks by simulating CNV dynamics using the MAP estimate as well as 50 parameter values sampled from the posterior distribution. We calculated RMSE and correlation to measure agreement of the 50 posterior predictions with the observation and report the mean and 95% confidence intervals for these measures. For inference on sets of observations, we calculated the RMSE and correlation coefficient between the posterior predictions and each of the 3 held-out observations, and report the mean and 95% confidence intervals for these measures over all 3 held-out observations.
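The two agreement measures used in the posterior predictive checks can be written compactly. The function names below are hypothetical, and the normal-approximation confidence interval (mean ± 1.96 standard errors) is an assumption, since the text does not state how the intervals were computed.

```python
import math
import statistics

def rmse(pred, obs):
    """Root mean square error between a predicted and an observed trajectory."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, obs)) / len(obs))

def pearson(pred, obs):
    """Pearson correlation coefficient between two trajectories."""
    mp, mo = statistics.fmean(pred), statistics.fmean(obs)
    cov = sum((a - mp) * (b - mo) for a, b in zip(pred, obs))
    var_p = sum((a - mp) ** 2 for a in pred)
    var_o = sum((b - mo) ** 2 for b in obs)
    return cov / math.sqrt(var_p * var_o)

def mean_ci95(values):
    """Mean and normal-approximation 95% confidence interval (assumed method)."""
    m = statistics.fmean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half
```

In use, `rmse` and `pearson` would be applied to each of the 50 posterior predictions against the observation, and `mean_ci95` would then summarize the 50 resulting values.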

We calculated AIC utilizing the usual components

where the formula is evaluated at the MAP estimate, *k* = 2 is the number of inferred parameters, *y* is the observed data, and *p* is the inferred posterior distribution. We calculated *Watanabe-AIC* or WAIC according to both commonly used formulas:

where *S* is the number of draws from the posterior distribution, *θ*^{s} is a sample from the posterior, and the variance term is the posterior sample variance.
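The displayed AIC and WAIC expressions are not reproduced in this text. As a hedged reconstruction from the definitions given, the standard forms would read as below, where $\hat{\theta}$ for the MAP estimate and $V_{s=1}^{S}$ for the sample variance over posterior draws are notation assumed here. Writing the density as $p(\theta \mid y)$ follows the statement that *p* is the inferred posterior; the original article may write these terms differently.

```latex
\begin{aligned}
\mathrm{AIC} &= -2 \log p(\hat{\theta} \mid y) + 2k,\\[4pt]
\mathrm{WAIC}_1 &= -2 \log\!\left(\frac{1}{S}\sum_{s=1}^{S} p(\theta^{s} \mid y)\right)
  + 4\left[\log\!\left(\frac{1}{S}\sum_{s=1}^{S} p(\theta^{s} \mid y)\right)
  - \frac{1}{S}\sum_{s=1}^{S} \log p(\theta^{s} \mid y)\right],\\[4pt]
\mathrm{WAIC}_2 &= -2 \log\!\left(\frac{1}{S}\sum_{s=1}^{S} p(\theta^{s} \mid y)\right)
  + 2\, V_{s=1}^{S}\!\left(\log p(\theta^{s} \mid y)\right).
\end{aligned}
```

The two WAIC variants share the same fit term and differ only in the penalty: the first uses the gap between the log of the mean density and the mean of the log density, and the second uses the sample variance of the log density.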

### Pairwise competitions

We isolated CNV-containing clones from the populations on the basis of fluorescence and performed pairwise competitions between each clone and an unlabeled ancestral (FY4) strain. We also performed competitions between the ancestral *GAP1* CNV reporter strain, with and without barcodes. To perform the competitions, we grew fluorescent *GAP1* CNV clones and ancestral clones in glutamine-limited chemostats until they reached steady state [48]. We then mixed the fluorescent strains with the unlabeled ancestor in a ratio of approximately 1:9 and performed competitions in the chemostats for 92 hours or about 16 generations, sampling approximately every 2 to 3 generations. For each time point, at least 100,000 cells were analyzed using an Accuri flow cytometer to determine the relative abundance of each genotype. Previously, we established that the ancestral *GAP1* CNV reporter has no detectable fitness effect compared to the unlabeled ancestral strain [48]. However, the *GAP1* CNV reporter with barcodes does appear to have a slight fitness cost associated with it; therefore, we took slightly different approaches to determine the selection coefficient relative to the ancestral state depending on whether or not a *GAP1* CNV-containing clone was barcoded. If a clone was not barcoded, we determined relative fitness using linear regression of the log ratio of the frequency of the 2 genotypes against the number of elapsed hours. If a clone was barcoded, relative fitness was computed using linear regression of the log ratio of the frequencies of the barcoded *GAP1* CNV-containing clone and the unlabeled ancestor, and the log ratio of the frequencies of the unevolved barcoded *GAP1* CNV reporter ancestor to the unlabeled ancestor against the number of elapsed hours, adding an additional interaction term for the evolved versus ancestral state. We converted relative fitness from per hour to per generation by dividing by the natural log of two.
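For the non-barcoded case, the regression described above reduces to an ordinary least-squares slope of the log ratio against time. The sketch below uses hypothetical function names and synthetic frequencies, and it applies the per-generation conversion exactly as stated in the text. The barcoded case, with its interaction term, is not covered.

```python
import math

def ols_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def relative_fitness(hours, mutant_freq):
    """Return (per-hour, per-generation) relative fitness from mutant frequencies."""
    # Log ratio of the two genotypes: mutant frequency over ancestor frequency.
    log_ratio = [math.log(f / (1.0 - f)) for f in mutant_freq]
    per_hour = ols_slope(hours, log_ratio)
    per_generation = per_hour / math.log(2)  # conversion as stated in the text
    return per_hour, per_generation

# Synthetic competition starting near a 1:9 ratio with a true slope of 0.01 per hour.
hours = [0, 12, 24, 36, 48, 60, 72, 84, 92]
ratios = [(1 / 9) * math.exp(0.01 * t) for t in hours]
freqs = [r / (1.0 + r) for r in ratios]
per_hour, per_generation = relative_fitness(hours, freqs)
```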

### Barcode sequencing

In our prior study, populations with lineage tracking barcodes and the *GAP1* CNV reporter were evolved in glutamine-limited chemostats [48], and whole population samples were periodically frozen in 15% glycerol. To extract DNA, we thawed and pelleted cells using centrifugation and extracted genomic DNA using a modified Hoffman–Winston protocol, preceded by incubation with zymolyase at 37°C to enhance cell lysis [105]. We measured DNA quantity using a fluorometer and used all DNA from each sample as input to a sequential PCR protocol to amplify DNA barcodes, which were then purified using a Nucleospin PCR clean-up kit, as described previously [48,89].

We measured fragment size with an Agilent TapeStation 2200 and performed qPCR to determine the final library concentration. DNA libraries were sequenced using a paired-end 2 × 150 bp protocol on an Illumina NovaSeq 6000 using an XP workflow. Standard metrics were used to assess data quality (Q30 and %PF). We used the Bartender algorithm with UMI handling to account for PCR duplicates and to cluster sequences with merging decisions based solely on distance except in cases of low coverage (<500 reads/barcode), for which the default cluster merging threshold was used [69]. Clusters with a size less than 4 or with high entropy (>0.75 quality score) were discarded. We estimated the relative abundance of barcodes using the number of unique reads supporting a cluster compared to total library size. Raw sequencing data are available through the SRA, BioProject ID PRJNA767552.

### Detecting adaptive lineages in barcoded clonal populations

To detect spontaneous adaptive mutations in a barcoded clonal cell population that is evolved over time, we used a Python-based pipeline (which can be found at https://github.com/FangfeiLi05/PyFitMut) based on a previously developed theoretical framework [89]. The pipeline identifies adaptive lineages and infers their fitness effects and establishment time. In a barcoded population, a lineage refers to cells that share the same DNA barcode. For each lineage in the barcoded population, beneficial mutations continually occur at a total beneficial mutation rate Ub, with fitness effect s, which results in a certain spectrum of fitness effects of mutations μ(s). If a beneficial mutant survives random drift and becomes large enough to grow deterministically (exponentially), we say that the mutation carried by the mutant has established. Here, we use Wright fitness s, which is defined as the average number of additional offspring of a cell per generation, that is, n(t) = n(0)·(1 + s)^{t}, with n(t) being the total number of cells at generation t (can be nonintegers). Briefly, for each lineage, assuming that the lineage is adaptive (i.e., a lineage with a beneficial mutation occurred and established), estimates of the fitness effect and establishment time of each lineage are made by random initialization, and the expected trajectory of each lineage is estimated and compared to the measured trajectory. Fitness effect and establishment time estimates are iteratively adjusted to better fit the observed data until an optimum is reached. At the same time, the expected trajectory of the lineage is also estimated assuming that the lineage is neutral. Finally, Bayesian inference is used to determine whether the lineage is adaptive or neutral.
An accurate estimation of the mean fitness is necessary to detect mutations and quantify their fitness effects, but the mean fitness is a quantity that cannot be measured directly from the evolution. Rather, it needs to be inferred through other variables. Previously, the mean fitness was estimated by monitoring the decline of neutral lineages [89]. However, this method fails when there is an insufficient number of neutral lineages due to low sequencing read depth. Here, we instead estimate the mean fitness using an iterative method. Specifically, we first initialize the mean fitness of the population as zero at each sequencing time point, then we estimate the fitness effect and establishment time for adaptive mutations, then we recalculate the mean fitness with the optimized fitness and establishment time estimates, repeating the process for several iterations until the mean fitness converges.

## Supporting information

### S1 Fig. Interpolation for bc01 and bc02.

Populations gln01-gln09 and bc01-bc02 have different time points: the gln populations have 25 time points in total, whereas the bc populations have 32 time points in total. Of these, 12 of the time points are the same in both populations. To match the time points in the gln populations, we interpolated from the 2 nearest time points in the bc populations (using pandas.DataFrame.interpolate(“values”)). This way, we can use the same data (same time points) for inference for all 11 populations so that we can use the same amortized NPE posterior to infer parameters for both gln populations and bc populations. Original bc data are shown as black dots; the matched data, with interpolated time points, are shown as red crosses. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. NPE, Neural Posterior Estimation.

https://doi.org/10.1371/journal.pbio.3001633.s003

(PNG)
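The interpolation described for S1 Fig can be sketched with pandas. The generation numbers and frequencies below are made up for illustration, and the reindex-then-interpolate pattern is one assumed way of matching time points; only the `interpolate` call with the "values" method comes from the caption.

```python
import pandas as pd

# Hypothetical bc-population measurements indexed by generation.
bc = pd.Series([0.0, 0.1, 0.3], index=[0, 10, 30])

# Hypothetical gln-population time points to match.
target = [0, 5, 20, 30]

# Add the target generations to the index, then interpolate linearly using the
# numerical values of the index (generation), not the row positions.
union = bc.index.union(target)
matched = bc.reindex(union).interpolate(method="values").loc[target]
```

Using `method="values"` matters when time points are unevenly spaced: the default linear method treats rows as equally spaced, whereas this one weights by the actual distance between generations.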

### S2 Fig. Performance assessment of NPE with MAF using single simulated synthetic observations.

These show the results of inference on 5 simulated synthetic observations generated using either the WF or chemostat (Chemo) model (and inference performed with the same model) per combination of fitness effect *s*_{C} and formation rate δ_{C}. Here, we show the results of performing one training set with NPE with MAF using 100,000 simulations for training and using the same amortized network to infer a posterior for each replicate synthetic observation. **(A)** Percentage of true parameters within the 50% HDR. **(B)** Distribution of widths of the fitness effect *s*_{C} 95% HDI calculated as the difference between the 97.5 percentile and 2.5 percentile, for each inferred posterior distribution. **(C)** Distribution of the number of orders of magnitude encompassed by the formation rate δ_{C} 95% HDI, calculated as the difference of the base 10 logarithms of the 97.5 percentile and 2.5 percentile, for each inferred posterior distribution. **(D)** Log ratio of MAP estimate as compared to true parameters for *s*_{C} and δ_{C}. Note that each panel has a different y-axis. **(E)** Mean and 95% confidence interval for RMSE of 50 posterior predictions as compared to the synthetic observation for which inference was performed. **(F)** RMSE of posterior prediction generated with MAP parameters as compared to the synthetic observation for which inference was performed. **(G)** Mean and 95% confidence interval for correlation coefficient of 50 posterior predictions compared to the synthetic observation for which inference was performed. **(H)** Correlation coefficient of posterior prediction generated with MAP parameters compared to the synthetic observation for which inference was performed. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. HDI, highest density interval; HDR, highest density region; MAF, masked autoregressive flow; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; RMSE, root mean square error; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s004

(PNG)

### S3 Fig. NPE with the WF model performs as well or better than other combinations of model and method.

Results of inference on 5 simulated single synthetic observations generated using either the WF or chemostat (Chemo) model (and inference performed with the same model) per combination of fitness effect *s*_{C} and formation rate δ_{C}. Here, we show the results of performing training with NPE with NSF using 100,000 simulations for training and using the same amortized network to infer a posterior for each replicate synthetic observation, or ABC-SMC when the training budget was 10,000. **(A)** RMSE (lower is better) of posterior prediction generated with MAP parameters as compared to the synthetic observation on which inference was performed. **(B)** Correlation coefficient (higher is better) of posterior prediction generated with MAP parameters compared to the synthetic observation on which inference was performed. **(C)** Mean and 95% confidence interval for correlation coefficient (higher is better) of 50 posterior predictions (sampled from the posterior distribution) compared to the synthetic observation on which inference was performed. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; RMSE, root mean square error; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s005

(PNG)

### S4 Fig. NPE and WF have the lowest information criteria.

WAIC and AIC (lower is better) of models fitted on single synthetic observations using either the WF or chemostat (Chemo) model and either ABC-SMC or NPE for different combinations of fitness effect *s*_{C} and formation rate δ_{C} with simulation budgets of 10,000 or 100,000 simulations per inference procedure (facets). We were unable to complete ABC-SMC with the chemostat model (red) when the training budget was 100,000 within a reasonable time frame. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; AIC, Akaike information criterion; NPE, Neural Posterior Estimation; WAIC, widely applicable information criterion; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s006

(PNG)

### S5 Fig. NPE performs similar to or better than ABC-SMC for 8 additional parameter combinations.

The figure shows the results of inference on 5 simulated synthetic observations using the WF model per combination of fitness effect *s*_{C} and formation rate δ_{C}. Simulations and inference were performed using the same model. For NPE, each training set corresponds to an independently amortized posterior distribution trained on a different set of 100,000 simulations, with which each synthetic observation was evaluated to produce a separate posterior distribution. For ABC-SMC, each training set corresponds to independent inference procedures on each observation with a maximum of 100,000 total simulations accepted for each inference procedure and a stopping criterion of 10 iterations or ε ≤ 0.002, whichever occurs first. **(A)** The percent of true parameters within the 50% or 95% HDR of the inferred posterior distribution. The bar height shows the average of 3 training sets. **(B, C)** Distribution of widths of 95% HDI of the posterior distribution of the fitness effect *s*_{C} (B) and CNV formation rate δ_{C} (C), calculated as the difference between the 97.5 percentile and 2.5 percentile, for each separately inferred posterior distribution. **(D)** Log ratio (relative error) of MAP estimate to true parameter for *s*_{C} and δ_{C}. Note the different y-axis ranges. A perfectly accurate MAP estimate would have a log ratio of zero. **(E)** Mean and 95% confidence interval for RMSE of 50 posterior predictions as compared to the synthetic observation for which inference was performed. **(F)** RMSE of posterior prediction generated with MAP parameters as compared to the synthetic observation for which inference was performed. **(G)** Mean and 95% confidence interval for correlation coefficient of 50 posterior predictions compared to the synthetic observation for which inference was performed. **(H)** Correlation coefficient of posterior prediction generated with MAP parameters compared to the synthetic observation for which inference was performed. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; HDI, highest density interval; HDR, highest density region; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; RMSE, root mean square error; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s007

(PNG)

### S6 Fig. Effect of simulation budget on relative error of MAP estimate and width of HDIs.

For NPE, amortized posteriors were estimated using either 10,000 or 100,000 simulations, with which each synthetic observation was evaluated to produce a separate posterior distribution. For ABC-SMC, a posterior was independently inferred for each observation with a maximum of 10,000 or 100,000 total simulations accepted and a stopping criterion of 10 iterations or ε ≤ 0.002, whichever occurs first. The gray lines in **(A, D)** indicate a relative error of zero (i.e., no difference between MAP parameters and true parameters). **(D, E, F)** We were unable to complete ABC-SMC with the chemostat model (red) when the training budget was 100,000 within a reasonable time frame. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; MAP, maximum a posteriori; NPE, Neural Posterior Estimation.

https://doi.org/10.1371/journal.pbio.3001633.s008

(PNG)

### S7 Fig. The cumulative number of simulations needed to estimate posterior distributions for multiple observations.

The x-axis shows the number of replicate simulated synthetic observations for a combination of parameters, and the y-axis shows the cumulative number of simulations needed to infer posteriors for an increasing number of observations (see the “Overview of inference strategies” section for more details), for observations with different combinations of CNV selection coefficient *s*_{C} and CNV formation rate δ_{C} **(A–D)**. Each facet represents a total simulation budget for NPE, or the maximum number of accepted simulations for ABC-SMC. Since NPE uses amortization, a single amortized network is trained with 10,000 or 100,000 simulations, and that network is then used to infer posteriors for each observation (note that a single amortized network was used to infer posteriors for all parameter combinations). For ABC-SMC, each observation requires a separate inference procedure to be performed individually, and not all generated simulations are accepted for posterior estimation; therefore, the number of simulations used for a single observation may be greater than the acceptance threshold, and the number of simulations needed increases with the number of observations for which a posterior is inferred. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. ABC-SMC, Approximate Bayesian Computation with Sequential Monte Carlo; CNV, copy number variant; NPE, Neural Posterior Estimation.

https://doi.org/10.1371/journal.pbio.3001633.s009

(PNG)

### S8 Fig. Results of inference on 5 simulated synthetic observations generated using either the WF or chemostat (Chemo) model per combination of fitness effect *s*_{C} and formation rate δ_{C}.

We performed inference on each synthetic observation using both models. For NPE, each training set corresponds to an independent amortized posterior trained with 100,000 simulations, with which each synthetic observation was evaluated. **(A)** Percentage of true parameters within the 50% HDR. The bar height shows the average of 3 training sets. **(B)** Percentage of true parameters within the 95% HDR. The bar height shows the average of 3 training sets. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. HDR, highest density region; NPE, Neural Posterior Estimation; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s010

(PNG)

### S9 Fig. A set of 11 simulated synthetic observations was generated from a WF model with CNV selection coefficients sampled from a Gamma distribution of fitness effects (DFE) where *α* = 10 (black curve).

The MAP DFEs (blue curves) were directly inferred using 3 different subsets of 8 out of 11 synthetic observations. We also inferred the selection coefficient for each observation in the set of 11 individually, and fit Gamma distributions to sets of 8 inferred selection coefficients (green curves). All inferences were performed with NPE using the same amortized network to infer a posterior for each set of 8 synthetic observations or each single observation. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. DFE, distribution of fitness effects; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s011

(PNG)

### S11 Fig.

Proportion of the population with a *GAP1* CNV in the experimental observations (black) and in posterior predictions using the MAP estimate shown in panels A and B with either the WF or chemostat (Chemo) model. Inference was performed with all data up to generation 267 (WF ppc 267, Chemo ppc 267), or excluding data after generation 116 (WF ppc 116, Chemo ppc 116). Formation rate and fitness effect of other beneficial mutations were set to 10^{−5} and 10^{−3}, respectively. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. MAP, maximum a posteriori; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s013

(PNG)

### S12 Fig. MAP predictions have lower error when inference is performed using only up to generation 116 and are most accurate for the first 116 generations.

MAP posterior prediction RMSE when inference was performed excluding data after generation 116 (left) or using all data up to generation 267 (right). RMSE was calculated using either the first 116 generations or using up to generation 267 (x-axis). Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. MAP, maximum a posteriori; RMSE, root mean square error.

https://doi.org/10.1371/journal.pbio.3001633.s014

(PNG)

### S13 Fig.

The inferred MAP estimate and 95% HDIs for fitness effect *s*_{C} and formation rate δ_{C}, using the **(A)** WF or **(B)** chemostat (Chemo) model and NPE for each experimental population from Lauer and colleagues (2018). Inference was either performed with data up to generation 116 or with all data, up to generation 267 (facets). Each training set corresponds to 3 independent amortized posterior distributions estimated with 100,000 simulations. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. HDI, highest density interval; MAP, maximum a posteriori; NPE, Neural Posterior Estimation; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s015

(PNG)

### S14 Fig. Sensitivity analysis.

*GAP1* CNV formation rate and selection coefficient inferred using NPE with the WF model do not change considerably when other beneficial mutations have different selection coefficients *s*_{B} and formation rates δ_{B}, except when both *s*_{B} and δ_{B} are high (purple). Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. CNV, copy number variant; NPE, Neural Posterior Estimation; WF, Wright–Fisher.

https://doi.org/10.1371/journal.pbio.3001633.s016

(PNG)

### S15 Fig.

Mean and 95% confidence interval for RMSE **(A)** and correlation **(B)** of 50 posterior predictions compared to empirical observations up to generation 116, using posterior distributions inferred when other beneficial mutations have different selection coefficients *s*_{B} and formation rates δ_{B}. Data and code required to generate this figure can be found at https://doi.org/10.17605/OSF.IO/E9D5X. RMSE, root mean square error.

https://doi.org/10.1371/journal.pbio.3001633.s017

(PNG)

## Acknowledgments

We thank Uri Obolski, Ilia Kohanovski, Mark Siegal, Molly Przeworski, and members of the Gresham and Ram labs for discussions and comments.

## References

1. Gallet R, Cooper TF, Elena SF, Lenormand T. Measuring selection coefficients below 10(-3): method, questions, and prospects. Genetics. 2012;190:175–86. pmid:22042578
2. Ram Y, Dellus-Gur E, Bibi M, Karkare K, Obolski U, Feldman MW, et al. Predicting microbial growth in a mixed culture from growth curve data. Proc Natl Acad Sci U S A. 2019;116:14698–707. pmid:31253703
3. Kondrashov FA, Kondrashov AS. Measurements of spontaneous rates of mutations in the recent past and the near future. Philosophical Transactions of the Royal Society B: Biological Sciences. 2010:1169–76. pmid:20308091
4. de Sousa JAM, Campos PRA, Gordo I. An ABC Method for Estimating the Rate and Distribution of Effects of Beneficial Mutations. Genome Biol Evol. 2013:794–806. pmid:23542207
5. Hegreness M, Shoresh N, Hartl D, Kishony R. An equivalence principle for the incorporation of favorable mutations in asexual populations. Science. 2006;311:1615–7. pmid:16543462
6. Barrick JE, Kauth MR, Strelioff CC, Lenski RE. *Escherichia coli* rpoB mutants have increased evolvability in proportion to their fitness defects. Mol Biol Evol. 2010;27:1338–47. pmid:20106907
7. Nguyen Ba AN, Cvijović I, Rojas Echenique JI, Lawrence KR, Rego-Costa A, Liu X, et al. High-resolution lineage tracking reveals travelling wave of adaptation in laboratory yeast. Nature. 2019;575:494–9. pmid:31723263
8. Lang GI, Botstein D, Desai MM. Genetic Variation and the Fate of Beneficial Mutations in Asexual Populations. Genetics. 2011:647–61. pmid:21546542
9. Torada L, Lorenzon L, Beddis A, Isildak U, Pattini L, Mathieson S, et al. ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics. 2019;20:337. pmid:31757205
10. Weinreich DM, Delaney NF, Depristo MA, Hartl DL. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science. 2006;312:111–4. pmid:16601193
11. MacLean RC, Buckling A. The distribution of fitness effects of beneficial mutations in *Pseudomonas aeruginosa*. PLoS Genet. 2009;5:e1000406. pmid:19266075
12. Zuellig MP, Sweigart AL. Gene duplicates cause hybrid lethality between sympatric species of Mimulus. PLoS Genet. 2018;14:e1007130. pmid:29649209
13. Dhami MK, Hartwig T, Fukami T. Genetic basis of priority effects: insights from nectar yeast. Proc Biol Sci. 2016;283. pmid:27708148
14. Turner KM, Deshpande V, Beyter D, Koga T, Rusert J, Lee C, et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature. 2017;543:122–5. pmid:28178237
15. Geiger T, Cox J, Mann M. Proteomic changes resulting from gene copy number variations in cancer cells. PLoS Genet. 2010;6:e1001090–0. pmid:20824076
16. Stratton MR, Campbell PJ, Futreal PA. The cancer genome. Nature. 2009;458:719–24. pmid:19360079
17. Harrison M-C, LaBella AL, Hittinger CT, Rokas A. The evolution of the GALactose utilization pathway in budding yeasts. Trends Genet. 2021. pmid:34538504
18. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. Natural selection has driven population differentiation in modern humans. Nat Genet. 2008;40:340–5. pmid:18246066
19. Iskow RC, Gokcumen O, Abyzov A, Malukiewicz J, Zhu Q, Sukumar AT, et al. Regulatory element copy number differences shape primate expression profiles. Proc Natl Acad Sci U S A. 2012;109:12656–61. pmid:22797897
20. Zarrei M, MacDonald JR, Merico D, Scherer SW. A copy number variation map of the human genome. Nat Rev Genet. 2015;16:172–83. pmid:25645873
21. Ramirez O, Olalde I, Berglund J, Lorente-Galdos B, Hernandez-Rodriguez J, Quilez J, et al. Analysis of structural diversity in wolf-like canids reveals post-domestication variants. BMC Genomics. 2014;15:465–5. pmid:24923435
22. Clop A, Vidal O, Amills M. Copy number variation in the genomes of domestic animals. Anim Genet. 2012;43:503–17. pmid:22497594
23. Żmieńko A, Samelak A, Kozłowski P, Figlerowicz M. Copy number polymorphism in plant genomes. Theor Appl Genet. 2014;127:1–18. pmid:23989647
24. Greenblum S, Carr R, Borenstein E. Extensive strain-level copy-number variation across human gut microbiome species. Cell. 2015;160:583–94. pmid:25640238
25.

Nair S, Miller B, Barends M, Jaidee A, Patel J, Mayxay M, et al. Adaptive copy quantity evolution in malaria parasites. PLoS Genet. 2008;4:e1000243. pmid:18974876 - 26.

Iantorno SA, Durrant C, Khan A, Sanders MJ, Beverley SM, Warren WC, et al. Gene Expression in Leishmania Is Regulated Predominantly by Gene Dosage. MBio. 2017;8. pmid:28900023 - 27.

Dulmage KA, Darnell CL, Vreugdenhil A, Schmid AK. Copy quantity variation is related to gene expression change in archaea. Microb Genom. 2018. pmid:30142055 - 28.

Gao Y, Zhao H, Jin Y, Xu X, Han G-Z. Extent and evolution of gene duplication in DNA viruses. Virus Res. 2017;240:161–5. pmid:28822699 - 29.

Rezelj VV, Levi LI, Vignuzzi M. The faulty element of viral populations. Curr Opin Virol. 2018;33:74–80. pmid:30099321 - 30.

Elde NC, Youngster SJ, Eickbush MT, Kitzman JO, Rogers KS, Shendure J, et al. Poxviruses deploy genomic accordions to adapt quickly in opposition to host antiviral defenses. Cell. 2012;150:831–41. pmid:22901812 - 31.

Ben-David U, Amon A. Context is all the pieces: aneuploidy in most cancers. Nat Rev Genet. 2019. pmid:31548659 - 32.

Zhu YO, Siegal ML, Corridor DW, Petrov DA. Exact estimates of mutation fee and spectrum in yeast. Proc Natl Acad Sci U S A. 2014;111:E2310–8. pmid:24847077 - 33.

Anderson RP, Roth JR. Tandem Genetic Duplications in Phage and Micro organism. Annu Rev Microbiol. 1977;31:473–505. pmid:334045 - 34.

Horiuchi T, Horiuchi S, Novick A. The genetic foundation of hyper-synthesis of beta-galactosidase. Genetics. 1963;48:157–69. pmid:13954911 - 35.

Reams AB, Kofoid E, Savageau M, Roth JR. Duplication frequency in a inhabitants of Salmonella enterica quickly approaches regular state with or with out recombination. Genetics. 2010;184:1077–94. pmid:20083614 - 36.

Anderson P, Roth J. Spontaneous tandem genetic duplications in Salmonella typhimurium come up by unequal recombination between rRNA (rrn) cistrons. Proc Natl Acad Sci U S A. 1981;78:3113–7. pmid:6789329 - 37.

Sharp NP, Sandell L, James CG, Otto SP. The genome-wide fee and spectrum of spontaneous mutations differ between haploid and diploid yeast. Proc Natl Acad Sci U S A. 2018;115:E5046–55. pmid:29760081 - 38.

Sui Y, Qi L, Wu J-Ok, Wen X-P, Tang X-X, Ma Z-J, et al. Genome-wide mapping of spontaneous genetic alterations in diploid yeast cells. Proc Natl Acad Sci U S A. 2020;117:28191–200. pmid:33106417 - 39.

Liu H, Zhang J. Yeast Spontaneous Mutation Fee and Spectrum Differ with Surroundings. Curr Biol. 2019;29:1584–1591.e3. pmid:31056389 - 40.

Payen C, Di Rienzi SC, Ong GT, Pogachar JL, Sanchez JC, Sunshine AB, et al. The dynamics of numerous segmental amplifications in populations of Saccharomyces cerevisiae adapting to robust choice. 2014;G3 (4):399–409. - 41.

Solar S, Ke R, Hughes D, Nilsson M, Andersson DI. Genome-wide detection of spontaneous chromosomal rearrangements in micro organism. PLoS ONE. 2012;7:e42639. pmid:22880062 - 42.

Farslow JC, Lipinski KJ, Packard LB, Edgley ML, Taylor J, Flibotte S, et al. Speedy Improve in frequency of gene copy-number variants throughout experimental evolution in Caenorhabditis elegans. BMC Genomics. 2015. pmid:26645535 - 43.

Morgenthaler AB, Kinney WR, Ebmeier CC, Walsh CM, Snyder DJ, Cooper VS, et al. Mutations that enhance effectivity of a weak-link enzyme are uncommon in comparison with adaptive mutations elsewhere within the genome. elife. 2019. pmid:31815667 - 44.

Frickel J, Feulner PGD, Karakoc E, Becks L. Inhabitants measurement modifications and choice drive patterns of parallel evolution in a number–virus system. Nat Commun. 2018;9:1–10. - 45.

DeBolt S. Copy quantity variation shapes genome variety in Arabidopsis over fast household generational scales. Genome Biol Evol. 2010;2:441–53. pmid:20624746 - 46.

Todd RT, Selmecki A. Expandable and reversible copy quantity amplification drives speedy adaptation to antifungal medication. elife. 2020;9. pmid:32687060 - 47.

Sunshine AB, Payen C, Ong GT, Liachko I, Tan KM, Dunham MJ. The health penalties of aneuploidy are pushed by condition-dependent gene results. PLoS Biol. 2015;13:e1002155. pmid:26011532 - 48.

Lauer S, Avecilla G, Spealman P, Sethia G, Brandt N, Levy SF, et al. Single-cell copy quantity variant detection reveals the dynamics and variety of adaptation. PLoS Biol. 2018;16:e3000069. pmid:30562346 - 49.

Harari Y, Ram Y, Rappoport N, Hadany L, Kupiec M. Spontaneous Modifications in Ploidy Are Widespread in Yeast. Curr Biol. 2018;28:825–835.e4. pmid:29502947 - 50.

Gonçalves PJ, Lueckmann J-M, Deistler M, Nonnenmacher M, Öcal Ok, Bassetto G, et al. Coaching deep neural density estimators to establish mechanistic fashions of neural dynamics. elife. 2020;9. pmid:32940606 - 51.

Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C. Approximate Bayesian computation. PLoS Comput Biol. 2013;9:e1002803. pmid:23341757 - 52.

Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in inhabitants genetics. Genetics. 2002;162:2025–35. pmid:12524368 - 53.

Foll M, Shim H, Jensen JD. WFABC: a Wright-Fisher ABC-based strategy for inferring efficient inhabitants sizes and choice coefficients from time-sampled information. Mol Ecol Resour. 2015;15:87–98. pmid:24834845 - 54.

Tanaka MM, Francis AR, Luciani F, Sisson SA. Utilizing Approximate Bayesian Computation to Estimate Tuberculosis Transmission Parameters From Genotype Information. Genetics. 2006:1511–20. pmid:16624908 - 55.

Beaumont MA. Approximate Bayesian Computation in Evolution and Ecology. 2010 [cited 18 May 2021]. - 56.

Jennings E, Madigan M. astroABC: An Approximate Bayesian Computation Sequential Monte Carlo sampler for cosmological parameter estimation. Astronomy and Computing. 2017:16–22. - 57.

Financial institution C, Hietpas RT, Wong A, Bolon DN, Jensen JD. A Bayesian MCMC Method to Assess the Full Distribution of Health Results of New Mutations: Uncovering the Potential for Adaptive Walks in Difficult Environments. Genetics. 2014:841–52. pmid:24398421 - 58.

Blanquart F, Bataillon T. Epistasis and the Construction of Health Landscapes: Are Experimental Health Landscapes Suitable with Fisher’s Geometric Mannequin? Genetics. 2016:847–62. pmid:27052568 - 59.

Harari Y, Ram Y, Kupiec M. Frequent ploidy modifications in rising yeast cultures. Curr Genet. 2018;64:1001–4. pmid:29525927 - 60.

Tavaré S, Balding DJ, Griffiths RC, Donnelly P. Inferring Coalescence Instances From DNA Sequence Information. Genetics. 1997:505–18. pmid:9071603 - 61.

Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW. Inhabitants development of human Y chromosomes: a examine of Y chromosome microsatellites. Mol Biol Evol. 1999;16:1791–8. pmid:10605120 - 62.

Marjoram P, Molitor J, Plagnol V, Tavare S. Markov chain Monte Carlo with out likelihoods. Proc Natl Acad Sci U S A. 2003;100:15324–8. pmid:14663152 - 63.

Sisson SA, Fan Y, Tanaka MM. Sequential Monte Carlo with out likelihoods. Proc Natl Acad Sci U S A. 2007;104:1760–5. pmid:17264216 - 64.

Blum MGB, François O. Non-linear regression fashions for Approximate Bayesian Computation. Stat Comput. 2010:63–73. - 65.

Csilléry Ok, François O, Blum MGB. abc: an R bundle for approximate Bayesian computation (ABC). Strategies Ecol Evol. 2012:475–9. - 66.

Flagel L, Brandvain Y, Schrider DR. The Unreasonable Effectiveness of Convolutional Neural Networks in Inhabitants Genetic Inference. Mol Biol Evol. 2019;36:220–38. pmid:30517664 - 67.

Alsing J, Charnock T, Feeney S, Wandelt B. Quick likelihood-free cosmology with neural density estimators and lively studying. Mon Not R Astron Soc. 2019. - 68.

Cranmer Ok, Brehmer J, Louppe G. The frontier of simulation-based inference. Proc Natl Acad Sci U S A. 2020;117:30055–62. pmid:32471948 - 69.

Schenk MF, Zwart MP, Hwang S, Ruelens P, Severing E, Krug J, et al. Inhabitants measurement mediates the contribution of high-rate and large-benefit mutations to parallel evolution. Nat Ecol Evol. 2022. pmid:35241808 - 70.

Klinger E, Rickert D, Hasenauer J. pyABC: distributed, likelihood-free inference. Bioinformatics. 2018;34:3591–3. pmid:29762723 - 71.

Tejero-Cantero A, Boelts J, Deistler M, Lueckmann J-M, Durkan C, Gonçalves P, et al. sbi: A toolkit for simulation-based inference. Journal of Open Supply Software program. 2020:2505. - 72.

Otto SP, Day T. A Biologist’s Information to Mathematical Modeling in Ecology and Evolution. 2007. - 73.

Dean AM. Defending Haploid Polymorphisms in Temporally Variable Environments. Genetics. 2005:1147–56. pmid:15545644 - 74.

Venkataram S, Dunn B, Li Y, Agarwala A, Chang J, Ebel ER, et al. Improvement of a Complete Genotype-to-Health Map of Adaptation-Driving Mutations in Yeast. Cell. 2016;166:1585–1596.e22. pmid:27594428 - 75.

Joseph SB, Corridor DW. Spontaneous Mutations in Diploid Saccharomyces cerevisiae. Genetics. 2004:1817–25. pmid:15611159 - 76.

Corridor DW, Mahmoudizad R, Hurd AW, Joseph SB. Spontaneous mutations in diploid Saccharomyces cerevisiae: one other thousand cell generations. Genet Res. 2008;90: 229–241. pmid:18593510 - 77.

Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting methods. J Chem Phys. 2001:1716–33. - 78.

Lueckmann J-M, Goncalves PJ, Bassetto G, Öcal Ok, Nonnenmacher M, Macke JH. Versatile statistical inference for mechanistic fashions of neural dynamics. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, et al., editors. Advances in Neural Data Processing Techniques 30. Curran Associates, Inc.; 2017. pp. 1289–1299. - 79.

Greenberg DS, Nonnenmacher M, Macke JH. Automated Posterior Transformation for Probability-Free Inference. arXiv [cs.LG]. 2019. Accessible: http://arxiv.org/abs/1905.07488 - 80.

Papamakarios G, Murray I. Quick epsilon -free Inference of Simulation Fashions with Bayesian Conditional Density Estimation. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R, editors. Advances in Neural Data Processing Techniques 29. Curran Associates, Inc.; 2016. pp. 1028–1036. https://doi.org/10.1021/acsami.5b09533 pmid:26696337 - 81.

Prangle D. Adapting the ABC Distance Operate. Bayesian Anal. 2017. - 82.

Klinger E, Hasenauer J. A Scheme for Adaptive Choice of Inhabitants Sizes in Approximate Bayesian Computation—Sequential Monte Carlo. Computational Strategies in Techniques Biology. 2017:128–44. - 83.

Papamakarios G, Pavlakou T, Murray I. Masked Autoregressive Circulation for Density Estimation. arXiv [stat.ML]. 2017. Accessible: http://arxiv.org/abs/1705.07057 - 84.

Durkan C, Bekasov A, Murray I, Papamakarios G. Neural Spline Flows. arXiv [stat.ML]. 2019. Accessible: http://arxiv.org/abs/1906.04032 - 85.

Kruschke JK. Doing Bayesian Information Evaluation: A Tutorial with R, JAGS, and Stan. Tutorial Press; 2014. - 86.

Gelman A, Carlin JB, Stern HS, Dunson DB, Vehtari A, Rubin DB. Bayesian Information Evaluation, Third Version. CRC Press; 2013. - 87.

Kass RE, Raftery AE. Bayes Components. J Am Stat Assoc. 1995:773–95. - 88.

Harrison XA, Donaldson L, Correa-Cano ME, Evans J, Fisher DN, Goodwin CED, et al. A short introduction to combined results modelling and multi-model inference in ecology. PeerJ. 2018;6:e4794. pmid:29844961 - 89.

Levy SF, Blundell JR, Venkataram S, Petrov DA, Fisher DS, Sherlock G. Quantitative evolutionary dynamics utilizing high-resolution lineage monitoring. Nature. 2015;519:181–6. pmid:25731169 - 90.

Aggeli D, Li Y, Sherlock G. Modifications within the distribution of health results and adaptive mutational spectra following a single first step in the direction of adaptation. https://doi.org/10.1101/2020.06.12.148833 - 91.

Lynch M, Sung W, Morris Ok, Coffey N, Landry CR, Dopman EB, et al. A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci U S A. 2008;105:9272–7. pmid:18583475 - 92.

Dorsey M, Peterson C, Bray Ok, Paquin CE. Spontaneous amplification of the ADH4 gene in Saccharomyces cerevisiae. Genetics. 1992;132:943–50. pmid:1459445 - 93.

Zhang H, Zeidler AFB, Tune W, Puccia CM, Malc E, Greenwell PW, et al. Gene copy-number variation in haploid and diploid strains of the yeast Saccharomyces cerevisiae. Genetics. 2013;193:785–801. pmid:23307895 - 94.

Schacherer J, de Montigny J, Welcker A, Souciet J-L, Potier S. Duplication processes in Saccharomyces cerevisiae haploid strains. Nucleic Acids Res. 2005;33:6319–26. pmid:16269823 - 95.

Schacherer J, Tourrette Y, Potier S, Souciet J-L, de Montigny J. Spontaneous duplications in diploid Saccharomyces cerevisiae cells. DNA Restore. 2007;6:1441–52. pmid:17544927 - 96.

Hull RM, Cruz C, Jack CV, Houseley J. Environmental change drives accelerated adaptation by means of stimulated copy quantity variation. PLoS Biol. 2017;15:e2001333. pmid:28654659 - 97.

Whale AJ, King M, Hull RM, Krueger F, Houseley J. Stimulation of adaptive gene amplification by origin firing below replication fork constraint. bioRxiv 2021. Accessible: https://www.biorxiv.org/content material/10.1101/2021.03.04.433911v1.summary - 98.

Hong J, Gresham D. Molecular specificity, convergence and constraint form adaptive evolution in nutrient-poor environments. PLoS Genet. 2014;10:e1004041. pmid:24415948 - 99.

Bermudez-Santana C, Attolini C, Kirsten T, Engelhardt J, Prohaska SJ, Steigele S, et al. Genomic group of eukaryotic tRNAs. BMC Genomics. 2010;11:270–0. pmid:20426822 - 100.

Di Rienzi SC, Collingwood D, Raghuraman MK, Brewer BJ. Fragile genomic websites are related to origins of replication. Genome Biol Evol. 2009;1:350–63. pmid:20333204 - 101.

Labib Ok, Hodgson B, Admire A, Shanks L, Danzl N, Wang M, et al. Replication fork obstacles: pausing for a break or stalling for time? EMBO Rep. 2007;8:346–53. pmid:17401409 - 102.

Chevin L-M. On measuring choice in experimental evolution. Biol Lett. 2011:210–3. pmid:20810425 - 103.

Crow JF, Kimura M. An Introduction to Inhabitants Genetics Concept. Burgess Worldwide Group; 1970. - 104.

Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: basic algorithms for scientific computing in Python. Nat Strategies. 2020;17:261–72. pmid:32015543 - 105.

Hoffman CS, Winston F. A ten-minute DNA preparation from yeast effectively releases autonomous plasmids for transformaion of Escherichia coli. Gene. 1987;57:267–72. pmid:3319781