Project description:Observable patterns of cultural variation are consistently intertwined with demic movements, cultural diffusion, and adaptation to different ecological contexts [Cavalli-Sforza and Feldman (1981) Cultural Transmission and Evolution: A Quantitative Approach; Boyd and Richerson (1985) Culture and the Evolutionary Process]. The quantitative study of gene-culture coevolution has focused in particular on the mechanisms responsible for change in frequency and attributes of cultural traits, the spread of cultural information through demic and cultural diffusion, and detecting relationships between genetic and cultural lineages. Here, we make use of worldwide whole-genome sequences [Pagani et al. (2016) Nature 538:238-242] to assess the impact of processes involving population movement and replacement on cultural diversity, focusing on the variability observed in folktale traditions (n = 596) [Uther (2004) The Types of International Folktales: A Classification and Bibliography. Based on the System of Antti Aarne and Stith Thompson] in Eurasia. We find that a model of cultural diffusion predicted by isolation-by-distance alone is not sufficient to explain the observed patterns, especially at small spatial scales (up to approximately 4,000 km). We also provide an empirical approach to infer presence and impact of ethnolinguistic barriers preventing the unbiased transmission of both genetic and cultural information. After correcting for the effect of ethnolinguistic boundaries, we show that, of the alternative models that we propose, the one entailing cultural diffusion biased by linguistic differences is the most plausible. Additionally, we identify 15 tales that are more likely to be predominantly transmitted through population movement and replacement and locate putative focal areas for a set of tales that are spread worldwide.
Project description:The transition from MIRU-VNTR-based epidemiology studies in tuberculosis (TB) to genomic epidemiology has transformed how we track transmission. However, short-read sequencing is poor at analyzing repetitive regions such as the MIRU-VNTR loci. This causes a gap between the new genomic data and the large amount of information stored in historical databases. Long-read sequencing could bridge this knowledge gap by allowing analysis of repetitive regions. However, the feasibility of extracting MIRU-VNTRs from long reads and linking them to historical data has not been evaluated. In our study, an in silico arm, consisting of inference of MIRU patterns from long-read sequences (using MIRUReader program), was compared with an experimental arm, involving standard amplification and fragment sizing. We analyzed overall performance on 39 isolates from South Africa and confirmed reproducibility in a sample enriched with 62 clustered cases from Spain. Finally, we ran 25 consecutive incident cases, demonstrating the feasibility of correctly assigning new clustered/orphan cases by linking data inferred from genomic analysis to MIRU-VNTR databases. Of the 3,024 loci analyzed, only 11 discrepancies (0.36%) were found between the two arms: three attributed to experimental error and eight to misassigned alleles from long-read sequencing. A second round of analysis of these discrepancies resulted in agreement between the experimental and in silico arms in all but one locus. Adjusting the MIRUReader program code allowed us to flag potential in silico misassignments due to suboptimal coverage or unfixed double alleles. 
Our study indicates that long-read sequencing could help address potential chronological and geographical gaps arising from the transition from molecular to genomic epidemiology of tuberculosis. Importance: The transition from molecular epidemiology in tuberculosis (TB), based on the analysis of repetitive regions (VNTR-based genotyping), to genomic epidemiology has transformed the precision with which we track transmission. However, short-read sequencing, the most common method for performing genomic analysis, is poor at analyzing repetitive regions. This means that we face a gap between the new genomic data and the large amount of information stored in historical databases, which is also an obstacle to cross-national surveillance involving settings where only molecular data are available. Long-read sequencing could help bridge this knowledge gap by allowing analysis of repetitive regions. Our study demonstrates that MIRU-VNTR patterns can be successfully inferred from long-read sequences, allowing the correct assignment of new cases as clustered/orphan by linking new data extracted from genomic analysis to historical MIRU-VNTR databases. Our data may provide a starting point for bridging the knowledge gap between the molecular and genomic eras in tuberculosis epidemiology.
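The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline and not the MIRUReader code: the function names, the four-locus profiles, and the cluster labels are all hypothetical. The reported total of 3,024 loci is consistent with 24-locus typing of the 39 + 62 + 25 = 126 isolates, although the text does not state the number of loci per isolate, so that reading is an assumption.

```python
# Illustrative sketch only; all profiles and cluster names below are made up.

def discrepancy_rate(n_discrepant, n_loci):
    """Percentage of loci at which the two arms disagree."""
    return 100.0 * n_discrepant / n_loci


def count_discrepancies(experimental, in_silico):
    """Count loci whose repeat numbers differ between the two arms."""
    if len(experimental) != len(in_silico):
        raise ValueError("profiles must cover the same loci")
    return sum(1 for e, s in zip(experimental, in_silico) if e != s)


def assign_case(pattern, database):
    """Return the cluster whose stored pattern matches exactly, else 'orphan'."""
    for cluster_id, stored in database.items():
        if tuple(stored) == tuple(pattern):
            return cluster_id
    return "orphan"


# Figures reported in the text: 11 discrepancies among 3,024 loci.
print(round(discrepancy_rate(11, 3024), 2))  # 0.36

# Hypothetical historical database with shortened (4-locus) patterns.
historical = {"cluster_A": (2, 4, 3, 5), "cluster_B": (2, 3, 3, 1)}
print(assign_case((2, 3, 3, 1), historical))  # cluster_B
print(assign_case((1, 1, 1, 1), historical))  # orphan
```

Exact matching is the simplest possible assignment rule. A real surveillance setting might tolerate single-locus variants, which this sketch does not attempt.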
Project description:High-throughput sequencing based techniques, such as 16S rRNA gene profiling, have the potential to elucidate the complex inner workings of natural microbial communities - be they from the world's oceans or the human gut. A key step in exploring such data is the identification of dependencies between members of these communities, which is commonly achieved by correlation analysis. However, it has been known since the days of Karl Pearson that the analysis of the type of data generated by such techniques (referred to as compositional data) can produce unreliable results since the observed data take the form of relative fractions of genes or species, rather than their absolute abundances. Using simulated and real data from the Human Microbiome Project, we show that such compositional effects can be widespread and severe: in some real data sets many of the correlations among taxa can be artifactual, and true correlations may even appear with opposite sign. Additionally, we show that community diversity is the key factor that modulates the acuteness of such compositional effects, and develop a new approach, called SparCC (available at https://bitbucket.org/yonatanf/sparcc), which is capable of estimating correlation values from compositional data. To illustrate a potential application of SparCC, we infer a rich ecological network connecting hundreds of interacting species across 18 sites on the human body. Using the SparCC network as a reference, we estimated that the standard approach yields 3 spurious species-species interactions for each true interaction and misses 60% of the true interactions in the human microbiome data, and, as predicted, most of the erroneous links are found in the samples with the lowest diversity.
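The compositional effect described above can be reproduced with a small simulation. The sketch below is not SparCC and does not use Human Microbiome Project data. It draws independent absolute abundances, converts them to relative fractions, and compares the Pearson correlation before and after normalization. The log-normal abundance model, the sample size, and all function names are assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_taxa, n_samples=2000, seed=0):
    """Correlate taxa 0 and 1 on absolute abundances and on relative fractions.

    Abundances are drawn independently, so any correlation seen in the
    fractions is produced by normalization alone.
    """
    rng = random.Random(seed)
    counts = [[rng.lognormvariate(0, 1) for _ in range(n_taxa)]
              for _ in range(n_samples)]
    fractions = [[c / sum(row) for c in row] for row in counts]
    abs_r = pearson([r[0] for r in counts], [r[1] for r in counts])
    rel_r = pearson([r[0] for r in fractions], [r[1] for r in fractions])
    return abs_r, rel_r


for n_taxa in (2, 5, 50):
    abs_r, rel_r = simulate(n_taxa)
    print(n_taxa, round(abs_r, 3), round(rel_r, 3))
```

With only two taxa the fractions sum to one, so their correlation is exactly -1 even though the underlying abundances are independent. As the number of taxa grows the induced correlation shrinks toward zero, which matches the statement that low-diversity samples are the most affected.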
Project description:Homologous recombination is a central feature of bacterial evolution, yet it confounds traditional phylogenetic methods. While a number of methods specific to bacterial evolution have been developed, none of these permit joint inference of a bacterial recombination graph and associated parameters. In this article, we present a new method which addresses this shortcoming. Our method uses a novel Markov chain Monte Carlo algorithm to perform phylogenetic inference under the ClonalOrigin model. We demonstrate the utility of our method by applying it to ribosomal multilocus sequence typing data sequenced from pathogenic and nonpathogenic Escherichia coli serotype O157 and O26 isolates collected in rural New Zealand. The method is implemented as an open source BEAST 2 package, Bacter, which is available via the project web page at http://tgvaughan.github.io/bacter.
Project description:Inferring the structure of human populations from genetic variation data is a key task in population and medical genomic studies. Although a number of methods for population structure inference have been proposed, current methods are impractical to run on biobank-scale genomic datasets containing millions of individuals and genetic variants. We introduce SCOPE, a method for population structure inference that is orders of magnitude faster than existing methods while achieving comparable accuracy. SCOPE infers population structure in about a day on a dataset containing one million individuals and one million variants as well as on the UK Biobank dataset containing 488,363 individuals and 569,346 variants. Furthermore, SCOPE can leverage allele frequencies from previous studies to improve the interpretability of population structure estimates.
Project description:Objective: The characterization of spatial network dynamics is desirable for a better understanding of seizure physiology. The goal of this work is to develop a computational method for identifying transient spatial patterns from intracranial electroencephalographic (iEEG) data. Methods: Starting with bivariate synchrony measures, such as phase correlation, a two-step clustering procedure is used to identify statistically significant spatial network patterns, whose temporal evolution can be inferred. We refer to this as the composite synchrony profile (CSP) method. Results: The CSP method was verified with simulated data and evaluated using ictal and interictal recordings from three patients with intractable epilepsy. Application of the CSP method to these clinical iEEG datasets revealed a set of distinct CSPs with topographies consistent with medial temporal/limbic and superior parietal/medial frontal networks thought to be involved in the seizure generation process. Conclusions: By combining relatively straightforward multivariate signal processing techniques, such as phase synchrony, with clustering and statistical hypothesis testing, the methods we describe may prove useful for network definition and identification. Significance: The network patterns we observe using the CSP method cannot be inferred from direct visual inspection of the raw time series data, nor are they apparent in voltage-based topographic map sequences.
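As a rough illustration of what a bivariate phase synchrony measure computes, the sketch below implements the phase-locking value (PLV) for a pair of channels. PLV is a common measure swapped in here as an assumption; the abstract does not specify which synchrony measure or formula the CSP method uses. The sketch also assumes the instantaneous phases have already been extracted from the signals, and the function name is hypothetical.

```python
import cmath
import math


def phase_locking_value(phi_a, phi_b):
    """Phase-locking value between two phase series (radians).

    Returns a number in [0, 1]: 1 when the phase difference is constant,
    near 0 when the differences are spread evenly around the circle.
    """
    if len(phi_a) != len(phi_b) or not phi_a:
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phi_a, phi_b))
    return abs(total) / len(phi_a)


# Two channels with a fixed phase lag are fully locked.
phi_a = [0.1 * k for k in range(100)]
phi_b = [p - 0.7 for p in phi_a]
locked = phase_locking_value(phi_a, phi_b)

# Phase differences spread evenly over one full cycle cancel out.
phi_c = [2 * math.pi * k / 100 for k in range(100)]
phi_d = [0.0] * 100
unlocked = phase_locking_value(phi_c, phi_d)
```

Computing such a value for every channel pair gives a symmetric synchrony matrix per time window, which is the kind of input a clustering step could then operate on.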
Project description:Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here, we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.