Project description: It is now evident that DNA forms an organized nuclear architecture, which is essential to maintain the structural and functional integrity of the genome. Chromatin organization can be systematically studied due to the recent boom in chromosome conformation capture technologies (e.g., 3C and its successors 4C, 5C and Hi-C), which has been accompanied by the development of computational pipelines to identify biologically meaningful chromatin contacts in such data. However, not all tools are applicable to all experimental designs and all structural features. Capture Hi-C (CHi-C) is a method that uses an intermediate hybridization step to target and select predefined regions of interest in a Hi-C library, thereby increasing effective sequencing depth for those regions. It allows researchers to investigate fine chromatin structures, for instance promoter-enhancer loops, at high resolution, but it introduces additional biases with the capture step and therefore requires specialized pipelines. Here, we compare multiple analytical pipelines for CHi-C data analysis. We consider the effect of retaining multi-mapping reads and compare the efficiency of different statistical approaches in both identifying reproducible interactions and determining biologically significant interactions. At restriction-fragment-level resolution, the number of multi-mapping reads that could be rescued was negligible. The number of identified interactions varied widely, depending on the analytical method, indicating large differences in type I and type II error rates. The optimal pipeline depends on the project-specific tolerance for false-positive and false-negative chromatin contacts.
Project description: Background: Next-generation sequencing (NGS) has provided an alternative strategy to study the composition of nematode communities with increased resolution and sensitivity. However, the handling and processing of gigabytes' worth of amplicon sequence data produced by an NGS platform is still a major hurdle, limiting the use and adoption of faster and more convenient analysis software. Methods: In total, 32 paired fecal samples from Swedish sheep flocks were cultured, and the harvested larvae were subjected to internal transcribed spacer 2 (ITS2) amplicon sequencing on the PacBio platform. Samples were analyzed with three different bioinformatic pipelines (DADA2, Mothur and SCATA) to determine species composition and richness. Results: For the major species tested in this study (Haemonchus contortus, Teladorsagia circumcincta and Trichostrongylus colubriformis), neither relative abundances nor species diversity differed significantly between the three pipelines, showing that all three analysis pipelines, although different in their approaches, yield nearly identical outcomes. In addition, the samples analyzed here had especially high frequencies of H. contortus (90-95% across the three pipelines) both before and after sample treatment, followed by T. circumcincta (3.5-4%). This shows that H. contortus is the parasite of primary importance in contemporary Swedish sheep farms struggling with anthelmintic resistance. Finally, although on average a significant reduction in egg counts was achieved post-treatment, no significant shifts in the relative frequencies of the major species occurred, indicating highly rigid community structures at sheep farms where anthelmintic resistance has been reported. Conclusions: The findings presented here further contribute to the development and application of NGS technology to study nemabiome compositions in sheep, in addition to expanding our understanding of the most recent changes in parasite species abundances on Swedish sheep farms struggling with anthelmintic resistance.
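The two summary statistics compared across pipelines in the abstract above, relative abundance and species richness, can be illustrated with a minimal Python sketch. This is not code from DADA2, Mothur or SCATA; the function names and the toy read counts are assumptions made purely for illustration.

```python
def relative_abundance(counts):
    """Map each taxon to its share of the total read count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {taxon: n / total for taxon, n in counts.items()}


def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)


# Toy read counts, invented for illustration (not data from the study).
toy_sample = {"species_A": 90, "species_B": 6, "species_C": 4, "species_D": 0}
shares = relative_abundance(toy_sample)
observed = richness(toy_sample)
```

Comparing pipelines would then amount to computing these values per sample and per pipeline and testing whether they differ, a step this sketch leaves out.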
Project description: Reconstructing phylogeny from retrotransposon insertions is often limited by access to only a single reference genome, whereby support for clades that do not include the reference taxon cannot be directly observed. Here we have developed a new statistical framework that accounts for this ascertainment bias, allowing us to employ phylogenetically powerful retrotransposon markers to explore the radiation of the largest living marsupials, the kangaroos and wallabies of the genera Macropus and Wallabia. An exhaustive in silico screening of the tammar wallaby (Macropus eugenii) reference genome followed by experimental screening revealed 29 phylogenetically informative retrotransposon markers belonging to a family of endogenous retroviruses. We identified robust support for the enigmatic swamp wallaby (Wallabia bicolor) falling within a paraphyletic genus, Macropus. Our statistical approach provides a means to test for incomplete lineage sorting and introgression/hybridization in the presence of the ascertainment bias. Using retrotransposons as "molecular fossils", we reveal one of the most complex patterns of hemiplasy yet identified, during the rapid diversification of kangaroos and wallabies. Ancestral state reconstruction incorporating the new retrotransposon phylogenetic information reveals multiple independent ecological shifts among kangaroos into more open habitats, coinciding with the Pliocene onset of increased aridification in Australia from ~3.6 million years ago.
Project description: Background: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches. Results: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and "concatenation" using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases. Conclusions: Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on GitHub (http://github.com/ekmolloy/njmerge).
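For readers unfamiliar with the Neighbor Joining step that NJMerge extends, the following minimal Python sketch shows how classic NJ chooses which pair of taxa to join, using the standard Q-criterion Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of row i of the distance matrix. This is plain NJ only: the constraint-tree handling that distinguishes NJMerge is not shown, and the function name and toy matrix are assumptions made for illustration.

```python
from itertools import combinations


def nj_pick_pair(dist):
    """Return the index pair (i, j), i < j, minimizing the NJ Q-criterion.

    `dist` is a symmetric matrix (list of lists) with a zero diagonal.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("NJ pair selection needs at least three taxa")
    row_sums = [sum(row) for row in dist]

    def q(i, j):
        # Small pairwise distance and large total distance to all other
        # taxa both push Q down, favouring the pair as neighbours.
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    return min(combinations(range(n), 2), key=lambda pair: q(*pair))


# Toy five-taxon distance matrix, invented for illustration.
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
first_pair = nj_pick_pair(toy_dist)
```

In the toy matrix the pair with the smallest raw distance is taxa 3 and 4, yet the Q-criterion selects taxa 0 and 1, which shows why NJ corrects each distance by the row sums instead of joining the closest pair directly.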
Project description: The plant family Bignoniaceae is a conspicuous and charismatic element of the tropical flora. The family has a complex taxonomic history, with substantial changes in the classification of the group during the past two centuries. Recent re-classifications at the tribal and generic levels have been made possible largely by the availability of molecular phylogenies reconstructed using Sanger sequencing data. However, our understanding of the systematics, evolution, and biogeography of the family remains incomplete, especially due to the low resolution and support of different portions of the Bignoniaceae phylogeny. To overcome these limitations and increase the amount of molecular data available for phylogeny reconstruction within this plant family, we developed a bait kit targeting 762 nuclear genes, including 329 genes selected specifically for the Bignoniaceae; 348 genes obtained from the Angiosperms353 probe set, with baits designed specifically for the family; and 85 low-copy genes of known function. On average, 77.4% of the reads mapped to the targets, and 755 genes were obtained per species. After removing genes with putative paralogs, 677 loci were used for phylogenetic analyses. On-target genes were compared and combined in the Exon-Only dataset, and on-target + off-target regions were combined in the Supercontig dataset. We tested the performance of the bait kit at different taxonomic levels, from family to species level, using 38 specimens of 36 different species of Bignoniaceae, representing: 1) six of the eight tribal-level clades (Bignonieae, Oroxyleae, Tabebuia Alliance, Paleotropical Clade, Tecomeae, and Jacarandeae; only Tourrettieae and Catalpeae were not sampled); 2) all 20 genera of Bignonieae; 3) seven of the nine species of Dolichandra (D. chodatii, D. cynanchoides, D. dentata, D. hispida, D. quadrivalvis, D. uncata, and D. unguis-cati; only D. steyermarkii and D. unguiculata were not sampled); and 4) three individuals of Dolichandra unguis-cati. Our data yielded a well-supported phylogeny of the Bignoniaceae at different taxonomic scales, opening new perspectives for a comprehensive phylogenetic framework for the family as a whole.
Project description: Genetic variants on non-recombining DNA and the hierarchical order in which they accumulate are commonly of interest. This variant hierarchy can be established and combined with information on the population and geographic origin of the individuals carrying the variants to find population structures and infer migration patterns. Further, individuals can be assigned to the characterized populations, which is relevant in forensic genetics, genetic genealogy, and epidemiologic studies. However, there is currently no straightforward method to obtain such a variant hierarchy. Here, we introduce the software SNPtotree v1.0, which uniquely determines the hierarchical order of variants on non-recombining DNA without error-prone manual sorting. The algorithm uses pairwise variant comparisons to infer their relationships and integrates the combined information into a phylogenetic tree. Variants that have contradictory pairwise relationships or ambiguous positions in the tree are removed by the software. When benchmarked using two human Y-chromosomal massively parallel sequencing datasets, SNPtotree outperforms traditional methods in the accuracy of phylogenetic trees for sequencing data with high amounts of missing information. The phylogenetic trees of variants created using SNPtotree can be used to establish and maintain publicly available phylogeny databases to further explore genetic epidemiology and genealogy, as well as population and forensic genetics.
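The idea of inferring a hierarchy from pairwise variant comparisons can be illustrated with a minimal Python sketch. It is not SNPtotree's algorithm: it assumes complete genotype calls with no missing data, represents each variant by the set of individuals carrying the derived allele, and uses relation labels and a function name invented for this example.

```python
def pairwise_relation(carriers_a, carriers_b):
    """Classify two variants on non-recombining DNA by their carrier sets.

    Each argument is an iterable of individuals carrying the derived allele.
    Assumes no missing genotype calls (a simplification for illustration).
    """
    a, b = set(carriers_a), set(carriers_b)
    if a == b:
        return "equivalent"        # same branch of the tree
    if a > b:
        return "a_upstream_of_b"   # every carrier of b also carries a
    if a < b:
        return "a_downstream_of_b"
    if a.isdisjoint(b):
        return "parallel"          # separate branches
    # Partial overlap cannot arise on a single tree without recurrent
    # mutation or genotyping error, so the pair is flagged.
    return "contradictory"


# Toy carrier sets, invented for illustration.
relation_nested = pairwise_relation({"s1", "s2", "s3"}, {"s1", "s2"})
relation_conflict = pairwise_relation({"s1", "s2"}, {"s2", "s3"})
```

Under these assumptions, collecting the upstream and downstream relations over all pairs gives a partial order from which a tree can be assembled, while pairs flagged as contradictory would be candidates for removal.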
Project description: Early efforts to classify Mortierellaceae were based on macro- and micromorphology, but sequencing and phylogenetic studies with ribosomal DNA (rDNA) markers have demonstrated conflicting taxonomic groupings and polyphyletic genera. Although some taxonomic confusion in the family has been clarified, rDNA data alone are unable to resolve higher-level phylogenetic relationships within Mortierellaceae. In this study, we applied two parallel approaches to resolve the Mortierellaceae phylogeny: low-coverage genome (LCG) sequencing and high-throughput, multiplexed targeted amplicon sequencing to generate sequence data for multi-gene phylogenetics. We then combined our datasets to provide a well-supported genome-based phylogeny having broad sampling depth from the amplicon dataset. Resolving the Mortierellaceae phylogeny into monophyletic groups led to the definition of 14 genera, 7 of which are newly proposed. Low-coverage genome sequencing proved to be a relatively cost-effective means of generating a well-resolved phylogeny. The multi-gene phylogenetics approach enabled much greater sampling depth and breadth than the LCG approach, but was unable to resolve higher-level organization of groups. We present this work to resolve some of the taxonomic confusion and provide a genus-level framework to empower future studies on Mortierellaceae diversity, biology, and evolution.
Project description: Background: The great diversity in plant genome size and chromosome number is partly due to polyploidization (i.e. genome doubling events). The differences in genome size and chromosome number among diploid plant species can be a window into the intriguing phenomenon of past genome doubling that may be obscured through time by the process of diploidization. The genus Hibiscus L. (Malvaceae) has a wide diversity of chromosome numbers and a complex genomic history. Hibiscus is ideal for exploring past genomic events because although two ancient genome duplication events have been identified, more are likely to be found due to its diversity of chromosome numbers. To reappraise the history of whole-genome duplication events in Hibiscus, we tested three alternative scenarios describing different polyploidization events. Results: Using target sequence capture, we designed a new probe set for Hibiscus and generated 87 orthologous genes from four diploid species. We detected paralogues in > 54% of putative single-copy genes. Thirty-four of these genes were selected for testing three different genome duplication scenarios using gene counting. All species of Hibiscus sampled shared one genome duplication with H. syriacus, and one whole-genome duplication occurred along the branch leading to H. syriacus. Conclusions: Here, we corroborated the independent genome doubling previously found in the lineage leading to H. syriacus and a shared genome doubling of this lineage and the remainder of Hibiscus. Additionally, we found a previously undiscovered genome duplication shared by the /Pavonia and /Malvaviscus clades (both nested within Hibiscus), indicated by the occurrence of two copies of what were otherwise single-copy genes. Our results highlight the complexity of genomic diversity in some plant groups, which makes orthology assessment and accurate phylogenomic inference difficult.