Project description: Motivation: Gene trees often differ from the species trees that contain them due to various factors, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL). Several highly accurate species tree estimation methods have been introduced to explicitly address ILS, including ASTRAL, a widely used statistically consistent method, and wQFM, a quartet amalgamation approach that has been shown experimentally to be more accurate than ASTRAL. Two recent advances in phylogenomics, ASTRAL-Pro and DISCO, account for GDL. ASTRAL-Pro introduces a refined quartet similarity measure that accounts for orthology and paralogy. DISCO, on the other hand, offers a general strategy for decomposing multi-copy gene trees into a collection of single-copy trees, which allows methods originally designed for species tree inference from single-copy gene trees to be applied. Results: In this study, we first introduce several variants of DISCO to examine its underlying hypotheses, and we present analytical results on the statistical guarantees of DISCO. In particular, we introduce DISCO-R, a variant of DISCO with a refined pruning strategy that yields more accurate and robust results. We then demonstrate, through extensive evaluation studies on a collection of simulated and real data sets, that wQFM paired with DISCO variants consistently matches or outperforms ASTRAL-Pro and other competing methods. Availability and implementation: DISCO-R and other variants are freely available at https://github.com/skhakim/DISCO-variants.
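Decomposing a multi-copy gene tree into single-copy trees first requires locating the duplication events in a rooted gene tree. The sketch below illustrates one common rule, species overlap: an internal node is tagged as a duplication when the species sets of its two children intersect. It is an illustration only, not DISCO's or ASTRAL-Pro's implementation; the nested-tuple tree representation, the `species_copy` leaf-naming convention, and all function names are assumptions.

```python
from typing import List, Set, Tuple, Union

# Assumed representation: a leaf is a string "species_copyid";
# an internal node of a rooted binary tree is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def species_of(leaf: str) -> str:
    # Assumed naming convention: the species label precedes the first underscore.
    return leaf.split("_", 1)[0]


def leaves(tree: Tree) -> List[str]:
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def duplication_nodes(tree: Tree) -> List[Tree]:
    """Tag internal nodes whose two child subtrees share at least one species."""
    found: List[Tree] = []

    def visit(node: Tree) -> Set[str]:
        if isinstance(node, str):
            return {species_of(node)}
        left_species = visit(node[0])
        right_species = visit(node[1])
        if left_species & right_species:  # species overlap => duplication
            found.append(node)
        return left_species | right_species

    visit(tree)
    return found


def is_single_copy(tree: Tree) -> bool:
    """True if every species is represented by exactly one leaf."""
    species = [species_of(leaf) for leaf in leaves(tree)]
    return len(species) == len(set(species))


t = ((("A_1", "B_1"), ("A_2", "B_2")), "C_1")
print(len(duplication_nodes(t)))  # 1
print(is_single_copy(t))          # False
```

In this example, the only tagged node is the parent of the two A/B clades, which is exactly where a decomposition strategy would have to split or prune to obtain single-copy trees.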
Project description: In recent years, many publications have established histone lysine methylation as a central epigenetic modification in the regulation of chromatin and transcription. Histone lysine methyltransferases contain a conserved SET domain and are widely distributed across organisms. However, no comprehensive study of the origin and diversification of SET-domain-containing genes in fungi has been conducted. In this study, a total of 3816 SET-domain-containing genes, identified and characterized with HmmSearch in the whole genomes of 229 sequenced fungal species, were used to investigate the evolution and diversification of these genes in fungi. Using the CLANS program, all SET-domain-containing genes were grouped into three main clusters, each containing several groups. Domain organization analysis showed that genes belonging to the same group have similar sequence structures. In contrast, different groups possess different domain organizations or domain locations, suggesting that SET-domain-containing genes in different groups may have acquired distinct regulatory mechanisms during their evolution. Genes that carry out histone methylation (such as H3K4me, H3K9me, H3K27me, H4K20me, and H3K36me) are mainly grouped into Cluster 1, whereas the functions of genes in Clusters 2 and 3 remain undetermined. Our results also show that numerous gene duplication and loss events occurred during the evolution of fungal SET-domain-containing proteins. These results provide novel insights into the roles of SET-domain genes in fungal evolution and lay a foundation for further understanding of the epigenetic basis of gene regulation in fungi.
Project description: Gene duplication provides an important source of genetic raw material for phenotypic diversification, but few studies have detailed the mechanisms through which duplications produce evolutionary novelty within species. Here, we investigate how a set of recently duplicated homologs of the floral inducer FLOWERING LOCUS T (FT) has contributed to sunflower domestication. We find that changes in the expression of these duplicates are associated with differences in flowering behavior between wild and domesticated sunflower. In addition, we present genetic and functional evidence that a frameshift mutation in one paralog, Helianthus annuus FT 1 (HaFT1), underlies a major QTL for flowering time and experienced a selective sweep during early domestication. Notably, this dominant-negative allele delays flowering by interfering with the action of another paralog, HaFT4. Together, these data reveal that changes affecting the expression, sequence, and gene interactions of HaFT paralogs have played key roles in sunflower domestication. Our findings also illustrate the important role that evolving interactions between new gene family members may play in fostering phenotypic change.
Project description: Aster flaccidus is a perennial medicinal plant belonging to the sunflower family Compositae; it is widely distributed in China and some other Asian countries. The complete chloroplast genome of A. flaccidus was sequenced using the Illumina HiSeq 4000 platform. The A. flaccidus chloroplast genome is 151,329 bp in size, with an average GC content of 37.5%. This circular molecule has a typical quadripartite structure, containing a large single-copy (LSC) region of 83,480 bp, a small single-copy (SSC) region of 18,149 bp, and two inverted repeat (IR) regions of 24,850 bp each. A total of 132 genes were annotated, comprising 87 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. A maximum likelihood (ML) phylogenetic tree supported a close relationship between the chloroplast genome of A. flaccidus and that of Aster indicus.
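The reported region sizes can be checked against the total genome length, since in a quadripartite plastome the inverted repeat is present in two copies and therefore counted twice. A minimal check using only the numbers stated above (the function and variable names are illustrative):

```python
# Region lengths (bp) and gene counts as reported for the A. flaccidus plastome.
LSC = 83_480  # large single-copy region
SSC = 18_149  # small single-copy region
IR = 24_850   # length of EACH of the two inverted repeats


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
genes = {"protein-coding": 87, "tRNA": 37, "rRNA": 8}

print(total)                # 151329
print(sum(genes.values()))  # 132
```

Both sums match the stated genome size (151,329 bp) and gene total (132).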
Project description: Powering and communicating with wearable devices on bio-interfaces is challenging due to strict weight, size, and resource constraints. This study presents a sunflower-like plant-wearable sensing device that harnesses solar energy, achieving complete energy self-sustainability for long-term monitoring of plant sap flow, a crucial indicator of plant health. It integrates foldable solar panels with all essential flexible electronic components, resulting in a compact system light enough for small plants. To overcome the low energy density of solar power, we developed an ultralow-energy light communication mechanism inspired by fireflies. Combined with unmanned aerial vehicles and deep learning algorithms, this approach enables efficient data retrieval from multiple devices across large agricultural fields. Because it is simple to deploy, the device shows great potential as a low-cost plant phenotyping tool. We believe our energy and communication solution for wearable devices can be extended to similar resource-limited and challenging scenarios, enabling new applications.
Project description: BACKGROUND: To interpret the results of a microarray experiment, researchers often shift focus from the analysis of individual differentially expressed genes to the analysis of sets of genes. These gene-set analysis (GSA) methods use previously accumulated biological knowledge to group genes into sets and then rank these gene sets in a way that reflects their relative importance in the experimental situation in question. We suspect that the presence of paralogs affects the ability of GSA methods to accurately identify the most important sets of genes for subsequent research. RESULTS: We show that paralogs, which typically have high sequence identity and similar molecular functions, also exhibit highly correlated expression patterns. We investigate this correlation as a potential confounding factor common to current GSA methods using Indygene (http://www.cbio.uct.ac.za/indygene), a web tool that reduces a supplied list of genes so that it contains no pairwise paralogy relationships above a specified sequence similarity threshold. We use the tool to reanalyse previously published microarray datasets and to determine the potential utility of accounting for the presence of paralogs. CONCLUSIONS: The Indygene tool efficiently removes paralogy relationships from a given dataset, and we found that such a reduction, performed prior to GSA, can produce significantly different results that often represent novel and plausible biological hypotheses. This was demonstrated for three different GSA approaches applied to the reanalysis of previously published microarray datasets, and it suggests that the redundancy and non-independence of paralogs are an important consideration when using GSA methodologies.
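Indygene's exact selection rule is not described here, so the sketch below shows one plausible way to reduce a gene list so that no retained pair exceeds a similarity threshold: a simple greedy pass that keeps genes in input order. The function name `reduce_paralogs`, the pairwise-similarity input format, and the greedy keep-first strategy are all hypothetical, not Indygene's actual implementation.

```python
from typing import Dict, FrozenSet, List


def reduce_paralogs(genes: List[str],
                    similarity: Dict[FrozenSet[str], float],
                    threshold: float) -> List[str]:
    """Hypothetical greedy reduction: keep genes in input order, dropping any
    gene whose paralog similarity to an already-kept gene exceeds `threshold`.

    `similarity` maps unordered gene pairs (frozensets) to a sequence
    similarity score; pairs that are absent are treated as non-paralogous.
    """
    kept: List[str] = []
    for gene in genes:
        if all(similarity.get(frozenset((gene, k)), 0.0) <= threshold
               for k in kept):
            kept.append(gene)
    return kept


sims = {frozenset(("G1", "G2")): 0.92, frozenset(("G3", "G4")): 0.40}
print(reduce_paralogs(["G1", "G2", "G3", "G4"], sims, threshold=0.8))
# ['G1', 'G3', 'G4']
```

Because the pass is greedy, the input order matters: a different ordering can retain a different, and possibly larger, paralog-free subset.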
Project description: Motivation: With the rapid growth in the number of newly sequenced genomes, species tree inference from multiple genes has become a basic bioinformatics task in comparative and evolutionary biology. However, accurate species tree estimation is difficult in the presence of gene tree discordance, which is often due to incomplete lineage sorting (ILS), modelled by the multi-species coalescent. Several highly accurate coalescent-based species tree estimation methods have been developed over the last decade, including MP-EST. However, the running time of MP-EST increases rapidly as the number of species grows. Results: We present divide-and-conquer techniques that improve the scalability of MP-EST so that it can run efficiently on large datasets. Surprisingly, as our study on a collection of simulated and biological datasets shows, these techniques also improve the accuracy of species trees estimated by MP-EST.
Project description: Dual flagellar systems have been described in several bacterial genera, but the extent of their prevalence has not been fully explored. Bradyrhizobium diazoefficiens USDA 110T possesses two flagellar systems: the subpolar and the lateral flagella. The lateral flagellum of Bradyrhizobium has no obvious role, since its performance is explained by its cooperation with the subpolar flagellum. In contrast, the lateral flagellum is the only type of flagellum present in the related family Rhizobiaceae. In this work, we analyzed the phylogeny of the genus Bradyrhizobium by means of Genome-to-Genome Blast Distance Phylogeny (GBDP) and Average Nucleotide Identity (ANI) comparisons of 128 genomes and divided it into 13 phylogenomic groups. While all Bradyrhizobium genomes encode the subpolar flagellum, none encodes only the lateral flagellum. The simultaneous presence of both flagella is exclusive to the B. japonicum phylogenomic group. Additionally, 292 genomes from the order Rhizobiales were analyzed, and both flagellar systems are present together in only nine genera. Phylogenetic analysis of 150 representative Rhizobiales genomes revealed an uneven distribution of these flagellar systems. While genomes within and close to the family Rhizobiaceae possess only the lateral flagellum, the subpolar flagellum is exclusive to more early-diverging families, in which certain genera also have both flagella.
Project description: Cnidaria, the sister group to Bilateria, is a highly diverse group of animals in terms of morphology, lifecycles, ecology, and development. How this diversity originated and evolved is not well understood because phylogenetic relationships among major cnidarian lineages are unclear, and recent studies present contrasting phylogenetic hypotheses. Here, we use transcriptome data from 15 newly sequenced species in combination with 26 publicly available genomes and transcriptomes to assess phylogenetic relationships among major cnidarian lineages. Phylogenetic analyses using different partition schemes and models of molecular evolution, as well as topology tests for alternative phylogenetic relationships, support the monophyly of Medusozoa, Anthozoa, Octocorallia, Hydrozoa, and a clade consisting of Staurozoa, Cubozoa, and Scyphozoa. Support for the monophyly of Hexacorallia is weak due to the equivocal position of Ceriantharia. Taken together, these results further resolve deep cnidarian relationships, largely support traditional phylogenetic views on relationships, and provide a historical framework for studying the evolutionary processes involved in one of the most ancient animal radiations.
Project description: Interpretation of mercury (Hg) geochemistry in environmental systems remains a challenge. This is largely due to the inability of established analytical methods in Hg geochemistry (total Hg and Hg speciation) to identify specific Hg transformation processes and species. In this study, we demonstrate the improved interpretation of Hg geochemistry, particularly for process tracing, that can be achieved when Hg stable isotope analyses are complemented by a suite of more established methods and applied to both the solid phase (soil) and the liquid phase (groundwater) at two Hg2+-chloride (HgCl2)-contaminated sites with distinct geological and physicochemical properties. This novel approach allowed us to identify processes such as Hg2+ (i.e., HgCl2) sorption to the solid phase, Hg2+ speciation changes associated with changes in groundwater level and redox conditions (particularly in the upper aquifer and capillary fringe), Hg2+ reduction to Hg0, and dark abiotic redox equilibration between Hg0 and Hg(II). Hg stable isotope analyses play a critical role in our ability to distinguish, or trace, these in situ processes. While we caution against the uncritical use of Hg isotope data for source tracing in environmental systems, owing to potentially variable source signatures and overprinting by transformation processes, our study demonstrates the benefits of combining multiple analytical approaches, including Hg isotope ratios as a process tracer, to obtain a clearer picture of the enigmatic geochemical behavior and fate of Hg at contaminated legacy sites.