Project description:The past decade has yielded more biodiversity observations from community science than the past century of traditional scientific collection. This rapid influx of data is promising for overcoming critical biodiversity data shortfalls, but we also have vast untapped resources held in undigitized natural history collections. Yet, the ability of these undigitized collections to fill data gaps, especially compared against the constant accumulation of community science data, remains unclear. Here, we compare how well community science (iNaturalist) observations and digitized herbarium specimens represent the diversity, distributions, and modeling needs of vascular plants in Canada. We find that, despite having only a third as many records, herbarium specimens capture more taxonomic, phylogenetic, and functional diversity and more efficiently capture species' environmental niches. As such, the digitization of Canada's 7.3M remaining specimens has the potential to more than quintuple our ability to model biodiversity. In contrast, it would require over 27M more iNaturalist observations to produce similar benefits. Our findings indicate that digitizing Earth's remaining herbarium specimens is likely an efficient, feasible, and potentially critical investment when it comes to improving our ability to predict and protect biodiversity into the future.
Project description:The herbarium digitization process is an essential first step in transforming the vast amount of data associated with a physical specimen into flexible digital data formats. In this framework, at the end of 2018 the Herbarium of the University of Pisa (international code PI) started a digitization process focusing on one of its most relevant collections: the Herbarium of Michele Guadagno (1878-1930). This scholar studied the flora and vegetation of different areas of southern Italy, building a large herbarium that includes specimens he collected himself, plus many specimens obtained through exchanges with Italian and foreign botanists. The Herbarium is composed of 547 packages of vascular plants. Metadata were entered into the online database Virtual Herbaria JACQ and mirrored into a personalized virtual Herbarium of the Botanic Museum. After the completion of the digitization process, the number of sheets preserved in the Herbarium amounts to 44,345. Besides Guadagno, who collected 42% of the specimens, a further 1,102 collectors are represented. Most specimens were collected in Europe (91%), but all the continents are represented. As expected, Italy is the most represented country (59%), followed by France, Spain, Germany, and Greece. The specimens cover a time span of 99 years, from 1830 to 1929, whereas the specimens collected by Guadagno range between 1889 and 1928. Furthermore, we traced 134 herbarium sheets associated with documents, including 75 drawings handmade by Guadagno, 34 letters from various corresponding authors, 16 copies of publications, and 14 copies of published iconographies.
Project description:Premise of the study: Herbarium specimens provide a robust record of historical plant phenology (the timing of seasonal events such as flowering or fruiting). However, the difficulty of aggregating phenological data from specimens arises from a lack of standardized scoring methods and definitions for phenological states across the collections community. Methods and results: To address this problem, we report on a consensus reached by an iDigBio working group of curators, researchers, and data standards experts regarding an efficient scoring protocol and a data-sharing protocol for reproductive traits available from herbarium specimens of seed plants. The phenological data sets generated can be shared via Darwin Core Archives using the Extended MeasurementOrFact extension. Conclusions: Our hope is that curators and others interested in collecting phenological trait data from specimens will use the recommendations presented here in current and future scoring efforts. New tools for scoring specimens are reviewed.
Project description:Performing whole genome sequencing (WGS) for the surveillance of antimicrobial resistance offers the ability to determine not only the antimicrobials to which rates of resistance are increasing, but also the evolutionary mechanisms and transmission routes responsible for the increase at local, national, and global scales. To derive WGS-based outputs, a series of processes are required, beginning with sample and metadata collection, followed by nucleic acid extraction, library preparation, sequencing, and analysis. Throughout this pathway there are many data-related operations required (informatics) combined with more biologically focused procedures (bioinformatics). For a laboratory aiming to implement pathogen genomics, the informatics and bioinformatics activities can be a barrier to starting on the journey; for a laboratory that has already started, these activities may become overwhelming. Here we describe these data bottlenecks and how they have been addressed in laboratories in India, Colombia, Nigeria, and the Philippines, as part of the National Institute for Health Research Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance. The approaches taken include the use of reproducible data parsing pipelines and genome sequence analysis workflows, using technologies such as Data-flo, the Nextflow workflow manager, and containerization of software dependencies. By overcoming barriers to WGS implementation in countries where genome sampling for some species may be underrepresented, a body of evidence can be built to determine the concordance of antimicrobial sensitivity testing and genome-derived resistance, and novel high-risk clones and unknown mechanisms of resistance can be discovered.
Project description:Patient-derived autologous chimeric antigen receptor (CAR)-T cell therapy is a revolutionary breakthrough in immunotherapy and has made impressive progress in both preclinical and clinical studies. However, autologous CAR-T cells still have notable drawbacks in clinical manufacture, such as long production time, variable cell potency and possible manufacturing failures. Allogeneic CAR-T cell therapy is significantly superior to autologous CAR-T cell therapy in these aspects. The use of allogeneic CAR-T cell therapy may provide a simplified manufacturing process and allow the creation of 'off-the-shelf' products, facilitating the treatment of various types of tumors with shorter delivery times. Nevertheless, severe graft-versus-host disease (GvHD) or host-mediated allorejection may occur in the allogeneic setting, implying that addressing these two critical issues is urgent for the clinical application of allogeneic CAR-T cell therapy. In this review, we summarize the current approaches to overcome GvHD and host rejection, which empower allogeneic CAR-T cell therapy with a broader future.
Project description:Digitization of molecular complexity is of key importance in chemistry and life sciences to develop structure-activity relationships in chemical behavior and biological activity. The complexity of a given molecule compared to others is largely based on intuitive perception and lacks a standardized numerical measure. Quantifying molecular complexity remains a fundamental challenge, with key implications currently remaining controversial. In this study, we introduce a novel machine learning-based framework employing a Learning to Rank (LTR) approach to quantify molecular complexity on the basis of labeled data. As a result, we developed a ranking model using a dataset that comprises approximately 300 000 data points across diverse chemical structures, leveraging human expertise to capture complex decision rules that researchers intuitively use. Applications of our model in mapping the current organic chemistry landscape, analyzing FDA-approved drugs, guiding lead optimization processes, and interpreting total synthesis approaches reveal key trends in increasing molecular complexity and synthetic strategy evolution. Our study advances the methodologies available for quantifying molecular complexity, changing it from an elusive property to a numerical characteristic. With machine learning, we managed to digitize human perception of molecular complexity. Moreover, a corresponding large labeled dataset was produced for future research in this area.
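The Learning to Rank idea mentioned in the abstract above can be illustrated with a minimal pairwise sketch. This is not the authors' model: it assumes a linear scorer trained with a logistic loss on labeled pairs, and the function names, the two descriptors (ring count, stereocenter count), and the toy pairs are all hypothetical.

```python
import math
import random


def train_pairwise_ranker(pairs, n_features, epochs=200, lr=0.1, seed=0):
    """Fit a linear scorer from pairs (x_more, x_less), where x_more was
    labeled as the more complex molecule of the two (hypothetical labels)."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_more, x_less in pairs:
            diff = [a - b for a, b in zip(x_more, x_less)]
            margin = sum(w * d for w, d in zip(weights, diff))
            # Clamp to keep math.exp within floating-point range.
            margin = max(-50.0, min(50.0, margin))
            # Gradient step on the pairwise logistic loss log(1 + exp(-margin)).
            step = 1.0 / (1.0 + math.exp(margin))
            weights = [w + lr * step * d for w, d in zip(weights, diff)]
    return weights


def score(weights, features):
    """Numerical complexity score: higher means ranked as more complex."""
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical descriptors: [ring count, stereocenter count].
toy_pairs = [([3, 4], [1, 0]), ([2, 2], [1, 1]), ([4, 1], [0, 0])]
weights = train_pairwise_ranker(toy_pairs, n_features=2)
print(score(weights, [3, 4]) > score(weights, [1, 0]))
```

Training on pairs rather than absolute scores matches the setting described above, where human judgment is easier to collect as "which of these two is more complex" than as a number.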
Project description:Terpenoids are a large and diverse class of plant metabolites that also includes volatile mono- and sesquiterpenes which are involved in biotic interactions of plants. Due to the limited natural availability of these terpenes and the tight regulation of their biosynthesis, there is strong interest in introducing or enhancing their production in crop plants by metabolic engineering for agricultural, pharmaceutical and industrial applications. While engineering of monoterpenes has been quite successful, expression of sesquiterpene synthases in engineered plants frequently resulted in production of only minor amounts of sesquiterpenes. To identify bottlenecks for sesquiterpene engineering in plants, we have used two nearly identical terpene synthases, snapdragon (Antirrhinum majus) nerolidol/linalool synthase-1 and -2 (AmNES/LIS-1/-2), that are localized in the cytosol and plastids, respectively. Since these two bifunctional terpene synthases have very similar catalytic properties with geranyl diphosphate (GPP) and farnesyl diphosphate (FPP), their expression in target tissues allows indirect determination of the availability of these substrates in both subcellular compartments. Both terpene synthases were expressed under control of the ripening specific PG promoter in tomato fruits, which are characterized by a highly active terpenoid metabolism providing precursors for carotenoid biosynthesis. While AmNES/LIS-2 fruits produced the monoterpene linalool, AmNES/LIS-1 fruits were found to exclusively produce the sesquiterpene nerolidol. While nerolidol emission in AmNES/LIS-1 fruits was 60- to 584-fold lower compared to linalool emission in AmNES/LIS-2 fruits, accumulation of nerolidol-glucosides in AmNES/LIS-1 fruits was 4- to 14-fold lower than that of linalool-glucosides in AmNES/LIS-2 fruits. These results suggest that only a relatively small pool of FPP is available for sesquiterpene formation in the cytosol.
To potentially overcome limitations in sesquiterpene production, we transiently co-expressed the key pathway-enzymes hydroxymethylglutaryl-CoA reductase (HMGR) and 1-deoxy-D-xylulose 5-phosphate synthase (DXS), as well as the regulator isopentenyl phosphate kinase (IPK). While HMGR and IPK expression increased metabolic flux toward nerolidol formation 5.7- and 2.9-fold, respectively, DXS expression only resulted in a 2.5-fold increase.
Project description:Emerging diseases are a major challenge to public health. Revealing the evolutionary processes that allow novel pathogens to adapt to new hosts, as well as the potential barriers to host adaptation, is central to understanding the drivers of disease emergence. In particular, it is unclear how the genetics and ecology of pathogens interact to shape the likelihood of successful cross-species transmission. To better understand the determinants of host adaptation and emergence, we modelled key aspects of pathogen evolutionary dynamics at both intra- and inter-host scales, using parameter values similar to those observed in influenza virus. We considered the possibility of acquiring the necessary host adaptive mutations both before ('off-the-shelf' emergence) and after ('tailor-made' emergence) a virus is transmitted from a donor to a new recipient species. Under both scenarios, population bottlenecks at inter-host transmission act as a major barrier to host adaptation, greatly limiting the number of adaptive mutations that are able to cross the species barrier. In addition, virus emergence is hindered if the fitness valley between the donor and recipient hosts is either too steep or too shallow. Overall, our results reveal where in evolutionary parameter space a virus could adapt to and become transmissible in a new species.
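The effect of a transmission bottleneck described in the abstract above can be made concrete with a small calculation. This is only a sketch, not the authors' model: it assumes each founding virion is drawn independently from the donor population, and the mutant frequency and bottleneck sizes below are illustrative values, not figures from the study.

```python
def prob_mutant_transmitted(mutant_freq: float, bottleneck_size: int) -> float:
    """Probability that at least one founder carries the adaptive mutation,
    assuming independent sampling of founders (a simplifying assumption)."""
    if not 0.0 <= mutant_freq <= 1.0:
        raise ValueError("mutant_freq must lie in [0, 1]")
    if bottleneck_size < 0:
        raise ValueError("bottleneck_size must be non-negative")
    # Complement of the event that every founder lacks the mutation.
    return 1.0 - (1.0 - mutant_freq) ** bottleneck_size


# Illustrative values only: a mutant at frequency 0.001 in the donor host.
for size in (1, 10, 100, 1000):
    print(size, prob_mutant_transmitted(0.001, size))
```

Under this assumption a rare variant is unlikely to pass through a tight bottleneck, which is consistent with the abstract's point that bottlenecks limit how many adaptive mutations cross the species barrier.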