Project description:Traditional search algorithms depend on concise, pre-defined protein sequence databases and therefore perform poorly when analyzing mass spectra derived from wholly uncharacterized protein products. Conversely, de novo peptide sequencing algorithms can interpret mass spectra without relying on reference databases. However, such algorithms have been difficult to apply to complex protein mixtures, in part due to a lack of methods for automatically validating de novo sequencing results. Here, we present novel metrics for benchmarking de novo sequencing algorithm performance on large-scale proteomics datasets, along with a method for accurately calibrating false discovery rates on de novo results. We also present a novel algorithm (LADS) that leverages experimentally disambiguated fragmentation spectra to boost sequencing accuracy and sensitivity. LADS improves sequencing accuracy on longer peptides relative to other algorithms and improves discriminability of correct and incorrect sequences. Using these advancements, we demonstrate accurate de novo identification of peptide sequences not identifiable using database search-based approaches.
Project description:Therapy-related acute myeloid leukemia (t-AML) is a severe complication of the cytotoxic therapy used for primary cancer treatment. The outcome of these patients is poor compared with that of patients who develop de novo acute myeloid leukemia (p-AML). Chromosome abnormalities in t-AML depend partly on the inducing agent. Partial or total losses of chromosome 5 and/or 7 are observed after therapy with alkylating agents. Balanced translocations, most of which involve 11q23 with MLL rearrangement, are found after treatment with topoisomerase II inhibitors. Complex cases are also more frequent in t-AML. The aim of this study was to compare t-AML with p-AML using high-resolution array CGH in order to identify gene-specific copy number abnormalities (CNAs). Thirty t-AML and thirty-six p-AML patient samples were studied. In t-AML, 99 CNAs were observed (63 losses and 36 gains), with a mean of 3.3 per case. In p-AML, 64 CNAs were observed (30 losses and 34 gains), with a mean of 1.78 per case. A few very complex cases (>8 chromosomal abnormalities) contributed considerably to the chromosomal burden in p-AML. Several minimal critical regions (MCRs) containing protein-coding and microRNA genes implicated in leukemogenesis were found in t-AML. On 7p15.2, a HOXA gene cluster involved in hematopoietic progenitor cell development and leukemogenesis was recurrently gained. Loss of a 5 Mb MCR located on 5q31.3q32 (142.91-148.19 Mb) was found distal to a previously described MCR; it harbored 29 genes. A 40 kb deleted MCR pointed to RUNX1 on 21q22, a gene coding for a transcription factor implicated in frequent rearrangements in leukemia and in familial thrombocytopenia with susceptibility to AML. Sequencing of RUNX1 revealed no abnormality in three patients and a mutation in one patient that resulted in complete deficiency of RUNX1. In de novo AML, a gain of 21q22 (38.41-39.36 Mb) harboring ERG and ETS2 was observed in two patients with very complex rearrangements.
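The CNA figures reported in this abstract can be cross-checked with simple arithmetic. The sketch below uses only the cohort sizes, losses, and gains stated in the text; the dictionary layout and the `summarize` helper are illustrative assumptions, not part of the study.

```python
# Figures copied from the abstract; the names used here are illustrative.
cohorts = {
    "t-AML": {"patients": 30, "losses": 63, "gains": 36},
    "p-AML": {"patients": 36, "losses": 30, "gains": 34},
}


def summarize(cohort):
    """Return (total CNAs, mean CNAs per case rounded to 2 decimals)."""
    total = cohort["losses"] + cohort["gains"]
    return total, round(total / cohort["patients"], 2)


for name, cohort in cohorts.items():
    total, mean = summarize(cohort)
    print(f"{name}: {total} CNAs, {mean} per case")
```

Both totals (99 and 64) and both means (3.3 and 1.78) agree with the values quoted in the abstract.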
Project description:The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes in a rapid and cost-effective fashion. Here, we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome, de novo, from short reads alone (67X coverage, Sample GSM1551550). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Aedes aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that virtually all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, accurate, and can be applied to many species.
Project description:Background: The rapid evolution and dissemination of the mobilized colistin resistance gene (mcr) family has emerged as a severe threat to global public health. Nevertheless, a dramatic reduction in the prevalence of mcr-1, the major member of the mcr family, was observed after the withdrawal of colistin from animal fodder in China in 2017, demonstrating that colistin acts as a selective pressure promoting the dissemination of mcr-1. As the second largest lineage, mcr-3 was first discovered in 2017 and has since been identified from numerous sources. However, whether the spread of mcr-3 is driven by colistin remains unknown. Methods: We investigated the global prevalence of mcr-3 from 2005 to 2022 through an up-to-date systematic review, together with a nationwide epidemiological study to establish the change in mcr-3 prevalence in China before and after 2017. To investigate the fitness cost imposed by MCR-3 on the bacterial host, in vitro and in vivo competitive assays were employed, along with morphological study and fluorescence observation. Moreover, synonymous mutations were introduced into the 5’-coding region of mcr-3 by replacing non-optimal codons with optimal codons, in order to study the mechanisms accounting for the distinct fitness costs conferred by MCR-1 and MCR-3. Furthermore, by combining AlphaFold and molecular dynamics (MD) simulation, we provided a complete characterization of the putative lipid A binding pocket located in the linker domain of MCR-3. Crucially, inhibitors targeting the putative binding pocket of MCR-1 or MCR-3 were identified from a small-molecule library using a virtual screening pipeline. Findings: The global prevalence of mcr-3 increased continuously from 2005 to 2022. The average prevalence was 0.18% during 2005-2014 and rapidly increased to 3.41% during 2020-2022. The prevalence of mcr-3 in China increased from 0.79% in 2016 to 5.87% in 2019. We found that the fitness of mcr-3-bearing E. coli was comparable to that of the empty plasmid control but higher than that of the mcr-1-positive strain. Although the putative lipid A binding pocket of MCR-3 was similar to that of MCR-1, mcr-3 exhibits a marked codon bias at the 5’-end of its coding region that disrupts mRNA stability and reduces its protein expression in E. coli, resulting in a low fitness burden on the bacterial host. Moreover, the 5’-end codon usage frequency appeared to be a critical factor in the evolution of the mcr family. Furthermore, based on the similar lipid A binding pocket among MCR family proteins, we identified three novel MCR inhibitors targeting this pocket by screening a small-molecule library; these inhibitors effectively restored the colistin susceptibility of mcr-bearing E. coli. Interpretation: For the first time, we found that the prevalence of mcr-3 increased continuously during 2016-2019 in China, demonstrating that the withdrawal of colistin from animal husbandry failed to prevent the dissemination of mcr-3. Our study provides evidence that 5’-end codon bias is a crucial regulator of the fitness cost conferred by horizontally transferred genes. Most importantly, the putative lipid A binding pocket verified in the current study is a promising target site for designing inhibitors against mcr-positive strains.
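The synonymous recoding step mentioned in the Methods (replacing non-optimal codons with optimal ones in the 5’-coding region) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the `PREFERRED` codon choices, the toy input sequence, and the function names are hypothetical and are not taken from the study, which does not state its codon table or the number of codons changed.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical "optimal" codons, for illustration only; a real analysis
# would derive these from codon-usage data for the host organism.
PREFERRED = {"L": "CTG", "R": "CGT", "G": "GGC"}


def translate(seq):
    """Translate a DNA sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def recode_5prime(seq, n_codons, preferred=PREFERRED):
    """Swap the first n_codons codons for preferred synonymous codons."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for i in range(min(n_codons, len(codons))):
        codon = codons[i]
        if len(codon) == 3:
            # Keep the codon unchanged when no preference is listed.
            codons[i] = preferred.get(CODON_TABLE[codon], codon)
    return "".join(codons)


original = "ATGCTAAGAGGATTAAAG"  # toy sequence, not mcr-3
recoded = recode_5prime(original, n_codons=4)

# Synonymous changes must leave the encoded protein untouched.
assert translate(recoded) == translate(original)
print(original, "->", recoded)
```

Because only codons within the first `n_codons` positions are replaced, the nucleotide sequence changes at the 5’-end while the translated protein stays identical, which is the property a synonymous-mutation experiment relies on.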