ABSTRACT: Molecular nanotechnology is a rapidly developing field, and tremendous progress has been made in developing synthetic molecular machines. One long-sought nanotechnology is a system able to achieve assembly-line-like production of molecules. Here we report the discovery of a rudimentary synthetic molecular assembler that produces polymers. The molecular assembler is a supramolecular aggregate of bifunctional surfactants produced by the reaction of two phase-separated reactants. Initially, self-reproduction of the bifunctional surfactants is observed, but once it reaches a critical concentration the assembler starts to produce polymers instead of supramolecular aggregates. The polymer size can be controlled by adjusting the temperature or reaction time, or by introducing a capping agent. There has been considerable debate about molecular assemblers in the context of nanotechnology; our demonstration that primitive assemblers may arise from simple phase-separated reactants may provide a new direction for the design of functional supramolecular systems.
Project description:Dynamic DNA-based circuits represent versatile systems to perform complex computing operations at the molecular level. However, the majority of DNA circuits relies on freely diffusing reactants, which slows down their rate of operation substantially. Here we introduce the use of DNA-functionalized benzene-1,3,5-tricarboxamide (BTA) supramolecular polymers as dynamic scaffolds to template DNA-based molecular computing. By selectively recruiting DNA circuit components to a supramolecular BTA polymer functionalized with 10-nucleotide handle strands, the kinetics of strand displacement and strand exchange reactions were accelerated 100-fold. In addition, strand exchange reactions were also favored thermodynamically by bivalent interactions between the reaction product and the supramolecular polymer. The noncovalent assembly of the supramolecular polymers enabled straightforward optimization of the polymer composition to best suit various applications. The ability of supramolecular BTA polymers to increase the efficiency of DNA-based computing was demonstrated for three well-known and practically important DNA-computing operations: multi-input AND gates, Catalytic Hairpin Assembly and Hybridization Chain Reactions. This work thus establishes supramolecular BTA polymers as an efficient platform for DNA-based molecular operations, paving the way for the construction of autonomous bionanomolecular systems that confine and combine molecular sensing, computation, and actuation.
Project description:The giant muscle protein titin is the largest protein in cells and responsible for the passive elasticity of muscles. Titin, made of hundreds of individually folded globular domains, is a protein polymer with folded globular domains as its macromonomers. Due to titin's ultrahigh molecular weight, it has been challenging to engineer high molecular weight artificial protein polymers that mimic titin. Taking advantage of protein fragment reconstitution (PFR) of a small protein GB1, which can be reconstituted from its two split fragments GN and GC, here we report the development of an efficient, PFR-based supramolecular polymerization strategy to engineer protein polymers with ultrahigh molecular weight. We found that the engineered bifunctional protein macromonomers (GC-macromonomer-GN) can undergo supramolecular polymerization, in a way similar to condensation polymerization, via the reconstitution of GN and GC to produce protein polymers with ultrahigh molecular weight (with an average molecular weight of 0.5 MDa). Such high molecular weight linear protein polymers closely mimic titin and provide protein polymer building blocks for the construction of biomaterials with improved physical and mechanical properties.
Project description:Assembling a large genome using next generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, we propose an extension-based assembler, called JR-Assembler, where J and R stand for "jumping" extension and read "remapping." First, it uses the read count to select good quality reads as seeds. Second, it extends each seed by a whole-read extension process, which expedites the extension process and can jump over short repeats. Third, it uses a dynamic back trimming process to avoid extension termination due to sequencing errors. Fourth, it remaps reads to each assembled sequence, and if an assembly error is caused by the presence of a repeat, it breaks the contig at the repeat boundaries. Fifth, it applies a less stringent extension criterion to connect low-coverage regions. Finally, it merges contigs by unused reads. An extensive comparison of JR-Assembler with current assemblers using datasets from small, medium, and large genomes shows that JR-Assembler achieves better or comparable overall assembly quality and requires less memory and less central processing unit time, especially for large genomes. Finally, a simulation study shows that JR-Assembler outperforms most current assemblers in memory use and central processing unit time when the read length is 150 bp or longer, indicating that the advantages of JR-Assembler over current assemblers will increase as the read length increases with advances in next generation sequencing technology.
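The first two steps described above (choosing a seed by read count, then extending it by whole reads) can be illustrated with a minimal sketch. This is not JR-Assembler's implementation: the function names `pick_seed` and `extend_right`, the `min_overlap` parameter, and the exact-match overlap rule are all assumptions made for illustration, and back trimming, remapping, and contig merging are omitted entirely.

```python
from collections import Counter


def pick_seed(reads):
    """Return the most frequent read (read count as a crude quality proxy)."""
    counts = Counter(reads)
    # Ties are broken lexicographically so the result is deterministic.
    return min(counts, key=lambda r: (-counts[r], r))


def extend_right(seed, reads, min_overlap=4):
    """Greedily extend `seed` to the right, one whole read at a time.

    Hypothetical rule: a read may extend the contig if a prefix of the read
    of at least `min_overlap` bases exactly matches the contig's suffix.
    """
    contig = seed
    unused = set(reads) - {seed}
    while True:
        best, best_olap = None, 0
        for read in sorted(unused):
            # The overlap must leave at least one new base to append.
            max_len = min(len(read) - 1, len(contig))
            for olap in range(max_len, min_overlap - 1, -1):
                if contig.endswith(read[:olap]):
                    if olap > best_olap:
                        best, best_olap = read, olap
                    break  # longest overlap for this read found
        if best is None:
            return contig  # no read overlaps the contig end any more
        contig += best[best_olap:]  # append the whole non-overlapping tail
        unused.discard(best)
```

Appending the whole tail of a read in one step, rather than one base at a time, is what the sketch shares with the whole-read extension idea; everything else is simplified.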
Project description:Next generation sequencing (NGS) technologies have greatly changed the landscape of transcriptomic studies of non-model organisms. Since there is no reference genome available, de novo assembly methods play key roles in the analysis of these data sets. Because of the huge amount of data generated by NGS technologies for each run, many assemblers, e.g., ABySS, Velvet and Trinity, are developed based on a de Bruijn graph due to its time- and space-efficiency. However, most of these assemblers were developed initially for the Illumina/Solexa platform. The performance of these assemblers on 454 transcriptomic data is unknown. In this study, we evaluated and compared the relative performance of these de Bruijn graph based assemblers on both simulated and real 454 transcriptomic data. The results suggest that Trinity, the Illumina/Solexa-specialized transcriptomic assembler, performs the best among the multiple de Bruijn graph assemblers, comparable to or even outperforming the standard 454 assembler Newbler which is based on the overlap-layout-consensus algorithm. Our evaluation is expected to provide helpful guidance for researchers to choose assemblers when analyzing 454 transcriptomic data.
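As a rough illustration of why a de Bruijn graph is space-efficient, the sketch below builds one from a set of reads. It is a generic textbook construction under simplifying assumptions (error-free reads, a single strand, no coverage tracking), not the data structure used by ABySS, Velvet, or Trinity; the function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some k-mer.

    Each distinct k-mer contributes exactly one edge, so a k-mer that occurs
    in many overlapping reads is stored only once.
    """
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in seen:
                continue  # duplicate k-mers collapse into a single edge
            seen.add(kmer)
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Because the graph size depends on the number of distinct k-mers rather than on the number of reads, memory use grows with genome complexity instead of sequencing depth, which is the property the abstract refers to.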
Project description:The advent of next-generation sequencing technologies has been accompanied by the development of many whole-genome sequence assembly methods and software tools, especially for de novo fragment assembly. Because the applicability and performance of these tools are poorly understood, choosing a suitable assembler is a difficult task. Here, we provide information on the adaptivity of each program and, above all, compare the performance of eight distinct tools on eight groups of simulated datasets from the Solexa sequencing platform. Considering computational time, maximum random access memory (RAM) occupancy, assembly accuracy and integrity, our study indicates that string-based assemblers and overlap-layout-consensus (OLC) assemblers are well suited for very short reads and for longer reads of small genomes, respectively. For large datasets of more than a hundred million short reads, de Bruijn graph-based assemblers would be more appropriate. In terms of software implementation, string-based assemblers are superior to graph-based ones; among the latter, creating the configuration file for SOAPdenovo is complex. Our comparison study will assist researchers in selecting a well-suited assembler and offer essential information for the improvement of existing assemblers or the development of novel assemblers.
Project description:Oxford Nanopore sequencing can be used to achieve complete bacterial genomes. However, the error rates of Oxford Nanopore long reads are higher than those of Illumina short reads. Long-read assemblers using a variety of assembly algorithms have been developed to overcome this deficiency, but they have not been benchmarked for genomic analyses of bacterial pathogens using Oxford Nanopore long reads. In this study, long-read assemblers, namely Canu, Flye, Miniasm/Racon, Raven, Redbean, and Shasta, were thus benchmarked using Oxford Nanopore long reads of bacterial pathogens. Ten species were tested for mediocre- and low-quality simulated reads, and 10 species were tested for real reads. Raven was the most robust assembler, obtaining complete and accurate genomes. All Miniasm/Racon and Raven assemblies of mediocre-quality reads provided accurate antimicrobial resistance (AMR) profiles, while the Raven assembly of Klebsiella variicola with low-quality reads was the only assembly with an accurate AMR profile among all assemblers and species. All assemblers functioned well for predicting virulence genes using mediocre-quality and real reads, whereas only the Raven assemblies of low-quality reads had accurate numbers of virulence genes. Regarding multilocus sequence typing (MLST), Miniasm/Racon was the most effective assembler for mediocre-quality reads, while only the Raven assemblies of Escherichia coli O157:H7 and K. variicola with low-quality reads showed positive MLST results. Miniasm/Racon and Raven were the best performers for MLST using real reads. The Miniasm/Racon and Raven assemblies showed accurate phylogenetic inference. For the pan-genome analyses, Raven was the strongest assembler for simulated reads, whereas Miniasm/Racon and Raven performed the best for real reads. Overall, the most robust and accurate assembler was Raven, closely followed by Miniasm/Racon.
Project description:Background: Current advancements in next-generation sequencing technology have made it possible to sequence whole genomes, but assembling a large number of short sequence reads is still a big challenge. In this article, we present a comparative study of seven assemblers, namely, ABySS, Velvet, Edena, SGA, Ray, SSAKE, and Perga, using prokaryotic and eukaryotic paired-end as well as single-end data sets from the Illumina platform. Results: For single-end data sets, Velvet and ABySS outperformed the other assemblers, with comparatively low assembly time and high genome fraction. Velvet consumed less memory than any other assembler. For paired-end data sets, Velvet consumed the least time and produced a high genome fraction, behind only ABySS and Ray. In terms of low memory usage, SGA and Edena outperformed all the other assemblers. Ray also showed good genome fraction; however, its extremely high assembly time might make it prohibitively slow on larger single-end and paired-end data sets. Conclusions: Our comparison study will help scientists select a suitable assembler for their data sets and will also help developers upgrade existing assemblers or develop new ones for de novo assembly.
Project description:A plethora of algorithmic assemblers have been proposed for the de novo assembly of genomes; however, no individual assembler guarantees the optimal assembly for diverse species. Various parameters of an assembler are often optimized in order to generate the best assembly. However, few efforts have been made to take advantage of multiple assemblies to yield an assembly of high accuracy. In this study, we employ various state-of-the-art assemblers to generate different sets of contigs for bacterial genomes. A tool, named CISA, has been developed to integrate the assemblies into a hybrid set of contigs, resulting in assemblies of superior contiguity and accuracy, compared with the assemblies generated by the state-of-the-art assemblers and the hybrid assemblies merged by existing tools. This tool is implemented in Python and requires MUMmer and BLAST+ to be installed on the local machine. The source code of CISA and examples of its use are available at http://sb.nhri.org.tw/CISA/.
Project description:Background: Metagenomics is the study of the microbial genomes isolated from communities found on our bodies or in our environment. By correctly determining the relation between human health and the human associated microbial communities, novel mechanisms of health and disease can be found, thus enabling the development of novel diagnostics and therapeutics. Due to the diversity of the microbial communities, strategies developed for aligning human genomes cannot be utilized, and genomes of the microbial species in the community must be assembled de novo. However, in order to obtain the best metagenomic assemblies, it is important to choose the proper assembler. Due to the rapidly evolving nature of metagenomics, new assemblers are constantly created, and the field has not yet agreed on a standardized process. Furthermore, the truth sets used to compare these methods are either too simple (computationally derived diverse communities) or complex (microbial communities of unknown composition), yielding results that are hard to interpret. In this analysis, we interrogate the strengths and weaknesses of five popular assemblers through the use of defined biological samples of known genomic composition and abundance. We assessed the performance of each assembler on their ability to reassemble genomes, call taxonomic abundances, and recreate open reading frames (ORFs). Results: We tested five metagenomic assemblers: Omega, metaSPAdes, IDBA-UD, metaVelvet and MEGAHIT on known and synthetic metagenomic data sets. MetaSPAdes excelled in diverse sets, IDBA-UD performed well all around, metaVelvet had high accuracy in high abundance organisms, and MEGAHIT was able to accurately differentiate similar organisms within a community. At the ORF level, metaSPAdes and MEGAHIT had the least number of missing ORFs within diverse and similar communities respectively. Conclusions: Depending on the metagenomics question asked, the correct assembler for the task at hand will differ. It is important to choose the appropriate assembler, and thus clearly define the biological problem of an experiment, as different assemblers will give different answers to the same question.
Project description:Several de novo transcriptome assemblers have been developed recently to assemble the short reads generated by next-generation sequencing platforms, and different strategies have been employed for assembling transcriptomes of various eukaryotes without genome sequences. Although there are some comparisons among these de novo assembly tools for assembling transcriptomes of different eukaryotic organisms, there is no report on the relationship between assembly strategies and the ploidy of the organisms. When we de novo assembled the transcriptomes of sweet potato (hexaploid), Trametes gallica (a diploid fungus), and Oryza meyeriana (a diploid wild rice), five assemblers, namely Edena, Oases, Soaptrans, IDBA-tran and Trinity, were used in different strategies (Single-Assembler Single-Parameter, SASP; Single-Assembler Multiple-Parameters, SAMP; Combined De novo Transcriptome Assembly, CDTA, that is, multiple assemblers with multiple parameters). The CDTA strategy performed best of the three strategies for assembling the transcriptome of the hexaploid sweet potato, whereas the SAMP strategy with the assembler Oases was better than the other strategies for assembling the transcriptomes of the diploid fungus and the wild rice. Based on our results and those of others, we suggest that the CDTA strategy is better suited to transcriptome assembly of polyploid organisms, and the SAMP strategy with Oases to diploid organisms without genome sequences.