Project description:This paper addresses an important materials engineering question: How can one identify the complete space (or as much of it as possible) of microstructures that are theoretically predicted to yield the desired combination of properties demanded by a selected application? We present a problem involving design of magnetoelastic Fe-Ga alloy microstructure for enhanced elastic, plastic and magnetostrictive properties. While theoretical models for computing properties given the microstructure are known for this alloy, inversion of these relationships to obtain microstructures that lead to desired properties is challenging, primarily due to the high dimensionality of microstructure space, multi-objective design requirement and non-uniqueness of solutions. These challenges render traditional search-based optimization methods incompetent in terms of both searching efficiency and result optimality. In this paper, a route to address these challenges using a machine learning methodology is proposed. A systematic framework consisting of random data generation, feature selection and classification algorithms is developed. Experiments with five design problems that involve identification of microstructures that satisfy both linear and nonlinear property constraints show that our framework outperforms traditional optimization methods with the average running time reduced by as much as 80% and with optimality that would not be achieved otherwise.
Project description:As the importance of transcriptional variation and regulation for Plasmodium becomes more apparent, advances for non-falciparum species are hindered by our reliance on natural infections to study parasite biology. Untargeted transcriptomic research is also complicated by low parasite densities and high proportions of human genetic material, highlighting the need for optimized sample processing protocols. In this study, we used a P. knowlesi culture diluted in whole blood as a mock P. vivax natural infection to compare white blood cell, rRNA-, and globin depletion methods and RNA-seq library preparation kits to create an optimized protocol for low-volume sample processing.
Project description:This paper demonstrates the use of a new model consisting of a genetic algorithm in combination with thermodynamic calculations and analytical process models to minimize the processing time during a vacuum degassing treatment of liquid steel. The model sets multiple simultaneous targets for final S, N, O, Si and Al levels and uses the total slag mass, the slag composition, the steel composition and the start temperature as optimization variables. The predicted optimal conditions agree well with industrial practice. For those conditions leading to the shortest process time the target compositions for S, N and O are reached almost simultaneously.
Project description:With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the two-state design problem of the foldswitching protein RfaH as an in-depth case study, and PapD and calmodulin as examples of higher-dimensional design problems, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
Project description:Primer and probe sequence designs are among the most critical input factors in real-time polymerase chain reaction (PCR) assay optimization. In this study, we present the use of statistical design of experiments (DOE) approach as a general guideline for probe optimization and more specifically focus on design optimization of label-free hydrolysis probes that are designated as mediator probes (MPs), which are used in reverse transcription MP PCR (RT-MP PCR). The effect of three input factors on assay performance was investigated: distance between primer and mediator probe cleavage site; dimer stability of MP and target sequence (influenza B virus); and dimer stability of the mediator and universal reporter (UR). The results indicated that the latter dimer stability had the greatest influence on assay performance, with RT-MP PCR efficiency increased by up to 10% with changes to this input factor. With an optimal design configuration, a detection limit of 3-14 target copies/10 μl reaction could be achieved. This improved detection limit was confirmed for another UR design and for a second target sequence, human metapneumovirus, with 7-11 copies/10 μl reaction detected in an optimum case. The DOE approach for improving oligonucleotide designs for real-time PCR not only produces excellent results but may also reduce the number of experiments that need to be performed, thus reducing costs and experimental times.
Project description:With recent methodological advances in the field of computational protein design, in particular those based on deep learning, there is an increasing need for frameworks that allow for coherent, direct integration of different models and objective functions into the generative design process. Here we demonstrate how evolutionary multiobjective optimization techniques can be adapted to provide such an approach. With the established Non-dominated Sorting Genetic Algorithm II (NSGA-II) as the optimization framework, we use AlphaFold2 and ProteinMPNN confidence metrics to define the objective space, and a mutation operator composed of ESM-1v and ProteinMPNN to rank and then redesign the least favorable positions. Using the multistate design problem of the foldswitching protein RfaH as an in-depth case study, we show that the evolutionary multiobjective optimization approach leads to significant reduction in the bias and variance in RfaH native sequence recovery, compared to a direct application of ProteinMPNN. We suggest that this improvement is due to three factors: (i) the use of an informative mutation operator that accelerates the sequence space exploration, (ii) the parallel, iterative design process inherent to the genetic algorithm that improves upon the ProteinMPNN autoregressive sequence decoding scheme, and (iii) the explicit approximation of the Pareto front that leads to optimal design candidates representing diverse tradeoff conditions. We anticipate this approach to be readily adaptable to different models and broadly relevant for protein design tasks with complex specifications.
Project description:A novel algorithm, PILOT_PTM, has been developed for the untargeted identification of post-translational modifications (PTMs) on a template sequence. The algorithm consists of an analysis of an MS/MS spectrum via an integer linear optimization model to output a rank-ordered list of PTMs that best match the experimental data. Each MS/MS spectrum is analyzed by a preprocessing algorithm to reduce spectral noise and label potential complimentary, offset, isotope, and multiply charged peaks. Postprocessing of the rank-ordered list from the integer linear optimization model will resolve fragment mass errors and will reorder the list of PTMs based on the cross-correlation between the experimental and theoretical MS/MS spectrum. PILOT_PTM is instrument-independent, capable of handling multiple fragmentation technologies, and can address the universe of PTMs for every amino acid on the template sequence. The various features of PILOT_PTM are presented, and it is tested on several modified and unmodified data sets including chemically synthesized phosphopeptides, histone H3-(1-50) polypeptides, histone H3-(1-50) tryptic fragments, and peptides generated from proteins extracted from chromatin-enriched fractions. The data sets consist of spectra derived from fragmentation via collision-induced dissociation, electron transfer dissociation, and electron capture dissociation. The capability of PILOT_PTM is then benchmarked using five state-of-the-art methods, InsPecT, Virtual Expert Mass Spectrometrist (VEMS), Mod(i), Mascot, and X!Tandem. PILOT_PTM demonstrates superior accuracy on both the small and large scale proteome experiments. A protocol is finally developed for the analysis of a complete LC-MS/MS scan using template sequences generated from SEQUEST and is demonstrated on over 270,000 MS/MS spectra collected from a total chromatin digest.
Project description:The advantages of mist-netting, the main technique used in Neotropical bat community studies to date, include logistical implementation, standardization and sampling representativeness. Nonetheless, study designs still have to deal with issues of detectability related to how different species behave and use the environment. Yet there is considerable sampling heterogeneity across available studies in the literature. Here, we approach the problem of sample size optimization. We evaluated the common sense hypothesis that the first six hours comprise the period of peak night activity for several species, thereby resulting in a representative sample for the whole night. To this end, we combined re-sampling techniques, species accumulation curves, threshold analysis, and community concordance of species compositional data, and applied them to datasets of three different Neotropical biomes (Amazonia, Atlantic Forest and Cerrado). We show that the strategy of restricting sampling to only six hours of the night frequently results in incomplete sampling representation of the entire bat community investigated. From a quantitative standpoint, results corroborated the existence of a major Sample Area effect in all datasets, although for the Amazonia dataset the six-hour strategy was significantly less species-rich after extrapolation, and for the Cerrado dataset it was more efficient. From the qualitative standpoint, however, results demonstrated that, for all three datasets, the identity of species that are effectively sampled will be inherently impacted by choices of sub-sampling schedule. We also propose an alternative six-hour sampling strategy (at the beginning and the end of a sample night) which performed better when resampling Amazonian and Atlantic Forest datasets on bat assemblages. Given the observed magnitude of our results, we propose that sample representativeness has to be carefully weighed against study objectives, and recommend that the trade-off between logistical constraints and additional sampling performance should be carefully evaluated.
Project description:With the recent industrial expansion, heavy metals and other pollutants have increasingly contaminated our living surroundings. Heavy metals, being non-degradable, tend to accumulate in the food chain, resulting in potentially damaging toxicity to organisms. Thus, techniques to detect metal ions have gradually begun to receive attention. Recent progress in research on synthetic biology offers an alternative means for metal ion detection via the help of promoter elements derived from microorganisms. To make the design easier, it is necessary to develop a systemic design method for evaluating and selecting adequate components to achieve a desired detection performance. A multi-objective (MO) H2/H∞ performance criterion is derived here for design specifications of a metal ion biosensor to achieve the H2 optimal matching of a desired input/output (I/O) response and simultaneous H∞ optimal filtering of intrinsic parameter fluctuations and external cellular noise. According to the two design specifications, a Takagi-Sugeno (T-S) fuzzy model is employed to interpolate several local linear stochastic systems to approximate the nonlinear stochastic metal ion biosensor system so that the multi-objective H2/H∞ design of the metal ion biosensor can be solved by an associated linear matrix inequality (LMI)-constrained multi-objective (MO) design problem. The analysis and design of a metal ion biosensor with optimal I/O response matching and optimal noise filtering ability then can be achieved by solving the multi-objective problem under a set of LMIs. Moreover, a multi-objective evolutionary algorithm (MOEA)-based library search method is employed to find adequate components from corresponding libraries to solve LMI-constrained MO H2/H∞ design problems. It is a useful tool for the design of metal ion biosensors, particularly regarding the tradeoffs between the design factors under consideration.
Project description:The sample preparation of samples containing bovine serum albumin (BSA), e.g., as used in transdermal Franz diffusion cell (FDC) solutions, was evaluated using an analytical quality-by-design (QbD) approach. Traditional precipitation of BSA by adding an equal volume of organic solvent, often successfully used with conventional HPLC-PDA, was found insufficiently robust when novel fused-core HPLC and/or UPLC-MS methods were used. In this study, three factors (acetonitrile (%), formic acid (%) and boiling time (min)) were included in the experimental design to determine an optimal and more suitable sample treatment of BSA-containing FDC solutions. Using a QbD and Derringer desirability (D) approach, combining BSA loss, dilution factor and variability, we constructed an optimal working space with the edge of failure defined as D<0.9. The design space is modelled and is confirmed to have an ACN range of 83±3% and FA content of 1±0.25%.