Piecewise linear approximation of protein structures using the principle of minimum message length.
ABSTRACT: Simple and concise representations of protein-folding patterns provide powerful abstractions for visualizations, comparisons, classifications, searching and aligning structural data. Structures are often abstracted by replacing standard secondary structural features, that is, helices and strands of sheet, by vectors or linear segments. Relying solely on standard secondary structure may result in a significant loss of structural information. Further, traditional methods of simplification depend crucially on the consistency and accuracy of external methods that assign secondary structure to protein coordinate data. Although many methods exist to identify secondary structure automatically, the imprecision of the definitions, along with errors and inconsistencies in experimental structure data, drastically limits their applicability for generating reliable simplified representations, especially for structural comparison. This article introduces a mathematically rigorous algorithm to delineate protein structure using the elegant statistical and inductive inference framework of minimum message length (MML). Our method generates consistent and statistically robust piecewise linear explanations of protein coordinate data, resulting in a powerful and concise representation of the structure. The delineation is completely independent of the hydrogen-bonding patterns and local substructural geometry on which current methods rely. Indeed, as is common with applications of the MML criterion, the method is free of parameters and thresholds, in striking contrast to existing programs, which are often beset by them. Analysis of results over a large number of proteins suggests that the method produces a consistent delineation of structures that encompasses, among others, the segments corresponding to standard secondary structure. AVAILABILITY: http://www.csse.monash.edu.au/~karun/pmml.
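The piecewise linear idea can be illustrated with a toy one-dimensional sketch (this is not the authors' MML algorithm; the per-segment cost and the data are invented for illustration): each segment costs a fixed number of "message" bits, and a breakpoint is accepted only when the reduction in fit residuals pays for the extra segment.

```python
import numpy as np

def best_single_breakpoint(points, segment_cost=5.0):
    """Choose the breakpoint (or none) minimising a two-part cost:
    a fixed per-segment cost (standing in for the bits needed to state
    a segment) plus squared residuals of a straight-line fit per segment."""
    x = np.arange(len(points), dtype=float)
    y = np.asarray(points, dtype=float)

    def fit_cost(lo, hi):
        coef = np.polyfit(x[lo:hi], y[lo:hi], 1)
        return float(np.sum((y[lo:hi] - np.polyval(coef, x[lo:hi])) ** 2))

    best_cost, best_k = segment_cost + fit_cost(0, len(y)), None
    for k in range(2, len(y) - 1):          # each piece keeps >= 2 points
        cost = 2 * segment_cost + fit_cost(0, k) + fit_cost(k, len(y))
        if cost < best_cost:
            best_cost, best_k = cost, k
    return best_cost, best_k

# A "hinge" signal: two exact linear pieces meeting near the middle.
data = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
cost, split = best_single_breakpoint(data)
```

With the hinge data, splitting near the apex makes both pieces fit exactly, so the two-segment explanation wins despite its higher model cost.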
Project description:MOTIVATION: Secondary structure underpins the folding pattern and architecture of most proteins. Accurate assignment of the secondary structure elements is therefore an important problem. Although many approximate solutions of the secondary structure assignment problem exist, the statement of the problem has resisted a consistent and mathematically rigorous definition. A variety of comparative studies have highlighted major disagreements in the way the available methods define and assign secondary structure to coordinate data. RESULTS: We report a new method to infer secondary structure based on the Bayesian method of minimum message length inference. It treats assignments of secondary structure as hypotheses that explain the given coordinate data. The method seeks to maximize the joint probability of a hypothesis and the data. There is a natural null hypothesis, and any assignment that cannot better it is unacceptable. We developed a program, SST, based on this approach and compared it with popular programs such as DSSP and STRIDE, among others. Our evaluation suggests that SST gives reliable assignments even on low-resolution structures. AVAILABILITY: http://www.csse.monash.edu.au/~karun/sst.
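The acceptance criterion described here, that an assignment must better the null hypothesis's message length, can be sketched in a few lines. The residual values, the model cost in bits, and the Gaussian noise models below are invented for illustration; they are not taken from SST.

```python
import math

def gaussian_nll_bits(xs, sigma):
    """Cost in bits of encoding residuals under a N(0, sigma^2) model."""
    nll_nats = sum(0.5 * math.log(2 * math.pi * sigma**2) + x * x / (2 * sigma**2)
                   for x in xs)
    return nll_nats / math.log(2)

# Deviations of observed coordinates from a candidate assignment (toy numbers).
residuals = [0.1, -0.2, 0.15, 0.05, -0.1, 0.2]

# Null hypothesis: no structure stated, data encoded under a broad model.
null_len = gaussian_nll_bits(residuals, sigma=2.0)

# Structural hypothesis: pay model_cost_bits to state it, but the tighter
# fit makes the data cheaper to encode.
model_cost_bits = 8.0
seg_len = model_cost_bits + gaussian_nll_bits(residuals, sigma=0.2)

# MML acceptance: the hypothesis must better the null's total message length.
accept_segment = seg_len < null_len
```

Maximizing the joint probability of hypothesis and data is equivalent to minimizing this two-part total, which is why the comparison against the null needs no tunable threshold.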
Project description:The majority of residues in protein structures are involved in the formation of alpha-helices and beta-strands. These distinctive secondary structure patterns can be used to represent a protein for visual inspection and in vector-based protein structure comparison. Success of such structural comparison methods depends crucially on the accurate identification and delineation of secondary structure elements. We have developed a method PALSSE (Predictive Assignment of Linear Secondary Structure Elements) that delineates secondary structure elements (SSEs) from protein C-alpha coordinates and specifically addresses the requirements of vector-based protein similarity searches. Our program identifies two types of secondary structures: helix and beta-strand, typically those that can be well approximated by vectors. In contrast to traditional secondary structure algorithms, which identify a secondary structure state for every residue in a protein chain, our program attributes residues to linear SSEs. Consecutive elements may overlap, thus allowing residues located at the overlapping region to have more than one secondary structure type. PALSSE is predictive in nature and can assign about 80% of the protein chain to SSEs as compared to 53% by DSSP and 57% by P-SEA. Such a generous assignment ensures almost every residue is part of an element and is used in structural comparisons. Our results are in agreement with human judgment and DSSP. The method is robust to coordinate errors and can be used to define SSEs even in poorly refined and low-resolution structures. The program and results are available at http://prodata.swmed.edu/palsse/.
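The vector abstraction that such similarity searches rely on can be sketched without assuming anything about PALSSE's internals: a run of C-alpha coordinates is approximated by the segment along its principal axis, obtained from an SVD of the centred points. The coordinates below are idealised.

```python
import numpy as np

def sse_vector(ca_coords):
    """Approximate a run of C-alpha coordinates by a line segment:
    the centroid plus the principal axis from an SVD of the centred points."""
    pts = np.asarray(ca_coords, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    axis = vt[0]                        # direction of maximal variance
    t = (pts - centroid) @ axis         # scalar position of each point on the axis
    return centroid + t.min() * axis, centroid + t.max() * axis

# Idealised residues along the z-axis with a constant lateral offset.
coords = [[0.05, 0.0, float(z)] for z in range(8)]
start, end = sse_vector(coords)
```

For a well-formed helix or strand the residuals around this axis are small, which is what "well approximated by vectors" means in practice.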
Project description:The secondary structure of RNA is integral to the variety of functions it carries out in the cell, and its depiction allows researchers to develop hypotheses about which nucleotides and base pairs are functionally relevant. Current approaches to visualizing secondary structure provide an adequate platform for the conversion of static text-based representations to 2D images, but are limited in the interactivity they offer as well as in their ability to display larger structures, multiple structures and pseudoknotted structures. In this article, we present forna, a web-based tool for displaying RNA secondary structure which allows users to easily convert sequences and secondary structures to clean, concise and customizable visualizations. It supports, among other features, the simultaneous visualization of multiple structures, the display of pseudoknotted structures, the interactive editing of the displayed structures, and the automatic generation of secondary structure diagrams from PDB files. It requires no software installation apart from a modern web browser. The web interface of forna is available at http://rna.tbi.univie.ac.at/forna, while the source code is available on GitHub. Supplementary data are available at Bioinformatics online.
Project description:PURPOSE:To identify and characterize the sources of B0 field changes due to head motion, in order to reduce motion sensitivity in human brain MRI. METHODS:B0 fields were measured in 5 healthy human volunteers at various head poses. After measurement of the total field, the field originating from the subject was calculated by subtracting the external field generated by the magnet and shims. A subject-specific susceptibility model was created to quantify the contribution of the head and torso. The spatial complexity of the field changes was analyzed using a spherical harmonic expansion. RESULTS:Minor head pose changes can cause substantial and spatially complex field changes in the brain. For rotations and translations of approximately 5° and 5 mm, respectively, at 7 T, the field change associated with the subject's magnetization generates a standard deviation (SD) of about 10 Hz over the brain. The stationary torso contributes significantly to this subject-associated field change, with an SD of about 5 Hz. The subject-associated change leads to image-corrupting phase errors in multi-shot T2*-weighted acquisitions. CONCLUSION:The B0 field changes arising from head motion are problematic for multi-shot T2*-weighted imaging. Characterization of the underlying sources provides new insights into mitigation strategies, which may benefit from individualized predictive field models in addition to real-time field monitoring and correction strategies.
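The spherical-harmonic analysis of spatial complexity can be illustrated with a minimal least-squares sketch. Only the zeroth- and first-order terms (a constant offset plus linear gradients) are fitted here, on synthetic data; the study's full expansion, coordinates, and field values are not reproduced.

```python
import numpy as np

def fit_first_order(coords, field_hz):
    """Least-squares projection of a field map onto zeroth- and first-order
    spherical-harmonic terms: a constant plus linear x, y, z gradients."""
    coords = np.asarray(coords, dtype=float)
    basis = np.column_stack([np.ones(len(coords)), coords])   # [1, x, y, z]
    coef, *_ = np.linalg.lstsq(basis, np.asarray(field_hz, float), rcond=None)
    residual = np.asarray(field_hz, float) - basis @ coef
    return coef, residual

rng = np.random.default_rng(0)
pts = rng.uniform(-0.1, 0.1, size=(200, 3))    # sample positions in metres
field = 10.0 + 50.0 * pts[:, 2]                # toy field: offset + z-gradient
coef, resid = fit_first_order(pts, field)
```

The size of the residual after removing low-order terms is one way to quantify how spatially complex a motion-induced field change is, and hence how much a low-order shim update could correct.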
Project description:When evaluating a newly developed statistical test, an important step is to check its type 1 error (T1E) control using simulations. This is often achieved by the standard simulation design S0 under the so-called "theoretical" null of no association. In practice, whole-genome association analyses scan a large number of genetic markers (G's) for the ones associated with an outcome of interest (Y), where Y comes from an alternative model while the majority of the G's are not associated with Y; these Y-G relationships are under the "empirical" null. This reality is better represented by two other simulation designs: design S1.1 simulates Y from an alternative model based on G and then evaluates its association with an independently generated G_new, while design S1.2 evaluates the association between permuted Y and G.
More than a decade ago, Efron (2004) noted the important distinction between the "theoretical" and the "empirical" null in false discovery rate control. Using scale tests for variance heterogeneity and direct univariate and multivariate interaction tests as examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, simulation design S0 suggested the method was accurate, while designs S1.1 and S1.2 revealed an increased empirical T1E rate when applied in a real-data setting. The inflation becomes more severe at the tail and does not diminish as sample size increases. This is an important observation that calls for new practices in method evaluation and in the interpretation of T1E control.
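The three designs can be sketched as follows for a simple correlation test. This is only a structural illustration: the scale and interaction tests the paper studies are not reproduced, the marker model and effect size are assumptions, and for a well-calibrated correlation test all three designs happen to agree, whereas the paper's point is that they can disagree for other tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, n_rep, alpha = 500, 200, 0.05

def assoc_pvalue(y, g):
    """P-value of a simple linear (correlation) association test."""
    _, p = stats.pearsonr(y, g)
    return p

rejects = {"S0": 0, "S1.1": 0, "S1.2": 0}
for _ in range(n_rep):
    g = rng.binomial(2, 0.3, n)               # genotype-like marker
    # S0: "theoretical" null -- outcome generated with no link to any marker.
    y0 = rng.normal(size=n)
    rejects["S0"] += assoc_pvalue(y0, g) < alpha
    # S1.1: y from an alternative model on g, tested against a fresh marker.
    y1 = 0.5 * g + rng.normal(size=n)
    g_new = rng.binomial(2, 0.3, n)
    rejects["S1.1"] += assoc_pvalue(y1, g_new) < alpha
    # S1.2: same alternative y, but permuted to break the y-g pairing.
    rejects["S1.2"] += assoc_pvalue(rng.permutation(y1), g) < alpha

t1e = {k: v / n_rep for k, v in rejects.items()}
```

Under S1.1 and S1.2 the outcome keeps the distributional shape induced by the alternative model, which is exactly what the empirical null captures and S0 does not.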
Project description:Species delineation based on bacterial genomes is an essential part of the research of prokaryotes. In silico genome-to-genome comparison methods are computationally demanding, but much less tedious and error-prone than wet-lab methods. In this paper, we present a novel method for the delineation of bacterial genomes based on genomic signal processing. The proposed method uses numerical representations of whole bacterial genomes, the phase signal and the cumulated phase signal, from which four parameters are derived for each genome. The parameters characterize a genome, and their calculation is independent of the other genomes comprising a delineation dataset. The delineation itself is computed as the average similarity of the parameters. The method was statistically verified on 1826 bacterial genomes. A similarity threshold of 96% was set based on the receiver operating characteristic curve, yielding a sensitivity of 99.78% and a specificity of 97.25%. Additionally, a comparative analysis on another 33 bacterial genomes was conducted using standard delineation tools, as these tools were not able to process the dataset of 1826 genomes on a desktop computer. The proposed method achieved comparable or better delineation results than the standard tools. Besides the excellent delineation results, another great advantage of the method is its small computational demand, which enables the delineation of thousands of genomes on a desktop computer: calculating the parameters takes tens of minutes for thousands of genomes. Moreover, the parameters can be calculated in advance and stored in a database, meaning the delineation itself is then completed in a matter of seconds.
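A phase signal and its cumulated form can be computed as below. The complex-number mapping of nucleotides is a common convention in genomic signal processing and an assumption here, not necessarily the mapping or the four derived parameters used by the paper.

```python
import cmath

# A common complex-number mapping used in genomic signal processing
# (an assumed convention, not taken from the paper).
NUCLEOTIDE = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}

def phase_signal(seq):
    """Per-nucleotide phase angle of the complex representation."""
    return [cmath.phase(NUCLEOTIDE[b]) for b in seq]

def cumulated_phase(seq):
    """Running sum of the phase signal along the genome."""
    total, out = 0.0, []
    for p in phase_signal(seq):
        total += p
        out.append(total)
    return out

sig = phase_signal("ACGT")
cum = cumulated_phase("ACGT")
```

Both signals are computed from a single genome in one linear pass, which is why per-genome parameters can be precomputed once and reused for every pairwise comparison.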
Project description:Structural characterization of RNAs is a dynamic field, offering many modelling possibilities. RNA secondary structure models are usually characterized by an encoding that depicts structural information of the molecule through string representations or graphs. In this work, we provide a generalization of the BEAR encoding (a context-aware structural encoding we previously developed) by expanding the set of alignments used for the construction of substitution matrices and then applying it to secondary structure encodings ranging from fine-grained to coarse-grained representations. We also introduce a re-interpretation of Shannon information applied to RNA alignments, proposing a new scoring metric, the Relative Information Gain (RIG). The RIG score is available for any position in an alignment, showing how different levels of detail encoded in the RNA representation can contribute differently to conveying structural information. The approaches presented in this study can be used alongside state-of-the-art tools to synergistically gain insights into the structural elements that RNAs and RNA families are composed of. This additional information could potentially contribute to their improvement or increase the degree of confidence in the secondary structure of families and of any set of aligned RNAs.
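A simplified per-column information score conveys the underlying idea. This sketch measures information relative to a uniform background and ignores gaps and substitution matrices; it is not the paper's exact RIG definition, which compares different encoding granularities.

```python
import math
from collections import Counter

def column_information(column, alphabet="AUCG"):
    """Shannon information (in bits) of one alignment column relative to a
    uniform background: log2 |alphabet| minus the column's entropy.
    Symbols outside the alphabet (e.g. gaps) are ignored for simplicity."""
    symbols = [c for c in column if c in alphabet]
    n = len(symbols)
    entropy = -sum((k / n) * math.log2(k / n)
                   for k in Counter(symbols).values())
    return math.log2(len(alphabet)) - entropy

alignment = ["GGCA", "GGCU", "GACA"]        # three aligned RNA sequences
columns = ["".join(seq[i] for seq in alignment) for i in range(4)]
info = [column_information(col) for col in columns]
```

A perfectly conserved column scores the maximum (2 bits for a 4-letter alphabet), while variable columns score less; applying the same idea to structural encodings of different granularity is what lets one ask how much each level of detail contributes.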
Project description:PURPOSE:Design of a preconditioner for fast and efficient parallel imaging (PI) and compressed sensing (CS) reconstructions for Cartesian trajectories. THEORY:PI and CS reconstructions become time consuming when the problem size or the number of coils is large, due to the large linear system of equations that has to be solved in ℓ1- and ℓ2-norm based reconstruction algorithms. Such linear systems can be solved efficiently using effective preconditioning techniques. METHODS:In this article we construct such a preconditioner by approximating the system matrix of the linear system, which comprises the data fidelity term and includes total variation and wavelet regularization, by a matrix that is block circulant with circulant blocks. Due to this structure, the preconditioner can be constructed quickly and its inverse can be evaluated fast using only two fast Fourier transforms. We test the performance of the preconditioner with the conjugate gradient method as the linear solver, integrated into the well-established Split Bregman algorithm. RESULTS:The designed circulant preconditioner reduces the number of iterations required in the conjugate gradient method by almost a factor of 5.
The speed-up results in a total acceleration factor of approximately 2.5 for the entire reconstruction algorithm when implemented in MATLAB, while the initialization time of the preconditioner is negligible. CONCLUSION:The proposed preconditioner reduces the reconstruction time for PI and CS in a Split Bregman implementation without compromising reconstruction stability, and it can easily handle large systems since it is Fourier-based, allowing for efficient computations.
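The key property exploited here, that a circulant matrix is diagonalized by the DFT so its inverse can be applied with FFTs, can be shown in a few lines. This is a toy 1-D stencil standing in for the paper's block-circulant-with-circulant-blocks preconditioner; the stencil values are invented.

```python
import numpy as np

def circulant_solve(first_col, b):
    """Apply the inverse of a circulant matrix to b via FFT diagonalisation:
    a circulant C with first column c satisfies C x = ifft(fft(c) * fft(x)),
    so C^{-1} b = ifft(fft(b) / fft(c))."""
    return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(first_col)))

# Toy circulant "preconditioner": a 1-D Laplacian-like periodic stencil.
n = 8
c = np.zeros(n)
c[0], c[1], c[-1] = 2.5, -1.0, -1.0
M = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

b = np.arange(1.0, n + 1)
x = circulant_solve(c, b)
max_err = float(np.max(np.abs(M @ x - b)))   # should be near machine precision
```

Because applying the inverse costs only two FFTs regardless of how ill-conditioned the original system is, such a matrix makes an attractive preconditioner inside a conjugate gradient loop.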
Project description:While protein secondary structure is well understood, representing the repetitive nature of tertiary packing in proteins remains difficult. We have developed a construct called the relative packing group (RPG) that applies the clique concept from graph theory as a natural basis for defining the packing motifs in proteins. An RPG is defined as a clique of residues in which every member contacts all others, as determined by the Delaunay tessellation. Geometrically similar RPGs define a regular element of tertiary structure, or tertiary motif (TerMo). This intuitive construct provides a simple approach to characterizing general repetitive elements of tertiary structure. A dataset of over 4 million tetrahedral RPGs was clustered using different criteria to characterize the various aspects of regular tertiary structure in TerMos. Grouping these data within the SCOP classification levels of Family, Superfamily, Fold and Class, and within the PDB, showed that similar packing is shared across different folds. Classification of RPGs by residue sequence locality reveals topological preferences that depend on protein size and secondary structure. We find that larger proteins favor RPGs with three local residues packed against a non-local residue. Classifying by secondary structure, helices prefer mostly local residues, sheets favor at least two local residues, and turns and coils contain more local residues. To depict these TerMos, we have developed two complementary and intuitive representations: (i) Dirichlet process mixture density estimation of the torsion-angle distributions and (ii) kernel density estimation of the Cartesian-coordinate distribution. The TerMo library and representation software are available upon request.
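The tetrahedral-RPG construction can be sketched directly from the definition: in a 3-D Delaunay tessellation every simplex is a set of four mutually contacting vertices, i.e. a 4-clique. Random points stand in for C-alpha coordinates here; the clustering into TerMos is not reproduced.

```python
import numpy as np
from scipy.spatial import Delaunay

def tetrahedral_rpgs(ca_coords):
    """Each tetrahedron of the 3-D Delaunay tessellation is a 4-clique of
    residues in which every member contacts all the others."""
    tess = Delaunay(np.asarray(ca_coords, dtype=float))
    return [tuple(sorted(int(v) for v in simplex)) for simplex in tess.simplices]

rng = np.random.default_rng(2)
coords = rng.uniform(0.0, 10.0, size=(20, 3))   # stand-in for C-alpha positions
rpgs = tetrahedral_rpgs(coords)
```

Sequence locality then follows by inspecting how far apart the four residue indices of each RPG are along the chain.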
Project description:Optimal operation of water resources in multiple and multipurpose reservoirs is very complicated, because of the number of dams, each dam's location (series or parallel), conflicts between objectives and the stochastic nature of the inflow of water into the system. In this paper, the performance of the system of the Karun and Dez reservoir dams has been optimized with the purposes of hydroelectric energy generation and meeting water demand across 6 dams. On the Karun River, 5 dams have been built in series, and the Dez dam has been built in parallel to those 5 dams. One of the main achievements of this research is the implementation of the hydroelectric energy production structure as a matrix function in MATLAB. The results show that, in the weighting-method algorithm, the structure of the objective function for generating hydroelectric energy plays a more important role than water supply. Nonetheless, by implementing the ε-constraint method, we can both increase hydroelectric power generation and supply around 85% of agricultural and industrial demand.
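The ε-constraint idea, optimizing one objective while demoting the other to a constraint with an adjustable bound, can be shown with a toy two-variable linear program. The releases, the water budget, and the energy coefficient below are invented and bear no relation to the Karun-Dez system.

```python
from scipy.optimize import linprog

def epsilon_constraint(eps_supply):
    """Maximise energy (3 units per unit of turbine release x0) subject to a
    total-water budget x0 + x1 <= 10 and a supply floor x1 >= eps_supply."""
    res = linprog(c=[-3.0, 0.0],                    # minimise -energy
                  A_ub=[[1.0, 1.0]], b_ub=[10.0],   # shared water budget
                  bounds=[(0.0, None), (eps_supply, None)])
    return -res.fun, res.x

energy_lo, _ = epsilon_constraint(2.0)   # loose supply requirement
energy_hi, _ = epsilon_constraint(8.5)   # e.g. 85% of a demand of 10
```

Sweeping the bound traces out the trade-off between generation and supply, which is how the method exposes solutions a fixed-weight scalarization can miss.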