Project description:Numerous networks in the real world change with time, producing dynamic graphs such as human mobility networks and brain networks. Typically, the "dynamics on graphs" (e.g., changing node attribute values) are visible, and they may be connected to and suggestive of the "dynamics of graphs" (e.g., evolution of the graph topology). Due to two fundamental obstacles, modeling and mapping between them have not been thoroughly explored: (1) the difficulty of developing a highly adaptable model without solid hypotheses and (2) the ineffectiveness and slowness of processing data with varying granularity. To solve these issues, we offer a novel scalable deep echo-state graph dynamics encoder for networks with significant temporal duration and dimensions. A novel neural architecture search (NAS) technique is then proposed and tailored for the deep echo-state encoder to ensure strong learnability. Extensive experiments on synthetic and actual application data illustrate the proposed method's exceptional effectiveness and efficiency.
Project description:Visualizing data through graphs can be an effective way to communicate one's results. A ubiquitous graph and common technique to communicate behavioral data is the bar graph. The bar graph was first invented in 1786 and little has changed in its format. Here, a replacement for the bar graph is proposed. The new format, called a hat graph, maintains some of the critical features of the bar graph such as its discrete elements, but eliminates redundancies that are problematic when the baseline is not at zero. Hat graphs also include design elements based on Gestalt principles of grouping and graph design principles. The effectiveness of the hat graph was tested in five empirical studies. Participants were nearly 40% faster to find and identify the condition that led to the biggest difference from baseline to final test when the data were plotted with hat graphs than with bar graphs. Participants were also more sensitive to the magnitude of an effect plotted with a hat graph compared with a bar graph that was restricted to having its baseline at zero. The recommendation is to use hat graphs when plotting data from discrete categories.
Project description:Test compound one, 5,6-benzoflavone (BNF), was known to act through both the Ah receptor and Nrf2 receptor pathways, while test compounds two and three, 3H-1,2-dithiole-3-thione (D3T) and 4-methyl-5-pyrazinyl-3H-1,2-dithiole-3-thione (OLT), were known to act through the Nrf2 receptor pathway. Furthermore, D3T is known to be more potent and efficacious than OLT for Nrf2 activation. OLT has been shown to exhibit 20-50% of the efficacy of D3T for inhibition of aflatoxin-induced hepatic foci. Nonetheless, because OLT is an approved drug, it is currently being evaluated in human phase II intervention trials of biomarkers of aflatoxin-related hepatocellular carcinoma. More recently, BNF was shown to be an effective chemopreventive agent in the rat mammary carcinogen model, inhibiting 7,12-dimethylbenz(a)anthracene DNA adduct formation in liver and mammary cells by 96% and 83%, respectively. We used microarrays to study the structure activities that lie within the test compounds. Keywords: treatment effect study
Project description:Motivation: Modern problems of concept annotation associate an object of interest (gene, individual, text document) with a set of interrelated textual descriptors (functions, diseases, topics), often organized in concept hierarchies or ontologies. Most ontologies can be seen as directed acyclic graphs (DAGs), where nodes represent concepts and edges represent relational ties between these concepts. Given an ontology graph, each object can only be annotated by a consistent sub-graph; that is, a sub-graph such that if an object is annotated by a particular concept, it must also be annotated by all other concepts that generalize it. Ontologies therefore provide a compact representation of a large space of possible consistent sub-graphs; however, until now we have not been aware of a practical algorithm that can enumerate such annotation spaces for a given ontology. Results: We propose an algorithm for enumerating consistent sub-graphs of DAGs. The algorithm recursively partitions the graph into strictly smaller graphs until the resulting graph becomes a rooted tree (forest), for which a linear-time solution is computed. It then combines the tallies from graphs created in the recursion to obtain the final count. We prove the correctness of this algorithm, propose several practical accelerations, evaluate it on random graphs and then apply it to characterize four major biomedical ontologies. We believe this work provides valuable insights into the complexity of concept annotation spaces and its potential influence on the predictability of ontological annotation. Availability and implementation: https://github.com/shawn-peng/counting-consistent-sub-DAG. Supplementary information: Supplementary data are available at Bioinformatics online.
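The rooted-tree base case mentioned in the abstract above can be illustrated with a short sketch. This is not the authors' implementation: it only assumes that a consistent sub-graph of a rooted tree is a node set closed under taking parents, and it counts the non-empty sets that contain the root. The function name, the `children` mapping and the choice to exclude the empty annotation are assumptions made for illustration.

```python
from typing import Dict, Hashable, List


def count_consistent_subtrees(children: Dict[Hashable, List[Hashable]],
                              root: Hashable) -> int:
    """Count non-empty, parent-closed node sets of a rooted tree.

    Hypothetical helper: `children` maps each node to its list of children.
    Every counted set contains `root`, because including any node forces
    the inclusion of all of its ancestors.
    """
    # Iterative pre-order walk; each node is visited before its descendants.
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Process nodes in reverse so that children are finished before parents.
    counts = {}
    for node in reversed(order):
        total = 1
        for child in children.get(node, []):
            # Each child subtree is either left out entirely (1 way)
            # or contributes one of its own consistent sets.
            total *= 1 + counts[child]
        counts[node] = total
    return counts[root]


# Small example: root has children a and b; a has child c.
example_tree = {"root": ["a", "b"], "a": ["c"]}
example_count = count_consistent_subtrees(example_tree, "root")
```

Each node is visited a constant number of times, so the walk is linear in the number of nodes. For a forest, under the same assumptions, the per-tree counts (each plus one for leaving that tree out) would be multiplied together.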
Project description:Artificial intelligence for graphs has achieved remarkable success in modeling complex systems, ranging from dynamic networks in biology to interacting particle systems in physics. However, the increasingly heterogeneous graph datasets call for multimodal methods that can combine different inductive biases (the set of assumptions that algorithms use to make predictions for inputs they have not encountered during training). Learning on multimodal datasets presents fundamental challenges because the inductive biases can vary by data modality and graphs might not be explicitly given in the input. To address these challenges, multimodal graph AI methods combine different modalities while leveraging cross-modal dependencies using graphs. Diverse datasets are combined using graphs and fed into sophisticated multimodal architectures, specified as image-intensive, knowledge-grounded and language-intensive models. Using this categorization, we introduce a blueprint for multimodal graph learning, use it to study existing methods and provide guidelines to design new models.
Project description:Motivation: Pangenome graphs provide a complete representation of the mutual alignment of collections of genomes. These models offer the opportunity to study the entire genomic diversity of a population, including structurally complex regions. Nevertheless, analyzing hundreds of gigabase-scale genomes using pangenome graphs is difficult as it is not well-supported by existing tools. Hence, fast and versatile software is required to ask advanced questions of such data efficiently. Results: We wrote Optimized Dynamic Genome/Graph Implementation (ODGI), a novel suite of tools that implements scalable algorithms and has an efficient in-memory representation of DNA pangenome graphs in the form of variation graphs. ODGI supports pre-built graphs in the Graphical Fragment Assembly format. ODGI includes tools for detecting complex regions, extracting pangenomic loci, removing artifacts, exploratory analysis, manipulation, validation and visualization. Its fast parallel execution facilitates routine pangenomic tasks, as well as pipelines that can quickly answer complex biological questions of gigabase-scale pangenome graphs. Availability and implementation: ODGI is published as free software under the MIT open source license. Source code can be downloaded from https://github.com/pangenome/odgi and documentation is available at https://odgi.readthedocs.io. ODGI can be installed via Bioconda https://bioconda.github.io/recipes/odgi/README.html or GNU Guix https://github.com/pangenome/odgi/blob/master/guix.scm. Supplementary information: Supplementary data are available at Bioinformatics online.
Project description:Graph theoretical concepts are useful for the description and analysis of interactions and relationships in biological systems. We give a brief introduction to some of the concepts and their areas of application in molecular biology. We discuss software that is available through the Bioconductor project and present a simple example application to the integration of a protein-protein interaction network and a co-expression network.
Project description:The structure of RNA has been a natural subject for mathematical modeling, inviting many innovative computational frameworks. This single-stranded polynucleotide chain can fold upon itself in numerous ways to form hydrogen-bonded segments, imperfect with single-stranded loops. Illustrating these paired and non-paired interaction networks, known as RNA's secondary (2D) structure, using mathematical graph objects has been illuminating for RNA structure analysis. Building upon such seminal work from the 1970s and 1980s, graph models are now used to study not only RNA structure but also describe RNA's recurring modular units, sample the conformational space accessible to RNAs, predict RNA's three-dimensional folds, and apply the combined aspects to novel RNA design. In this article, we outline the development of the RNA-As-Graphs (or RAG) approach and highlight current applications to RNA structure prediction and design.
Project description:Motivation: The de Bruijn graph is one of the fundamental data structures for analysis of high-throughput sequencing data. In order to be applicable to population-scale studies, it is essential to build and store the graph in a space- and time-efficient manner. In addition, due to the ever-changing nature of population studies, it has become essential to update the graph after construction, e.g. add and remove nodes and edges. Although there has been substantial effort on making the construction and storage of the graph efficient, there is a limited amount of work in building the graph in an efficient and mutable manner. Hence, most space-efficient data structures require complete reconstruction of the graph in order to add or remove edges or nodes. Results: In this article, we present DynamicBOSS, a succinct representation of the de Bruijn graph that allows for an unlimited number of additions and deletions of nodes and edges. We compare our method with other competing methods and demonstrate that DynamicBOSS is the only method that supports both addition and deletion and is applicable to very large samples (e.g. greater than 15 billion k-mers). Competing dynamic methods either cannot be constructed on large-scale datasets (e.g. FDBG) or cannot support both addition and deletion (e.g. BiFrost). Availability and implementation: DynamicBOSS is publicly available at https://github.com/baharpan/dynboss. Supplementary information: Supplementary data are available at Bioinformatics online.
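To make the notion of a mutable de Bruijn graph in the abstract above concrete, here is a minimal sketch that stores the graph as a plain dictionary of successor sets and supports adding and removing k-mers after construction. It is deliberately not succinct and is unrelated to the DynamicBOSS data structure or its API; the class and method names are hypothetical.

```python
from collections import defaultdict


class SimpleDeBruijnGraph:
    """Toy mutable de Bruijn graph (illustrative only, not space-efficient).

    Nodes are (k-1)-mers; each k-mer is an edge from its prefix to its suffix.
    """

    def __init__(self, k: int) -> None:
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        # Maps a (k-1)-mer to the set of (k-1)-mers it points to.
        self.edges = defaultdict(set)

    def add_kmer(self, kmer: str) -> None:
        if len(kmer) != self.k:
            raise ValueError("k-mer has the wrong length")
        self.edges[kmer[:-1]].add(kmer[1:])

    def remove_kmer(self, kmer: str) -> None:
        prefix = kmer[:-1]
        successors = self.edges.get(prefix)
        if successors is None:
            return  # nothing to remove
        successors.discard(kmer[1:])
        if not successors:
            # Drop the entry once its last outgoing edge is gone.
            del self.edges[prefix]

    def add_sequence(self, sequence: str) -> None:
        for i in range(len(sequence) - self.k + 1):
            self.add_kmer(sequence[i:i + self.k])

    def kmers(self) -> list:
        # Rebuild each k-mer from its prefix node and the last base of its suffix.
        return sorted(p + s[-1] for p, succ in self.edges.items() for s in succ)


graph = SimpleDeBruijnGraph(k=3)
graph.add_sequence("ACGTA")
kmers_before = graph.kmers()
graph.remove_kmer("CGT")
kmers_after = graph.kmers()
```

A hash-based layout like this makes updates trivial but uses far more memory per k-mer than a succinct representation, which is the trade-off the abstract is concerned with.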
Project description:Background: Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for analysis of both genomics data and data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues accompanying them, accurate significance evaluation is of great importance. We here address the problem of evaluating statistical significance of observations from factor graph models. Results: Two novel numerical approximations for evaluation of statistical significance are presented. The first uses importance sampling; the second is based on a saddlepoint approximation. We develop algorithms to efficiently compute the approximations and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both from a theoretical viewpoint and with simulations. A guideline for choosing between the normal approximation, saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif analysis and phylogenetics. Conclusions: The applicability of saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods we can substantially reduce computational cost without compromising accuracy. This contribution allows analyses of large datasets in the general factor graph framework.
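Why importance sampling helps with significance evaluation can be seen on a toy problem that is much simpler than a factor graph model. The sketch below estimates the small tail probability P(Z > t) for a standard normal Z by sampling from a proposal shifted to N(t, 1) and reweighting. It does not reproduce the algorithms of the abstract above; the function names, the proposal and the sample size are assumptions chosen for illustration.

```python
import math
import random


def tail_prob_importance_sampling(threshold: float,
                                  n_samples: int = 20000,
                                  seed: int = 0) -> float:
    """Estimate P(Z > threshold) for Z ~ N(0, 1) with a shifted proposal.

    Samples come from N(threshold, 1), so roughly half of them land in the
    tail. Each is reweighted by the density ratio
    phi(x) / phi(x - threshold) = exp(-threshold * x + threshold**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n_samples


def tail_prob_exact(threshold: float) -> float:
    """Closed-form normal tail probability, used only as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = tail_prob_importance_sampling(4.0)
reference = tail_prob_exact(4.0)
```

Naive sampling from N(0, 1) would need on the order of tens of thousands of draws to see even one value above 4, whereas the shifted proposal places about half of its samples in the region of interest.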