Beyond ranking nodes: Predicting epidemic outbreak sizes by network centralities.
ABSTRACT: Identifying important nodes for disease spreading is a central topic in network epidemiology. We investigate how well the position of a node, characterized by standard network measures, can predict its epidemiological importance in any graph of a given number of nodes. This is in contrast to other studies that deal with the easier prediction problem of ranking nodes by their epidemic importance in given graphs. As a benchmark for epidemic importance, we calculate the exact expected outbreak size given a node as the source. We study exhaustively all graphs of a given size, and so do not restrict ourselves to certain generative models for graphs, nor to graph data sets. Due to the large number of possible nonisomorphic graphs of a fixed size, we are limited to ten-node graphs. We find that combinations of two or more centralities are predictive (R² scores of 0.91 or higher) even for the most difficult parameter values of the epidemic simulation. Typically, these successful combinations include one normalized spectral centrality (such as PageRank or Katz centrality) and one measure that is sensitive to the number of edges in the graph.
Project description:Graph theory models can produce simple, biologically informative metrics of the topology of resting-state functional connectivity (FC) networks. However, typical graph theory approaches model FC relationships between regions (nodes) as unweighted edges, complicating their interpretability in studies of disease or aging. We extended existing techniques and constructed fully connected weighted graphs for groups of age-matched human immunodeficiency virus (HIV) positive (n = 67) and HIV negative (n = 77) individuals. We compared test-retest reliability of weighted versus unweighted metrics in an independent study of healthy individuals (n = 22) and found weighted measures to be more stable. We quantified 2 measures of node centrality (closeness centrality and eigenvector centrality) to capture the relative importance of individual nodes. We also quantified 1 measure of graph entropy (diversity) to measure the variability in connection strength (edge weights) at each node. HIV was primarily associated with differences in measures of centrality, and age was primarily associated with differences in diversity. HIV and age were associated with divergent measures when evaluated at the whole graph level, within individual functional networks, and at the level of individual nodes. Graph models may allow us to distinguish previously indistinguishable effects related to HIV and age on FC.
Project description:BACKGROUND:Detection of central nodes in asymmetrically directed biological networks depends on centrality metrics quantifying individual nodes' importance in a network. In topological analyses on metabolic networks, various centrality metrics have been mostly applied to metabolite-centric graphs. However, centrality metrics including those not depending on high connections are largely unexplored for directed reaction-centric graphs. RESULTS:We applied directed versions of centrality metrics to directed reaction-centric graphs of microbial metabolic networks. To investigate the local role of a node, we developed a novel metric, cascade number, considering how many nodes are closed off from information flow when a particular node is removed. High modularity and scale-freeness were found in the directed reaction-centric graphs, and nodes with high betweenness centrality tended to belong to densely connected modules. Cascade number and bridging centrality identified cascade subnetworks controlling local information flow and irreplaceable bridging nodes between functional modules, respectively. Reactions highly ranked with bridging centrality and cascade number tended to be essential, compared to reactions that other centrality metrics detected. CONCLUSIONS:We demonstrate that cascade number and bridging centrality are useful to identify key reactions controlling local information flow in directed reaction-centric graphs of microbial metabolic networks. Knowledge about the local flow connectivity and connections between local modules will contribute to understanding how metabolic pathways are assembled.
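The abstract above describes the cascade number only in words. The following is a minimal sketch of one possible reading, not the authors' implementation: it assumes a node is "closed off" once every one of its incoming edges comes from the removed node or from nodes already closed off, and that nodes with no incoming edges are external inputs that are never closed off. The function name `cascade_number` and the `in_edges` dictionary layout are hypothetical.

```python
def cascade_number(in_edges, removed):
    """Count nodes cut off from all incoming flow when `removed` is deleted.

    in_edges maps each node to the list of its predecessors (sources of its
    incoming edges). ASSUMPTION: a node is closed off when all of its
    predecessors are the removed node or nodes that are already closed off.
    """
    closed = {removed}
    changed = True
    while changed:
        changed = False
        for node, preds in in_edges.items():
            # Nodes without predecessors are treated as external inputs.
            if node not in closed and preds and all(p in closed for p in preds):
                closed.add(node)
                changed = True
    # The removed node itself is not counted in its own cascade.
    return len(closed) - 1
```

For instance, in a chain a -> b -> c where a further node d receives edges from both b and an independent input e, removing a closes off b and c but not d, because d still receives flow from e.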
Project description:Constructing effective and scalable protection strategies against epidemic propagation is a challenging issue. It has been attracting interest in both theoretical and empirical studies. However, most of the recent developments are limited to simplified single-layer networks. Multiplex social networks are social networks with multiple layers where the same set of nodes appears in different layers. Consequently, a single attack can trigger simultaneous propagation in all corresponding layers. Therefore, suppressing propagation in multiplex topologies is more challenging, given that each layer also has a different structure. In this paper, we address the problem of suppressing epidemic propagation in multiplex social networks by allocating protection resources throughout different layers. Given a multiplex graph, such as a social network, and a budget of k protection resources, we aim to protect a set of nodes such that the percentage of surviving nodes at the end of the epidemic is maximized. We propose MultiplexShield, which uses graph spectral properties, degree centrality and layer-wise stochastic propagation rates to pre-emptively select k nodes for protection. We also comprehensively evaluate our proposal in two different approaches: multiplex-based and layer-based node protection schemes. Furthermore, two kinds of common attacks are also evaluated: random and targeted attacks. Experimental results show the effectiveness of our proposal on real-world datasets.
Project description:Human learners are adept at grasping the complex relationships underlying incoming sequential input [1]. In the present work, we formalize complex relationships as graph structures [2] derived from temporal associations [3,4] in motor sequences. Next, we explore the extent to which learners are sensitive to key variations in the topological properties [5] inherent to those graph structures. Participants performed a probabilistic motor sequence task in which the order of button presses was determined by the traversal of graphs with modular, lattice-like or random organization. Graph nodes each represented a unique button press, and edges represented a transition between button presses. The results indicate that learning, indexed here by participants' response times, was strongly mediated by the graph's mesoscale organization, with modular graphs being associated with shorter response times than random and lattice graphs. Moreover, variations in a node's number of connections (degree) and a node's role in mediating long-distance communication (betweenness centrality) impacted graph learning, even after accounting for the level of practice on that node. These results demonstrate that the graph architecture underlying temporal sequences of stimuli fundamentally constrains learning, and moreover that tools from network science provide a valuable framework for assessing how learners encode complex, temporally structured information.
Project description:The analysis of paths in graphs is highly relevant in many domains. Typically, path-related tasks are performed in node-link layouts. Unfortunately, graph layouts often do not scale to the size of many real world networks. Also, many networks are multivariate, i.e., contain rich attribute sets associated with the nodes and edges. These attributes are often critical in judging paths, but directly visualizing attributes in a graph layout exacerbates the scalability problem. In this paper, we present visual analysis solutions dedicated to path-related tasks in large and highly multivariate graphs. We show that by focusing on paths, we can address the scalability problem of multivariate graph visualization, equipping analysts with a powerful tool to explore large graphs. We introduce Pathfinder (Figure 1), a technique that provides visual methods to query paths, while considering various constraints. The resulting set of paths is visualized in both a ranked list and as a node-link diagram. For the paths in the list, we display rich attribute data associated with nodes and edges, and the node-link diagram provides topological context. The paths can be ranked based on topological properties, such as path length or average node degree, and scores derived from attribute data. Pathfinder is designed to scale to graphs with tens of thousands of nodes and edges by employing strategies such as incremental query results. We demonstrate Pathfinder's fitness for use in scenarios with data from a coauthor network and biological pathways.
Project description:Biological network data, such as metabolic-, signaling- or physical interaction graphs of proteins, are increasingly available in public repositories for important species. Tools for the quantitative analysis of these networks are being developed today. Protein network-based drug target identification methods usually return protein hubs with large degrees in the networks as potentially important targets. Some known, important protein targets, however, are not hubs at all, and perturbing protein hubs in these networks may have several unwanted physiological effects, due to their interaction with numerous partners. Here, we show a novel method applicable in networks with directed edges (such as metabolic networks) that compensates for the low degree (non-hub) vertices in the network, and identifies important nodes, regardless of their hub properties. Our method computes the PageRank for the nodes of the network, and divides the PageRank by the in-degree (i.e., the number of incoming edges) of the node. This quotient is the same in all nodes in an undirected graph (even for large- and low-degree nodes, that is, for hubs and non-hubs as well), but may differ significantly from node to node in directed graphs. We suggest assigning importance to non-hub nodes with a large PageRank/in-degree quotient. Consequently, our method gives high scores to nodes with large PageRank, relative to their degrees: therefore non-hub important nodes can easily be identified in large networks. We demonstrate that these relatively high PageRank scores have biological relevance: the method correctly finds numerous already validated drug targets in distinct organisms (Mycobacterium tuberculosis, Plasmodium falciparum and MRSA Staphylococcus aureus), and consequently, it may suggest new possible protein targets as well.
Additionally, our scoring method was not chosen arbitrarily: its value for all nodes of all undirected graphs is constant; therefore its high value captures importance in the directed edge structure of the graph.
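The PageRank/in-degree quotient described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code: the function names, the `out_edges` dictionary layout, the uniform handling of dangling nodes and the fixed iteration count are all assumptions. Note also that the constant-quotient property on undirected graphs is exact for the undamped random-walk PageRank (`damping=1.0` here, on a connected, non-bipartite graph); with the common damping value of 0.85 it holds only approximately.

```python
def pagerank(out_edges, damping=0.85, iters=200):
    """Power-iteration PageRank; out_edges maps node -> list of successors."""
    nodes = list(out_edges)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation term (zero when damping == 1.0).
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # ASSUMPTION: a dangling node spreads its rank over all nodes.
            targets = out_edges[v] or nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank


def pagerank_per_in_degree(out_edges, damping=0.85, iters=200):
    """Return PageRank divided by in-degree for every node with in-degree > 0."""
    rank = pagerank(out_edges, damping, iters)
    in_deg = {v: 0 for v in out_edges}
    for targets in out_edges.values():
        for w in targets:
            in_deg[w] += 1
    return {v: rank[v] / in_deg[v] for v in out_edges if in_deg[v] > 0}
```

An undirected graph is encoded by listing each edge in both directions, for example `{0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}`. Calling `pagerank_per_in_degree` on it with `damping=1.0` yields the same quotient for every node, whereas a directed graph generally gives node-dependent quotients.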
Project description:BACKGROUND: A genetic network can be represented as a directed graph in which a node corresponds to a gene and a directed edge specifies the direction of influence of one gene on another. The reconstruction of such networks from transcript profiling data remains an important yet challenging endeavor. A transcript profile specifies the abundances of many genes in a biological sample of interest. Prevailing strategies for learning the structure of a genetic network from high-dimensional transcript profiling data assume sparsity and linearity. Many methods consider relatively small directed graphs, inferring graphs with up to a few hundred nodes. This work examines large undirected graph representations of genetic networks, graphs with many thousands of nodes where an undirected edge between two nodes does not indicate the direction of influence, and the problem of estimating the structure of such a sparse linear genetic network (SLGN) from transcript profiling data. RESULTS: The structure learning task is cast as a sparse linear regression problem which is then posed as a LASSO (l1-constrained fitting) problem and solved finally by formulating a Linear Program (LP). A bound on the Generalization Error of this approach is given in terms of the Leave-One-Out Error. The accuracy and utility of LP-SLGNs is assessed quantitatively and qualitatively using simulated and real data. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) initiative provides gold standard data sets and evaluation metrics that enable and facilitate the comparison of algorithms for deducing the structure of networks. The structures of LP-SLGNs estimated from the INSILICO1, INSILICO2 and INSILICO3 simulated DREAM2 data sets are comparable to those proposed by the first and/or second ranked teams in the DREAM2 competition.
The structures of LP-SLGNs estimated from two published Saccharomyces cerevisiae cell cycle transcript profiling data sets capture known regulatory associations. In each S. cerevisiae LP-SLGN, the number of nodes with a particular degree follows an approximate power law, suggesting that its degree distribution is similar to that observed in real-world networks. Inspection of these LP-SLGNs suggests biological hypotheses amenable to experimental verification. CONCLUSION: A statistically robust and computationally efficient LP-based method for estimating the topology of a large sparse undirected graph from high-dimensional data yields representations of genetic networks that are biologically plausible and useful abstractions of the structures of real genetic networks. Analysis of the statistical and topological properties of learned LP-SLGNs may have practical value; for example, genes with high random walk betweenness, a measure of the centrality of a node in a graph, are good candidates for intervention studies and hence integrated computational-experimental investigations designed to infer more realistic and sophisticated probabilistic directed graphical model representations of genetic networks. The LP-based solutions of the sparse linear regression problem described here may provide a method for learning the structure of transcription factor networks from transcript profiling and transcription factor binding motif data.
Project description:BACKGROUND: As protein-protein interactions connect proteins that participate in either the same or different functions, networks of interacting and functionally annotated proteins can be converted into process graphs of inter-dependent function nodes (each node corresponding to interacting proteins with the same functional annotation). However, as proteins have multiple annotations, the process graph is non-redundant, if only proteins participating directly in a given function are included in the related function node. RESULTS: Reasoning that topological features (e.g., clusters of highly inter-connected proteins) might help in approaching a structured and non-redundant understanding of molecular function, an algorithm was developed that prioritizes inclusion of proteins into the function nodes that best overlap protein clusters. Specifically, the algorithm identifies function nodes (and their mutual relations), based on the topological analysis of a protein interaction network, which can be related to various biological domains, such as cellular components (e.g., peroxisome and cellular bud) or biological processes (e.g., cell budding) of the model organism S. cerevisiae. CONCLUSIONS: The method we have described allows a protein interaction network to be converted into a non-redundant process graph of inter-dependent function nodes. The examples we have described show that the resulting graph allows researchers to formulate testable hypotheses about dependencies among functions and the underlying mechanisms.
Project description:Hubs within the neocortical structural network determined by graph theoretical analysis play a crucial role in brain function. We mapped neocortical hubs topographically, using a sample population of 63 young adults. Subjects were imaged with high resolution structural and diffusion weighted magnetic resonance imaging techniques. Multiple network configurations were then constructed per subject, using random parcellations to define the nodes and using fibre tractography to determine the connectivity between the nodes. The networks were analysed with graph theoretical measures. Our results give reference maps of hub distribution measured with betweenness centrality and node degree. The loci of the hubs correspond with key areas from known overlapping cognitive networks. Several hubs were asymmetrically organized across hemispheres. Furthermore, females have hubs with higher betweenness centrality and males have hubs with higher node degree. Female networks have higher small-world indices.
Project description:Traumatic brain injury affects brain connectivity by producing traumatic axonal injury. This disrupts the function of large-scale networks that support cognition. The best way to describe this relationship is unclear, but one elegant approach is to view networks as graphs. Brain regions become nodes in the graph, and white matter tracts the connections. The overall effect of an injury can then be estimated by calculating graph metrics of network structure and function. Here we test which graph metrics best predict the presence of traumatic axonal injury, as well as which are most highly associated with cognitive impairment. A comprehensive range of graph metrics was calculated from structural connectivity measures for 52 patients with traumatic brain injury, 21 of whom had microbleed evidence of traumatic axonal injury, and 25 age-matched controls. White matter connections between 165 grey matter brain regions were defined using tractography, and structural connectivity matrices calculated from skeletonized diffusion tensor imaging data. This technique estimates injury at the centre of the tract, but is insensitive to damage at tract edges. Graph metrics were calculated from the resulting connectivity matrices and machine-learning techniques used to select the metrics that best predicted the presence of traumatic brain injury. In addition, we used regularization and variable selection via the elastic net to predict patient behaviour on tests of information processing speed, executive function and associative memory. Support vector machines trained with graph metrics of white matter connectivity matrices from the microbleed group were able to identify patients with a history of traumatic brain injury with 93.4% accuracy, a result robust to different ways of sampling the data.
Graph metrics were significantly associated with cognitive performance: information processing speed (R² = 0.64), executive function (R² = 0.56) and associative memory (R² = 0.25). These results were then replicated in a separate group of patients without microbleeds. The most influential graph metrics were betweenness centrality and eigenvector centrality, which provide measures of the extent to which a given brain region connects other regions in the network. Reductions in betweenness centrality and eigenvector centrality were particularly evident within hub regions including the cingulate cortex and caudate. Our results demonstrate that betweenness centrality and eigenvector centrality are reduced within network hubs, due to the impact of traumatic axonal injury on network connections. The dominance of betweenness centrality and eigenvector centrality suggests that cognitive impairment after traumatic brain injury results from the disconnection of network hubs by traumatic axonal injury.