Project description:Homophily, the tendency of nodes to connect to others of the same type, is a central issue in the study of networks. Here we take a local view of homophily, defining notions of first-order homophily of a node (its individual tendency to link to similar others) and second-order homophily of a node (the aggregate first-order homophily of its neighbors). Through this view, we find a surprising result for homophily values that applies with only minimal assumptions on the graph topology. It can be phrased most simply as "in a graph of red and blue nodes, red friends of red nodes are on average more homophilous than red friends of blue nodes". This gap in averages defies simple intuitive explanations, applies to both globally heterophilous and globally homophilous networks, and is reminiscent of, but structurally distinct from, the Friendship Paradox. The existence of this gap suggests intrinsic biases in homophily measurements between groups, and hence is relevant to empirical studies of homophily in networks.
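The two local quantities defined above can be made concrete with a minimal sketch. It assumes an undirected graph given as an adjacency dictionary, takes first-order homophily to be the fraction of a node's neighbors that share its color, and takes "aggregate" to mean the arithmetic mean. The function names and the toy graph are illustrative and are not taken from the paper.

```python
from statistics import mean

def first_order_homophily(adj, color):
    """Fraction of each node's neighbors that share its color (isolated nodes skipped)."""
    return {
        v: sum(color[u] == color[v] for u in nbrs) / len(nbrs)
        for v, nbrs in adj.items()
        if nbrs
    }

def second_order_homophily(adj, color):
    """Mean first-order homophily over each node's neighbors (assumed aggregate: mean)."""
    h1 = first_order_homophily(adj, color)
    return {v: mean(h1[u] for u in nbrs) for v, nbrs in adj.items() if nbrs}

# Hypothetical toy graph with edges a-b, a-c, b-c, c-d.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
color = {"a": "red", "b": "red", "c": "blue", "d": "blue"}

h1 = first_order_homophily(adj, color)
h2 = second_order_homophily(adj, color)
print(h1)
print(h2)
```

Comparing the average of `h1` over red neighbors of red nodes with the average over red neighbors of blue nodes would be one way to probe the gap described above on a real dataset; the toy graph here is only meant to show the definitions.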
Project description:We present a new method for assessing and measuring homophily in networks whose nodes have categorical attributes, namely when the nodes of the network are partitioned into classes (colors). We test this method on two different classes of networks: (i) protein-protein interaction (PPI) networks, where nodes correspond to proteins, partitioned according to their functional role, and edges represent functional interactions between proteins; and (ii) the Pokec online social network, where nodes correspond to users, partitioned according to their age, and edges represent friendship between users. Like other classical and well-established approaches, our method compares the relative edge density of the subgraphs induced by each class with the corresponding expected relative edge density under a null model. The novelty of our approach consists in prescribing an endogenous null model, namely, the sample space of the null model is built on the input network itself. This allows us to give exact explicit expressions for the z-score of the relative edge density of each class, as well as for other related statistics. The z-scores directly quantify the statistical significance of the observed homophily via the Čebyšëv inequality. The network structure enters the expression of each z-score through basic combinatorial invariants, such as the number of subgraphs with two spanning edges. Each z-score is computed in [Formula: see text] time for a network with n nodes and m edges. This leads to an overall efficient computational method for assessing homophily. We complement the analysis of homophily/heterophily by considering z-scores of the number of isolated nodes in the subgraphs induced by each class, which are computed in O(nm) time. 
Theoretical results are then exploited to show that, as expected, both the analyzed network classes are significantly homophilic with respect to the considered node properties.
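As a rough illustration of the z-score idea, the sketch below estimates a z-score for the number of edges inside one class and then applies the Čebyšëv (Chebyshev) bound P(|Z| ≥ z) ≤ 1/z². It is not the method of the paper: the abstract describes exact closed-form expressions under an endogenous null model, whereas this sketch swaps in a Monte Carlo label-permutation null as an assumption. All names and the toy network are hypothetical.

```python
import random
from statistics import mean, pstdev

def intra_class_edges(edges, label, cls):
    """Number of edges whose two endpoints both belong to class `cls`."""
    return sum(1 for u, v in edges if label[u] == cls and label[v] == cls)

def permutation_z_score(edges, label, cls, n_samples=500, seed=0):
    """Z-score of the intra-class edge count under random label shuffles.

    Class sizes are preserved by shuffling, so the z-score of the edge count
    equals the z-score of the edge density of the induced subgraph.
    """
    rng = random.Random(seed)
    nodes = list(label)
    values = [label[v] for v in nodes]
    observed = intra_class_edges(edges, label, cls)
    null_counts = []
    for _ in range(n_samples):
        rng.shuffle(values)
        shuffled = dict(zip(nodes, values))
        null_counts.append(intra_class_edges(edges, shuffled, cls))
    sd = pstdev(null_counts)
    if sd == 0:
        return float("nan")  # degenerate null: the z-score is undefined
    return (observed - mean(null_counts)) / sd

def chebyshev_bound(z):
    """Distribution-free upper bound on P(|Z| >= z) for a standardized variable."""
    return min(1.0, 1.0 / z ** 2)

# Hypothetical toy network: two 4-cliques joined by a single bridge edge.
clique_a = [(i, j) for i in range(4) for j in range(i + 1, 4)]
clique_b = [(i, j) for i in range(4, 8) for j in range(i + 1, 8)]
edges = clique_a + clique_b + [(3, 4)]
label = {v: ("A" if v < 4 else "B") for v in range(8)}

observed_a = intra_class_edges(edges, label, "A")
z_a = permutation_z_score(edges, label, "A")
print(observed_a, z_a, chebyshev_bound(z_a))
```

A large positive z-score indicates more intra-class edges than the null model produces, and the Chebyshev bound turns it into a significance statement without assuming normality.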
Project description:Higher-order network models are becoming increasingly relevant for their ability to explicitly capture interactions between three or more entities in a complex system at once. In this paper, we study homophily, the tendency for alike individuals to form connections, as it pertains to higher-order interactions. We find that straightforward extensions of classical homophily measures to interactions of size 3 and larger are often inflated by homophily present in pairwise interactions. This inflation can even hide the presence of anti-homophily in higher-order interactions. Hence, we develop a structural measure of homophily, simplicial homophily, which decouples homophily in pairwise interactions from that of higher-order interactions. The definition applies when the network can be modeled as a simplicial complex, a mathematical abstraction which makes a closure assumption that for any higher-order relationship in the network, all corresponding subsets of that relationship occur in the data. Whereas previous work has used this closure assumption to develop a rich theory in algebraic topology, here we use the assumption to make empirical comparisons between interactions of different sizes. The simplicial homophily measure is validated theoretically using an extension of a stochastic block model for simplicial complexes and empirically in large-scale experiments across 16 datasets. We further find that simplicial homophily can be used to identify when node features are valuable for higher-order link prediction. Ultimately, this highlights a subtlety in studying node features in higher-order networks, as measures defined on groups of size k can inherit features described by interactions of size [Formula: see text].
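The closure assumption described above (every subset of an observed higher-order interaction also occurs in the data) can be checked directly. The following is a minimal sketch, assuming interactions are given as tuples of node identifiers and that only subsets of size at least 2 need to be present, with individual nodes treated as implicitly present. The function names and the `min_size` parameter are illustrative and do not come from the paper.

```python
from itertools import combinations

def missing_faces(hyperedges, min_size=2):
    """Return the sub-interactions required by downward closure but absent from the data."""
    present = {frozenset(e) for e in hyperedges}
    missing = set()
    for e in present:
        for k in range(min_size, len(e)):
            for sub in combinations(e, k):
                if frozenset(sub) not in present:
                    missing.add(frozenset(sub))
    return missing

def is_simplicial_complex(hyperedges, min_size=2):
    """True if every required subset of every interaction is also an interaction."""
    return not missing_faces(hyperedges, min_size)

# Hypothetical data: a closed triangle versus a triple missing two of its pairs.
closed = [(1, 2, 3), (1, 2), (1, 3), (2, 3)]
not_closed = [(1, 2, 3), (1, 2)]

print(is_simplicial_complex(closed))
print(missing_faces(not_closed))
```

Setting `min_size=1` would additionally require singleton interactions to be listed, which matches the strict mathematical definition of a simplicial complex.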
Project description:Individuals usually punish free riders but refuse to sanction those who cooperate but do not punish. This missing second-order peer punishment is a fundamental problem for the stabilization of cooperation. To solve this problem, most societies today have implemented central authorities that punish free riders and tax evaders alike, such that second-order punishment is fully established. The emergence of such stable authorities from individual decisions, however, creates a new paradox: it seems absurd to expect individuals who do not engage in second-order punishment to strive for an authority that does. Herein, we provide a mathematical model and experimental results from a public goods game where subjects can choose between a community with and without second-order punishment in two different ways. When subjects can migrate continuously to either community, we identify a bias toward institutions that do not punish tax evaders. When subjects have to vote once for all rounds of the game and have to accept the decision of the majority, they prefer a society with second-order punishment. These findings uncover the existence of a democracy premium. The majority-voting rule allows subjects to commit themselves and to implement institutions that eventually lead to a higher welfare for all.
Project description:Homophily is the seemingly ubiquitous tendency for people to connect and interact with other individuals who are similar to them. This is a well-documented principle and is fundamental for how society organizes. Although many social interactions occur in groups, homophily has traditionally been measured using a graph model, which only accounts for pairwise interactions involving two individuals. Here, we develop a framework using hypergraphs to quantify homophily from group interactions. This reveals natural patterns of group homophily that appear with gender in scientific collaboration and political affiliation in legislative bill cosponsorship and also reveals distinctive gender distributions in group photographs, all of which cannot be fully captured by pairwise measures. At the same time, we show that seemingly natural ways to define group homophily are combinatorially impossible. This reveals important pitfalls to avoid when defining and interpreting notions of group homophily, as higher-order homophily patterns are governed by combinatorial constraints that are independent of human behavior but are easily overlooked.
Project description:Systematicity is a property of cognitive architecture whereby having certain cognitive capacities implies having certain other "structurally related" cognitive capacities. The predominant classical explanation for systematicity appeals to a notion of common syntactic/symbolic structure among the systematically related capacities. Although learning is a (second-order) cognitive capacity of central interest to cognitive science, a systematic ability to learn certain cognitive capacities, i.e., second-order systematicity, has been given almost no attention in the literature. In this paper, we introduce learned associations as an instance of second-order systematicity that poses a paradox for classical theory, because this form of systematicity involves the kinds of associative constructions that were explicitly rejected by the classical explanation. Our category theoretic explanation of systematicity resolves this problem, because both first and second-order forms of systematicity are derived from the same categorical construction: universal morphisms, which generalize the notion of compositionality of constituent representations to (categorical) compositionality of constituent processes. We derive a model of systematic associative learning based on (co)recursion, which is an instance of a universal construction. These results provide further support for a category theory foundation for cognitive architecture.
Project description:Motivation: Systems biology analyses often use correlations in gene expression profiles to infer co-expression networks that are then used as input for gene regulatory network inference or to identify functional modules of co-expressed or putatively co-regulated genes. While systematic biases, including batch effects, are known to induce spurious associations and confound differential gene expression (DE) analyses, the impact of batch effects on gene co-expression has not been fully explored. Methods have been developed to adjust expression values, ensuring conditional independence of mean and variance from batch or other covariates for each gene, resulting in improved fidelity of DE analysis. However, such adjustments do not address the potential for spurious differential co-expression (DC) between groups. Consequently, uncorrected, artifactual DC can skew the correlation structure, leading to the identification of false, non-biological associations, even when the input data are corrected using standard batch correction. Results: In this work, we demonstrate the persistence of confounders in covariance after standard batch correction using synthetic and real-world gene expression data examples. We then introduce Co-expression Batch Reduction Adjustment (COBRA), a method for computing a batch-corrected gene co-expression matrix based on estimating a conditional covariance matrix. COBRA estimates a reduced set of parameters expressing the co-expression matrix as a function of the sample covariates, allowing control for continuous and categorical covariates. COBRA is computationally efficient, leveraging the inherently modular structure of genomic data to estimate accurate gene regulatory associations and facilitate functional analysis for high-dimensional genomic data. Availability and implementation: COBRA is available under the GPLv3 open source license in R and Python in netZoo (https://netzoo.github.io).
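The motivating problem, that per-gene mean and variance adjustment leaves batch-dependent correlation untouched, can be seen in a small deterministic example. The sketch below is not COBRA and does not use netZoo; it assumes two hypothetical genes measured in two hypothetical batches, standardizes each gene within each batch, and then compares correlations. All names and values are invented for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(xs):
    """Center to mean 0 and scale to unit variance (a simple per-batch adjustment)."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression values: the genes move together in batch 1
# and in opposite directions in batch 2, with different means and scales.
gene_x = {"batch1": [1, 2, 3, 4], "batch2": [10, 20, 30, 40]}
gene_y = {"batch1": [11, 12, 13, 14], "batch2": [-10, -20, -30, -40]}

# Adjust mean and variance per gene within each batch.
x_adj = {b: standardize(v) for b, v in gene_x.items()}
y_adj = {b: standardize(v) for b, v in gene_y.items()}

r_batch1 = pearson(x_adj["batch1"], y_adj["batch1"])
r_batch2 = pearson(x_adj["batch2"], y_adj["batch2"])
r_pooled = pearson(x_adj["batch1"] + x_adj["batch2"],
                   y_adj["batch1"] + y_adj["batch2"])
print(r_batch1, r_batch2, r_pooled)
```

After adjustment both genes have identical means and variances in both batches, yet the within-batch correlations remain opposite and the pooled correlation reflects neither, which is the kind of covariance-level confounding that a conditional covariance estimate is meant to address.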
Project description:This Article contains errors in Fig. 3, Fig. 4 and Fig. 7, for which we apologize. In Fig. 3, panel 'b', the images for the 0.5 hour time point after Ku55933 treatment were inadvertently replaced with duplicates of the images for the 3 hour time point after Ku55933 treatment in Fig. 3b. Additionally, in panel 'b', the images for the 0.5 hour time point after Nu7026 treatment were inadvertently replaced with duplicates of the images for the 180 min time point after siMDC1 treatment in Fig. 3d. In Fig. 4, panel 'g', the images of RNF168 foci in U2OS cells were inadvertently replaced with duplicates of the images of RNF168 foci in HeLa cells in Fig. 4f. In Fig. 7, panel 'b', the DAPI images 0.5 hours after IR under siID3 treatment were inadvertently replaced with DAPI images of a different field of view from the same experiment. Additionally, in panel 'i', the image of shID3 mock-treated GFP-ID3 cells was inadvertently replaced with a duplicate of the image of shID3 mock-treated GFP-ID3 cells in Fig. 7g.
Project description:Summary: Different kinds of networks, such as social or biological ones, exhibit a typical behavior captured by the general principle 'similarity breeds connections'. These networks are said to be homophilic, since nodes belonging to the same class preferentially interact with each other. In this work, we present HONTO (HOmophily Network TOol), a user-friendly open-source Python3 package designed to evaluate and analyze homophily in complex networks. The tool takes as input the network along with a partition of its nodes into classes and yields a matrix whose entries are the homophily/heterophily z-score values. To complement the analysis, the tool also provides z-score values for the nodes that do not interact with any other node of the same class. Homophily/heterophily z-score values are presented as a heatmap, allowing an at-a-glance visual interpretation of results. Availability and implementation: The tool's source code is available at https://github.com/cumbof/honto under the MIT license, is installable as a package from PyPI (pip install honto) and conda-forge (conda install -c conda-forge honto), and has a wrapper for the Galaxy platform available on the official Galaxy ToolShed (Blankenberg et al., 2014) at https://toolshed.g2.bx.psu.edu/view/fabio/honto.