-
Actionable Interpretability Must Be Defined in Terms of Symmetries
Authors:
Pietro Barbiero,
Mateo Espinosa Zarlenga,
Francesco Giannini,
Alberto Termine,
Filippo Bonchi,
Mateja Jamnik,
Giuseppe Marra
Abstract:
This paper argues that interpretability research in Artificial Intelligence (AI) is fundamentally ill-posed as existing definitions of interpretability fail to describe how interpretability can be formally tested or designed for. We posit that actionable definitions of interpretability must be formulated in terms of *symmetries* that inform model design and lead to testable conditions. Under a probabilistic view, we hypothesise that four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) suffice to (i) formalise interpretable models as a subclass of probabilistic models, (ii) yield a unified formulation of interpretable inference (e.g., alignment, interventions, and counterfactuals) as a form of Bayesian inversion, and (iii) provide a formal framework to verify compliance with safety standards and regulations.
Submitted 29 January, 2026; v1 submitted 19 January, 2026;
originally announced January 2026.
-
Tapes as Stochastic Matrices of String Diagrams
Authors:
Filippo Bonchi,
Cipriano Junior Cioffo
Abstract:
Tape diagrams provide a graphical notation for categories equipped with two monoidal products, $\otimes$ and $\oplus$, where $\oplus$ is a biproduct. Recently, they have been generalised to handle Kleisli categories of arbitrary monoidal monads. In this work, we show that for the subdistribution monad, tapes are isomorphic to stochastic matrices of subdistributions of string diagrams. We then exploit this result to provide a complete axiomatisation of probabilistic Boolean circuits.
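As a concrete anchor for the matrix side of this isomorphism: on finite sets, a Kleisli arrow X → D≤1(Y) for the subdistribution monad can be written as a substochastic matrix (rows indexed by X, each row non-negative and summing to at most 1), Kleisli composition is matrix multiplication, and ⊕ acts as a block-diagonal sum. The sketch below assumes a row convention and exact fractions; the helper names are hypothetical and it does not model the string-diagram entries that tapes carry.

```python
from fractions import Fraction as F

Matrix = list[list[F]]

def is_substochastic(m: Matrix) -> bool:
    # every row is a subdistribution: non-negative entries summing to at most 1
    return all(all(x >= 0 for x in row) and sum(row) <= 1 for row in m)

def compose(f: Matrix, g: Matrix) -> Matrix:
    # Kleisli composition X -> Y -> Z: (f;g)[x][z] = sum over y of f[x][y] * g[y][z]
    return [[sum(f[x][y] * g[y][z] for y in range(len(g)))
             for z in range(len(g[0]))] for x in range(len(f))]

def direct_sum(f: Matrix, g: Matrix) -> Matrix:
    # f ⊕ g acts on disjoint unions of inputs and outputs: a block-diagonal matrix
    cols_f, cols_g = len(f[0]), len(g[0])
    top = [row + [F(0)] * cols_g for row in f]
    bottom = [[F(0)] * cols_f + row for row in g]
    return top + bottom

# a "lossy" coin: heads 1/3, tails 1/3, and 1/3 of the mass missing
coin = [[F(1, 3), F(1, 3)]]
# a channel that keeps heads and keeps tails only half of the time
channel = [[F(1), F(0)], [F(0), F(1, 2)]]
out = compose(coin, channel)  # [[F(1, 3), F(1, 6)]]: half of the mass is now missing
```

Exact fractions are used so that the "sum ≤ 1" check is not disturbed by floating-point rounding.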
Submitted 4 January, 2026;
originally announced January 2026.
-
A Diagrammatic Basis for Computer Programming
Authors:
Filippo Bonchi,
Alessandro Di Giorgio,
Elena Di Lavore
Abstract:
Tape diagrams provide a convenient graphical notation for arrows of rig categories, i.e., categories equipped with two monoidal products, $\oplus$ and $\otimes$. In this work, we introduce Kleene-Cartesian rig categories, namely rig categories where $\otimes$ provides a Cartesian bicategory, while $\oplus$ provides a Kleene bicategory. We show that the associated tape diagrams can conveniently deal with imperative programs and various program logics.
Submitted 8 December, 2025;
originally announced December 2025.
-
A Survey on Centrality and Importance Measures in Hypergraphs: Categorization and Empirical Insights
Authors:
Jaewan Chun,
Fanchen Bu,
Yeongho Kim,
Atsushi Miyauchi,
Francesco Bonchi,
Kijung Shin
Abstract:
Identifying central entities and interactions is a fundamental problem in network science. While well-studied for graphs (pairwise relations), many biological and social systems exhibit higher-order interactions best modeled by hypergraphs. This has led to a proliferation of specialized hypergraph centrality measures, but the field remains fragmented and lacks a unifying framework. This paper addresses this gap by providing the first systematic survey of 39 distinct measures. We introduce a novel taxonomy classifying them as: (1) structural (topology-based), (2) functional (impact on system dynamics), or (3) contextual (incorporating external features). We also present an experimental assessment comparing their empirical similarity and computation time. Finally, we discuss applications, establishing a coherent roadmap for future research in this area.
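To illustrate what a structural (topology-based) measure looks like, here is a minimal sketch of two of the simplest ones: hyperdegree (how many hyperedges contain a node) and the degree in the clique expansion (how many distinct nodes share a hyperedge with it). These are textbook definitions chosen for illustration; the survey's own definitions and naming for its 39 measures may differ.

```python
from collections import defaultdict

def hyperdegree(hyperedges: list[set[str]]) -> dict[str, int]:
    # number of hyperedges each node belongs to
    deg: dict[str, int] = defaultdict(int)
    for e in hyperedges:
        for v in e:
            deg[v] += 1
    return dict(deg)

def clique_expansion_degree(hyperedges: list[set[str]]) -> dict[str, int]:
    # number of distinct nodes that share at least one hyperedge with each node
    nbrs: dict[str, set[str]] = defaultdict(set)
    for e in hyperedges:
        for v in e:
            nbrs[v] |= e - {v}
    return {v: len(s) for v, s in nbrs.items()}

H = [{"a", "b", "c"}, {"a", "d"}, {"b", "c"}]
print(hyperdegree(H))              # a, b, c have hyperdegree 2; d has 1
print(clique_expansion_degree(H))  # a: 3, b: 2, c: 2, d: 1
```

Even on this toy input the two measures disagree: hyperdegree ties a, b and c, while the clique expansion singles out a, which is one reason the choice among measures matters.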
Submitted 27 November, 2025;
originally announced December 2025.
-
Online Minimization of Polarization and Disagreement via Low-Rank Matrix Bandits
Authors:
Federico Cinus,
Yuko Kuroki,
Atsushi Miyauchi,
Francesco Bonchi
Abstract:
We study the problem of minimizing polarization and disagreement in the Friedkin-Johnsen opinion dynamics model under incomplete information. Unlike prior work that assumes a static setting with full knowledge of agents' innate opinions, we address the more realistic online setting where innate opinions are unknown and must be learned through sequential observations. This novel setting, which naturally mirrors periodic interventions on social media platforms, is formulated as a regret minimization problem, establishing a key connection between algorithmic interventions on social media platforms and the theory of multi-armed bandits. In our formulation, a learner observes only a scalar feedback of the overall polarization and disagreement after an intervention. For this novel bandit problem, we propose a two-stage algorithm based on low-rank matrix bandits. The algorithm first performs subspace estimation to identify an underlying low-dimensional structure, and then employs a linear bandit algorithm within the compact dimensional representation derived from the estimated subspace. We show that our algorithm achieves a cumulative regret of $\widetilde{\mathcal{O}}\big(\max(\tfrac{1}κ,\sqrt{|V|})\sqrt{|V|T}\big)$ over time horizon $T$, where $V$ is the set of agents and $κ$ is a parameter dependent on the diversity of interventions. Empirical results validate that our algorithm significantly outperforms a linear bandit baseline in terms of both cumulative regret and running time.
Submitted 6 March, 2026; v1 submitted 1 October, 2025;
originally announced October 2025.
-
Program Logics via Distributive Monoidal Categories
Authors:
Filippo Bonchi,
Elena Di Lavore,
Mario Román,
Sam Staton
Abstract:
We derive multiple program logics, including correctness, incorrectness, and relational Hoare logic, from the axioms of imperative categories: uniformly traced distributive copy-discard categories. We introduce an internal language for imperative multicategories, on top of which we derive combinators for an adaptation of Dijkstra's guarded command language. Rules of program logics are derived from this internal language.
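To make the contrast between correctness and incorrectness triples concrete, the sketch below uses plain relational semantics on a finite state space, where a command denotes a relation c ⊆ S × S. This is an assumption for illustration, not the paper's categorical derivation. A Hoare triple {P} c {Q} holds when every state reachable from P lies in Q (over-approximation); an incorrectness triple [P] c [Q] holds when every state in Q is reachable from some state in P (under-approximation). The command `inc` is a hypothetical example.

```python
State = int
Relation = set[tuple[State, State]]

def post(c: Relation, pre: set[State]) -> set[State]:
    # strongest postcondition: every state reachable from `pre` in one step of c
    return {t for (s, t) in c if s in pre}

def hoare(pre: set[State], c: Relation, q: set[State]) -> bool:
    # partial correctness {P} c {Q}: post(P) ⊆ Q
    return post(c, pre) <= q

def incorrectness(pre: set[State], c: Relation, q: set[State]) -> bool:
    # incorrectness triple [P] c [Q]: Q ⊆ post(P)
    return q <= post(c, pre)

S = range(8)
# hypothetical command: x := x + 1 when x < 4, otherwise nondeterministically reset x to 0 or 7
inc: Relation = {(x, x + 1) for x in S if x < 4} | {(x, y) for x in S if x >= 4 for y in (0, 7)}

print(hoare({0, 1, 2}, inc, {1, 2, 3, 4}))          # True: every outcome lies in Q
print(incorrectness({0, 1, 2}, inc, {1, 2, 3, 4}))  # False: 4 is not reachable
print(incorrectness({5}, inc, {7}))                 # True: 7 is a witnessed outcome
print(hoare({5}, inc, {7}))                         # False: 0 is also possible
```

The last two lines show why the two logics are complementary: under nondeterminism, an under-approximate triple can witness an outcome that no over-approximate triple with the same postcondition can guarantee.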
Submitted 24 July, 2025;
originally announced July 2025.
-
Size-adaptive Hypothesis Testing for Fairness
Authors:
Antonio Ferrara,
Francesco Cozzi,
Alan Perotti,
André Panisson,
Francesco Bonchi
Abstract:
Determining whether an algorithmic decision-making system discriminates against a specific demographic typically involves comparing a single point estimate of a fairness metric against a predefined threshold. This practice is statistically brittle: it ignores sampling error and treats small demographic subgroups the same as large ones. The problem intensifies in intersectional analyses, where multiple sensitive attributes are considered jointly, giving rise to a larger number of smaller groups. As these groups become more granular, the data representing them becomes too sparse for reliable estimation, and fairness metrics yield excessively wide confidence intervals, precluding meaningful conclusions about potential unfair treatments.
In this paper, we introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. Our contribution is twofold. (i) For sufficiently large subgroups, we prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $α$. (ii) For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator; Monte-Carlo credible intervals are calibrated for any sample size and naturally converge to Wald intervals as more data becomes available. We validate our approach empirically on benchmark datasets, demonstrating how our tests provide interpretable, statistically rigorous decisions under varying degrees of data availability and intersectionality.
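The two regimes can be illustrated with a minimal sketch, assuming a binary decision and two groups a and b with k positive outcomes out of n. The statistical parity difference is taken as p_a − p_b. For large groups, a Wald interval and two-sided p-value use the standard error sqrt(p_a(1−p_a)/n_a + p_b(1−p_b)/n_b). For small groups, uniform Beta(1, 1) priors are placed on each positive rate and a Monte Carlo credible interval is taken for the difference. This two-category Beta model is the simplest case of a Dirichlet-multinomial; the paper's estimator, priors and test details may differ, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def wald_spd(k_a: int, n_a: int, k_b: int, n_b: int, alpha: float = 0.05):
    """Wald CI and two-sided p-value for SPD = p_a - p_b (large-sample regime).

    Assumes the standard error is non-zero, i.e. the groups are not all-0 or all-1.
    """
    p_a, p_b = k_a / n_a, k_b / n_b
    spd = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = math.erfc(abs(spd / se) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return spd, (spd - z_crit * se, spd + z_crit * se), p_value

def bayes_spd(k_a: int, n_a: int, k_b: int, n_b: int,
              alpha: float = 0.05, draws: int = 20_000, seed: int = 0):
    """Monte Carlo credible interval for p_a - p_b under uniform Beta(1, 1) priors."""
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(1 + k_a, 1 + n_a - k_a) - rng.betavariate(1 + k_b, 1 + n_b - k_b)
        for _ in range(draws)
    )
    return diffs[int(alpha / 2 * draws)], diffs[int((1 - alpha / 2) * draws) - 1]

print(wald_spd(300, 1000, 250, 1000))  # SPD 0.05, interval roughly (0.011, 0.089)
print(bayes_spd(300, 1000, 250, 1000))  # close to the Wald interval at this size
print(bayes_spd(3, 5, 1, 4))            # tiny groups: an interval wider than 0.5
```

With 1000 individuals per group the credible interval nearly coincides with the Wald interval, consistent with the convergence described above; with 5 and 4 individuals it is wide enough to rule out any confident verdict.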
Submitted 19 March, 2026; v1 submitted 12 June, 2025;
originally announced June 2025.
-
Bounded-Abstention Pairwise Learning to Rank
Authors:
Antonio Ferrara,
Andrea Pugnana,
Francesco Bonchi,
Salvatore Ruggieri
Abstract:
Ranking systems influence decision-making in high-stakes domains like health, education, and employment, where they can have substantial economic and social impacts. This makes the integration of safety mechanisms essential. One such mechanism is $\textit{abstention}$, which enables algorithmic decision-making systems to defer uncertain or low-confidence decisions to human experts. While abstention has been predominantly explored in the context of classification tasks, its application to other machine learning paradigms remains underexplored. In this paper, we introduce a novel method for abstention in pairwise learning-to-rank tasks. Our approach is based on thresholding the ranker's conditional risk: the system abstains from making a decision when the estimated risk exceeds a predefined threshold. Our contributions are threefold: a theoretical characterization of the optimal abstention strategy, a model-agnostic, plug-in algorithm for constructing abstaining ranking models, and a comprehensive empirical evaluation across multiple datasets, demonstrating the effectiveness of our approach.
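The thresholding rule can be sketched for a single pair (i, j), assuming the ranker outputs an estimate eta of P(i ranked above j) and the loss is 0-1. The best non-abstaining decision then has conditional risk min(eta, 1 − eta), and the system defers when that risk exceeds a threshold tau. The helper that picks tau to keep the deferral rate within a budget is a hypothetical illustration of "bounded" abstention, not the paper's algorithm.

```python
def pairwise_decision(eta: float, tau: float):
    """Return +1 (i above j), -1 (j above i), or None (defer to a human).

    eta estimates P(i ranked above j); under 0-1 loss the conditional risk
    of the best non-abstaining decision is min(eta, 1 - eta).
    """
    if min(eta, 1 - eta) > tau:
        return None
    return 1 if eta >= 0.5 else -1

def threshold_for_budget(etas: list[float], budget: float) -> float:
    """Hypothetical helper: choose tau so that at most a `budget` fraction of pairs is deferred."""
    risks = sorted(min(e, 1 - e) for e in etas)
    keep = len(risks) - int(budget * len(risks))  # pairs that must receive a decision
    return risks[keep - 1] if keep > 0 else 0.0

etas = [0.95, 0.55, 0.48, 0.2, 0.7]
tau = threshold_for_budget(etas, budget=0.4)
print([pairwise_decision(e, tau) for e in etas])  # [1, None, None, -1, 1]
```

The two pairs whose estimates sit closest to 0.5 are the ones deferred, which is exactly the behaviour the conditional-risk criterion is meant to produce.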
Submitted 29 May, 2025;
originally announced May 2025.
-
Finding Counterfactual Evidences for Node Classification
Authors:
Dazhuo Qiu,
Jinwen Chen,
Arijit Khan,
Yan Zhao,
Francesco Bonchi
Abstract:
Counterfactual learning is emerging as an important paradigm, rooted in causality, which promises to alleviate common issues of graph neural networks (GNNs), such as fairness and interpretability. However, in many real-world application domains, where conducting randomized controlled trials is impractical, one has to rely on available observational (factual) data to detect counterfactuals. In this paper, we introduce and tackle the problem of searching for counterfactual evidences for the GNN-based node classification task. A counterfactual evidence is a pair of nodes that are classified differently by the GNN despite exhibiting great similarity both in their features and in their neighborhood subgraph structures. We develop effective and efficient search algorithms and a novel indexing solution that leverages both node features and structural information to identify counterfactual evidences, and generalizes beyond any specific GNN. Through various downstream applications, we demonstrate the potential of counterfactual evidences to enhance fairness and accuracy of GNNs.
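The definition of a counterfactual evidence can be made concrete with a brute-force search over all node pairs. The sketch assumes cosine similarity for features, Jaccard similarity of 1-hop neighbourhoods as a stand-in for subgraph similarity, and arbitrary thresholds, with the GNN's predictions supplied as a plain dictionary. The paper's similarity measures and its indexing solution, which avoids this quadratic scan, are not reproduced here.

```python
import math
from itertools import combinations

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def counterfactual_pairs(features, neighbors, predicted, feat_min=0.9, struct_min=0.5):
    """O(n^2) scan for node pairs that look alike but receive different labels.

    features: node -> feature vector; neighbors: node -> set of neighbours;
    predicted: node -> label output by an already-trained classifier.
    """
    return [
        (u, v)
        for u, v in combinations(sorted(features), 2)
        if predicted[u] != predicted[v]
        and cosine(features[u], features[v]) >= feat_min
        and jaccard(neighbors[u], neighbors[v]) >= struct_min
    ]

features = {0: [1.0, 0.0], 1: [0.99, 0.1], 2: [0.0, 1.0], 3: [1.0, 0.05]}
neighbors = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0}}
predicted = {0: "A", 1: "B", 2: "A", 3: "A"}
print(counterfactual_pairs(features, neighbors, predicted))  # [(0, 1)]
```

Nodes 1 and 3 also have near-identical features and different labels, but their neighbourhoods do not overlap, so only the pair (0, 1) passes both similarity tests.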
Submitted 2 June, 2025; v1 submitted 16 May, 2025;
originally announced May 2025.
-
The calculus of neo-Peircean relations
Authors:
Filippo Bonchi,
Alessandro Di Giorgio,
Nathan Haydon,
Pawel Sobocinski
Abstract:
The calculus of relations was introduced by De Morgan and Peirce during the second half of the 19th century, as an extension of Boole's algebra of classes. Later developments in quantification theory by Frege and Peirce himself paved the way to what is known today as first-order logic, causing the calculus of relations to be long forgotten. It remained so until 1941, when Tarski raised the question of the existence of a complete axiomatisation for it. This question found only negative answers: there is no finite axiomatisation for the calculus of relations and many of its fragments, as shown later by several no-go theorems. In this paper we show that, by moving from traditional syntax (cartesian) to a diagrammatic one (monoidal), it is possible to have complete axiomatisations for the full calculus. The no-go theorems are circumvented by the fact that our calculus, named the calculus of neo-Peircean relations, is more expressive than the calculus of relations and, actually, as expressive as first-order logic. The axioms are obtained by combining two well-known categorical structures: cartesian and linear bicategories.
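For readers new to the calculus of relations, the sketch below gives the standard set-theoretic meaning of a few of its operations on a finite universe: composition, converse, complement, and Peirce's relative sum R † S = ¬(¬R ; ¬S). The relative sum is the dual composition that linear bicategories pair with ordinary composition. The sketch shows only these classical semantics, not the paper's diagrammatic syntax or axioms.

```python
from itertools import product

def compose(r, s):
    # R ; S = {(x, z) | there is y with (x, y) in R and (y, z) in S}
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def converse(r):
    return {(y, x) for (x, y) in r}

def complement(r, universe):
    return {p for p in product(universe, repeat=2) if p not in r}

def relative_sum(r, s, universe):
    # Peirce's dual composition R † S = ¬(¬R ; ¬S); pointwise, (x, z) is related
    # iff for every y, (x, y) is in R or (y, z) is in S
    return complement(compose(complement(r, universe), complement(s, universe)), universe)

U = {0, 1, 2}
less = {(x, y) for x, y in product(U, repeat=2) if x < y}
print(compose(less, less))   # {(0, 2)}
print(converse(less))        # the "greater than" relation on U
```

Composition is existential in the intermediate point and the relative sum is universal, which mirrors the pairing of ∃ and ∀ that makes the full calculus as expressive as first-order logic.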
Submitted 9 April, 2026; v1 submitted 8 May, 2025;
originally announced May 2025.
-
Tape Diagrams for Monoidal Monads
Authors:
Filippo Bonchi,
Cipriano Junior Cioffo,
Alessandro Di Giorgio,
Elena Di Lavore
Abstract:
Tape diagrams provide a graphical representation for arrows of rig categories, namely categories equipped with two monoidal structures, $\oplus$ and $\otimes$, where $\otimes$ distributes over $\oplus$. However, their applicability is limited to categories where $\oplus$ is a biproduct, i.e., both a categorical product and a coproduct. In this work, we extend tape diagrams to deal with Kleisli categories of symmetric monoidal monads, presented by algebraic theories.
Submitted 28 March, 2025;
originally announced March 2025.
-
Mathematical Foundation of Interpretable Equivariant Surrogate Models
Authors:
Jacopo Joy Colombini,
Filippo Bonchi,
Francesco Giannini,
Fosca Giannotti,
Roberto Pellungrini,
Patrizio Frosini
Abstract:
This paper introduces a rigorous mathematical framework for neural network explainability and, more broadly, for the explainability of equivariant operators called Group Equivariant Operators (GEOs), based on Group Equivariant Non-Expansive Operator (GENEO) transformations. The central concept involves quantifying the distance between GEOs by measuring the non-commutativity of specific diagrams. Additionally, the paper proposes a definition of interpretability of GEOs according to a complexity measure that can be defined according to each user's preferences. Moreover, we explore the formal properties of this framework and show how it can be applied in classical machine learning scenarios, such as image classification with convolutional neural networks.
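As a loose illustration of "measuring non-commutativity", the sketch checks a single square: apply a group action g and then an operator F, or apply F and then g, and record the largest gap. It assumes 1-D signals with the cyclic shift group Z_n and the sup-norm. A circular convolution commutes exactly, with defect 0, while a position-dependent scaling does not. The paper's diagrams compare GEOs with one another and may use different groups and norms; the function names here are hypothetical.

```python
def shift(x: list[float], k: int) -> list[float]:
    # action of the cyclic group Z_n: rotate the signal right by k positions
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)

def circ_conv(kernel: list[float]):
    # circular convolution: y_i = sum_j kernel_j * x_{(i - j) mod n}
    def op(x: list[float]) -> list[float]:
        n = len(x)
        return [sum(kernel[j] * x[(i - j) % n] for j in range(len(kernel))) for i in range(n)]
    return op

def equivariance_defect(op, signals: list[list[float]]) -> float:
    """Largest sup-norm gap between op(g.x) and g.op(x) over all shifts g and the given signals."""
    worst = 0.0
    for x in signals:
        for k in range(len(x)):
            lhs, rhs = op(shift(x, k)), shift(op(x), k)
            worst = max(worst, max(abs(a - b) for a, b in zip(lhs, rhs)))
    return worst

signals = [[1.0, 0.0, 0.0, 0.0], [0.5, -1.0, 2.0, 0.0]]
print(equivariance_defect(circ_conv([0.25, 0.5, 0.25]), signals))  # 0.0: the square commutes
```

A value of 0 means the square commutes on the tested signals; a positive value quantifies how far the operator is from being equivariant.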
Submitted 3 March, 2025;
originally announced March 2025.
-
Minimizing Polarization and Disagreement in the Friedkin-Johnsen Model with Unknown Innate Opinions
Authors:
Federico Cinus,
Atsushi Miyauchi,
Yuko Kuroki,
Francesco Bonchi
Abstract:
The bulk of the literature on opinion optimization in social networks adopts the Friedkin-Johnsen (FJ) opinion dynamics model, in which the innate opinions of all nodes are known: this is an unrealistic assumption. In this paper, we study opinion optimization under the FJ model without the full knowledge of innate opinions. Specifically, we borrow from the literature a series of objective functions, aimed at minimizing polarization and/or disagreement, and we tackle the budgeted optimization problem, where we can query the innate opinions of only a limited number of nodes. Given the complexity of our problem, we propose a framework based on three steps: (1) select the limited number of nodes we query, (2) reconstruct the innate opinions of all nodes based on those queried, and (3) optimize the objective function with the reconstructed opinions. For each step of the framework, we present and systematically evaluate several effective strategies. A key contribution of our work is a rigorous error propagation analysis that quantifies how reconstruction errors in innate opinions impact the quality of the final solutions. Our experiments on various synthetic and real-world datasets show that we can effectively minimize polarization and disagreement even if we have quite limited information about innate opinions.
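The quantities being optimized can be sketched with the definitions commonly used for the FJ model, which are assumed here and may not match this paper's exact objectives. On an undirected weighted graph with Laplacian L and innate opinions s, the expressed opinions settle at z* = (I + L)^{-1} s. Polarization is the sum of squared deviations of z* from its mean, and disagreement is the weighted sum of (z*_u − z*_v)² over edges. The equilibrium is computed here by fixed-point (Jacobi) iteration rather than a matrix inverse, which converges because I + L is strictly diagonally dominant for non-negative weights.

```python
def fj_equilibrium(innate: list[float], edges: dict, iters: int = 500) -> list[float]:
    """Expressed opinions z* = (I + L)^{-1} s via Jacobi iteration.

    innate: innate opinions s_i for nodes 0..n-1; edges: {(u, v): weight}, undirected.
    """
    n = len(innate)
    nbrs: list[list[tuple[int, float]]] = [[] for _ in range(n)]
    for (u, v), w in edges.items():
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))
    z = list(innate)
    for _ in range(iters):
        z = [(innate[i] + sum(w * z[j] for j, w in nbrs[i])) / (1 + sum(w for _, w in nbrs[i]))
             for i in range(n)]
    return z

def polarization(z: list[float]) -> float:
    mean = sum(z) / len(z)
    return sum((x - mean) ** 2 for x in z)

def disagreement(z: list[float], edges: dict) -> float:
    return sum(w * (z[u] - z[v]) ** 2 for (u, v), w in edges.items())

edges = {(0, 1): 1.0}
z = fj_equilibrium([1.0, -1.0], edges)  # approximately [1/3, -1/3]
print(polarization(z), disagreement(z, edges))  # approximately 2/9 and 4/9
```

In the budgeted setting described above, `innate` would be replaced by the reconstructed opinions, so any reconstruction error propagates into z* and from there into both objectives.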
Submitted 28 January, 2025; v1 submitted 27 January, 2025;
originally announced January 2025.
-
Engagement-Driven Content Generation with Large Language Models
Authors:
Erica Coppolillo,
Federico Cinus,
Marco Minici,
Francesco Bonchi,
Giuseppe Manco
Abstract:
Large Language Models (LLMs) demonstrate significant persuasive capabilities in one-on-one interactions, but their influence within social networks, where interconnected users and complex opinion dynamics pose unique challenges, remains underexplored. This paper addresses the research question: \emph{Can LLMs generate meaningful content that maximizes user engagement on social networks?}
To answer this, we propose a pipeline using reinforcement learning with simulated feedback, where the network's response to LLM-generated content (i.e., the reward) is simulated through a formal engagement model. This approach bypasses the temporal cost and complexity of live experiments, enabling an efficient feedback loop between the LLM and the network under study. It also allows control over endogenous factors such as the LLM's position within the social network and the distribution of opinions on a given topic. Our approach is adaptive to the opinion distribution of the underlying network and agnostic to the specifics of the engagement model, which is embedded as a plug-and-play component. Such flexibility makes it suitable for more complex engagement tasks and interventions in computational social science.
Using our framework, we analyze the performance of LLMs in generating social engagement under different conditions, showcasing their full potential in this task. The experimental code is publicly available at https://github.com/mminici/Engagement-Driven-Content-Generation.
Submitted 12 June, 2025; v1 submitted 20 November, 2024;
originally announced November 2024.
-
Effectful Mealy Machines
Authors:
Filippo Bonchi,
Elena Di Lavore,
Mario Román
Abstract:
Effectful Mealy machines, which we introduce, are a generalization of Mealy machines with global effects determined by an effectful triple. We provide semantics of effectful Mealy machines in terms of both bisimilarity and traces: bisimilarity is characterized syntactically, via uniform feedback; traces are constructed coinductively in terms of streams. We prove that this framework characterizes standard causal processes and existing flavours of Mealy machine, bisimilarity, and trace equivalence. In the commutative case, we introduce a monoidal generalization of Raney's causal functions: monoidal causal processes.
Submitted 23 December, 2025; v1 submitted 14 October, 2024;
originally announced October 2024.
-
A Diagrammatic Algebra for Program Logics
Authors:
Filippo Bonchi,
Alessandro Di Giorgio,
Elena Di Lavore
Abstract:
Tape diagrams provide a convenient notation for arrows of rig categories, i.e., categories equipped with two monoidal products, $\oplus$ and $\otimes$, where $\otimes$ distributes over $\oplus$. In this work, we extend tape diagrams with traces over $\oplus$ in order to deal with iteration in imperative programming languages. More precisely, we introduce Kleene-Cartesian bicategories, namely rig categories where the monoidal structure provided by $\otimes$ is a cartesian bicategory, while the one provided by $\oplus$ is what we name a Kleene bicategory. We show that the associated language of tape diagrams is expressive enough to deal with imperative programs and the corresponding laws provide a proof system that is at least as powerful as that of Hoare logic.
Submitted 4 October, 2024;
originally announced October 2024.
-
Algorithmic Drift: A Simulation Framework to Study the Effects of Recommender Systems on User Preferences
Authors:
Erica Coppolillo,
Simone Mungari,
Ettore Ritacco,
Francesco Fabbri,
Marco Minici,
Francesco Bonchi,
Giuseppe Manco
Abstract:
Digital platforms such as social media and e-commerce websites adopt Recommender Systems to provide value to the user. However, the social consequences deriving from their adoption are still unclear. Many scholars argue that recommenders may lead to detrimental effects, such as bias amplification deriving from the feedback loop between algorithmic suggestions and users' choices. Nonetheless, the extent to which recommenders influence changes in users' leanings remains uncertain. In this context, it is important to provide a controlled environment for evaluating the recommendation algorithm before deployment. To address this, we propose a stochastic simulation framework that mimics user-recommender system interactions in a long-term scenario. In particular, we simulate the user choices by formalizing a user model, which comprises behavioral aspects, such as the user resistance towards the recommendation algorithm and their inertia in relying on the received suggestions. Additionally, we introduce two novel metrics for quantifying the algorithm's impact on user preferences, specifically in terms of drift over time. We conduct an extensive evaluation on multiple synthetic datasets, aiming at testing the robustness of our framework when considering different scenarios and hyper-parameter settings. The experimental results show that the proposed methodology is effective in detecting and quantifying the drift in users' preferences by means of the simulation. All the code and data used to perform the experiments are publicly available.
Submitted 24 September, 2024;
originally announced September 2024.
-
Link Polarity Prediction from Sparse and Noisy Labels via Multiscale Social Balance
Authors:
Marco Minici,
Federico Cinus,
Francesco Bonchi,
Giuseppe Manco
Abstract:
Signed Graph Neural Networks (SGNNs) have recently gained attention as an effective tool for several learning tasks on signed networks, i.e., graphs where edges have an associated polarity. One of these tasks is to predict the polarity of the links for which this information is missing, starting from the network structure and the other available polarities. However, when the available polarities are few and potentially noisy, such a task becomes challenging.
In this work, we devise a semi-supervised learning framework that builds around the novel concept of \emph{multiscale social balance} to improve the prediction of link polarities in settings characterized by limited data quantity and quality. Our model-agnostic approach can seamlessly integrate with any SGNN architecture, dynamically reweighting the importance of each data sample while making strategic use of the structural information from unlabeled edges combined with social balance theory.
Empirical validation demonstrates that our approach outperforms established baseline models, effectively addressing the limitations imposed by noisy and sparse data. This result underlines the benefits of incorporating multiscale social balance into SGNNs, opening new avenues for robust and accurate predictions in signed network analysis.
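The classical, single-scale notion of social balance the framework builds on can be sketched as follows. A triangle is balanced when the product of its three edge signs is +1. A missing sign can be guessed by letting each common neighbour vote for the sign that would balance its triangle. The multiscale extension, the sample reweighting and the SGNN integration are not modelled. Edges are assumed to be stored as (u, v) with u < v, and the function names are hypothetical.

```python
def triangle_balance(signs: dict[tuple[int, int], int]) -> tuple[int, int]:
    """Count (balanced, total) triangles; signs maps an edge (u, v) with u < v to +1 or -1."""
    adj: dict[int, set[int]] = {}
    for u, v in signs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    balanced = total = 0
    for u, v in signs:
        for w in adj[u] & adj[v]:
            if w > v:  # visit each triangle exactly once, as u < v < w
                total += 1
                balanced += signs[(u, v)] * signs[(u, w)] * signs[(v, w)] > 0
    return balanced, total

def _key(a: int, b: int) -> tuple[int, int]:
    return (a, b) if a < b else (b, a)

def balance_vote(signs: dict[tuple[int, int], int], u: int, v: int) -> int:
    """Guess the sign of a missing edge (u, v): each common neighbour w votes for
    s_uw * s_vw, the sign that would make triangle (u, v, w) balanced. 0 means no decision."""
    nodes = {x for edge in signs for x in edge} - {u, v}
    votes = sum(signs[_key(u, w)] * signs[_key(v, w)]
                for w in nodes if _key(u, w) in signs and _key(v, w) in signs)
    return (votes > 0) - (votes < 0)

signs = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (0, 3): -1, (1, 3): -1, (2, 3): 1}
print(triangle_balance(signs))  # (2, 4)
partial = {e: s for e, s in signs.items() if e != (2, 3)}
print(balance_vote(partial, 2, 3))  # -1
```

In this toy graph the observed +1 on edge (2, 3) leaves two triangles unbalanced, while the balance vote proposes −1; a noisy label of this kind is what the semi-supervised framework is designed to be robust to.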
Submitted 22 July, 2024;
originally announced July 2024.
-
When Lawvere meets Peirce: an equational presentation of boolean hyperdoctrines
Authors:
Filippo Bonchi,
Alessandro Di Giorgio,
Davide Trotta
Abstract:
Fo-bicategories are a categorification of Peirce's calculus of relations. Notably, their laws provide a proof system for first-order logic that is both purely equational and complete. This paper illustrates a correspondence between fo-bicategories and Lawvere's hyperdoctrines. To streamline our proof, we introduce peircean bicategories, which offer a more succinct characterization of fo-bicategories.
Submitted 29 April, 2024;
originally announced April 2024.
-
Multilayer Correlation Clustering
Authors:
Atsushi Miyauchi,
Florian Adriaens,
Francesco Bonchi,
Nikolaj Tatti
Abstract:
In this paper, we establish Multilayer Correlation Clustering, a novel generalization of Correlation Clustering (Bansal et al., FOCS '02) to the multilayer setting. In this model, we are given a series of inputs of Correlation Clustering (called layers) over the common set $V$. The goal is then to find a clustering of $V$ that minimizes the $\ell_p$-norm ($p\geq 1$) of the disagreements vector, which is defined as the vector (with dimension equal to the number of layers), each element of which represents the disagreements of the clustering on the corresponding layer. For this generalization, we first design an $O(L\log n)$-approximation algorithm, where $L$ is the number of layers, based on the well-known region growing technique. We then study an important special case of our problem, namely the problem with the probability constraint. For this case, we first give an $(α+2)$-approximation algorithm, where $α$ is any possible approximation ratio for the single-layer counterpart. For instance, we can take $α=2.5$ in general (Ailon et al., JACM '08) and $α=1.73+ε$ for the unweighted case (Cohen-Addad et al., FOCS '23). Furthermore, we design a $4$-approximation algorithm, which improves the above approximation ratio of $α+2=4.5$ for the general probability-constraint case. Computational experiments using real-world datasets demonstrate the effectiveness of our proposed algorithms.
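The objective can be sketched directly from the definition above. Each layer assigns a "+" weight (the cost of separating a pair) and a "−" weight (the cost of keeping a pair together). A clustering's disagreement on a layer is the total weight it violates, and the multilayer cost is the ℓ_p norm of the vector of per-layer disagreements. The sketch only evaluates a given clustering; it does not implement the region-growing or approximation algorithms, and the names are hypothetical.

```python
from itertools import combinations

def layer_disagreements(labels: dict, w_plus: dict, w_minus: dict) -> float:
    """Correlation-clustering cost of one layer; weights are keyed by (u, v) with u < v.

    A pair split across clusters pays its '+' weight; a pair kept together pays its '-' weight.
    """
    cost = 0.0
    for u, v in combinations(sorted(labels), 2):
        if labels[u] == labels[v]:
            cost += w_minus.get((u, v), 0.0)
        else:
            cost += w_plus.get((u, v), 0.0)
    return cost

def multilayer_objective(labels: dict, layers: list, p: float = 2.0) -> float:
    """l_p norm (p >= 1) of the per-layer disagreements vector; p = inf gives the max."""
    vec = [layer_disagreements(labels, wp, wm) for wp, wm in layers]
    if p == float("inf"):
        return max(vec)
    return sum(x ** p for x in vec) ** (1 / p)

# two layers over V = {0, 1, 2} that disagree about which pair belongs together
layers = [({(0, 1): 1.0}, {(0, 2): 1.0, (1, 2): 1.0}),
          ({(1, 2): 1.0}, {(0, 1): 1.0, (0, 2): 1.0})]
a = {0: "x", 1: "x", 2: "y"}  # perfect on layer 1, disagreements vector (0, 2)
b = {0: 0, 1: 1, 2: 2}        # all singletons, disagreements vector (1, 1)
print(multilayer_objective(a, layers, p=1), multilayer_objective(b, layers, p=1))  # 2.0 2.0
```

With p = 1 the two clusterings tie, while with p = 2 or p = ∞ the all-singletons clustering is preferred because it spreads disagreements evenly across layers; this is the trade-off the choice of p controls.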
Submitted 25 April, 2024;
originally announced April 2024.
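The Multilayer Correlation Clustering objective can be made concrete by scoring a fixed clustering. The sketch below counts disagreements on each layer and takes the $\ell_p$-norm of the resulting vector. It assumes unweighted, complete layers in which each layer lists only its "+" pairs and every other pair counts as "-". The encoding and the function names are hypothetical, and the sketch only evaluates the objective. It does not implement the paper's approximation algorithms.

```python
from itertools import combinations


def layer_disagreements(labels, positive_pairs):
    """Count disagreements of a clustering on one complete signed layer.

    labels: dict mapping each vertex to a cluster id.
    positive_pairs: set of frozensets {u, v} labelled "+"; every other pair
    of vertices is treated as "-" (an assumed, hypothetical encoding).
    """
    count = 0
    for u, v in combinations(sorted(labels), 2):
        same_cluster = labels[u] == labels[v]
        is_positive = frozenset((u, v)) in positive_pairs
        # A "+" pair split apart, or a "-" pair placed together, disagrees.
        if is_positive != same_cluster:
            count += 1
    return count


def multilayer_cost(labels, layers, p=1.0):
    """Return (l_p-norm of the disagreements vector, the vector itself), p >= 1."""
    if p < 1:
        raise ValueError("p must be >= 1")
    vector = [layer_disagreements(labels, layer) for layer in layers]
    if p == float("inf"):
        return max(vector), vector
    return sum(d ** p for d in vector) ** (1.0 / p), vector


# Usage: vertices 0 and 1 share a cluster, vertex 2 is alone.
labels = {0: "a", 1: "a", 2: "b"}
layers = [{frozenset((0, 1))}, {frozenset((1, 2))}]
cost, vector = multilayer_cost(labels, layers, p=2)
print(vector, cost)  # [0, 2] 2.0
```

The clustering agrees perfectly with the first layer but makes two mistakes on the second. Increasing $p$ therefore shifts weight toward the worst layer, and $p = \infty$ scores only that layer.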
-
Local Centrality Minimization with Quality Guarantees
Authors:
Atsushi Miyauchi,
Lorenzo Severini,
Francesco Bonchi
Abstract:
Centrality measures, quantifying the importance of vertices or edges, play a fundamental role in network analysis. To date, triggered by some positive approximability results, a large body of work has been devoted to studying centrality maximization, where the goal is to maximize the centrality score of a target vertex by manipulating the structure of a given network. On the other hand, due to the lack of such results, only very little attention has been paid to centrality minimization, despite its practical usefulness.
In this study, we introduce a novel optimization model for local centrality minimization, where the manipulation is allowed only around the target vertex. We prove that our model is NP-hard and that the most intuitive greedy algorithm has quite limited performance in terms of approximation ratio. Then we design two effective approximation algorithms: the first is a highly scalable algorithm that has an approximation ratio unachievable by the greedy algorithm, while the second is a bicriteria approximation algorithm that solves a continuous relaxation based on the Lovász extension, using a projected subgradient method. To the best of our knowledge, ours are the first polynomial-time algorithms with provable approximation guarantees for centrality minimization. Experiments using a variety of real-world networks demonstrate the effectiveness of our proposed algorithms: our first algorithm is applicable to million-scale graphs and obtains much better solutions than those of scalable baselines, while our second algorithm performs well on adversarial instances.
Submitted 12 February, 2024;
originally announced February 2024.
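The second algorithm of Local Centrality Minimization relaxes a set function through its Lovász extension. Under the standard definition, sort the coordinates so that $x_{\sigma(1)} \ge \dots \ge x_{\sigma(n)}$, let $S_i = \{\sigma(1), \dots, \sigma(i)\}$, and set $f_L(x) = \sum_i x_{\sigma(i)} \bigl(F(S_i) - F(S_{i-1})\bigr)$, assuming $F(\emptyset) = 0$. The sketch below only evaluates this formula. The set function in the usage example is a toy cut function, not the paper's centrality objective, and no subgradient method is implemented.

```python
def lovasz_extension(F, x):
    """Evaluate the Lovász extension of a set function F at the point x.

    F: callable on frozensets of indices, assumed normalised so F(empty) == 0.
    x: sequence of reals, one per ground-set element.
    """
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    value = 0.0
    prefix = frozenset()
    prev = F(prefix)
    for i in order:
        prefix = prefix | {i}
        current = F(prefix)
        value += x[i] * (current - prev)  # marginal gain weighted by x_i
        prev = current
    return value


def cut_function(edges):
    """Toy set function: number of edges with exactly one endpoint in S."""
    def F(S):
        return sum((u in S) != (v in S) for u, v in edges)
    return F


F = cut_function([(0, 1), (1, 2)])  # path graph 0 - 1 - 2
print(lovasz_extension(F, [0.0, 1.0, 0.0]))  # 2.0, equal to F({1})
print(lovasz_extension(F, [0.5, 1.0, 0.0]))  # 1.5, equal to |0.5 - 1| + |1 - 0|
```

On indicator vectors the extension reproduces $F$ because the sum telescopes. For cut functions it coincides with the total variation $\sum_{(u,v)} |x_u - x_v|$, which is why it serves as a natural continuous relaxation.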
-
Query-Efficient Correlation Clustering with Noisy Oracle
Authors:
Yuko Kuroki,
Atsushi Miyauchi,
Francesco Bonchi,
Wei Chen
Abstract:
We study a general clustering setting in which we have $n$ elements to be clustered, and we aim to perform as few queries as possible to an oracle that returns a noisy sample of the weighted similarity between two elements. Our setting encompasses many application domains in which the similarity function is costly to compute and inherently noisy. We introduce two novel formulations of online learning problems rooted in the paradigm of Pure Exploration in Combinatorial Multi-Armed Bandits (PE-CMAB): fixed confidence and fixed budget settings. For both settings, we design algorithms that combine a sampling strategy with a classic approximation algorithm for correlation clustering and study their theoretical guarantees. Our results are the first examples of polynomial-time algorithms that work for the case of PE-CMAB in which the underlying offline optimization problem is NP-hard.
Submitted 3 November, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
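The Query-Efficient Correlation Clustering abstract describes combining a sampling strategy with a classic correlation-clustering approximation algorithm. The sketch below illustrates that combination with two assumptions. The sampling is naive and uniform: every pair is queried a fixed number of times and the answers are averaged. The offline routine is the KwikCluster pivot algorithm (Ailon, Charikar and Newman). The oracle, its noise model and the ground-truth groups are hypothetical. The paper's adaptive PE-CMAB sampling, whose purpose is to spend far fewer queries than this, is not reproduced.

```python
import random
from itertools import combinations

GROUPS = {"a": 0, "b": 0, "c": 1, "d": 1}  # hypothetical ground truth


def noisy_oracle(u, v, rng, noise=0.2):
    """Hypothetical oracle: similarity 1 within a group, 0 across, plus uniform noise."""
    base = 1.0 if GROUPS[u] == GROUPS[v] else 0.0
    return base + rng.uniform(-noise, noise)


def estimate_similarities(elements, oracle, samples_per_pair, rng):
    """Uniform sampling: average repeated noisy answers for every pair."""
    sim = {}
    for u, v in combinations(elements, 2):
        total = sum(oracle(u, v, rng) for _ in range(samples_per_pair))
        sim[frozenset((u, v))] = total / samples_per_pair
    return sim


def kwik_cluster(elements, sim, rng, threshold=0.5):
    """Pivot clustering: a random pivot absorbs every remaining element
    whose estimated similarity to it exceeds `threshold`."""
    remaining = list(elements)
    rng.shuffle(remaining)  # random pivot order
    clusters = []
    while remaining:
        pivot = remaining[0]
        cluster = [pivot] + [
            v for v in remaining[1:] if sim[frozenset((pivot, v))] > threshold
        ]
        clusters.append(cluster)
        members = set(cluster)
        remaining = [v for v in remaining if v not in members]
    return clusters


rng = random.Random(0)
elements = ["a", "b", "c", "d"]
sim = estimate_similarities(elements, noisy_oracle, samples_per_pair=10, rng=rng)
clusters = kwik_cluster(elements, sim, rng)
print(sorted(sorted(c) for c in clusters))  # [['a', 'b'], ['c', 'd']]
```

This toy run spends 60 queries (6 pairs, 10 samples each). Because the noise is bounded by 0.2, every averaged estimate falls on the correct side of the 0.5 threshold. In general, the number of samples needed depends on how close the true similarities are to the threshold, and that is the quantity a pure-exploration strategy tries to control.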
-
Fairness in Algorithmic Recourse Through the Lens of Substantive Equality of Opportunity
Authors:
Andrew Bell,
Joao Fonseca,
Carlo Abrate,
Francesco Bonchi,
Julia Stoyanovich
Abstract:
Algorithmic recourse -- providing recommendations to those affected negatively by the outcome of an algorithmic system on how they can take action and change that outcome -- has gained attention as a means of giving persons agency in their interactions with artificial intelligence (AI) systems. Recent work has shown that even if an AI decision-making classifier is ``fair'' (according to some reasonable criteria), recourse itself may be unfair due to differences in the initial circumstances of individuals, compounding disparities for marginalized populations and requiring them to exert more effort than others. There is a need to define more methods and metrics for evaluating fairness in recourse that span a range of normative views of the world, and specifically those that take into account time. Time is a critical element in recourse because the longer it takes an individual to act, the more the setting may change due to model or data drift.
This paper seeks to close this research gap by proposing two notions of fairness in recourse that are in normative alignment with substantive equality of opportunity, and that consider time. The first considers the (often repeated) effort individuals exert per successful recourse event, and the second considers time per successful recourse event. Building upon an agent-based framework for simulating recourse, this paper demonstrates how much effort is needed to overcome disparities in initial circumstances. We then propose an intervention to improve the fairness of recourse by rewarding effort, and compare it to existing strategies.
Submitted 29 January, 2024;
originally announced January 2024.
-
Diagrammatic Algebra of First Order Logic
Authors:
Filippo Bonchi,
Alessandro Di Giorgio,
Nathan Haydon,
Pawel Sobocinski
Abstract:
We introduce the calculus of neo-Peircean relations, a string diagrammatic extension of the calculus of binary relations that has the same expressivity as first order logic and comes with a complete axiomatisation. The axioms are obtained by combining two well-known categorical structures: cartesian and linear bicategories.
Submitted 13 January, 2024;
originally announced January 2024.
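Diagrammatic Algebra of First Order Logic extends the calculus of binary relations. As background, the sketch below gives the classical set-theoretic semantics of that calculus's core operations on finite relations, represented as sets of pairs. It covers composition, converse, complement and identity, while union and intersection are Python's `|` and `&`. It does not model the paper's string-diagrammatic syntax or its axiomatisation.

```python
from itertools import product


def compose(r, s):
    """Relational composition R;S = {(x, z) | exists y. (x, y) in R and (y, z) in S}."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}


def converse(r):
    """Converse: {(y, x) | (x, y) in R}."""
    return {(y, x) for (x, y) in r}


def complement(r, universe):
    """Complement of R with respect to universe x universe."""
    return set(product(universe, repeat=2)) - r


def identity(universe):
    """Identity relation on a finite universe."""
    return {(x, x) for x in universe}


parent = {("ann", "bob"), ("bob", "cid")}
grandparent = compose(parent, parent)
print(grandparent)  # {('ann', 'cid')}
```

The existential over the intermediate `y` in `compose` is what links the relational calculus to first-order logic. Tracking such hidden variables is what the graphical notation makes explicit.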
-
Learning Multi-Frequency Partial Correlation Graphs
Authors:
Gabriele D'Acunto,
Paolo Di Lorenzo,
Francesco Bonchi,
Stefania Sardellitti,
Sergio Barbarossa
Abstract:
Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial correlation graph, in which layers correspond to different frequency bands, and partial correlations can occur over just a few layers. To this end, we formulate and solve two nonconvex learning problems: the first has a closed-form solution and is suitable when there is prior knowledge about the number of partial correlations; the second hinges on an iterative solution based on successive convex approximation, and is effective for the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, the analysis of financial time series confirms that partial correlations exist only within a few frequency bands, underscoring how our methods yield valuable insights that would go undetected without discriminating along the frequency domain.
Submitted 12 May, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
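For background to Learning Multi-Frequency Partial Correlation Graphs, recall that partial correlations can be read off the precision (inverse covariance) matrix $\Theta$ via $\rho_{ij} = -\Theta_{ij} / \sqrt{\Theta_{ii}\Theta_{jj}}$. The sketch below applies this standard single-band formula to simulated data. It assumes the sample covariance is invertible, and it does not reproduce the paper's frequency-dependent, block-sparse estimators.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation matrix from samples (rows) of variables (columns).

    rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj), with Theta = inv(Cov).
    Assumes more samples than variables so that Cov is invertible.
    """
    cov = np.cov(data, rowvar=False)
    theta = np.linalg.inv(cov)
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho


# Chain x -> y -> z: x and z are correlated, but not once y is accounted for.
rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)
z = y + rng.standard_normal(n)
data = np.column_stack([x, y, z])
rho = partial_correlations(data)
marginal = np.corrcoef(data, rowvar=False)
```

In this chain, `marginal[0, 2]` is about $1/\sqrt{3} \approx 0.58$, while `rho[0, 2]` is close to zero. This is the difference between plain and partial correlation that the paper resolves further by frequency band.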
-
Extracting the Multiscale Causal Backbone of Brain Dynamics
Authors:
Gabriele D'Acunto,
Francesco Bonchi,
Gianmarco De Francisci Morales,
Giovanni Petri
Abstract:
The bulk of the research effort on brain connectivity revolves around statistical associations among brain regions, which do not directly relate to the causal mechanisms governing brain dynamics. Here we propose the multiscale causal backbone (MCB) of brain dynamics, shared by a set of individuals across multiple temporal scales, and devise a principled methodology to extract it.
Our approach leverages recent advances in multiscale causal structure learning and optimizes the trade-off between the model fit and its complexity. Empirical assessment on synthetic data shows the superiority of our methodology over a baseline based on canonical functional connectivity networks. When applied to resting-state fMRI data, we find sparse MCBs for both the left and right brain hemispheres. Thanks to its multiscale nature, our approach shows that at low-frequency bands, causal dynamics are driven by brain regions associated with high-level cognitive functions; at higher frequencies instead, nodes related to sensory processing play a crucial role. Finally, our analysis of individual multiscale causal structures confirms the existence of a causal fingerprint of brain connectivity, thus supporting the existing extensive research in brain connectivity fingerprinting from a causal perspective.
Submitted 19 March, 2024; v1 submitted 31 October, 2023;
originally announced November 2023.
-
Setting the Right Expectations: Algorithmic Recourse Over Time
Authors:
Joao Fonseca,
Andrew Bell,
Carlo Abrate,
Francesco Bonchi,
Julia Stoyanovich
Abstract:
Algorithmic systems are often called upon to assist in high-stakes decision making. In light of this, algorithmic recourse, the principle wherein individuals should be able to take action against an undesirable outcome made by an algorithmic system, is receiving growing attention. The bulk of the literature on algorithmic recourse to date focuses primarily on how to provide recourse to a single individual, overlooking a critical element: the effects of a continuously changing context. Disregarding these effects on recourse is a significant oversight, since, in almost all cases, recourse consists of an individual making a first, unfavorable attempt, and then being given an opportunity to make one or several attempts at a later date, when the context might have changed. This can create false expectations, as initial recourse recommendations may become less reliable over time due to model drift and competition for access to the favorable outcome between individuals.
In this work we propose an agent-based simulation framework for studying the effects of a continuously changing environment on algorithmic recourse. In particular, we identify two main effects that can alter the reliability of recourse for individuals represented by the agents: (1) competition with other agents acting upon recourse, and (2) competition with new agents entering the environment. Our findings highlight that only a small set of specific parameterizations result in algorithmic recourse that is reliable for agents over time. Consequently, we argue that substantial additional work is needed to understand recourse reliability over time, and to develop recourse methods that reward agents' effort.
Submitted 13 September, 2023;
originally announced September 2023.
-
Rebalancing Social Feed to Minimize Polarization and Disagreement
Authors:
Federico Cinus,
Aristides Gionis,
Francesco Bonchi
Abstract:
Social media have great potential for enabling public discourse on important societal issues. However, adverse effects, such as polarization and echo chambers, greatly impact the benefits of social media and call for algorithms that mitigate these effects. In this paper, we propose a novel problem formulation aimed at slightly nudging users' social feeds in order to strike a balance between relevance and diversity, thus mitigating the emergence of polarization, without lowering the quality of the feed. Our approach is based on re-weighting the relative importance of the accounts that a user follows, so as to calibrate the frequency with which the content produced by various accounts is shown to the user. We analyze the convexity properties of the problem, demonstrating the non-matrix convexity of the objective function and the convexity of the feasible set. To efficiently address the problem, we develop a scalable algorithm based on projected gradient descent. We also prove that our problem statement is a proper generalization of the undirected-case problem so that our method can also be adopted for undirected social networks. As a baseline for comparison in the undirected case, we develop a semidefinite programming approach, which provides the optimal solution. Through extensive experiments on synthetic and real-world datasets, we validate the effectiveness of our approach, which outperforms non-trivial baselines, underscoring its ability to foster healthier and more cohesive online communities.
Submitted 28 August, 2023;
originally announced August 2023.
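Rebalancing Social Feed relies on projected gradient descent over the feasible set of follow weights. The sketch below assumes that each user's weights are non-negative and sum to one. Under that assumption, the projection step is the Euclidean projection onto the probability simplex, which has a well-known sort-based algorithm. The paper's actual feasible set and its polarization-plus-disagreement objective are not reproduced. The gradient in the usage example is a toy quadratic chosen only to exercise the loop.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) == 1} (sort-based)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                  # coordinates in decreasing order
    cssv = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - cssv / idx > 0)[0][-1]  # last index kept positive
    tau = cssv[rho] / (rho + 1.0)         # common shift
    return np.maximum(v - tau, 0.0)


def projected_gradient_descent(grad, w0, step=0.1, iters=200):
    """Minimise a smooth objective over the simplex: gradient step, then project."""
    w = project_to_simplex(w0)
    for _ in range(iters):
        w = project_to_simplex(w - step * grad(w))
    return w


# Toy objective ||w - target||^2 whose minimiser already lies on the simplex.
target = np.array([0.7, 0.2, 0.1])
w = projected_gradient_descent(lambda w: 2.0 * (w - target), np.full(3, 1.0 / 3.0))
```

Each iterate stays feasible by construction, so the same loop would apply unchanged to any smooth objective with a computable gradient, per user row of weights.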
-
Counterfactual Explanations for Graph Classification Through the Lenses of Density
Authors:
Carlo Abrate,
Giulia Preti,
Francesco Bonchi
Abstract:
Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such language of explanation might be too fine-grained, and turn our attention to some of the main characterizing features of real-world complex networks, such as the tendency to close triangles, the existence of recurring motifs, and the organization into dense modules. We thus define a general density-based counterfactual search framework to generate instance-level counterfactual explanations for graph classifiers, which can be instantiated with different notions of dense substructures. In particular, we show two specific instantiations of this general framework: a method that searches for counterfactual graphs by opening or closing triangles, and a method driven by maximal cliques. We also discuss how the general method can be instantiated to exploit any other notion of dense substructures, including, for instance, a given taxonomy of nodes. We evaluate the effectiveness of our approaches on 7 brain network datasets and compare the counterfactual statements generated according to several widely used metrics. Results confirm that adopting a semantically relevant unit of change like density is essential to define versatile and interpretable counterfactual explanation methods.
Submitted 27 July, 2023;
originally announced July 2023.
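One instantiation in Counterfactual Explanations for Graph Classification searches for counterfactual graphs by opening or closing triangles. The sketch below covers only candidate generation for that kind of search. It lists non-edges whose addition would close at least one triangle, and edges whose removal would open at least one. The function names are hypothetical, and how the paper orders candidates and queries the classifier is not reproduced.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_closing_candidates(adj):
    """Non-edges (u, w) with a common neighbour: adding them closes a triangle."""
    candidates = set()
    for nbrs in adj.values():
        for u, w in combinations(sorted(nbrs), 2):
            if w not in adj[u]:
                candidates.add((u, w))
    return candidates


def triangle_opening_candidates(adj):
    """Edges (u, w) with a common neighbour: removing them opens a triangle."""
    candidates = set()
    for u in adj:
        for w in adj[u]:
            if u < w and adj[u] & adj[w]:
                candidates.add((u, w))
    return candidates


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
adj = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])
print(sorted(triangle_closing_candidates(adj)))  # [(0, 3), (1, 3)]
print(sorted(triangle_opening_candidates(adj)))  # [(0, 1), (0, 2), (1, 2)]
```

Each candidate changes the graph by one edge, but it is chosen for its effect on a triangle rather than at random. This is the coarser, density-driven "unit of change" the abstract argues for.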
-
Exploiting Adjoints in Property Directed Reachability Analysis
Authors:
Mayuko Kori,
Flavio Ascari,
Filippo Bonchi,
Roberto Bruni,
Roberta Gori,
Ichiro Hasuo
Abstract:
We formulate, in lattice-theoretic terms, two novel algorithms inspired by Bradley's property directed reachability algorithm. For finding safe invariants or counterexamples, the first algorithm exploits over-approximations of both forward and backward transition relations, expressed abstractly by the notion of adjoints. In the absence of adjoints, one can use the second algorithm, which exploits lower sets and their principals. As a notable example of application, we consider quantitative reachability problems for Markov Decision Processes.
Submitted 6 July, 2023;
originally announced July 2023.
-
Fast and Effective GNN Training through Sequences of Random Path Graphs
Authors:
Francesco Bonchi,
Claudio Gentile,
Francesco Paolo Nerini,
André Panisson,
Fabio Vitale
Abstract:
We present GERN, a novel scalable framework for training GNNs in node classification tasks, based on effective resistance, a standard tool in spectral graph theory. Our method progressively refines the GNN weights on a sequence of random spanning trees suitably transformed into path graphs which, despite their simplicity, are shown to retain essential topological and node information of the original input graph. The sparse nature of these path graphs substantially lightens the computational burden of GNN training. This not only enhances scalability but also improves accuracy in subsequent test phases, especially under small training set regimes, which are of great practical importance, as in many real-world scenarios labels may be hard to obtain. In these settings, our framework yields very good results as it effectively counters the training deterioration caused by overfitting when the training set is small. Our method also addresses common issues like over-squashing and over-smoothing while avoiding under-reaching phenomena.
Although our framework is flexible and can be deployed in several types of GNNs, in this paper we focus on graph convolutional networks and carry out an extensive experimental investigation on a number of real-world graph benchmarks, where we achieve simultaneous improvement of training speed and test accuracy over a wide pool of representative baselines.
Submitted 24 February, 2025; v1 submitted 7 June, 2023;
originally announced June 2023.
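The GERN abstract trains on random spanning trees transformed into path graphs. The sketch below illustrates those two ingredients. It samples a uniform random spanning tree with Wilson's algorithm. For unweighted graphs, the probability that an edge belongs to such a tree equals its effective resistance, which is one standard link between the two notions. It then linearises the tree by a depth-first preorder. The preorder linearisation is an assumption made for illustration, not necessarily the paper's transformation, and no GNN training is included.

```python
import random


def wilson_spanning_tree(adj, rng):
    """Sample a uniform random spanning tree of a connected graph (Wilson's algorithm).

    adj: dict node -> list of neighbours. Returns (root, parent) with parent[child].
    """
    nodes = list(adj)
    root = rng.choice(nodes)
    in_tree = {root}
    parent = {}
    for start in nodes:
        # Random walk until the tree is hit; overwriting nxt[u] on revisits
        # performs the loop erasure.
        nxt = {}
        u = start
        while u not in in_tree:
            nxt[u] = rng.choice(adj[u])
            u = nxt[u]
        # Attach the loop-erased path to the tree.
        u = start
        while u not in in_tree:
            parent[u] = nxt[u]
            in_tree.add(u)
            u = nxt[u]
    return root, parent


def tree_to_path(root, parent):
    """Hypothetical linearisation: depth-first preorder, consecutive nodes linked."""
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(children.get(node, [])))
    return order, list(zip(order, order[1:]))


adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}  # 6-cycle
rng = random.Random(1)
root, parent = wilson_spanning_tree(adj, rng)
order, path_edges = tree_to_path(root, parent)
```

A path graph on $n$ nodes has only $n - 1$ edges, so each training step on it is cheap. Drawing a fresh tree per step lets different edges of the original graph be represented over time.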
-
Hyper-distance Oracles in Hypergraphs
Authors:
Giulia Preti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
We study point-to-point distance estimation in hypergraphs, where the query is parameterized by a positive integer s, which defines the required level of overlap for two hyperedges to be considered adjacent. To answer s-distance queries, we first explore an oracle based on the line graph of the given hypergraph and discuss its limitations: the main one is that the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size, built directly on the hypergraph, thus avoiding constructing the line graph. Our framework allows one to approximately answer vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation at the basis of our framework is that, as s increases, the hypergraph becomes more fragmented. We show how this can be exploited to improve the placement of landmarks, by identifying the s-connected components of the hypergraph. For this task, we devise an efficient algorithm based on the union-find technique and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework allows answering such queries in fractions of a millisecond, while allowing fine-grained control of the trade-off between index size and approximation error at creation time. Finally, we demonstrate the usefulness of the s-distance oracle in two applications, namely, hypergraph-based recommendation and the approximation of the s-closeness centrality of vertices and hyperedges in the context of protein-to-protein interactions.
Submitted 19 March, 2024; v1 submitted 5 June, 2023;
originally announced June 2023.
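In Hyper-distance Oracles in Hypergraphs, two hyperedges are s-adjacent when they share at least s vertices, and the s-connected components are found with union-find and an inverted index. The sketch below follows the same idea in simplified form. A static vertex-to-hyperedges index counts pairwise overlaps, and union-find merges s-adjacent hyperedges. It is not the paper's algorithm. In particular, the dynamic index is not reproduced, and the pairwise counting is only practical for small inputs.

```python
from collections import defaultdict


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def s_connected_components(hyperedges, s):
    """Group hyperedge indices into s-connected components (sorted lists)."""
    index = defaultdict(list)  # vertex -> ids of hyperedges containing it
    for eid, edge in enumerate(hyperedges):
        for v in edge:
            index[v].append(eid)
    overlap = defaultdict(int)  # (i, j) with i < j -> number of shared vertices
    for eids in index.values():
        for a in range(len(eids)):
            for b in range(a + 1, len(eids)):
                overlap[(eids[a], eids[b])] += 1
    uf = UnionFind(len(hyperedges))
    for (i, j), shared in overlap.items():
        if shared >= s:
            uf.union(i, j)
    groups = defaultdict(list)
    for eid in range(len(hyperedges)):
        groups[uf.find(eid)].append(eid)
    return sorted(groups.values())


H = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {5, 6, 7}]
print(s_connected_components(H, 1))  # [[0, 1, 2, 3]]
print(s_connected_components(H, 2))  # [[0, 1], [2], [3]]
```

The toy hypergraph shows the fragmentation the abstract exploits. It is a single component at s = 1 but splits into three at s = 2, so landmarks can be placed per component.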
-
A Survey on the Densest Subgraph Problem and Its Variants
Authors:
Tommaso Lanciano,
Atsushi Miyauchi,
Adriano Fazzone,
Francesco Bonchi
Abstract:
The Densest Subgraph Problem requires finding, in a given graph, a subset of vertices whose induced subgraph maximizes a measure of density. The problem has received a great deal of attention in the algorithmic literature since the early 1970s, with many variants proposed and many applications built on top of this basic definition. Recent years have witnessed a revival of research interest in this problem with several important contributions, including some groundbreaking results, published in 2022 and 2023. This survey provides an in-depth overview of the fundamental results and an exhaustive coverage of the many variants proposed in the literature, with special attention to the most recent results. The survey also presents a comprehensive overview of applications and discusses some interesting open problems for this evergreen research topic.
Submitted 18 April, 2024; v1 submitted 25 March, 2023;
originally announced March 2023.
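As a concrete instance of the basic problem behind the densest subgraph survey, the sketch below implements Charikar's greedy peeling. This classic 2-approximation targets the average-degree density $|E(S)|/|S|$. It repeatedly removes a minimum-degree vertex and keeps the densest intermediate subgraph. The linear scan for the minimum makes it quadratic, which is adequate for illustration; bucket queues give the usual near-linear bound.

```python
def greedy_peeling(edges):
    """Charikar's peeling: return (best_vertex_set, best_density), where
    density = |E(S)| / |S| on the subgraph induced by S."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    alive = set(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_set = set(alive)
    best_density = m / len(alive) if alive else 0.0
    degree = {v: len(adj[v]) for v in alive}
    while len(alive) > 1:
        v = min(alive, key=degree.__getitem__)  # a minimum-degree vertex
        alive.remove(v)
        m -= degree[v]  # degree[v] counts only still-alive neighbours
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
        density = m / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density
    return best_set, best_density


# K4 on {0, 1, 2, 3} plus a pendant path 3 - 4 - 5.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
best_set, best_density = greedy_peeling(k4 + [(3, 4), (4, 5)])
print(sorted(best_set), best_density)  # [0, 1, 2, 3] 1.5
```

Peeling removes the pendant path first (densities 8/6, then 7/5, then 6/4) and then finds no denser subgraph inside the clique. It therefore returns K4 with density 1.5.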
-
Balancing Utility and Fairness in Submodular Maximization (Technical Report)
Authors:
Yanhao Wang,
Yuchen Li,
Francesco Bonchi,
Ying Wang
Abstract:
Submodular function maximization is a fundamental combinatorial optimization problem with plenty of applications -- including data summarization, influence maximization, and recommendation. In many of these problems, the goal is to find a solution that maximizes the average utility over all users, for each of whom the utility is defined by a monotone submodular function. However, when the population of users is composed of several demographic groups, another critical problem is whether the utility is fairly distributed across different groups. Although the \emph{utility} and \emph{fairness} objectives are both desirable, they might contradict each other, and, to the best of our knowledge, little attention has been paid to optimizing them jointly.
To fill this gap, we propose a new problem called \emph{Bicriteria Submodular Maximization} (BSM) to balance utility and fairness. Specifically, it requires finding a fixed-size solution to maximize the utility function, subject to the value of the fairness function not being below a threshold. Since BSM is inapproximable within any constant factor, we focus on designing efficient instance-dependent approximation schemes. Our algorithmic proposal comprises two methods, with different approximation factors, obtained by converting a BSM instance into other submodular optimization problem instances. Using real-world and synthetic datasets, we showcase applications of our proposed methods in three submodular maximization problems: maximum coverage, influence maximization, and facility location.
Submitted 19 June, 2023; v1 submitted 2 November, 2022;
originally announced November 2022.
-
The language of opinion change on social media under the lens of communicative action
Authors:
Corrado Monti,
Luca Maria Aiello,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Which messages are more effective at inducing a change of opinion in the listener? We approach this question within the frame of Habermas' theory of communicative action, which posits that the illocutionary intent of the message (its pragmatic meaning) is the key. Thanks to recent advances in natural language processing, we are able to operationalize this theory by extracting the latent social dimensions of a message, namely archetypes of social intent of language, that come from social exchange theory. We identify key ingredients to opinion change by looking at more than 46k posts and more than 3.5M comments on Reddit's r/ChangeMyView, a debate forum where people try to change each other's opinion and explicitly mark opinion-changing comments with a special flag called "delta". Comments that express no intent are about 77% less likely to change the mind of the recipient, compared to comments that convey at least one social dimension. Among the various social dimensions, the ones that are most likely to produce an opinion change are knowledge, similarity, and trust, which resonates with Habermas' theory of communicative action. We also find other new important dimensions, such as appeals to power or empathetic expressions of support. Finally, in line with theories of constructive conflict, yet contrary to the popular characterization of conflict as the bane of modern social media, our findings show that voicing conflict in the context of a structured public debate can promote integration, especially when it is used to counter another conflictive stance. By leveraging recent advances in natural language processing, our work provides an empirical framework for Habermas' theory, finds concrete examples of its effects in the wild, and suggests its possible extension with a more faceted understanding of intent interpreted as social dimensions of language.
Submitted 31 October, 2022;
originally announced October 2022.
-
Deconstructing the Calculus of Relations with Tape Diagrams
Authors:
Filippo Bonchi,
Alessandro Di Giorgio,
Alessio Santamaria
Abstract:
Rig categories with finite biproducts are categories with two monoidal products, where one is a biproduct and the other distributes over it. In this work we present tape diagrams, a sound and complete diagrammatic language for these categories, that can be intuitively thought of as string diagrams of string diagrams. We test the effectiveness of our approach against the positive fragment of Tarski's calculus of relations.
Submitted 18 October, 2022;
originally announced October 2022.
-
Learning Multiscale Non-stationary Causal Structures
Authors:
Gabriele D'Acunto,
Gianmarco De Francisci Morales,
Paolo Bajardi,
Francesco Bonchi
Abstract:
This paper addresses a gap in the current state of the art by providing a solution for modeling causal relationships that evolve over time and occur at different time scales. Specifically, we introduce the multiscale non-stationary directed acyclic graph (MN-DAG), a framework for modeling multivariate time series data. Our contribution is twofold. First, we present a probabilistic generative model by leveraging results from spectral and causality theories. Our model allows sampling an MN-DAG according to user-specified priors on the time-dependence and multiscale properties of the causal graph. Second, we devise a Bayesian method named Multiscale Non-stationary Causal Structure Learner (MN-CASTLE) that uses stochastic variational inference to estimate MN-DAGs. The method also exploits information from the local partial correlation between time series over different time resolutions. The data generated from an MN-DAG reproduces well-known features of time series in different domains, such as volatility clustering and serial correlation. Additionally, we show the superior performance of MN-CASTLE on synthetic data with different multiscale and non-stationary properties compared to baseline models. Finally, we apply MN-CASTLE to identify the drivers of natural gas prices in the US market. Causal relationships have strengthened during the COVID-19 outbreak and the Russian invasion of Ukraine, a fact that baseline methods fail to capture. MN-CASTLE identifies the causal impact of critical economic drivers on natural gas prices, such as seasonal factors, economic uncertainty, oil prices, and gas storage deviations.
Submitted 17 November, 2023; v1 submitted 31 August, 2022;
originally announced August 2022.
-
Cascade-based Echo Chamber Detection
Authors:
Marco Minici,
Federico Cinus,
Corrado Monti,
Francesco Bonchi,
Giuseppe Manco
Abstract:
Although echo chambers in social media have been under considerable scrutiny, general models for their detection and analysis are missing. In this work, we aim to fill this gap by proposing a probabilistic generative model that explains social media footprints -- i.e., social network structure and propagations of information -- through a set of latent communities, characterized by a degree of echo-chamber behavior and by an opinion polarity. Specifically, echo chambers are modeled as communities that are permeable to pieces of information with similar ideological polarity, and impermeable to information of opposed leaning: this allows discriminating echo chambers from communities that lack a clear ideological alignment.
To learn the model parameters, we propose a scalable, stochastic adaptation of the Generalized Expectation Maximization algorithm that optimizes the joint likelihood of observing social connections and information propagation. Experiments on synthetic data show that our algorithm is able to correctly reconstruct ground-truth latent communities with their degree of echo-chamber behavior and opinion polarity. Experiments on real-world data about polarized social and political debates, such as the Brexit referendum or the COVID-19 vaccine campaign, confirm the effectiveness of our proposal in detecting echo chambers. Finally, we show how our model can improve accuracy in auxiliary predictive tasks, such as stance detection and prediction of future propagations.
Submitted 9 August, 2022;
originally announced August 2022.
-
On the Relation Between Opinion Change and Information Consumption on Reddit
Authors:
Flavio Petruzzellis,
Corrado Monti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
While much attention has been devoted to the causes of opinion change, little is known about its consequences. Our study sheds light on the relationship between a user's opinion change episode and subsequent behavioral change on the online social media platform Reddit. In particular, we look at r/ChangeMyView, an online community dedicated to debating one's own opinions. Interestingly, this forum adopts a well-codified schema for explicitly self-reporting opinion change. Starting from this ground truth, we analyze changes in future online information consumption behavior that arise after a self-reported opinion change on sociopolitical topics, operationalized in this work as participation in sociopolitical subreddits. Such a participation profile is important as it represents one's information diet, and is a reliable proxy for, e.g., political affiliation or health choices.
We find that people who report an opinion change are significantly more likely to change their future participation in a specific subset of online communities. We characterize which communities are more likely to be abandoned after opinion change, and find a significant association (r=0.46) between propaganda-like language used in a community and the increase in the chances of leaving it. We find comparable results (r=0.39) for the opposite direction, i.e., joining a community. This finding suggests that propagandistic communities act as a first gateway to internalize a shift in one's sociopolitical opinion. Finally, we show that the textual content of the discussion associated with opinion change is indicative of which communities are going to be subject to a participation change. In fact, a predictive model based only on the opinion change post is able to pinpoint these communities with an AP@5 of 0.20, similar to what can be reached by using the entire past history of participation in communities.
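The prediction result is reported as AP@5. Definitions of average precision at a cutoff vary between papers; the sketch below uses one common variant, normalizing by min(k, number of relevant items), which is an assumption rather than necessarily the definition used in the study. The community names in the usage lines are hypothetical.

```python
def average_precision_at_k(ranked, relevant, k=5):
    """AP@k: sum of precision@i over the ranks i <= k that hold a relevant
    item, divided by min(k, |relevant|) (one common convention)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i  # precision at rank i
    return total / min(k, len(relevant))

# Hypothetical ranking of communities predicted to see a participation change.
ranked = ["r/a", "r/b", "r/c", "r/d", "r/e"]
print(average_precision_at_k(ranked, {"r/a", "r/c", "r/x"}))  # 5/9, about 0.556
```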
Submitted 25 July, 2022;
originally announced July 2022.
-
On learning agent-based models from data
Authors:
Corrado Monti,
Marco Pangallo,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Agent-Based Models (ABMs) are used in several fields to study the evolution of complex systems from micro-level assumptions. However, ABMs typically cannot estimate agent-specific (or "micro") variables: this is a major limitation, which prevents ABMs from harnessing the availability of micro-level data and greatly limits their predictive power. In this paper, we propose a protocol to learn the latent micro-variables of an ABM from data. The first step of our protocol is to reduce an ABM to a probabilistic model, characterized by a computationally tractable likelihood. This reduction follows two general design principles: balance of stochasticity and data availability, and replacement of unobservable discrete choices with differentiable approximations. Then, our protocol proceeds by maximizing the likelihood of the latent variables via a gradient-based expectation maximization algorithm. We demonstrate our protocol by applying it to an ABM of the housing market, in which agents with different incomes bid higher prices to live in high-income neighborhoods. We show that the obtained model allows accurate estimates of the latent variables, while preserving the general behavior of the ABM. We also show that our estimates can be used for out-of-sample forecasting. Our protocol can be seen as an alternative to black-box data assimilation methods, one that forces the modeler to lay bare the assumptions of the model, to think about the inferential process, and to spot potential identification problems.
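One of the two design principles is replacing unobservable discrete choices with differentiable approximations. A common way to do this, assumed here purely for illustration because the abstract does not specify the approximation, is to replace an argmax choice with a temperature-controlled softmax, so that choice probabilities become smooth functions of the utilities and can be optimized with gradients. The utilities below are hypothetical.

```python
import math

def hard_choice(utilities):
    """Discrete choice: one-hot vector on the highest-utility option (not differentiable)."""
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    return [1.0 if i == best else 0.0 for i in range(len(utilities))]

def soft_choice(utilities, temperature=1.0):
    """Softmax relaxation: choice probabilities that are smooth in the
    utilities and approach hard_choice as temperature -> 0."""
    m = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp((u - m) / temperature) for u in utilities]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical utilities of three neighborhoods for one agent.
u = [1.0, 2.0, 0.5]
print(hard_choice(u))                    # [0.0, 1.0, 0.0]
print(soft_choice(u, temperature=1.0))   # spread over all options
print(soft_choice(u, temperature=0.05))  # almost one-hot on option 1
```

The temperature trades off faithfulness to the original discrete rule against how informative the gradients are.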
Submitted 23 November, 2022; v1 submitted 10 May, 2022;
originally announced May 2022.
-
GRAPHSHAP: Explaining Identity-Aware Graph Classifiers Through the Language of Motifs
Authors:
Alan Perotti,
Paolo Bajardi,
Francesco Bonchi,
André Panisson
Abstract:
Most methods for explaining black-box classifiers (e.g., on tabular data, images, or time series) rely on measuring the impact that removing or perturbing features has on the model output. This forces the explanation language to match the classifier's feature space. However, when dealing with graph data, in which the basic features correspond to the edges describing the graph structure, this matching between feature space and explanation language might not be appropriate. Decoupling the feature space (edges) from a desired high-level explanation language (such as motifs) is thus a major challenge towards developing actionable explanations for graph classification tasks. In this paper we introduce GRAPHSHAP, a Shapley-based approach able to provide motif-based explanations for identity-aware graph classifiers, assuming no knowledge whatsoever about the model or its training data: the only requirement is that the classifier can be queried as a black box at will. For the sake of computational efficiency, we explore a progressive approximation strategy and show how a simple kernel can efficiently approximate explanation scores, thus allowing GRAPHSHAP to scale to scenarios with a large explanation space (i.e., a large number of motifs). We showcase GRAPHSHAP on a real-world brain-network dataset consisting of patients affected by Autism Spectrum Disorder and a control group. Our experiments highlight how the classification provided by a black-box model can be effectively explained by a few connectomics patterns.
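GRAPHSHAP attributes a black-box classifier's output to motifs using Shapley values. The sketch below computes exact Shapley values by enumerating every coalition of "players" (here, motifs); this is exponential in the number of motifs, which is why the paper resorts to a kernel-based approximation that the sketch does not reproduce. The value function is a hypothetical stand-in for querying the classifier on a graph that keeps only the edges covered by the motifs in a coalition.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: weighted average marginal contribution of each
    player over all coalitions of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

def toy_classifier_score(motifs):
    """Hypothetical black-box score: m1 and m3 help on their own,
    and m1 together with m2 adds an interaction bonus."""
    score = 0.2 * ("m1" in motifs) + 0.1 * ("m3" in motifs)
    if {"m1", "m2"} <= motifs:
        score += 0.3
    return score

print(shapley_values(["m1", "m2", "m3"], toy_classifier_score))
```

The 0.3 interaction bonus is split equally between m1 and m2, giving roughly {'m1': 0.35, 'm2': 0.15, 'm3': 0.1}, and the attributions sum to the full score of 0.6.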
Submitted 7 July, 2023; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways
Authors:
Francesco Fabbri,
Yanhao Wang,
Francesco Bonchi,
Carlos Castillo,
Michael Mathioudakis
Abstract:
Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations that steer her towards more and more radicalized content, eventually trapping her in what we call a "radicalization pathway". In this paper, we study the problem of mitigating radicalization pathways using a graph-based approach. Specifically, we model the set of recommendations of a "what-to-watch-next" recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions. We measure the "segregation" score of a node representing radicalized content as the expected length of a random walk from that node to any node representing non-radicalized content. High segregation scores are associated with higher chances of users getting trapped in radicalization pathways. Hence, we define the problem of reducing the prevalence of radicalization pathways by selecting a small number of edges to "rewire", so as to minimize the maximum of segregation scores among all radicalized nodes, while maintaining the relevance of the recommendations. We prove that the problem of finding the optimal set of recommendations to rewire is NP-hard and NP-hard to approximate within any factor. Therefore, we turn our attention to heuristics, and propose an efficient yet effective greedy algorithm based on the theory of absorbing random walks. Our experiments on real-world datasets in the context of video and news recommendations confirm the effectiveness of our proposal.
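The segregation score is the expected number of steps a random walk started at a radicalized node takes before first reaching a non-radicalized node. Treating non-radicalized nodes as absorbing, these expected times t solve the linear system (I - Q) t = 1, where Q holds the transition probabilities among radicalized nodes. A minimal sketch, assuming uniform transition probabilities over each node's d recommendations and that every radicalized node can eventually reach non-radicalized content (otherwise the system is singular). The toy graph is made up, and the sketch only evaluates scores; it is not the paper's greedy rewiring algorithm.

```python
import numpy as np

def segregation_scores(out_links, radicalized):
    """Expected number of steps for a uniform random walk started at each
    radicalized node to first reach a non-radicalized (absorbing) node."""
    rad = sorted(radicalized)
    index = {v: i for i, v in enumerate(rad)}
    n = len(rad)
    Q = np.zeros((n, n))  # transition probabilities among radicalized nodes
    for v in rad:
        d = len(out_links[v])  # d outgoing recommendations per item
        for u in out_links[v]:
            if u in index:
                Q[index[v], index[u]] += 1.0 / d
    # Expected absorption times satisfy (I - Q) t = 1 (row sums of the fundamental matrix).
    t = np.linalg.solve(np.eye(n) - Q, np.ones(n))
    return {v: float(t[index[v]]) for v in rad}

# Toy 2-regular graph: nodes 0-2 are radicalized, nodes 3-4 are not.
links = {0: [1, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3, 2]}
scores = segregation_scores(links, {0, 1, 2})
print(scores)                # approximately {0: 4.0, 1: 6.0, 2: 6.0}
print(max(scores.values()))  # the quantity that rewiring aims to minimize
```

Rewiring one of node 1's links to point at a non-radicalized node would shorten the walks from nodes 1 and 2 and lower the maximum, which is the effect the greedy heuristic searches for.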
Submitted 1 February, 2022;
originally announced February 2022.
-
FreSCo: Mining Frequent Patterns in Simplicial Complexes
Authors:
Giulia Preti,
Gianmarco De Francisci Morales,
Francesco Bonchi
Abstract:
Simplicial complexes are a generalization of graphs that model higher-order relations. In this paper, we introduce simplicial patterns -- that we call simplets -- and generalize the task of frequent pattern mining from the realm of graphs to that of simplicial complexes. Our task is particularly challenging due to the enormous search space and the need for higher-order isomorphism. We show that finding the occurrences of simplets in a complex can be reduced to a bipartite graph isomorphism problem, in linear time and at most quadratic space. We then propose an anti-monotonic frequency measure that allows us to start the exploration from small simplets and stop expanding a simplet as soon as its frequency falls below the minimum frequency threshold. Equipped with these ideas and a clever data structure, we develop a memory-conscious algorithm that, by carefully exploiting the relationships among the simplices in the complex and among the simplets, achieves efficiency and scalability for our complex mining task. Our algorithm, FreSCo, comes in two flavors: it can compute the exact frequency of the simplets or, more quickly, it can determine whether a simplet is frequent, without having to compute the exact frequency. Experimental results demonstrate the ability of FreSCo to mine frequent simplets in complexes of various sizes and dimensions, and the significance of the simplets with respect to traditional graph patterns.
Submitted 26 January, 2022; v1 submitted 20 January, 2022;
originally announced January 2022.
-
Multi-relation Graph Summarization
Authors:
Xiangyu Ke,
Arijit Khan,
Francesco Bonchi
Abstract:
Graph summarization is beneficial in a wide range of applications, such as visualization, interactive and exploratory analysis, approximate query processing, reducing the on-disk storage footprint, and graph processing in modern hardware. However, the bulk of the literature on graph summarization surprisingly overlooks the possibility of having edges of different types. In this paper, we study the novel problem of producing summaries of multi-relation networks, i.e., graphs where multiple edges of different types may exist between any pair of nodes. Multi-relation graphs are an expressive model of real-world activities, in which a relation can be a topic in social networks, an interaction type in genetic networks, or a snapshot in temporal graphs. The first approach that we consider for multi-relation graph summarization is a two-step method based on summarizing each relation in isolation, and then aggregating the resulting summaries in some clever way to produce a final unique summary. In doing this, as a side contribution, we provide the first polynomial-time approximation algorithm based on the k-Median clustering for the classic problem of lossless single-relation graph summarization. Then, we demonstrate the shortcomings of these two-step methods, and propose holistic approaches, both approximate and heuristic algorithms, to compute a summary directly for multi-relation graphs. In particular, we prove that the approximation bound of k-Median clustering for the single relation solution can be maintained in a multi-relation graph with proper aggregation operation over adjacency matrices corresponding to its multiple relations. Experimental results and case studies (on co-authorship networks and brain networks) validate the effectiveness and efficiency of the proposed algorithms.
Submitted 24 December, 2021;
originally announced December 2021.
-
Exposure Inequality in People Recommender Systems: The Long-Term Effects
Authors:
Francesco Fabbri,
Maria Luisa Croci,
Francesco Bonchi,
Carlos Castillo
Abstract:
People recommender systems may affect the exposure that users receive in social networking platforms, influencing attention dynamics and potentially strengthening pre-existing inequalities that disproportionately affect certain groups.
In this paper we introduce a model to simulate the feedback loop created by multiple rounds of interactions between users and a link recommender in a social network. This allows us to study the long-term consequences of those particular recommendation algorithms. Our model is equipped with several parameters to control (i) the level of homophily in the network, (ii) the relative size of the groups, (iii) the choice among several state-of-the-art link recommenders, and (iv) the choice among three different user behavior models, that decide which recommendations are accepted or rejected.
Our extensive experimentation with the proposed model shows that a minority group, if homophilic enough, can get a disproportionate advantage in exposure from all link recommenders. Instead, when it is heterophilic, it gets under-exposed. Moreover, while the homophily level of the minority affects the speed of the growth of the disparate exposure, the relative size of the minority affects the magnitude of the effect. Finally, link recommenders strengthen exposure inequalities at the individual level, exacerbating the "rich-get-richer" effect: this happens for both the minority and the majority class and independently of their level of homophily.
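The simulation model couples a link recommender with user acceptance over repeated rounds. The sketch below is a heavily simplified version of such a feedback loop, under assumptions that are ours rather than the paper's: an undirected graph, a common-neighbors recommender, a single acceptance probability instead of the three user-behavior models, and exposure measured as the minority group's share of total degree. All function names and the toy graph are hypothetical.

```python
import random

def recommend_common_neighbors(adj, u):
    """Return the non-neighbor of u sharing the most neighbors with u (None if none)."""
    best, best_score = None, -1
    for v in sorted(adj):
        if v == u or v in adj[u]:
            continue
        score = len(adj[u] & adj[v])
        if score > best_score:
            best, best_score = v, score
    return best

def minority_exposure_share(adj, minority):
    """Share of all edge endpoints (total degree) held by minority nodes."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return sum(len(adj[v]) for v in minority) / total if total else 0.0

def simulate_feedback_loop(adj, minority, rounds=100, p_accept=0.5, seed=0):
    """Each round: pick a random user, recommend one link, accept with probability p_accept."""
    rng = random.Random(seed)
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    history = [minority_exposure_share(adj, minority)]
    for _ in range(rounds):
        u = rng.choice(sorted(adj))
        v = recommend_common_neighbors(adj, u)
        if v is not None and rng.random() < p_accept:
            adj[u].add(v)
            adj[v].add(u)
        history.append(minority_exposure_share(adj, minority))
    return adj, history

g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
final, history = simulate_feedback_loop(g, minority={3, 4}, rounds=50)
print(history[0], history[-1])  # minority exposure share before and after
```

Tracking the exposure share round by round is what lets such a model separate the speed of a disparity's growth from its final magnitude.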
Submitted 15 December, 2021;
originally announced December 2021.
-
Dense and well-connected subgraph detection in dual networks
Authors:
Tianyi Chen,
Francesco Bonchi,
David Garcia-Soriano,
Atsushi Miyauchi,
Charalampos E. Tsourakakis
Abstract:
Dense subgraph discovery is a fundamental problem in graph mining with a wide range of applications \cite{gionis2015dense}. Despite a large number of applications, ranging from computational neuroscience to social network analysis, that take as input a {\em dual} graph, namely a pair of graphs on the same set of nodes, dense subgraph discovery methods focus on a single graph input, with few notable exceptions \cite{semertzidis2019finding,charikar2018finding,reinthal2016finding,jethava2015finding}. In this work, we focus on the following problem: given a pair of graphs $G,H$ on the same set of nodes $V$, how do we find a subset of nodes $S \subseteq V$ that induces a well-connected subgraph in $G$ and a dense subgraph in $H$?
Our formulation generalizes previous research on dual graphs \cite{Wu+15,WuZLFJZ16,Cui2018}, by enabling the {\em control} of the connectivity constraint on $G$. We propose a novel mathematical formulation based on $k$-edge connectivity, and prove that it is solvable exactly in polynomial time. We compare our method to state-of-the-art competitors; we find empirically that varying the connectivity constraint enables the practitioner to obtain insightful information that is otherwise inaccessible. Finally, we show that our proposed mining tool can be used to better understand how users interact on Twitter, and connectivity aspects of human brain networks with and without Autism Spectrum Disorder (ASD).
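The problem asks for a node set S that induces a k-edge-connected subgraph in G and a dense subgraph in H. The brute-force sketch below only spells out that objective on a toy pair of graphs: it enumerates every subset, which is exponential, whereas the paper proves the problem solvable exactly in polynomial time with a method not reproduced here. Measuring density as edges per node (average-degree density) is an assumption, as are all function names.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Depth-first search connectivity check on the given node set."""
    nodes = set(nodes)
    if not nodes:
        return False
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def is_k_edge_connected(nodes, edges, k):
    """By definition: the graph stays connected after removing any k-1 edges."""
    for r in range(k):
        for removed in combinations(edges, r):
            kept = [e for e in edges if e not in removed]
            if not is_connected(nodes, kept):
                return False
    return True

def induced(edges, S):
    return [(u, v) for u, v in edges if u in S and v in S]

def best_dual_subgraph(nodes, G_edges, H_edges, k):
    """Exhaustively find S (|S| >= 2) that is k-edge-connected in G and densest in H."""
    best, best_density = None, -1.0
    for size in range(2, len(nodes) + 1):
        for subset in combinations(nodes, size):
            S = set(subset)
            if is_k_edge_connected(S, induced(G_edges, S), k):
                density = len(induced(H_edges, S)) / len(S)
                if density > best_density:
                    best, best_density = S, density
    return best, best_density

nodes = [0, 1, 2, 3, 4]
G = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]  # triangle plus a pendant path
H = [(0, 1), (2, 3), (3, 4), (2, 4)]          # dense only on {2, 3, 4}
print(best_dual_subgraph(nodes, G, H, k=1))    # ({2, 3, 4}, 1.0)
print(best_dual_subgraph(nodes, G, H, k=2))    # ({0, 1, 2}, 0.333...)
```

Raising k from 1 to 2 moves the answer from the H-densest set, which G only connects through bridges, to the 2-edge-connected triangle of G, illustrating how the connectivity constraint changes what is found.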
Submitted 6 December, 2021;
originally announced December 2021.
-
The Effect of People Recommenders on Echo Chambers and Polarization
Authors:
Federico Cinus,
Marco Minici,
Corrado Monti,
Francesco Bonchi
Abstract:
The effects of social media on critical issues, such as polarization and misinformation, are under scrutiny due to the disruptive consequences that these phenomena can have on our societies. Among the algorithms routinely used by social media platforms, people-recommender systems are of special interest, as they directly contribute to the evolution of the social network structure, affecting the information and the opinions users are exposed to.
In this paper, we propose a framework to assess the effect of people recommenders on the evolution of opinions. Our proposal is based on Monte Carlo simulations combining link recommendation and opinion-dynamics models. In order to control initial conditions, we define a random network model to generate graphs with opinions, with tunable amounts of modularity and homophily. We join these elements into a methodology to study the effects of the recommender system on echo chambers and polarization. We also show how to use our framework to measure, by means of simulations, the impact of different intervention strategies.
Our thorough experimentation shows that people recommenders can in fact lead to a significant increase in echo chambers. However, this happens only if there is considerable initial homophily in the network. Also, we find that if the network already contains echo chambers, the effect of the recommendation algorithm is negligible. Such findings are robust to two very different opinion dynamics models, a bounded confidence model and an epistemological model.
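The framework combines link recommendation with an opinion-dynamics model, and one of the two models tested is a bounded confidence model. As an illustration, the sketch below implements the classic Deffuant-Weisbuch pairwise update on the edges of a network, chosen here as a representative bounded confidence model; the abstract does not say which variant or parameters the paper uses, so `eps`, `mu`, the function names, and the toy network are assumptions.

```python
import random

def deffuant_update(x_i, x_j, eps, mu):
    """One Deffuant-Weisbuch interaction: if the two opinions differ by less
    than eps, each moves a fraction mu of the way toward the other."""
    if abs(x_i - x_j) < eps:
        return x_i + mu * (x_j - x_i), x_j + mu * (x_i - x_j)
    return x_i, x_j

def simulate_bounded_confidence(opinions, edges, eps=0.3, mu=0.5, steps=1000, seed=0):
    """Repeatedly pick a random edge of the network and apply the update.
    Pairwise updates conserve the sum of opinions."""
    rng = random.Random(seed)
    x = dict(opinions)
    for _ in range(steps):
        i, j = rng.choice(edges)
        x[i], x[j] = deffuant_update(x[i], x[j], eps, mu)
    return x

opinions = {0: 0.1, 1: 0.2, 2: 0.35, 3: 0.9}
edges = [(0, 1), (1, 2), (2, 3)]
print(simulate_bounded_confidence(opinions, edges))
```

Agents whose opinions differ by more than `eps` never influence each other, so which links exist, and therefore which links a recommender adds, determines who can converge.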
Submitted 1 December, 2021;
originally announced December 2021.
-
The Evolving Causal Structure of Equity Risk Factors
Authors:
Gabriele D'Acunto,
Paolo Bajardi,
Francesco Bonchi,
Gianmarco De Francisci Morales
Abstract:
In recent years, multi-factor strategies have gained increasing popularity in the financial industry, as they allow investors to have a better understanding of the risk drivers underlying their portfolios. Moreover, such strategies promise to promote diversification and thus limit losses in times of financial turmoil. However, recent studies have reported a significant level of redundancy between these factors, which might enhance risk contagion among multi-factor portfolios during financial crises. Therefore, it is of fundamental importance to better understand the relationships among factors.
Empowered by recent advances in causal structure learning methods, this paper presents a study of the causal structure of financial risk factors and its evolution over time. In particular, the data we analyze covers 11 risk factors concerning the US equity market, spanning a period of 29 years at daily frequency.
Our results show a statistically significant sparsifying trend of the underlying causal structure. However, this trend breaks down during periods of financial stress, in which we can observe a densification of the causal network driven by growth in the out-degree of the market factor node. Finally, we present a comparison with the analysis of factors cross-correlations, which further confirms the importance of causal analysis for gaining deeper insights into the dynamics of the factor system, particularly during economic downturns.
Our findings are especially significant from a risk-management perspective. They link the evolution of the causal structure of equity risk factors with market volatility and a worsening macroeconomic environment, and show that, in times of financial crisis, exposure to different factors boils down to exposure to the market risk factor.
Submitted 9 November, 2021;
originally announced November 2021.
-
Learning Ideological Embeddings from Information Cascades
Authors:
Corrado Monti,
Giuseppe Manco,
Cigdem Aslay,
Francesco Bonchi
Abstract:
Modeling information cascades in a social network through the lens of the ideological leaning of its users can help in understanding phenomena such as misinformation propagation and confirmation bias, and in devising techniques for mitigating their toxic effects.
In this paper we propose a stochastic model to learn the ideological leaning of each user in a multidimensional ideological space, by analyzing the way politically salient content propagates. In particular, our model assumes that information propagates from one user to another if both users are interested in the topic and ideologically aligned with each other. To infer the parameters of our model, we devise a gradient-based optimization procedure maximizing the likelihood of an observed set of information cascades. Our experiments on real-world political discussions on Twitter and Reddit confirm that our model is able to learn the political stance of the social media users in a multidimensional ideological space.
Submitted 28 September, 2021;
originally announced September 2021.
-
String Diagram Rewrite Theory III: Confluence with and without Frobenius
Authors:
Filippo Bonchi,
Fabio Gadducci,
Aleks Kissinger,
Paweł Sobociński,
Fabio Zanasi
Abstract:
In this paper we address the problem of proving confluence for string diagram rewriting, which was previously shown to be characterised combinatorially as double-pushout rewriting with interfaces (DPOI) on (labelled) hypergraphs. For standard DPO rewriting without interfaces, confluence for terminating rewrite systems is, in general, undecidable. Nevertheless, we show here that confluence for DPOI, and hence string diagram rewriting, is decidable. We apply this result to give effective procedures for deciding local confluence of symmetric monoidal theories with and without Frobenius structure by critical pair analysis. For the latter, we introduce the new notion of path joinability for critical pairs, which enables finitely many joins of a critical pair to be lifted to an arbitrary context in spite of the strong non-local constraints placed on rewriting in a generic symmetric monoidal theory.
Submitted 18 April, 2022; v1 submitted 13 September, 2021;
originally announced September 2021.