-
UniqueRank: Identifying Important and Difficult-to-Replace Nodes in Attributed Graphs
Authors:
Erica Cai,
Benjamin A. Miller,
Olga Simek,
Christopher L. Smith
Abstract:
Node-ranking methods that focus on structural importance are widely used in a variety of applications, from ranking webpages in search engines to identifying key molecules in biomolecular networks. In real social, supply chain, and terrorist networks, one definition of importance considers the impact on information flow or network productivity when a given node is removed. In practice, however, a…
▽ More
Node-ranking methods that focus on structural importance are widely used in a variety of applications, from ranking webpages in search engines to identifying key molecules in biomolecular networks. In real social, supply chain, and terrorist networks, one definition of importance considers the impact on information flow or network productivity when a given node is removed. In practice, however, a nearby node may be able to replace another node upon removal, allowing the network to continue functioning as before. This replaceability is an aspect that existing ranking methods do not consider. To address this, we introduce UniqueRank, a Markov-Chain-based approach that captures attribute uniqueness in addition to structural importance, making top-ranked nodes harder to replace. We find that UniqueRank identifies important nodes with dissimilar attributes from its neighbors in simple symmetric networks with known ground truth. Further, on real terrorist, social, and supply chain networks, we demonstrate that removing and attempting to replace top UniqueRank nodes often yields larger efficiency reductions than removing and attempting to replace top nodes ranked by competing methods. Finally, we show UniqueRank's versatility by demonstrating its potential to identify structurally critical atoms with unique chemical environments in biomolecular structures.
△ Less
Submitted 21 October, 2025;
originally announced October 2025.
-
From low resource information extraction to identifying influential nodes in knowledge graphs
Authors:
Erica Cai,
Olga Simek,
Benjamin A. Miller,
Danielle Sullivan-Pao,
Evan Young,
Christopher L. Smith
Abstract:
We propose a pipeline for identifying important entities from intelligence reports that constructs a knowledge graph, where nodes correspond to entities of fine-grained types (e.g. traffickers) extracted from the text and edges correspond to extracted relations between entities (e.g. cartel membership). The important entities in intelligence reports then map to central nodes in the knowledge graph…
▽ More
We propose a pipeline for identifying important entities from intelligence reports that constructs a knowledge graph, where nodes correspond to entities of fine-grained types (e.g. traffickers) extracted from the text and edges correspond to extracted relations between entities (e.g. cartel membership). The important entities in intelligence reports then map to central nodes in the knowledge graph. We introduce a novel method that extracts fine-grained entities in a few-shot setting (few labeled examples), given limited resources available to label the frequently changing entity types that intelligence analysts are interested in. It outperforms other state-of-the-art methods. Next, we identify challenges facing previous evaluations of zero-shot (no labeled examples) methods for extracting relations, affecting the step of populating edges. Finally, we explore the utility of the pipeline: given the goal of identifying important entities, we evaluate the impact of relation extraction errors on the identification of central nodes in several real and synthetic networks. The impact of these errors varies significantly by graph topology, suggesting that confidence in measurements based on automatically extracted relations should depend on observed network features.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
GRASP: Accelerating Shortest Path Attacks via Graph Attention
Authors:
Zohair Shafi,
Benjamin A. Miller,
Ayan Chatterjee,
Tina Eliassi-Rad,
Rajmonda S. Caceres
Abstract:
Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, ar…
▽ More
Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed ups that aim to learn in an end to end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees, are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of solution generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.
△ Less
Submitted 23 October, 2023; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks
Authors:
Zohair Shafi,
Benjamin A. Miller,
Tina Eliassi-Rad,
Rajmonda S. Caceres
Abstract:
Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We investigate the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that augments existing optimization solvers by learning to identify a smaller sub-problem that contains the solution space. Graph-SCP uses both supervised learning from prior solved insta…
▽ More
Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We investigate the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that augments existing optimization solvers by learning to identify a smaller sub-problem that contains the solution space. Graph-SCP uses both supervised learning from prior solved instances and unsupervised learning to minimize the SCP objective. We evaluate the performance of Graph-SCP on synthetically weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 60-80% and achieves runtime speedups of up to 10x on average when compared to Gurobi (a state-of-the-art commercial solver), while maintaining solution quality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial runtime. We showcase Graph-SCP's ability to generalize to larger problem sizes, training on SCP instances with up to 3,000 subsets and testing on SCP instances with up to 10,000 subsets.
△ Less
Submitted 9 October, 2025; v1 submitted 11 October, 2023;
originally announced October 2023.
-
Complex Network Effects on the Robustness of Graph Convolutional Networks
Authors:
Benjamin A. Miller,
Kevin Chan,
Tina Eliassi-Rad
Abstract:
Vertex classification -- the problem of identifying the class labels of nodes in a graph -- has applicability in a wide variety of domains. Examples include classifying subject areas of papers in citation networks or roles of machines in a computer network. Vertex classification using graph convolutional networks is susceptible to targeted poisoning attacks, in which both graph structure and node…
▽ More
Vertex classification -- the problem of identifying the class labels of nodes in a graph -- has applicability in a wide variety of domains. Examples include classifying subject areas of papers in citation networks or roles of machines in a computer network. Vertex classification using graph convolutional networks is susceptible to targeted poisoning attacks, in which both graph structure and node attributes can be changed in an attempt to misclassify a target node. This vulnerability decreases users' confidence in the learning method and can prevent adoption in high-stakes contexts. Defenses have also been proposed, focused on filtering edges before creating the model or aggregating information from neighbors more robustly. This paper considers an alternative: we leverage network characteristics in the training data selection process to improve robustness of vertex classifiers.
We propose two alternative methods of selecting training data: (1) to select the highest-degree nodes and (2) to iteratively select the node with the most neighbors minimally connected to the training set. In the datasets on which the original attack was demonstrated, we show that changing the training set can make the network much harder to attack. To maintain a given probability of attack success, the adversary must use far more perturbations; often a factor of 2--4 over the random training baseline. These training set selection methods often work in conjunction with the best recently published defenses to provide even greater robustness. While increasing the amount of randomly selected training data sometimes results in a more robust classifier, the proposed methods increase robustness substantially more. We also run a simulation study in which we demonstrate conditions under which each of the two methods outperforms the other, controlling for the graph topology, homophily of the labels, and node attributes.
△ Less
Submitted 10 August, 2023;
originally announced August 2023.
-
Using Overlapping Methods to Counter Adversaries in Community Detection
Authors:
Benjamin A. Miller,
Kevin Chan,
Tina Eliassi-Rad
Abstract:
When dealing with large graphs, community detection is a useful data triage tool that can identify subsets of the network that a data analyst should investigate. In an adversarial scenario, the graph may be manipulated to avoid scrutiny of certain nodes by the analyst. Robustness to such behavior is an important consideration for data analysts in high-stakes scenarios such as cyber defense and cou…
▽ More
When dealing with large graphs, community detection is a useful data triage tool that can identify subsets of the network that a data analyst should investigate. In an adversarial scenario, the graph may be manipulated to avoid scrutiny of certain nodes by the analyst. Robustness to such behavior is an important consideration for data analysts in high-stakes scenarios such as cyber defense and counterterrorism. In this paper, we evaluate the use of overlapping community detection methods in the presence of adversarial attacks aimed at lowering the priority of a specific vertex. We formulate the data analyst's choice as a Stackelberg game in which the analyst chooses a community detection method and the attacker chooses an attack strategy in response. Applying various attacks from the literature to seven real network datasets, we find that, when the attacker has a sufficient budget, overlapping community detection methods outperform non-overlapping methods, often overwhelmingly so. This is the case when the attacker can only add edges that connect to the target and when the capability is added to add edges between neighbors of the target. We also analyze the tradeoff between robustness in the presence of an attack and performance when there is no attack. Our extensible analytic framework enables network data analysts to take these considerations into account and incorporate new attacks and community detection methods as they are developed.
△ Less
Submitted 6 August, 2023;
originally announced August 2023.
-
Defense Against Shortest Path Attacks
Authors:
Benjamin A. Miller,
Zohair Shafi,
Wheeler Ruml,
Yevgeniy Vorobeychik,
Tina Eliassi-Rad,
Scott Alfeld
Abstract:
Identifying shortest paths between nodes in a network is an important task in many applications. Recent work has shown that a malicious actor can manipulate a graph to make traffic between two nodes of interest follow their target path. In this paper, we develop a defense against such attacks by modifying the edge weights that users observe. The defender must balance inhibiting the attacker agains…
▽ More
Identifying shortest paths between nodes in a network is an important task in many applications. Recent work has shown that a malicious actor can manipulate a graph to make traffic between two nodes of interest follow their target path. In this paper, we develop a defense against such attacks by modifying the edge weights that users observe. The defender must balance inhibiting the attacker against any negative effects on benign users. Specifically, the defender's goals are: (a) recommend the shortest paths to users, (b) make the lengths of the shortest paths in the published graph close to those of the same paths in the true graph, and (c) minimize the probability of an attack. We formulate the defense as a Stackelberg game in which the defender is the leader and the attacker is the follower. We also consider a zero-sum version of the game in which the defender's goal is to minimize cost while achieving the minimum possible attack probability. We show that the defense problem is NP-hard and propose heuristic solutions for both the zero-sum and non-zero-sum settings. By relaxing some constraints of the original problem, we formulate a linear program for local optimization around a feasible point. We present defense results with both synthetic and real networks and show that our methods often reach the lower bound of the defender's cost.
△ Less
Submitted 30 April, 2025; v1 submitted 30 May, 2023;
originally announced May 2023.
-
Attacking Shortest Paths by Cutting Edges
Authors:
Benjamin A. Miller,
Zohair Shafi,
Wheeler Ruml,
Yevgeniy Vorobeychik,
Tina Eliassi-Rad,
Scott Alfeld
Abstract:
Identifying shortest paths between nodes in a network is a common graph analysis problem that is important for many applications involving routing of resources. An adversary that can manipulate the graph structure could alter traffic patterns to gain some benefit (e.g., make more money by directing traffic to a toll road). This paper presents the Force Path Cut problem, in which an adversary remov…
▽ More
Identifying shortest paths between nodes in a network is a common graph analysis problem that is important for many applications involving routing of resources. An adversary that can manipulate the graph structure could alter traffic patterns to gain some benefit (e.g., make more money by directing traffic to a toll road). This paper presents the Force Path Cut problem, in which an adversary removes edges from a graph to make a particular path the shortest between its terminal nodes. We prove that this problem is APX-hard, but introduce PATHATTACK, a polynomial-time approximation algorithm that guarantees a solution within a logarithmic factor of the optimal value. In addition, we introduce the Force Edge Cut and Force Node Cut problems, in which the adversary targets a particular edge or node, respectively, rather than an entire path. We derive a nonconvex optimization formulation for these problems, and derive a heuristic algorithm that uses PATHATTACK as a subroutine. We demonstrate all of these algorithms on a diverse set of real and synthetic networks, illustrating the network types that benefit most from the proposed algorithms.
△ Less
Submitted 20 November, 2022;
originally announced November 2022.
-
Optimal Edge Weight Perturbations to Attack Shortest Paths
Authors:
Benjamin A. Miller,
Zohair Shafi,
Wheeler Ruml,
Yevgeniy Vorobeychik,
Tina Eliassi-Rad,
Scott Alfeld
Abstract:
Finding shortest paths in a given network (e.g., a computer network or a road network) is a well-studied task with many applications. We consider this task under the presence of an adversary, who can manipulate the network by perturbing its edge weights to gain an advantage over others. Specifically, we introduce the Force Path Problem as follows. Given a network, the adversary's goal is to make a…
▽ More
Finding shortest paths in a given network (e.g., a computer network or a road network) is a well-studied task with many applications. We consider this task under the presence of an adversary, who can manipulate the network by perturbing its edge weights to gain an advantage over others. Specifically, we introduce the Force Path Problem as follows. Given a network, the adversary's goal is to make a specific path the shortest by adding weights to edges in the network. The version of this problem in which the adversary can cut edges is NP-complete. However, we show that Force Path can be solved to within arbitrary numerical precision in polynomial time. We propose the PATHPERTURB algorithm, which uses constraint generation to build a set of constraints that require paths other than the adversary's target to be sufficiently long. Across a highly varied set of synthetic and real networks, we show that the optimal solution often reduces the required perturbation budget by about half when compared to a greedy baseline method.
△ Less
Submitted 7 July, 2021;
originally announced July 2021.
-
PATHATTACK: Attacking Shortest Paths in Complex Networks
Authors:
Benjamin A. Miller,
Zohair Shafi,
Wheeler Ruml,
Yevgeniy Vorobeychik,
Tina Eliassi-Rad,
Scott Alfeld
Abstract:
Shortest paths in complex networks play key roles in many applications. Examples include routing packets in a computer network, routing traffic on a transportation network, and inferring semantic distances between concepts on the World Wide Web. An adversary with the capability to perturb the graph might make the shortest path between two nodes route traffic through advantageous portions of the gr…
▽ More
Shortest paths in complex networks play key roles in many applications. Examples include routing packets in a computer network, routing traffic on a transportation network, and inferring semantic distances between concepts on the World Wide Web. An adversary with the capability to perturb the graph might make the shortest path between two nodes route traffic through advantageous portions of the graph (e.g., a toll road he owns). In this paper, we introduce the Force Path Cut problem, in which there is a specific route the adversary wants to promote by removing a minimum number of edges in the graph. We show that Force Path Cut is NP-complete, but also that it can be recast as an instance of the Weighted Set Cover problem, enabling the use of approximation algorithms. The size of the universe for the set cover problem is potentially factorial in the number of nodes. To overcome this hurdle, we propose the PATHATTACK algorithm, which via constraint generation considers only a small subset of paths -- at most 5% of the number of edges in 99% of our experiments. Across a diverse set of synthetic and real networks, the linear programming formulation of Weighted Set Cover yields the optimal solution in over 98% of cases. We also demonstrate a time/cost tradeoff using two approximation algorithms and greedy baseline methods. This work provides a foundation for addressing similar problems and expands the area of adversarial graph mining beyond recent work on node classification and embedding.
△ Less
Submitted 8 April, 2021;
originally announced April 2021.
-
Topological Effects on Attacks Against Vertex Classification
Authors:
Benjamin A. Miller,
Mustafa Çamurcu,
Alexander J. Gomez,
Kevin Chan,
Tina Eliassi-Rad
Abstract:
Vertex classification is vulnerable to perturbations of both graph topology and vertex attributes, as shown in recent research. As in other machine learning domains, concerns about robustness to adversarial manipulation can prevent potential users from adopting proposed methods when the consequence of action is very high. This paper considers two topological characteristics of graphs and explores…
▽ More
Vertex classification is vulnerable to perturbations of both graph topology and vertex attributes, as shown in recent research. As in other machine learning domains, concerns about robustness to adversarial manipulation can prevent potential users from adopting proposed methods when the consequence of action is very high. This paper considers two topological characteristics of graphs and explores the way these features affect the amount the adversary must perturb the graph in order to be successful. We show that, if certain vertices are included in the training set, it is possible to substantially an adversary's required perturbation budget. On four citation datasets, we demonstrate that if the training set includes high degree vertices or vertices that ensure all unlabeled nodes have neighbors in the training set, we show that the adversary's budget often increases by a substantial factor---often a factor of 2 or more---over random training for the Nettack poisoning attack. Even for especially easy targets (those that are misclassified after just one or two perturbations), the degradation of performance is much slower, assigning much lower probabilities to the incorrect classes. In addition, we demonstrate that this robustness either persists when recently proposed defenses are applied, or is competitive with the resulting performance improvement for the defender.
△ Less
Submitted 12 March, 2020;
originally announced March 2020.
-
Model Selection Framework for Graph-based data
Authors:
Rajmonda S. Caceres,
Leah Weiner,
Matthew C. Schmidt,
Benjamin A. Miller,
William M. Campbell
Abstract:
Graphs are powerful abstractions for capturing complex relationships in diverse application settings. An active area of research focuses on theoretical models that define the generative mechanism of a graph. Yet given the complexity and inherent noise in real datasets, it is still very challenging to identify the best model for a given observed graph. We discuss a framework for graph model selecti…
▽ More
Graphs are powerful abstractions for capturing complex relationships in diverse application settings. An active area of research focuses on theoretical models that define the generative mechanism of a graph. Yet given the complexity and inherent noise in real datasets, it is still very challenging to identify the best model for a given observed graph. We discuss a framework for graph model selection that leverages a long list of graph topological properties and a random forest classifier to learn and classify different graph instances. We fully characterize the discriminative power of our approach as we sweep through the parameter space of two generative models, the Erdos-Renyi and the stochastic block model. We show that our approach gets very close to known theoretical bounds and we provide insight on which topological features play a critical discriminating role.
△ Less
Submitted 15 September, 2016;
originally announced September 2016.
-
Spectral Anomaly Detection in Very Large Graphs: Models, Noise, and Computational Complexity
Authors:
Benjamin A. Miller,
Nicholas Arcolano,
Michael M. Wolf,
Nadya T. Bliss
Abstract:
Anomaly detection in massive networks has numerous theoretical and computational challenges, especially as the behavior to be detected becomes small in comparison to the larger network. This presentation focuses on recent results in three key technical areas, specifically geared toward spectral methods for detection. We first discuss recent models for network behavior, and how their structure can…
▽ More
Anomaly detection in massive networks has numerous theoretical and computational challenges, especially as the behavior to be detected becomes small in comparison to the larger network. This presentation focuses on recent results in three key technical areas, specifically geared toward spectral methods for detection. We first discuss recent models for network behavior, and how their structure can be exploited for efficient computation of the principal eigenspace of the graph. In addition to the stochasticity of background activity, a graph of interest may be observed through a noisy or imperfect mechanism, which may hinder the detection process. A few simple noise models are discussed, and we demonstrate the ability to fuse multiple corrupted observations and recover detection performance. Finally, we discuss the challenges in scaling the spectral algorithms to large-scale high-performance computing systems, and present preliminary recommendations to achieve good performance with current parallel eigensolvers.
△ Less
Submitted 14 December, 2014;
originally announced December 2014.
-
A Spectral Framework for Anomalous Subgraph Detection
Authors:
Benjamin A. Miller,
Michelle S. Beard,
Patrick J. Wolfe,
Nadya T. Bliss
Abstract:
A wide variety of application domains are concerned with data consisting of entities and their relationships or connections, formally represented as graphs. Within these diverse application areas, a common problem of interest is the detection of a subset of entities whose connectivity is anomalous with respect to the rest of the data. While the detection of such anomalous subgraphs has received a…
▽ More
A wide variety of application domains are concerned with data consisting of entities and their relationships or connections, formally represented as graphs. Within these diverse application areas, a common problem of interest is the detection of a subset of entities whose connectivity is anomalous with respect to the rest of the data. While the detection of such anomalous subgraphs has received a substantial amount of attention, no application-agnostic framework exists for analysis of signal detectability in graph-based data. In this paper, we describe a framework that enables such analysis using the principal eigenspace of a graph's residuals matrix, commonly called the modularity matrix in community detection. Leveraging this analytical tool, we show that the framework has a natural power metric in the spectral norm of the anomalous subgraph's adjacency matrix (signal power) and of the background graph's residuals matrix (noise power). We propose several algorithms based on spectral properties of the residuals matrix, with more computationally expensive techniques providing greater detection power. Detection and identification performance are presented for a number of signal and noise models, including clusters and bipartite foregrounds embedded into simple random backgrounds as well as graphs with community structure and realistic degree distributions. The trends observed verify intuition gleaned from other signal processing areas, such as greater detection power when the signal is embedded within a less active portion of the background. We demonstrate the utility of the proposed techniques in detecting small, highly anomalous subgraphs in real graphs derived from Internet traffic and product co-purchases.
△ Less
Submitted 22 October, 2014; v1 submitted 29 January, 2014;
originally announced January 2014.