# Privacy-Preserving Obfuscation of Critical Infrastructure Networks

Ferdinando Fioretto(1,2), Terrence W.K. Mak(1) and Pascal Van Hentenryck(1)
(1) Georgia Institute of Technology  (2) Syracuse University
{fioretto, wmak}@gatech.edu, pvh@isye.gatech.edu

Abstract

The paper studies how to release data about a critical infrastructure network (e.g., a power network or a transportation network) without disclosing sensitive information that can be exploited by malevolent agents, while preserving the realism of the network. It proposes a novel obfuscation mechanism that combines several privacy-preserving building blocks with a bi-level optimization model to significantly improve accuracy. The obfuscation is evaluated for both realism and privacy properties on real energy and transportation networks. Experimental results show that the obfuscation mechanism substantially reduces the potential damage of an attack exploiting the released data to harm the real network.

1 Introduction

Critical Infrastructure Networks (CINs), such as electrical power grids and public transportation networks, rely on the tight interaction of cyber and physical components. They play crucial roles in ensuring social and economic stability, and their operations require advanced machine-learning and optimization algorithms. For instance, power network operations perform a power flow computation every few minutes. Research on CINs is highly dependent on the availability of realistic test cases. However, the release of such datasets is challenging due to privacy and national security concerns. For instance, the electrical load of an industrial customer may indirectly reveal sensitive information about its production levels and strategic investments. Similarly, the ability to link the physical and cyber locations of power generators can be used to launch coordinated attacks on cyber-physical facilities, significantly damaging the targeted network.

To mitigate these concerns, this paper develops an obfuscation mechanism based on Differential Privacy (DP) [Dwork et al., 2006], a robust framework that bounds the privacy risks associated with answering sensitive queries or releasing datasets. A DP algorithm introduces carefully calibrated noise into the data to prevent the disclosure of sensitive information. It is immune to linkage attacks, i.e., attacks in which one exploits auxiliary data to expose sensitive information. However, DP faces significant challenges when the resulting privacy-preserving datasets are used as inputs to complex optimization algorithms on CINs. For instance, direct applications of DP may impact the realism of the original dataset or produce networks that do not admit feasible solutions for the problem of interest [Fioretto and Van Hentenryck, 2018].

[Figure 1: The POCIN framework, composed of three phases: Location Obfuscation, Value Obfuscation, and Fidelity Restoration.]

The paper addresses this gap by introducing a Privacy-preserving Obfuscation mechanism for CINs (POCIN), sketched in Figure 1. POCIN takes as input a CIN and executes three phases to (1) obfuscate the sensitive locations of network elements, (2) hide sensitive values, and (3) ensure the satisfaction of important global properties of the CIN and the problem of interest, through the use of optimization.
The paper makes the following contributions: (1) It proposes POCIN, a novel privacy-preserving data release scheme that protects parameters and locations of network elements; (2) POCIN uses a novel bi-level optimization approach to preserve salient properties of the released data and solves it with either exact or approximate methods; and (3) It applies POCIN to two real CINs from energy and transportation networks. The paper shows that POCIN ensures strong privacy guarantees and that, on the considered CINs, the damage inflicted on the real network, when the attacker exploits the obfuscated data, converges to that of a random uninformed attack as the privacy requirements increase. All proofs are in the extended version of the paper [Fioretto et al., 2019].

2 Preliminaries: Differential Privacy

Differential Privacy (DP) is a framework used to protect the privacy of individuals in a dataset. This notion has emerged as the de-facto standard for privacy protection. A randomized mechanism M : D → R with domain D and range R is ε-differentially private if, for any output response O ⊆ R and any two neighboring inputs D, D′ ∈ D differing in at most one individual (written D ∼ D′),

  Pr[M(D) ∈ O] ≤ exp(ε) · Pr[M(D′) ∈ O],    (1)

where the probability is calculated over the coin tosses of M. The parameter ε > 0 is the privacy loss of the algorithm, with values close to 0 denoting strong privacy. DP satisfies several important properties, including composability, which ensures that a combination of differentially private algorithms preserves differential privacy, and immunity to post-processing, which ensures that privacy guarantees are preserved by arbitrary post-processing steps [Dwork and Roth, 2013].

In private data analysis settings, users interact with datasets by issuing queries. A (numeric) query is a function from a dataset D ∈ D to a result set R ⊆ ℝ^d. A query Q can be made differentially private by injecting random noise into its output. The amount of noise depends on the sensitivity of the query, denoted by Δ_Q and defined as

  Δ_Q = max_{D ∼ D′} ‖Q(D) − Q(D′)‖₁.

In other words, the sensitivity of a query is the maximum ℓ1-distance between the query outputs of any two neighboring datasets D and D′.

While the classical DP notion protects the participation of individuals in a dataset, many applications involve components whose presence is public information. However, their values and (cyber) locations are highly sensitive, as they may reveal, for instance, how much power a generator is producing and where it is located, or the most congested segments in a transportation or logistics network. Therefore, the privacy goal of data curators is to protect the observed values and locations associated with these components.

The concept of indistinguishability was introduced by Andrés et al. [2013] to protect user locations in the Euclidean plane and then generalized in [Chatzikokolakis et al., 2013; Koufogiannis et al., 2015] to arbitrary metric spaces. Consider a dataset D to which n individuals contribute their data x_i, and let α > 0. An adjacency relation that captures the data variation of a single individual is defined as:

  D ∼_α D′ ⇔ ∃i : d(x_i, x′_i) ≤ α and ∀j ≠ i : d(x_j, x′_j) = 0,

where d is a distance function on D. Such an adjacency definition is useful to hide an individual's participation up to some quantity α, or a location within a radius α. Given ε > 0 and α > 0, a randomized mechanism M : D → R with domain D and range R is α-indistinguishable ε-DP ((ε, α)-indistinguishable for short) if, for any event O ⊆ R and any pair D ∼_α D′ (D, D′ ∈ D), Equation (1) holds.
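To make these preliminaries concrete, the following is a minimal sketch (not the authors' implementation) of the Laplace mechanism realizing the ε-DP guarantee of Equation (1) by adding noise calibrated to a query's ℓ1-sensitivity; the mechanism is formally stated later as Theorem 2. The query, sensitivity value, and numbers below are illustrative assumptions.

```python
import numpy as np

def laplace_mechanism(query_answer, sensitivity, epsilon, rng=None):
    """Add Laplace noise of scale sensitivity/epsilon to each coordinate of the
    query answer, yielding an epsilon-DP release (cf. Equation (1))."""
    rng = np.random.default_rng() if rng is None else rng
    answer = np.asarray(query_answer, dtype=float)
    return answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=answer.shape)

# Illustrative use: a sum-of-loads query whose sensitivity is 10 MW under an
# alpha-adjacency relation that lets a single customer's load vary by 10 MW.
loads = np.array([120.0, 45.0, 300.0])                       # MW (made-up values)
private_total = laplace_mechanism(loads.sum(), sensitivity=10.0, epsilon=1.0)
```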
3 The CIN Obfuscation Problem

Problem Setting

Consider a dataset (N, E) describing some critical infrastructure network, where N ∈ ℝ^{n×p} and E ∈ ℝ^{m×q} describe the set of n nodes and m edges of the network, together with their attributes. The dataset (N, E) is referred to as the CIN description. An element n_i of N is a p-dimensional vector describing the attribute values associated with node i, and an element e_ij of E is a q-dimensional vector describing the attribute values associated with edge (i, j) between nodes i and j. Furthermore, consider an optimization problem P that takes as input a CIN description, denoted by G for notational clarity:

  O⋆(G) = min_x O(x, G)  s.t.  g_i(x, G) ≤ 0  ∀i ∈ [k],    (2)

where x ∈ ℝ^l is a vector of decision variables, O is the objective function of P, and the g_i (i ∈ [k]) are the problem constraints.

[Figure 2: An example CIN with four nodes (power generators) injecting 200 MW, 140 MW, 360 MW, and 40 MW. The highlighted node is the one being compromised by an attacker.]

To illustrate these concepts, consider the CIN in Figure 2, representing a simplified power network. The nodes describe the network generators and loads (represented as cities in the figure), with values denoting the amount of power being injected into, or withdrawn from, the power grid. The edges represent the transmission lines that carry power from generators to loads. The optimization problem P may be the optimal power flow (OPF), which amounts to finding the most cost-effective generator dispatch to serve the load demands while satisfying the physical constraints of the network.

The data curator desires to release a CIN description (N, E) that obfuscates the location and value of some sensitive network elements, e.g., the locations and capacities of generators. The curator also wants to ensure that the OPFs on the released and original CINs behave similarly (e.g., are feasible and have similar costs) so that the released data is of high fidelity. For notational simplicity, this paper focuses on obfuscating sensitive nodes of a CIN, where each node n_i ∈ N is a pair (ℓ_i, v_i) ∈ ℕ × ℝ describing its location and salient parameter. The n-dimensional vectors of locations and values are denoted by ℓ and v, and G = (ℓ, v) is used to denote a CIN description.

Attack Model

The paper considers an attack model A in which a malicious agent can disrupt up to b elements (called the attack budget) to damage the network. Such an action produces a new damaged network G_a = A(G) with G_a ⊆ G. For instance, Figure 2 highlights in red the generators affected by an attack with a budget b = 1, while G_a is shown in black. The paper further assumes that the attacker has full knowledge of the network topology and is capable of solving problem P (e.g., to find the generators with highest dispatch) and assessing the damages resulting from an attack. The damage of the attacker is measured using the objective value difference O⋆(G_a) − O⋆(G) for problem P. For simplicity, the paper focuses on ranking attacks, i.e., attacks that disrupt the b highest-ranked network elements when their values are sorted according to an ordering ≺ (e.g., the most dispatched generators), and evaluates the released data against such attacks.
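As a concrete illustration of the ranking attacks above, the hedged sketch below selects the b highest-ranked nodes from whatever value vector the attacker observes; the function name and budget handling are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def ranking_attack(observed_values, budget_b):
    """Return the indices of the budget_b highest-ranked elements, i.e., the
    nodes a ranking attacker would disrupt given the values it observes."""
    return np.argsort(np.asarray(observed_values, dtype=float))[::-1][:budget_b]

# With budget b = 1 and full knowledge of the network of Figure 2, the attacker
# targets the 360 MW generator (index 2).
dispatch = np.array([200.0, 140.0, 360.0, 40.0])   # MW, as in Figure 2
targets = ranking_attack(dispatch, budget_b=1)      # -> array([2])
```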
The CIN Obfuscation Problem

This paper uses α-indistinguishability to obfuscate the values and locations of nodes. For a given α_ℓ > 0, the relation ∼_ℓ ⊆ ℕ² captures indistinguishability between node positions in the network, using a distance d_ℓ defined as a function of the minimum number of nodes separating two nodes. Similarly, for a given α_v > 0 and distance function d_v, the relation ∼_v ⊆ ℝ² captures the indistinguishability between node values. These neighboring relations can be combined into a new relation ∼_ℓv that captures indistinguishability for both values and locations. Two CIN descriptions G = (ℓ, v), G′ = (ℓ′, v′) ∈ ℕⁿ × ℝⁿ are location and value indistinguishable if and only if

  G ∼_ℓv G′ ⇔ ℓ ∼_ℓ ℓ′ ∨ v ∼_v v′.    (3)

A randomized mechanism M is (ε, α_ℓ, α_v)-indistinguishable if, for any pair of ∼_ℓv-adjacent datasets, Equation (1) holds.

The design of obfuscation mechanisms for CIN descriptions should satisfy three desiderata, formalized through the Privacy-preserving Obfuscation for CIN (POCIN) problem:

Definition 1. Given a CIN description G, a problem P, and positive real values α_v, α_ℓ, β, and ε, the POCIN problem produces an obfuscated CIN description G̃ = (ℓ̃, ṽ) such that:
1. Privacy: G̃ satisfies (ε, α_ℓ, α_v)-indistinguishability.
2. Fidelity: G̃ admits a candidate solution x̂ that (i) satisfies the constraints g_i(x̂, G̃) of P and (ii) is β-faithful to P's objective, i.e., |O(x̂, G̃) − O⋆(G)| / O⋆(G) ≤ β.
3. Robustness: G̃ minimizes |O⋆(G) − O⋆(A(G̃))|, i.e., the damage inflicted by attack A.

4 The POCIN Mechanism

The POCIN mechanism is divided into three phases, as illustrated in Figure 1:
1. The location obfuscation phase produces a new location-indistinguishable CIN G_ℓ = (ℓ̃, v) from the original G.
2. The value obfuscation phase takes G_ℓ as input and produces a new value-indistinguishable CIN G̃ = (ℓ̃, ṽ).
3. The fidelity restoration phase produces a new CIN description Ġ = (ℓ̃, v̇) that satisfies the problem constraints and is faithful to its objective.

Phase 1: Location Obfuscation

The first phase provides location indistinguishability. The idea is to shuffle the node locations using an instance of the Exponential Mechanism [McSherry and Talwar, 2007]. Such a mechanism releases a privacy-preserving answer to a query by sampling from its discrete output space O. The sampling probability is determined by a utility function u : (D × O) → ℝ that assigns a score to each output o ∈ O.

Theorem 1 (Exponential Mechanism). Let u : (D × O) → ℝ with sensitivity Δu = max_{o ∈ O} max_{D ∼ D′} |u(D, o) − u(D′, o)|. The exponential mechanism that outputs o with probability Pr[o is selected] ∝ exp(ε u(D, o) / (2Δu)) satisfies ε-DP.

POCIN uses a privacy-preserving shuffling function that maps each node i with location ℓ_i to a new location ℓ̃_i as:

  ℓ̃_i ← ℓ_j  with  Pr ∝ exp(−ε · d_ℓ(ℓ_i, ℓ_j) / (2Δ_{d_ℓ})),

where ℓ_j is the location of a node j. This is an application of the Exponential Mechanism with Δ_{d_ℓ} being the sensitivity of the distance function d_ℓ. Since two different locations may be mapped to the same location, POCIN solves an assignment problem so that each node i is mapped to a unique location.

Corollary 1. Given α_ℓ > 0, Phase 1 of POCIN achieves α_ℓ-location-indistinguishability.

[Figure 3: Probabilities of location obfuscations (α_ℓ = ε = 1.0).]

Figure 3 illustrates the intuition underlying the mechanism: it displays the probabilities of moving the central node to a specific location. For a sufficiently large α_ℓ, an attacker is provided with no useful location information. Of course, there is a tradeoff between α_ℓ and the fidelity of the network, which is addressed in Phase 3.
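A compact sketch of how Phase 1 could look in practice is given below, assuming a hop-distance matrix between node locations; it is not the authors' code. The exponent scale (2·α_ℓ, standing in for 2Δ_{d_ℓ}) and the way sampling clashes are resolved (a minimum-cost assignment that keeps nodes close to their sampled targets) are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def obfuscate_locations(dist, epsilon, alpha_loc, rng=None):
    """Phase 1 sketch: sample a target location for each node with the
    exponential mechanism (utility = -hop distance), then solve an
    assignment problem so that every node ends up at a unique location.

    dist: (n, n) matrix of hop distances d_l between node locations.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = dist.shape[0]
    probs = np.exp(-epsilon * dist / (2.0 * alpha_loc))       # assumed calibration
    probs /= probs.sum(axis=1, keepdims=True)

    # Sample a (possibly clashing) target for every node, then resolve clashes
    # with a min-cost assignment whose cost is the distance to the sampled target.
    sampled = np.array([rng.choice(n, p=probs[i]) for i in range(n)])
    cost = dist[sampled, :]                                    # cost[i, j] = d(sampled_i, j)
    _, new_loc = linear_sum_assignment(cost)
    return new_loc                                             # new_loc[i] = shuffled location of node i
```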
Phase 2: Value Obfuscation

The second phase of POCIN takes as input G_ℓ = (ℓ̃, v) and constructs a privacy-preserving answer ṽ for the node values v using the Laplace mechanism.

Theorem 2 (Laplace Mechanism). Let Q be a numeric query that maps datasets to ℝ^d. The Laplace mechanism that outputs Q(D) + z, where z ∈ ℝ^d is drawn from the Laplace distribution Lap(Δ_Q/ε)^d, achieves ε-DP.

In the above, Lap(λ) denotes the Laplace distribution with 0 mean and scale λ, and Lap(λ)^d denotes the i.i.d. Laplace distribution over d dimensions with parameter λ. The Laplace mechanism with parameter λ = α/ε satisfies α-indistinguishability [Chatzikokolakis et al., 2013]. As a result, the privacy-preserving values ṽ of the CIN description are obtained as follows:

  ṽ = v + Lap(α_v/ε)^n.    (5)

The Laplace mechanism has been shown to be optimal: it minimizes the mean-squared error for identity queries with respect to the ℓ1-norm [Koufogiannis et al., 2015].

Corollary 2. Given α_v > 0, Phase 2 of POCIN achieves α_v-value-indistinguishability.

Phase 2 generates a CIN description G̃ = (ℓ̃, ṽ) obfuscating the locations and values of the original network G. As a result, G̃ satisfies condition (1) of the CIN obfuscation problem.

Phase 3: Fidelity Restoration

The noise introduced by Phases 1 and 2 may significantly alter the structure of the data and the solutions to problem P. To restore the fidelity of the network, POCIN leverages the post-processing immunity of DP and uses a bi-level optimization problem to redistribute the noise introduced in the previous phases. The primary decision variable of the problem is the vector v̇ = (v̇_1, ..., v̇_n) that represents the post-processed node values after the noise redistribution. The problem, called P_BL, is shown in Figure 4:

  (P_BL)  min_{v̇, x⋆}  ‖v̇ − ṽ‖₂                                            (b1)
          s.t.  |O(x⋆, v̇) − O⋆| ≤ β                                         (b2)
                x⋆ = argmin_x O(x, v̇)  s.t.  g_i(x, v̇) ≤ 0  ∀i ∈ [k]         (b3)

[Figure 4: The bi-level optimization of the Fidelity Restoration phase.]

It searches for a vector v̇ for which problem P has an optimal solution x⋆ whose objective value is close to the original optimum O⋆ (assumed to be public information). Moreover, the vector v̇ must be as close as possible to the obfuscated vector ṽ, which is ensured by objective (b1). Constraint (b3) ensures that x⋆ is an optimal solution of P on Ġ = (ℓ̃, v̇), and constraint (b2) ensures the fidelity of the objective.

Theorem 3. POCIN is (ε, α_ℓ, α_v)-indistinguishable.

Theorem 4. The error induced by POCIN on the CIN node values is bounded by the inequality ‖v̇ − v‖₂ ≤ 2‖ṽ − v‖₂.

Note that POCIN does not explicitly minimize the damages inflicted by an attack on the critical infrastructure network. Indeed, simply evaluating the damage of an attack requires access to the real network G, which would violate privacy. However, the shuffling mechanism indirectly addresses this issue. Intuitively, for a sufficiently large α_ℓ in the location obfuscation, the probability of choosing the most dispatched generator with a single attack will be close to that of a uniform random guess.
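A minimal sketch of Phase 2 (Equation (5)) is given below: the node values are perturbed with i.i.d. Laplace noise of scale α_v/ε, shown here on the illustrative generator values of Figure 2. The function name and numbers are assumptions; Phase 3 would then post-process the result as described above.

```python
import numpy as np

def obfuscate_values(values, epsilon, alpha_val, rng=None):
    """Phase 2 sketch (Equation (5)): release v + Lap(alpha_val/epsilon)^n,
    making node values that are within alpha_val of each other indistinguishable."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(values, dtype=float)
    return v + rng.laplace(scale=alpha_val / epsilon, size=v.shape)

# Illustrative use on the node values of Figure 2, with alpha_v = 10 MW.
v_tilde = obfuscate_values([200.0, 140.0, 360.0, 40.0], epsilon=1.0, alpha_val=10.0)
```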
5 Solving the Fidelity Restoration Model

P_BL is a bi-level problem, and bi-level programming is known to be strongly NP-hard [Hansen et al., 1992]: even evaluating a solution for optimality is NP-hard [Vicente et al., 1994]. This paper explores two avenues to solve or approximate P_BL.

When the subproblem (b3) is convex, it can be replaced by its Karush-Kuhn-Tucker (KKT) conditions, producing a single-level model. Moreover, when the subproblem is linear, the resulting single-level model is a MIP, which is the case in our transportation case study.

The subproblem can also be replaced by the relaxation of P_BL shown in Figure 5:

  (P_CL)  min_{v̇, x}  ‖v̇ − ṽ‖₂                              (n1)
          s.t.  |O(x, v̇) − O⋆| ≤ β                           (n2)
                g_i(x, v̇) ≤ 0  ∀i ∈ [k]                      (n3)

[Figure 5: The relaxation P_CL of P_BL.]

This relaxation ensures that Ġ has a feasible solution whose value is close to the original objective. However, it does not constrain the optimal solution, which will be a lower bound to O(x, v̇). This relaxation is used in the power network case study and also satisfies Theorem 4. When the subproblem is convex, it is possible to strengthen Theorem 4 using the existence of a solution (vector v), the optimality of v̇, and the angle property of a projection onto a convex set.

Theorem 5. When P_CL is used for fidelity restoration and is convex, the error induced by POCIN on the CIN node values is bounded by the inequality ‖v̇ − v‖₂ ≤ ‖ṽ − v‖₂.

Other methods to solve bi-level problems include descent methods [Kolstad and Lasdon, 1990], penalty function methods [Aiyoshi and Shimizu, 1984], and evolutionary approaches [Mathieu et al., 1994]. The reader can consult [Sinha et al., 2018] for an extensive review on the topic.
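The following is a hedged sketch of the relaxation P_CL for a deliberately simplified dispatch problem (linear cost, generation limits, and a single balance constraint linking the decisions to the released node values), written with cvxpy. The problem data, the absolute faithfulness tolerance, and all names are illustrative assumptions, not the formulation used in the paper's case studies.

```python
import cvxpy as cp
import numpy as np

v_tilde = np.array([210.0, 155.0, 330.0, 55.0])    # noisy node values after Phase 2 (illustrative)
cap     = np.array([400.0, 400.0, 400.0, 400.0])   # generation limits (public)
cost    = np.array([10.0, 20.0, 30.0, 40.0])       # linear generation costs (public)
O_star  = 10_500.0                                 # original optimal cost (public)
beta    = 0.01 * O_star                            # assumed absolute faithfulness tolerance

v_dot = cp.Variable(4)                             # post-processed node values (Phase 3 output)
x     = cp.Variable(4)                             # dispatch decisions

constraints = [
    cp.abs(cost @ x - O_star) <= beta,             # (n2): objective faithfulness
    cp.sum(x) == cp.sum(v_dot),                    # (n3): power balance ...
    x >= 0, x <= cap,                              #       ... and generation limits
]
# (n1): squared norm has the same minimizer as the norm objective of Figure 5.
problem = cp.Problem(cp.Minimize(cp.sum_squares(v_dot - v_tilde)), constraints)
problem.solve()
v_restored = v_dot.value
```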
6 Experimental Results

This section presents experimental results on two case studies: power network and traffic obfuscation problems.

6.1 Power Network Obfuscation Problem

Consider a transmission operator who would like to release a description of its network to stimulate research, but is worried that malicious actors could use it to design a cyber-attack targeting the generators that would cause maximum damage. The operator seeks to obfuscate the locations and maximum capacities of its generators while preserving the realism of the network description. The experiments were performed on a variety of benchmarks from the NESTA library [Coffrin et al., 2014]. The results analyze the dispatch values produced by POCIN and determine how well the obfuscated networks sustain an attack. For brevity, the results are highlighted on the IEEE 118-bus test case. The experiments use a privacy loss ε of 1.0, vary the indistinguishability level α_ℓ from 1% to 10% of the network diameter d(G) and the faithfulness level β in {10⁻², 10⁻¹}, while α_v is fixed to 0.1 p.u. (10 MW). The model was implemented using the Julia package PowerModels.jl with Ipopt [Coffrin et al., 2018]. POCIN takes less than 1 minute to produce the obfuscated networks.

Dispatch & Cost Analysis

This section studies the privacy and realism of the obfuscated networks produced by POCIN. Figure 6a shows the active dispatch values of all generators in the obfuscated network and compares them with their associated values in the original network. The left plot compares the dispatch for each generator, while the right plot compares the generation on each bus. Note that POCIN only swaps generator locations and does not assign generators to arbitrary buses. Not surprisingly, the dispatch differences are more pronounced as the indistinguishability level increases. More importantly, the dispatch values can no longer be used to distinguish generators and/or buses.

[Figure 6 (IEEE 118-bus), Dispatch & Cost Analysis: (a) active dispatch of the original vs. obfuscated networks with ε = 1.0, β = 0.01, α_ℓ = α% × d(G), α_v = 0.1 (left: generator active generation; right: bus active injection); (b) differences (in %) between the AC optimal dispatch costs of the original and obfuscated networks for indistinguishability levels α ∈ {1, 3, 5, 7, 10} (β = 0.1).]

Table 1 reports the success rate of obtaining a feasible AC power flow after Phase 2 on various benchmarks over 50 runs.

Table 1: Feasibility statistics before Phase 3 (in percentage), for indistinguishability levels α ∈ {1, 3, 5, 7, 10}.

  Network instance      α=1    3    5    7   10
  nesta_case14_ieee       6    6    6    6    6
  nesta_case30_ieee       0    0    0    0    0
  nesta_case57_ieee      12   10    6   12   12
  nesta_case118_ieee      0    0    0    0    0

Without Phase 3, the optimization problem P is infeasible in most or all runs for all test cases. These results highlight the critical role of Phase 3 of POCIN. Figure 6b shows the difference, in percentage, between the dispatch costs of the obfuscated and original networks for increasing indistinguishability levels α_ℓ = α% × d(G). It reports the mean and standard deviation (shown with black solid lines) over 50 runs. The results indicate that the obfuscated networks are largely within the faithfulness requirement and provide the necessary fidelity for problem P on the obfuscated network.

Attack Simulation

It remains to study how an attacker may leverage the obfuscated network to damage the original CIN. The attacker is given an attack budget b denoting the percentage of generators that can be damaged. To assess the benefits of POCIN, three types of attacks are compared (where k denotes the number of generators that makes up b%):
1. Random Attack: k generators are randomly selected.
2. Obfuscated Attack: The attacker chooses the k generators with the largest dispatches in Ġ.
3. Fully-Informed Attack: The attacker chooses the k generators with the largest dispatches in G.

The experiments assume estimated ancillary service costs of $10/MWh to serve the load that cannot be provided by the remaining generators.

[Figure 7: Ancillary service costs ($10/MWh) for each attack type (Random, Obfuscated α=10, Obfuscated α=1, Real Data) and attack budget (10%, 20%, 30% of generators) on the IEEE-57 (left) and IEEE-118 (right) networks (β = 0.1).]

Figure 7 shows the costs for each attack type when varying the attack budget b ∈ {10, 20, 30} (in % of generators) and the indistinguishability value α ∈ {1.0, 10.0}, with α_ℓ = α% × d(G), on the IEEE-57 and IEEE-118 benchmarks and with the faithfulness value β and α_v both set to 0.1. The results average 50 simulations for each combination of parameters. The random attacks are used as a baseline to assess the damage inflicted by an uninformed attacker. Not surprisingly, they result in the lowest costs in each setting. In contrast, fully-informed attacks produce the largest damages on the networks, with increasing ancillary costs as the attack budget increases. The results for the obfuscated attacks are particularly interesting: the obfuscation significantly reduces the power of an attacker. Remarkably, as the location indistinguishability values increase, the inflicted damages decrease and converge to those of random attacks. Larger indistinguishability implies more noise, and thus a higher chance for an attacker to damage less important generators. Note, however, that the mechanism still preserves the desired network fidelity.
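To make the comparison of attack strategies concrete, the sketch below contrasts the three attacks on a toy four-node example. It is only an illustration: the damage proxy (lost generation priced at $10/MWh) simplifies the ancillary-service cost used in the experiments, and the obfuscated dispatch values are made up.

```python
import numpy as np

RATE = 10.0  # assumed ancillary service cost in $/MWh, as in the experiments

def top_k(values, k):
    return np.argsort(np.asarray(values, dtype=float))[::-1][:k]

def damage(true_dispatch, targeted):
    """Simplified damage proxy: generation lost over one hour, priced at RATE.
    The real study re-solves the power flow after the attack; this is illustrative."""
    return RATE * float(np.sum(true_dispatch[targeted]))

rng = np.random.default_rng(0)
true_dispatch = np.array([200.0, 140.0, 360.0, 40.0])   # MW, original network (Figure 2)
obfs_dispatch = np.array([150.0, 310.0, 180.0, 110.0])  # MW, released network (made up)
k = 1

random_damage   = damage(true_dispatch, rng.choice(4, size=k, replace=False))
obfusc_damage   = damage(true_dispatch, top_k(obfs_dispatch, k))   # attacker ranks on released data
informed_damage = damage(true_dispatch, top_k(true_dispatch, k))   # full-knowledge upper bound
```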
6.2 Traffic Network Obfuscation Problem

The scenario involves a city that would like to release its traffic data, but is concerned about a cyber-attack on its traffic control system. The city aims at releasing an obfuscated version of the data that would preserve the trip durations of its commuters and would minimize the damage of a cyber-attack on the traffic controllers regulating traffic flows on the road segments. The case study involves the traffic network of the city of Ann Arbor, MI, and 8,000 real trips from an origin O to a destination D (O-D pairs). For simplicity, the travel times on an edge e are given by a linear combination d_e + γ t_e of the distance d_e and the traffic t_e, but the results generalize naturally to more complex models. POCIN obfuscates the locations of the edges and their transit data, the distances and the network topology being public information. Problem P consists of solving a shortest path for each commuter from her origin to her destination, using historical traffic data. Since the shortest-path problem is totally unimodular, the bi-level program P_BL of POCIN can be reformulated as a MIP, and thus the problem can be solved exactly. The experiments use a privacy loss of ε = 1.0, vary the indistinguishability level α_ℓ from 1% to 25% of the number of nodes, and select a faithfulness level β of 0.1. The implementation takes less than 20 minutes to generate the obfuscated networks.

Traffic Weights & Travel Costs

The first experiments concern the privacy and realism of the obfuscated network produced by POCIN. Figure 8 shows the travel times on all the roads in the obfuscated and original networks. As the indistinguishability increases, the networks become significantly different. Figure 9 shows the shortest-path costs of 15 O-D pairs in the obfuscated and original networks. Yellow '+' marks denote the travel costs before Phase 3 and blue dots the travel costs of POCIN. The results show that POCIN preserves the shortest-path costs with high fidelity and that the fidelity restoration step brings significant benefits.

[Figure 8: Traffic weights on all the roads of the original vs. obfuscated networks (β = 0.1).]

[Figure 9: Travel costs of 15 trips on the original vs. obfuscated networks (β = 0.1).]

Attack Simulation

The second set of results concerns the capability of the obfuscation to mitigate the damage of an attack. A malicious agent targets the traffic controllers of various road segments to increase the total travel costs of all O-D pairs. Figure 10 shows the average increases (in %) of the travel costs for each attack strategy, at various attack budgets consisting of k ∈ {10, 20, 50} roads, indistinguishability value α, and faithfulness value β = 0.1. Larger values of β attain similar results. Results are averages over 50 simulations for each combination of parameters.

[Figure 10: Average increase (in %) of the travel times of all O-D pairs, by attack type (including Obfuscated α=1 and Obfuscated α=25) and attack budget of 10, 20, and 50 roads (β = 0.1).]

The results show that random attacks are completely ineffective, while fully-informed attacks increase travel times substantially. In contrast, the obfuscation dramatically decreases the damage. When the indistinguishability level increases, the obfuscation eliminates the damage for attack budgets targeting 10 roads, and only increases the travel times by less than 20% on larger attacks.
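The travel-cost evaluation underlying this case study can be sketched as follows: total shortest-path cost over the O-D pairs with edge cost d_e + γ t_e, computed here with networkx on a three-node toy graph. The graph, the γ value, and the traffic numbers are illustrative assumptions, not the Ann Arbor data.

```python
import networkx as nx

GAMMA = 0.5  # assumed traffic weight in the travel-time model d_e + gamma * t_e

def total_od_cost(distances, traffic, od_pairs):
    """Sum of shortest-path costs over the O-D pairs, with edge weight
    d_e + GAMMA * t_e. `distances` and `traffic` map (u, v) -> value."""
    g = nx.Graph()
    for (u, v), d_e in distances.items():
        g.add_edge(u, v, weight=d_e + GAMMA * traffic[(u, v)])
    return sum(nx.shortest_path_length(g, s, t, weight="weight") for s, t in od_pairs)

distances = {("a", "b"): 2.0, ("b", "c"): 3.0, ("a", "c"): 6.0}
original  = {("a", "b"): 40.0, ("b", "c"): 10.0, ("a", "c"): 5.0}    # original traffic
released  = {("a", "b"): 25.0, ("b", "c"): 30.0, ("a", "c"): 12.0}   # obfuscated traffic (made up)
od_pairs  = [("a", "c")]

cost_original = total_od_cost(distances, original, od_pairs)
cost_released = total_od_cost(distances, released, od_pairs)
```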
7 Related Work

The release of differentially private datasets that protects the value of the data is a challenging task that has been studied by several authors. Often the released data is generated from a data synopsis in the form of a noisy histogram [Li et al., 2014; Qardaji et al., 2014; Fioretto et al., 2018]. Protecting location privacy typically uses the framework of geo-indistinguishability [Andrés et al., 2013], in which locations are made indistinguishable in the Euclidean plane. The release of differentially private networks has also been studied in the literature: it primarily focuses on social networks [Proserpio et al., 2014; Blocki et al., 2013; Kasiviswanathan et al., 2013]. These methods focus on protecting the privacy of either the nodes or the edges, and ensure that the output distribution does not change significantly when a node (together with all its adjacent edges) or an edge is added to, or removed from, the graph. The goal of these methods is not to preserve the original network topology. In contrast to these approaches, POCIN focuses on the release of privacy-preserving network data that protects the locations and values of the network components while preserving the network topology and fidelity. As shown by the experimental results, this places additional requirements on the mechanism, including the need for POCIN to use optimization to redistribute the noise and ensure that the data release admits a realistic solution to the optimization problem.

The evaluation of privacy-preserving techniques with respect to malicious attacks is also a novel contribution of this paper. Traditional research on attack detection and prevention on cyber-physical systems is not concerned with privacy and data release. For instance, Ghafouri et al. [2018] use regression to detect anomalous sensor readings, and Junejo and Goh [2016] focus on predicting attacks by observing real and simulated data. Finally, bi-level problems have been used in security applications of AI [Pita et al., 2009; Jain et al., 2011; Nguyen et al., 2016; Zhao et al., 2017].

8 Conclusions

This paper presented a privacy-preserving scheme for the release of Critical Infrastructure Networks (CINs). The proposed Privacy-preserving Obfuscation mechanism for CINs (POCIN) obfuscates the values and locations of sensitive network elements by combining several differential privacy mechanisms with a bi-level optimization problem. It does so without altering the CIN topology and while ensuring that the released obfuscated network preserves the properties of an optimization problem of interest. The paper proposes exact and relaxation-based approaches for solving the bi-level optimization problem and tests POCIN on real power grid and transportation problems. The results show that POCIN is effective in obfuscating the values and locations of the network parameters. Importantly, the results also illustrate the effectiveness of POCIN in deceiving malicious agents that exploit the data release to produce as much damage as possible to the CIN.

Acknowledgments

The authors are thankful to Kory Hedman for extensive discussions.
This research is partly funded by the ARPA-E Grid Data Program under Grant 13571530.

References

[Aiyoshi and Shimizu, 1984] Eitaro Aiyoshi and Kiyotaka Shimizu. A solution method for the static constrained Stackelberg problem via penalty method. AC, 29(12):1111–1114, 1984.

[Andrés et al., 2013] Miguel Andrés, Nicolás Bordenabe, Konstantinos Chatzikokolakis, and Catuscia Palamidessi. Geo-indistinguishability: Differential privacy for location-based systems. In CCS, pages 901–914, 2013.

[Blocki et al., 2013] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. Differentially private data analysis of social networks via restricted sensitivity. In ITCS, pages 87–96, 2013.

[Chatzikokolakis et al., 2013] Konstantinos Chatzikokolakis, Miguel E. Andrés, Nicolás Emilio Bordenabe, and Catuscia Palamidessi. Broadening the scope of differential privacy using metrics. In PETS, pages 82–102, 2013.

[Coffrin et al., 2014] Carleton Coffrin, Dan Gordon, and Paul Scott. NESTA, the NICTA energy system test case archive. CoRR, abs/1411.0359, 2014.

[Coffrin et al., 2018] Carleton Coffrin, Russell Bent, Kaarthik Sundar, Yeesian Ng, and Miles Lubin. PowerModels.jl: An open-source framework for exploring power flow formulations. In PSCC, pages 1–8, June 2018.

[Dwork and Roth, 2013] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2013.

[Dwork et al., 2006] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In TCC, volume 3876, pages 265–284, 2006.

[Fioretto and Van Hentenryck, 2018] Ferdinando Fioretto and Pascal Van Hentenryck. Constrained-based differential privacy: Releasing optimal power flow benchmarks privately. In CPAIOR, pages 215–231, 2018.

[Fioretto et al., 2018] Ferdinando Fioretto, Chansoo Lee, and Pascal Van Hentenryck. Constrained-based differential privacy for private mobility. In AAMAS, pages 1405–1413, 2018.

[Fioretto et al., 2019] Ferdinando Fioretto, Terrence W. K. Mak, and Pascal Van Hentenryck. Privacy-preserving obfuscation of critical infrastructure networks (extended version). CoRR, abs/1905.09778, 2019.

[Ghafouri et al., 2018] Amin Ghafouri, Yevgeniy Vorobeychik, and Xenofon D. Koutsoukos. Adversarial regression for detecting attacks in cyber-physical systems. In IJCAI, pages 3769–3775, 2018.

[Hansen et al., 1992] Pierre Hansen, Brigitte Jaumard, and Gilles Savard. New branch-and-bound rules for linear bilevel programming. SIAM J. Sci. and Stat. Comput., 13(5):1194–1217, 1992.

[Jain et al., 2011] Manish Jain, Dmytro Korzhyk, Ondřej Vaněk, Vincent Conitzer, Michal Pěchouček, and Milind Tambe. A double oracle algorithm for zero-sum security games on graphs. In AAMAS, pages 327–334, 2011.

[Junejo and Goh, 2016] Khurum Nazir Junejo and Jonathan Goh. Behaviour-based attack detection and classification in cyber physical systems using machine learning. In CPSSec, pages 34–43, 2016.

[Kasiviswanathan et al., 2013] Shiva Prasad Kasiviswanathan, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Analyzing graphs with node differential privacy. In Theory of Cryptography, pages 457–476. Springer, 2013.

[Kolstad and Lasdon, 1990] Charles D. Kolstad and Leon S. Lasdon. Derivative evaluation and computational experience with large bilevel mathematical programs. J. of Optimization Theory and Applications, 65(3):485–499, 1990.
[Koufogiannis et al., 2015] Fragkiskos Koufogiannis, Shuo Han, and George J. Pappas. Optimality of the Laplace mechanism in differential privacy. arXiv:1504.00065, 2015.

[Li et al., 2014] Chao Li, Michael Hay, Gerome Miklau, and Yue Wang. A data- and workload-aware algorithm for range queries under differential privacy. VLDB, 7(5):341–352, 2014.

[Mathieu et al., 1994] R. Mathieu, L. Pittard, and G. Anandalingam. Genetic algorithm based approach to bi-level linear programming. RAIRO, 28(1):1–21, 1994.

[McSherry and Talwar, 2007] Frank McSherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, pages 94–103, 2007.

[Nguyen et al., 2016] Thanh Hong Nguyen, Arunesh Sinha, and Milind Tambe. Conquering adversary behavioral uncertainty in security games: An efficient modeling robust based algorithm. In AAAI, 2016.

[Pita et al., 2009] James Pita, Manish Jain, Fernando Ordóñez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. Using game theory for Los Angeles airport security. AI Magazine, 30(1):43, 2009.

[Proserpio et al., 2014] Davide Proserpio, Sharon Goldberg, and Frank McSherry. Calibrating data to sensitivity in private data analysis: a platform for differentially-private analysis of weighted datasets. VLDB, 7(8):637–648, 2014.

[Qardaji et al., 2014] Wahbeh Qardaji, Weining Yang, and Ninghui Li. PriView: practical differentially private release of marginal contingency tables. In ICMD, pages 1435–1446, 2014.

[Sinha et al., 2018] Ankur Sinha, Pekka Malo, and Kalyanmoy Deb. A review on bilevel optimization: from classical to evolutionary approaches and applications. IEEE Trans. on Evol. Comp., 22(2):276–295, 2018.

[Vicente et al., 1994] Luis Vicente, Gilles Savard, and Joaquim Júdice. Descent approaches for quadratic bilevel programming. J. of Optimization Theory and Applications, 81(2):379–399, 1994.

[Zhao et al., 2017] Mengchen Zhao, Bo An, Wei Gao, and Teng Zhang. Efficient label contamination attacks against black-box learning models. In IJCAI, pages 3945–3951, 2017.