Published as a conference paper at ICLR 2022

PRIVACY IMPLICATIONS OF SHUFFLING

Casey Meehan 1, Amrita Roy-Chowdhury 2, Kamalika Chaudhuri 1, Somesh Jha 2
1 UC San Diego, 2 University of Wisconsin, Madison

ABSTRACT

LDP deployments are vulnerable to inference attacks, as an adversary can link the noisy responses to their identity and, subsequently, to auxiliary information using the order of the data. An alternative model, shuffle DP, prevents this by shuffling the noisy responses uniformly at random. However, this limits data learnability: only symmetric functions (input order agnostic) can be learned. In this paper, we strike a balance and show that systematic shuffling of the noisy responses can thwart specific inference attacks while retaining some meaningful data learnability. To this end, we propose a novel privacy guarantee, dσ-privacy, that captures the privacy of the order of a data sequence. dσ-privacy allows tuning the granularity at which the ordinal information is maintained, which formalizes the degree of resistance to inference attacks while trading it off with data learnability. Additionally, we propose a novel shuffling mechanism that can achieve dσ-privacy and demonstrate the practicality of our mechanism via evaluation on real-world datasets.

1 INTRODUCTION

Differential Privacy (DP) and its local variant (LDP) are the most commonly accepted notions of data privacy. LDP has the significant advantage of not requiring a trusted centralized aggregator, and has become a popular model for commercial deployments, such as those of Microsoft (Ding et al., 2017), Apple (Greenberg, 2016), and Google (Erlingsson et al., 2014; Fanti et al., 2015; Bittau et al., 2017b). Its formal guarantee asserts that an adversary cannot infer the value of an individual's private input by observing the noisy output. However, in practice, a vast amount of public auxiliary information, such as address, social media connections, court records, property records, income and birth dates (La Corte, 2019), is available for every individual. An adversary with access to such auxiliary information can learn about an individual's private data from several other participants' noisy responses. We illustrate this as follows.

Problem. An analyst runs a medical survey in Alice's community to investigate how the prevalence of a highly contagious disease changes from neighborhood to neighborhood. Community members report a binary value indicating whether they have the disease. Next, consider the following two data reporting strategies.

Strategy 1. Each data owner passes their data through an appropriate randomizer (that flips the input bit with some probability) on their local device and reports the noisy output to the untrusted data analyst.

Strategy 2. The noisy responses from the local devices of each of the data owners are collected by an intermediary trusted shuffler, which dissociates the device IDs (metadata) from the responses and uniformly randomly shuffles them before sending them to the analyst.

Strategy 1 corresponds to the standard LDP deployment model (for example, Apple's and Microsoft's deployments). Here the order of the noisy responses is informative of the identity of the data owners: the noisy response at index 1 corresponds to the first data owner, and so on. Thus, each noisy response can be directly linked with its associated device/account ID and, subsequently, auxiliary information. This puts Alice's data under the threat of inference attacks.
For instance, an adversary¹ may know the home addresses of the participants and use this to identify the responses of all the individuals from Alice's household. Being highly infectious, all or most of them will have the same true value (0 or 1). Hence, the adversary can reliably infer Alice's value by taking a simple majority vote of her and her household's noisy responses. Note that this does not violate the LDP guarantee since the inputs are appropriately randomized when observed in isolation.

¹ The analyst and the adversary could be the same; we refer to them separately for ease of understanding.

Figure 1: Demonstration of how our proposed scheme thwarts inference attacks at different granularities. Panels: (a) original data, (b) LDP, (c) our scheme with r_a, (d) our scheme with r_b, (e) uniform shuffle, (f)-(i) attacks on LDP, r_a, r_b and uniform shuffle, respectively. Fig. 1a depicts the original sensitive data (such as income bracket) with eight color-coded labels. The position of the points represents public information (such as home address) used to correlate them. There are three levels of granularity: warm vs. cool clusters, blue vs. green and red vs. orange crescents, and light vs. dark within each crescent. Fig. 1b depicts ε = 2.55 LDP. Figs. 1c and 1d correspond to our scheme, each with α = 1 (privacy parameter, Def. 4.3). The former uses a smaller distance threshold (r_1, used to delineate the granularity of grouping; see Sec. 4.2) that mostly shuffles within each crescent. The latter uses a larger distance threshold (r_2) that shuffles within each cluster. Figures in the bottom row demonstrate an inference attack (using Gaussian process correlation) on all four cases. We see that LDP reveals almost the entire dataset (Fig. 1f) while uniform shuffling prevents all classification (Fig. 1i). However, the granularity can be controlled with our scheme (Figs. 1g, 1h).

Additionally, on account of being public, the auxiliary information is known to the adversary (and analyst) a priori; no mechanism can prevent its disclosure. For instance, any attempt to include Alice's address as an additional feature of the data and then report via LDP is futile: the adversary would simply discard the reported noisy address and use the auxiliary information about the exact addresses to identify the responses of her household members. We call such threats inference attacks: recovering an individual's private input using all or a subset of other participants' noisy responses. It is well known that protecting against inference attacks that rely on underlying data correlations is beyond the purview of DP (Kifer & Machanavajjhala, 2014; Tschantz et al., 2020).

Strategy 2 corresponds to the recently introduced shuffle DP model, such as Google's Prochlo (Bittau et al., 2017b). Here, the noisy responses are completely anonymized: the adversary cannot identify which LDP responses correspond to Alice and her household. Under such a model, only information that is completely order agnostic (i.e., symmetric functions that can be computed over just the bag of values, such as aggregate statistics) can be extracted. Consequently, the analyst also fails to accomplish their original goal as all the underlying data correlation is destroyed. Thus, we see that the two models of deployment for LDP present a trade-off between vulnerability to inference attacks and scope of data learnability.
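To make the household attack concrete, the short sketch below (our own illustration; the parameter values are hypothetical and not taken from the paper's experiments) computes how often a majority vote over five household members with the same true bit defeats binary randomized response, even though each individual report is well randomized.

```python
from math import comb, exp

def rr_keep_prob(eps: float) -> float:
    """Probability that binary randomized response reports the true bit."""
    return exp(eps) / (1.0 + exp(eps))

def majority_recovery_prob(eps: float, k: int) -> float:
    """Chance that a majority vote over k noisy copies of the SAME true bit is correct."""
    p = rr_keep_prob(eps)
    return sum(comb(k, j) * p**j * (1 - p)**(k - j) for j in range(k // 2 + 1, k + 1))

eps, k = 1.1, 5                        # eps ~ ln 3: each single report is correct w.p. ~0.75
print(rr_keep_prob(eps))               # ~0.75 -> one response alone reveals little
print(majority_recovery_prob(eps, k))  # ~0.90 -> five correlated responses reveal a lot
```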
In fact, as demonstrated in Kifer & Machanavajjhala (2011), it is impossible to defend against all inference attacks while simultaneously maintaining utility for learning. In the extreme case that the adversary knows everyone in Alice's community has the same true value (but not which one), no mechanism can prevent revelation of Alice's datapoint short of destroying all utility of the dataset. This then begs the question: Can we formally suppress specific inference attacks targeting each data owner while maintaining some meaningful learnability of the private data? Referring back to our example, can we thwart attacks inferring Alice's data using specifically her household's responses and still allow the medical analyst to learn its target trends? Can we offer this to every data owner participating?

In this paper, we strike a balance and propose a generalized shuffle framework that meets the utility requirements of the above analyst while formally protecting data owners against inference attacks. Our solution is based on the key insight that the order of the data acts as the proxy for the identity of data owners, as illustrated above. The granularity at which the ordering is maintained formalizes resistance to inference attacks while retaining some meaningful learnability of the private data. Specifically, we guarantee each data owner that their data is shuffled together with a carefully chosen group of other data owners. Revisiting our example, consider uniformly shuffling the responses from Alice's household and her immediate neighbors. Now an adversary cannot use her household's responses to predict her value any better than they could with a random sample of responses from this group. In the same way that LDP prevents reconstruction of her datapoint using specifically her noisy response, this scheme prevents reconstruction of her datapoint using specifically her household's responses. The real challenge is offering such guarantees equally to every data owner. Bob, Alice's neighbor, needs his household's responses shuffled in with his neighbors', as does Luis, who is a neighbor of Bob but not of Alice. Thus, we have n data owners with n distinct, overlapping groups. Our scheme supports arbitrary groupings (overlapping or not), introducing a diverse and tunable class of privacy/utility trade-offs which is not attainable with either LDP or uniform shuffling alone. For the above example, our scheme can formally protect each data owner from inference attacks using specifically their household, while still learning how disease prevalence changes across the neighborhoods of Alice's community. This work offers two key contributions to the machine learning privacy literature:

Novel privacy guarantee. We propose a novel privacy definition, dσ-privacy, that captures the privacy of the order of a data sequence (Sec. 4.2) and formalizes the degree of resistance against inference attacks (Sec. 4.3). dσ-privacy allows assigning an arbitrary group, G_i, to each data owner, DO_i, i ∈ [n]. For instance, the groups can represent individuals in the same age bracket, friends on social media, or individuals living in each other's vicinity (as in the case of Alice in our example). Recall that the order is informative of the data owner's identity. Intuitively, dσ-privacy protects DO_i from inference attacks that arise from knowing the identity of the members of their group G_i (Sec. 4.3).
Additionally, this grouping determines a threshold of learnability: any learning that is order agnostic within a group (disease prevalence in a neighborhood, the data analyst's goal in our example) is utilitarian and allowed, whereas analysis that involves identifying the values of individuals within a group (disease prevalence within specific households, the adversary's goal) is regarded as a privacy threat and protected against. See Fig. 1 for a toy demonstration of how our guarantee allows tuning the granularity at which trends can be learned.

Novel shuffle framework. We propose a novel mechanism that shuffles the data systematically and achieves dσ-privacy. This provides a generalized shuffle framework that interpolates between no shuffling (LDP) and uniform random shuffling (shuffle model) in terms of protection against inference attacks and data learnability.

2 RELATED WORK

The shuffle model of DP (Bittau et al., 2017a; Cheu et al., 2019; Erlingsson et al., 2019) differs from our scheme as follows. These works (1) study the DP benefits of shuffling whereas we study the inferential privacy benefits, and (2) only study uniformly random shuffling whereas ours generalizes this to tunable, non-uniform shuffling (see App. A.15). A steady line of work has studied inferential privacy (Kasiviswanathan & Smith, 2014; Kifer & Machanavajjhala, 2011; Ghosh & Kleinberg, 2016; Dalenius, 1977; Dwork & Naor, 2010; Tschantz et al., 2020). Our work departs from those in that we focus on local inferential privacy and do so via the new angle of shuffling. Older works such as k-anonymity (Sweeney, 2002), l-diversity (Machanavajjhala et al., 2007), Anatomy (Xiao & Tao, 2006) and others (Wong et al., 2010; Tassa et al., 2012; Xue et al., 2012; Choromanski et al., 2013; Doka et al., 2015) have studied the privacy risk of non-sensitive auxiliary information or "quasi identifiers". These works (1) focus on the setting of dataset release, whereas we focus on dataset collection, and (2) do not offer each data owner formal inferential guarantees, whereas we do. The De Finetti attack (Kifer, 2009) shows how shuffling schemes are vulnerable to inference attacks that correlate records to recover the original permutation of sensitive attributes. A strict instance of our privacy guarantee can thwart such attacks (at the cost of no utility, App. A.3).

3 BACKGROUND

Notations. Boldface (such as x = ⟨x_1, ..., x_n⟩) denotes a data sequence (ordered list); normal font (such as x_1) denotes individual values, and { } represents a multiset or bag of values.

3.1 LOCAL DIFFERENTIAL PRIVACY

The local model consists of a set of data owners and an untrusted data aggregator (analyst); each individual perturbs their data using an LDP algorithm (randomizer) and sends it to the analyst. The LDP guarantee is formally defined as follows.

Definition 3.1. [Local Differential Privacy, LDP (Warner, 1965; Evfimievski et al., 2003; Kasiviswanathan et al., 2008)] A randomized algorithm M : X → Y is ε-locally differentially private (or ε-LDP) if, for any pair of private values x, x′ ∈ X and any subset of outputs W ⊆ Y,

    Pr[M(x) ∈ W] ≤ e^ε · Pr[M(x′) ∈ W]    (1)

The shuffle model is an extension of the local model where the data owners first randomize their inputs. Additionally, an intermediate trusted shuffler applies a uniformly random permutation to all the noisy responses before the analyst can view them. The anonymity provided by the shuffler requires less noise than the local model for achieving the same privacy.
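As a concrete instance of the ε-LDP randomizer in Definition 3.1, the sketch below implements binary randomized response, the mechanism used for the experiments in Sec. 5. This is a minimal illustration of the definition, not the authors' released code.

```python
import random
from math import exp

def randomized_response(x: int, eps: float) -> int:
    """epsilon-LDP randomizer for one bit: keep the true bit w.p. e^eps/(e^eps + 1), else flip it."""
    keep = exp(eps) / (exp(eps) + 1.0)
    return x if random.random() < keep else 1 - x

# Strategy 1: each data owner randomizes locally and reports only the noisy bit.
xs = [1, 0, 1, 1, 0]                                  # private bits (illustrative)
ys = [randomized_response(x, eps=2.5) for x in xs]    # noisy reports sent to the analyst
```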
3.2 MALLOWS MODEL

A permutation of a set S is a bijection S → S. The set of permutations of [n], n ∈ N, forms the symmetric group S_n. As a shorthand, we use σ(x) to denote applying permutation σ ∈ S_n to a data sequence x of length n. Additionally, σ(i), i ∈ [n], σ ∈ S_n, denotes the value at index i in σ, and σ⁻¹ denotes its inverse. For example, if σ = (1 3 5 4 2) and x = ⟨21, 33, 45, 65, 67⟩, then σ(x) = ⟨21, 45, 67, 65, 33⟩, σ(2) = 3, σ(3) = 5 and σ⁻¹ = (1 5 2 4 3). The Mallows model is a popular probabilistic model for permutations (Mallows, 1957). The mode of the distribution is given by the reference permutation σ_0: the probability of a permutation increases as we move closer to σ_0 as measured by rank distance metrics, such as the Kendall's tau distance (Def. A.2). The dispersion parameter θ controls how fast this increase happens.

Definition 3.2. For a dispersion parameter θ, a reference permutation σ_0 ∈ S_n, and a rank distance measure d : S_n × S_n → R,

    P_{θ,d}(σ : σ_0) = (1/ψ(θ, d)) · e^{−θ·d(σ, σ_0)}

is the Mallows model, where ψ(θ, d) = Σ_{σ ∈ S_n} e^{−θ·d(σ, σ_0)} is a normalization term and σ ∈ S_n.

4 DATA PRIVACY AND SHUFFLING

Figure 2: Trusted shuffler mediates on y.

In this section, we present dσ-privacy and a shuffling mechanism capable of achieving the dσ-privacy guarantee.

4.1 PROBLEM SETTING

In our problem setting, we have n data owners DO_i, i ∈ [n], each with a private input x_i ∈ X (Fig. 2). The data owners first randomize their inputs via an ε-LDP mechanism to generate y_i = M(x_i). Additionally, just as in the shuffle model, we have a trusted shuffler. It mediates upon the noisy responses y = ⟨y_1, ..., y_n⟩ to obtain the final output sequence z = A(y) (A corresponds to Alg. 1), which is sent to the untrusted data analyst. The shuffler can be implemented via trusted execution environments (TEE), just like Google's Prochlo. Next, we formally discuss the notion of order and its implications.

Definition 4.1. (Order) The order of a sequence x = ⟨x_1, ..., x_n⟩ refers to the indices of its set of values {x_i} and is represented by permutations from S_n.

When the noisy response sequence y = ⟨y_1, ..., y_n⟩ is represented by the identity permutation σ_I = (1 2 ... n), the value at index 1 corresponds to DO_1 and so on. Standard LDP releases the identity permutation w.p. 1. The output of the shuffler, z, is some permutation of the sequence y, i.e., z = σ(y) = ⟨y_{σ(1)}, ..., y_{σ(n)}⟩, where σ is determined via A(·). For example, for σ = (4 5 2 3 1), we have z = ⟨y_4, y_5, y_2, y_3, y_1⟩, which means that the value at index 1 (DO_1) now corresponds to that of DO_4 and so on.

4.2 DEFINITION OF dσ-PRIVACY

Inferential risk captures the threat of an adversary who infers DO_i's private x_i using all or a subset of other data owners' released y_j's. Since we cannot prevent all such attacks and maintain utility, our aim is to formally limit which data owners can be leveraged in inferring DO_i's private x_i. To make this precise, each DO_i may choose a corresponding group, G_i ⊆ [n], of data owners.

Figure 3: An example social media connectivity graph.

dσ-privacy guarantees that y_j values originating from a data owner's group G_i are shuffled together. In doing so, the LDP values corresponding to subsets of DO_i's group I ⊆ G_i cannot be reliably identified, and thus cannot be singled out to make inferences about DO_i's x_i. If Alice's group includes her whole neighborhood, LDP data originating from her household cannot be singled out to recover her private x_i. Any choice of grouping G = {G_1, G_2, . . . , G_n} can be accommodated under dσ-privacy.
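One systematic way to choose these groups, described next, is to place in G_i every data owner whose public auxiliary information lies within a distance threshold r of DO_i's. The sketch below is our own illustration of that assignment under an assumed Euclidean metric; the function name and toy data are hypothetical.

```python
import numpy as np

def threshold_groups(t: np.ndarray, r: float) -> list[set[int]]:
    """G_i = { j : d(t_i, t_j) <= r } for public auxiliary information t (one row per owner)."""
    dists = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=-1)   # pairwise Euclidean distances
    return [set(np.flatnonzero(row <= r)) for row in dists]

# Example: 2D home locations; each owner is grouped with everyone within r = 1.0 km.
t = np.array([[0.0, 0.0], [0.3, 0.4], [5.0, 5.0], [5.2, 4.9]])
G = threshold_groups(t, r=1.0)    # G[0] == {0, 1}, G[2] == {2, 3}
```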
Each data owner may choose a group large enough to hide anyone they feel sufficient risk from. We outline two systematic approaches to assigning groups as follows.

Let t = ⟨t_1, ..., t_n⟩, t_i ∈ T, denote some public auxiliary information about each individual. DO_i's group, G_i, could consist of all those DO_j's who are similar to DO_i w.r.t. the public auxiliary information t_i, t_j according to some distance measure d : T × T → R. Here, we define similar as being under a threshold² r ∈ R such that G_i = {j ∈ [n] : d(t_i, t_j) ≤ r}, i ∈ [n]. For example, d(·) can be Euclidean distance if T corresponds to geographical locations, thwarting inference attacks leveraging one's household or immediate neighbors. If T represents a social media connectivity graph, d(·) can measure the path length between two nodes, thwarting inference attacks using specifically one's close friends. For the example social media connectivity graph depicted in Fig. 3, assuming the path-length distance metric and r = 2, the groups are defined as G_1 = {1, 7, 8, 2, 5, 6}, G_2 = {2, 1, 7, 5, 6, 3} and so on.

Alternatively, the data owners might opt for a group of a specific size r < n. Collecting private data from a social media network, we may set r = 50, where each G_i is encouraged to include the 50 data owners DO_i interacts with most frequently.

Intuitively, dσ-privacy protects DO_i against inference attacks that leverage correlations at a finer granularity than G_i. In other words, under dσ-privacy, one subset of k data owners ⊆ G_i (e.g. household) is no more useful for targeting x_i than any other subset of k data owners ⊆ G_i (e.g. some combination of neighbors). This leads to the following key insight for the formal privacy definition.

Key Insight. Formally, our privacy goal is to prevent the leakage of ordinal information from within a group. We achieve this by systematically bounding the dependence of the mechanism's output on the relative ordering (of data values corresponding to the data owners) within each group. First, we introduce the notion of neighboring permutations.

Definition 4.2. (Neighboring Permutations) Given a group assignment G, two permutations σ, σ′ ∈ S_n are defined to be neighboring w.r.t. a group G_i ∈ G (denoted as σ ≈_{G_i} σ′) if σ(j) = σ′(j) ∀ j ∉ G_i.

Neighboring permutations differ only in the indices of their corresponding group G_i. For example, σ = (1 2 4 5 7 6 10 3 8 9) and σ′ = (7 3 4 5 6 2 1 10 8 9) are neighboring w.r.t. G_1 (Fig. 3) since they differ only in σ(1), σ(2), σ(5), σ(6), σ(7) and σ(8). We denote the set of all neighboring permutations as

    N_G = {(σ, σ′) | σ ≈_{G_i} σ′, G_i ∈ G}    (2)

Now, we formally define dσ-privacy as follows.

Definition 4.3 (dσ-privacy). For a given group assignment G on a set of n entities and a privacy parameter α ∈ R≥0, a randomized mechanism A : Y^n → V is (α, G)-dσ-private if for all y ∈ Y^n, all neighboring permutations (σ, σ′) ∈ N_G, and any subset of outputs O ⊆ V, we have

    Pr[A(σ(y)) ∈ O] ≤ e^α · Pr[A(σ′(y)) ∈ O]    (3)

σ(y) and σ′(y) are defined to be neighboring sequences. dσ-privacy states that, for any group G_i, the mechanism is (almost) agnostic of the order of the data within the group. Even after observing the output, an adversary cannot learn about the relative ordering of the data within any group. Thus, two neighboring sequences are indistinguishable to an adversary. An important property of dσ-privacy is that post-processing computations do not degrade privacy. Additionally, when applied multiple times, the privacy guarantee degrades gracefully.
Both properties are analogous to DP and are presented in App. A.4.

Note. Any data sequence x = ⟨x_1, ..., x_n⟩ can be viewed as a two-tuple ⟨{x}, σ⟩, where {x} denotes the bag of values and σ ∈ S_n denotes the corresponding indices of the values, which represents the order of the data. ε-LDP protects the bag of data values, {x}, while dσ-privacy protects the order, σ. Thus, the two privacy guarantees cater to orthogonal parts of a data sequence (see Thm. 4.2). Also, α = ∞, r = 0 (respectively α = 0, r = n) recovers the standard LDP (respectively shuffle DP) setting.

4.3 PRIVACY IMPLICATIONS

The group assignment G delineates a threshold of learnability which determines the privacy/utility trade-off as follows.

Learning allowed (Analyst's goal). dσ-privacy can answer queries that are order agnostic within groups, such as aggregate statistics of a group. In Alice's case, the analyst can estimate the disease prevalence in her neighborhood.

Learning disallowed (Adversary's goal). Adversaries cannot identify (noisy) values of individuals within any group. While they may learn the disease prevalence in Alice's neighborhood, they cannot determine the prevalence within her household and use that to recover her value x_i.

To make this precise, we first formalize the privacy implications of the dσ guarantee in the standard Bayesian framework, typically used for studying inferential privacy. Next, we formalize the privacy provided by the combination of the LDP and dσ guarantees by way of a decision-theoretic adversary.

² We could also have different thresholds, r_i, for every data owner, DO_i.

Bayesian Adversary. Consider a Bayesian adversary with any prior P on the joint distribution of noisy responses, Pr_P[y], which models their beliefs about the correlation between the participants (such as the correlation between Alice's and her household's disease status). Their goal is to infer DO_i's private input x_i. As with early DP works (Dwork et al., 2006), we consider an informed adversary. Here, the adversary knows (1) the sequence (assignment) of noisy values outside G_i, y_{Ḡ_i}, and (2) the (unordered) bag of noisy values in G_i, {y_{G_i}}. dσ-privacy bounds the prior-posterior odds gap on x_i for such an informed adversary as follows:

Theorem 4.1. For a given group assignment G on a set of n data owners, if a shuffling mechanism A : Y^n → Y^n is (α, G)-dσ-private, then for each data owner DO_i, i ∈ [n],

    max_{i ∈ [n], a,b ∈ X} | log ( Pr_P[x_i = a | z, {y_{G_i}}, y_{Ḡ_i}] / Pr_P[x_i = b | z, {y_{G_i}}, y_{Ḡ_i}] ) − log ( Pr_P[x_i = a | {y_{G_i}}, y_{Ḡ_i}] / Pr_P[x_i = b | {y_{G_i}}, y_{Ḡ_i}] ) | ≤ α

for a prior distribution P, where z = A(y) and y_{Ḡ_i} is the noisy sequence for data owners outside G_i. See App. A.5 for the proof and further discussion on the semantic meaning of the above guarantee.

Decision Theoretic Adversary. Here, we analyse the privacy provided by the combination of the LDP and dσ guarantees. Consider a decision-theoretic adversary who aims to identify the noisy responses, {z_I}, that originated from a specific subset of data owners, I ⊆ G_i (such as the members of Alice's household). We denote the adversary by a (possibly randomized) function mapping the output sequence z to a set of k indices, D_Adv : Y^n → [n]^k, where k = |I|. These k indices, H ∈ [n]^k, represent the elements of z that D_Adv believes originated from the data owners in I. D_Adv wins if more than k/2 of the chosen indices indeed originated from I, i.e., |σ(H) ∩ I| > k/2, where z_i = y_{σ(i)} and σ(H) = {σ(i) : i ∈ H}. D_Adv loses if most of H did not originate from I, i.e., |σ(H) ∩ I| ≤ k/2.
We choose the above adversary because this re-identification is a key step in carrying out inference attacks: failing to reliably re-identify the noisy values originating from I, one cannot make inferences on x_i specifically from the subset I ⊆ G_i.

Theorem 4.2. For A(M(x)) = z, where M(·) is ε-LDP and A(·) is (α, G)-dσ-private, we have

    Pr[D_Adv loses] ≥ (r−k choose k) · e^{−(2kε+α)} · Pr[D_Adv wins]

for any input subgroup I ⊆ G_i, r = |G_i| and k < r/2.

The adversary's ability to re-identify the {z_I} values comes partially from the bag of values (quantified by ε) and partially from the order (quantified by α). We highlight two implications of this fact. When ε is small (≪ 1), an adversary's ability to re-identify the noisy values {z_I} originating from I may very well be dominated by α. For instance, if ε = 0.2 and k = 5, the adversary's advantage is dominated by α for any α > 2. When using LDP alone (no shuffling), α = ∞ and the adversary can exactly recover which values came from Alice's household. As such, even a moderate α value (obtained via dσ-privacy) significantly reduces the ability to re-identify the values. When the loss is dominated by ε (2kε ≫ α), the above expression allows us to disentangle the source of privacy loss. In this regime, adversaries get most of their advantage from the bag of values released, not from the order of the release. That is, even if α = 0 (uniform random shuffling), participants still suffer a large risk of re-identification simply due to the noisy values being reported. Thus, no shuffling mechanism can prevent re-identification in this regime.

Discussion. In spirit, DP does not guarantee protection against recovering DO_i's private x_i value. It guarantees that had a user not participated (or, equivalently, submitted a false value x′_i) the adversary would have about the same ability to learn their true value, potentially from the responses of other data owners. In other words, the choice to participate is unlikely to be responsible for the disclosure of x_i. Similarly, dσ-privacy does not prevent disclosure of x_i. By requiring indistinguishability of neighboring permutations, it guarantees that had the data owners of any group G_i completely swapped identities, the adversary would have about the same ability to learn x_i. So most likely, Alice's household is not uniquely responsible for a disclosure of her x_i: had her household swapped identities with any of her neighbors, the adversary would probably draw the same conclusion on x_i. Or, as detailed in Thm. 4.2, an adversary cannot reliably resolve which {z} values originated from Alice's household, so they cannot draw conclusions based on her household's responses. In a nutshell:

Inference attacks can recover a data owner DO_i's private data x_i from the responses of other data owners. The order of the data acts as the proxy for the data owner's identity, which can aid an adversary in corralling the subset of other data owners who correlate with DO_i (required to make a reliable inference of x_i). DP alleviates concerns that DO_i's choice to share data (y_i) will result in disclosure of x_i, and dσ-privacy alleviates concerns that DO_i's group's (G_i) choice to share their identity will result in disclosure of x_i.

4.4 dσ-PRIVATE SHUFFLING MECHANISM

Algorithm 1: dσ-private Shuffling Mechanism
Input: LDP sequence y = ⟨y_1, ..., y_n⟩; public aux. info. t = ⟨t_1, ..., t_n⟩; dist. threshold r; priv. param. α
Output: z — shuffled output sequence
1  G = ComputeGroupAssignment(t, r)
2  Construct graph G with (a) vertices V = {1, 2, ..., n} and (b) edges E = {(i, j) : j ∈ G_i, G_i ∈ G}
3  root = argmax_{i ∈ [n]} |G_i|
4  σ_0 = BFS(G, root)
5  Δ = ComputeSensitivity(σ_0, G)
6  θ = α / Δ
7  σ̂ ∼ P_{θ,d}(· : σ_0)
8  σ′ = σ_0⁻¹ σ̂
9  z = ⟨y_{σ′(1)}, ..., y_{σ′(n)}⟩
10 Return z

We now describe our novel shuffling mechanism that can achieve dσ-privacy. In a nutshell, our mechanism samples a permutation from a suitable Mallows model and shuffles the data sequence accordingly. We can characterize the dσ-privacy guarantee of our mechanism in the same way as the DP guarantee of classic mechanisms (Dwork & Roth, 2014), with variance and sensitivity. Intuitively, a larger dispersion parameter θ ∈ R (Def. 3.2) reduces randomness over permutations, increasing utility and increasing (worsening) the privacy parameter α. The maximum value of θ for a given α guarantee depends on the sensitivity of the rank distance measure d(·) over all neighboring permutations N_G. Formally, we define the sensitivity as

    Δ(σ_0 : d, G) = max_{(σ, σ′) ∈ N_G} |d(σ_0σ, σ_0) − d(σ_0σ′, σ_0)| ,

the maximum change in distance d(·) from the reference permutation σ_0 for any pair of neighboring permutations (σ, σ′) ∈ N_G permuted by σ_0. The privacy parameter of the mechanism is then proportional to its sensitivity: α = θ · Δ(σ_0 : d, G). Given G and a reference permutation σ_0, the sensitivity of a rank distance measure d(·) depends on the width, ω^σ_G, which measures how spread apart the members of any group of G are in σ_0:

    ω^σ_{G_i} = max_{(j,k) ∈ G_i × G_i} |σ⁻¹(j) − σ⁻¹(k)| , i ∈ [n];    ω^σ_G = max_{G_i ∈ G} ω^σ_{G_i}

For example, for σ = (1 3 7 8 6 4 5 2 9 10) and G_1 = {1, 7, 8, 2, 5, 6}, ω^σ_{G_1} = |σ⁻¹(1) − σ⁻¹(2)| = 7. The sensitivity is an increasing function of the width. For instance, for Kendall's τ distance d_τ(·) we have Δ(σ_0 : d_τ, G) = ω^{σ_0}_G (ω^{σ_0}_G + 1)/2. If a reference permutation clusters the members of each group closely together (low width), then the groups are more likely to permute within themselves. This has two benefits. First, for the same θ (θ is an indicator of utility as it determines the dispersion of the sampled permutation), a lower value of width gives lower α (better privacy). Second, if a group is likely to shuffle within itself, it will have better (η, δ)-preservation, a novel utility metric we propose for shuffling mechanisms. Intuitively, a mechanism is (η, δ)-preserving w.r.t. a subset of indices S ⊆ [n] if at least η% of its indices are shuffled within itself with probability (1 − δ). The rationale behind this metric is that it captures the utility of the learning allowed by dσ-privacy: if S is equal to some group G ∈ G, high (η, δ)-preservation allows overall statistics of G to be captured since η% of the correct data values remain preserved. We present the formal discussion in App. A.7. Unfortunately, minimizing ω^σ_G is an NP-hard problem (Thm. A.3 in App. A.9). Instead, we estimate the optimal σ_0 using the following heuristic³ approach based on a graph breadth-first search.

Algorithm Description. Alg. 1 above proceeds as follows. We first compute the group assignment, G, based on the public auxiliary information and desired threshold r, following the discussion in Sec. 4.2 (Step 1). Then we construct σ_0 with a breadth-first search (BFS) graph traversal. We translate G into an undirected graph (V, E), where the vertices are indices V = [n] and two indices i, j are connected by an edge if they are both in some group (Step 2). Next, σ_0 is computed via a breadth-first search traversal (Step 4): if the k-th node in the traversal is i, then σ_0(k) = i. The rationale is that neighbors of i (members of G_i) would be traversed in close succession. Hence, a neighboring node j is likely to be traversed at some step h near k, which means |σ_0⁻¹(i) − σ_0⁻¹(j)| = |h − k| would be small (resulting in low width). Additionally, starting from the node with the highest degree (Steps 3-4), which corresponds to the largest group in G (a lower bound for ω^σ_G for any σ), helps to curtail the maximum width in σ_0. See App. A.16 for evaluations of this heuristic's approximation.

³ The heuristics only affect σ_0 (and utility). Once σ_0 is fixed, Δ is computed exactly as discussed above.

This is followed by the computation of the dispersion parameter, θ, for our Mallows model (Steps 5-6). Next, we sample a permutation from the Mallows model (Step 7), σ̂ ∼ P_θ(σ : σ_0), and we apply the inverse reference permutation to it, σ′ = σ_0⁻¹ σ̂, to obtain the desired permutation for shuffling. Recall that σ̂ is (most likely) close to σ_0, which is unrelated to the original order of the data. σ_0⁻¹ therefore brings σ′ back to a shuffled version of the original sequence (identity permutation σ_I). Note that since Alg. 1 is publicly known, the adversary/analyst knows σ_0. Hence, even in the absence of this step from our algorithm, the adversary/analyst could perform it anyway. Finally, we permute y according to σ′ and output the result z = σ′(y) (Steps 9-10).
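The following is a minimal, self-contained Python sketch of Alg. 1 (our own illustration, not the authors' released implementation). It builds the BFS reference permutation, derives the Kendall's-tau sensitivity from the width, and samples the Mallows permutation with the repeated insertion method (Doignon et al., 2004), which is one standard way to sample that model; all helper names are ours, and the group assignment is assumed to be precomputed (Step 1).

```python
import math
import random
from collections import deque

def bfs_reference(groups):
    """Heuristic reference permutation sigma_0 (Steps 2-4): BFS over the group graph,
    starting from the owner of the largest group (remaining components handled the same way)."""
    n = len(groups)
    adj = [set() for _ in range(n)]
    for i, g in enumerate(groups):
        for j in g:
            if j != i:
                adj[i].add(j)
                adj[j].add(i)
    order, seen = [], set()
    for root in sorted(range(n), key=lambda i: -len(groups[i])):
        if root in seen:
            continue
        queue = deque([root]); seen.add(root)
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order                                        # order[k] = sigma_0(k+1), 0-indexed

def kendall_sensitivity(sigma0, groups):
    """Delta(sigma_0 : d_tau, G) = w(w+1)/2, where w is the width of sigma_0 w.r.t. G."""
    pos = {v: k for k, v in enumerate(sigma0)}          # sigma_0^{-1}
    w = max(max(pos[a] for a in g) - min(pos[a] for a in g) for g in groups)
    return w * (w + 1) / 2

def sample_mallows(n, theta, rng):
    """Repeated insertion sampler: P(pi) proportional to exp(-theta * d_tau(pi, identity))."""
    seq = []
    for i in range(1, n + 1):
        weights = [math.exp(-theta * (i - j)) for j in range(1, i + 1)]  # (i - j) new inversions
        j = rng.choices(range(1, i + 1), weights=weights)[0]
        seq.insert(j - 1, i - 1)                        # items stored 0-indexed
    return seq                                          # seq[k] = pi(k+1), 0-indexed

def dsigma_shuffle(y, groups, alpha, rng=random):
    """Minimal sketch of Alg. 1: shuffle y with a Mallows permutation centred on sigma_0."""
    sigma0 = bfs_reference(groups)
    delta = kendall_sensitivity(sigma0, groups)         # Step 5
    if delta == 0:                                      # all groups singletons: nothing to shuffle
        return list(y)
    theta = alpha / delta                               # Step 6
    pi = sample_mallows(len(y), theta, rng)
    sigma_hat = [pi[sigma0[k]] for k in range(len(y))]  # Step 7: sigma_hat ~ Mallows(theta, sigma_0)
    inv0 = {v: k for k, v in enumerate(sigma0)}
    sigma_prime = [inv0[v] for v in sigma_hat]          # Step 8: sigma_0^{-1} composed with sigma_hat
    return [y[k] for k in sigma_prime]                  # Steps 9-10
```

Because Kendall's tau is invariant to a common relabeling of positions, composing the identity-centred sample with sigma_0 on the right yields a draw from the Mallows model centred on sigma_0, matching Step 7.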
Theorem 4.3. Alg. 1 is (α, G)-dσ-private, where α = θ · Δ(σ_0 : d, G).

The proof is in App. A.11. Note that Alg. 1 provides the same level of privacy (α) for any two group assignments G, G′ as long as they have the same sensitivity, i.e., Δ(σ_0 : d_τ, G) = Δ(σ_0 : d_τ, G′). This leads to the following theorem, which generalizes the privacy guarantee to any group assignment.

Theorem 4.4. Alg. 1 satisfies (α′, G′)-dσ-privacy for any group assignment G′ with α′ = α · Δ(σ_0 : d, G′) / Δ(σ_0 : d, G) (proof in App. A.12).

Note. Producing σ′ is completely data (y) independent. It only requires access to the public auxiliary information t. Hence, Steps 1-6 can be performed in a pre-processing phase and do not contribute to the actual running time. See App. A.10 for an illustration of Alg. 1 and a runtime analysis.

5 EVALUATION

Figure 4: Our scheme interpolates between standard LDP (orange line) and uniform shuffling (blue line) in both privacy and data learnability. Panels: (a) PUDF: attack, (b) Twitch: attack, (c) PUDF: learnability, (d) Twitch: learnability. All plots increase group size along the x-axis (except (d)). (a)-(b): The fraction of participants vulnerable to an inferential attack. (c)-(d): The accuracy of a calibration model trained on z predicting the distribution of LDP outputs at any point t ∈ T, such as the distribution of medical insurance types used specifically in the Houston area (not possible when uniformly shuffling across Texas).

The previous sections describe how our shuffling framework interpolates between standard LDP and uniform random shuffling. We now evaluate this experimentally by asking the following two questions. Q1. Does the Alg. 1 mechanism protect against realistic inference attacks? Q2. How well can Alg. 1 tune a model's ability to learn trends within the shuffled data, i.e., tune data learnability? We evaluate on four datasets. We are not aware of any prior work that provides comparable local inferential privacy.
Hence, we baseline our mechanism with the two extremes: standard LDP and uniform random shuffling. For concreteness, we detail our procedure with the PUDF dataset (PUD) (license), which comprises n ≈ 29k psychiatric patient records from Texas. Each data owner's sensitive value x_i is their medical payment method, which is reflective of socioeconomic class (such as medicaid or charity). Public auxiliary information t ∈ T is the hospital's geolocation. Such information is used in practice to understand how payment methods (and payment amounts) vary from town to town across insurances (Eric Lopez, 2020). Uniform shuffling across Texas precludes such analyses. Standard LDP risks inference attacks, since patients attending hospitals in the same neighborhood have similar socioeconomic standing and use similar payment methods, allowing an adversary to correlate their noisy y_i's. To trade these off, we apply Alg. 1 with d(·) being the distance (km) between hospitals, α = 4, and the Kendall's τ rank distance measure for permutations.

Our inference attack predicts DO_i's x_i by taking a majority vote of the z_j values of the 25 data owners who are within r of t_i and who are most similar to DO_i w.r.t. some additional privileged auxiliary information t^p_j ∈ T^p. For PUDF, this includes the 25 data owners who attended hospitals that are within r km of DO_i's hospital and are most similar in payment amount t^p_j. Using an ε = 2.5 randomized response mechanism, we resample the LDP sequence y 50 times, and apply Alg. 1's chosen permutation to each, producing 50 z's. We then mount the majority vote attack on each x_i for each z. If the attack on a given x_i is successful across 90% of these LDP trials, we mark that data owner as vulnerable: although they randomize with LDP, there is a 90% chance that a simple inference attack can recover their true value. We record the fraction of vulnerable data owners as ρ. We report 1-standard-deviation error bars over 10 trials.
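A sketch of how the vulnerability fraction ρ can be computed from the resampled, shuffled outputs is given below. This is our own illustration of the procedure just described; the array shapes, the 1-D privileged attribute, and the function name are assumptions, and the actual evaluation code is in the authors' supplementary material.

```python
import numpy as np

def vulnerable_fraction(x, z_trials, t, tp, r, k=25, success_rate=0.9):
    """Fraction rho of data owners whose true bit the k-nearest majority vote recovers
    in at least `success_rate` of the resampled-and-shuffled LDP trials."""
    n = len(x)
    geo = np.linalg.norm(t[:, None, :] - t[None, :, :], axis=-1)   # distances on public aux. info
    hits = np.zeros(n)
    for i in range(n):
        cand = np.flatnonzero((geo[i] <= r) & (np.arange(n) != i))
        voters = cand[np.argsort(np.abs(tp[cand] - tp[i]))][:k]    # most similar privileged info
        if voters.size == 0:
            continue
        for z in z_trials:                                         # one z per resampled LDP run
            if (z[voters].mean() > 0.5) == bool(x[i]):             # majority vote hits the truth
                hits[i] += 1
    return float(np.mean(hits / len(z_trials) >= success_rate))    # rho
```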
The user s history of explicit language is private X = {0, 1}. T is a user s mutual friendships, i.e. ti is the i th row of the graph s adjacency matrix. We do not have any TP here and select the 25 neighbors randomly. Syn. This is a synthetic dataset of size 20K which can be classified at three granularities 8-way, 4-way and 2-way (Fig. 1a shows a scaled down version of the dataset). The eight color labels are private X = [8]; the 2D-positions are public T = R2. For learnability, we measure the accuracy of 8-way, 4-way and 2-way GBDT models trained on z on an equal sized test set at each r. Adult (Dua & Graff, 2017). This dataset is derived from the 1994 Census and has 33K records. Whether DOi s annual income is 50k is considered private, X = { 50k, < 50k}. T = [17, 90] is age and TP is the individual s marriage status. Due to lack of space figures are in App. A.14.2. Experimental Results. Q1. Our formal guarantee on the inferential privacy loss (Thm. 4.1) is described w.r.t to a strong adversary (with access to {y Gi}, y Gi). Here, we test how well does our proposed scheme (Alg. 1) protect against inference attacks on real-world datasets without any such assumptions. Additionally, to make our attack more realistic, the adversary has access to extra privileged auxiliary information TP which is not used by Alg. 10. Fig. 4a 4b show that our scheme significantly reduces the attack efficacy. For instance, ρ is reduced by 2.7X at the attack distance threshold r for PUDF. Figure 5: Syn: Learnability Additionally, ρ for our scheme varies from that of LDP4 (minimum privacy) to uniform shuffle (maximum privacy) with increasing r (equivalently group size as in Fig. 4b) thereby spanning the entire privacy spectrum. As expected, ρ decreases with decreasing privacy parameter α (Fig. 8b). Q2. Fig.4c 4d show that λ varies from that of LDP (maximum learnability) to that of uniform shuffle (minimum learnability) with increasing r (equivalently, group size), thereby providing tunability. Interestingly, for Adult our scheme reduces ρ by 1.7X at the same λ as that of LDP for r = 1 (Fig. 8c). Fig. 5 shows that the distance threshold r defines the granularity at which the data can be classified. LDP allows 8-way classification while uniform shuffling allows none. The granularity of classification can be tuned by our scheme r8, r4 and r2 mark the thresholds for 8-way, 4-way and 2-way classifications, respectively. 6 CONCLUSION We have proposed a new privacy definition, dσ-privacy that casts new light on the inferential privacy benefits of shuffling and a novel shuffling mechanism to achieve the same. 4Our scheme gives lower ρ than LDP at r = 0 because the resulting groups are non-singletons. For instance, for PUDF, Gi includes all individuals with the same zipcode as DOi. Published as a conference paper at ICLR 2022 7 ETHIS STATEMENT It is the aim of this paper to formalize the enhanced privacy guarantees offered by shuffling, to provide intuition of what those formal guarantees semantically offer to data owners, and to demonstrate an algorithm + experiments which offer these guarantees while meeting analyst utility requirements. We feel that all of these aims as well as the public datasets used are ethical. 8 REPRODUCIBILITY STATEMENT The majority of this paper formalizes a novel perspective on the privacy guarantees achieved by shuffling (i.e. randomizing the order of the data as opposed to the values). Detailed proofs as well as intuitive discussions are provided in the Appendix. All datasets are public. 
A .zip file demonstrating code of each experiment has been uploaded as supplementary material. Hospital discharge data public use data file. https://www.dshs.state.tx.us/THCIC/ Hospitals/Download.shtm. Derangement. https://en.wikipedia.org/wiki/Derangement. Simonetti N. Vazacopoulos A. Balas, E. Job shop scheduling with setup times, deadlines and precedence constraints. J Sched, 11:253 262, 2008. URL https://doi.org/10.1007/ s10951-008-0067-7. Victor Balcer and Albert Cheu. Separating local shuffled differential privacy via histograms. In ITC, 2020. Borja Balle, James Bell, Adrià Gascón, and Kobbi Nissim. The privacy blanket of the shuffle model. In Alexandra Boldyreva and Daniele Micciancio (eds.), Advances in Cryptology CRYPTO 2019, pp. 638 667, Cham, 2019. Springer International Publishing. ISBN 978-3-030-26951-7. R. Bassily, A. Groce, J. Katz, and A. Smith. Coupled-worlds privacy: Exploiting adversarial uncertainty in statistical data privacy. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pp. 439 448, 2013. doi: 10.1109/FOCS.2013.54. Raghav Bhaskar, Abhishek Bhowmick, Vipul Goyal, Srivatsan Laxman, and Abhradeep Thakurta. Noiseless database privacy. In Dong Hoon Lee and Xiaoyun Wang (eds.), Advances in Cryptology ASIACRYPT 2011, pp. 215 232, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-25385-0. Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP 17, pp. 441 459, New York, NY, USA, 2017a. Association for Computing Machinery. ISBN 9781450350853. doi: 10.1145/3132747.3132769. URL https://doi.org/10.1145/ 3132747.3132769. Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP 17, pp. 441 459, New York, NY, USA, 2017b. ACM. ISBN 978-1-4503-5085-3. doi: 10.1145/3132747.3132769. URL http://doi.acm.org/10.1145/3132747.3132769. Rui Chen, Benjamin C. Fung, Philip S. Yu, and Bipin C. Desai. Correlated network data publication via differential privacy. The VLDB Journal, 23(4):653 676, August 2014. ISSN 1066-8888. doi: 10. 1007/s00778-013-0344-8. URL https://doi.org/10.1007/s00778-013-0344-8. Albert Cheu, Adam Smith, Jonathan Ullman, David Zeber, and Maxim Zhilyaev. Distributed differential privacy via shuffling. In Yuval Ishai and Vincent Rijmen (eds.), Advances in Cryptology EUROCRYPT 2019, pp. 375 403, Cham, 2019. Springer International Publishing. ISBN 978-3030-17653-2. Published as a conference paper at ICLR 2022 Krzysztof M Choromanski, Tony Jebara, and Kui Tang. Adaptive anonymity via b-matching. Advances in Neural Information Processing Systems, 26:3192 3200, 2013. Tore Dalenius. Towards a methodology for statistical disclosure control. Statistik Tidskrift, 15: 429 444, 1977. Shaleen Deep and Paraschos Koutris. Ranked enumeration of conjunctive query results. In 24th International Conference on Database Theory (ICDT 2021). Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021. Shaleen Deep, Xiao Hu, and Paraschos Koutris. Enumeration algorithms for conjunctive queries with projection. In 24th International Conference on Database Theory (ICDT 2021). 
Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2021. Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems 30, pp. 3571 3580. Curran Associates, Inc., 2017. URL http://papers.nips.cc/paper/ 6948-collecting-telemetry-data-privately.pdf. Jean-Paul Doignon, Aleksandar Pekeˇc, and Michel Regenwetter. The repeated insertion model for rankings: Missing link between two subset choice models. Psychometrika, 69(1):33 54, March 2004. ISSN 1860-0980. doi: 10.1007/BF02295838. URL https://doi.org/10.1007/ BF02295838. Katerina Doka, Mingqiang Xue, Dimitrios Tsoumakos, and Panagiotis Karras. k-anonymization by freeform generalization. In Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security, pp. 519 530, 2015. Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. URL http://archive. ics.uci.edu/ml. Cynthia Dwork and Moni Naor. On the difficulties of disclosure prevention in statistical databases or the case for differential privacy. January 2010. Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., pp. 211 407, August 2014. ISSN 1551-305X. Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating Noise to Sensitivity in Private Data Analysis, volume 3876. March 2006. ISBN 978-3-54032731-8. URL https://www.microsoft.com/en-us/research/publication/ calibrating-noise-to-sensitivity-in-private-data-analysis/. Gary Claxton Eric Lopez. Comparing private payer and medicare payment rates for select inpatient hospital services. Kaiser Family Foundation, Jul 2020. Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacypreserving ordinal response. In CCS, 2014. Úlfar Erlingsson, Vitaly Feldman, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, and Abhradeep Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 19, pp. 2468 2479, USA, 2019. Society for Industrial and Applied Mathematics. Alexandre Evfimievski, Johannes Gehrke, and Ramakrishnan Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the Twenty-second ACM SIGMOD-SIGACTSIGART Symposium on Principles of Database Systems, PODS 03, pp. 211 222, New York, NY, USA, 2003. ACM. ISBN 1-58113-670-6. doi: 10.1145/773153.773174. URL http: //doi.acm.org/10.1145/773153.773174. Giulia Fanti, Vasyl Pihur, and Úlfar Erlingsson. Building a rappor with the unknown: Privacypreserving learning of associations and data dictionaries, 2015. Published as a conference paper at ICLR 2022 Vitaly Feldman, Audra Mc Millan, and Kunal Talwar. Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling, 2020. Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pp. 1189 1232, 2001. Johannes Gehrke, Edward Lui, and Rafael Pass. Towards privacy for social networks: A zeroknowledge based definition of privacy. In Proceedings of the 8th Conference on Theory of Cryptography, TCC 11, pp. 432 449, Berlin, Heidelberg, 2011. Springer-Verlag. ISBN 9783642195709. Johannes Gehrke, Michael Hay, Edward Lui, and Rafael Pass. Crowd-blending privacy. 
In Reihaneh Safavi-Naini and Ran Canetti (eds.), Advances in Cryptology CRYPTO 2012, pp. 479 496, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg. ISBN 978-3-642-32009-5. Joseph Geumlek and Kamalika Chaudhuri. Profile-based privacy for locally private computations. In IEEE International Symposium on Information Theory, ISIT 2019, Paris, France, July 7-12, 2019, pp. 537 541. IEEE, 2019. doi: 10.1109/ISIT.2019.8849549. URL https://doi.org/10. 1109/ISIT.2019.8849549. Arpita Ghosh and Robert Kleinberg. Inferential privacy guarantees for differentially private mechanisms. Co RR, abs/1603.01508, 2016. URL http://arxiv.org/abs/1603.01508. Andy Greenberg. Apple s differential privacy is about collecting your data but not your data. Wired, Jun 13 2016. Krzysztof Grining and Marek Klonowski. Towards extending noiseless privacy: Dependent data and more practical approach. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS 17, pp. 546 560, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450349444. doi: 10.1145/3052973.3052992. URL https://doi.org/10.1145/3052973.3052992. Xi He, Ashwin Machanavajjhala, and Bolin Ding. Blowfish privacy: Tuning privacy-utility tradeoffs using policies. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD 14, pp. 1447 1458, New York, NY, USA, 2014. Association for Computing Machinery. ISBN 9781450323765. doi: 10.1145/2588555.2588581. URL https://doi.org/10.1145/2588555.2588581. S. Kasiviswanathan and A. Smith. On the semantics of differential privacy: A bayesian formulation. J. Priv. Confidentiality, 6, 2014. S. P. Kasiviswanathan, H. K. Lee, K. Nissim, S. Raskhodnikova, and A. Smith. What can we learn privately? In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pp. 531 540, 2008. doi: 10.1109/FOCS.2008.27. Yusuke Kawamoto and Takao Murakami. Differentially private obfuscation mechanisms for hiding probability distributions. Co RR, abs/1812.00939, 2018. URL http://arxiv.org/abs/ 1812.00939. Daniel Kifer. Attacks on privacy and de Finetti s theorem. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, SIGMOD 09, pp. 127 138, Providence, Rhode Island, USA, June 2009. Association for Computing Machinery. ISBN 978-1-60558-551-2. doi: 10.1145/1559845.1559861. URL https://doi.org/10.1145/1559845.1559861. Daniel Kifer and Ashwin Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, SIGMOD 11, pp. 193 204, New York, NY, USA, 2011. Association for Computing Machinery. ISBN 9781450306614. doi: 10.1145/1989323.1989345. URL https://doi.org/10.1145/1989323.1989345. Daniel Kifer and Ashwin Machanavajjhala. Pufferfish: A framework for mathematical privacy definitions. ACM Trans. Database Syst., 39(1), January 2014. ISSN 0362-5915. doi: 10.1145/ 2514689. URL https://doi.org/10.1145/2514689. Rachel La Corte. Supreme court: State employee birthdates are public record. https://apnews. com/article/c1ff652f271947b2884dfe1216a11bc2/, 2019. Published as a conference paper at ICLR 2022 Katrina Ligett, Charlotte Peale, and Omer Reingold. Bounded-Leakage Differential Privacy. In Aaron Roth (ed.), 1st Symposium on Foundations of Responsible Computing (FORC 2020), volume 156 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 10:1 10:20, Dagstuhl, Germany, 2020. Schloss Dagstuhl Leibniz-Zentrum für Informatik. ISBN 978-3-95977-142-9. 
doi: 10. 4230/LIPIcs.FORC.2020.10. URL https://drops.dagstuhl.de/opus/volltexte/ 2020/12026. Changchang Liu, Supriyo Chakraborty, and Prateek Mittal. Dependence makes you vulnberable: Differential privacy under dependent tuples. In NDSS. The Internet Society, 2016. URL http: //dblp.uni-trier.de/db/conf/ndss/ndss2016.html#Liu MC16. Ashwin Machanavajjhala, Daniel Kifer, Johannes Gehrke, and Muthuramakrishnan Venkitasubramaniam. L-diversity: Privacy beyond k-anonymity. ACM Transactions on Knowledge Discovery from Data, 1(1):3 es, March 2007. ISSN 1556-4681. doi: 10.1145/1217299.1217302. URL https://doi.org/10.1145/1217299.1217302. C. L. MALLOWS. NON-NULL RANKING MODELS. I. Biometrika, 44(1-2):114 130, 06 1957. ISSN 0006-3444. doi: 10.1093/biomet/44.1-2.114. URL https://doi.org/10.1093/ biomet/44.1-2.114. Alexandru Niculescu-Mizil and Rich Caruana. Predicting good probabilities with supervised learning. In Proceedings of the 22nd international conference on Machine learning, pp. 625 632, 2005. Benedek Rozemberczki, Carl Allen, and Rik Sarkar. Multi-scale attributed node embedding. ar Xiv preprint ar Xiv:1909.13021, 2019. URL http://snap.stanford.edu/data/ twitch-social-networks.html. Shuang Song, Yizhen Wang, and Kamalika Chaudhuri. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD 17, pp. 1291 1306, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450341974. doi: 10.1145/3035918.3064025. URL https: //doi.org/10.1145/3035918.3064025. Latanya Sweeney. Achieving k-anonymity privacy protection using generalization and suppression. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(5):571 588, October 2002. ISSN 0218-4885. doi: 10.1142/S021848850200165X. URL https://doi. org/10.1142/S021848850200165X. Tamir Tassa, Arnon Mazza, and Aristides Gionis. k-concealment: An alternative model of k-type anonymity. Trans. Data Priv., 5(1):189 222, 2012. M. C. Tschantz, S. Sen, and A. Datta. Sok: Differential privacy as a causal property. In 2020 IEEE Symposium on Security and Privacy (SP), pp. 354 371, 2020. doi: 10.1109/SP40000.2020.00012. Stanley L Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60 60, no. 309:63 69, 1965. Wai Kit Wong, Nikos Mamoulis, and David Wai Lok Cheung. Non-homogeneous generalization in privacy preserving data publishing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, pp. 747 758, 2010. Xiaokui Xiao and Yufei Tao. Anatomy: Privacy and Correlation Preserving Publication. January 2006. Mingqiang Xue, Panagiotis Karras, Chedy Raïssi, Jaideep Vaidya, and Kian-Lee Tan. Anonymizing set-valued data by nonreciprocal recoding. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1050 1058, 2012. Bin Yang, Issei Sato, and Hiroshi Nakagawa. Bayesian differential privacy on correlated data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD 15, pp. 747 762, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450327589. doi: 10.1145/2723372.2747643. URL https://doi.org/10.1145/ 2723372.2747643. Published as a conference paper at ICLR 2022 Wanrong Zhang, Olga Ohrimenko, and Rachel Cummings. Attribute privacy: Framework and mechanisms, 2020. T. Zhu, P. Xiong, G. Li, and W. Zhou. 
Correlated differential privacy: Hiding information in non-iid data set. IEEE Transactions on Information Forensics and Security, 10(2):229 242, 2015. doi: 10.1109/TIFS.2014.2368363. A.1 BACKGROUND CNTD. A.2 LOCAL INFERENTIAL PRIVACY Local inferential privacy captures what information a Bayesian adversary Kifer & Machanavajjhala (2014), with some prior, can learn in the LDP setting. Specifically, it measures the largest possible ratio between the adversary s posterior and prior beliefs about an individual s data after observing a mechanism s output . Definition A.1. (Local Inferential Privacy Loss Kifer & Machanavajjhala (2014)) Let x = x1, , xn and let y = y1, , yn denote the input (private) and output sequences (observable to the adversary) in the LDP setting. Additionally, the adversary s auxiliary knowledge is modeled by a prior distribution P on x. The inferential privacy loss for the input sequence x is given by LP(x) = max i [n] a,b X log Pr P[y|xi = a] Pr P[y|xi = b] = max i [n] a,b X log Pr P[xi = a|y] Pr P[xi = b|y] log Pr P[xi = a] Pr P[xi = b] Bounding LP(x) would imply that the adversary s belief about the value of any xi does not change by much even after observing the output sequence y. This means that an informed adversary does not learn much about the individual i s private input upon observation of the entire private dataset y. Here we define two rank distance measures Definition A.2 (Kendall s τ Distance). For any two permutations, σ, π Sn, the Kendall s τ distance dτ(σ, π) counts the number of pairwise disagreements between σ and π, i.e., the number of item pairs that have a relative order in one permutation and a different order in the other. Formally, dτ(σ, π) = (i, j) : i < j, σ(i) > σ(j) π(i) < π(j) σ(i) < σ(j) π(i) > π(j) (5) For example, if σ = (1 2 3 4 5 6 7 8 9 10) and π = (1 2 3 6 5 4 7 8 9 10), then dτ(σ, π) = 3. Next, Hamming distance measure is defined as follows. Definition A.3 (Hamming Distance). For any two permutations, σ, π Sn, the Hamming distance d H(σ, π) counts the number of positions in which the two permutations disagree. Formally, d H(σ, π) = i [n] : σ(i) = π(i) Repeating the above example, if σ = (1 2 3 4 5 6 7 8 9 10) and π = (1 2 3 6 5 4 7 8 9 10), then d H(σ, π) = 2. A.3 dσ-PRIVACY AND THE DE FINETTI ATTACK We now show that a strict instance of dσprivacy is sufficient for thwarting any de Finetti attack Kifer (2009) on individuals. The de Finetti attack involves a Bayesian adversary, who, assuming some degree of correlation between data owners, attempts to recover the true permutation from the shuffled data. As written, the de Finetti attack assumes the sequence of sensitive attributes and side information (x1, t1), . . . , (xn, tn) are exchangeable: any ordering of them is equally likely. By the de Finetti theorem, this implies that they are i.i.d. conditioned on some latent measure θ. To balance privacy with utility, the x sequence is non-uniformly randomly shuffled w.r.t. the t sequence producing a shuffled sequence z, which the adversary observes. Conditioning on z the adversary updates their posterior on θ (i.e. posterior on a model predicting xi|ti), and thereby their posterior Published as a conference paper at ICLR 2022 predictive on the true x. The definition of privacy in Kifer (2009) holds that the adversary s posterior beliefs are close to their prior beliefs by some metric on distributions in X, δ( , ): δ Pr[xi], Pr[xi|z] α We now translate the de Finetti attack to our setting. 
First, to align notation with the rest of the paper we provide privacy to the sequence of LDP values y since we shuffle those instead of the x values as in Kifer (2009). We use max divergence (multiplicative bound on events used in DP ) for δ: Pr[yi O] eα Pr[yi O|z] Pr[yi O|z] eα Pr[yi O] which, for compactness, we write as Pr[yi O] α Pr[yi O|z] . (6) We restrict ourselves to shuffling mechanisms, where we only randomize the order of sensitive values. By learning the unordered values {y} alone, an adversary may have arbitrarily large updates to its posterior (e.g. if all values are identical), breaking the privacy requirement above. With this in mind, we assume the adversary already knows the unordered sequence of values {y} (which they will learn anyway), and has a prior on permutations σ allocating values from that sequence to individuals. We then generalize the de Finetti problem to an adversary with an arbitrary prior on the true permutation σ, and observes a randomize permutation σ from the shuffling mechanism. We require that the adversary s prior belief that σ(i) = j is close to their posterior belief for all i, j [n]: Pr[σ Σi,j] α Pr[σ Σi,j|σ ] i, j [n], σ Sn , (7) where Σi,j = {σ Sn : σ(i) = j}, the set of permutations assigning element j to DOi. Conditioning on any unordered sequence {y} with all unique values, the above condition is necessary to satisfy Eq. equation 6 for events of the form O = {yi = a}, since {yi = a} = {Σi,j} for some j [n]. For any {y} with repeat values, it is sufficient since Pr[yi = a] is the sum of probabilities of disjoint events of the form Pr[σ Σi,k] for various k [n] values. We now show that a strict instance of dσ-privacy satisfies Eq. equation 7. Let b G be any group assignment such that at least one Gi b G includes all data owners, Gi = {1, 2, . . . , n}. Property 1. A ( b G, α)-dσ-private shuffling mechanism σ A satisfies Pr[σ Σi,j] α Pr[σ Σi,j|σ ] for all i, j [n] and all priors on permutations Pr[σ]. Lemma 1. For any prior Pr[σ], Eq. equation 7 is equivalent to the condition P ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] P ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] α ˆσ Σi,j Pr[ˆσ] P ˆσ Σi,j Pr[ˆσ] (8) where the set Σi,j is the complement of Σi,j. Under grouping ˆG, every permutation σa Σi,j neighbors every permutation σb Σi,j, σa ˆG σb, for any i, j. By the definition of dσ-privacy, we have that for any observed permutation σ output by the mechanism: Pr[σ |σ = σa] α Pr[σ |σ = σb] σa Σi,j, σb Σi,j, σ Sn . This implies Eq. 8. Thus, ( b G, α)-dσ-privacy implies Eq. 8, which implies Eq. 7, thus proving the property. Using Lemma 1, we may also show that this strict instance of dσ-privacy is necessary to block all de Finetti attacks: Property 2. A ( b G, α)-dσ-private shuffling mechanism σ A is necessary to satisfy Pr[σ Σi,j] α Pr[σ Σi,j|σ ] for all i, j [n] and all priors on permutations Pr[σ]. Proof. If our mechanism A is not ( b G, α)-dσ-private, then for some pair of true (input) permutations σa = σb and some released permutation σ A, we have that Pr[σ |σb] eα Pr[σ |σa] . Published as a conference paper at ICLR 2022 Under ˆG, all permutations neighbor each other, so σa ˆG σb. Since σa = σb, then for some i, j [n], σa Σi,j and σb Σi,j: one of the two permutations assigns some j to some DOi and the other does not. Given this, we may construct a bimodal prior on the true σ that assigns half its probability mass to σa and the rest to σb, Pr[σa] = Pr[σb] = 1 Therefore, for released permutation σ , the RHS of Eq. 
8 is 1, and the LHS is P ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] P ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] = 1/2 Pr[σ |σb] 1/2 Pr[σ |σa] eα , violating Eq. 8, thus violating Eq. 7, and failing to prevent de Finetti attacks against this bimodal prior. Ultimately, unless we satisfy dσ-privacy shuffling the entire dataset, there exists some prior on the true permutation Pr[σ] such that after observing the shuffled z permuted by σ , the adversary s posterior belief on one permutation is larger than their prior belief by a factor eα. If we suppose that the set of values {y} are all distinct, this means that for some a {y}, the adversary s belief that yi = a is signficantly larger after observing z than it was before. Now to prove Lemma 1: Pr[σ Σi,j] α Pr[σ Σi,j|σ ] Pr[σ Σi,j] α Pr[σ |σ Σi,j] Pr[σ Σi,j] P ˆσ Sn Pr[ˆσ] Pr[σ |ˆσ] X ˆσ Sn Pr[ˆσ] Pr[σ |ˆσ] α Pr[σ |σ Σi,j] ˆσ Sn Pr[ˆσ] Pr[σ |ˆσ] α Pr[σ Σi,j] 1 X ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] + X ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] α Pr[σ Σi,j] 1 X ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] α X ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] 1 Pr[σ Σi,j] 1 ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] P ˆσ Σi,j Pr[ˆσ] Pr[σ |ˆσ] α ˆσ Σi,j Pr[ˆσ] P ˆσ Σi,j Pr[ˆσ] As such, a strict instance of dσ-privacy can defend against any de Finetti attack (i.e. for any prior Pr[σ] on permutations), wherein at least one group Gi G includes all data owners. Furthermore, it is necessary. This makes sense. In order to defend against any prior, we need to significantly shuffle the entire dataset. Without a restriction of priors as in Pufferfish Kifer & Machanavajjhala (2014), the de Finetti attack (i.e. uninformed Bayesian adversaries) is an indelicate metric for evaluating the privacy of shuffling mechanisms: to achieve significant privacy, we must sacrifice all utility. This in many regards is reminiscent of the no free lunch for privacy theorem established in Kifer & Machanavajjhala (2011). As such, there is a need for more flexible privacy definitions for shuffling mechanisms. A.4 ADDITIONAL PROPERTIES OF dσ-PRIVACY Lemma 2 (Convexity). Let A1, . . . Ak : Yn 7 V be a collection of k (α, G)-dσprivate mechanisms for a given group assignment G on a set of n entities. Let A : Yn 7 V be a convex combination of these k mechanisms, where the probability of releasing the output of mechanism Ai is pi, and Pk i=1 pi = 1. A is also (α, G)-dσprivate w.r.t. G. Published as a conference paper at ICLR 2022 Proof. For any (σ, σ ) NG and y Y: Pr[A σ(y) O] = i=1 pi Pr[Ai σ(y) O] i=1 pi Pr[Ai σ (y) O] = Pr[A σ (y) O] Theorem A.1 (Post-processing). Let A : Yn 7 V be (α, G)-dσprivate for a given group assignment G on a set of n entities. Let f : V 7 V be an arbitrary randomized mapping. Then f A : Yn 7 V is also (α, G)-dσprivate. Proof. Let g : V V be a deterministic, measurable function. For any output event Z V , let W be its preimage: W = {v V|g(v) Z}. Then, for any (σ, σ ) NG, Pr h g A σ(y) Z i = Pr h A σ(y) W i eα Pr h A σ (y) W i = eα Pr h g A σ (y) Z i This concludes our proof because any randomized mapping can be decomposed into a convex combination of measurable, deterministic functions Dwork & Roth (2014), and as Lemma 2 shows, a convex combination of (α, G)-dσprivate mechanisms is also (α, G)-dσprivate. Theorem A.2 (Sequential Composition). If A1 and A2 are (α1, G)- and (α2, G)-dσprivate mechanisms, respectively, that use independent randomness, then releasing the outputs A1(y), A2(y) satisfies (α1 + α2, G)-dσprivacy. Proof. We have that A1 : Yn V and A1 : Yn V each satisfy dσ-privacy for different α values. 
Let A : Yn (V V ) output A1(y), A2(y) . Then, we may write any event Z (V V ) as Z Z , where Z V and Z V . We have for any (σ, σ ) NG, Pr A σ(y) Z = Pr A1 σ(y) , A2 σ(y) Z = Pr {A1 σ(y) Z } {A2 σ(y) Z } = Pr {A1 σ(y) Z } Pr {A2 σ(y) Z } eα1+α2Pr {A1 σ (y) Z } Pr {A2 σ (y) Z } = eα1+α2 Pr A σ (y) Z A.5 PROOF FOR THM. 4.1 Theorem 4.1 For a given group assignment G on a set of n data owners, if a shuffling mechanism A : Yn 7 Yn is (α, G)-dσprivate, then for each data owner DOi, i [n], max i [n] a,b X log Pr P[xi = a|z, {y Gi}, y Gi] Pr P[xi = b|z, {y Gi}, y Gi] log Pr P[xi = a|{y Gi}, y Gi] Pr P[xi = b|{y Gi}, y Gi] for a prior distribution P, where z = A(y) and y Gi is the noisy sequence for data owners outside Gi. Published as a conference paper at ICLR 2022 Proof. We prove the above by bounding the following equivalent expression for any i [n] and a, b X. Pr P[z|xi = a, {y Gi}, y Gi] Pr P[z|xi = b, {y Gi}, y Gi] R Pr P[y|xi = a, {y Gi}, y Gi] Pr A[z|y]dy R Pr P[y|xi = b, {y Gi}, y Gi] Pr A[z|y]dy σ Sr Pr P[σ(y Gi)|xi = a, y Gi] Pr A[z|σ(y Gi), y Gi] P σ Sr Pr P[σ(y Gi)|xi = b, y Gi] Pr A[z|σ(y Gi), y Gi] max {σ,σ Sr} Pr A[z|σ(y Gi), y Gi] Pr A[z|σ (y Gi), y Gi] max {σ,σ NGi} Pr A[z|σ(y)] Pr A[z|σ (y)] The second line simply marginalizes out the full noisy sequence y. The third line reduces this to a sum over permutations of of y Gi, where r = |Gi| and y Gi is any fixed permutation of values {y Gi}. This is possible since we are given the values outside the group, y Gi, and the unordered set of values inside the group, {y Gi}. Note that the permutations σ written here are possible permutations of the LDP input, not permutations output by the mechanism applied to the input as sometimes written in other parts of this document. The fourth line uses the fact that the numerator and denominator are both convex combinations of Pr A[z|σ(y Gi), y Gi] over all σ Sr. The fifth line uses the fact that for any y Gi, (σ(y Gi), y Gi) Gi (σ (y Gi), y Gi) . This allows a further upper bound over all neighboring sequences w.r.t. Gi, and thus over any permutation of y Gi, as long as it is the same in the numerator and denominator. Discussion The above Bayesian analysis measures what can be learned about DOi s xi from observing the private release z relative to some other known information (the conditioned information). Under dσ-privacy, we condition on the bag of LDP values in Alice s group {y Gi} as well as the sequence (order and value) of LDP values outside her group y Gi. This implies that releasing the shuffled sequence z cannot provide much more information about Alice s xi than would releasing the LDP values outside her neighborhood (her group) and the unordered bag of LDP values inside her neighborhood, regardless of the adversary s prior knowledge P. This is a communicable guarantee: if Alice feels comfortable with the data collection knowing that her entire neighborhood s responses will be uniformly shuffled together (including those of her household), then she ought to be comfortable with dσ-privacy. Now, we have to provide this guarantee to Bob, a neighbor of Alice, as well as Luis, a neighbor of Bob but not of Alice. Thus, Bob, Alice and Luis have distinct and overlapping groups (neighborhoods). Hence, the trivial solution of uniformly shuffling the noisy responses of every group separately does not work in this case. 
dσ-privacy, however, offers the above guarantee to each user (knowing that their entire neighborhood is nearly uniformly shuffled) while still maintaining utility (estimate disease prevalence within neighborhoods). Semantically, this is very powerful, since it implies that the noisy responses specific to one s household cannot be leveraged to infer one s disease state xi. A.6 PROOF OF THEOREM 4.2 Theorem 4.2 For A(M(x)) = z where M( ) is ϵ-LDP and A( ) is α - dσprivate, we have Pr[DAdv loses] r k k e (2kϵ+α) Pr[DAdv wins] for any input subgroup I Gi, r = |Gi| and k < r/2. Published as a conference paper at ICLR 2022 Proof. We first focus on deterministic adversaries and then expand to randomized adversaries afterwards using the fact that randomized adversaries are mixtures of deterministic ones. Our adversary DAdv is then defined by a deterministic decision function η : Yn [n]k. Upon observing z, η(z) selects k elements in z which it believes originated from I Gi. In the following, let Prz be the probability of events conditioned on the shuffled output sequence z, where randomness is over the ϵ-LDP mechanism M and the α-dσ-private shuffling mechanism A. 5 The adversary wins if it reidentifies > k 2 of the LDP values originating from I. Let H = η(z) be the indices of elements in z selected by η. Let W = {σ Sn : |σ(H) I| > k 2} be the set of permutations where the adversary wins and let L = {σ Sn : σ(H) I| k 2} be the set of permutations where the adversary loses. Pr z [η(z) wins] = Pr z [σ W] Pr z [η(z) loses] = Pr z [σ L] where σ is the shuffling permutation produced by A, z = σ(y) i.e. zi = yσ(i). Concretely, this is equivalent to DOi releasing DOσ(i) s LDP response. Since the permutation and LDP outputs are randomized, many subgroups of size k in Gi could have produced the LDP values (z H1, . . . , z Hk) and then been mapped to H by a permutation. Concretely, there is a reasonable probability that Alice s household output the LDP values of another k-member household in her neighborhood and they output her household s LDP values. In the worst case, this is e 2kϵ less likely than without swapping values, by group DP guarantees. Since both households are part of the same group Gi, the permutation that maps her household to elements H in the output is close in probability to that which maps the other household to elements H in the output. As such, we have in the worst case a e (2kϵ+α) reduction in probability of the other household having swapped LDP values with Alice s and permuting to subset H. The above provides intuition on how we could get the same output z many different ways, and how Alice s household could or could not contribute to elements H. It does not, however, explain why an adversary who is given output z has limited advantage in choosing a subset H such that they recover most of Alice s household s values. We formalize this fact as follows. We may rewrite the probabilities of winning or losing by marginalizing out all possible LDP sequences y. Conditioning on the output sequence z, the only possible LDP sequences y are permutations of z. Note that the probability of any sequence y is determined by the input x and the LDP mechanism M: Pr z [η(z) loses] = Pr z [σ W] σ W Pr A(x) = y = σ 1(z) Pr[σ|y]/ Pr[z] Note that Prz[σ|y] = Prz[σ] for the mallows mechanism, which chooses its permutations independently of y. Now consider when η(z) loses. 
By similar arguments as above: Pr z [η(z) loses] = Pr z [σ L] σ L Pr A(x) = y = σ 1(z) Pr[σ|y]/ Pr[z] The odds of losing versus winning is given by Prz[η(z) loses] Prz[η(z) wins] = P σ L Pr h A(x) = y = σ 1(z) i Pr[σ |y] P σ W Pr[A(x) = y = σ 1(z)] Pr[σ|y] We now show that for each σ in the denominator, we may construct m = r k k distinct permutations σ in the numerator that are close in probability to it. Lemma 3. For every σ W there exists a set of m = r k k permutations, E(σ), such that 5As an abuse of notation, we assume the output space of the LDP randomizers, Y, have outcomes with non-zero measure e.g. randomized response. The following analysis can be expanded to continuous outputs (with outcomes of zero measure) by simply replacing the output sequence z Yn with an output event Z Yn. Published as a conference paper at ICLR 2022 2. σ 1 Gi σ 1 3. E(σa) E(σb) = for any pair σa, σb W 4. Pr A(x) = y = σ 1(z) e2kϵ Pr h A(x) = y = σ 1(z) i for any x X n and any z Yn Proof. Given σ W, we construct E(σ) by first taking the inverse σ 1. Recall that, since σ W, we have that |σ 1(I) H| > k 2. (σ 1(i) = j could be interpreted as data owner i s LDP value will be output at position j). We then divide the remainder of the group Gi\I into m disjoint subsets of size k each, J1, J2, . . . , Jm. These represent the other distinct subsets of size k that Alice s household could swap LDP values with. We then produce m permutations, σ 1 1 , . . . , σ 1 m , by making σ 1 i (I) = σ 1(Ji) and σ 1 i (Ji) = σ 1(I) (preserving order within those subsets) and σ 1 = σ 1 everywhere else. On the first point, we know that every σ E(σ) is also in L. We know this because σ 1 i (I) = σ 1(Ji). Since σ W, we have that |σ 1(Ji) H| < k 2 since |σ 1(I) H| k 2 and I Ji = by definition. Thus, |σ 1 i (I) H| < k 2, so |σ i(H) I| < k 2 and σ i L. On the second point, we know that the inverse permutations are neighboring σ 1 Gi σ 1 simply by construction they only differ on elements in Gi. On the third point, we know that the sets E(σa) and E(σb) are distinct since we can map any permutation σ E(σa) uniquely back to σa for any σa W. We do so by taking its inverse σ 1, finding which subset Ji has majority elements from H i.e. |σ 1(Ji) H| > k 2. Swap elements back: σ 1(Ji) with σ 1(I). Invert back to σa. On the fourth point, we know that σ 1(z) and σ 1(z) differ on at most 2k indices. As such, by group DP guarantees, we know that their probabilities must be close to a factor of e 2kϵ regardless of z and x. Using the above Lemma we may bound the odds of losing vs. winning. Prz[η(z) loses] Prz[η(z) wins] = σ L Pr h A(x) = y = σ 1(z) i Pr[σ |y] P σ W Pr[A(x) = y = σ 1(z)] Pr[σ|y] σ E(σ) Pr h A(x) = y = σ 1(z) i Pr[σ |y] P σ W Pr[A(x) = y = σ 1(z)] Pr[σ|y] σ E(σ) Pr h A(x) = y = σ 1(z) i Pr[σ |y] Pr[A(x) = y = σ 1(z)] Pr[σ|y] k e (2kϵ+α) where the last line follows from the fourth point of the above Lemma (for the 2kϵ term) and the fact that the inverse permutations σ 1, σ 1 are neighboring (second point of the Lemma) so the probabilities of the mechanism to produce σ vs. σ to reach z from these neighboring permutations must be close by a factor of eα. Since the above holds for any z and x, the bound holds on average across all outcomes z, thus Pr[η loses] r k k e (2kϵ+α) Pr[η wins] for any deterministic adversary with decision function η. Finally, we may write any probabilistic adversary as mixture of decision functions. By convexity (same argument used in Lemma 2), the above bound still holds. 
As such, Published as a conference paper at ICLR 2022 Pr[DAdv loses] r k k e (2kϵ+α) Pr[DAdv wins] A.7 UTILITY OF SHUFFLING MECHANISM We now introduce a novel metric, (η, δ)-preservation, for assessing the utility of any shuffling mechanism. Let S [n] correspond to a set of indices in y. The metric is defined as follows. Definition A.4. ((η, δ)-preservation) A shuffling mechanism A : Yn 7 Yn is defined to be (η, δ)- preserving (η, δ [0, 1]) w.r.t to a given subset S [n], if Pr |Sσ S| η |S| 1 δ, σ Sn (9) where z = A(y) = σ(y) and Sσ = {σ(i)|i S}. For example, consider S = {1, 4, 5, 7, 8}. If A( ) permutes the output according to σ = (5 3 2 6 7 9 8 1 4 10), then Sσ = {5, 6, 7, 8, 1} which preserves 4 or 80% of its original indices. This means that for any data sequence y, at least η fraction of its data values corresponding to the subset S overlaps with that of shuffled sequence z with high probability (1 δ). Assuming, {y S} = {yi|i S} and {z S} = {zi|i S} = {yσ(i)|i S} denotes the set of data values corresponding to S in data sequences y and z respectively, we have Pr |{y S} {z S}| η |S| 1 δ, y. For example, let S be the set of individuals from Nevada. Then, for a shuffling mechanism that provides (η = 0.8, δ = 0.1)-preservation to S, with probability 0.9, 80% of the values that are reported to be from Nevada in z are genuinely from Nevada. The rationale behind this metric is that it captures the utility of the learning allowed by dσ-privacy if S is equal to some group G G, (η, δ) preservation allows overall statistics of G to be captured. Note that this utility metric is agnostic of both the data distribution and the analyst s query. Hence, it is a conservative analysis of utility which serves as a lower bound for learning from {z S}. We suspect that with the knowledge of the data distribution and/or the query, a tighter utility analysis is possible. A formal utility analysis of Alg. 10 is presented in App. A.13. Empirical evaluation of (η, δ) - preservation is presented in App. A.14. A.8 DISCUSSION ON PROPERTIES OF MALLOWS MECHANISM Property 3. For group assignment G, a mechanism A( ) that shuffles according to a permutation sampled from the Mallows model Pθ,d( ), satisfies (α, G)-dσprivacy where (σ0 : d, G) = max (σ,σ ) NG |d(σ0σ, σ0) d(σ0σ , σ0)| and α = θ (σ0 : d, G) We refer to (σ0 : d, G) as the sensitivity of the rank-distance measure d( ) Proof. Consider two permutations of the initial sequence y, σ1(y), σ2(y) that are neighboring w.r.t. some group Gi G, σ1 Gi σ2. Additionally consider any fixed released shuffled sequence z. Let Σ1, Σ2 be the set of permutations that turn σ1(y), σ2(y) into z, respectively: Σ1 = {σ Sn : σσ1(y) = z} Σ2 = {σ Sn : σσ2(y) = z} . In the case that {y} consists entirely of unique values, Σ1, Σ2 will each contain exactly one permutation, since only one permutation can map σi(y) to z. Lemma 4. For each permutation σ 1 Σ1 there exists a permutation in σ 2 Σ2 such that σ 1 Gi σ 2 . Proof follows from the fact that since only the elements j Gi differ in σ1(y) and σ2(y) only those elements need to differ to achieve the same output permutation. In other words, we may define σ 1, σ 2 at all inputs i / Gi identically, and then define all inputs i Gi differently as needed. As such, they are neighboring w.r.t. Gi. Published as a conference paper at ICLR 2022 Recalling that Alg. 1 applies σ 1 0 to the sampled permutation, we must sample σ0σ 1 (for some σ 1 Σ1) for the mechanism to produce z from σ1(y). 
Formally, since σ 1σ1(y) = z we must sample σ0σ 1 to get z since we are going to apply σ 1 0 to the sampled permutation. Pr A σ1(y) = z = Pθ,d σ0σ , σ Σ1 : σ0 Pr A σ2(y) = z = Pθ,d σ0σ , σ Σ2 : σ0 Taking the odds, we have Pθ,d σ0σ , σ Σ1 : σ0 Pθ,d σ0σ , σ Σ2 : σ0 = σ Σ1 PΘ,d(σ0σ : σ0) P σ Σ2 PΘ,d(σ0σ : σ0) σ Σ1 e θd(σ0σ ,σ0) P σ Σ2 e θd(σ0σ ,σ0) e θd(σ0σa,σ0) e θd(σ0σb,σ0) eθ|d(σ0σa,σ0) d(σ0σb,σ0)| where σa = arg max σ Σ1 e θd(σ0σ ,σ0) and σa = arg min σ Σ2 e θd(σ0σ ,σ0) . Therefore, setting α = θ , we achieve (α, G)-dσprivacy. Property 4. The sensitivity of a rank-distance is an increasing function of the width ωσ0 G . For instance, for Kendall s τ distance dτ( ), we have (σ0 : dτ, G) = ωσ0 G (ωσ0 G +1) 2 . To show the sensitivity of Kendall s τ, we make use of its triangle inequality. Proof. Recall from the proof of the previous property that the expression d(σ, σ0) = d σ0σ, σ0 , where d is the actual rank distance measure e.g. Kendall s τ. As such, we require that d(σ0σa, σ0) d(σ0σb, σ0) ωσ0 G (ωσ0 G + 1) 2 for any pair of permutations (σa, σb) NG. For any group Gi G, let Wi n represent the smallest contiguous subsequence of indices in σ0 that contains all of Gi. For instance, if σ0 = [2, 4, 6, 8, 1, 3, 5, 7] and Gi = {2, 6, 8}, then Wi = {2, 4, 6, 8}. Then the group width width is ωi = |Wi| 1 = 3. Now consider two permutations neighboring w.r.t. Gi, σa Gi σb, so only the elements of Gi are shuffled between them. We want to bound d(σ0σa, σ0) d(σ0σb, σ0) For this, we use a pair of triangle inequalities: d(σ0σa, σ0σb) d(σ0σa, σ0) d(σ0σb, σ0) & d(σ0σa, σ0σb) d(σ0σb, σ0) d(σ0σa, σ0) so, d(σ0σa, σ0) d(σ0σb, σ0) d(σ0σa, σ0σb) Since σ0σa and σ0σb only differ in the contiguous subset Wi, the largest number of discordant pairs between them is given by the maximum Kendall s τ distance between two permutations of size ωi + 1: |d(σ0σa, σ0σb)| ωi(ωi + 1) 2 Since ωσ0 G ωi for all Gi G, we have that (σ0 : d, G) ωσ0 G (ωσ0 G + 1) 2 Published as a conference paper at ICLR 2022 A.9 HARDNESS OF COMPUTING THE OPTIMUM REFERENCE PERMUTATION Theorem A.3. The problem of finding the optimum reference permutation, i.e., σ 0 = arg minσ Sn ωσ G is NP-hard. Proof. We start with the formal representation of the problem as follows. Optimum Reference Permutation Problem. Given n subsets G = {Gi 2[n], i [n]}, find the permutation σ 0 = arg minσ Sn ωσ G. Now, consider the following job-shop scheduling problem. Job Shop Scheduling. There is one job J with n operations oi, i [n] and n machines such that oi needs to run on machine Mi. Additionally, each machine has a sequence dependent processing time pi. Let S be the sequence till There are n subsets Si [n], each corresponding to a set of operations that need to occur in contiguous machines, else the processing times incur penalty as follows. Let pi denote the processing time for the machine running the i-th operation scheduled. Let Si be the prefix sequence with i schedulings. For instance, if the final scheduling is 1 3 4 5 9 8 10 6 7 2 then S4 = 1345. Additionally, let P j Si be the shortest subsequence such of Si such that it contains all the elements in Sj {Si}. For example for S1 = {3, 5, 7}, P 1 S4 = 345. pi = max i [n](|P j Si| |Sj {Si}|) (10) The objective is to find a scheduling for J such that it minimizes the makespan, i.e., the completion time of the job. Note that pn = maxi pi, hence the problem reduces to minimizing pn. Lemma 5. The aforementioned job shop scheduling problem with sequence-dependent processing time is NP-hard. Proof. 
Consider the following instantiation of the sequence-dependent job shop scheduling problem where the processing time is given by $p_i = p_{i-1} + w_{kl}$ with $p_1 = 0$, where $S_i[i-1] = k$, $S_i[i] = l$, and $w_{kl}$ denotes an associated weight for the pair $(k, l)$. This problem is equivalent to the travelling salesman problem (TSP) (Balas, 2008) and is therefore NP-hard. Thus, our aforementioned job shop scheduling problem is also clearly NP-hard.

Reduction: Let the n subsets $S_i$ correspond to the groups in G. Clearly, minimizing $\omega^{\sigma}_{G}$ minimizes $p_n$. Hence, the optimal reference permutation gives the solution to the scheduling problem as well.

A.10 ILLUSTRATION OF ALG. 1

We now provide a small-scale, step-by-step example of how Alg. 1 operates. Fig. 6a is an example of a grouping G on a dataset of n = 8 elements. The group of DOi includes i and its neighbors; for instance, G8 = {8, 3, 5}. To build a reference permutation, Alg. 1 starts at the index with the largest group, i = 5 (highlighted in purple), with G5 = {5, 2, 3, 8, 4}. As shown in Figure 6b, σ0 is then constructed by following a BFS traversal from i = 5: each j ∈ G5 is visited, queuing up the neighbors of each j ∈ G5 that have not yet been visited along the way, and so on. The algorithm completes once the entire graph has been visited. The goal is to produce a reference permutation in which the width ωi of each group is small. In this case, the width of the largest group G5 is as small as it can be, ω5 = 5 − 1 = 4. However, the width of G4 = {4, 5, 7} is the maximum possible since σ0⁻¹(5) = 1 and σ0⁻¹(7) = 8, so ω4 = 7. This is difficult to avoid when the maximum group size is large compared to the full dataset size n. Realistically, we expect n to be significantly larger, leading to relatively smaller groups. With the reference permutation in place, we compute the sensitivity:
$$\Delta(\sigma_0 : d, G) = \frac{\omega_4(\omega_4 + 1)}{2} = \frac{7 \cdot 8}{2} = 28$$

Figure 6: Illustration of Alg. 1. (a) Group graph. (b) BFS reference permutation σ0.

This lets us set θ = α/28 for any given privacy value α. To reiterate, a lower θ results in more randomness in the mechanism. We then sample the permutation ˆσ = Pθ,d(σ0). Suppose ˆσ = [3 2 5 4 8 1 7 6]. Then the released sequence is
$$z = \sigma'(y) = \sigma_0^{-1}\hat{\sigma}(y) = [y_1\ y_2\ y_5\ y_8\ y_3\ y_7\ y_6\ y_4]$$
One can think of the above operation as follows: what was 5 in the reference permutation (σ0(1) = 5) is 3 in the sampled permutation (ˆσ(1) = 3), so index 5, corresponding to DO5, now holds DO3's noisy data y3. As such, we shuffle mostly between members of the same group, and minimally between groups. The runtime of this mechanism is dominated by the Repeated Insertion Model (RIM) sampler (Doignon et al., 2004), which takes O(n²) time. More efficient samplers may well exist, but RIM is standard and simple to implement for this first proposed mechanism. Additionally, the majority of this time is spent computing sampling parameters, which can be stored in advance with O(n²) memory. Furthermore, sampling from a Mallows model with some reference permutation σ0 is equivalent to sampling from a Mallows model with the identity permutation and applying the result to σ0. As such, permutations may be sampled in advance, and the runtime is dominated by the computation of σ0, which takes O(|V| + |E|) time (in the number of vertices and edges of the group graph). A future direction could be exploring even better heuristics for computing σ0; one possibility could be based on ranked enumeration of permutations (Deep & Koutris, 2021; Deep et al., 2021).
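For readers who want to experiment with the sampling step described above, the following is a minimal Python sketch, with our own naming and interface, of a Repeated Insertion Model sampler for the Mallows distribution Pθ,dτ(· ; σ0) in the style of Doignon et al. (2004); it is not the authors' implementation. Inserting the i-th item of σ0 at position j creates i − j discordant pairs, so the sampled permutation has probability proportional to exp(−θ · dτ(·, σ0)).

```python
import math
import random

def sample_mallows(sigma0, theta):
    """Sample a permutation from the Mallows model P_{theta, d_tau}(. ; sigma0)
    via the Repeated Insertion Model. Each insertion of the i-th item of sigma0
    at position j (1 <= j <= i) adds (i - j) discordant pairs, so the sampled
    permutation sigma has probability proportional to exp(-theta * d_tau(sigma, sigma0))."""
    phi = math.exp(-theta)
    out = []
    for i, item in enumerate(sigma0, start=1):
        # Insertion position j in {1, ..., i}, with P(j) proportional to phi**(i - j).
        weights = [phi ** (i - j) for j in range(1, i + 1)]
        j = random.choices(range(1, i + 1), weights=weights, k=1)[0]
        out.insert(j - 1, item)
    return out

# Illustrative usage only: the reference permutation below is hypothetical, while the
# width 7 and sensitivity 28 mirror the worked example in App. A.10.
sigma0 = [5, 2, 3, 8, 4, 7, 6, 1]
alpha, width = 1.0, 7
delta_sens = width * (width + 1) / 2   # Kendall's tau sensitivity = 28
theta = alpha / delta_sens
print(sample_mallows(sigma0, theta))
```

Alg. 1 would then compose the sampled permutation with σ0⁻¹ before applying it to y, as described above.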
Published as a conference paper at ICLR 2022 A.11 PROOF OF THM. 4.3 Theorem 4.3 Alg. 1 is (α, G)-dσ private. Proof. The proof follows from Prop. 3. Having computed the sensitivity of the reference permutation σ0, , and set θ = α/ , we are guaranteed by Property 3 that shuffling according to the permutation ˆσ guarantees (α, G)-dσprivacy. A.12 PROOF OF THM. 4.4 Theorem 4.4 Alg. 1 satisfies (α , G )-dσprivacy for any group assignment G where α = α (σ0:d,G ) Proof. Recall from Property 3 that we satisfy (α, G) dσ-privacy by setting θ = α/ (σ0 : d, G). Given alternative grouping G with sensitivity (σ0 : d, G ), this same mechanism provides α = θ (σ0 : d, G ) = α/ (σ0 : d, G) (σ0 : d, G ) = α (σ0 : d, G ) (σ0 : d, G) A.13 FORMAL UTILITY ANALYSIS OF ALG. 1 Theorem A.4. For a given set S [n] and Hamming distance metric, d H( ), Alg. 1 is (η, δ)- preserving for δ = 1 ψ(θ,d H) Pn h=2k+1(e θ h ch) where k = (1 η) |S| and ch is the number of permutations with hamming distance h from the reference permutation that do not preserve η% of S and is given by max(ls, h/2 ) X " min(ls j,h 2j) X f(i, j) n ls j h 2j i f(h 2j i, j)! f(i, 0) =!i, f(0, q) = q! j! f(i q, q) ls = |S|, k = (1 η) ls, !n = n! Proof. Let ls = |S| denote the size of the set S and k = (1 η) l S denote the maximum number of correct values that can be missing from S. Now, for a given permutation σ Sn, let h denote its Hamming distance from the reference permutation σ0, i.e, h = d H(σ, σ0). This means that σ and σ0 differ in h indices. Now, h can be analysed in the the following two cases, Case I. h 2k + 1 For (1 η) fraction of indices to be removed from S, we need at least k + 1 indices from S to be replaced by k + 1 values from outside S. This is clearly not possible for h 2k + 1. Hence, here ch = 0. Published as a conference paper at ICLR 2022 Case II. h > 2k For the following analysis we consider we treat the permutations as strings (multi-digit numbers are treated as a single string character). Now, Let Sσ0 denote the non-contiguous substring of σ0 such that it consists of all the elements of S, i.e., |S| = l S (11) i [l S], Sσ0[i] S (12) Let Sσ denote the substring corresponding to the positions occupied by Sσ0 in σ. Formally, |Sσ| = l S (13) i [l S], Sσ0[i] = σ(σ 1 0 (Sσ0[i])) (14) For example, for σ0 = (1 2 3 5 4 7 8 10 9 6), σ = (1 3 2 7 8 5 4 6 10 9) and S = {2, 4, 5, 8}, we have Sσ0 = 2548 and Sσ = 3784 where h = d H(σ, σ0) = 9. Let {Sσ} denote the set of the elements of string Sσ. Let A be the set of characters in Sσ such that they do not belong to S, i.e, A = {Sσ[i]|Sσ[i] S, i [l S]}. Let B be the set of characters in Sσ that belong to S but differ from Sσ0 in position, i.e., B = {Sσ[i]|Sσ[i] S, Sσ[i] = Sσ0[i], i [l S]}. Additionally, let C = S {Sσ}. For instance, in the above example, A = {3, 7}, B = {4, 8}, C = {2, 5}. Now consider an initial arrangement of p + m distinct objects that are subdivided into two types p objects of Type A and m objects of Type B. Let f(p, m) denote the number of permutations of these p + m objects such that the m Type B objects can occupy any position but no object of Type A can occupy its original position. For example, for f(p, 0) this becomes the number of derangements der denoted as !p = p! 2 . Therefore, f(|B|, |A|) denotes the number of permutations of Sσ such that d H(Sσ0, Sσ) = |A| + |B|. This is because if elements of B are allowed to occupy their original position then this will reduce the Hamming distance. 
Now, let Sσ ( Sσ0) denote the substring left out after extracting from Sσ (Sσ0) from σ (σ0). For example, Sσ = 1256109 and Sσ0 = 1371096 in the above example. Let D be the set of elements outside of S and A that occupy different positions in Sσ and Sσ0 (thereby contributing to the hamming distance), i.e., D = { Sσ0[i]| Sσ0[i] S, Sσ0[i] = Sσ[i], i [n l S]}. For instance, in the above example D = {9, 6, 10}. Hence, h = d H(σ, σ0) = |A| + |B| + |C| + |D| and clearly f(|D|, |C|) represents the number of permutations of Sσ such that d H( Sσ, Sσ0) = |C| + |D|. Finally, we have max(ls, h/2 ) X | {z } # ways of selecting set C | {z } # ways of selecting set A min(ls j,h 2j) X | {z } # ways of selecting set B n ls j h 2j i | {z } # ways of selecting set D f(h 2j i, j) Now, for f(i, j) let E be the set of original positions of Type A that are occupied by Type B objects in the resulting permutation. Additionally, let F be the set of the original positions of Type B objects that are still occupied by some Type B object. Clearly, Type B objects can occupy these |E| + |F| = m in any way they like. However, the type A objects can only result in f(p q, q) permutations. Therefore, f(p, m) is given by the following recursive function f(p, 0) =!p f(0, m) = m! |{z} # ways of selecting set E | {z } # ways of selecting set F m! f(p q, q) Published as a conference paper at ICLR 2022 Thus, the total probability of failure is given by δ = 1 ψ(θ, d H) h=2k+2 (e θ h ch) (15) A.14 ADDITIONAL EXPERIMENTAL DETAILS A.14.1 EVALUATION OF (η, δ)-PRESERVATION (a) Variation with α (b) Variation with ω; α = 3 (c) Variation with l S; α = 3 Figure 7: (η, δ)-Preservation Analysis In this section, we evaluate the characteristics of the (η, δ)-preservation for Kendall s τ distance dτ( , ). Each sweep of Fig. 7 fixes δ = 0.01, and observes η. We consider a dataset of size n = 10K and a subset S of size l S corresponding to the indices in the middle of the reference permutation σ0 (the actual value of the reference permutation is not significant for measuring preservation). For the rest of the discussion, we denote the width of a permutation by ω for notational brevity. For each value of the independent axis, we generate 50 trials of the permutation σ from a Mallows model with the appropriate θ (given the ω and α parameters). We then report the largest η (fraction of subset preserved) that at least 99% of trials satisfy. In Fig. 7a, we see that preservation is highest for higher α and increases gradually with declining width ω and increasing subset size ls. Fig. 7b demonstrates that preservation declines with increasing width. increases quadratically with width ω for dτ, resulting in declining θ and increasing randomness. We also see that larger subset sizes result in a more gradual decline in η. This is due to the fact that the worst-case preservation (uniform random shuffling) is better for larger subsets. i.e. we cannot do worse than 80% preservation for a subset that is 80% of indices. Finally, Fig. 7c demonstrates how preservation grows rapidly with increasing subset size. For large widths, we are nearly uniformly randomly permuting, so preservation will equal the size of the subset relative to the dataset size. For smaller widths, we see that preservation offers diminishing returns as we grow subset size past some critical ls. For ω = 30, we see that subset sizes much larger than a quarter of the dataset gain little in preservation. 
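As an illustrative aid (ours, not the paper's), the sketch below computes the preserved fraction for a single permutation, reproducing the worked example from Definition A.4 (80% of S preserved), and estimates the η achieved at a given δ by Monte Carlo over sampled permutations, which is essentially what the sweeps above report. The `sampler` argument is hypothetical (e.g., a Mallows sampler with the appropriate θ).

```python
def preserved_fraction(sigma, S):
    """Fraction of indices in S whose values remain at indices in S after shuffling
    by sigma (Def. A.4). sigma is 1-indexed conceptually: sigma[i - 1] = sigma(i)."""
    S_sigma = {sigma[i - 1] for i in S}
    return len(S_sigma & S) / len(S)

# Worked example from Definition A.4: 4 of 5 indices are preserved.
sigma = [5, 3, 2, 6, 7, 9, 8, 1, 4, 10]
S = {1, 4, 5, 7, 8}
print(preserved_fraction(sigma, S))  # 0.8

def empirical_eta(sampler, S, delta=0.01, trials=1000):
    """Empirical estimate of the largest eta such that at least a (1 - delta) fraction
    of sampled permutations preserve eta * |S| of S."""
    fracs = sorted(preserved_fraction(sampler(), S) for _ in range(trials))
    return fracs[int(delta * trials)]  # the delta-quantile of preservation
```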
A.14.2 ADULT DATASET

Figure 8: Adult dataset experiments. (a) Adult: Attack. (b) Adult: Attack (α). (c) Adult: Learnability.

A.15 ADDITIONAL RELATED WORK

In this section, we discuss the relevant existing work. The anonymization of noisy responses to improve differential privacy was first proposed by Bittau et al. (2017a), who proposed a principled system architecture for shuffling. This model was formally studied later in Erlingsson et al. (2019); Cheu et al. (2019). Erlingsson et al. (2019) showed that for arbitrary ϵ-LDP randomizers, random shuffling results in privacy amplification. Cheu et al. (2019) formally defined the shuffle DP model and analyzed the privacy guarantees of the binary randomized response in this model. The shuffle DP model differs from our approach in two ways. First, it focuses completely on the DP guarantee: the privacy amplification is manifested in the form of a lower ϵ (roughly by a factor of √n) when viewed in an alternative DP model known as the central DP model (Erlingsson et al., 2019; Cheu et al., 2019; Balle et al., 2019; Feldman et al., 2020; Bittau et al., 2017a; Balcer & Cheu, 2020). Our result, in contrast, caters to local inferential privacy. Second, the shuffle model involves a uniform random shuffling of the entire dataset, whereas in our approach the granularity at which the data is shuffled is tunable, which delineates a threshold for the learnability of the data. A steady line of work has studied the inferential privacy setting (Kasiviswanathan & Smith, 2014; Kifer & Machanavajjhala, 2011; Ghosh & Kleinberg, 2016; Dalenius, 1977; Dwork & Naor, 2010; Tschantz et al., 2020). Kifer & Machanavajjhala (2011) formally studied privacy degradation in the face of data correlations and later proposed a privacy framework, Pufferfish (Kifer & Machanavajjhala, 2014; Song et al., 2017; He et al., 2014), for analyzing inferential privacy. Subsequently, several other privacy definitions have been proposed for the inferential privacy setting (Liu et al., 2016; Yang et al., 2015; Chen et al., 2014; Zhu et al., 2015; Bassily et al., 2013). For instance, Gehrke et al. (2011; 2012) proposed zero-knowledge privacy, which is based on simulation semantics. Bhaskar et al. (2011) proposed noiseless privacy (see also Grining & Klonowski, 2017) by restricting the set of prior distributions that the adversary may have access to. A recent work by Zhang et al. (2020) proposes attribute privacy, which focuses on the sensitive properties of a whole dataset. In another recent work, Ligett et al. (2020) study a relaxation of DP that accounts for mechanisms that leak some additional, bounded information about the database. Early work on local inferential privacy includes profile-based privacy (Geumlek & Chaudhuri, 2019), where the problem setting comes with a graph of data-generating distributions whose edges encode sensitive pairs of distributions that should be made indistinguishable. In another work, Kawamoto & Murakami (2018) propose distribution privacy, i.e., local differential privacy for probability distributions. The major difference between our work and prior research is that we provide local inferential privacy through a new angle: data shuffling. Finally, older works such as k-anonymity Sweeney (2002), l-diversity Machanavajjhala et al.
(2007), and Anatomy Xiao & Tao (2006), and others (Wong et al., 2010; Tassa et al., 2012; Xue et al., 2012; Choromanski et al., 2013; Doka et al., 2015) have studied the privacy risk of non-sensitive auxiliary information, or quasi-identifiers (QIs). In practice, these works focus on the setting of dataset release, whereas we focus on dataset collection. As such, QIs can be manipulated and controlled in that setting, whereas we place no restriction on the amount or type of auxiliary information accessible to the adversary, nor do we control it. Additionally, our work offers each individual formal inferential guarantees against informed adversaries, whereas those works do not. We emphasize this last point since formalized guarantees are critical for providing meaningful privacy definitions. As established by Kifer and Lin in An Axiomatic View of Statistical Privacy and Utility (2012), privacy definitions ought to at least satisfy the post-processing and convexity properties, which our formal definition does.

A.16 EVALUATION OF HEURISTIC

Figure 9: Comparison of our heuristic's performance with that of an optimal reference permutation σ*0. An optimal σ*0 is generated with every group having size w. A graph is generated from this optimal σ*0, from which our heuristic (blue) attempts to reconstruct the optimal permutation. For baselining, the performance of a random σ0 selection is plotted (orange). We observe that, at worst, our heuristic picks a reference permutation with width 2.5× that of the optimal reference permutation (green). See Section 4.4 for the definition of terms.

Alg. 1 is designed to find a reference permutation σ0 with low width ω^σ_G w.r.t. the given grouping G. A low width is desirable, since it leads to low sensitivity Δ(σ0 : d, G), which in turn leads to a higher dispersion parameter θ = α/Δ, and thus less randomness over permutations (higher utility). Theorem A.3 proves that computing the optimal reference permutation (minimum width) is NP-hard. As such, we propose a BFS-based heuristic.

Comparison with optimal reference permutation. To demonstrate the value of the heuristic used in Alg. 1, we provide two evaluations of its performance. For our first evaluation, we compare the performance of our heuristic BFS reference permutation selection (σ0) with that of the optimal reference permutation and that of a random reference permutation. As identified by Theorem A.3, finding the optimal reference permutation for a given grouping G is NP-hard. For these experiments, we first create an optimal reference permutation, where each group Gi ∈ G is equally sized w and maximally compact. The optimal width, ω^σ_G, is then min(n, w). We then generate a graph from this optimal reference permutation. Finally, we run the BFS reference permutation computation described in Alg. 1, attempting to approximate the optimal σ*0, and compute its width. To compare with a naive approach, we also plot the performance of a randomly chosen reference permutation. We expect the maximum width across groups ω^σ_G to be large for this technique: if one of the n groups has a single entry low (near 0) in σ0 and a single entry high (near n) in σ0, the width will be near n. The random baseline is averaged over 10 trials with a 1-standard-deviation envelope plotted (but difficult to see, since the variance is low). Figure 9 depicts our findings. Each plot has a different group size w, listed at the top, used in the optimal reference permutation.
We find that the random baseline (orange) consistently chooses a reference permutation such that ω^σ_G is near n, as expected. Our method (blue), on the other hand, closely tracks the optimal solution (green). We find that, in the worst case, our algorithm's solution has a width 2.5× that of the optimal. Note that for r = 0 (upper left), all methods trivially have a width of one, since the corresponding graph has no edges. While there may be room for improvement, we find this to be sufficient for the present work.

Figure 10: Example of our heuristic's performance on randomly generated graphs. As r increases, so does the connectivity of the random graphs and the average group size (green). As shown by Theorem A.3, computing the optimal ω^σ_G is NP-hard. The average group size (green) in G is a loose lower bound on the optimal ω^σ_G. The performance of a random σ0 assignment (orange) is also plotted for reference. Our heuristic BFS algorithm (blue) consistently outperforms the random baseline.

Performance on randomly generated graphs. For our second evaluation, we observe how well our BFS heuristic (in Alg. 1) performs on randomly generated graphs. Here, we sample n points uniformly on the unit interval. We then say that the i-th point's group, Gi, consists of all other points within r of it. As r increases, so does the group size. Since computing the optimal reference permutation is NP-hard (Theorem A.3), we do not show the optimal width. Instead, we show a loose lower bound on the optimal width (green) by plotting the average group size for a given r (recall that the width is greater than or equal to the largest group size, so we expect this to be a loose lower bound, shown solely for reference). For comparison, we also evaluate the performance of a random σ0 choice. For both of these methods, we run 10 trials of generating a random graph (and picking a random σ0) at each value of n and plot the mean along with a 1-standard-deviation envelope, which is difficult to see due to low variance. Figure 10 depicts our findings. We find that across values of n and r our heuristic (blue) significantly outperforms the random baseline (orange). Additionally, we observe the trends we expect. For low r values, our heuristic BFS algorithm chooses a σ0 with width close to the lower bound (green) of the optimal width ω^σ_G. As r increases, the graph becomes significantly more connected, and both the lower bound and our heuristic move closer to the width of the random baseline. Note that for r = 0 (upper left), all methods trivially have a width of one, since the corresponding graph has no edges. Ultimately, these findings indicate that our heuristic for computing σ0 significantly outperforms a naive random choice, and follows the same trend as the lower bound of the optimal.
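To close, here is a minimal sketch, under our own assumptions about data layout, of the BFS-based reference-permutation heuristic evaluated above; it is a reconstruction from the description in App. A.10, not the authors' code. The toy grouping reuses G5, G8, and G4 from App. A.10, while the remaining neighbor sets are hypothetical.

```python
from collections import deque

def bfs_reference_permutation(groups):
    """groups[i] is the set of data owners in DO i's group (including i itself).
    Returns a reference permutation sigma0 (owners listed in BFS order), starting
    from the owner with the largest group, as in the illustration of App. A.10."""
    start = max(groups, key=lambda i: len(groups[i]))
    sigma0, visited = [], set()
    queue = deque([start])
    while queue:
        i = queue.popleft()
        if i in visited:
            continue
        visited.add(i)
        sigma0.append(i)
        queue.extend(sorted(j for j in groups[i] if j not in visited))
    # Append any owners in components not reached from `start`.
    sigma0.extend(i for i in groups if i not in visited)
    return sigma0

def width(sigma0, groups):
    """Maximum group width under sigma0: the span of each group's positions in sigma0."""
    pos = {owner: p for p, owner in enumerate(sigma0)}
    return max(max(pos[j] for j in G) - min(pos[j] for j in G) for G in groups.values())

# Toy grouping (n = 8); only G5, G8, and G4 match App. A.10, the rest are hypothetical.
groups = {
    5: {5, 2, 3, 8, 4}, 2: {2, 5}, 3: {3, 5, 8}, 8: {8, 3, 5},
    4: {4, 5, 7}, 7: {7, 4, 1}, 1: {1, 7, 6}, 6: {6, 1},
}
sigma0 = bfs_reference_permutation(groups)
print(sigma0, width(sigma0, groups))
```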