# Inflow, Outflow, and Reciprocity in Machine Learning

Mukund Sundararajan 1   Walid Krichene 2

1 Google   2 Google Research. Correspondence to: Mukund Sundararajan.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).

Abstract

Data is pooled across entities (individuals or enterprises) to create machine learning models, and sometimes, the entities that contribute the data also benefit from the models. Consider for instance a recommender system (e.g. Spotify, Instagram or YouTube), a health care app that predicts the risk for some disease, or a service built by pooling data across enterprises. In this work we propose a framework to study this value exchange, i.e., we model and measure contributions (outflows), benefits (inflows) and the balance between contributions and benefits (the degree of reciprocity). We show theoretically, and via experiments, that under certain distributional assumptions, some classes of models are approximately reciprocal. These results only scratch the surface; we conclude with several open directions.

1. Introduction

Machine learning (henceforth ML) depends on training data, and as machine learning and the artificial intelligence powered by it have increased in importance, there is an interest in understanding the role of data. This data may be collected from raters who are compensated for labeling the data, or from end-users of products powered by ML, or from enterprises that own and license data.

Different approaches have been considered to study the role of data. Techniques such as (Jia et al., 2019; Ghorbani & Zou, 2019; Hara et al., 2019; Pruthi et al., 2020; Hoaglin & Welsch, 1978; Koh & Liang, 2017a; Yeh et al., 2018) measure the contribution of data to model quality. Data has also been viewed through the lens of privacy (for instance, (Dwork & Roth, 2014)), which asks whether an individual's contributions reveal something about the individual.

However, notice that both of these views are one-sided: they are concerned with the contributions of the individual to the system, but do not take into consideration how much the individual benefits from the system. In this work, we study contributions, benefits, and the balance between them.

1.1. Running Examples

In some applications, the agents that contribute data to a system also benefit from it. We would like to study this inflow of value to the user, and the resulting exchange of value stemming from contributions and benefits.

Recommender Systems: Popular recommender systems such as Spotify, Instagram or YouTube recommend items (e.g. music, videos, posts) to their users. Past interactions of an individual with items are used to learn not only the preferences of the individual, but also the characteristics of the item, that are then used to recommend it to other individuals. This is the premise of collaborative filtering (Koren & Bell, 2015). Indeed, an individual's data helps others and vice-versa. Furthermore, this applies to any system that leverages user behavior (such as clicks on content, or the length of engagement with the content) to determine recommendations or rankings.

Healthcare: Alternatively, consider initiatives that collect health data from several individuals and use this to build models that predict sleep, or the risk of disease (Perez et al., 2019; Gulshan et al., 2016; Wang et al., 2016; Brajer et al., 2020; Rajkomar et al., 2018).
Such models benefit the individual contributing the data, besides their data benefiting other individuals that use the model's predictions.

Federated Learning: Federated Learning (FL) is a distributed training technique where individuals (or larger entities such as enterprises) build shared models in a decentralized way, without the data moving to a central location. This is important in domains where data is scarce and pooling data from different entities can lead to significant quality improvements. One problem in federated learning is the incentives for entities to contribute data. This is particularly true in the cross-silo FL setting, in which larger entities (such as companies) contribute their data to train a shared model. As the survey (Kairouz et al., 2021, Section 2.2) discusses, "Clients might worry that contributing their data to training federated learning models will benefit their competitors, who do not contribute as much but receive the same final model nonetheless."

1.2. Inflow, Outflow and Reciprocity

Generally, an individual[1] contributes data to an ML-based system, and this results in a benefit to other users of the system (this is the individual's outflow); they, in turn, benefit from data contributed by others (this is the individual's inflow). There are various questions one can ask about these inflows and outflows. Are they large or small? Are they positive or negative? (In the recommender system example, an outflow is negative if one individual's data hurts the recommendation quality of another, perhaps because the two have opposing tastes.) Do outflows and inflows vary greatly across individuals? While we touch upon these in our analyses, our main focus is on whether the inflows and outflows are balanced individual by individual; i.e., are they reciprocal.

Reciprocity is a well-studied concept in sociology (Gouldner, 1960), in economics (e.g. Fehr & Gächter (2000)), and in matching markets such as kidney or student exchanges (Gill et al., 2017; Guibert & Rayón, 2021; European Commission et al., 2020). In all of these cases, reciprocity enables systems to function based on (approximately) fair trades, without the need for money to change hands. Indeed, we would like to understand when the value exchange from contributing data to a system is fair in this sense. We say that a system is (approximately) reciprocal if for most individuals, the individual's inflow matches their outflow. This is formalized in Section 2.

1.3. Actionability

While the focus of this work is on measuring inflows, outflows and reciprocity, we briefly discuss some actions that may stem from such an analysis. In the recommender system, federated learning and healthcare examples, individuals (or enterprises) with large outflows and small inflows could perhaps be compensated for their contributions, and conversely, individuals with large inflows and small outflows could pay. This is particularly true for systems in which a user can opt out from data collection entirely, but still use the system, potentially resulting in large unfairness (such individuals would have, by definition, no outflow, but still benefit from other users' data; measuring their inflow quantifies this imbalance). Payments need not be in purely monetary terms.

[1] By individual, we just mean an agent that owns some data. This could also be an enterprise as in the Federated Learning example.
In a recommender system, users with contributions that exceed benefits could be given preferential access to new content, or fewer ads. Another alternative is to design training algorithms that enforce a certain level of reciprocity; see the discussion in Section 6.

1.4. Reciprocity as a Guardrail

Reciprocity should not be thought of as an objective; it is not something we should optimize for in isolation. It is possibly best seen as a guardrail, similar to privacy (Dwork & Roth, 2014) and fairness (Mehrabi et al., 2021). To elaborate on the analogy, privacy (fairness) is often used as a constraint, and the goal is to obtain the best model quality under a certain privacy budget (fairness constraint). Similarly, reciprocity can be used as (one of possibly many) constraints. We are not arguing that all ML should be reciprocal, rather that the inflows, outflows and their imbalance are worth measuring.

1.5. Our Contributions

Our primary contribution is to initiate the study of reciprocity and propose a measurement approach (Section 2). We build on top of previously proposed techniques (e.g. Pruthi et al. (2020); Hoaglin & Welsch (1978); Koh & Liang (2017a); Yeh et al. (2018)) that quantify the influence of individual training examples on individual predictions, and use these to measure contributions (outflow), benefits (inflow) and reciprocity.

Our main theoretical result (Theorem 3.2) states that models trained using Stochastic Gradient Descent, a popular training algorithm for neural networks, are strongly reciprocal under an assumption over the data distribution (Assumption 3.1). One key observation is that the influence measure satisfies certain symmetry properties. We also demonstrate how to compute inflows and outflows efficiently, avoiding a naive quadratic complexity over the number of data points, which would make measurement infeasible (see Section 4). Finally, we perform experiments on one recommendation and two healthcare data sets (Section 5). We observe that reciprocity is still satisfied to a large extent, even though some of the assumptions from Theorem 3.2 are violated.

2. Modeling Reciprocity

2.1. Setup and Notation

Let $Z$ be a training data set of labelled examples. The model is learned using this data and applied to an inference set $Z'$. For an individual $u$, let $Z_u$ be the set of training data examples that belong to $u$; similarly, let $Z'_u$ be the set of inference examples that belong to $u$. Every example $z = (x, y)$ consists of input features $x$ and a label $y$ (the prediction target or the response variable).

2.2. Measuring Inflow, Outflow, and Reciprocity

We propose a measure of reciprocity by building on previously proposed techniques that quantify the influence of training data points on predictions (Pruthi et al., 2020; Hoaglin & Welsch, 1978; Koh & Liang, 2017a; Yeh et al., 2018). An influence technique measures the influence of a single training example $z$ on a single inference example $z'$, and will be denoted $\mathrm{Influence}(z, z')$. We will consider two standard examples: Marginal influence, which generates this measure by deleting training examples, and TracIn influence, which tracks the effect of a training example on a prediction via the parameter changes that occur during training. We will describe these influence techniques in more detail in Section 2.3.

Consider an individual $u$ (the protagonist). We aggregate $\mathrm{Influence}(z, z')$ of the protagonist's training and inference examples as follows.
Inflow is the influence of other individuals' ($v \neq u$) data on the predictions of $u$, i.e.:

$$I_u = \sum_{z \in Z \setminus Z_u} \; \sum_{z' \in Z'_u} \mathrm{Influence}(z, z'). \quad (1)$$

Outflow is the influence of the protagonist $u$'s training data on other individuals:

$$O_u = \sum_{z \in Z_u} \; \sum_{z' \in Z' \setminus Z'_u} \mathrm{Influence}(z, z'). \quad (2)$$

There is also the influence of the protagonist on their own predictions, $\sum_{z \in Z_u} \sum_{z' \in Z'_u} \mathrm{Influence}(z, z')$. But since reciprocity is only concerned with flows between the protagonist and other individuals, this self-influence does not play a role in reciprocity.

We now define reciprocity for an individual, and for a population of individuals.

Definition 2.1. A machine learning model is α-reciprocal for individual $u$ if the ratio of inflow to outflow $I_u / O_u$ is in the range $[\alpha, 1/\alpha]$ for some $\alpha \in [0, 1]$. If the signs of $I_u$ and $O_u$ do not match, we say that the model is 0-reciprocal for individual $u$. A model is $(p, \alpha)$-reciprocal for a population of individuals if it is α-reciprocal for a $p$ fraction of individuals.

Thus, a model is at best $(1, 1)$-reciprocal and at worst $(0, 0)$-reciprocal. Depending on the measure of influence that we use, we will say $(p, \alpha)$-TracIn-reciprocal or $(p, \alpha)$-Marginal-reciprocal.

Another measure one can study is the correlation (e.g. Pearson or Spearman correlation) between the inflow and the outflow across individuals. One important difference between reciprocity and Pearson correlation is that the latter is sensitive to the magnitude of the flows: individuals with large flows dominate the correlation measure, while reciprocity is less sensitive to magnitudes since it is a statement about the ratios of inflows and outflows. Spearman correlation measures the correlation of ranks and is not sensitive to magnitudes, but notice that a high correlation of ranks does not imply that the inflows and outflows are balanced, which is what our measure seeks to capture.

Remark 2.2 (Interpreting Reciprocity). Reciprocity does not require that the outflows (or inflows) be equal or similar across individuals. Indeed, in a recommender system, a frequent user is likely to have larger outflows (and inflows) in comparison to an occasional user. Reciprocity only requires that the outflows and inflows be balanced individual-by-individual, i.e., an individual who contributes a lot benefits a lot, and one who contributes a little benefits a little.[2] Reciprocity does not even require that inflows and outflows be positive. Indeed, in a recommender system, the data of an individual with atypical tastes may hurt the recommendations to other individuals. Reciprocity only demands that if inflow is negative, outflow should be equally negative.

In general, reciprocity can be affected by the choice of model class, and by the data distribution. We give some examples to illustrate.

Example 2.3 (k-Nearest Neighbors). In kNN, there is a distance function on points in feature space, and the prediction of the algorithm is the mean (for regression) or majority (for classification) of the k nearest neighbors, as per the distance function, of the prediction point. Suppose there is an outlier individual (the protagonist) with outlier data points in the feature space. The protagonist will not have any influence on other individuals, because the protagonist's training examples will not appear in the top-k nearest list for prediction points of other individuals. However, the protagonist's predictions will be influenced by the training data of other individuals.
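The definitions above reduce reciprocity measurement to simple bookkeeping once per-individual inflows and outflows are available. The following minimal sketch illustrates the per-individual reciprocity and the $(p, \alpha)$ aggregate of Definition 2.1; the inflow and outflow values (and all names) are purely illustrative and not taken from any experiment.

```python
import numpy as np

def per_user_reciprocity(inflow, outflow):
    """Per-user reciprocity per Definition 2.1: the largest alpha such that
    I_u / O_u lies in [alpha, 1/alpha]; zero when the signs of I_u and O_u differ."""
    inflow, outflow = np.asarray(inflow, float), np.asarray(outflow, float)
    alpha = np.zeros_like(inflow)
    ok = (np.sign(inflow) == np.sign(outflow)) & (inflow != 0) & (outflow != 0)
    ratio = np.abs(inflow[ok] / outflow[ok])
    alpha[ok] = np.minimum(ratio, 1.0 / ratio)
    return alpha

def p_alpha(inflow, outflow, p=0.75):
    """The model is (p, alpha)-reciprocal for the largest alpha achieved by a p
    fraction of individuals, i.e. the (1 - p) quantile of per-user reciprocity."""
    return np.quantile(per_user_reciprocity(inflow, outflow), 1.0 - p)

# Illustrative flows for four individuals (not from any experiment).
inflow = np.array([0.8, 1.5, -0.2, 3.0])
outflow = np.array([1.0, 1.2, 0.4, 2.5])
print(per_user_reciprocity(inflow, outflow), p_alpha(inflow, outflow))
```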
The distribution of interactions matters. If all of an individual's interactions with the system are late in the system's lifetime, this individual will benefit but not contribute. In the analysis, we will make a stationarity assumption to control for this effect; see Assumption 3.1.

[2] Contrast this with differential privacy (Dwork & Roth, 2014), where the goal is asymmetric: one seeks to limit the impact of a user's data on the model (for example via gradient clipping (Abadi et al., 2016)) regardless of the user's benefit from the model.

2.3. Influence Techniques

As discussed earlier, reciprocity is parameterized by a technique that identifies the influence of a training example on a prediction example. There are several ways to measure influence; in this paper, we investigate two methods.

2.3.1. Marginal Influence

Let $M_Z$ be the machine learning model trained on the data set $Z$; $M_Z(x)$ is the model's prediction on a data point $x$. Let $\ell(\hat{y}, y)$ be a loss function measuring the loss for a prediction $\hat{y}$ and label $y$. The marginal influence of a training example $z$ on an inference example $z' = (x', y')$ is based on the counterfactual of removing the example $z$ from the training set:

$$\mathrm{Marginal}(z, z') = \ell(M_{Z \setminus \{z\}}(x'), y') - \ell(M_Z(x'), y'). \quad (3)$$

We will adopt the convention that influence functions measure the reduction in the loss of the inference example from the presence of a training example. Thus a positive quantity connotes that loss was reduced by that amount, and a negative quantity connotes that loss was increased by that magnitude. Computing exact marginal influence often requires retraining the model on the modified data set. However, this can be estimated without retraining, via certain Hessian approximations (Koh & Liang, 2017b).

Remark 2.4. Marginal influence relies on deleting a single individual's data and retraining the model to optimality. This may result in a substantially similar model, if the data set is large. This makes this measure of influence more susceptible to noise, as will be confirmed in our experiments.
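For concreteness, here is a minimal sketch of Equation (3) computed by literal retraining; a ridge regression model stands in for $M_Z$, with squared loss, and the data are placeholders. A practical implementation would instead use the Hessian-based approximation mentioned above.

```python
import numpy as np
from sklearn.linear_model import Ridge

def marginal_influence(X_train, y_train, x_test, y_test, make_model):
    """Equation (3) by literal retraining: loss on (x_test, y_test) without each
    training example, minus the loss with the full training set. Positive values
    mean the training example reduced the inference loss."""
    full = make_model().fit(X_train, y_train)
    loss_full = (full.predict(x_test[None, :])[0] - y_test) ** 2
    influences = []
    for i in range(len(X_train)):
        keep = np.arange(len(X_train)) != i
        reduced = make_model().fit(X_train[keep], y_train[keep])
        loss_without = (reduced.predict(x_test[None, :])[0] - y_test) ** 2
        influences.append(loss_without - loss_full)
    return np.array(influences)

# Illustrative data only; Ridge stands in for the trained model M_Z.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 4)), rng.normal(size=50)
x_new, y_new = rng.normal(size=4), rng.normal()
print(marginal_influence(X, y, x_new, y_new, lambda: Ridge(alpha=1.0)))
```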
2.3.2. TracIn Influence

The second influence measure we use is called TracIn (Pruthi et al., 2020). Whereas Marginal relies on a counterfactual approach, TracIn assigns contributions and benefits based on actual work done during the training process. It is therefore reliant on the training algorithm, and applies only to models trained using Stochastic Gradient Descent (SGD).

Suppose the model $M_w$ is parameterized by a vector $w \in \mathbb{R}^p$, and let $L_z(w) = \ell(M_w(x), y)$ be the loss of the model on an example $z = (x, y)$. In SGD, the weight parameters are updated iteratively: at each iteration $t$, a random batch $B_t$ of training examples is visited, and the parameters are updated in the direction opposite (we are minimizing loss) to the gradient: $w_{t+1} = w_t - \sum_{z \in B_t} \eta_t \nabla L_z(w_t)$; here $\nabla L_z(w_t)$ is the gradient of the loss with respect to the weight parameters $w_t$, and $\eta_t$ is the step-size at time $t$.

The idea of TracIn is as follows: First, suppose that SGD visits examples one at a time, i.e., the batch size is one. Then, the visit (to the training example $z$) changes the model parameters, and this changes the model's loss on the prediction example $z'$. It is natural to attribute this change in loss to the contribution of example $z$ and the benefit of example $z'$. If the model visits a batch $B_t$ of training examples at once, we have to disentangle the outflows of the examples $z \in B_t$. TracIn does this using dot products of gradients: $\eta_t \nabla L_z(w_t) \cdot \nabla L_{z'}(w_t)$. The term $\eta_t \nabla L_z(w_t)$ captures the change in the weight parameters due to example $z$; this is by definition of gradient descent. And $\nabla L_{z'}(w_t)$ models the change in the loss of the prediction example $z'$ due to a change in the weight parameters. The influence of training example $z$ on inference example $z'$ is computed by summing across all the batches in which the example is present:

$$\mathrm{TracIn}(z, z') = \sum_{t:\, z \in B_t} \eta_t \nabla L_z(w_t) \cdot \nabla L_{z'}(w_t). \quad (4)$$

Remark 2.5. The use of gradients entails a first-order approximation. The actual change in loss of the example $z'$ can be written as $L_{z'}(w_{t+1}) = L_{z'}(w_t) + \nabla L_{z'}(w_t) \cdot (w_{t+1} - w_t)$, plus a higher-order term of order $O(\eta_t^2)$ that is ignored. This approximation is reasonable when the step-sizes are small. We measure this discrepancy for our experiments (see Figure 5).

3. Models Trained Using SGD Are TracIn Reciprocal

In this section, we study TracIn reciprocity for models trained using SGD. We make the following assumption for our analysis:

Assumption 3.1. Let there be a population $U$ of individuals, a set $X$ of features and a set $Y$ of labels. We assume that the training and inference sets are both drawn IID from a joint distribution over $U \times X \times Y$.

This is a standard assumption in the machine learning literature. Notice that the assumption allows for different individuals to have different distributions (individuals are not interchangeable). What the assumption requires is that for a given individual, the training and inference distributions be the same, i.e. that an example is equally likely to be in the training or the inference set. We discuss breakages of this assumption further in Section 6.

Theorem 3.2. Consider a model trained with Stochastic Gradient Descent for $T$ steps. Suppose that Assumption 3.1 holds. Furthermore, suppose that batches of training data $(B_t)_{t \in \{1, \ldots, T\}}$ are mutually independent. Then the model is $(1, 1)$-TracIn-reciprocal in expectation, in the sense that for all individuals $u$, $\mathbb{E}[I_u] = \mathbb{E}[O_u]$.

We give a sketch of the argument; the full proof is deferred to the appendix. Given a pair of users $(u, v)$, the TracIn influence at time $t$ of $z \in Z_u$ on $z' \in Z'_v$ is $\eta_t \nabla L_z(w_t) \cdot \nabla L_{z'}(w_t)$, which is symmetric in $z, z'$. Conditioned on the model parameters at time step $t$, $Z_u$ and $Z'_u$ have the same distribution, and $Z_v$ and $Z'_v$ have the same distribution. This allows us to argue that the expected influence of $u$ on $v$ at time $t$ is the same as the expected influence of $v$ on $u$. The proof formalizes this argument.

Remark 3.3. The theorem suggests that reciprocity holds whether the inflows and outflows are positive or negative; indeed, an individual's data could hurt another's predictions if their characteristics are very different. (In our model, negative outflows and inflows would manifest as negative dot-products $\nabla L_z(w_t) \cdot \nabla L_{z'}(w_t)$: what helps one individual hurts the other.) It holds irrespective of the variation in inflows and outflows across individuals; indeed, some individuals may contribute more data than others (depending on the size of $Z_u$).

Remark 3.4. The theorem assumes that at each step of gradient descent, a new set of independent examples is drawn, which precludes revisiting the same example multiple times. In practice, training examples are revisited, which breaks independence. But notice that for the result to hold, it suffices that future samples $B_t$ be independent of the past trajectory $w_0, \ldots, w_{t-1}$. In some regimes, this may be a reasonable approximation.
For example, when the batch size is very large, there is little variance in the gradients, and one can informally treat the trajectory $w_0, \ldots, w_{t-1}$ as being deterministic, unaffected by random sampling. Another regime is when the data set is very large, and training only requires a very small number of passes over the training data. In such cases, independence may be a reasonable approximation. The experiments in Section 5 suggest that models can be approximately reciprocal even when the independence assumption is broken.

Remark 3.5. The proof crucially relies on the symmetry of the influence function $\eta_t \nabla L_z(w_t) \cdot \nabla L_{z'}(w_t)$ in $z, z'$. Thus, any modifications to SGD that break this symmetry may also break reciprocity. In particular, it is common to clip (rescale) gradients to enforce differential privacy (Abadi et al., 2016). This modification breaks the symmetry because clipping affects the gradient update $\nabla L_z(w_t)$, but not the gradient of the inference example $\nabla L_{z'}(w_t)$. The purpose of clipping in this case is to control the influence of any individual on the model; it is therefore expected that reciprocity is broken. Clipping and normalization are also used for stabilizing the training of deep neural networks (Pascanu et al., 2013), which may also break reciprocity.

Remark 3.6 (Homogeneity). If users are homogeneous (i.e., the marginal distributions are identical across users), then by a simple argument, any model, whether it is trained using SGD or not, would be $(1, 1)$-reciprocal. (Indeed, our example of non-reciprocity (Example 2.3) based on k-Nearest Neighbors relies on outliers that differ from other users.) Theorem 3.2 allows for heterogeneity among users.

4. Efficient Computation of TracIn Flows

The definition of Inflows and Outflows as a sum of terms of the form $\mathrm{Influence}(z, z')$ may suggest that one needs to compute this matrix of pairwise influence, which can be prohibitively expensive, but we show that for TracIn, this can be done more efficiently. This optimization was necessary in our experiments.

Suppose we apply minibatch SGD, and let $B_t$ be the batch at step $t$. A naive computation of outflow (2) would suggest that we need to compute, at each step $t$, $\mathrm{Influence}(z, z')$ for all $z \in B_t$ and $z' \in Z'$. This represents $|B_t||Z'|$ dot products, and if $p$ is the dimension of the parameter space, the total cost per step $t$ would be $O(|B_t||Z'|p)$. Observe that by equations (2) and (4), we can write the outflow as

$$O_u = \sum_{z \in Z_u} \sum_{z' \in Z' \setminus Z'_u} \sum_{t:\, z \in B_t} \eta_t \nabla L_z(w_t) \cdot \nabla L_{z'}(w_t)$$
$$= \sum_{z \in Z_u} \sum_{t:\, z \in B_t} \eta_t \nabla L_z(w_t) \cdot \sum_{z' \in Z' \setminus Z'_u} \nabla L_{z'}(w_t)$$
$$= \sum_{z \in Z_u} \sum_{t:\, z \in B_t} \eta_t \nabla L_z(w_t) \cdot \left( \nabla L_{Z'}(w_t) - \nabla L_{Z'_u}(w_t) \right), \quad (5)$$

where, in the last equality, we define, for any subset $S \subseteq Z'$,

$$\nabla L_S(w_t) = \sum_{z' \in S} \nabla L_{z'}(w_t). \quad (6)$$

To compute the outflows of all individuals, from equation (5), it suffices to compute at each step $t$ the following quantities: $\nabla L_z(w_t)$ for $z \in B_t$ (cost $O(p|B_t|)$), $\nabla L_{Z'}(w_t)$ (cost $O(p|Z'|)$), and $\nabla L_{Z'_u}(w_t)$ for $u \in U$ (cost $O(p|Z'|)$), then compute one dot product for each pair $(z, u) \in B_t \times U$ (cost $O(p|B_t||U|)$). The total cost is therefore $O(p(|B_t||U| + |Z'|))$. In summary, the per-step cost is reduced from $O(p|B_t||Z'|)$ to $O(p(|B_t||U| + |Z'|))$. This is an improvement by a factor $\min(|B_t|, |Z'|/|U|)$, leading to a significant improvement when the average number of inference examples per individual is large.
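The following minimal sketch illustrates this bookkeeping for a toy linear least-squares model trained with full-batch gradient descent, where per-example gradients have a closed form; it follows the factored form of Equations (5)-(6) rather than the naive pairwise computation. All data, names and hyper-parameters are illustrative, not the code used in our experiments.

```python
import numpy as np

def tracin_flows(X, y, users, Xp, yp, users_p, eta=1e-3, steps=200):
    """Per-user TracIn inflows and outflows for linear least squares trained with
    full-batch gradient descent. Follows the factored form of Equations (5)-(6):
    per step, one aggregated inference gradient per user, rather than one dot
    product per (training example, inference example) pair."""
    n_users = int(max(users.max(), users_p.max())) + 1
    d = X.shape[1]
    w = np.zeros(d)
    inflow, outflow = np.zeros(n_users), np.zeros(n_users)
    for _ in range(steps):
        g_tr = 2 * (X @ w - y)[:, None] * X      # per-example training gradients
        g_in = 2 * (Xp @ w - yp)[:, None] * Xp   # per-example inference gradients
        g_tr_user = np.zeros((n_users, d)); np.add.at(g_tr_user, users, g_tr)
        g_in_user = np.zeros((n_users, d)); np.add.at(g_in_user, users_p, g_in)
        g_tr_tot, g_in_tot = g_tr.sum(axis=0), g_in.sum(axis=0)
        # Outflow of u: u's training gradients dotted with other users' inference gradients.
        per_example = eta * np.einsum('id,id->i', g_tr, g_in_tot[None, :] - g_in_user[users])
        np.add.at(outflow, users, per_example)
        # Inflow of u: other users' training gradients dotted with u's inference gradients.
        inflow += eta * np.einsum('ud,ud->u', g_tr_tot[None, :] - g_tr_user, g_in_user)
        w -= eta * g_tr_tot                       # the full-batch gradient descent update
    return inflow, outflow

# Illustrative data: 4 users, labels from a shared linear model plus noise.
rng = np.random.default_rng(0)
users, users_p = rng.integers(0, 4, size=60), rng.integers(0, 4, size=40)
X, Xp = rng.normal(size=(60, 5)), rng.normal(size=(40, 5))
w_true = rng.normal(size=5)
y, yp = X @ w_true + 0.1 * rng.normal(size=60), Xp @ w_true + 0.1 * rng.normal(size=40)
inflow, outflow = tracin_flows(X, y, users, Xp, yp, users_p)
print(np.round(inflow, 3), np.round(outflow, 3))
```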
5. Experiments

We perform experiments on a recommender system data set and two health data sets. Here is what we hope to learn from the experiments:

- Theorem 3.2 assumes that data points are visited exactly once during training. Practically, for many use cases, training data are visited more than once. Does reciprocity hold in such scenarios? We will find that it does, though approximately.

- The theorem is a statement about inflows and outflows in expectation. To what extent is there reciprocity for a single realization of the data? We will show that in the recommender system example, there is approximate reciprocity for a single realization, though the degree of reciprocity increases as we average over realizations.

- Can Marginal reciprocity be measured reliably (see Remark 2.4)? Is there agreement between Marginal and TracIn reciprocity? We will find that Marginal reciprocity is noisy for the recommender system example, possibly because the data set is large and an individual has relatively small inflows and outflows. For healthcare datasets, we find that Marginal reciprocity is more stable, and there is a directional agreement between the two measures.

For each experiment, we randomly partition the data into training and inference sets; this partitioning reflects Assumption 3.1. Measurements are averaged across several such random splits.

[Figure 1: Experiment results on the MovieLens data set. (a) Average outflow vs. inflow for all individuals. (b) Histogram of reciprocity across individuals. (c) Time evolution of (p, α)-TracIn-Reciprocity along the optimization trajectory (p = 25, 50, 75). The top row shows averaged results over ten training runs for a single split, while the bottom row shows averages across all runs and all splits (ten runs per split for ten splits).]

5.1. Recommendation Data Set

We conduct experiments on MovieLens data (Harper & Konstan, 2015), specifically, the MovieLens 100K data set with 943 individuals, 1682 items (movies), and 100,000 ratings, i.e., an average of about 106 ratings per individual. Each individual has at least 20 ratings. Each movie is rated on a scale from 1-5. We randomly split the ratings into training and inference sets in the ratio 80:20.[3] We run the experiment on 10 different splits. For each split, we average the measurements across 10 random initializations. We measure inflows, outflows, and (p, α) reciprocity. We also report in Appendix C additional measurements, such as the first-order approximation error in TracIn (recall Remark 2.5), and the signal-to-noise ratio (SNR) of each measure (Appendix C.2). We find that the SNR is much lower for Marginal influence than for TracIn influence, due to the reasons discussed in Remark 2.4. For this reason, we only report results for TracIn.

[3] We train a matrix factorization model with embedding dimension d = 16. We randomly initialize the user and item embeddings. Given the relatively small size of the training data, we use full-batch Gradient Descent, i.e., all examples are visited at every time step. We use the following hyper-parameters, which we tuned on a random split of the data: regularization coefficient λ = 1, number of steps T = 1000, and learning rate η = 0.0002.
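For concreteness, the training setup described in the footnote can be sketched as follows. This is a minimal illustration of the regularized matrix factorization objective under full-batch gradient descent with the stated hyper-parameters, not the exact experimental code; the toy ratings are placeholders.

```python
import numpy as np

def train_mf(ratings, n_users, n_items, d=16, lam=1.0, steps=1000, eta=2e-4, seed=0):
    """Full-batch gradient descent on the regularized matrix factorization objective
    sum_(u,i) (p_u . q_i - r_ui)^2 + lam * (sum_u ||p_u||^2 + sum_i ||q_i||^2)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(n_users, d))   # user embeddings, randomly initialized
    Q = 0.1 * rng.normal(size=(n_items, d))   # item embeddings, randomly initialized
    u, i, r = ratings[:, 0].astype(int), ratings[:, 1].astype(int), ratings[:, 2]
    for _ in range(steps):
        err = np.einsum('nd,nd->n', P[u], Q[i]) - r   # p_u . q_i - r_ui, per rating
        gP, gQ = np.zeros_like(P), np.zeros_like(Q)
        np.add.at(gP, u, 2 * err[:, None] * Q[i])
        np.add.at(gQ, i, 2 * err[:, None] * P[u])
        P -= eta * (gP + 2 * lam * P)
        Q -= eta * (gQ + 2 * lam * Q)
    return P, Q

# Tiny illustrative rating triples (user, item, rating); the real data set has
# 943 users, 1682 movies and 100,000 ratings.
toy = np.array([[0, 0, 5.0], [0, 1, 3.0], [1, 0, 4.0], [2, 1, 1.0]])
P, Q = train_mf(toy, n_users=3, n_items=2, lam=0.1)
print(np.round(P @ Q.T, 2))   # predicted rating matrix
```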
As a sanity check, the quality of the model is consistent with previously reported results. For example, (Zhang et al., 2017; Rashed et al., 2019) report an RMSE of 0.911 using a 90-10 split. Our model has an RMSE of 0.910 on the same split.

Degree of reciprocity. We plot inflows vs. outflows in Figure 1a. Inflows and outflows appear commensurate; large (resp. small) inflows correspond to large (resp. small) outflows. The linear trend is stronger when we average across ten splits (top vs. bottom figure). Regarding reciprocity: for a single split, we find that 75% of the individuals have reciprocities in the range [0.16, 1], i.e. the model is (0.75, 0.16)-TracIn-reciprocal. The correlation between inflow and outflow is 0.89. When averaging across ten splits, the model is (0.75, 0.58)-TracIn-reciprocal, and the correlation between inflows and outflows increases to 0.98. The experiment shows that although the independence assumptions of Theorem 3.2 don't hold (since training examples are revisited), reciprocity is relatively high in practice, both on a single split, and when averaged across splits. It is remarkable that we measure some degree of reciprocity even on a single split (recall that the theorem is a statement in expectation). Averaging results across splits gives an empirical estimate of the expected inflows/outflows, and indeed this averaging increases the measured reciprocity, in line with the theorem.

Signs of inflows and outflows. We also observe that inflows and outflows are largely non-negative. In a single split, 0.7% of individuals have a negative inflow and 19.6% have a negative outflow. When averaging across all splits, 0.1% have a negative inflow and 8% have a negative outflow; on average, these individuals' training examples degrade the prediction quality of other individuals. However, these individuals tend to have small magnitudes of outflow and inflow; this also explains why the correlation measure (which is dominated by flows of large magnitudes) indicates a higher degree of reciprocity than the (p, α) measure.

Effect of training dynamics. Figure 1c shows the time evolution of TracIn-Reciprocity along the optimization trajectory. At initialization, reciprocity is low. It quickly increases during the early phase of training, and as the model nears convergence, reciprocity starts to degrade. This may be due to cancellations in the gradients of training examples: although the aggregate gradient is small, individual training examples may have large gradients, which results in large credits (inflows and outflows) being assigned.

5.2. Healthcare Data Sets

We investigate reciprocity in two healthcare data sets. The first is a data set from (Efron et al., 2004) about predicting diabetes. It has ten features: age, sex, body mass index, average blood pressure, and six blood serum measurements, and the task is to predict disease progression one year after the time of the readings. The data is from 442 individuals.

[Figure 2: Average Marginal-outflow vs. Marginal-inflow for all individuals. (a) Diabetes. (b) Breast cancer.]

The second is a data set from the UCI Machine Learning Repository about predicting breast cancer.
There are thirty features that relate to geometric properties of cell nuclei from a digitized image of a fine needle aspirate of a breast mass. The task is to predict whether the breast cancer is malignant or benign. The data is from 569 individuals.

In both data sets, each individual corresponds to a single data point, unlike the MovieLens data set where individuals correspond to at least 20 data points. Consequently, every individual belongs to exactly one of the training or inference sets; individuals in the training set only have outflows and individuals in the inference set only have inflows. In this case, measuring reciprocity is only possible in expectation over random train/test splits. More precisely, we average measurements over 100 random splits of the data.[4]

[4] On the diabetes prediction task, we train a linear regression model optimized for the mean squared error, with a number of steps T = 200, and a learning rate η = 0.01. On the breast cancer classification task, we train a logistic regression model with a number of steps T = 600 and a learning rate η = 0.1. In both cases, some features have very different scales, so we found it important to normalize all features (using mean and variance computed on the training set).

[Figure 3: Average TracIn-outflow vs. TracIn-inflow for all individuals. (a) Diabetes (200 steps). (b) Breast cancer (600 steps).]

Degree of reciprocity. We compute outflows and inflows using TracIn and Marginal as measures of influence. See Figures 2 and 3. Recall from Remark 2.4 that Marginal is susceptible to noise. We find that the results are not as noisy as in the MovieLens experiment (the 50th percentile of SNR is 0.30, compared to 0.19 in MovieLens), possibly because these data sets are smaller (hence individual data points have larger influence). We report results for both Marginal and TracIn so we can compare them.

We find that the diabetes model is (0.75, 0.36)-Marginal-reciprocal, and (0.75, 0.76)-TracIn-reciprocal. The breast cancer model is (0.75, 0.48)-Marginal-reciprocal and (0.75, 0.93)-TracIn-reciprocal. TracIn reciprocity is large for both experiments, in line with Theorem 3.2, even though here again, the independence assumptions of the theorem don't hold. Finally, although Marginal reciprocity is lower, there is a directional agreement between the two measures. When one is higher, so is the other.

6. Discussion

6.1. Modeling the User

In this work, we assume that the user's utility is modeled by the machine learning objective (the loss function ℓ). There are two reasons why this is not completely true. First, ML objectives are based on observable user behavior, such as the interaction with a piece of content; such observable behavior is not completely reflective of user satisfaction. Second, this ignores any value that is created outside of the ML process. For instance, in a recommender system, besides value derived from the quality of the recommendation model, there is the value created by content creators. Similarly, in a healthcare system, besides value derived from the data of other patients, some benefit stems from the healthcare professionals.
The impact of this issue will vary across use-cases; for instance, when content is cheap to create, perhaps the flow through the ML is the dominant flow.

6.2. Stationarity

Our main result, Theorem 3.2, states that we have reciprocity under a certain stationarity assumption (Assumption 3.1). This assumption requires that a specific user interaction should be equally likely to occur during training or inference. In practice, this assumption will almost certainly be violated. (For instance, in a recommender system, a new user might contribute fewer examples to the training set, but have more in the inference set.) While our theorem requires the assumption, our measurements do not. Indeed, we are able to measure inflows and outflows whatever the arrival pattern (our experiments demonstrate this), and if there is an imbalance between inflows and outflows, one can take measures to address this, via payments or different service levels.

6.3. Other Measures of Influence

Our measures of inflow and outflow depend on how we compute influence. We study two measures: Marginal and TracIn. There are other measures worth investigating, for instance, the Shapley value. Modifications of the Shapley value have been used to determine feature attribution (Sundararajan & Najmi, 2020). However, determining the influence of training data this way, even approximately, would require a number of training runs that is super-linear in the number of training examples, and this is usually intractable. That said, Shapley may be computationally feasible when the exchange happens between a small number of large entities, as in the cross-silo federated setting described in Section 1.1, or for restricted model classes such as decision trees (with additional assumptions).

6.4. Balancing Flows

One exciting open direction is the design of training algorithms that enforce a certain level of reciprocity. For example, one can imagine introducing per-example weights in SGD; changing the weight of an example $z$ would directly scale $\mathrm{TracIn}(z, z')$ for all $z'$ (by linearity of gradients), and this can potentially be used to rebalance inflows and outflows at each training step. This bears similarities with differential privacy (DP), which seeks to limit the impact that any training example has on the distribution of model parameters. One of the most popular methods to achieve DP is DP-SGD (Abadi et al., 2016), which crucially relies on rescaling per-example gradients to bound their norm (a.k.a. gradient clipping). Changes to the training algorithm (via example weights, sampling, or clipping) often have a non-trivial impact on model quality, and much like in the DP literature, an important question will be to understand optimal trade-offs between reciprocity and model quality.
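As a purely hypothetical illustration of this open direction, the sketch below shows how per-example weights could enter an SGD step for a squared loss, together with one possible (untested) rebalancing rule. None of this is part of the method evaluated in this paper; the rule and all names are illustrative assumptions only.

```python
import numpy as np

def weighted_sgd_step(w, batch_X, batch_y, example_weights, eta):
    """One SGD step for squared loss in which each example's gradient is scaled by a
    per-example weight. By linearity, scaling the weight of example z scales
    TracIn(z, z') for every inference example z' by the same factor."""
    grads = 2 * (batch_X @ w - batch_y)[:, None] * batch_X   # per-example gradients
    return w - eta * (example_weights[:, None] * grads).sum(axis=0)

def rebalance_weights(inflow, outflow, users, strength=0.5):
    """Hypothetical rebalancing rule (not evaluated in this paper): up-weight the
    examples of users whose running inflow exceeds their running outflow, and
    down-weight the reverse, so later steps shift credit toward under-contributors."""
    imbalance = np.tanh(strength * (np.asarray(inflow) - np.asarray(outflow)))
    return 1.0 + imbalance[users]

# Illustrative usage with made-up running flow estimates for three users.
w = np.zeros(4); users = np.array([0, 1, 2, 0])
X, y = np.eye(4), np.array([1.0, 2.0, 3.0, 4.0])
weights = rebalance_weights([0.5, 0.1, 0.3], [0.2, 0.4, 0.3], users)
print(weighted_sgd_step(w, X, y, weights, eta=0.1))
```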
6.5. Other Settings

It is worth investigating inflow, outflow and reciprocity in any setting that involves user data, whether ML is involved or not. However, only those that have some model of user utility will be amenable to mathematical analysis. For instance, mapping applications often display real-time information about traffic congestion, and the user benefits by using the application to avoid congestion. It is meaningful to measure inflow, outflow and reciprocity in such settings.

Acknowledgements

We are grateful to Amir Najmi, Kedar Dhamdhere, Steffen Rendle, Diane Tang, Fernando Pereira, John Anderson and the anonymous reviewers (from this conference and others) for insightful comments and stimulating discussions related to problem framing.

References

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, CCS '16, pp. 308-318, New York, NY, USA, 2016. Association for Computing Machinery.

Bobadilla, J., Ortega, F., Hernando, A., and Gutiérrez, A. Recommender systems survey. Knowledge-Based Systems, 46:109-132, 2013.

Brajer, N., Cozzi, B., Gao, M., Nichols, M., Revoir, M., Balu, S., Futoma, J., Bae, J., Setji, N., Hernandez, A., and Sendak, M. Prospective and external evaluation of a machine learning model to predict in-hospital mortality of adults at time of admission. JAMA Network Open, 3(2), 2020.

Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211-407, 2014.

Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. Least angle regression. The Annals of Statistics, 32(2):407-451, 2004.

European Commission, Directorate-General for Education, Youth, Sport and Culture. Erasmus+ annual report 2019. Publications Office, 2020.

Fehr, E. and Gächter, S. Fairness and retaliation: The economics of reciprocity. Journal of Economic Perspectives, 14(3):159-181, September 2000.

Ghorbani, A. and Zou, J. Data Shapley: Equitable valuation of data for machine learning. In International Conference on Machine Learning, pp. 2242-2251, 2019.

Gill, J., Tinckam, K., Fortin, M., Rose, C., Shick-Makaroff, K., Young, K., Lesage, J., Cole, E., Toews, M., Landsberg, D., and Gill, J. Reciprocity to increase participation of compatible living donor and recipient pairs in kidney paired donation. Am J Transplant., 17(7), 2017.

Gouldner, A. The norm of reciprocity: A preliminary statement. American Sociological Review, (2):161-178, 1960.

Guibert, J. and Rayón, A. UK's Turing Scheme. International Higher Education, (106):23-24, April 2021.

Gulshan, V., Peng, L., Coram, M., Stumpe, M. C., Wu, D., Narayanaswamy, A., Venugopalan, S., Widner, K., Madams, T., Cuadros, J., Kim, R., Raman, R., Nelson, P. Q., Mega, J., and Webster, D. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 2016.

Hara, S., Nitanda, A., and Maehara, T. Data cleansing for models trained with SGD. In Advances in Neural Information Processing Systems, pp. 4215-4224, 2019.

Harper, F. M. and Konstan, J. A. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4), December 2015.

Hoaglin, D. C. and Welsch, R. E. The hat matrix in regression and ANOVA. The American Statistician, 32(1):17-22, 1978.

Jia, R., Dao, D., Wang, B., Hubis, F. A., Hynes, N., Gürel, N. M., Li, B., Zhang, C., Song, D., and Spanos, C. J. Towards efficient data valuation based on the Shapley value. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1167-1176, 2019.
Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Nitin Bhagoji, A., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D'Oliveira, R. G. L., Eichner, H., El Rouayheb, S., Evans, D., Gardner, J., Garrett, Z., Gascón, A., Ghazi, B., Gibbons, P. B., Gruteser, M., Harchaoui, Z., He, C., He, L., Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Khodak, M., Konečný, J., Korolova, A., Koushanfar, F., Koyejo, S., Lepoint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., Özgür, A., Pagh, R., Qi, H., Ramage, D., Raskar, R., Raykova, M., Song, D., Song, W., Stich, S. U., Sun, Z., Suresh, A. T., Tramèr, F., Vepakomma, P., Wang, J., Xiong, L., Xu, Z., Yang, Q., Yu, F. X., Yu, H., and Zhao, S. Advances and open problems in federated learning. Found. Trends Mach. Learn., 14(1-2):1-210, June 2021.

Koh, P. W. and Liang, P. Understanding black-box predictions via influence functions. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 1885-1894, 2017a.

Koh, P. W. and Liang, P. Understanding black-box predictions via influence functions. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 1885-1894. PMLR, 06-11 Aug 2017b.

Koren, Y. and Bell, R. Advances in Collaborative Filtering, pp. 77-118. Springer US, Boston, MA, 2015.

Koren, Y., Bell, R., and Volinsky, C. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, 2009.

Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. A survey on bias and fairness in machine learning. ACM Comput. Surv., 54(6), July 2021.

Pascanu, R., Mikolov, T., and Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML '13, pp. III-1310-III-1318. JMLR.org, 2013.

Perez, M. V., Mahaffey, K. W., Hedlin, H., Rumsfeld, J. S., Garcia, A., Ferris, T., Balasubramanian, V., Russo, A. M., Rajmane, A., Cheung, L., Hung, G., Lee, J., Kowey, P., Talati, N., Nag, D., Gummidipundi, S. E., Beatty, A., Hills, M. T., Desai, S., Granger, C. B., Desai, M., and Turakhia, M. P. Large-scale assessment of a smartwatch to identify atrial fibrillation. New England Journal of Medicine, 381(20):1909-1917, 2019.

Pruthi, G., Liu, F., Kale, S., and Sundararajan, M. Estimating training data influence by tracing gradient descent. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M. F., and Lin, H. (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 19920-19930. Curran Associates, Inc., 2020.

Rajkomar, A., Oren, E., Chen, K., Dai, A. M., Hajaj, N., Liu, P. J., Liu, X., Sun, M., Sundberg, P., Yee, H., Zhang, K., Duggan, G. E., Flores, G., Hardt, M., Irvine, J., Le, Q. V., Litsch, K., Marcus, J., Mossin, A., Tansuwan, J., Wang, D., Wexler, J., Wilson, J., Ludwig, D., Volchenboum, S. L., Chou, K., Pearson, M., Madabushi, S., Shah, N. H., Butte, A. J., Howell, M., Cui, C., Corrado, G., and Dean, J. Scalable and accurate deep learning for electronic health records. CoRR, abs/1801.07860, 2018.

Rashed, A., Grabocka, J., and Schmidt-Thieme, L. Attribute-aware non-linear co-embeddings of graph features. In Proceedings of the 13th ACM Conference on Recommender Systems, RecSys '19, pp. 314-321, New York, NY, USA, 2019. Association for Computing Machinery.

Sundararajan, M. and Najmi, A. The many Shapley values for model explanation. In III, H. D. and Singh, A. (eds.), Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pp. 9269-9278. PMLR, 13-18 Jul 2020.
Wang, D., Khosla, A., Gargeya, R., Irshad, H., and Beck, A. H. Deep learning for identifying metastatic breast cancer, 2016.

Yeh, C.-K., Kim, J., Yen, I. E.-H., and Ravikumar, P. K. Representer point selection for explaining deep neural networks. In Advances in Neural Information Processing Systems, pp. 9291-9301, 2018.

Zhang, S., Yao, L., and Xu, X. AutoSVD++: An efficient hybrid collaborative filtering model via contractive auto-encoders. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, pp. 957-960, New York, NY, USA, 2017. Association for Computing Machinery.

A. Reciprocity in Recommender Systems

We give additional details about reciprocity in the context of a standard recommendation model (Bobadilla et al., 2013). There is a population $U$ of individuals (indexed by $u$) and a set $I$ of items (pieces of content, indexed by $i$). When individuals interact with items, they rate the item implicitly or explicitly. This produces a score $r_{ui}$. The individuals and the items may have features associated with them. These could be ids, or descriptive features such as the genre of the movie, the location of the individual, etc. The task is to predict the scores for unseen interactions given scores for past interactions. We seek to measure the value exchange from the implicit curation that occurs when individuals consume recommendations.

For a recommender system, the features ($x$ from Section 2.1) are characteristics of individuals and items, and the label ($y$ from Section 2.1) is a rating $r_{ui}$ for the individual-item pair $(u, i)$ associated with the example. In such a model, every individual $u$ is endowed with a $d$-dimensional vector $p_u$ and every item $i$ is endowed with a $d$-dimensional vector $q_i$; we refer to these vectors as embeddings. The prediction matrix is the product $\hat{R} = PQ^\top$, where $p_u$ is the $u$-th row of matrix $P$ and similarly for $Q$. In other words, the predicted rating for user-item pair $(u, i)$ is given by the dot product $\hat{r}_{u,i} = p_u \cdot q_i$. See (Koren et al., 2009) for more details about matrix factorization. The model is optimized for a regularized quadratic loss, i.e.,

$$\sum_{(u,i) \in Z} (p_u \cdot q_i - r_{ui})^2 + \lambda \Big( \sum_u \|p_u\|^2 + \sum_i \|q_i\|^2 \Big),$$

where $r_{ui}$ is the label of pair $(u, i)$. The regularization term helps generalization. For matrix factorization, we redistribute the regularization term as a sum over training examples, and define the loss of example $(u, i)$ as

$$(p_u \cdot q_i - r_{ui})^2 + \frac{\lambda}{|Z_u|} \|p_u\|^2 + \frac{\lambda}{|Z_i|} \|q_i\|^2,$$

where $Z_u = \{i : (u, i) \in Z\}$ and $Z_i = \{u : (u, i) \in Z\}$.

Remark A.1 (TracIn for Matrix Factorization). Notice that the loss gradients and the TracIn influence have a simple structure. A visit to a training example $(u, i)$ only updates the vectors $p_u$ and $q_i$. Moreover, the change in the user embedding $p_u$ only affects the predictions for user $u$; therefore, updates to the user vectors do not play a role in the definitions of inflow and outflow (Equations 1 and 2). The update to the item embedding $q_i$ only influences users who interact with the item $i$ in the inference set. Thus, for this loss function, inflows and outflows only flow through updates to item embeddings.

Remark A.2 (Computing TracIn for Matrix Factorization). Section 4 suggests that we can compute TracIn in $O(|Z|p + |Z'|p)$, where $p$ is the total number of model parameters, in this case $p = d(|U| + |I|)$. But due to the structure of the problem, this computation can be done more efficiently for matrix factorization. Notice that the gradient of the loss w.r.t. a training example $z = (u, i)$ is $2d$-sparse: only the embeddings $p_u, q_i$ have a non-zero gradient. Thus, computing $\nabla L_{Z'}(w_t)$ in Equation (6) can be done in $O(|Z'|d)$ instead of $O(|Z'|p)$. Similarly, computing the sum of dot products in Equation (5) requires $O(|Z|d)$ operations (since each $\nabla L_z(w_t)$ is $2d$-sparse). The total complexity is therefore $O(|Z|d + |Z'|d)$. This is equal to the complexity of running gradient descent, which means that computing inflows and outflows along the SGD trajectory does not significantly increase the computational cost of model training.
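A minimal sketch of Remark A.1, assuming the (unregularized) quadratic loss above: the cross-user TracIn contribution of a training example $(u, i)$ on an inference example $(v, j)$ reduces to a single $d$-dimensional dot product through the item embedding, and vanishes unless $i = j$. All parameter values and names are illustrative.

```python
import numpy as np

def mf_cross_user_tracin(P, Q, train_ex, infer_ex, eta):
    """Cross-user TracIn contribution, at the current parameters, of training example
    (u, i, r) on inference example (v, j, r') for the quadratic matrix factorization
    loss. Only the item embedding is shared across users, so the contribution is zero
    unless the two examples involve the same item (Remark A.1). The redistributed
    regularization terms are omitted for brevity."""
    u, i, r = train_ex
    v, j, rp = infer_ex
    if u == v:
        return 0.0   # same user: self-influence, which is not part of inflow/outflow
    if i != j:
        return 0.0   # different items: the updated q_i does not touch this prediction
    g_train = 2 * (P[u] @ Q[i] - r) * P[u]    # gradient of the training loss w.r.t. q_i
    g_infer = 2 * (P[v] @ Q[j] - rp) * P[v]   # gradient of the inference loss w.r.t. q_j
    return eta * float(g_train @ g_infer)

# Illustrative embeddings only (3 users, 2 items, d = 16).
rng = np.random.default_rng(0)
P, Q = rng.normal(size=(3, 16)), rng.normal(size=(2, 16))
print(mf_cross_user_tracin(P, Q, (0, 1, 4.0), (2, 1, 3.0), eta=2e-4))
```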
B. Proof of Theorem 3.2

First, we introduce some notation. Let $Z = U \times X \times Y$, where $U$ is the population of individuals, $X$ is the feature set and $Y$ is the label set. Let $D$ be the joint distribution over $Z$. For an individual $u \in U$, we write $Z_u = \{u\} \times X \times Y$, so that a training example $z$ belongs to individual $u$ if $z \in Z_u$. Our goal is to show that for all $u$, $\mathbb{E}[I_u] = \mathbb{E}[O_u]$.

Given batches of training data $B_1, \ldots, B_T$ and an inference set $Z'$ (note that both are random variables), we rewrite inflows and outflows in a form that is more amenable to taking expectations:

$$I_u = \sum_{t=1}^{T} \sum_{z \in B_t:\, z \notin Z_u} \; \sum_{z' \in Z'_u} \eta_t \nabla L_z(w_t) \cdot \nabla L_{z'}(w_t) = \sum_{t=1}^{T} \eta_t \Big( \sum_{z \in B_t} \nabla L_z(w_t) \mathbf{1}[z \notin Z_u] \Big) \cdot \Big( \sum_{z' \in Z'} \nabla L_{z'}(w_t) \mathbf{1}[z' \in Z_u] \Big), \quad (7)$$

where $\mathbf{1}[z \in Z_u]$ is the indicator of the event "$z$ belongs to user $u$". Similarly, we have for outflows

$$O_u = \sum_{t=1}^{T} \eta_t \Big( \sum_{z \in B_t} \nabla L_z(w_t) \mathbf{1}[z \in Z_u] \Big) \cdot \Big( \sum_{z' \in Z'} \nabla L_{z'}(w_t) \mathbf{1}[z' \notin Z_u] \Big). \quad (8)$$

Let $(F_1, \ldots, F_t)$ denote the filtration arising from the sequence of random variables $(B_1, \ldots, B_t)$. Taking the expectation of inflow in Equation (7), and using the tower property of conditional expectations, we have

$$\mathbb{E}[I_u] = \sum_{t=1}^{T} \eta_t \, \mathbb{E}\left[ \mathbb{E}\left[ \Big( \sum_{z \in B_t} \nabla L_z(w_t) \mathbf{1}[z \notin Z_u] \Big) \cdot \Big( \sum_{z' \in Z'} \nabla L_{z'}(w_t) \mathbf{1}[z' \in Z_u] \Big) \,\Big|\, F_{t-1} \right] \right].$$

Now, notice that the batches $(B_1, \ldots, B_{t-1})$ completely determine the model parameters $w_t$ (since $w_t = w_0 - \sum_{\tau=0}^{t-1} \eta_\tau \sum_{z \in B_\tau} \nabla L_z(w_\tau)$), and by assumption, the next batch of training examples $B_t$ is independent of previous batches, and so is the inference set $Z'$. So conditioned on $F_{t-1}$, the two random variables $\sum_{z \in B_t} \nabla L_z(w_t) \mathbf{1}[z \notin Z_u]$ and $\sum_{z' \in Z'} \nabla L_{z'}(w_t) \mathbf{1}[z' \in Z_u]$ are independent. Let us denote by

$$g^{-u}_{t-1} = \mathbb{E}_{z \sim D}\left[\nabla L_z(w_t) \mathbf{1}[z \notin Z_u] \mid F_{t-1}\right], \qquad g^{u}_{t-1} = \mathbb{E}_{z \sim D}\left[\nabla L_z(w_t) \mathbf{1}[z \in Z_u] \mid F_{t-1}\right].$$

Then, by the aforementioned independence, linearity of expectations, and the assumption that elements of $Z$ and $Z'$ (and hence $B_t$) follow the same distribution $D$, we have

$$\mathbb{E}[I_u] = \sum_{t=1}^{T} \eta_t \, \mathbb{E}\left[ \big(|B_t| \, g^{-u}_{t-1}\big) \cdot \big(|Z'| \, g^{u}_{t-1}\big) \right]. \quad (9)$$

We make a similar calculation for outflows (the only difference is in the indicators): taking expectations in Equation (8),

$$\mathbb{E}[O_u] = \sum_{t=1}^{T} \eta_t \, \mathbb{E}\left[ \big(|B_t| \, g^{u}_{t-1}\big) \cdot \big(|Z'| \, g^{-u}_{t-1}\big) \right], \quad (10)$$

where we used independence (conditional on $F_{t-1}$) of the random variables $\sum_{z \in B_t} \nabla L_z(w_t) \mathbf{1}[z \in Z_u]$ and $\sum_{z' \in Z'} \nabla L_{z'}(w_t) \mathbf{1}[z' \notin Z_u]$. The two quantities (9) and (10) are equal. This concludes the proof.

C. Additional Experimental Results

C.1. Sanity Checks on MovieLens

In order to ensure that inflows and outflows are meaningful quantities, we measure the noise in the inflows and outflows due to randomness in the training process. We plot the distributions across five arbitrarily chosen individuals. See Figure 4. To aid interpretation, we normalize the inflows and outflows by the total inflow (which approximates the total change in loss). Therefore, one can read the numbers as the fraction of overall inflow (or approximately the fraction of total loss reduction). We observe that the inflows and outflows are consistent across runs; however, there is some variation, stemming from the random initialization.
[Figure 4: Distribution of outflow and of inflow across 10 runs for 5 individuals (one panel each). The box edges show the upper and lower quartiles; the whiskers show the fifth and ninety-fifth percentiles.]

[Figure 5: Percentage TracIn approximation discrepancy across 1000 steps of Gradient Descent.]

Next, we measure the error induced by the first-order approximation of TracIn (recall Remark 2.5). Figure 5 shows the relative error for one run. The numerator is the sum of the gradient dot products (in Equation 4) across examples for one step of gradient descent, minus the total change in loss across $Z'$. The denominator is the total change in loss across $Z'$. The percentage relative discrepancy remains relatively small; its 80th percentile is 1.1%. This can be further reduced by using a smaller learning rate.

[Figure 6: Histogram of user outflow signal-to-noise ratios (TracIn and Marginal), computed across ten splits.]

C.2. Marginal Influence on the MovieLens Data Set

In the MovieLens experiment, we found that Marginal-outflows and inflows are noisy, and extremely susceptible to random initialization (recall Remark 2.4). This can be quantified by measuring the Signal-to-Noise Ratio (SNR), defined as the mean divided by the standard deviation. In an attempt to improve the SNR, we used a slightly different definition of marginal flows: instead of measuring the effect of deleting a single example, as in Equation (3), then summing over $z \in Z_u$, we measure the effect of removing all of user $u$'s examples simultaneously. Despite this change, the Marginal SNR remains low; see Figure 6. For Marginal, over 99% of individuals have an SNR less than one. For TracIn, 90% of individuals have an SNR greater than 76, making TracIn measurements much more meaningful.
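For concreteness, a minimal sketch of this SNR computation (the per-user mean over splits divided by the standard deviation over splits); the numbers below are illustrative only.

```python
import numpy as np

def outflow_snr(outflows_by_split):
    """SNR of each user's outflow: the mean across splits divided by the standard
    deviation across splits. Input shape: (n_splits, n_users)."""
    o = np.asarray(outflows_by_split, float)
    return o.mean(axis=0) / o.std(axis=0, ddof=1)

# Illustrative numbers: 10 splits, 4 users.
rng = np.random.default_rng(0)
print(outflow_snr(0.01 + 0.002 * rng.normal(size=(10, 4))))
```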