# equilibrium_characterization_for_data_acquisition_games__98cb9edb.pdf

Equilibrium Characterization for Data Acquisition Games

Jinshuo Dong , Hadi Elzayn , Shahin Jabbari , Michael Kearns and Zachary Schutzman University of Pennsylvania {jinshuo, hads}@sas.upenn.edu, {jabbari, mkearns, ianzach}@cis.upenn.edu

We study a game between two ﬁrms in which each provides a service based on machine learning. The ﬁrms are presented with the opportunity to purchase a new corpus of data, which will allow them to potentially improve the quality of their products. The ﬁrms can decide whether or not they want to buy the data, as well as which learning model to build with that data. We demonstrate a reduction from this potentially complicated action space to a one-shot, two-action game in which each ﬁrm only decides whether or not to buy the data. The game admits several regimes which depend on the relative strength of the two ﬁrms at the outset and the price at which the data is being offered. We analyze the game s Nash equilibria in all parameter regimes and demonstrate that, in expectation, the outcome of the game is that the initially stronger ﬁrm s market position weakens whereas the initially weaker ﬁrm s market position becomes stronger. Finally, we consider the perspective of the users of the service and demonstrate that the expected outcome at equilibrium is not the one which maximizes the welfare of the consumers.

1 Introduction

Recent years have seen explosive growth in the domain of digital data-driven services. Search engines, restaurant recommendations, and social media are among the many products we use day-to-day which sit atop modern data analysis and machine learning (ML). In such markets, ﬁrms live and die by the quality of their models; thus success in the race for data , whether acquired directly from customers or indirectly via acquisition of rival ﬁrms or purchasing data corpuses, is crucial. In this work, we study two questions: whether such markets tend towards monopoly, and how competition affects consumer welfare. Importantly, we consider these questions in light of the modeling choices that ﬁrms must make. In our model, two ﬁrms compete for market share (utility) by providing identical services that each rely on an ML

The full technical version of this paper is available at https://www.cis.upenn.edu/~mkearns/mlgames.pdf.

model. The ﬁrms error rates depend on their choices of algorithms, models and the volume of available training data. Each ﬁrm s market share is proportional to the error of its model relative to the model built by its competitor. This is motivated by the observation that the services built using ML are highly accurate, so users are more conscientious of the mistakes the service makes, rather than the successes. A competition exponent measures relative ferocity of competition and maps to a plausible Markov model of consumer choice. See Section 2.2 for more details. The ﬁrms initially possess (possibly differing) quantities of data, and are given the opportunity to buy additional data at a ﬁxed price to improve their models. Since data is costly and relative (rather than absolute) model quality determines market share, each ﬁrm s best course of action may depend on the actions of its rival. Hence, each ﬁrm acts strategically and faces two decisions: whether to buy the additional data, and what type of model to build in order to produce the best product given the data it ends up with. The decision of what model to build seems to complicate the ﬁrms action space greatly; there is a very large set of model classes to select from, and different classes have different efﬁciencies. For example, when restricting attention to neural networks, the choices of depth and number of nodes per layer produce different hypothesis classes with different optimal models. Thus, in principle, the decisions of what model class to select and whether to purchase additional data must be made jointly. However, learning theory allows us to greatly reduce this large action space. In Section 2.1, we show that the game in which ﬁrms jointly choose a model and whether to attempt to buy the additional data reduces to a strategically equivalent game in which ﬁrms ﬁrst choose whether to buy the data and then choose optimal models. In Section 3, we characterize the Nash equilibria of our game for different parameter regimes. For no combination of parameters does exactly one ﬁrm wish to buy the data; unsurprisingly, for very high prices, neither ﬁrm buys data, and for very low prices, both ﬁrms do. In the middling regime, the competitive aspect of the game imposes a prisoners' dilemma -like ﬂavor: both ﬁrms would prefer neither ﬁrm buy the data, but each do so in order to prevent the other from strengthening its position. Moreover, the unique mixed strategy Nash equilibrium in this regime involves ﬁrms increasing their probability of buying data as price increases.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

This counterintuitive result follows from the logic of equilibrium: ﬁrms playing mixed strategies must be indifferent to buying and not buying the data, and as the price rises, the probability that a competitor acquires the data must rise in order to make investing in data acquisition a palatable option. Finally, we study whether any of the dynamics of the game push the market towards a monopoly. Perhaps counter to a rich-get-richer feedback loop that might be expected in data races, we observe that in all equilibria, the data gap (and thus, market share gap) always narrows (in expectation). As measured by consumer welfare, this is actually undesirable. Both the direction of the data gap as well as the welfare implication may be counterintuitive, particularly with respect to the well-known stylized fact that market concentration is bad for consumers. However, consumer data that improves a service can be viewed as exhibiting a form of network effects, in which case perfect competition can result in inefﬁciency and under-provisioning of a good [Katz and Shapiro, 1985]. In other words, a greater data gap would result in more consumers using a less error-prone service. As for the data race, anecdotal evidence, such as GM s acquisition of automated driving startup Cruise, despite Waymo s earlier market entry and research head-start, are suggestive (though not conclusive) that these predictions may be indicative of real-world dynamics [Primack and Korosec, 2016]. We view our work as a ﬁrst step towards modeling and analyzing competition for data in markets driven by ML. Under our simplifying assumptions, we derive concrete results with relevance both for policymakers analyzing algorithmic actors as well as engineering or business decision-makers considering the tasks of data acquisition and model selection. Our results are qualitatively robust to other natural modeling choices, such as allowing both ﬁrms to purchase the data, as well as treating the data seller as a market participant; however, more signiﬁcant departures may lead to different conclusions. See Sections 4 and 5 for more details.

1.1 Related Work The theory of ML from a single learner s perspective is well developed, but until recently, little work had studied competition between learning algorithms (see e.g [Ben-Porat and Tennenholtz, 2018; Mansour et al., 2018] for notable exceptions). We differ from both works by exploring the comparative statics and welfare consequences of a single decision (data acquisition). Concurrently, Ben-Porat and Tennenholtz [2019] study a game in which learners strategically choose their model to compete for users, but users only care about the accuracy of predictions on their particular data. In contrast, users in our model choose based on the overall model error. Our work also intersects with several strains of economic literature, including industrial organization and network effects [David, 1987; Economides, 1996; Katz and Shapiro, 1985]. We differ from such models in two key ways. First, in contrast to assuming a static equilibrium [Katz and Shapiro, 1985] or ﬁxing a dynamic but unchanging process at the outset [Farrell and Saloner, 1986], our work can be viewed as an analysis of a shock to a given potentially asymmetric equilibrium in the form of the availability of new data. Second,

the consumers in our model do not behave strategically (see e.g. [Berry et al., 2017; Mansour et al., 2018]). Finally, our work is related to spectrum auctions, competition with congestion externalities [Berry et al., 2017], and the sale of information or patents [Kamien and Tauman, 1986; Kamien et al., 1992]. Our results primarily share qualitative similarities: the choice of one ﬁrm to buy data (spectrum) forces the other to do so to avoid losing market share, though it would not have been proﬁtable absent the rival, and actual outcomes run counter to consumer preferences (see e.g. [Berry et al., 2017]).

2 Framework

We formally motivate and model the ML problem of the ﬁrms and demonstrate how this reduces to a game in which the ﬁrms can either buy or not buy the new data.

2.1 Choosing a Model Class

Consider a ﬁrm using ML to build a service e.g. a recommendation system. The amount of data available to the ﬁrm is a crucial determinant to the effectiveness of the predictive service of the ﬁrm. Fixing the amount of data, the ﬁrm faces a fundamental tradeoff; it can use a more complex model that can ﬁt the data better, but learning using a complicated model requires more training data to avoid overor underﬁtting. We can formally represent this tradeoff as follows. Let H denote the hypothesis class from which the ﬁrm is selecting its model and assume the data is generated from a distribution D. Then given m i.i.d. draws from D the error of the ﬁrm when learning a hypothesis from H can be written as err D(H) = err(m, H) + minh H err D(h) [Shalev-Shwartz and Ben-David, 2014]. The ﬁrst term, known as estimation error, determines how well in expectation a model learned with m draws from D can predict compared to the best model in class H. The second term, known as approximation error, determines how well the best model in class H can ﬁt the data generated from D. The approximation error is independent of the amount of training data, while the estimation error decreases as the volume of training data increases. The choice of H affects both errors. In particular, ﬁxing the amount of training data, increasing the complexity of H will increase the estimation error. On the other hand, the additional complexity will decrease the approximation error as more complicated data generating processes can be ﬁt with more complicated models. Once the amount of data is ﬁxed, the ﬁrm can optimize over its choice of model complexity to achieve the best error. We examine a few widely used ML models and their error forms. As a ﬁrst example, consider the case where the ﬁrm is building a neural network and has to decide how many nodes d to use. d is the measure of the complexity of the model class and given m data points, the error of the model can be written using the following simpliﬁcation of a result from Baron [1994].

Lemma 1 ([Barron, 1994]). Let H be the class of neural networks with d nodes. Then for any distribution D, with high probability, the error when using m data points to learn a

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

model from H is at most c1d/m + c2/d, for constants c1 and c2.

Fixing m, the choice of d that minimizes the error can be computed by minimizing the bound in Lemma 1 with respect to d. This corresponds to d = c2 m/c1 and we get that the error of the model built by the ﬁrm is p

c1c2/m. As another example, consider the very simple setting of realizable PAC learning where the data points are generated by some hypothesis in a ﬁxed hypothesis class.

Lemma 2 ([Kearns and Vazirani, 1994]). Any algorithm for PAC learning a concept class of VC dimension d must use Ω(d/ϵ) examples in the worst case.

Thus in this setting, in the worst case, ﬁrms need Θ(1/ϵ) training data points to achieve error ϵ. A similar bound gives that with high probability, the ﬁrms can guarantee error of Θ(1/m) (see e.g. [Kearns and Vazirani, 1994]). In the examples above the error of a ﬁrm with m data points takes the form of either Θ(m 1/2) or Θ(m 1) after the ﬁrm optimizes over the choice of model complexity. Importantly, the error in both cases (and more generally) degrades as the number of data points increases. The rate at which the error degrades is commonly known as the learning rate. There are other learning tasks with learning rates different than the examples above. Consider a stylized model of a search engine where the set of queries is drawn from a ﬁxed and discrete distribution over a very large or even inﬁnite set, and the search engine can only correctly answer queries that it has seen before. If, as is often assumed, the query distribution is heavy-tailed, then the search engine will require a large training set to return accurate answers. In this framework, the probability that a search engine incorrectly answers a query drawn from the distribution is exactly the expectation of the unobserved mass of the distribution given the queries observed so far. This quantity is known as the missing mass of a distribution (see e.g. [Berend and Kontorovich, 2011; Decrouez et al., 2018; Good, 1953; Orlitsky et al., 2003]). Lemma 3 shows how to bound the expected missing mass for the class of polynomially decaying query distributions.

Lemma 3 ([Decrouez et al., 2018]). Let P k for k > 1 be a discrete distribution with polynomial decay deﬁned over i N 0 such that Prx P k [x = i] = i k/Σ j=0j k. Then the expected missing mass given m draws from P k is Θ(m1/k 1).

By varying k in the query distribution of Lemma 3, the learning rate in the search problem can take the form of Θ(m i) for any i (0, 1). Thus, the learning rate for search may be much faster or slower compared to the previous examples, and the exact rate depends on the value of k. We saw that given a ﬁxed amount of data, a ﬁrm using ML can optimize over its learning decisions to get the best possible error guarantee. Furthermore, while error decays as more data becomes available, the rate of decay can vary widely depending on the task. We next see how various learning rates can be incorporated into the parameters of our game.

Figure 1: Plot of f = (1 r)a/(ra +(1 r)a) for various a values. When r = err1/(err1 + err2), f is the market share of Firm 1.

2.2 Error-Based Market Share Consider two competing ﬁrms (denoted by Firm 1 and 2) that provide identical services e.g. search engines. We assume the market shares of the ﬁrms depend on their ability to make accurate predictions e.g. responding to search queries. As discussed above, the quality of their models is determined ultimately by the size of their training data with a task-dependent learning rate. Each ﬁrm trains a model on its data and uses its model to provide the service. Let err1 and err2 denote the excess error of the ﬁrms for the corresponding models. Intuitively, these errors measure the quality of the ﬁrms services, so a ﬁrm with smaller error should have higher market share. We assume each ﬁrm captures a market share proportional to the relative errors of the two models. Formally, we deﬁne Firm 1 and 2 s error-based market share as

µ1 = 1 erra 1 erra 1 + erra 2 = erra 2 erra 1 + erra 2 and µ2 = 1 µ1. (1)

The constant a N, which we call the competition exponent (following Tullock [2001]), indicates the ferocity of the competition, or how strongly a relative difference in the errors of the ﬁrms models translates to a market advantage. As a approaches 0, the tendency is towards each ﬁrm capturing half of the market, and thus a large difference in the models errors is needed for one ﬁrm to gain a signiﬁcant advantage in the market share. Conversely, as a grows larger, even tiny differences in the models errors translate to massive differences in the market share. (See Figure 1.) An error-based model reﬂects markets for services which demand extremely low errors, such as vision systems for selfdriving cars. Under the error-based model, if Firm 1 has 99.99% accuracy and Firm 2 has 99% accuracy, Firm 1 will capture 99% of the market share. By contrast, an accuracybased model (i.e. when the market share of Firm 1 is deﬁned as acca 1/(acca 1 + acca 2)) would suggest much less realistic near-even split.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

Firm 1 Firm 2

Figure 2: Vertices denote the ﬁrms and the directed arrows denote the probability of transition. acc is shorthand for accuracy and err is shorthand for error.

We provide another justiﬁcation suggesting that an errorbased market can arise even when the learned model is used to provide an everyday service in which high accuracy is not a strict requirement. Consider a customer who, each day, uses the service. She begins by choosing the service of one of the ﬁrms uniformly at random. As long as the answers she receives are correct, she has no reason to switch to the other ﬁrm s service, and uses the same ﬁrm s service tomorrow. However, once the ﬁrm makes an error, the customer switches to the other ﬁrm s service. The transition probabilities are therefore given by the accuracy and error of each ﬁrm. See Figure 2 for the Markov process representing this example. We can think of the market share captured by each ﬁrm as the proportion of the days on which each ﬁrm saw the customer. This is exactly the stationary distribution of the associated Markov process as stated in Lemma 4.

Lemma 4. Let µ1 and µ2 denote the probability mass that the stationary distribution of the Markov process in Figure 2 assigns to Firms 1 and 2. Then µ1 = err2/(err2 + err1), and µ2 = err1/(err1 + err2).

We defer proof of this result (and others) to the full version. By Lemma 4, each ﬁrms market share in the Markov process is exactly the error-based market share deﬁned in Equation 1 for a = 1. Similar arguments can motivate error-based market shares for values of a N, where the customer switches ﬁrms after experiencing a mistakes in a row. The probability of making a mistakes in a row is just erra i for Firm i, so the stationary distribution of the Markov process is exactly the two error-based market shares deﬁned in Equation 1. Using our observations from Section 2.1, we can write the error-based market share in the large-data regime as follows.

Theorem 1. Let m1 and m2 denote the number of data points of Firm 1 and 2, respectively. Then for some b R+, the market share of Firm 1 can be written (asymptotically) as µ1 = mb 1/(mb 1 + mb 2).

Because a can be any integer and there exists a corresponding learning problem for any learning rate in (0, 1], Theorem 1 implies that the combined competition exponent in our game can be any positive real number, motivated by the initial choice of a and the learning rate of the ﬁrms ML algorithms. The reductions and derivations in Sections 2.1 and 2.2 allow us to simplify the acquisition games as follows. We ﬁrst simplify the actions of each ﬁrm to only decide whether to buy the data or not, since model choice can be optimized once the number of available data points is known. Moreover,

Firm 1/Firm 2 Buy (B) Not Buy (NB) Buy (B) 1 2 (µ1(m1 + n, m2, b) + µ1(m1, m2 + n, b) p)

µ1(m1+n, m2, b) p

Not Buy (NB) µ1(m1, m2 + n, b) µ1(m1, m2, b)

Table 1: u1(s) in all of the strategy proﬁles of the game.

Theorem 1 not only allows us to simplify the form of market share, but also provides us with a meaningful interpretation for any positive (combined) competition exponent.

2.3 The Structure of the Game Given the reductions so far, we model our game as a twoplayer, one-shot, simultaneous move game. Firms 1 and 2 begin the game endowed with an existing number of data points, denoted by m1 and m2, respectively. Without loss of generality, we assume m1 m2. Each ﬁrm must decide whether or not to purchase an additional corpus of n data points1 at a ﬁxed price of p. The ﬁrm can either Buy (denoted by B) or Not Buy (denoted by NB) the new data. If both ﬁrms attempt to buy the data, the tie is broken uniformly at random (Section 4 discusses relaxing the assumption that only one ﬁrm may buy the data). After the purchase, each ﬁrm uses its data to train an ML model for its service. We assume the particular form of the market share of Firm 1 using the reduction in Theorem 1. The market share of Firm 2 is deﬁned to be one minus the market share of Firm 1. A strategy proﬁle s is a pair of strategies, one for each of the ﬁrms. Fixing s, the utility of Firm i (denoted by ui(s)) is its market share less any expenditure. The utility of Firm 1 in all of the strategy proﬁles of the game is summarized in Table 1 (rows and columns correspond to the actions of Firm 1 and 2). The utility of Firm 2 is deﬁned symmetrically. A strategy proﬁle is a pure strategy Nash equilibrium (pure equilibrium) if no ﬁrm can improve its utility by taking a different action, ﬁxing the other ﬁrm s action. A mixed strategy Nash equilibrium (mixed equilibrium) is a pair of distributions over the actions (one for each ﬁrm) where neither ﬁrm can improve its expected utility by using a different distribution over the actions, ﬁxing the other ﬁrm s distribution. We are interested in analyzing the Nash equilibria (equilibria).

3 Equilibria of the Game We now turn to ﬁnding and analyzing the equilibria. Let

A = (m1 + n)b

(m1 + n)b + m2b m1 b

m1b + (m2 + n)b ,

C = (m1 + n)b

(m1 + n)b + m2b m1 b

m1b + m2b ,

D = (m2 + n)b

m1b + (m2 + n)b m2 b

m1b + m2b .

These parameters have intuitive interpretations. A/2 is the expected change in Firm 1 s (or Firm 2 s) market share when moving the outcome from (NB, B) (or similarly (B, NB))

1For simplicity we assume this data is independent of and identically distributed to the data in possession of the ﬁrms.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

to (B, B). C is the change in market share that Firm 1 receives if it moves from (NB, NB) to (B, NB), and D is the symmetric relation from the perspective of Firm 2. We observe that A = C + D. Moreover, since C and D are nonnegative, it is immediately clear that A > max{C, D}. Finally, when m1 > m2 (i.e. Firm 1 starts with strictly more data), Firm 2 experiences a larger absolute change in market share moving from (NB, NB) to (NB, B) than to (B, NB). In other words, Firm 2 experiences a larger increase in market share when it buys the data compared to the decrease it experiences when Firm 1 receives the data.

Lemma 5. If m1 > m2, then for all n and b, we have C < D.

3.1 Characterization of the Equilibria The equilibria of the game clearly depend on the values of the parameters m1, m2, n, p and b. For example, if p > 1 (p 0), then neither ﬁrm should ever (not) buy the data. We observe that, ﬁxing the values of m1, m2, n and b, there is a range of values for p where the data is too expensive (too cheap) and NB (B) is a dominant strategy for both ﬁrms. There is also an intermediate range where more interesting behaviors emerge, as formally characterized in Theorem 2.

1. When p max{C, D}, (B, B) is the unique equilibrium.

2. When p A, (NB, NB) is the unique equilibrium. 3. When max{C, D} < p < A, (B, B) and (NB, NB) are both equilibria. Furthermore, there exists a (unique) mixed equilibrium ((α, 1 α), (β, 1 β)) such that

α 2(1 α) = p D

A p and β 2(1 β) = p C

where α and β denote the probabilities that Firms 1 and 2 select the action B, respectively.

Theorem 2 allows us to make several key observations about the market structure of this game. First, since C and D represent the maximum market share increase achievable by buying the data, the fact that the only equilibrium when p [min{C, D}, max{C, D}] is (B, B) means that both ﬁrms buy the data despite the fact that the best-case market share improvement is less than what they pay. This race for data thus mirrors the prisoner s dilemma if both ﬁrms could agree not to buy the data, they would be better off, but either would be tempted to buy the data and improve market share. Second, Theorem 2 illustrates how several features of equilibrium depend on the ferocity of competition, as determined by the exponent a; as a varies, the frontiers of the regimes described in Theorem 2 shift too. For example, if a = 0, market share is split evenly between the two ﬁrms, regardless of error or accuracy; unsurprisingly, as a 0 (which implies b 0), A, C, and D also approach 0, so the payoff difference between strategy proﬁles becomes negligible. As a consequence, regimes (1), (2), and (3) collapse, and all but very small p induce regime (4), where (NB, NB) is the only equilibrium. We observe similar behavior when a is large. Assuming that m1 > m2 + n, then a implies that

A 0 (and hence b ), again implying that regimes (1), (2), and (3) collapse. Thus again, unless p is very close to 0, (NB, NB) is the unique equilibrium. This is for a different reason than the small a case, however: Firm 1 now has no incentive to buy, since it is guaranteed almost the whole market share using its current model, and Firm 2 s initial disadvantage is too great to be overcome by buying the data. For a between these two extremes, many choices of m1 and m2 lead to a non-empty interval (max{C, D}, A), with endpoints far from 0 and 1. When p falls in this interval, regime (2) holds, so a mixed equilibrium exists; we solve for mixed equilibria in Section 3.2. The complete equilibrium characterization for all regimes in Theorem 2 allows us to pin down the optimal ﬁxed price from the seller revenue perspective. However, in full generality, the seller s problem encompasses further possibilities like auction pricing; hence, we defer this calculation to future work. See Section 5 for a discussion.

3.2 Mixed Equilibrium and Monotonicity Analysis Next, we carefully examine the mixed equilibrium and study the relationship between the weights each ﬁrm places on each action and the parameters of the game. Recall that α and β in Theorem 2 denote the probability that Firms 1 and 2 purchase the data in the mixed equilibrium. When m1 > m2, then α < β which implies that the smaller ﬁrm will succeed more often in purchasing the data in the mixed equilibrium. The relationship of α and β with the number of data points n and the price p is as follows.

Lemma 6. Let ((α, 1 α), (β, 1 β)) denote the mixed equilibrium in the the regime where max{C, D} < p < A. Then α and β, both increase when p increases or n decreases.

Lemma 6 may seem counterintuitive, as it implies that as the price p rises through the range in which a mixed equilibrium exists, the probability that any of the ﬁrms want to buy the data also increases. However, once the price p crosses the threshold A, the unique equilibrium is the pure strategy (NB, NB). This gives rise to a discontinuity. See Figure 3. Of course, this all says nothing about the equilibrium utilities for the ﬁrms; as long as the equilibrium utilities are not identical, players will naturally have ordinal preferences over the set of equilibria. We analyze these preferences in Lemma 7, which elucidates the discontinuity at p = A.

Lemma 7. When p (max{C, D}, A), u1(NB, NB) u1(s) and u2(NB, NB) u2(s) for all strategy proﬁles s. However, u1(B, NB) u1(B, B) u1(NB, B), while u2(NB, B) u2(B, B) u2(B, NB).

While both ﬁrms agree that (NB, NB) is the most preferred outcome, their preferences over other outcomes are discordant. In particular, given that at least one ﬁrm will try to buy the data, each ﬁrm would prefer itself to be the buyer. If either ﬁrm believes the other may try to buy the data, it will put positive weight on the action B in the mixed equilibrium. But for p > A, both ﬁrms know that it would be irrational for the other to buy, so the unique equilibrium is (NB, NB).

3.3 Change in the Market Share We now analyze the change in the market shares.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

Figure 3: Firm 1 selection probability of B by p. Blue, red and yellow lines correspond to (B, B), mixed, and (NB, NB) equilibria.

Lemma 8. When m1 m2, the only strategy proﬁle that strictly increases the market share of Firm 1 is (B, NB). So while only (B, NB) leads to an increase in the market share of Firm 1, it is not a pure equilibrium. We show that even when ﬁrms play according to the mixed equilibrium, the expected market share of Firm 1 does not strictly increase. Theorem 3. When m1 m2 and p (max{C, D}, A), the expected market share of Firm 1 does not strictly increase if both ﬁrms play according to the mixed equilibrium. Together, Lemma 8 and Theorem 3 demonstrate that the natural forces of the interaction on the market are, perhaps surprisingly, antimonopolistic. Since we assume that Firm 1 enters the game with a greater market share than Firm 2, but that no equilibrium allows Firm 1 to increase its market share, the game disfavors the concentration of market power. We analyze the implications for consumers below.

3.4 Consumer Welfare in Equilibrium In this section, we show that consumers prefer the outcome (B, NB), in which the initially stronger ﬁrm concentrates its market power. This is not supported by a pure equilibrium in any regime, nor is it favored by mixed equilibrium; hence, the interests of the ﬁrms do not align with the interests of the consumers. We deﬁne the consumer welfare as follows. Deﬁnition 1. Let m1(s) and m2(s) denote the (expected) number of data points that Firm 1 and 2 posses when playing according to strategy proﬁle s. Then the consumer welfare is CW(s) = µ1(s)(1 err1(m1(s)))+µ2(s)(1 err2(m2(s))). The welfare deﬁnition arises from assuming consumers receive 1 unit of utility for accurate predictions and 0 for erroneous ones. Notice that maximizing this deﬁnition of consumer welfare is exactly equivalent to minimizing the marketshare weighted error probability. This leads to Theorem 4. Theorem 4. Suppose m1 > m2. Then the consumers have the following preferences over the strategy proﬁles. CW(B, NB) > CW(B, B) > CW(NB, B) > CW(NB, NB).

Note that consumers preference for the outcome in which Firm 1 concentrates its market power is not the same as saying that the consumers prefer a monopoly. Rather, the consumers prefer higher quality services. When m1 > m2, Firm 1 s model before acquiring the data has a lower error rate than that of Firm 2, and so, of all the possible outcomes, the one which leads to a product with the lowest error rate is the one in which Firm 1 is able to improve on its already superior product. But if Firm 2 were not a player at all, then a monopolistic Firm 1 would have no incentive to buy the new data. Therefore, a monopoly without the threat of competition will not lead to the best outcome from the consumer s perspectives.

4 Extensions and Robustness

Next, we consider robustness to two simple extensions.

Firm Acquisition We treat the data seller as a market participant with its own customers and market share. This allows us to model ﬁrm acquisition: buying the data translates to acquiring the ﬁrm and its customers, and neither ﬁrm buying the data corresponds to the third ﬁrm remaining in the market.

Simultaneous Sale Rather than the data being exclusively sold to one ﬁrm in the case that both ﬁrms buy, we allow the seller to sell the data to both ﬁrms at the same ﬁxed price. In both of these extensions, we can again derive the quantities A, C, and D; while the precise quantities change, their rankings and relationships do not. Thus the general phenomenon of three regimes, with mixing over the middle regime, remains unchanged. Moreover, in expectation, the market share becomes less asymmetric in both extensions.

5 Future Directions

We view our work as a ﬁrst step towards modeling and analyzing competition for data in markets driven by ML. There are several directions for further investigation. First, we modeled the data to be acquired as having a ﬁxed size and a ﬁxed price, but real datasets are divisible. Second, we can consider a strategy space expanded to include buying any number of data points at a ﬁxed price per data point or nonlinear function of the number of data points purchased. More generally, treating the seller of the data as an additional player in the game allows for further questions, such as: what is the optimal revenue-generating mechanism to sell the data? And does the optimal mechanism maximize social welfare? Additionally, many ﬁrms that provide learning-based services acquire their data through their customers that use the service. In this way, capturing a larger market share induces a feedback loop which allows a ﬁrm to iteratively improve its product. What can be said about our game in a repeated setting with dynamic feedback effects? Furthermore, ﬁrms that provide digital services often operate in a secondary market in which other ﬁrms pay for advertising spots in their product. Improving one s market share should in principle allow a ﬁrm to charge advertisers a higher price, but we do not know to what extent this affects the analysis of the equilibria of the game. Incorporating advertiser behavior would greatly complicate the model but provide potentially interesting results.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)

[Barron, 1994] Andrew Barron. Approximation and estimation bounds for artiﬁcial neural networks. Machine Learning, 14(1):115 133, 1994.

[Ben-Porat and Tennenholtz, 2018] Omer Ben-Porat and Moshe Tennenholtz. A game-theoretic approach to recommendation systems with strategic content providers. In Proceedings of the 32nd Annual Conferenceon Neural Information Processing Systems, pages 1118 1128, 2018.

[Ben-Porat and Tennenholtz, 2019] Omer Ben-Porat and Moshe Tennenholtz. Regression equilibrium. In Proceedings of the 2019 ACM Conference on Economics and Computation, 2019.

[Berend and Kontorovich, 2011] Daniel Berend and Aryeh Kontorovich. The missing mass problem. Co RR, abs/1111.2328, 2011.

[Berry et al., 2017] Randall Berry, Michael Honig, Thanh Nguyen, Vijay Subramanian, and Rakesh Vohra. The value of sharing intermittent spectrum. Co RR, abs/1704.06828, 2017.

[David, 1987] Paul David. Some new standards for the economics of standardization in the information age. Economic Policy and Technological Performance, pages 206 239, 1987.

[Decrouez et al., 2018] Geoffrey Decrouez, Michael Grabchak, and Quentin Paris. Finite sample properties of the mean occupancy counts and probabilities. Bernoulli, 24(3):1910 1941, 2018.

[Economides, 1996] Nicholas Economides. The economics of networks. International Journal of Industrial Organization, 14(6):673 699, 1996.

[Farrell and Saloner, 1986] Joseph Farrell and Garth Saloner. Installed base and compatibility: Innovation, product preannouncements, and predation. The American Economic Review, 76(5):940 955, 1986.

[Good, 1953] Irving Good. The population frequencies of the species and the estimation of population parameters. Biometrika, 40(3-4):237 264, 1953.

[Kamien and Tauman, 1986] Morton Kamien and Yair Tauman. Fees versus royalties and the private value of a patent. The Quarterly Journal of Economics, 101(3):471 491, 1986.

[Kamien et al., 1992] Morton Kamien, Shmuel Oren, and Yair Tauman. Optimal licensing of cost-reducing innovation. Journal of Mathematical Economics, 21(5):483 508, 1992.

[Katz and Shapiro, 1985] Michael Katz and Carl Shapiro. Network externalities, competition, and compatibility. The American Economic Review, 75(3):424 440, 1985.

[Kearns and Vazirani, 1994] Michael Kearns and Umesh Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

[Mansour et al., 2018] Yishay Mansour, Aleksandrs Slivkins, and Zhiwei Steven Wu. Competing bandits: Learning under competition. In Proceedings of the 9th Innovations in Theoretical Computer Science Conference, pages 48:1 48:27, 2018. [Orlitsky et al., 2003] Alon Orlitsky, Narayana Santhanam, and Junan Zhang. Always good turing: Asymptotically optimal probability estimation. In Proceedings of the 44th Symposium on Foundations of Computer Science, pages 179 188, 2003. [Primack and Korosec, 2016] Dan Primack and Kirsten Korosec. GM buying self-driving tech startup for more than $1 billion. Fortune, 2016. [Shalev-Shwartz and Ben-David, 2014] Shai Shalev Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. [Tullock, 2001] Gordon Tullock. Efﬁcient Rent Seeking, pages 3 16. Springer, 2001.

Proceedings of the Twenty-Eighth International Joint Conference on Artiﬁcial Intelligence (IJCAI-19)