Strategic Data Sharing between Competitors

Nikita Tsoy, INSAIT, Sofia University, Sofia, Bulgaria (nikita.tsoy@insait.ai)
Nikola Konstantinov, INSAIT, Sofia University, Sofia, Bulgaria (nikola.konstantinov@insait.ai)

Abstract

Collaborative learning techniques have significantly advanced in recent years, enabling private model training across multiple organizations. Despite this opportunity, firms face a dilemma when considering data sharing with competitors: while collaboration can improve a company's machine learning model, it may also benefit competitors and hence reduce profits. In this work, we introduce a general framework for analyzing this data-sharing trade-off. The framework consists of three components, representing the firms' production decisions, the effect of additional data on model quality, and the data-sharing negotiation process, respectively. We then study an instantiation of the framework, based on a conventional market model from economic theory, to identify key factors that affect collaboration incentives. Our findings indicate a profound impact of market conditions on data-sharing incentives. In particular, we find that reduced competition, in terms of the similarity between the firms' products, and harder learning tasks foster collaboration.

1 Introduction

Machine learning has become integral to numerous business functions in the past decade, such as service operations optimization, the creation of new products, consumer service analytics, and risk analysis (Chui et al., 2022). Despite the transformative power of machine learning, its efficacy hinges significantly on the quality and quantity of the training data, making data a key asset corporations can use to increase their profit. One way to enhance data access and machine learning models is collaboration via data sharing (Rieke et al., 2020; Durrant et al., 2022).
Data-sharing schemes can bring benefits in many industries, such as agriculture, finance, and healthcare (Durrant et al., 2022; FedAI; Rieke et al., 2020). However, at least two barriers exist to such collaborations. The first is privacy and the associated regulatory concerns, which can be partially addressed by new collaborative learning techniques, such as federated learning (Kairouz et al., 2021). The second is proper incentives for collaboration. If the entities have no such incentives, they may not collaborate at all, free-ride (Blum et al., 2021; Karimireddy et al., 2022), or attack the shared model (Blanchard et al., 2017).

Such conflicting incentives arise naturally between market competitors. On the one hand, competitors are an appealing source of data, since they operate on the same market and have data of a similar type. On the other hand, the collaboration could also strengthen the competitors' machine learning models, potentially leading to downstream losses for the company. This concern is especially important for big firms that can influence prices and act strategically by considering how their decisions affect competitors' responses. Strategic actions might increase profits through various channels and produce complex downstream effects. For example, firms might collude to capture more revenue¹ or engage in a price war to win a market.² Thus, data sharing between big firms might have a complicated downstream effect on their profits.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).

Our contributions

Despite the significance of this data-sharing trade-off, particularly for large, tech-savvy companies, there is limited research on how market competition affects collaborative learning incentives. Our work aims to fill this gap by proposing a framework to investigate data-sharing incentives. The framework is modular and consists of three components: market model, data impact, and collaboration scheme.
These components represent the firms' production decisions, the effect of additional data on model quality, and the data-sharing negotiation process, respectively. To investigate the key factors that influence data-sharing decisions, we instantiate our framework using a conventional market model from economic theory and a data impact model grounded in learning theory. We examine three potential collaboration schemes: binary collaboration between two firms, partial collaboration between two firms, and binary collaboration among multiple firms. Our findings consistently indicate a profound impact of market conditions on collaboration incentives. In the case of two firms, we theoretically demonstrate that collaboration becomes more appealing as market competition, measured by the similarity of the firms' products, decreases or the learning task complexity increases. For multiple firms, we conduct simulation studies that reveal similar trends.

2 Related work

Data sharing incentives in machine learning

Incentives for collaboration in machine learning constitute an important research topic, especially in the context of new learning paradigms, such as federated learning (Kairouz et al., 2021). Some notable lines of work concern incentives under data heterogeneity (Donahue & Kleinberg, 2021; Werner et al., 2022); compensation for computation, energy, and other costs related to training (Yu et al., 2020; Tu et al., 2022; Liu et al., 2022); fair distribution of the benefits resulting from collaboration (Lyu et al., 2020a,b; Blum et al., 2021); and free-riding (Richardson et al., 2020; Karimireddy et al., 2022). We refer to Zeng et al. (2021); Yang et al. (2020); Zhan et al. (2021) for a detailed overview. A fundamental difference between our work and the research described above is that we explicitly consider the downstream market incentives of the firms and their effects on data-sharing decisions.
These incentives are orthogonal to those previously studied in the literature. For instance, the data-collection game of Karimireddy et al. (2022) can be included in our framework as an additional stage of the game after our coalition formation stage. In addition, our framework is not specific to federated learning, as it can be applied regardless of the learning procedure used on the shared dataset. To our knowledge, Wu & Yu (2022) is the only work considering data sharing between market competitors. However, the authors do not explain the mechanisms behind the effects of ML models' quality on the market. Therefore, their framework cannot predict whether a potential coalition will be beneficial for a company but can only indicate the benefits of the coalition post factum. In addition, they use the marketing model of Rust & Zahorik (1993) instead of the classic economic models (Tirole, 1988) to model competition, potentially constraining applicability. Finally, their notion of the sustainability of coalitions does not arise from standard concepts in game theory. In contrast, we base our analysis on the standard Nash equilibrium concept (Nash, 1951).

Market competition and information sharing

Competitive behavior is a well-studied topic in the economic literature (Tirole, 1988). We refer to Shapiro (1989) for an extensive review of the theory of competition. From a more applied perspective, Reiss & Wolak (2007) discuss how to test the behavior of firms empirically, while Berry et al. (2019) overview the modern empirical studies of industrial organization. A rich economic literature studies the incentives of competing firms to share information (Raith, 1996; Bergemann & Bonatti, 2019). For example, demand (Vives, 1984) or costs (Fried, 1984) might be unobservable, and firms might share information about them to improve their decisions.
In contrast to these studies, in our model, firms share information to improve their products or optimize their costs, not to reduce uncertainty. Thus, the mechanisms and conclusions of these earlier works and our work differ.

¹ https://www.nytimes.com/2021/01/17/technology/google-facebook-ad-deal-antitrust.html
² https://www.reuters.com/article/us-uber-results-breakingviews-idUSKCN1UY2X5

3 Data-sharing problem

In this section, we provide a general formulation of the data-sharing problem, consisting of three components: market model, data impact, and collaboration scheme. The market model outlines consumers' and firms' consumption and production decisions. Data impact explains how additional data affects machine learning model quality. Lastly, the collaboration scheme details the data-sharing negotiation process among the competitors. Given a specific application, these three components can be derived/modeled by market research or sales teams, operation management and data scientists, and mediators, respectively. We begin with an illustrative example that will help us clarify the abstract concepts used in our framework and then introduce the three framework components in their full generality.

3.1 Running example

Consider a city with a taxi market dominated by a few firms. Each firm collects data (e.g., demand for taxis and the traffic situation) to train a machine learning model for optimizing driver scheduling and other internal processes, which is crucial for improving scheduling quality, reducing costs, or enhancing services. Each company can use only its own data or take part in a collaborative training procedure, like federated learning, with its competitors. Collaboration can substantially improve its machine learning model, but it may also strengthen the models of its competitors. Thus, the company must carefully evaluate the impact of data sharing on its downstream profits.
3.2 Market model

The market model encompasses consumer actions (demand factors) and firm production actions (supply factors). We consider a market with $m$ firms, $F_1, \dots, F_m$, each producing $q_i \in \mathbb{R}_+$ units of good $G_i$ and offering them to consumers at price $p_i \in \mathbb{R}_+$, where $\mathbb{R}_+ = [0, \infty)$. In our example, $G_1, \dots, G_m$ represent taxi services from different companies, with consumers being city travelers; $q_i$ is the total number of kilometers serviced by company $i$, and $p_i$ is the price per kilometer. In this setting, prices and quantities are $2m$ unknown variables. Correspondingly, we need $2m$ constraints to describe the consumers' and firms' market decisions, which we derive from the consumers' and the firms' rationality. In contrast to standard market models, our framework allows product utilities and costs to depend on the qualities of the machine learning models of the firms, $v = (v_1, \dots, v_m)$. This dependence models the impact of the learned models on the services and production pipelines of the firms.

Consumers' behavior

Each consumer $j$ optimizes their utility $u_j(g^j, q^j, v)$ by purchasing goods given the market prices. We assume that consumers cannot influence prices, as there are many consumers and they do not cooperate. The consumer's utility depends on three factors. The first one is the consumed quantities $q^j = (q^j_1, \dots, q^j_m)$ of products $G_1, \dots, G_m$. The second is the machine learning models' qualities $v$, as these may impact the utility of the corresponding products. The last one is the consumed quantities $g^j = (g^j_1, \dots, g^j_k)$ of goods outside the considered market (e.g., consumed food in our taxi example). Assuming that each consumer $j$ can only spend a certain budget $B_j$ on all goods, one obtains the following optimization problem:

$$\max_{g^j, q^j} u_j(g^j, q^j, v) \quad \text{s.t.} \quad \sum_{l=1}^{k} \bar p_l g^j_l + \sum_{i=1}^{m} p_i q^j_i \le B_j, \qquad (1)$$

where $\bar p$ are the outside products' prices (which we consider fixed).
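To make problem (1) concrete, the sketch below solves it for a single hypothetical consumer with a Cobb-Douglas-style log utility over two market goods and one outside good. The utility form, function names, and parameter values are all illustrative choices, not part of the framework, which allows any utility $u_j$; for log utilities the budget-constrained optimum happens to have a closed form.

```python
# Toy instance of the consumer problem (1): one consumer, two market goods and
# one outside good, with a hypothetical utility
# u(g, q) = log(g) + sum_i alpha_i * log(q_i).

def demand(alphas, prices, budget):
    """Utility-maximizing bundle under the budget constraint g + p.q = B.

    For log utility, each good receives a fixed share of the budget:
    q_i = alpha_i * B / ((1 + sum(alphas)) * p_i),  g = B / (1 + sum(alphas)).
    """
    total = 1.0 + sum(alphas)
    q = [a * budget / (total * p) for a, p in zip(alphas, prices)]
    g = budget / total
    return g, q

g, q = demand(alphas=[1.0, 0.5], prices=[2.0, 1.0], budget=10.0)
assert abs(g + 2.0 * q[0] + 1.0 * q[1] - 10.0) < 1e-9  # budget exhausted
```

Raising a price lowers the demanded quantity of that good; it is exactly this price-to-quantity dependence that the demand equations (2) aggregate across consumers.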
The solution to problem (1), $q^{j,*}_i(p, v)$, determines the aggregate demanded quantity of goods

$$q_i(p, v) := \sum_j q^{j,*}_i(p, v). \qquad (2)$$

The functions $q_i(p, v)$ (known as the demand equations) link the prices $p_i$ and the demanded quantities $q_i$ and constitute the first $m$ restrictions in our setting.

Firms' behavior

Firms maximize their expected profits, the difference between revenue and cost,

$$\Pi^e_i = \mathbb{E}_v\bigl(p_i q_i - C_i(q_i, v_i)\bigr). \qquad (3)$$

Here $C_i(q_i, v_i)$ is the cost of producing $q_i$ units of good for the firm $F_i$, and the expectation is taken over the randomness in the models' quality, influenced by the dataset and the training procedure. Note that the model quality is often observed only after testing the model in production (at test time). Therefore, we assume that the firms reason in expectation instead. In our running example, $C_i$ depends on driver wages, gasoline prices, and scheduling quality. The firms may act by either deciding on their produced quantities, resulting in what is known as the Cournot competition model (Cournot, 1838), or on their prices, resulting in the Bertrand competition model (Bertrand, 1883). As the demand equations (2) may interrelate prices and quantities for various products, firms strategically consider their competitors' actions, resulting in a Nash equilibrium. The equilibrium conditions provide another $m$ constraints, enabling us to solve the market model entirely.

3.3 The impact of data on the market

Each company $F_i$ possesses a dataset $D_i$ that can be used to train a machine learning model (e.g., trip data in the taxi example) and may opt to participate in a data-sharing agreement with some other firms. Denote the final dataset the company acquires by $D^c_i$. The company then uses the dataset to train a machine learning model. We postulate two natural ways that the model quality $v_i$ can impact a company. The first one is by reducing the company's production costs $C_i(q_i, v_i)$, for example, by minimizing time in traffic jams.
The second is by increasing the utility of its products $q_i(p, v)$, for example, by minimizing the waiting time for taxi arrival.

3.4 Collaboration scheme

Following classic economic logic, we posit that firms will share data if this decision increases their expected profits $\Pi^e_i$. Since firms cannot evaluate the gains from unknown data, we assume that they know the dataset characteristics of their competitors (e.g., their number of samples and distributional information). Although expected profit maximization determines individual data-sharing incentives, forming a coalition necessitates mutual agreement. Therefore, the precise game-theoretic formulation of the data-sharing problem depends on the negotiation process details, such as the number of participants and full or partial data sharing among firms. Consider our taxi example. If only two firms are present, it is natural that they will agree on sharing their full datasets with each other if and only if they both expect that this will lead to increased profits. Partial data sharing will complicate the process, leading to an intricate bargaining process between the firms. Finally, if many companies are present, the data-sharing decisions become even more complicated, as a firm needs to reason not only about the data-sharing benefits but also about the data-sharing decisions of other firms.

Legal and training cost considerations

Other considerations can also enter into our framework through the collaboration scheme. In particular, if firms have legal requirements on data usage, they should impose suitable constraints on possible data-sharing agreements. If the collaborative learning procedure involves training large models, the firms should negotiate how to split training costs. For example, they might split the costs equally among all coalition members or proportionally to the sizes of their datasets.
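The three components above can be summarized as a small code skeleton. All class, field, and function names below are our own illustrative choices, not part of the framework's formal definition; the binary full-sharing rule serves as an example collaboration scheme.

```python
# Minimal sketch of the framework: a market model maps model qualities to
# equilibrium profits, a data impact model maps dataset sizes to qualities,
# and a collaboration scheme decides whether an agreement forms.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class MarketModel:
    # v = (v_1, ..., v_m) -> equilibrium expected profits (Pi^e_1, ..., Pi^e_m)
    equilibrium_profits: Callable[[Sequence[float]], Sequence[float]]


@dataclass
class DataImpact:
    # number of accessible samples n -> model quality v
    quality: Callable[[int], float]


def full_sharing_agreed(market: MarketModel, impact: DataImpact,
                        sizes: Sequence[int]) -> bool:
    """Binary scheme: all firms pool their data iff every firm expects a
    strictly higher profit than without any sharing."""
    pooled = sum(sizes)
    v_ind = [impact.quality(n) for n in sizes]
    v_share = [impact.quality(pooled)] * len(sizes)
    ind = market.equilibrium_profits(v_ind)
    share = market.equilibrium_profits(v_share)
    return all(s > i for s, i in zip(share, ind))


# With independent markets (profit = own quality), pooling always helps:
impact = DataImpact(quality=lambda n: -1.0 / n)  # toy quality curve
market = MarketModel(equilibrium_profits=lambda v: list(v))
assert full_sharing_agreed(market, impact, [100, 50])
```

Section 4 fills in these two callables with a concrete market equilibrium and a cost-based data impact model.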
4 Example market and data impact models

In this section, we instantiate the general framework using a conventional market model from economic theory and a natural data impact model justified by learning theory. These models allow us to reason quantitatively about the data-sharing problem in the following sections, leading to the identification of several key factors in the data-sharing trade-off. To focus solely on the data-sharing trade-off, we make several simplifying assumptions: the firms' data are homogeneous, training costs are negligible, and legal constraints are not present or are automatically fulfilled.

4.1 Market model

In order to make quantitative statements about the firms' actions and profits within the general model of Section 3.2, one needs to make further parametric assumptions on the utilities $u_j$ and cost functions $C_i$ and specify the competition game. To this end, we use a specific utility and cost model standard in the theoretical industrial organization literature (Tirole, 1988; Carlton & Perloff, 1990). Despite its simplicity, this model effectively captures the basic factors governing market equilibrium for many problems and is often used to obtain qualitative insights about them.

4.1.1 Demand

We assume that, in the aggregate, the behavior of consumers (1) can be described by a representative consumer with quasi-linear quadratic utility (QQUM, Dixit 1979):

$$\max_{g, q} u(g, q) := \sum_i q_i - \frac{\sum_i q_i^2 + 2\gamma \sum_{i < j} q_i q_j}{2} + g = \iota^T q - \frac{q^T G q}{2} + g \quad \text{s.t.} \quad g + p^T q \le B. \qquad (4)$$

Here, $\iota = (1, \dots, 1)^T$, $G = (1 - \gamma) I + \gamma \iota \iota^T$, and $g$ is the quantity of a single outside good. Additionally, $\gamma \in \left(-\frac{1}{m-1}, 1\right)$ is a measure of substitutability between each pair of goods: higher $\gamma$ corresponds to more similar goods. In our running example, $\gamma$ describes the difference in service between two taxi companies, such as the difference in the cars' quality or the location coverage. We refer to Choné & Linnemer (2019) for a detailed discussion of the QQUM model and the plausibility of assuming aggregate consumer behavior.
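As a quick numerical check of the QQUM model (the values of $\gamma$ and the prices below are illustrative), note that substituting the binding budget constraint $g = B - p^T q$ into (4) leaves a concave quadratic in $q$, whose first-order condition $\iota - Gq = p$ yields linear demand. For $m = 2$:

```python
# Numerical sketch of QQUM demand for m = 2 goods. With g = B - p.q, problem
# (4) reduces to maximizing iota^T q - q^T G q / 2 - p^T q, so the optimum
# solves iota - G q = p, i.e. q = G^{-1}(iota - p).
gamma = 0.5
p = [0.4, 0.6]

det = 1.0 - gamma ** 2  # determinant of G = [[1, gamma], [gamma, 1]]
q = [((1 - p[0]) - gamma * (1 - p[1])) / det,
     ((1 - p[1]) - gamma * (1 - p[0])) / det]

# Recover the inverse-demand form p_i = 1 - q_i - gamma * q_j
assert abs(p[0] - (1 - q[0] - gamma * q[1])) < 1e-12
assert abs(p[1] - (1 - q[1] - gamma * q[0])) < 1e-12
```

The same linear-algebra step, $q = G^{-1}(\iota - p)$, is what the closed-form demand equations for general $m$ express.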
For the QQUM model, the exact form of the demand equations (2) is well-known.

Lemma 4.1 (Amir et al. 2017). Assume that $G^{-1}(\iota - p) > 0$ and $p^T G^{-1}(\iota - p) \le B$. The solution to the consumer problem (4) satisfies

$$p_i = 1 - q_i - \gamma \sum_{j \ne i} q_j, \qquad q_i = \frac{1 - \gamma - (1 + \gamma(m-2)) p_i + \gamma \sum_{j \ne i} p_j}{(1 - \gamma)(1 + \gamma(m-1))}. \qquad (5)$$

The technical conditions $G^{-1}(\iota - p) > 0$ and $p^T G^{-1}(\iota - p) \le B$ ensure that the consumers want to buy at least a bit of every product and do not spend all of their money in the considered market, respectively. For completeness, we prove Lemma 4.1 in Appendix A.1.

4.1.2 Supply

We assume that the cost functions (3) are linear in the quantities,

$$\Pi^e_i = \mathbb{E}(p_i q_i - c_i q_i), \qquad (6)$$

where the parameter $c_i$ depends on the quality of the machine learning model. We denote $c^e_i = \mathbb{E}_{D^c_i}(c_i)$.

Remark 4.2. Here we assume that the quality of the machine learning model only affects the production costs, and not the utilities. However, in this market model, this is equivalent to assuming that the machine learning model affects the consumers' utility via increasing the effective quantities of the produced goods. Specifically, assume that each product has an internal quality $w_i$ that increases the effective amount of good in the consumer's utility (4), resulting in utility $u(g, w_1 q_1, \dots, w_m q_m)$. This model is equivalent to the model above after substituting $q_i$, $p_i$, and $c_i$ with their effective versions $\tilde q_i = w_i q_i$, $\tilde p_i = p_i / w_i$, and $\tilde c_i = c_i / w_i$.

We now derive the remaining constraints on the prices and costs. Note that these equilibria depend on the expected costs $c^e_i$ and hence on the quality of the machine learning models.

Cournot competition

Each firm acts by choosing an output level $q_i$, which determines the prices (5) and expected profits (6). The next standard lemma describes the Nash equilibrium of this game.

Lemma 4.3. Assume that the expected marginal costs are equal to $c^e_1, \dots, c^e_m$, and companies maximize their profits (6) in the Cournot competition game.
If $\forall i\; (2 - \gamma)(1 - c^e_i) > \gamma \sum_{j \ne i} (c^e_i - c^e_j)$, the Nash equilibrium quantities and profits are equal to

$$q^*_i = \frac{2 - \gamma - (2 + \gamma(m-2)) c^e_i + \gamma \sum_{j \ne i} c^e_j}{(2 - \gamma)(2 + (m-1)\gamma)}, \qquad \Pi^e_i = (q^*_i)^2.$$

Bertrand competition

Each firm acts by setting the price $p_i$ for its product, which determines the quantities (5) and expected profits (6). The following lemma describes the Nash equilibrium of this game.

Lemma 4.4. Assume that the expected marginal costs are equal to $c^e_1, \dots, c^e_m$, and companies maximize their profits (6) in the Bertrand competition game. If $\forall i\; d_1 (1 - c^e_i) > d_3 \sum_{j \ne i} (c^e_i - c^e_j)$, the Nash equilibrium prices and profits are equal to

$$p^*_i = \frac{d_1 + d_2 c^e_i + d_3 \sum_{j \ne i} c^e_j}{d_4}, \qquad \Pi^e_i = \frac{(1 + \gamma(m-2))(p^*_i - c^e_i)^2}{(1 - \gamma)(1 + \gamma(m-1))},$$

where $d_1, \dots, d_4$ depend only on $\gamma$ and $m$.

In both lemmas, the technical conditions $\forall i\; (2 - \gamma)(1 - c^e_i) > \gamma \sum_{j \ne i} (c^e_i - c^e_j)$ and $\forall i\; d_1 (1 - c^e_i) > d_3 \sum_{j \ne i} (c^e_i - c^e_j)$ ensure that every firm produces a positive amount of good in the equilibrium. We prove these lemmas in Sections A.2 and A.3.

4.2 Data impact model

In order to reason strategically about the impact of sharing data with others, firms need to understand how additional data impacts their costs. Here, we consider the case where all datasets are sampled from the same distribution $\mathcal{D}$. While heterogeneity is an important concern in collaborative learning, we focus on the homogeneous case since heterogeneity is orthogonal to the incentives arising from market equilibrium considerations. Homogeneity allows us to consider the expected costs in the form $c^e_i = c^e(n^c_i)$, where $n^c_i = |D^c_i|$ is the number of points company $i$ has access to. Using the examples below, we motivate the following functional form for $c^e(n)$:

$$c^e(n) = a + \frac{b}{n^\beta}, \qquad \beta \in (0, 1]. \qquad (7)$$

In this setting, $\beta$ indicates the difficulty of the learning task: higher $\beta$ corresponds to a simpler task (Tsybakov, 2004); $a$ represents the marginal costs of production given a perfect machine learning model, while $a + b$ corresponds to the cost of production without machine learning optimizations. Additionally, we assume that $a < 1$ and that $b/(1-a)$ is small enough that the technical requirements of Lemmas 4.1, 4.3, and 4.4 are satisfied (see Equations (10) and (11)). Intuitively, this requirement ensures that firms do not exit the market during the competition stage.

In the examples below, we consider a setting where a company needs to perform an action $s \in \mathbb{R}^n$ during production that will impact its costs. However, there is uncertainty in the production process coming from random noise $X$ valued in $\mathbb{R}^m$. Thus, the cost of production is a random function $c(s, X)\colon \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}_+$.

Asymptotic normality

Assume that the firms use background knowledge about their production processes in the form of a structural causal model of $X$. However, they do not know the exact parameters of the model and use data to estimate them. Suppose a firm uses the maximum likelihood estimator to find the parameters and chooses the optimal action $s_{\text{fin}}$ based on this estimate. In this case, the result of optimization, $\mathbb{E}_X(c(s_{\text{fin}}, X))$, will approximately have a generalized chi-square distribution (Jones, 1983), resulting in the following approximation:

$$\mathbb{E}_{\mathcal{D}, X}(c(s_{\text{fin}}, X)) \approx a + \frac{b}{n}.$$

We refer to Appendix A.4 for further details. This result implies the same dependence of the expected marginal costs on the total number of samples $n$ as Equation (7) with $\beta = 1$.

Stochastic optimization

A similar dependence arises when the company uses stochastic optimization to directly optimize the expected cost $\bar c(s) = \mathbb{E}_X(c(s, X))$.
If the company uses an algorithm with provable generalization guarantees (e.g., SGD with a single pass over the data, Bubeck 2015) and the function $\bar c(s)$ is strongly convex, the outcome $s_{\text{fin}}$ will satisfy

$$\mathbb{E}_{\mathcal{D}}\bigl(\bar c(s_{\text{fin}}) - \bar c(s^*)\bigr) = O\!\left(\frac{1}{n}\right),$$

where $s^*$ is the optimal action, resulting in the same dependence as in the previous paragraph.

Statistical learning theory

Another justification for the dependency on the number of data points comes from statistical learning theory. The observations from this subsection are inspired by similar arguments in Karimireddy et al. (2022). Assume that a firm trains a classifier used in the production process. In this case, the cost $c$ depends on the classifier accuracy $a$, and the cost overhead will satisfy $c(a) - c(a^*) \approx c'(a^*)(a - a^*)$, where $a^*$ is the optimal accuracy achievable by a chosen family of classifiers. If the family of classifiers has a finite VC dimension $d$, a well-known statistical bound (Shalev-Shwartz & Ben-David, 2014) gives $\mathbb{E}(a^* - a) = O(\sqrt{d/n})$, motivating the same dependence as Equation (7) with $\beta = 1/2$.

5 Data sharing between two firms

Having introduced example market and data impact models, we can specify a negotiation scheme and quantitatively analyze the data-sharing problem. Through this, we aim to obtain qualitative insights into how market parameters impact data-sharing decisions. Following conventional economic logic (Tirole, 1988; Carlton & Perloff, 1990), we start with the simplest possible collaboration scheme, in which two companies make a binary decision of whether to share all their data with each other. In the next sections, we proceed to the more complicated situations of partial data sharing and data sharing between many parties.

Full data sharing between two firms

According to our framework, both companies compare their expected profits at the Nash equilibrium (from Lemma 4.3 or 4.4) in two cases: when they collaborate and when they do not.
Then they share data with each other if and only if they both expect an increase in profits from this action.

Remark 5.1. Notice that we can incorporate some legal requirements into this scheme by redefining the full data-sharing action. For example, if sharing consent constraints, arising from copyright or other data-ownership constraints, are present, we could assume that the companies will share only a part of their dataset for the collaborative learning. While previously the competitor got access to all data of the firm, $n^{\text{share}}_i = n_{-i} + n_i$, now they will get access only to the data of users who agreed to data sharing, $n^{\text{share}}_i = n_{-i} + n_{i,\text{consent}}$.

For Cournot competition, Lemma 4.3 and Equation (7) give the following collaboration criterion:

$$\forall i \quad \Pi^e_{i,\text{share}} > \Pi^e_{i,\text{ind}} \iff (2 - \gamma)\bigl(n_i^{-\beta} - (n_1 + n_2)^{-\beta}\bigr) > \gamma\bigl(n_{-i}^{-\beta} - n_i^{-\beta}\bigr),$$

where $n_{-i}$ is the size of the data of the player that is not $i$, $\Pi^e_{i,\text{share}}$ is the expected profit in collaboration, and $\Pi^e_{i,\text{ind}}$ is the expected profit when no collaboration occurs. Similarly, in the case of Bertrand competition, Lemma 4.4 gives

$$\forall i \quad \Pi^e_{i,\text{share}} > \Pi^e_{i,\text{ind}} \iff (2 - \gamma - \gamma^2)\bigl(n_i^{-\beta} - (n_1 + n_2)^{-\beta}\bigr) > \gamma\bigl(n_{-i}^{-\beta} - n_i^{-\beta}\bigr).$$

The theorem below describes the properties of these criteria (see proof in Appendix A.5).

Theorem 5.2. In the case of $\gamma \le 0$, it is profitable for the firms to collaborate. In the case of $\gamma > 0$, there exists a value $x_t(\gamma, \beta)$, where $t \in \{\text{Bertrand}, \text{Cournot}\}$ is the type of competition, such that, for firm $i$, it is profitable to collaborate only with a competitor with enough data:

$$\Pi^e_{i,\text{share}} > \Pi^e_{i,\text{ind}} \iff \frac{n_{-i}}{n_1 + n_2} > x_t(\gamma, \beta).$$

The function $x_t$ has the following properties:

1. $x_t(\gamma, \beta)$ is increasing in $\gamma$.
2. $x_{\text{Bertrand}} \ge x_{\text{Cournot}}$.
3. $x_t(\gamma, \beta)$ is increasing in $\beta$.

Discussion

The theorem above indicates that the firms are more likely to collaborate when either the market is less competitive (properties 1 and 2) or the learning task is harder (property 3).
Indeed, the threshold $x$ becomes smaller when the products are less similar ($\gamma$ is smaller), making the market less competitive. Similarly, $x$ is smaller in the Cournot setting, since it is known to be less competitive than the Bertrand one (Shapiro, 1989). Finally, when the learning task is harder ($\beta$ is smaller), $x$ decreases, making collaboration more likely. Intuitively, this happens because the decrease in cost (7) from a single additional data point is higher for smaller $\beta$.

In the supplementary material, we explore several extensions of the two-firm case. In Appendix C.1 and Appendix C.2, we look at different cost and utility functions, respectively. In Appendix C.3, we consider the case where $a$, $b$, and/or $\beta$ might differ among firms and derive an analog of Theorem 5.2. In Appendix C.4, we consider the data-sharing problem with two companies with heterogeneous data in the context of mean estimation. In Appendix E, we look at the welfare implications of data sharing.

6 Partial data sharing

In this section, we study a two-company model in which companies can share any fraction of their data with competitors. For brevity, we only study this model in the case of Cournot competition. In this setup, each firm $F_i$ chooses to share a fraction $\lambda_i \in [0, 1]$ of its data with its competitor and expects its costs to be

$$c^e_i = c^e(n_i + \lambda_{-i} n_{-i}) = a + b (n_i + \lambda_{-i} n_{-i})^{-\beta},$$

where $\lambda_{-i}$ is the fraction of data shared by the company that is not $i$.

Remark 6.1. Notice that we can incorporate some legal requirements into this scheme by constraining the action spaces of the participants. For example, one can implement sharing consent constraints, arising from copyright or other data-ownership constraints, by constraining the firms' choices of $\lambda$. While previously the firms were able to share all their data, $\lambda \in [0, 1]$, now they can share only the data of users who agreed to data sharing, $\lambda \in [0, \lambda_{\text{consent}}]$.
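As an illustration of how the shared fraction enters the cost model (7), and of the effect of a consent cap as in Remark 6.1, consider the following sketch (all parameter values are illustrative):

```python
# Expected marginal cost under partial sharing, following (7):
# c_e(n) = a + b * n**(-beta). Firm i effectively trains on
# n_i + lambda_other * n_other samples; a consent constraint caps the
# shareable fraction of the competitor's data.
def expected_cost(n_own, n_other, lam_other, a=0.1, b=0.5, beta=0.5):
    return a + b * (n_own + lam_other * n_other) ** (-beta)

n1, n2 = 1000, 400
lam_consent = 0.6  # only this fraction of firm 2's users consented

no_share = expected_cost(n1, n2, 0.0)
full = expected_cost(n1, n2, 1.0)
capped = expected_cost(n1, n2, lam_consent)

# More shared data always lowers the expected cost, and the consent cap
# leaves part of that gain unrealized.
assert full < capped < no_share
```

Plugging these expected costs into the Cournot profits of Lemma 4.3 is what turns the choice of $\lambda_1, \lambda_2$ into the bargaining problem studied next.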
We will use the Nash bargaining solution (Binmore et al., 1986) to describe the bargaining process between the firms. Namely, the firms will try to compromise between themselves by maximizing the following Nash product:

$$\max_{\lambda_1, \lambda_2 \in [0, 1]} \bigl(\Pi^e_1(\lambda_1, \lambda_2) - \Pi^e_1(0, 0)\bigr)\bigl(\Pi^e_2(\lambda_1, \lambda_2) - \Pi^e_2(0, 0)\bigr) \quad \text{s.t.} \quad \forall i\; \Pi^e_i(\lambda_1, \lambda_2) - \Pi^e_i(0, 0) \ge 0. \qquad (9)$$

The Nash bargaining solution is a commonly used model for two-player bargaining, as it results in the only possible compromise satisfying invariance to affine transformations of profits, Pareto optimality, independence of irrelevant alternatives, and symmetry (Binmore et al., 1986). The following theorem describes the firms' behavior in this setup (see proof in Appendix A.6).

Theorem 6.2. W.l.o.g. assume that $n_1 \ge n_2$. Additionally, assume that $b$ is small relative to $1 - a$. The solution $(\lambda^*_1, \lambda^*_2)$ to problem (9) is

$$\lambda^*_1 = \bar\lambda_1 + O\!\left(\frac{b}{1 - a}\right), \qquad \lambda^*_2 = 1,$$

where $\bar\lambda_1$ is an explicit function of $n_1$, $n_2$, $\gamma$, and $\beta$. Moreover, $\bar\lambda_1$ is decreasing in $\gamma$ and $\beta$, and $\bar\lambda_1 = O\bigl((n_2/n_1)^{\beta+2}\bigr)$ as $n_2/n_1 \to 0$.

Discussion

We can see that the results of Theorem 6.2 are similar to those in Section 5. When there is a big difference between the amounts of data ($n_1 \gg n_2$), the big firm does not collaborate with the small firm ($\lambda^*_1 = O((n_2/n_1)^{\beta+2})$). However, when the firms have similar amounts of data, the large firm will share almost all of it ($\lambda^*_1 \approx 1$), and the sharing proportion increases with a decrease in competition (smaller $\gamma$) and an increase in learning task hardness (smaller $\beta$). In contrast to the previous results, the smaller firm always shares all its data with the competitor ($\lambda^*_2 = 1$).

7 Data sharing between many firms

In this section, we investigate the data-sharing problem for many companies. For brevity, we only consider the case of Cournot competition. The key challenge in the case of many companies is the variety of possible coalitional structures.
Moreover, since the companies may have conflicting incentives and data sharing requires mutual agreement, it is not immediately clear how to assess the plausibility of a particular structure. We discuss one possible way to solve this problem using non-cooperative game theory (Kóczy, 2018). We assume a coalition formation process between the firms, model it as a non-cooperative game, and study its Nash equilibria. The non-cooperative approach often offers a unique sub-game perfect equilibrium, a standard solution notion in sequential games (Mas-Colell et al., 1995). However, this equilibrium depends on the assumptions about the bargaining process. For this reason, in Appendix D we also study alternative formulations: a cooperative setting and two non-cooperative settings with only one non-singleton coalition.

Non-cooperative data-sharing game

We order the firms in decreasing data size order, so that firm $F_1$ has the largest dataset. In this order, the firms propose forming coalitions with their competitors. First, the largest company makes up to $2^{m-1}$ proposals. For any offer, each invited firm accepts or declines it in decreasing data size order. If anyone disagrees, the coalition is rejected, and the first firm makes another offer. Otherwise, the coalition is formed, and all companies in it leave the game. After the first company forms a coalition or exhausts all of its offers, the second firm in the decreasing data size order begins to propose, and so on.

We use the standard backward induction procedure (Mas-Colell et al., 1995) to calculate the sub-game perfect equilibrium of this game. This method solves the game from the end to the beginning. First, the algorithm looks at all possible states one step before the end of the game. Since the actions of the last player are rational, the algorithm can identify these actions accurately.
This way, the algorithm moves one step earlier in the coalition formation process and can now identify the action taken before this state occurred. The procedure iterates until the start of the game is reached.

Experiments. We use the procedure described above to empirically test the conclusions of the previous sections. We sample m dataset sizes, one for each firm, from a distribution P = N(µ, σ²) clipped at 1 from below. Then, we calculate the equilibrium of the resulting data-sharing game and compute the average size (number of companies) of the resulting coalitions at the equilibrium, as a measure of the extent to which the companies collaborate. We repeat the experiment 10000 times for fixed values of m, γ, β, µ, σ and compute the mean of the average coalition size over these runs. Our simulation solves each instance of the data-sharing game exactly and averages it over a large number of independent runs, which makes our results precise. Repeating the experiment for different values of γ and β, we can observe the dependence of this average coalition size on the two parameters of interest. In Figure 1 we plot these dependencies for different values of m ∈ {3, 4} and σ ∈ {300, 600}.

Figure 1: The dependence of the average coalition size on γ (left part) and β (right part) in the synthetic experiments, with panels for m ∈ {3, 4} and P = N(1000, 300²) or N(1000, 600²). (x-axes: similarity between products γ, left, and simplicity of the learning task β, right; y-axes: average coalition size.) The y-axes report the mean of the average size of the coalitions in the equilibrium partition, where the mean is taken over 10000 Monte Carlo simulations of the game. See the main text for details.
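A much-simplified sanity check of this Monte Carlo setup can be sketched for the two-firm special case, replacing the full coalition-formation game with the pairwise Cournot collaboration criterion (we assume the form (2 − γ)(1/n_i^β − 1/(n1+n2)^β) > γ(1/n_{−i}^β − 1/n_i^β), as in Appendix A.5). The function names and parameter values are ours, and the numbers are illustrative only:

```python
import random

def wants_to_share(n_own, n_other, gamma, beta):
    # Cournot collaboration criterion for one firm (Theorem 5.2 / Appendix A.5):
    # (2-g)*(n_own^-b - (n_own+n_other)^-b) > g*(n_other^-b - n_own^-b)
    tot = (n_own + n_other) ** -beta
    lhs = (2 - gamma) * (n_own ** -beta - tot)
    rhs = gamma * (n_other ** -beta - n_own ** -beta)
    return lhs > rhs

def collab_rate(gamma, beta, mu=1000.0, sigma=300.0, trials=2000, seed=0):
    """Fraction of sampled size pairs where both firms agree to share."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        n1 = max(1.0, rng.gauss(mu, sigma))  # dataset sizes clipped at 1
        n2 = max(1.0, rng.gauss(mu, sigma))
        if wants_to_share(n1, n2, gamma, beta) and wants_to_share(n2, n1, gamma, beta):
            hits += 1
    return hits / trials

# Collaboration becomes rarer as products grow more similar (larger gamma)
# and as the learning task gets easier (larger beta).
print(collab_rate(0.3, 0.9), collab_rate(0.95, 0.9))
```

Because the criterion is monotone in γ and β for every fixed size pair, the estimated collaboration rate inherits the monotonicities reported in Figure 1.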
When varying one of these parameters, the other is fixed at its default value: γ = 0.8 and β = 0.9, respectively. We also provide results for other values of m and σ in Appendix F. As we can see, the conclusions of Theorem 5.2 transfer to this experiment: the average size of the coalitions, and thus the cooperation incentives, are decreasing in both γ and β across all experimental setups.

8 Conclusion

In this work, we introduced and studied a framework for data sharing among market competitors. The framework is modular and consists of three components (market model, data impact, and collaboration scheme), which can be specified for any particular application. To examine the effects of various parameters, we instantiated the framework using a conventional market model and a natural model of data impact grounded in learning theory. We then studied several data-sharing games within these models. Our findings indicate that the characteristics of competition may have a significant effect on data-sharing decisions. Specifically, we found that higher product differentiation generally increases the willingness to collaborate, as does the complexity of the learning task. Interestingly, our results suggest that firms might not want to participate in federated learning even in the absence of privacy concerns. On the other hand, we also predict that data-sharing collaborations might emerge between competitors even without any external regulations. We hope our study will inspire further in-depth investigations into the nuanced trade-offs in data sharing, allowing competition and collaboration to coexist in data-driven environments.

Limitations and future work. While our general framework can describe many data-sharing settings, the quantitative results rely on several important assumptions. First, we assume that the firms' data are homogeneous.
We expect that designing a suitable model in the case of significant heterogeneity is a substantial challenge orthogonal to the focus of our work (e.g., Gulrajani & Lopez-Paz, 2021). Moreover, in some cases, additional data from a different distribution may damage model performance, yielding orthogonal data-sharing considerations (Donahue & Kleinberg, 2021). Second, we assume that legal constraints are not present or are automatically fulfilled in all our settings. However, we see the evaluation of AI regulatory frameworks as an important direction for future work. The same applies to training cost considerations. Finally, we do not consider sociological aspects, such as reputational effects (e.g., reputation loss due to a refusal to share data or a poorly communicated decision to share data). Due to the inherent non-rivalry of data (Karimireddy et al., 2022), sociological modeling, e.g., the theory of collective action, may provide further valuable insights into the data-sharing problem. We see our contributions as mostly conceptual and do not aim to provide a fully realistic model that can directly inform practitioners about the benefits of data-sharing decisions. However, we hope our results will remain qualitatively valid in real-world settings, since we used a market model widely adopted in the economic theory literature (Choné & Linnemer, 2019) and a data impact model motivated by established theoretical frameworks in machine learning (Tsybakov, 2004). Also, we hope that the three-component decomposition of the data-sharing problem will help practitioners leverage their situation-specific knowledge of their markets and ML models to build more accurate models for their specific needs.

Acknowledgments

This research was partially funded by the Ministry of Education and Science of Bulgaria (support for INSAIT, part of the Bulgarian National Roadmap for Research Infrastructure).
The authors would like to thank Florian Dorner, Mark Vero, Nikola Jovanovic and Sam Motamed for providing helpful feedback on earlier versions of this work.

References

Amir, R., Erickson, P., and Jin, J. On the microeconomic foundations of linear demand for differentiated products. Journal of Economic Theory, 169:641–665, 2017.
Bergemann, D. and Bonatti, A. Markets for information: An introduction. Annual Review of Economics, 11:85–107, 2019.
Berry, S., Gaynor, M., and Scott Morton, F. Do increasing markups matter? Lessons from empirical industrial organization. Journal of Economic Perspectives, 33(3):44–68, 2019.
Bertrand, J. Book review of Théorie mathématique de la richesse sociale and of Recherches sur les principes mathématiques de la théorie des richesses. Journal des savants, 1883.
Binmore, K., Rubinstein, A., and Wolinsky, A. The Nash bargaining solution in economic modelling. RAND Journal of Economics, 17(2):176–188, 1986.
Blanchard, P., El Mhamdi, E. M., Guerraoui, R., and Stainer, J. Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems, volume 30, 2017.
Blum, A., Haghtalab, N., Phillips, R. L., and Shao, H. One for one, or all for all: Equilibria and optimality of collaboration in federated learning. In International Conference on Machine Learning, pp. 1005–1014. PMLR, 2021.
Bubeck, S. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3–4):231–357, 2015.
Carlton, D. W. and Perloff, J. M. Modern industrial organization. Higher Education, 1990.
Choné, P. and Linnemer, L. The quasilinear quadratic utility model: An overview. CESifo Working Paper, 2019.
Chui, M., Hall, B., Mayhew, H., and Singla, A. The state of AI in 2022 and a half decade in review. QuantumBlack by McKinsey, 2022.
Cournot, A.-A. Recherches sur les principes mathématiques de la théorie des richesses. Chez L. Hachette, 1838.
Dixit, A.
A model of duopoly suggesting a theory of entry barriers. Bell Journal of Economics, 10(1):20–32, 1979.
Donahue, K. and Kleinberg, J. Model-sharing games: Analyzing federated learning under voluntary participation. In AAAI Conference on Artificial Intelligence, volume 35(6), pp. 5303–5311, 2021.
Durrant, A., Markovic, M., Matthews, D., May, D., Enright, J., and Leontidis, G. The role of cross-silo federated learning in facilitating data sharing in the agri-food sector. Computers and Electronics in Agriculture, 193:106648, 2022.
FedAI. WeBank and Swiss Re signed cooperation MoU. URL https://www.fedai.org/news/webank-and-swiss-re-signed-cooperation-mou/.
Fried, D. Incentives for information production and disclosure in a duopolistic environment. The Quarterly Journal of Economics, 99(2):367–381, 1984.
Gulrajani, I. and Lopez-Paz, D. In search of lost domain generalization. In International Conference on Learning Representations, 2021.
Jones, D. Statistical analysis of empirical models fitted by optimization. Biometrika, 70(1):67–88, 1983.
Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., and Cummings, R. Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2):1–210, 2021.
Karimireddy, S. P., Guo, W., and Jordan, M. I. Mechanisms that incentivize data sharing in federated learning. Workshop on Federated Learning at NeurIPS (FL-NeurIPS'22), 2022.
Kóczy, L. Á. Partition function form games. Theory and Decision Library C, 48:312, 2018.
Liu, J., Zhang, G., Wang, K., and Yang, K. Task-load-aware game-theoretic framework for wireless federated learning. IEEE Communications Letters, 27(1):268–272, 2022.
Lyu, L., Xu, X., Wang, Q., and Yu, H. Collaborative fairness in federated learning. In Federated Learning, pp. 189–204. Springer, 2020a.
Lyu, L., Yu, J., Nandakumar, K., Li, Y., Ma, X., Jin, J., Yu, H., and Ng, K. S.
Towards fair and privacy-preserving federated deep models. IEEE Transactions on Parallel and Distributed Systems, 31(11):2524–2541, 2020b.
Mas-Colell, A., Whinston, M. D., and Green, J. R. Microeconomic theory. Oxford University Press, 1995.
Nash, J. Non-cooperative games. Annals of Mathematics, 54(2), 1951.
Raith, M. A general model of information sharing in oligopoly. Journal of Economic Theory, 71(1):260–288, 1996.
Reiss, P. C. and Wolak, F. A. Structural econometric modeling: Rationales and examples from industrial organization. Handbook of Econometrics, 6:4277–4415, 2007.
Richardson, A., Filos-Ratsikas, A., and Faltings, B. Budget-bounded incentives for federated learning. Federated Learning: Privacy and Incentive, pp. 176–188, 2020.
Rieke, N., Hancox, J., Li, W., Milletari, F., Roth, H. R., Albarqouni, S., Bakas, S., Galtier, M. N., Landman, B. A., and Maier-Hein, K. The future of digital health with federated learning. NPJ Digital Medicine, 3(1):119, 2020.
Rust, R. T. and Zahorik, A. J. Customer satisfaction, customer retention, and market share. Journal of Retailing, 69(2):193–215, 1993.
Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
Shapiro, C. Theories of oligopoly behavior. Handbook of Industrial Organization, 1:329–414, 1989.
Tirole, J. The theory of industrial organization. MIT Press, 1988.
Tsybakov, A. B. Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135–166, 2004.
Tu, X., Zhu, K., Luong, N. C., Niyato, D., Zhang, Y., and Li, J. Incentive mechanisms for federated learning: From economic and game theoretic perspective. IEEE Transactions on Cognitive Communications and Networking, 8(3):1566–1593, 2022.
Vives, X. Duopoly information equilibrium: Cournot and Bertrand. Journal of Economic Theory, 34(1):71–94, 1984.
Von Neumann, J. and Morgenstern, O. Theory of games and economic behavior.
Princeton University Press, 1944.
Werner, M., He, L., Karimireddy, S. P., Jordan, M., and Jaggi, M. Towards provably personalized federated learning via threshold-clustering of similar clients. In Workshop on Federated Learning at NeurIPS (FL-NeurIPS'22), 2022.
Wu, X. and Yu, H. MarS-FL: Enabling competitors to collaborate in federated learning. IEEE Transactions on Big Data, (1):1–11, 2022.
Yang, Q., Fan, L., and Yu, H. Federated Learning: Privacy and Incentive, volume 12500. Springer Nature, 2020.
Yu, H., Liu, Z., Liu, Y., Chen, T., Cong, M., Weng, X., Niyato, D., and Yang, Q. A sustainable incentive scheme for federated learning. IEEE Intelligent Systems, 35(4):58–69, 2020.
Zeng, R., Zeng, C., Wang, X., Li, B., and Chu, X. A comprehensive survey of incentive mechanism for federated learning. arXiv preprint arXiv:2106.15406, 2021.
Zhan, Y., Zhang, J., Hong, Z., Wu, L., Li, P., and Guo, S. A survey of incentive mechanism design for federated learning. IEEE Transactions on Emerging Topics in Computing, 10(2):1035–1044, 2021.

Supplementary Material

The supplementary material is structured as follows. Appendix A contains the proofs of all results in the main text. Appendix B contains further examples of the data-sharing problem and costs. Appendix C contains several extensions of the setting in Section 5. Appendix D contains alternative models of coalition formation between many firms. Appendix E contains the welfare analysis of the data-sharing problem between two firms. Appendix F presents additional simulations for Section 7.

A.1 Proof of Lemma 4.1

First, we solve the same consumption problem without the positivity constraints:

max_{g, q1, …, qm} u(g, q1, …, qm) s.t. g + Σ_{i=1}^m p_i q_i ≤ B.

Using the KKT conditions for q_i, we get

1 − q_i − γ Σ_{j≠i} q_j = λ p_i,

where λ is the Lagrange multiplier for the budget constraint. The first-order condition for g gives 0 = 1 − λ, so λ = 1. Therefore, the first-order conditions for q1, …
, qm take the form

p_i = 1 − q_i − γ Σ_{j≠i} q_j.

Summing these conditions, notice that

Σ_i q_i = (m − Σ_i p_i) / (1 + γ(m−1)).

Therefore,

q_i = ((1 + γ(m−1))(1 − p_i) − γ(m − Σ_i p_i)) / ((1−γ)(1+γ(m−1))) = (1 − γ − (1+γ(m−2)) p_i + γ Σ_{j≠i} p_j) / ((1−γ)(1+γ(m−1))).

This solution is a global maximizer because u(g, q1, …, qm) is strongly concave in q1, …, qm and linear in g. Since the auxiliary problem has fewer constraints than the original problem, a solution to the auxiliary problem is also a solution to the original problem as long as all quantities are non-negative. These constraints give the following restrictions on p1, …, pm and B:

∀i: (1−γ)(1 − p_i) ≥ γ Σ_{j≠i} (p_i − p_j), and B ≥ Σ_{i=1}^m q*_i p_i.

The first series of inequalities holds when the prices are less than one and close enough to each other. In our setting, this is achieved when 1 > a and (1−a) ≫ b, where a and b are the parameters from Equation (7) (since Equations (10) and (11) hold). The last inequality holds when B is big enough, for example, when B > m.

A.2 Proof of Lemma 4.3

The profit maximization tasks of the firms are

max_{q_i} E((p_i − c_i) q_i) = E(1 − γ Σ_{j≠i} q_j − c_i) q_i − q_i².

Using the first-order conditions, we get q*_i(q_{−i}) = E(1 − c_i − γ Σ_{j≠i} q_j)/2. Both equilibrium quantities should be the best responses to the opponents' expected quantities:

q*_i = E(1 − c_i − γ Σ_{j≠i} q*_j)/2.

Notice that E(q*_i) = E(E(…)/2) = E(…)/2 = q*_i, so the equilibrium quantities are deterministic. Summing over i,

Σ_i q*_i = (m − Σ_i c^e_i − (m−1)γ Σ_i q*_i)/2 ⇒ Σ_i q*_i = (m − Σ_i c^e_i)/(2 + γ(m−1)).

The best response equation gives

q*_i = (1 − c^e_i − γ Σ_j q*_j + γ q*_i)/2 ⇒ q*_i = (2 − γ − (2 + γ(m−2)) c^e_i + γ Σ_{j≠i} c^e_j) / ((2−γ)(2 + (m−1)γ)).

We obtain an expression for the quantities produced by the firms. However, we need to ensure that these quantities are positive. This property is equivalent to the following inequalities:

∀i: (2−γ)(1 − c^e_i) ≥ γ Σ_{j≠i} (c^e_i − c^e_j). (10)

These inequalities hold, for example, when 1 > a and (1−a) ≫ b, where a and b are the parameters from Equation (7). Now, notice that the expected profit function is a concave second-degree polynomial in q_i with leading coefficient −1.
Since the maximum value is achieved at q*_i(q_{−i}), this polynomial can be written as Π^e_i = −(q_i − q*_i(q_{−i}))² + q*_i(q_{−i})². This gives the following equilibrium expected profits: Π^e_i = (q*_i)².

A.3 Proof of Lemma 4.4

The expected profit maximization tasks of the firms are

max_{p_i} E((p_i − c_i) · (1 − γ − (1+γ(m−2)) p_i + γ Σ_{j≠i} p_j) / ((1−γ)(1+γ(m−1)))).

Thus, the best responses are

p*_i(p_{−i}) = E(1 − γ + (1+γ(m−2)) c_i + γ Σ_{j≠i} p_j) / (2(1+γ(m−2))).

Notice that E(p*_i) = E(E(…)/(2(1+γ(m−2)))) = E(…)/(2(1+γ(m−2))) = p*_i, so the equilibrium prices are deterministic and

p*_i = (1 − γ + (1+γ(m−2)) c^e_i + γ Σ_{j≠i} p*_j) / (2(1+γ(m−2))).

If we sum these equations, we get

Σ_i p*_i = ((1−γ)m + (1+γ(m−2)) Σ_i c^e_i + γ(m−1) Σ_i p*_i) / (2(1+γ(m−2))) ⇒ Σ_i p*_i = ((1−γ)m + (1+γ(m−2)) Σ_i c^e_i) / (2 + γ(m−3)).

After substituting this sum into the best response equation, we have

p*_i = (d1 + d2 c^e_i + d3 Σ_{j≠i} c^e_j) / d4,
d1 = 2 + γ(2m−5) − γ²(2m−3), d2 = 2 + 3γ(m−2) + γ²(m²−4m+4), d3 = γ + γ²(m−2), d4 = 4 + 6γ(m−2) + γ²(2m²−9m+9).

Here, we again need to ensure that the quantities produced by the firms are positive. This property is equivalent to the following inequalities:

p_i − c^e_i ≥ 0 ⇔ d1(1 − c^e_i) ≥ d3 Σ_{j≠i} (c^e_i − c^e_j). (11)

These inequalities hold, for example, when 1 > a and (1−a) ≫ b, where a and b are the parameters from Equation (7). Notice that the expected profit of the firm is a concave second-degree polynomial in p_i with some leading coefficient −α. Since the best response p*_i(p_{−i}) is a maximizer of this polynomial, we get Π^e_i = −α(p_i − p*_i(p_{−i}))² + δ for some δ. Also, notice that Π^e_i(c^e_i, p_{−i}) = E((c^e_i − c_i) q_i(c^e_i, p_{−i})) = 0. Therefore,

Π^e_i = α((c^e_i − p*_i(p_{−i}))² − (p_i − p*_i(p_{−i}))²).

Direct calculations show that

α = (1 + γ(m−2)) / ((1−γ)(1+γ(m−1))).

These equations imply the following equilibrium profits:

Π^e_i(p*_i, p*_{−i}) = (1 + γ(m−2))(p*_i − c^e_i)² / ((1−γ)(1+γ(m−1))).

A.4 Derivation of Equation (8)

Formally, we assume that the cost of production of one unit of good, C, is a random variable that depends on random noise X ∈ R^m and the decisions of a firm s ∈ R^n.
The firm knows that the noise is distributed according to a density function f(·, θ), θ ∈ R^d. However, it does not know the parameters θ of the distribution. The firm wants to minimize its marginal cost on average, c(s) = E_{X∼f(·,θ)}(C(s, X)). To achieve this, the firm computes the maximum likelihood estimate θ̂ of θ using its data about the realizations of the noise. After that, the firm chooses an action that minimizes the estimated average cost, s_fin = argmin_s E_{X∼f(·,θ̂)}(C(s, X)). The resulting cost c(s_fin) is a random variable because of the randomness of the firm's sample S. However, the asymptotic normality of the MLE allows us to reason about the cost distribution. We use the delta method to achieve this goal. By asymptotic normality, √n(θ̂ − θ) →_d N(0, I(θ)^{−1}). Further, by the delta method, √n(s_fin − s*) converges in distribution to a centered normal vector. Finally, by the second-order delta method,

n(c(s_fin) − c(s*)) →_d n ∇c(s*)ᵀ(s_fin − s*) + n(s_fin − s*)ᵀ (∇²c(s*)/2) (s_fin − s*) = n(s_fin − s*)ᵀ (∇²c(s*)/2) (s_fin − s*),

where the first-order term vanishes because s* minimizes c. Thus, the marginal cost is distributed approximately as a generalized chi-squared distribution, and the expected marginal cost is approximately

c^e = E_S(c(s_fin)) ≈ c(s*) + const/n.

Therefore, one natural choice for the dependence of the expected marginal cost on the number of data points is c^e(n) = a + b/n^β.

A.5 Proof of Theorem 5.2

In the case γ ≤ 0, collaboration is always profitable because the right-hand side of the criterion is at most zero, while the left-hand side is non-negative. In the case γ > 0, the criterion for firm i is the following. In the case of Cournot competition, we have

(2−γ)(1/n_i^β − 1/(n1+n2)^β) > γ(1/n_{−i}^β − 1/n_i^β) ⇔ f_Cournot(x, γ, β) := 2x^β − γ(1−x)^β − (2−γ)x^β(1−x)^β > 0,

where x = n_{−i}/(n1+n2). The function f_Cournot is increasing in x, f_Cournot(0, γ, β) < 0, and f_Cournot(1, γ, β) > 0. Thus, there exists a break-even point x̄_Cournot(γ, β) such that f_Cournot(x, γ, β) < 0 for x < x̄_Cournot(γ, β), f_Cournot(x, γ, β) = 0 for x = x̄_Cournot(γ, β), and f_Cournot(x, γ, β) > 0 for x > x̄_Cournot(γ, β).
Similarly, a break-even point exists in the Bertrand case. Now, we prove the properties of x̄. The first property follows from the fact that f is decreasing in γ. The second property follows from the fact that f_Cournot > f_Bertrand. The proof of the third property is more complicated. In the Cournot case, it goes as follows. Let x̄(β) = x̄_Cournot(γ, β). We want to show that x̄(β) is increasing in β. To achieve this, we will show that f_Cournot(x̄(β), γ, β+ϵ) < 0. From the definition of x̄(β),

x̄(β)^β = γ(1 − x̄(β))^β / (2 − (2−γ)(1 − x̄(β))^β),

and the desired inequality f_Cournot(x̄(β), γ, β+ϵ) < 0 is equivalent to

x̄(β)^(β+ϵ) · (2 − (2−γ)(1 − x̄(β))^(β+ϵ)) < γ(1 − x̄(β))^(β+ϵ).

After substituting the expression for x̄(β)^β, the last inequality follows from the concavity of the function x ↦ x^(β/(β+ϵ)). The proof is similar in the Bertrand case.

A.6 Proof of Theorem 6.2

Direct computations show

Π^e_i(λ1, λ2) − Π^e_i(0, 0) = (2c^e_i(n_i) − γc^e_{−i}(n_{−i}) − 2c^e_i(n_i + n_{−i}λ_{−i}) + γc^e_{−i}(n_{−i} + n_iλ_i)) · (4 − 2γ − 2c^e_i(n_i) + γc^e_{−i}(n_{−i}) − 2c^e_i(n_i + n_{−i}λ_{−i}) + γc^e_{−i}(n_{−i} + n_iλ_i)) / (4 − γ²)².

Denote

u_i = 1/n_i^β − 1/(n_i + λ_{−i}n_{−i})^β,

so that

c^e_i(n_i) − c^e_i(n_i + λ_{−i}n_{−i}) = a + b/n_i^β − a − b/(n_i + n_{−i}λ_{−i})^β = b·u_i.

So, the profit gain equals

Π^e_i(λ1, λ2) − Π^e_i(0, 0) = (2u_i − γu_{−i})(1 − ξg_i + ξ(2u_i − γu_{−i})) · b(4−2γ)(1−a)/(4−γ²)²,

where ξ = b/((4−2γ)(1−a)) and g_i = 4/n_i^β − 2γ/n_{−i}^β. Denote h_i = 1/n_i^β − 1/(n1+n2)^β. Then Equation (9) simplifies to

max_{u_i ∈ [0, h_i]} (2u1 − γu2)(2u2 − γu1)(1 − ξg1 + ξ(2u1 − γu2))(1 − ξg2 + ξ(2u2 − γu1)) s.t. ∀i: 2u_i − γu_{−i} ≥ 0.

Now, we change variables, v_i := 2u_i − γu_{−i}, i.e., u_i = (2v_i + γv_{−i})/(4−γ²), which gives the following problem:

max_{v_i ≥ 0} v1 v2 (1 − ξg1 + ξv1)(1 − ξg2 + ξv2) s.t. ∀i: 2v_i + γv_{−i} ≤ (4−γ²)h_i.

Notice that the restrictions v_i ≥ 0 are not binding, since the point (v1, v2) = (ϵ, ϵ) delivers a positive Nash product and satisfies the restrictions. However, one of the restrictions 2v_i + γv_{−i} ≤ (4−γ²)h_i must be binding, because any interior point (v1, v2) could be transformed into the point ((1+ϵ)v1, (1+ϵ)v2), which would increase the objective function and still satisfy the restrictions.
First, consider the case 2v*1 + γv*2 < (4−γ²)h1 at an optimum (v*1, v*2). According to the previous paragraph, we then have 2v*2 + γv*1 = (4−γ²)h2, which gives v*2 − v*1 > (2+γ)(h2 − h1) ≥ 0. The derivative of the objective along the direction (2, −γ) equals

(2v*2 − γv*1)(1 − ξg1 + ξv*1)(1 − ξg2 + ξv*2) + ξv*1v*2(2 − γ − ξ(2g2 − γg1 − 2v*2 + γv*1)).

Since 2v*2 − γv*1 > 0, |2g2 − γg1 − 2v*2 + γv*1| ≤ 10, and ξ ≤ b/(2(1−a)) ≤ 1/10, the derivative is positive. Thus, the point (v*1, v*2) cannot be optimal, because a small shift in the direction (2, −γ) would increase the objective and would not violate the restrictions. Therefore, at the optimum, we have 2v*1 = (4−γ²)h1 − γv*2.

If we substitute the last expression into the objective function, the objective becomes a fourth-degree polynomial in v2. If the stationary point v^s_2 of the objective function satisfies the restriction 2v^s_2 + γv1(v^s_2) ≤ (4−γ²)h2, it is a solution to the original problem. Otherwise, the solution comes from the equation 2v⁰_2 + γv1(v⁰_2) = (4−γ²)h2, since all other restrictions are not binding. So, the solution equals v*2 = min(v^s_2, v⁰_2) (the restriction 2v2 + γv1(v2) ≤ (4−γ²)h2 fails only when v2 is big). Consider the following relaxed problem:

max_{v_i ≥ 0} v1v2(1 − ξg1 + ξv1)(1 − ξg2 + ξv2) s.t. 2v1 + γv2 = (4−γ²)h1.

The stationary point is a root of the derivative and hence satisfies

(γv^s_2 − 2v1(v^s_2))(1 − ξg1 + ξv1(v^s_2))(1 − ξg2 + ξv^s_2) = ξ v1(v^s_2) v^s_2 (2(1 − ξg1 + ξv1(v^s_2)) − γ(1 − ξg2 + ξv^s_2)) ⇒ γv^s_2 − 2v1(v^s_2) = ξ v1(v^s_2) v^s_2 (2(1 − ξg1 + ξv1(v^s_2)) − γ(1 − ξg2 + ξv^s_2)) / ((1 − ξg1 + ξv1(v^s_2))(1 − ξg2 + ξv^s_2)).

Since −2 ≤ g_i ≤ 4, 0 ≤ v_i ≤ 2, and v1(v2)v2 ≤ (4−γ²)²h1²/(8γ), we get

|γv^s_2 − 2v1(v^s_2)| ≤ ξ · ((4−γ²)²h1²/(8γ)) · (2 − γ + 12ξ)/(1 − 4ξ)² = O(ξ).
So, the stationary point is approximately v^s_2 = (4−γ²)h1/(2γ) + O(ξ). Therefore, the solution to the original problem is approximately

v*2 = min((4−γ²)h1/(2γ), v⁰_2) + O(ξ) = min((4−γ²)h1/(2γ), 2h2 − γh1) + O(ξ).

Substituting everything back, we get the solution presented in the theorem statement. Now, we prove that λ̄1 is non-increasing in γ. First, the criterion function for choosing the non-unit solution,

4γh2 − (4+γ²)h1 = 4γ/n2^β − (4+γ²)/n1^β + (2−γ)²/(n1+n2)^β,

is increasing in γ, since its derivative 4h2 − 2γh1 ≥ 0 is non-negative. So, when γ increases, the non-unit solution becomes more probable. Second, the non-unit solution is decreasing in γ, since the function (4+γ²)/(4γ) = 1/γ + γ/4 is decreasing in γ on (0, 1). (Note that λ̄1 = 1 when γ ≤ 0.)

Now, we prove that λ̄1 is non-increasing in β. First, notice that the non-unit criterion is equivalent to

f(x, β) = 4γ(1−x)^β(1 − x^β) − (4+γ²)x^β(1 − (1−x)^β) ≥ 0, where x = n2/(n1+n2).

The criterion function f is decreasing in x and has one root x̄(β) on the interval [0, 1]. We will prove that the root shifts to the right, making more size pairs produce the non-unit solution. Notice that the criterion can also be expressed as

x^β ≤ 4γ(1−x)^β / (4 + γ² − (2−γ)²(1−x)^β).

So, to prove that x̄(β) ≤ x̄(β+ϵ), it is sufficient to show that

x̄(β)^(β+ϵ) = (4γ(1 − x̄(β))^β / (4 + γ² − (2−γ)²(1 − x̄(β))^β))^((β+ϵ)/β) ≤ 4γ(1 − x̄(β))^(β+ϵ) / (4 + γ² − (2−γ)²(1 − x̄(β))^(β+ϵ)),

which, after rearranging, follows from the concavity of the function x ↦ x^(β/(β+ϵ)).

Now, we prove that the non-unit solution is decreasing in β, given that the non-unit criterion holds. Notice that the non-unit solution λ̄1(β) satisfies

1/n2^β − 1/(n2 + λ̄1(β)n1)^β = ((4+γ²)/(4γ)) (1/n1^β − 1/(n1+n2)^β).

So, to show that λ̄1(β) ≥ λ̄1(β+ϵ), it is sufficient to show that

(x^{−β} − δ(1−x)^{−β} + δ)^(−1/β) ≥ (x^{−β−ϵ} − δ(1−x)^{−β−ϵ} + δ)^(−1/(β+ϵ)),

where x = n2/(n1+n2) and δ = (4+γ²)/(4γ). This inequality is equivalent to

((1−x)^β x^{−β} − δ + δ(1−x)^β)^((β+ϵ)/β) ≤ (1−x)^(β+ϵ) x^{−β−ϵ} − δ + δ(1−x)^(β+ϵ).

Denote u0 = ((1−x)/x)^β, v0 = (1−x)^β, and α = (β+ϵ)/β. We get the following inequality:

(u0 − δ + δv0)^α ≤ u0^α − δ + δv0^α.
Denote f(u, v) = u^α − δ + δv^α − (u − δ + δv)^α. Notice that f(u, 1) = 0. Additionally, notice that

∂f/∂v = δαv^(α−1) − δα(u − δ + δv)^(α−1) ≤ 0 ⇔ v ≤ u − δ + δv.

If we prove that this derivative is non-positive at all points (u0, x), x ∈ (v0, 1), we will prove the desired result. Notice that the last inequality is the most stringent at the point (u0, v0). At this point, we have

v0 ≤ u0 − δ + δv0 ⇔ 0 ≤ 4γ/n2^β − (4+γ²)/n1^β + (2−γ)²/(n1+n2)^β.

The last inequality holds because we consider the non-unit solution. The last property of the theorem follows from the fact that, for a small ratio n2/n1, the non-unit criterion holds, and the Taylor formula in this ratio results in the expression presented in the statement.

B Examples of the data-sharing problem

B.1 Taxi rides

In this section, we illustrate how our framework works using the motivating example from Section 3. In this example, two taxi firms use data to optimize their driver scheduling algorithms. This optimization allows them to use less driver time, thus making their expected marginal costs lower. We assume that the only uncertainty arising in the scheduling algorithm is the duration of a one-kilometer ride, X ∼ N(µ, 1). The firms do not know the parameter µ. However, each firm F_i has n_i observations of this random variable, S_i = {X^j_i}_{j=1}^{n_i}. Their scheduling algorithm requires an estimate s of the ride duration X. If the estimate is too low (s < X), the company needs to attract more drivers and pay them more. If the estimate is too big (s > X), the drivers are underutilized. We assume that the costs associated with these losses have the MSE form C(s, X) = a − b + b(s − X)². This functional form implies the following costs on average:

c(s) = E_X(C(s, X)) = a − b + b E((s − X)²) = a + b(s − µ)².

If the firms do not collaborate, F_i uses the average of its own points,

s_{i,ind} = (1/n_i) Σ_{j=1}^{n_i} X^j_i, s_{i,ind} ∼ N(µ, 1/n_i),

to estimate µ.
This gives the following expected marginal costs:

c^e_{i,ind} = E_{S1,S2} c(s_{i,ind}) = a + b E((s_{i,ind} − µ)²) = a + b/n_i.

If the firms collaborate, they use both samples to calculate the average

s_shared = (1/(n1+n2)) Σ_{i,j} X^j_i, s_shared ∼ N(µ, 1/(n1+n2)),

and the expected marginal costs will be

c^e_shared = E_{S1,S2} c(s_shared) = a + b/(n1+n2).

To solve the data-sharing problem, we need to describe the competition between the firms. For simplicity, we only consider the case of Cournot competition. As in Section 4.1.2, the firms are interested in increasing their expected profits Π^e_i = E_{S1,S2}((p_i − c_i)q_i). Thus, the firms use the same quantities as in Section 4.1.2. In the case of non-collaboration, the expected profits are

Π^e_{i,ind} = ((2 − γ − 2c^e_{i,ind} + γc^e_{−i,ind}) / (4 − γ²))².

Similarly, in the case of collaboration,

Π^e_shared = ((2−γ)(1 − c^e_shared) / (4 − γ²))².

The collaboration criterion is the same as in Section 4.1.2.

B.2 Oil drilling

Assume that an oil company manager needs to optimize the costs of oil rig construction. Construction of an oil rig happens in two steps. First, geologists survey a new site for a rig and produce a noisy signal of oil presence, X. Second, the company might try to build a rig. However, if there is no oil, the company will lose money. Thus, it is sometimes more profitable to repeat the geological expedition. The noisy signal predicts the presence of oil in the following manner:

Pr(success | X) = Pr(X + ϵ ≥ 0 | X), ϵ ∼ N(0, 1).

Also, the manager knows that X ∼ N(0, σ²). However, the manager does not know σ² and needs to estimate it from previous observations. A new expedition costs a, and drilling a new rig costs b. The natural strategy is to repeat expeditions until the geologists find X ≥ r and then try to build a rig. Let p be the probability of finding a good spot and q the probability of success in a good spot:

p := Pr(X ≥ r) = Φ(−r/σ), q := Pr(X + ϵ ≥ 0 | X ≥ r) =: M(r, σ²),

where M denotes the corresponding two-dimensional Gaussian integral. So, the expected cost is

f(r) = a/p + b/q = a/Φ(−r/σ) + b/M(r, σ²).
Assume that the company computes an estimate σ̂² of σ² and chooses r according to that estimate. The loss will approximately be

f(r) − f(r*) ≈ f′(r*)(r − r*) + (f″(r*)/2)(r − r*)² = (f″(r*)/2)(∂r*/∂σ²)²(σ̂² − σ²)²,

since f′(r*) = 0. Now, if the company uses the MLE to estimate σ², the costs will asymptotically look like f(r*) + const/n, where n is the number of observations. After that, the process is the same as in the previous subsection.

C Extensions of the results

In this section, we investigate how different changes to our setting affect the results. For simplicity, we only study the case of perfect substitutes (γ = 1) and Cournot competition, if not told otherwise.

C.1 Different cost functions

In this subsection, we assume that the cost functions of the firms have the form

C_i(q) = c^e_i q + (k/2)q².

The profit maximization problem of the firm now has the form

max_{q_i} (1 − q_i − q_{−i})q_i − c^e_i q_i − (k/2)q_i².

Therefore, the best responses are

q*_i(q_{−i}) = (1 − q_{−i} − c^e_i)/(2 + k),

and the equilibrium quantities are

q*_i = (1 + k + c^e_{−i} − (2+k)c^e_i) / ((1+k)(3+k)).

These equations imply the following expected profits,

Π^e_i = ((2+k)/2)(q*_i)²,

and result in the following collaboration criterion:

∀i: (2+k)(1/n_i − 1/(n1+n2)) > 1/n_{−i} − 1/(n1+n2).

As we can see, the added non-linearity in the cost does not qualitatively change the conclusions of Theorem 5.2.

C.2 Multiplicative substitution

In Section 3, we assumed that the goods produced by the firms can be additively substituted by the outside good (i.e., the utility function is quasilinear). However, in general, the firms might operate in markets with different substitution patterns. To investigate this question, we assume a utility function u(q0, q1, q2) under which the consumer spends the share α of income on goods G1 and G2, and whose component in q1 and q2 is

q1 + q2 − (q1² + 2q1q2 + q2²)/2

(p1 = p2 = p because the goods G1 and G2 are perfect substitutes; B is small enough so that q1 and q2 are much smaller than 1). So, we have the following consumer problem:

max_{q1,q2} q1 + q2 − (q1² + 2q1q2 + q2²)/2 s.t. p(q1 + q2) ≤ αB.
So, the demand is p = αB/(q1 + q2). The expected profit maximization problems are

max_{q_i} αBq_i/(q1 + q2) − c^e_i q_i.

The first-order conditions give

αB q*_{−i} / (q*_1 + q*_2)² = c^e_i.

So, the equilibrium quantities are

q*_i = αB c^e_{−i} / (c^e_1 + c^e_2)²,

and the equilibrium profits are

Π^e_i = αB (c^e_{−i})² / (c^e_1 + c^e_2)² = αB / (1 + c^e_i/c^e_{−i})².

This formula implies the following collaboration criterion:

∀i: 1 + c^e_{i,ind}/c^e_{−i,ind} > 1 + c^e_shared/c^e_shared = 2.

As can be seen, collaboration is never profitable for the firm with more data, whose cost ratio c^e_{i,ind}/c^e_{−i,ind} is below one.

C.3 Asymmetric costs

In this subsection, we assume that the firms have asymmetric cost functions,

c^e_i(n) = a_i + b_i/n^{β_i},

where c^e_i(n) is the expected marginal cost of the company if it uses n points for training. Section 4.1.2 gives the following expected profits:

Π^e_i = ((2 − γ − 2c^e_i + γc^e_{−i}) / (4 − γ²))².

As in Section 4.1.2, the companies will collaborate only if both of them profit from collaboration:

∀i: Π^e_{i,shared} > Π^e_{i,ind} ⇔ 2b_i(1/n_i^{β_i} − 1/n_shared^{β_i}) > γb_{−i}(1/n_{−i}^{β_{−i}} − 1/n_shared^{β_{−i}}),

where n_shared = n1 + n2. Denote

f_i(n1, n2) = 2b_i(1/n_i^{β_i} − 1/n_shared^{β_i}) − γb_{−i}(1/n_{−i}^{β_{−i}} − 1/n_shared^{β_{−i}}).

This function can be interpreted as firm F_i's readiness to collaborate: the higher this value is, the more firm F_i wants to collaborate with its competitor F_{−i}.

Theorem C.1. Assume the setting described above. The function f_i(n1, n2) has the following properties:
1. f_i(n1, n2) is decreasing in γ.
2. f_i(n1, n2) is decreasing in β_i and increasing in β_{−i} if ∀i: β_i ln n_i > 1.
3. f_i(n1, n2) is increasing in b_i and decreasing in b_{−i}.

Proof. The first property is evident:

∂f_i/∂γ = −b_{−i}(1/n_{−i}^{β_{−i}} − 1/n_shared^{β_{−i}}) < 0.

We can prove the last property similarly:

∂f_i/∂b_i = 2(1/n_i^{β_i} − 1/n_shared^{β_i}) > 0, ∂f_i/∂b_{−i} = −γ(1/n_{−i}^{β_{−i}} − 1/n_shared^{β_{−i}}) < 0.

The first part of the second property follows from

∂f_i/∂β_i = 2b_i (ln n_shared / n_shared^{β_i} − ln n_i / n_i^{β_i}).

Notice that the function g(x) = ln x / x^{β_i} is decreasing in x for x > e^{1/β_i}, because

∂g/∂x = (1 − β_i ln x) / x^{1+β_i} < 0.

Therefore, ∂f_i/∂β_i < 0: f_i is decreasing in β_i. The second part can be proved similarly.
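As a quick numerical check of Theorem C.1, the readiness function can be evaluated directly. Below is a minimal sketch under the cost model c^e_i(n) = a_i + b_i/n^{β_i}; the helper name `readiness` is ours, and the parameter values are illustrative:

```python
def readiness(i, n, b, beta, gamma):
    # f_i(n1, n2) = 2*b_i*(n_i^-beta_i - n_sh^-beta_i)
    #               - gamma*b_j*(n_j^-beta_j - n_sh^-beta_j),  n_sh = n1 + n2
    j = 1 - i
    n_sh = n[0] + n[1]
    own = 2 * b[i] * (n[i] ** -beta[i] - n_sh ** -beta[i])
    rival = gamma * b[j] * (n[j] ** -beta[j] - n_sh ** -beta[j])
    return own - rival

n, beta = (500.0, 800.0), (0.9, 0.9)
# Property 3: readiness rises with the firm's own noise level b_i ...
print(readiness(0, n, (2.0, 1.0), beta, 0.8) > readiness(0, n, (1.0, 1.0), beta, 0.8))  # True
# ... and property 1: readiness falls as the goods become closer substitutes.
print(readiness(0, n, (1.0, 1.0), beta, 0.9) < readiness(0, n, (1.0, 1.0), beta, 0.3))  # True
```

Both comparisons follow directly from the derivative signs computed in the proof, so the sketch doubles as a sanity check on the formulas.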
The first property is similar to the first property in Theorem 5.2: the firms with more similar goods have fewer incentives to collaborate. The last two properties show that the firm wants to collaborate with its competitor when the firm's machine learning model is bad ($\beta_i$ is low or $b_i$ is high) or the competitor's model is good ($\beta_{-i}$ is high or $b_{-i}$ is low).

C.4 Heterogeneity
In this section, we solve the same model as in Appendix B, but allowing for heterogeneity between firms. Each firm will need to optimize the MSE costs
\[ c_i(s) = a - b + b \, \mathbb{E}\left( (s - X_i)^2 \right) = a + b (s - \mu_i)^2, \qquad X_i \sim N(\mu_i, 1). \]
In contrast to Appendix B, the means of the noise will be different among firms and will be drawn at the start of the game from the distribution $\mu_i \sim N(\mu, \sigma^2)$. We assume that $\mu$ is unknown and $\sigma^2$ is known. When the firms do not collaborate, the expected costs are the same as in Appendix B. When the firms collaborate, the MLEs for the means are
\[ s_{i,\mathrm{shared}} = \frac{2 \sigma^2 n_1 n_2 \bar{Y}_i + n_i \bar{Y}_i + n_{-i} \bar{Y}_{-i}}{2 \sigma^2 n_1 n_2 + n_1 + n_2}, \]
where $n_i$ is the number of data points of firm $F_i$ and $\bar{Y}_i$ is the average of its noise. Then the expected marginal costs will be $c^e_{i,\mathrm{shared}} = a + b \, \mathbb{E}_{S_1, S_2}\left( (s_{i,\mathrm{shared}} - \mu_i)^2 \right)$. By substitution,
\[ \mathbb{E}\left( (s_{i,\mathrm{shared}} - \mu_i)^2 \right) = \frac{\mathbb{E}\left( n_{-i}^2 (\mu_1 - \mu_2)^2 \right) + (2 \sigma^2 n_{-i} + 1)^2 n_i + n_{-i}}{(2 \sigma^2 n_1 n_2 + n_1 + n_2)^2} = \frac{2 \sigma^2 n_{-i} + 1}{2 \sigma^2 n_1 n_2 + n_1 + n_2}. \]
Therefore, $c^e_{i,\mathrm{shared}}$ is increasing in $\sigma^2$: the more different the firms are, the less collaboration gives. The two corner cases give
\[ \sigma^2 = 0 \implies c^e_{i,\mathrm{shared}} = a + \frac{b}{n_1 + n_2}, \qquad \sigma^2 \to \infty \implies c^e_{i,\mathrm{shared}} \to a + \frac{b}{n_i}. \]
However, notice that the collaboration criterion in the Cournot case does not depend on $\sigma^2$ and hence does not change:
\[ \Pi^e_{i,\mathrm{share}} - \Pi^e_{i,\mathrm{ind}} \ge 0 \iff -\frac{2(2 \sigma^2 n_{-i} + 1)}{D} + \frac{\gamma (2 \sigma^2 n_i + 1)}{D} \ge -\frac{2}{n_i} + \frac{\gamma}{n_{-i}} \iff 2 \frac{n_{-i}}{n_i} \ge \gamma \frac{n_i}{n_{-i}}, \]
where $D = 2 \sigma^2 n_1 n_2 + n_1 + n_2$ and the last equivalence follows from $\frac{1}{n_i} - \frac{2 \sigma^2 n_{-i} + 1}{D} = \frac{n_{-i}}{n_i D}$.

D Other models of coalition formation
In this section, we present models of coalition formation different from the model presented in Section 7.
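Before turning to these models, the closed form for the shared-estimator risk derived in Appendix C.4 can be verified by simulation. The sketch below uses illustrative parameters; the seed and number of trials are arbitrary choices:

```python
import random

# Monte Carlo check of E((s_{i,shared} - mu_i)^2) = (2*s2*n_{-i} + 1) / D with
# D = 2*s2*n1*n2 + n1 + n2 (Appendix C.4); parameters here are illustrative.
random.seed(0)
n1, n2, s2, mu = 20, 10, 0.5, 1.0   # s2 denotes sigma^2
D = 2 * s2 * n1 * n2 + n1 + n2

trials, err2 = 200_000, 0.0
for _ in range(trials):
    mu1 = random.gauss(mu, s2 ** 0.5)           # firm-specific means
    mu2 = random.gauss(mu, s2 ** 0.5)
    y1 = random.gauss(mu1, (1 / n1) ** 0.5)     # average of n1 N(mu1, 1) draws
    y2 = random.gauss(mu2, (1 / n2) ** 0.5)     # average of n2 N(mu2, 1) draws
    s_shared = (2 * s2 * n1 * n2 * y1 + n1 * y1 + n2 * y2) / D  # shared MLE
    err2 += (s_shared - mu1) ** 2

mc, theory = err2 / trials, (2 * s2 * n2 + 1) / D
assert abs(mc - theory) < 1e-3   # theory = 11 / 230, about 0.0478
print(round(theory, 4))
```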
D.1 Cooperative data sharing
We use a solution concept motivated by the α-core (Von Neumann & Morgenstern, 1944), which is standard in cooperative game theory. The α-core consists of partitions that are incentive-compatible, ensuring that no subset of firms wants to deviate. Incentive compatibility indicates that these partitions are more stable and likely to occur. First, we introduce a modification of the α-core, tailored to our setup. This modification is necessary for two reasons: non-transferable utility and the presence of externalities (dependencies of utilities on the actions of all participants), which arise in our setting because all firms compete with each other on the market.

Let $N = \{F_1, \ldots, F_m\}$ be the set of all firms. A partition of $N$ is a set of subsets $\{S_1, \ldots, S_k\}$, $S_i \subseteq N$, such that $S_i \cap S_j = \emptyset$ for all $i \ne j$ and $\bigcup_i S_i = N$. We denote the set of all partitions of $N$ as $\mathcal{P}(N)$.

Definition D.1 (α-Core). A partition $P$ belongs to the α-core of the game if and only if
\[ \nexists S \subseteq N : \; \forall F \in S, \; \forall Q \in \mathcal{P}(N \setminus S) \quad \Pi^e_F(P) < \Pi^e_F(Q \cup \{S\}), \]
where $\Pi^e_F(P)$ is the expected profit of the firm $F$ if the market coalition structure is $P$.

Intuitively, a partition $P$ belongs to the α-core if no set of companies wants to deviate from this partition. Here we say that a subset of companies wants to deviate if every company in this subset will increase its profits by joining this new coalition, regardless of how the remaining companies split into groups. In the next theorem, we identify one coalition structure that always belongs to the α-core of the data-sharing game, demonstrating the non-emptiness of the core.

Theorem D.2. W.l.o.g. assume that $n_1 \ge n_2 \ge \cdots \ge n_m$, where $n_i$ is the number of data points of firm $F_i$. Consider partitions $P_i = \{A_i, N \setminus A_i\}$, where $A_i = \{F_1, F_2, \ldots, F_i\}$. Let $i^* = \arg\max_i \Pi^e_{F_1}(P_i)$. Then $P_{i^*}$ belongs to the α-core.

Before we start to prove this theorem, we formulate the following lemma.

Lemma D.3. Let $Q \subseteq N$ and $F \in Q$.
Then
\[ \Pi^e_F(\{Q, N \setminus Q\}) = \frac{\left( 2 - \gamma - (2 + \gamma(m - 1 - |Q|)) c^e_Q + \gamma (m - |Q|) c^e_{N \setminus Q} \right)^2}{(2 - \gamma)^2 (2 + (m - 1)\gamma)^2}, \]
where $c^e_X = a + b / n_X^\beta$ is the expected marginal cost of coalition $X$ and $n_X = \sum_{F \in X} n_F$ is the number of data points of coalition $X$.

Proof. The results of Section 4.1.2 give the following expected profits
\[ \Pi^e_i = \frac{\left( 2 - \gamma - (2 + \gamma(m - 2)) c^e_i + \gamma \sum_{j \ne i} c^e_j \right)^2}{(2 - \gamma)^2 (2 + (m - 1)\gamma)^2}. \]
In our case,
\[ c_i = \begin{cases} c_Q, & F_i \in Q, \\ c_{N \setminus Q}, & F_i \notin Q. \end{cases} \]
By substituting $c^e_i$ into the profit equation for $F \in Q$, we get
\[ \Pi^e_F(\{Q, N \setminus Q\}) = \frac{\left( 2 - \gamma - (2 + \gamma(m - 2)) c^e_Q + \gamma (|Q| - 1) c^e_Q + \gamma (m - |Q|) c^e_{N \setminus Q} \right)^2}{(2 - \gamma)^2 (2 + (m - 1)\gamma)^2} = \frac{\left( 2 - \gamma - (2 + \gamma(m - 1 - |Q|)) c^e_Q + \gamma (m - |Q|) c^e_{N \setminus Q} \right)^2}{(2 - \gamma)^2 (2 + (m - 1)\gamma)^2}. \]

Corollary D.4. Let $Q, Q' \subseteq N$, $F \in Q$, and $F \in Q'$. Then
\[ \Pi^e_F(\{Q, N \setminus Q\}) \ge \Pi^e_F(\{Q', N \setminus Q'\}) \iff \frac{2 + \gamma(m - 1 - |Q'|)}{n_{Q'}^\beta} - \frac{\gamma(m - |Q'|)}{n_{N \setminus Q'}^\beta} \ge \frac{2 + \gamma(m - 1 - |Q|)}{n_Q^\beta} - \frac{\gamma(m - |Q|)}{n_{N \setminus Q}^\beta}. \]

Now, we are ready to prove Theorem D.2.

Proof. We prove the theorem by contradiction. Suppose that there exists a coalition $Q$ for which deviation is profitable. First, we consider the case $A_{i^*} \cap Q \ne \emptyset$. Let $F \in A_{i^*} \cap Q$. Notice that
\[ \Pi^e_F(P_{i^*}) = \Pi^e_{F_1}(P_{i^*}) \ge \Pi^e_{F_1}(P_{|Q|}) \ge \Pi^e_F(\{Q, N \setminus Q\}). \]
The first equality follows from Lemma D.3. The first inequality follows from the definition of $i^*$. The last inequality follows from Corollary D.4 and the observation $n_{A_{|Q|}} \ge n_Q$. So, we get a contradiction.

Thus, $A_{i^*} \cap Q = \emptyset$. Let $F \in Q$. In this case,
\[ \Pi^e_F(P_{i^*}) \ge \Pi^e_F(\{Q, N \setminus Q\}). \]
This inequality follows from Corollary D.4, the fact that $n_{N \setminus A_{i^*}} \ge n_Q$, and the observation $|N \setminus A_{i^*}| \ge |Q|$. This contradiction proves the theorem.

In the partition described in the theorem, the firms form two coalitions: one contains the $i^*$ companies with the largest datasets, and the other one contains all other firms. Among such partitions, it leads to the largest profits for the company with the largest amount of data.
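Theorem D.2 can be spot-checked numerically on small instances. The sketch below uses illustrative parameter values and the $m$-firm profit formula from the proof of Lemma D.3; it enumerates all candidate deviating sets and all arrangements of the remaining firms, and verifies that $P_{i^*}$ admits no deviation:

```python
from itertools import combinations

# Numerical spot-check of Theorem D.2 for m = 4 firms (illustrative a, b,
# beta, gamma); profits follow the m-firm formula from Lemma D.3's proof.
a, b, beta, gamma = 0.1, 0.5, 0.9, 0.8
ns = [40, 30, 20, 10]   # n1 >= n2 >= n3 >= n4
m = len(ns)

def profits(partition):
    """Expected profit of each firm under a coalition structure."""
    c = {f: a + b / sum(ns[g] for g in S) ** beta for S in partition for f in S}
    denom = (2 - gamma) * (2 + (m - 1) * gamma)
    return [((2 - gamma - (2 + gamma * (m - 2)) * c[i]
              + gamma * sum(c[j] for j in range(m) if j != i)) / denom) ** 2
            for i in range(m)]

def partitions(elems):
    """Yield all partitions of a list of elements."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for k in range(len(rest) + 1):
        for grp in combinations(rest, k):
            left = [e for e in rest if e not in grp]
            for p in partitions(left):
                yield [(first,) + grp] + p

# P_{i*}: the i* largest firms vs. the rest, maximizing firm 1's profit
cands = []
for i in range(1, m + 1):
    part = [tuple(range(i))] + ([tuple(range(i, m))] if i < m else [])
    cands.append((profits(part)[0], part))
base = profits(max(cands)[1])

def deviates(S):
    # S deviates only if every member gains under every arrangement of N \ S
    others = [e for e in range(m) if e not in S]
    return all(min(profits([S] + Q)[f] for Q in partitions(others)) > base[f]
               for f in S)

in_core = not any(deviates(S) for k in range(1, m + 1)
                  for S in combinations(range(m), k))
print(in_core)
```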
However, it may not be the only partition in the core, since the α-core is one of the most permissive cores in the cooperative game literature: a set of companies will deviate only if it is better off irrespective of the actions of others, sometimes resulting in economically implausible partitions. The following example illustrates this flaw. Assume that there are three firms in the market and $n_1 \ge n_2 \ge n_3$. In this case, the α-core consists of two equilibria, $\{\{F_1, F_2\}, \{F_3\}\}$ and $\{\{F_1\}, \{F_2\}, \{F_3\}\}$, but the first equilibrium is less economically plausible. Indeed, for the first firm, the second equilibrium is more profitable than the first. However, this firm will not deviate because it is afraid that the second firm might collaborate with the third. Nevertheless, this fear is ungrounded, since the second firm expects less profit from the collaboration with the third than from staying alone.

D.2 Universal data-sharing treaty
The first model considers the following situation. Imagine that the companies might sign a treaty that will make their data accessible for use by the other companies. To make the companies comply with the outcome, the decision is made by consensus. In terms of a coalition formation game, each company either agrees or disagrees to participate in the grand coalition. If any company disagrees, the grand coalition is not formed, and the companies act as singletons. W.l.o.g., assume that $n_1 \ge n_2 \ge \cdots \ge n_m$. The grand coalition will be a Nash equilibrium in this game only if all firms are better off in the grand coalition
\[ \forall F \quad \Pi^e_F(\{\{F_1, F_2, \ldots, F_m\}\}) > \Pi^e_F(\{\{F_1\}, \{F_2\}, \ldots, \{F_m\}\}). \]
Since the grand coalition profit is the same for all firms and the singleton profit is the biggest for the firm with the most data, this system of inequalities is equivalent to the inequality for the first firm
\[ \Pi^e_{F_1}(\{\{F_1, F_2, \ldots, F_m\}\}) > \Pi^e_{F_1}(\{\{F_1\}, \{F_2\}, \ldots, \{F_m\}\}). \]
By substitution, this inequality takes the form
\[ (2 + \gamma(m - 2)) \, n_1^{-\beta} - \gamma \sum_{j \ne 1} n_j^{-\beta} > (2 - \gamma) \, n^{-\beta}, \]
where $n = \sum_i n_i$.

D.3 Data-sharing treaty
The second model is similar to the first, but here the members of the treaty share data only between themselves. Thus, not everybody needs to agree to build a coalition, and the resulting structure will be one coalition with several members and singleton coalitions. Formally, we assume the following game. Each firm has two actions: Y and N. All firms that answer Y form a coalition between themselves, and all firms that answer N form singleton coalitions. We are interested in the Nash equilibria of this game. W.l.o.g., assume that $n_1 \ge n_2 \ge \cdots \ge n_m$. The following lemma is useful for the description of all equilibria of this game.

Lemma D.5. Let $S$ be the set of firms that answer Y in equilibrium and $i^* = \min\{i \mid F_i \in S\}$. Then $\forall j \ge i^*$, $F_j \in S$.

Proof. To show this, it is sufficient to show that a single firm $F$ always wants to join a coalition $S$ that has more data than the firm. This is equivalent to the following inequality
\[ \Pi^e_F(\{S \cup \{F\}\} \cup \{\{G\} \mid G \notin S \cup \{F\}\}) > \Pi^e_F(\{S\} \cup \{\{G\} \mid G \notin S\}), \]
which, by the profit formula from the proof of Lemma D.3, reduces to
\[ \frac{2 + \gamma(m - 2)}{n_F^\beta} - \frac{\gamma |S|}{n_S^\beta} > \frac{2 + \gamma(m - 1 - (|S| + 1))}{(n_S + n_F)^\beta}, \]
where $n_S = \sum_{G \in S} n_G$. Denote $y := n_F / n_S$. The last inequality is equivalent to
\[ (2 + \gamma(m - 2)) \left( \frac{1}{y^\beta} - \frac{1}{(1 + y)^\beta} \right) > \gamma |S| \left( 1 - \frac{1}{(1 + y)^\beta} \right). \]
Since $\gamma < 1$ and $|S| < m$, we have $2 + \gamma(m - 2) > \gamma |S|$. Furthermore, $\frac{1}{y^\beta} - \frac{1}{(1 + y)^\beta} \ge 1 - \frac{1}{(1 + y)^\beta} > 0$ because $1 \ge y > 0$. Therefore, the inequality above holds.

Lemma D.5 greatly constrains the possible equilibria of the game. Indeed, the equilibrium set of companies that say Y may only have the form $S_i = \{F_j \mid j \ge i\}$. To check whether $S_i$ appears in an equilibrium, we need to check the following system of inequalities
\[ \forall j < i \quad \Pi^e_{F_j}(\{S_i\} \cup \{\{G\} \mid G \notin S_i\}) \ge \Pi^e_{F_j}(\{S_i \cup \{F_j\}\} \cup \{\{G\} \mid G \notin S_i \cup \{F_j\}\}), \]
\[ \forall j \ge i \quad \Pi^e_{F_j}(\{S_i\} \cup \{\{G\} \mid G \notin S_i\}) \ge \Pi^e_{F_j}(\{S_i \setminus \{F_j\}\} \cup \{\{G\} \mid G \notin S_i \setminus \{F_j\}\}). \]
These inequalities are tightest for the firms around the threshold $i$.
Thus, this system is equivalent to two inequalities
\[ \Pi^e_{F_{i-1}}(\{S_i\} \cup \{\{G\} \mid G \notin S_i\}) \ge \Pi^e_{F_{i-1}}(\{S_i \cup \{F_{i-1}\}\} \cup \{\{G\} \mid G \notin S_i \cup \{F_{i-1}\}\}), \]
\[ \Pi^e_{F_i}(\{S_i\} \cup \{\{G\} \mid G \notin S_i\}) \ge \Pi^e_{F_i}(\{S_i \setminus \{F_i\}\} \cup \{\{G\} \mid G \notin S_i \setminus \{F_i\}\}). \]

E Welfare analysis
In this section, we consider how collaboration affects the cumulative utility of firms and consumers. We start with the definition of welfare.

Definition E.1. The sum of consumer surplus and firm surplus, $W$, is called welfare. In our terms, this quantity is equal to the sum of the consumers' utility and the firms' profits.

The definition of welfare in our case is simpler than in the general case because consumers have quasilinear preferences. Welfare is an important quantity in policy analysis: it reflects the cumulative benefits of all market participants. Thus, if redistribution between people is possible (e.g., through government transfers), an increase in welfare means an increase in utility for all people, given the right redistribution scheme.

Now, we want to evaluate how data sharing in the duopoly case affects the expected welfare. This will help us to understand in which cases the firms should be incentivized to collaborate with each other. By direct substitution, we get
\[ W^e = q^*_1 + q^*_2 - \frac{(q^*_1)^2 + (q^*_2)^2 + 2 \gamma q^*_1 q^*_2}{2} + B - c^e_1 q^*_1 - c^e_2 q^*_2. \]
We show that collaboration increases welfare in the Cournot case if the marginal costs of the firms are close enough: $|c^e_1 - c^e_2| \le \gamma / 6$. To accomplish that, we will first demonstrate that $\partial W / \partial c^e_i < 0$ if $c^e_i > c^e_{-i}$ and then show that $W$ decreases along the line $c^e_1 = c^e_2 = c$. These properties show that collaboration increases welfare, since after collaboration the costs of the companies equalize and become smaller.

The proof of the first part follows from the derivations below. W.l.o.g. assume that $n_1 \ge n_2$; then $c^e_1 \le c^e_2$. Direct computations show
\[ \frac{\partial W}{\partial c^e_2} = \frac{\partial q^*_1}{\partial c^e_2}(1 - c^e_1) + \frac{\partial q^*_2}{\partial c^e_2}(1 - c^e_2) - \frac{\partial q^*_1}{\partial c^e_2}(q^*_1 + \gamma q^*_2) - \frac{\partial q^*_2}{\partial c^e_2}(q^*_2 + \gamma q^*_1) - q^*_2. \]
The results of Section 4.1.2 give
\[ q^*_i = \frac{2 - \gamma - 2 c^e_i + \gamma c^e_{-i}}{4 - \gamma^2}. \]
Substituting the expressions above, we have
\[ \frac{\partial W}{\partial c^e_2} = \frac{(12 - \gamma^2) c^e_2 - (8\gamma - \gamma^3) c^e_1 - (12 - 8\gamma - \gamma^2 + \gamma^3)}{(4 - \gamma^2)^2} \le \frac{(12 - \gamma^2) c^e_2 - (8\gamma - \gamma^3)\left(c^e_2 - \frac{\gamma}{6}\right) - (12 - 8\gamma - \gamma^2 + \gamma^3)}{(4 - \gamma^2)^2} = \frac{-(12 - 8\gamma - \gamma^2 + \gamma^3)(1 - c^e_2) + \frac{\gamma}{6}(8\gamma - \gamma^3)}{(4 - \gamma^2)^2}. \]
The restriction $c^e_2 \le 1 - \gamma/2$ gives
\[ \frac{\partial W}{\partial c^e_2} \le \frac{-\frac{\gamma}{2}(12 - 8\gamma - \gamma^2 + \gamma^3) + \frac{\gamma}{6}(8\gamma - \gamma^3)}{(4 - \gamma^2)^2}. \]
This quantity is negative if and only if
\[ 36 - 24\gamma - 3\gamma^2 + 3\gamma^3 - 8\gamma + \gamma^3 > 0. \]
The last inequality holds since $36 - 32\gamma - 3\gamma^2 + 4\gamma^3 \ge (36 - 3\gamma^2)(1 - \gamma) + \gamma^3 > 0$ for $\gamma < 1$.

The proof of the second part comes from the computations below. Assume $c^e_1 = c^e_2 = c$; we want to show that $W$ decreases in $c$. We have
\[ W^e = \frac{(12 - 8\gamma - \gamma^2 + \gamma^3) c^2 - (24 - 16\gamma - 2\gamma^2 + 2\gamma^3) c + 12 - 8\gamma - \gamma^2 + \gamma^3}{(4 - \gamma^2)^2} + B = \frac{(12 - 8\gamma - \gamma^2 + \gamma^3)(1 - c)^2}{(4 - \gamma^2)^2} + B. \]
Clearly, this function is decreasing in $c$ for $c < 1$.

F Additional experiments for Section 7

[Figure 2: six panels plotting the average coalition size against the similarity between products ($\gamma$, left column) and the simplicity of the learning task ($\beta$, right column) for $P = N(1000, 300^2)$, $N(1000, 600^2)$, and $N(1000, 900^2)$.]

Figure 2: The dependence of the average coalition size on $\gamma$ and $\beta$ in synthetic experiments. The y-axes report the mean of the average size of the coalitions in the equilibrium partition, where the mean is taken over 10000 Monte Carlo simulations of the game. The firms' dataset sizes are sampled from the distribution $P$ clipped at 1. There are 3 firms in the market engaging in the Cournot competition. The default values of $\gamma$ and $\beta$ are equal to 0.8 and 0.9, respectively.

[Figure 3: the same six panels for $m = 4$ firms.]

Figure 3: The dependence of the average coalition size on $\gamma$ and $\beta$ in synthetic experiments.
The y-axes report the mean of the average size of the coalitions in the equilibrium partition, where the mean is taken over 10000 Monte Carlo simulations of the game. The firms' dataset sizes are sampled from the distribution $P$ clipped at 1. There are 4 firms in the market engaging in the Cournot competition. The default values of $\gamma$ and $\beta$ are equal to 0.8 and 0.9, respectively.

[Figure 4: the same six panels for $m = 5$ firms.]

Figure 4: The dependence of the average coalition size on $\gamma$ and $\beta$ in synthetic experiments. The y-axes report the mean of the average size of the coalitions in the equilibrium partition, where the mean is taken over 10000 Monte Carlo simulations of the game. The firms' dataset sizes are sampled from the distribution $P$ clipped at 1. There are 5 firms in the market engaging in the Cournot competition. The default values of $\gamma$ and $\beta$ are equal to 0.8 and 0.9, respectively.
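The experiments above use the coalition-formation model of Section 7, which is not reproduced here. As a rough illustration of the Monte Carlo protocol, the sketch below runs the same kind of simulation on the simpler data-sharing treaty game of Appendix D.3; the parameter values ($a$, $b$, the number of runs, and the seed) are assumptions for the sketch, not the paper's exact setup:

```python
import random

# Illustrative Monte Carlo in the spirit of Figures 2-4, but run on the
# simpler data-sharing treaty game of Appendix D.3 instead of the
# coalition-formation model of Section 7; all parameter values are assumed.
random.seed(0)
a, b, beta, gamma, m = 0.1, 0.5, 0.9, 0.8, 3

def profit(i, data):
    """Expected Cournot profit of firm i; data[j] = points available to j."""
    c = [a + b / data[j] ** beta for j in range(m)]
    num = (2 - gamma - (2 + gamma * (m - 2)) * c[i]
           + gamma * sum(c[j] for j in range(m) if j != i))
    return (num / ((2 - gamma) * (2 + (m - 1) * gamma))) ** 2

def avg_coalition_size(ns):
    ns = sorted(ns, reverse=True)
    def pi(i, S):   # S: set of firms answering Y; the others stay singletons
        tot = sum(ns[j] for j in S)
        data = [tot if j in S and len(S) > 1 else ns[j] for j in range(m)]
        return profit(i, data)
    best = 1.0      # the all-singleton outcome is always available
    for lo in range(m - 1):              # by Lemma D.5, Y-sets are suffixes
        S = set(range(lo, m))
        stable = pi(lo, S) >= pi(lo, S - {lo})              # insider stays
        if lo > 0:                                          # outsider stays out
            stable = stable and pi(lo - 1, S) >= pi(lo - 1, S | {lo - 1})
        if stable:
            best = max(best, m / (1 + m - len(S)))
    return best     # size under the most collaborative threshold equilibrium

runs = [avg_coalition_size([max(1.0, random.gauss(1000, 300))
                            for _ in range(m)]) for _ in range(2000)]
avg = sum(runs) / len(runs)
print(round(avg, 2))
```

Sweeping `gamma` and `beta` over grids and repeating the loop reproduces the qualitative shape of the figures: coalitions shrink as products become more similar and as the learning task becomes simpler.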