# delegated_online_search__e9e39776.pdf

Delegated Online Search

Pirmin Braun1 , Niklas Hahn2 , Martin Hoefer1 and Conrad Schecker1

1 Goethe University Frankfurt, Germany 2 Sorbonne Universit e, CNRS, LIP6 UMR 7606, Paris, France pirminbraun16@gmail.com, niklas.hahn@lip6.fr, {mhoefer,schecker}@em.uni-frankfurt.de

In a delegation problem, a principal P with commitment power tries to pick one out of n options. Each option is drawn independently from a known distribution. Instead of inspecting the options herself, P delegates the information acquisition to a rational and self-interested agent A. After inspection, A proposes one of the options, and P can accept or reject. In this paper, we study a natural online variant of delegation, in which the agent searches through the options in an online fashion. How can we design algorithms for P that approximate the utility of her best option in hindsight? We show that P can obtain a Θ(1/n)-approximation and provide more fine-grained bounds independent of n based on two parameters. If the ratio of maximum and minimum utility for A is bounded by a factor α, we obtain an Ω(log log α/ log α)- approximation algorithm, and we show that this is best possible. If P cannot distinguish options with the same value for herself, we show that ratios polynomial in 1/α cannot be avoided. If the utilities of P and A for each option are related by a factor β, we obtain an Ω(1/ log β)-approximation, and O(log log β/ log β) is best possible.

1 Introduction

The study of delegation problems is a prominent area with numerous applications. There are two parties a decision maker (called principal) P and an agent A. n actions or options are available to P. Each option has a utility for P and a (possibly different) utility for A, which are drawn from a known distribution D. Instead of inspecting options herself, P delegates the search for a good option to A. A sees all realized utility values and sends a signal to P. Based on this signal (and D), P chooses an option. Both parties play this game in order to maximize their respective utility from the chosen option. Many interesting applications can be captured within this framework. For example, consider a company that is trying to hire an expert in a critical area. Instead of searching the market, the company delegates the search to a head-hunting agency that searches the market for suitable candidates. Al-

ternatively, consider an investor, who hires a financial consultant to seek out suitable investment opportunities. Clearly, principal and agent might not always have aligned preferences. While the investor might prefer investments with high interest rates, the financial consultant prefers selling the products for which he gets a provision. In applications such as searching for job candidates or financial investments, availability of options often changes over time, and the pair of agents needs to solve a stopping problem. For example, many lucrative financial investment opportunities arise only within short notice and expire quickly. Therefore, a consultant has to decide whether or not to recommend an investment without exactly knowing what future investment options might become available. Here A faces an online search problem, in which the n options are realized in a sequential fashion. After seeing the realization of option i, he has to decide whether to propose the option to P or discard it. If the option is proposed, P decides to accept or reject this option and the process ends. Otherwise, the process continues with option i + 1. In the study of delegation problems, P usually has commitment power, i.e., P specifies in advance her decision for each possible signal, taking into account the subsequent best response of A. This is reasonable in many applications (e.g., an investor can initially restrict the investment options she is interested in, or the company fixes in advance the required qualifications for the new employee). Interestingly, although P commits and restricts herself in advance, this behavior is usually in her favor. The induced best response of A can lead to better utility for P than in any equilibrium, where both parties mutually best respond. Using a revelation-principle style argument, the communication between P and A can be reduced to A revealing the utilities of a single option and P deciding to accept or reject that option (for a discussion, see, e.g. [Kleinberg and Kleinberg, 2018]). The combination of online search and delegation has been examined before, albeit from a purely technical angle. Kleinberg and Kleinberg [2018] designed approximation algorithms for delegation, showing that P can obtain a constantfactor approximation to the expected utility of her best option in hindsight. Their algorithms heavily rely on techniques and tools developed in the domain of prophet inequalities. However, they are applied to an offline delegation problem. Instead, we consider the natural extension of [Kleinberg and

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

Kleinberg, 2018] to online search. Interestingly, we exhibit a notable contrast in online delegation a constant-factor approximation might be impossible to achieve. In fact, the approximation ratio can be as low as O(1/n), and this can always be achieved. Motivated by this sharp contrast, we provide a fine-grained analysis based on two natural problem parameters: (1) the discrepancy of utility for the agent, and (2) the misalignment of agent and principal utilities.

We study online delegation between principal P and agent A in (up to) n rounds. In every round i, an option is drawn independently from a known distribution Di with finite support Ωi of size si. We denote the options of Di by Ωi = {ωi1, . . . , ωi,si} and the random variable of the draw by Oi. For every i [n] and j [si], the option ωij has probability pij to be drawn from Di. If this option is proposed by A and chosen by P, then A has utility aij 0 and P utility bij 0. We assume that P has commitment power. Before the start of the game, she commits to an action scheme φ with a value φij [0, 1] for each option ωij. φij is the probability that P accepts option ωij when it is proposed by A in round i. For a deterministic scheme with all φij {0, 1}, we define the sets Ei = {ωij | φij = 1} of acceptable options in each round i. In contrast to P, A gets to see the n random draws from the distributions in an online fashion. He decides after each round whether he proposes the current option Oi to P or not. If he decides to propose it, then P sees the option and decides based on φ whether or not she accepts it. If P accepts, the utility values of the option are realized; if not, both players get utility 0. In either case, the game ends after P decides. Both players strive to maximize their expected utility. Initially, both players know the distribution Di for every round i [n]. The sequence of actions then is as follows: (1) P decides φ and communicates this to A; (2) in each round i, A sees Oi Di and irrevocably decides to propose or discard Oi; (3) when A decides to propose some option Oi = ωij, then P accepts it with probability φij, and the game ends.1

A knows the distributions and the action scheme φ of upcoming rounds which determines his expected utility from proposed options. Hence, A essentially faces an online stopping problem that can be solved via backwards induction. We can assume without loss of generality that all decisions (not) to propose an option by A are deterministic: If the expected utility from the realization in the current round is greater than the expected utility obtained in the remaining rounds, propose the current option (otherwise do not). To avoid technicalities, we assume that A breaks ties in favor of P. Our goal is to design action schemes φ with high expected utility for P. We compare the expected utility to the one in the non-delegated (online) problem, where P searches through the n realized options herself. The latter is a classic stopping problem, for which the expected utilities of optimal online and offline search differ by at most a factor of 2 (due to

1In the full version [Braun et al., 2022], we also consider a variant with k 1 proposals. Here the game continues until P accepts or after P has rejected k different options. Action schemes become more complicated and must rely on the history of rejected options.

Round 1 Round 2 option ωij ω11 ω12 ω21 ω22 value-pair (aij, bij) (3,1) (3,8) (2,4) (16,4) probability pij 0.75 0.25 0.75 0.25

Table 1: An example instance

the basic prophet inequality [Krengel and Sucheston, 1977; Krengel and Sucheston, 1978]). We also analyze scenarios with oblivious and semioblivious proposals. In both these scenarios, A reveals only the utility value bij for P when proposing an option (but not his own value aij). In contrast, when P gets to see the utility values of both agents, we term this conscious proposals. The difference between semi-oblivious and (fully) oblivious scenarios lies in the prior knowledge of P. In the semi-oblivious scenario, P is fully aware of the distributions, including all potential utility values aij for A. In the oblivious scenario, P initially observes the probabilities of all options along with her utility values bij, but the values aij of A remain unknown to P throughout. In the scenarios with restricted discrepancy (in Section 3), we assume P is aware of the bound α = maxi,j aij/ mini,j aij.

Example 1. We discuss a simple example to illustrate the definitions. We consider deterministic strategies by P and conscious proposals. There are two rounds with the options distributed according to Table 1. For the benchmark, we assume that P can see and choose the options herself. The best option is ω12. If this is not realized in round 1, the option realized in round 2 is the best choice. Note that this optimal choice for P can be executed even in an online scenario, where she first sees round 1 and gets to see round 2 only after deciding about round 1. The expected utility of this best (online) choice for P is 5. Now in the delegated scenario, suppose P accepts option ω22. Then A would always wait for round 2 and hope for a realization of ω22, even if ω21 would not be accepted by P. Hence, accepting ω22 leads to an expected utility for P of at most 4. In contrast, the optimal decision scheme for P is to accept only ω12 and ω21 with an expected utility of 4.25. For the (semi-)oblivious scenario, P cannot distinguish the options in round 2, and her expected utility is at most 4. Clearly, P has to strike a careful balance between (1) accepting a sufficient number of high-profit options to obtain a high expected utility overall and (2) rejecting options to motivate A to propose better options for P in earlier rounds.

1.2 Contribution and Outline In Section 2 we show that the worst-case approximation ratio for online delegation is Θ(1/n) and this is tight. Intuitively, A waits too long and forgoes many profitable options for P. P can only force A to propose earlier if she refuses to accept later options this, however, also hurts the utility of P. The instances require a ratio of maximum and minimum utility values for A that is in the order of nΘ(n). We further show that the bounds extend to more general variants in which (1) A has a lookahead of k rounds, or (2) A can propose up to

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

k options, resulting in tight approximation ratios of Θ(k/n). Note that all impossibility results in this paper apply already for IID instances, while all algorithmic results apply for general instances. In Section 3, we examine the effect of the discrepancy of utility for A using the ratio α of maximum and minimum utility values. We obtain an Ω(log log α/ log α)-approximation of the optimal (online) search for P. This result is tight. The algorithm limits the acceptable options of P, partitions them into different bins, and then restricts A s search space to the best possible bin for P. The challenge is to design a profitable set of options inside a bin that should be accepted by P without giving A an incentive to forgo proposing many of them. Our algorithm shows that P can obtain a good approximation even if differences in utility of A are polynomial in n. Additionally, we consider the more challenging oblivious scenario in which P does not get to see the agent s utility of the proposed option. Our algorithm for this scenario achieves an Ω(1/α)-approximation. This is contrasted with a set of instances for which no action scheme can extract more than an O(1/α)-approximation. We study the semi-oblivious scenario in the full version [Braun et al., 2022]. In this scenario, P has a priori knowledge of the prior, but still does not see the agent s utility for proposed options. For this setting, our algorithm achieves an Ω(1/( α log α))-approximation, and in general O(1/ α) is best-possible. The results highlight the effect of hiding A s utilities from P (in the proposal, or in the proposal and the prior) the achievable approximation ratios increase from logarithmic to polynomial ratios in α. In Section 4, we consider the misalignment of agent and principal utilities via a parameter β 1, which is the smallest value such that all utilities of P and A are related by a factor in [1/β, β]. Limited misalignment also leads to improved approximation results for P. We show an Ω(1/ log β)-approximation of the optimal (online) search for P. Moreover, every algorithm must have a ratio in O(log log β/ log β). For the agent-oblivious variant, we obtain an Ω(1/β)-approximation, whereas every algorithm must have a ratio in O(1/ β). Additional descriptions, pseudocode for our algorithms as well as all missing proofs can be found in the full version of this paper [Braun et al., 2022].

1.3 Related Work

Holmstrom [1977; 1984] initiated the study of delegation as a bilevel optimization between an uninformed principal and a privately informed agent. The principal delegates the decision to the agent who himself has an interest in the choice of decision. Since then, there have been numerous works on various aspects of delegation. For example, [Melumad and Shibano, 1991; Alonso and Matouschek, 2008] studied the impact of (mis)alignment of the agent s and the principal s interests on the optimal delegation sets. Armstrong and Vickers [2010] studied the delegation problem over discrete sets of random cardinality with elements drawn from some distribution. They identify sufficient conditions for the search problem to have an optimal solution.

A similar model to ours was considered in computer science by Kleinberg and Kleinberg [2018], where the option set searched by the agent consists of n IID draws from a known distribution. They show constant-factor approximations of the optimal expected principal utility when performing the search herself rather than delegating it to the agent. For their analysis, they rely on tools from online stopping theory. The key difference between their work and our paper is that albeit using tools from online optimization they study an offline problem while we focus on an online version. Bechtel and Dughmi [2021] recently extended this line of research by combining delegation with stochastic probing. Here a subset of elements can be observed by the agent (subject to some constraints), and several options can be chosen (subject to a different set of constraints). Similarly, Bechtel et al. [2022] study connections between delegation and a generalized Pandora s Box problem. A related but different area is contract theory, which considers principal-agent settings with uncertainty and monetary transfers. An early formalization was introduced by Grossmann and Hart [1983]. Computational aspects of this problem are recently starting to attract interest, see, e.g., Duetting et al. [2019; 2020] and earlier work (on a slightly different model) [Babaioff et al., 2012; Babaioff and Winter, 2014]. The study of persuasion, another model of strategic communication, has gained a lot of traction at the intersection between economics and computation in recent years. Here, the informed party (the sender ) is the one with commitment power, trying to influence the behavior of the uninformed agent (the receiver ). Persuasion has proven to be a popular topic in AI. Castiglioni et al. [2022] studied Bayesian posted price auctions where buyers arrive sequentially and receive signals from the revenue maximizing seller. Moreover, signaling may be used in other settings, e.g., persuading voters [Castiglioni et al., 2020a; Castiglioni and Gatti, 2021], or for reducing social cost in congestion games with uncertain delays [Castiglioni et al., 2021a; Griesbach et al., 2022]. Closer to our paper is the study of persuasion in the context of stopping problems [Hahn et al., 2020b; Hahn et al., 2020a]. These works study persuasion problems in a prophet inequality [Hahn et al., 2020a] as well as in a secretary setting [Hahn et al., 2020b]. Other notable algorithmic results on persuasion problems concern optimal policies, hardness, and approximation algorithms in the general case [Dughmi and Xu, 2016] as well as in different variations, e.g., with multiple receivers [Babichenko and Barman, 2017; Badanidiyuru et al., 2018; Dughmi and Xu, 2017; Rubinstein, 2017; Xu, 2020], with limited communication complexity [Dughmi et al., 2016; Gradwohl et al., 2021], or relations to online learning [Castiglioni et al., 2020b; Castiglioni et al., 2021b; Zu et al., 2021]

2 Impossibility

2.1 A Tight Bound As a first simple observation, note that P can always achieve an n-approximation with a deterministic action scheme, even in with oblivious proposals. P accepts exactly all options in

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

a single round i with optimal expected utility, i.e., Ei = {ωi ,j | j [si ]} for i = arg maxi [n] E[bij], and Ej = otherwise. This motivates A to always propose the option from round i , and P gets expected utility E[bi ,j]. By a union bound, the optimal utility from searching through all options herself is upper bounded by E [P i bij] n E[bi ,j].

Proposition 1. For online delegation there is a deterministic action scheme φ such that P obtains at least a 1/napproximation of the expected utility for optimal (online) search.

We show a matching impossibility result, even in the IID setting with Di = D for all rounds i [n], and when P gets to see the full utility pair of any proposed option. There are instances in which P suffers a deterioration in the order of Θ(n) over the expected utility achieved by searching through the options herself. For the proof, consider the following class of instances. The distribution D can be cast as an independent composition, i.e., we independently draw the utility values for P and A. For P there are two possibilities, either utility 1 with probability 1/n, or utility 0 with probability 1 1/n. For A, there are n possibilities with agent utility of n4ℓ, for ℓ= 1, . . . , n, where each one has probability 1/n. In combination, we can view D as a distribution over j = 1, . . . , 2n options. Options ωj for j = 1, . . . , n have probability 1/n2 and utilities (bj, aj) = (1, n4j), for j = n + 1, . . . , 2n they have probability 1/n 1/n2 and utilities (bj, aj) = (0, n4(j n)).

Theorem 1. There is a class of instances of online delegation in the IID setting, in which every action scheme φ obtains at most an O(1/n)-approximation of the expected utility for optimal (online) search.

Proof. For simplicity, we first show the result for schemes φ with φij = 0 for all rounds i [n] and all j = n+1, . . . , 2n. In the end of the proof we discuss why this can be assumed for an optimal scheme. Since all options j [n] have the same utility for P, she wants to accept one of them as soon as it appears. If she searches through the options herself, the probability that there is an option of value 1 is 1 (1 1/n)n 1 1/e. Her expected utility is a constant. In contrast, when delegating the search to A, the drastic utility increase motivates him to wait for the latest round in which a better option is still acceptable by P. As a result, A waits too long, and removing acceptable options in later rounds does not remedy this problem for P. More formally, interpret an optimal scheme φ as an n n matrix, for rounds i [n] and options j [n]. We outline some adjustments that preserve the optimality of matrix φ. Consider the set S of all entries with φij 1/n. For each (i, j) S, the probability that option j is realized in round i is 1/n2. When it gets proposed by A, then it is accepted by P with probability at most 1/n. By a union bound, the utility obtained from all these options is at most |S|/n2 1/n 1/n. Suppose we change the scheme by decreasing φij to 0 for each (i, j) S. Then each entry in φ is either 0 or at least 1/n. If A makes the same proposals as before, the change decreases the utility of P by at most 1/n. Then again, in the new scheme A can have an incentive to propose other options

in earlier rounds. Since all options with φij = 0 have utility 1 for P, this only leads to an increase of utility for P. Moreover, in round 1 we increase all acceptance probabilities to φ1j = 1 for j [n]. Then, upon arrival of such an option ωj in round 1, the change can incentivize A to propose this option which is clearly optimal for P, since this is an optimal option for her. Since the change is in round 1, it introduces no incentive to wait for A. As such, it can only increase the utility for P. Now consider any entry φij 1/n. We observe two properties:

1. Suppose φi j 1/n for i < i and j < j. Then P accepts realization ωj in round i with positive probability, but she will also accept the better (for A) realization ωj in a later round i with positive probability. A will not propose ωj in round i but wait for round i, since the expected utility in the later round i is at least n4j 1/n2 φij n4j 3 > n4(j 1) n4j φi j , the utility in round i . As such, w.l.o.g. we set φi j = 0 for all i < i and j < j. 2. Suppose φi j < φij for i < i. By property 1., all realizations ωj with j < j are not accepted in rounds 1, . . . , i 1. Hence, setting φi j = φij does not change the incentives for A w.r.t. other options, and thus only (weakly) increases the expected utility of P. By the same arguments, we set φij = max{φij , φij} for all inferior options j < j in the same round i.

We apply the adjustments implied by the two properties repeatedly, starting for the entries φin in the n-th column for option ωn, then in column n 1, etc. By 1., every positive entry φij 1/n leads to entries of 0 in all dominated entries φi j with i < i and j < j. As a consequence, the remaining positive entries form a Manhattan path in the matrix φ. The path starts at φ1n, ends at φn1, and for each φij 1/n it continues either at φi+1,j 1/n or φi,j 1 1/n. See Table 2 for an example. We can upper bound the expected utility of P by assuming that all 2n 1 entries on the Manhattan path are 1 (i.e., φ is deterministic) and A proposes an acceptable option whenever possible. The probability that this happens is at most (2n 1)/n2 = O(1/n) by a union bound. This is an upper bound on the expected utility of P and proves the theorem for schemes with φij = 0 for all i [n] and j n + 1. Finally, suppose φij > 0 for some j n + 1. Clearly, option ωj adds no value to the expected utility of P. Moreover, the fact that ωj has positive probability to be accepted in round i can only motivate A to refrain from proposing inferior options in earlier rounds. As such, setting φij = 0 only (weakly) increases the utility of P.

The lower bound in Theorem 1 remains robust in several extensions of the model. In the full version of this paper [Braun et al., 2022], we discuss two generalizations (when A has a lookahead of k rounds, and when A is allowed to make k proposals). For both scenarios, we show the following lower bounds (along with simple algorithms providing matching upper bounds). Corollary 1. Consider online delegation in the IID setting with either (1) lookahead k, or (2) k proposals. For each

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

Rnd. ω1 ω2 ω3 ω4 1 0.3 0.2 0.6 0.9 2 0.4 0.3 0.8 1 3 0.2 0.4 0.6 0.2 4 0.5 0.1 0.7 0

Rnd. ω1 ω2 ω3 ω4 1 0.3 0 0.6 0.9 2 0.4 0.3 0.8 1 3 0 0.4 0.6 0 4 0.5 0 0.7 0

Rnd. ω1 ω2 ω3 ω4 1 0 0 0 1 2 0 0 1 1 3 0 0 0.7 0 4 0.7 0.7 0.7 0

Table 2: Adjustments from the proof of Theorem 1 for an example scheme φ for n = 4. Left: Entries φij < 1/n (bold) get set to 0. The expected utility for P decreases by at most 1/n. Middle: Italic entries show options that never get proposed due to options with bold entries (cf. property 1.). Italic entries can be set to 0. Right: Bold entries have been raised according to property 2. A Manhattan path of 2n 1 entries evolves.

scenario, there are classes of instances such that every action scheme φ obtains at most an O(k/n)-approximation of the expected utility for optimal (offline) search.

3 Discrepancy of Agent Utilities 3.1 Conscious Proposals The lower bound instances in Theorem 1 have agent utilities between 1 and n O(n). Is such a drastic discrepancy necessary to show a substantial lower bound? Can we obtain better approximation results for instances with a smaller ratio of the maximum and minimum utility values for A? Here, we restrict to aij > 0 for all options and study the cases with aij = 0 in the full version [Braun et al., 2022]. Let α = max{aij | i [n], j [si]}/ min{aij | i [n], j [si]}. W.l.o.g. we scale all utility values to aij [1, α], where both boundaries α and 1 are attained by at least one option. Then A has α-bounded utilities. We show how to compute a good action scheme with respect to parameter α. Intuitively, we partition the best options for P that add up a total probability mass of roughly 1/2 into O(log α/ log log α) many bins. Each bin is constructed in a way such that A is incentivized to propose the first option he encounters from that particular bin, assuming that P only accepts options from that bin. The algorithm determines the best bin for P and deterministically restricts the acceptable options to the ones from this bin. Let us discuss the algorithm in more detail (pseudocode can be found in the full version of this paper [Braun et al., 2022]). As a first step, the algorithm uses a procedure Restrict Options(D1, . . . , Dn, m) with parameter m = 1/2 as subroutine. The procedure considers all options in descending order of principal utility until a total probability mass m is reached. Starting out with ˆQ = , options are added to ˆQ = {(i, j) | bij bi ,j (i , j ) / ˆQ} as long as P

(i,j) ˆ Q pij < m. The first option (i , j ) that would reach or surpass the combined mass of m (i.e., such that P

(i,j) ˆ Q {(i ,j )} pij m) is considered separately.

The procedure Restrict Options then returns either Q = ˆQ or Q = {(i , j )}, whichever set provides a better expected utility for P. As a consequence, P (i,j) Q pij bij m/2 Eωij Di[maxi [n] bij]. Lemma 1 summarizes the claim. The proof can be found in the full version [Braun et al., 2022]. Lemma 1. The subroutine Restrict Options(D1, . . . , Dn, m) with distributions D1, . . . , Dn and a parameter 0 < m 1

as input identifies Q, the best set of options for P, such that X

(i,j) Q pij bij m/2 Eωij Di[max i [n] bij]

and either (1) the combined mass P (i,j) Q pij < m or (2) all options in Q arrive in the same round.

If the set Q returned by Restrict Options only spans a single round i, the agent will always be incentivized to propose an acceptable option in round i. For this scenario, the algorithm only creates a single bin B1. Otherwise, it continues with the second and third step described in the following. In the second step of the algorithm, the options identified by Restrict Options are classified by their utility for A. The algorithm divides Q into c = log2 α classes depending on the agent utility such that the lowest and highest agent utilities in any given class differ by at most a factor of 2. More precisely, classes C1, . . . , Cc are constructed such that Ck = {(i, j) Q | aij [2k 1, 2k)} for k = 1, . . . , c 1 and Cc = {(i, j) Q | aij [2c 1, 2c]}. For the third step, subsequent classes (by their agent utility value) are combined into bins such that (1) the bins are as big as possible and (2) A optimizes his own expected utility by proposing the first option he encounters from any bin assuming that only options from this bin are allowed. Classes are either added to a bin completely or not at all. Let s be the index of the class with the highest agent utilities currently considered, i.e., the first class to be added to the current bin Bb. We consider the classes by decreasing agent utility values, i.e., with indices k = s, s 1, . . . . While 2k 1 2s P (i,j) Bb Ck pij, a rational A will always propose the first option available from the current bin if that is the only allowed bin as it has a higher utility than what A can expect from later rounds. Hence, the class currently under consideration can be added to the current bin. Finally, having constructed all bins, the algorithm simply chooses the best one for P. Lemma 2 shows that our algorithm achieves an approximation ratio which is linear in the number of bins opened. The subsequent Lemma 3 bounds the number of bins opened, showing that it is in the order of O(log α/ log log α). Together, the lemmas prove Theorem 2, our main result of the section.

Lemma 2. Let ℓbe the number of bins opened by the algorithm. Then the scheme computed by the algorithm obtains

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

at least an 1/(8ℓ)-approximation of the expected utility of the best option for P in hindsight.

Proof. We know that Q satisfies 4 P (i,j) Q pij bij Eωij Di[maxi [n] bij] = OPT by Lemma 1. Now consider the construction of the bins. Suppose we split Q into ℓbins B1, B2, . . . , Bℓ. We pick the best one Bb from the ℓbins B1, . . . , Bℓ, so X

(i,j) Bb pijbij 1

(i,j) Q pijbij 1

The action scheme restricts attention to Bb and accepts each proposed option ωij from the bin with probability 1. Let k = min{k | Ck Bb } be the class of smallest index in Bb , and k+ the one with largest index, respectively. Now suppose the agent learns in round i that an option ωij with (i, j) Bb arrives in this round. We claim that A will then decide to propose this option. This is obvious if all options in Bb are only realized in round i. Otherwise, the agent might want to wait for an option in a later round. If A proposes, then his utility is aij. Otherwise, if he waits for another option from Bb in a later round, then a union bound shows that the expected utility is at most X

(i ,j ) Bb i >i

pi j ai j X

(i ,j ) Bb i >i

(i ,j ) Bb pi j 2k 1 aij ,

where the second-to-last inequality is a consequence from the construction of the bin. Hence, the first option from the bin that is realized also gets proposed by A and accepted by P. Now for each option (i, j) Bb , the probability that this option is proposed and accepted is the combination of two independent events: (1) no other option from Bb was realized in any of the rounds i < i, (2) option ωij is realized in round i. The probability for event (2) is pij. For event (1), we define mi = P (i,j) Bb pij. With probability Q i <i(1 mi ), no option from Bb is realized in rounds i < i. Note that Pn i=1 mi 1/2. The term Qn i=1(1 mi) is minimized for m1 = 1/2 and mi = 0 for 1 < i < i. Thus Qn i=1(1 mi) 1/2, i.e., the probability of event (1) is at least 1/2. Overall, by linearity of expectation, the expected utility of P when using φ is at least X

1 2 pij bij 1

(i,j) Q pij bij 1

Lemma 3. Let ℓbe the number of bins opened by the algorithm. It holds that ℓ= O(log α/ log log α).

Proof. Consider a bin B and its mass p B = P (i,j) B pij. We want to argue that at most O(c/ log c) bins are opened. To do so, we first condition on having ℓopen bins and strive to lower bound the number of classes in these ℓbins.

Consider a bin B starting at Cs. The algorithm adds classes to B until 2k 1 < 2sp B. Thus, s k + 1 > log2(1/p B), i.e., the number of classes in Bi is lower bounded by log2(1/p B). Now consider two bins Bi and Bi+1 and condition on q = p Bi + p Bi+1. Together the bins contain at least log2(1/p Bi) + log2(1/(q p Bi)) classes. Taking the derivative for p Bi, we see that this lower bound is minimized when p Bi = q/2 = p Bi+1. Applying this balancing step repeatedly, the lower bound on the number of classes in all bins is minimized when p Bi = p Bj for all bins Bi, Bj. Thus, when opening ℓbins, we obtain the smallest lower bound on the number of classes in these bins by setting p Bi = 1/ℓ P (i,j) Q pij < 1/(2ℓ) for all bins Bi. Conversely, when opening ℓbins, we need to have at least ℓlog2(2ℓ) classes in these bins. Now, since we need to put c classes into the bins, we need to ensure that for the number ℓof open bins we have ℓ(log2 ℓ+ 1) c, since otherwise the ℓbins would require more than c classes in total. This implies that c = Ω(ℓlog2 ℓ) and, hence, ℓ= O(c/ log c) = O(log α/ log log α).

Theorem 2. If the agent has α-bounded utilities, there is a deterministic action scheme such that P obtains an Ω(log log α/ log α)-approximation of the expected utility for optimal (online) search.

Observe that the approximation ratio of this algorithm is tight in general. Consider the instances in Theorem 1 with α = n O(n). The theorem shows that every scheme can obtain at most a ratio of O(1/n) = O(log log α/ log α).

3.2 Oblivious Proposals In the previous section, we considered algorithms for P when she learns the utility pair for the proposed option. In this section, we show that (fully) oblivious proposals can be a substantial drawback for P. Obviously, the lower bound in Theorem 1 remains intact even for oblivious proposals, when P does not learn the utility value of the proposed option for A. For oblivious proposals and α-bounded agent utilities, we can significantly strengthen the lower bound. In contrast to the logarithmic approximation guarantee above, we provide a linear lower bound in α for oblivious proposals.

Theorem 3. There is a class of instances of online delegation with α-bounded utilities for the agent and oblivious proposals, in which every action scheme φ obtains at most an O(1/α)-approximation of the expected utility for optimal (online) search.

Proof. Consider the following class of instances. In Di, there are two options with the following probabilities and utilities: ωi1 with pi1 = 1 1/n and (bi1, ai1) = (0, 1), as well as ωi2 with pi2 = 1/n and (bi2, ai2) = (1, xi), where xi {1, α} and α [1, n]. In the first rounds i = 1, . . . , i 1 we have xi = 1, then xi = α for rounds i = i , . . . , n. The expected utility when P performs (undelegated) online search is 1 (1 1/n)n 1 1/e. P wants that A proposes any profitable option ωi2 as soon as possible. As in the proof of Theorem 1, we can assume that all φi1 = 0 in an optimal scheme this option has no value for P and can only raise the incentive to wait for A.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

Due to oblivious proposals, P has to choose φ without being aware of the value of i . For our impossibility result, we adjust i to the scheme φ chosen by P: Set i {1, . . . , n} to the largest number such that Pn i=i φi2 e n/α, or i = 1 if no such number exists. First, suppose that i = 1. Then, even if we force A to propose every option ωi2 as soon as it arises, a union bound shows that the expected utility of P is upper bounded by Pn i=1 1 n φi2 e

n. Hence, P obtains only an O(1/α)- approximation, for any α [1, n]. Now suppose that i > 1. Consider an optimal scheme φ for P. If ωi2 arises in round i, A decides if it is more profitable to propose i or wait for a later round. Indeed, we show that A never proposes ωi2 in any round i < i . Consider the expected utility from proposing the first option ωk2 arising in rounds k = i , . . . , n. This is

α = 1 φi2 ,

i.e., strictly larger than the expected utility φi2 from proposing ωi2 in round i < i . Hence, A only proposes in rounds k = i , . . . , n. Even if A would be able to propose every option ωk2 in rounds k = i , . . . , n, a union bound implies that the expected utility of P from these rounds is upper bounded by Pn k=i 1

n. For any α [1, n], P obtains an O(1/α)-approximation.

Theorem 4. If the agent has α-bounded utilities and makes oblivious proposals, there is a deterministic action scheme such that P obtains an Ω(1/α)-approximation of the expected utility for optimal (online) search.

For the scheme, P simply sets φij = 1 for all (i, j) Q, where Q is the set returned by Restrict Options(D1, . . . , Dn, 1/(2α)). The key insight is that by bounding the combined mass of acceptable options by 1/(2α), A is incentivized to propose the first acceptable option he encounters. Pseudocode as well as the proof of Theorem 4 can be found in the full version [Braun et al., 2022].

4 Misalignment of Principal and Agent Utility

In this section, we consider performance guarantees based on the amount of misalignment of principal and agent utility. For most of the section, we assume that all utility values are strictly positive. Consider the smallest β 1 such that

for any two options ωij and ωi j in the instance. Then the preference of P between any pair ωij, ωi j of options is shared by A up to a factor of at most β. We term this βbounded utilities. Alternatively, consider γ 1 as a bound

on the utility ratio 1/γ aij bij γ aij for every single ωij. Then the utilities are γ2-bounded in the sense defined above, and we obtain asymptotically the same bounds. Suppose we choose an arbitrary realization ωi j . Divide all utility values of P for all realizations by bi j , and all utility values of A by ai j . Note that this adjustment neither affects the incentives of the players nor the approximation ratios of our algorithms. Considering ωij with the adjusted utilities, we see that 1/β bij/aij 1 β bij/aij, and thus 1/β bij/aij β for all ωi j . This condition turns out to be convenient for our analysis. Our main idea is to use O(log β) clusters Ck to group all the options that have a utility ratio between 2k and 2k+1, i.e.,

Ck = {ωij Ω| 2k bij/aij < 2k+1}

for k = log 1/β , . . . , log β . Our deterministic scheme restricts the acceptable options to a single cluster Ck . Note that here P is assumed to see aij upon a proposal. The principal determines the cluster k , such that the best response by A (i.e., his optimal online algorithm applied with the options from that cluster) delivers the largest expected utility for P. Theorem 5. If principal and agent have β-bounded utilities, there is a deterministic action scheme such that P obtains an Ω(1/ log β)-approximation of the expected utility for optimal (online) search.

Proof. Consider any cluster Ck. We denote by b(A, k) and a(A, k) the expected utility for P and A when P uses Ck to determine φ. Now consider a hypothetical algorithm for P that observes all realizations and chooses the best option from Ck for P if possible. If there is no such option, it obtains a utility of 0. Let b(P, k) and a(P, k) be the expected utility of the hypothetical algorithm for P and A, respectively. Clearly, b(P, k) b(A, k) and a(A, k) a(P, k), but also, by definition of Ck,

b(A, k) a(A, k) 2k a(P, k) 2k b(P, k)/2 .

Now consider the best option for P in hindsight. The bestoption-algorithm for cluster Ck picks the best option in hindsight if it comes from cluster Ck. Otherwise, it returns a value of 0. Let b k be the expected utility of this algorithm for P, and let OPT be the expected utility of the best option in hindsight for P. Then

k= log 1/β b k

k= log 1/β b(P, k)

k= log 1/β b(A, k) 2 .

Hence, since the scheme chooses the cluster k that maximizes b(A, k ), we obtain an Ω(1/ log β)-approximation.

Acknowledgments Hahn gratefully acknowledges the support of GIF grant I1419-118.4/2017. Hoefer gratefully acknowledges the support of GIF grant I-1419-118.4/2017 and DFG grants Ho 3831/6-1, 7-1 and 9-1.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

[Alonso and Matouschek, 2008] Ricardo Alonso and Niko Matouschek. Optimal delegation. Rev. Econ. Stud., 75(1):259 293, 2008. [Armstrong and Vickers, 2010] Mark Armstrong and John Vickers. A model of delegated project choice. Econometrica, 78(1):213 244, 2010. [Babaioff and Winter, 2014] Moshe Babaioff and Eyal Winter. Contract complexity. In Proc. 15th Conf. Econ. Comput. (EC), page 911, 2014. [Babaioff et al., 2012] Moshe Babaioff, Michal Feldman, Noam Nisan, and Eyal Winter. Combinatorial agency. J. Econ. Theory, 147(3):999 1034, 2012. [Babichenko and Barman, 2017] Yakov Babichenko and Siddharth Barman. Algorithmic aspects of private Bayesian persuasion. In Proc. 8th Symp. Innov. Theoret. Comput. Sci. (ITCS), pages 34:1 34:16, 2017. [Badanidiyuru et al., 2018] Ashwinkumar Badanidiyuru, Kshipra Bhawalkar, and Haifeng Xu. Targeting and signaling in ad auctions. In Proc. 29th Symp. Discret. Algorithms (SODA), pages 2545 2563, 2018. [Bechtel and Dughmi, 2021] Curtis Bechtel and Shaddin Dughmi. Delegated stochastic probing. In James R. Lee, editor, Proc. 12th Symp. Innov. Theoret. Comput. Sci. (ITCS), pages 37:1 37:19, 2021. [Bechtel et al., 2022] Curtis Bechtel, Shaddin Dughmi, and Neel Patel. Delegated Pandora s box. In Proc. 23rd Conf. Econ. Comput. (EC), 2022. [Braun et al., 2022] Pirmin Braun, Niklas Hahn, Martin Hoefer, and Conrad Schecker. Delegated online search. https://arxiv.org/abs/2203.01084, 2022. [Castiglioni and Gatti, 2021] Matteo Castiglioni and Nicola Gatti. Persuading voters in district-based elections. In Proc. 35th Conf. Artif. Intell. (AAAI), pages 5244 5251, 2021. [Castiglioni et al., 2020a] Matteo Castiglioni, Andrea Celli, and Nicola Gatti. Persuading voters: It s easy to whisper, it s hard to speak loud. In Proc. 34th Conf. Artif. Intell. (AAAI), pages 1870 1877, 2020. [Castiglioni et al., 2020b] Matteo Castiglioni, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Online bayesian persuasion. In Proc. 34th Conf. Adv. Neural Inf. Processing Syst. (NIPS), 2020. [Castiglioni et al., 2021a] Matteo Castiglioni, Andrea Celli, Alberto Marchesi, and Nicola Gatti. Signaling in bayesian network congestion games: the subtle power of symmetry. In Proc. 35th Conf. Artif. Intell. (AAAI), pages 5252 5259, 2021. [Castiglioni et al., 2021b] Matteo Castiglioni, Alberto Marchesi, Andrea Celli, and Nicola Gatti. Multi-receiver online bayesian persuasion. In Proc. 38th Int. Conf. Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pages 1314 1323, 2021.

[Castiglioni et al., 2022] Matteo Castiglioni, Giulia Romano, Alberto Marchesi, and Nicola Gatti. Signaling in posted price auctions. In Proc. 36th Conf. Artif. Intell. (AAAI), pages 4941 4948, 2022. [Dughmi and Xu, 2016] Shaddin Dughmi and Haifeng Xu. Algorithmic Bayesian persuasion. In Proc. 48th Symp. Theory Comput. (STOC), pages 412 425, 2016. [Dughmi and Xu, 2017] Shaddin Dughmi and Haifeng Xu. Algorithmic persuasion with no externalities. In Proc. 18th Conf. Econ. Comput. (EC), pages 351 368, 2017. [Dughmi et al., 2016] Shaddin Dughmi, David Kempe, and Ruixin Qiang. Persuasion with limited communication. In Proc. 17th Conf. Econ. Comput. (EC), pages 663 680, 2016. [D utting et al., 2019] Paul D utting, Tim Roughgarden, and Inbal Talgam-Cohen. Simple versus optimal contracts. In Proc. 20th Conf. Econ. Comput. (EC), page 369 387, 2019. [D utting et al., 2020] Paul D utting, Tim Roughgarden, and Inbal Talgam-Cohen. The complexity of contracts. In Proc. 31st Symp. Discret. Algorithms (SODA), pages 2688 2707, 2020. [Gradwohl et al., 2021] Ronen Gradwohl, Niklas Hahn, Martin Hoefer, and Rann Smorodinsky. Algorithms for persuasion with limited communication. In Proc. 32nd Symp. Discret. Algorithms (SODA), pages 637 652, 2021. [Griesbach et al., 2022] Svenja Griesbach, Martin Hoefer, Max Klimm, and Tim Koglin. Public signals in network congestion games. In Proc. 23rd Conf. Econ. Comput. (EC), page 736, 2022. [Grossman and Hart, 1983] Sanford J. Grossman and Oliver D. Hart. An analysis of the principal-agent problem. Econometrica, 51(1):7 45, 1983. [Hahn et al., 2020a] Niklas Hahn, Martin Hoefer, and Rann Smorodinsky. Prophet inequalities for Bayesian persuasion. In Proc. 29th Int. Joint Conf. Artif. Intell. (IJCAI), pages 175 181, 2020. [Hahn et al., 2020b] Niklas Hahn, Martin Hoefer, and Rann Smorodinsky. The secretary recommendation problem. In Proc. 21st Conf. Econ. Comput. (EC), page 189, 2020. [Holmstrom, 1977] Bengt Robert Holmstrom. On Incentives and Control in Organizations. Ph D thesis, Stanford University, 1977. [Holmstrom, 1984] Bengt Robert Holmstrom. On the theory of delegation. In Marcel Boyer and Richard Kihlstrom, editors, Bayesian Models in Economic Theory, pages 115 141. Elsevier, 1984. [Kleinberg and Kleinberg, 2018] Jon Kleinberg and Robert Kleinberg. Delegated search approximates efficient search. In Proc. 19th Conf. Econ. Comput. (EC), pages 287 302, 2018. [Krengel and Sucheston, 1977] Ulrich Krengel and Louis Sucheston. Semiamarts and finite values. Bull. Amer. Math. Soc, 83:745 747, 1977.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)

[Krengel and Sucheston, 1978] Ulrich Krengel and Louis Sucheston. On semiamarts, amarts and processes with finite value. Adv. Prob., 4:197 266, 1978. [Melumad and Shibano, 1991] Nahum D. Melumad and Toshiyuki Shibano. Communication in settings with no transfers. RAND J. Econ., 22(2):173 198, 1991. [Rubinstein, 2017] Aviad Rubinstein. Honest Signaling in Zero-Sum Games Is Hard, and Lying Is Even Harder. In Proc. 44th Int. Colloq. Autom. Lang. Programming (ICALP), pages 77:1 77:13, 2017. [Xu, 2020] Haifeng Xu. On the tractability of public persuasion with no externalities. In Proc. 31st Symp. Discret. Algorithms (SODA), pages 2708 2727, 2020. [Zu et al., 2021] You Zu, Krishnamurthy Iyer, and Haifeng Xu. Learning to persuade on the fly: Robustness against ignorance. In Proc. 22nd Conf. Econ. Comput. (EC), pages 927 928, 2021.

Proceedings of the Thirty-Second International Joint Conference on Artiﬁcial Intelligence (IJCAI-23)