Hiring Under Uncertainty

Manish Raghavan (Department of Computer Science, Cornell University), Manish Purohit (Google Research), Sreenivas Gollapudi (Google Research). Correspondence to: Manish Raghavan.

Abstract

In this paper we introduce the hiring under uncertainty problem to model the questions faced by hiring committees in large enterprises and universities alike. Given a set of n eligible candidates, the decision maker needs to choose the sequence of candidates to whom to make offers so as to hire the k best candidates. However, candidates may choose to reject an offer (for instance, due to a competing offer), and the decision maker has a time limit by which all positions must be filled. Given an estimate of the probability of acceptance for each candidate, the hiring under uncertainty problem is to design a strategy for making offers so that the total expected value of all candidates hired by the time limit is maximized. We provide a 2-approximation algorithm for the setting where offers must be made in sequence, an 8-approximation when offers may be made in parallel, and a 10-approximation for the more general stochastic knapsack setting with finite probes.

1. Introduction

Hiring is a core activity of any enterprise, where the timely fulfillment of staffing needs is critical to its functioning. In addition to estimating the quality and suitability of a candidate, the enterprise also needs to deal with the uncertainty that arises when good candidates reject the job offer. Balancing this trade-off between hiring good candidates and ensuring that all staffing needs are met by a deadline is one of the most challenging aspects of hiring in practice. A number of algorithmic questions inspired by hiring settings have been well studied in the literature (see Section 1.2), including the popular secretary problem and its many variants. This line of work focuses on the online nature of
(Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).)

the problem, and the key question tackled is how to find a good set of candidates when the pool of future candidates is unknown. However, this line of research does not model the other source of uncertainty: the candidate may choose to reject the job offer (for instance, due to a better competing offer), which in turn raises the question of hiring enough candidates by the deadline. During the hiring process, the quality (or value) of a candidate is often estimated by traditional means such as resume screening and formal interviews, and even via algorithmic techniques (see Section 1.2). On the other hand, machine learning models can estimate the probability that a given candidate will accept a job offer based on features such as the candidate's educational background, salary expectations, location preferences, and so on. Considering both the value and the offer acceptance probability of each candidate leads to a rich collection of optimization problems. In this paper, we initiate the study of the hiring under uncertainty problem, which aims to tackle the inherent trade-off at the heart of the hiring process: how should we make job offers under uncertainty so that all staffing needs are met by a deadline, and yet the best available candidates are hired?

Formally, we consider the following model as the basis for all the variants we present in this paper. There is a set of n candidates, and we need to hire k of them. We do this by making offers to candidates, which we'll also refer to more abstractly as probing a candidate. Each candidate i has a known value v_i and probability p_i of accepting an offer, independent of all other candidates. We have a deadline of t time steps, after which we can't make any further offers. It takes one time step to make an offer and receive an answer from a candidate.
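As a concrete illustration of this model (our sketch, not part of the paper), the following minimal Monte Carlo simulation plays out one realization of the sequential offer process under an assumed fixed offer order:

```python
import random

def simulate_offers(order, cands, k, t, rng):
    """One realization of the sequential model: probe candidates in the
    given order; each acceptance irrevocably fills a slot; stop after
    k hires or after t offers (one offer per time step)."""
    hired_value, hires, steps = 0.0, 0, 0
    for i in order:
        if hires == k or steps == t:
            break
        steps += 1
        p, v = cands[i]
        if rng.random() < p:  # candidate i accepts with probability p_i
            hired_value += v
            hires += 1
    return hired_value

# With all p_i = 1 the outcome is deterministic: hire the first k in order.
cands = [(1.0, 2.0), (1.0, 3.0), (1.0, 5.0)]  # (p_i, v_i) pairs
print(simulate_offers([2, 1, 0], cands, k=2, t=3, rng=random.Random(0)))  # → 8.0
```

Averaging this quantity over many random seeds estimates the expected value of a fixed (non-adaptive) offer order; the algorithms in this paper aim to maximize that expectation.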
Job offers are irrevocable, i.e., once a candidate accepts an offer, that position is filled and we cannot replace that candidate with a better one in the future. The total value of a chosen set of candidates is simply the sum of the individual candidate values. Our goal is to maximize the total expected value of the hired candidates. We also consider two natural generalizations of this model. First, we allow making parallel offers to multiple candidates in a given time step. Second, we consider the knapsack hiring problem, where each candidate i has a size s_i and we have a budget B on the total size of hired candidates. The knapsack hiring problem models the scenario in which the enterprise has a fixed budget and different candidates need to be offered different salaries. We note that in all settings, we do not require v_i to be known precisely; all of our results hold if v_i is only known in expectation. However, our results are sensitive to errors in p_i and s_i. Making them robust to such errors is an interesting subject for future work.

1.1. Our Contributions

We summarize our contributions in this study. In Section 2, we offer a 2-approximation algorithm for hiring k candidates under a constraint of making at most t sequential offers. In Section 3, we consider the parallel offers model, where we are allowed to make as many parallel offers in each time step as the number of unfilled positions remaining, and design an 8-approximation algorithm. In Section 4, we present a 10-approximation for the knapsack hiring problem, where each candidate has a different size and the decision maker is constrained by a total budget (as opposed to hiring k candidates). We draw a connection to other stochastic optimization problems such as stochastic matching and present a lower bound for the stochastic matching problem. Finally, we show the efficacy of our algorithms using simulations on data drawn from different distributions.

1.2.
Related Work

Theoretical questions inspired by hiring scenarios have long been studied in the online setting under the names of optimal stopping or secretary problems (Dynkin, 1963; Chow et al., 1964). A few extensions of this setting incorporate elements of our model. Kleinberg (2005) considers the case of hiring multiple candidates instead of the traditional single hire. An older line of work considers a version of the secretary problem in which candidates may stochastically reject offers, although this is typically modeled with a fixed rejection probability (Smith, 1975; Tamaki, 1991; 2000; Ano & Ando, 2000). In addition, more recent work on stochastic optimization considers a variety of related problems in the offline setting, including stochastic versions of submodular optimization (Asadpour et al., 2008; Gupta et al., 2017), knapsack (Dean et al., 2004; 2005; Bhalgat et al., 2011), bandits (Gupta et al., 2011; Ma, 2014), and matching (Bansal et al., 2010; Adamczyk et al., 2015; Baveja et al., 2018). Some special cases of our model (specifically, when one candidate is being hired) can be viewed as special cases of matching, and in fact the results we derive here provide lower bounds for stochastic matching. However, our model cannot in general be captured by any of these prior works. Algorithmic and data-driven approaches to hiring have become increasingly common with the rise of machine learning (Miller, 2015; Carmichael, 2015). In particular, there is a long line of work focused on predicting teacher quality from data (Kane & Staiger, 2008; Dobbie, 2011; Chalfin et al., 2016; Jacob et al., 2018). More broadly, Mullainathan & Spiess (2017) describe the integration of machine learning with traditional econometric techniques to better estimate quantities like employee performance. Furthermore, studying the gig economy, Kokkodis et al. (2015) use machine learning to estimate the likelihood that freelancers get hired.

2.
Hiring Problem: How to fill k positions sequentially?

In this section, we consider the basic hiring problem, where we want to hire k employees out of n potential candidates under a constraint of making at most t sequential offers.

2.1. Special case: Hiring a single employee (k = 1)

To develop some intuition about the problem, as well as to illustrate some of the challenges it poses, we begin with the case where k = 1, i.e., we only want to hire one candidate. One might hope that a simple greedy algorithm is optimal in this special case. Unfortunately, as we will show, a number of seemingly natural greedy algorithms¹ do not yield optimal solutions. However, we can still take advantage of structural properties of the solution. In particular, given a set of t candidates, the optimal order in which to make offers to them is in decreasing order of v_i. To see why, for any two candidates i and j, consider the four possible outcomes of making offers to them: both i and j accept, both reject, i accepts and j rejects, and vice versa. The only outcome in which the order of offers matters is when both accept, since the position goes to the candidate receiving the first offer, and the second offer is never made. In this case, it is clearly better to make the first offer to the candidate with higher value. Since the optimal algorithm must always make offers to candidates in decreasing order of value, we can write a dynamic program to compute the optimal subset of t candidates to potentially make offers to. Assume the candidates are sorted in non-increasing order of value, i.e., v_1 ≥ v_2 ≥ . . . ≥ v_n. Let S(i, s) be the optimal expected value that can be achieved with s time steps remaining by only considering candidates i through n. Then we have the recurrence

S(i, s) = max{ p_i v_i + (1 − p_i) S(i + 1, s − 1), S(i + 1, s) },

where the two terms correspond to either making an offer to candidate i or not.
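The recurrence above translates into a short bottom-up dynamic program. The sketch below is our illustration (not the authors' code); it assumes candidates are given as (p_i, v_i) pairs already sorted in non-increasing order of value:

```python
def best_single_hire(cands, t):
    """Optimal expected value for hiring one candidate (k = 1) with at most
    t sequential offers; cands = [(p_i, v_i), ...] with v_1 >= ... >= v_n."""
    n = len(cands)
    # S[i][s] = optimal expected value using candidates i..n-1 with s offers left.
    S = [[0.0] * (t + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        p, v = cands[i]
        for s in range(1, t + 1):
            offer = p * v + (1 - p) * S[i + 1][s - 1]  # make an offer to candidate i
            skip = S[i + 1][s]                          # pass over candidate i
            S[i][s] = max(offer, skip)
    return S[0][t]

# Two identical coin-flip candidates, two offers allowed:
print(best_single_hire([(0.5, 1.0), (0.5, 1.0)], t=2))  # → 0.75
```

The offer strategy itself can be recovered by retracing which of the two terms achieved the maximum at each (i, s).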
Note that S(1, t) then gives the value of the optimal solution, and the offer strategy can be found by retracing the choices of the dynamic program.

¹ For instance, sorting the candidates by decreasing p_i, v_i, or p_i v_i and then making at most t offers until one accepts.

2.2. General Problem: Hiring k employees (k > 1)

While the k = 1 case admits a clean solution, the general case where k > 1 is more complex. First, note that a simple k-approximation exists: with the dynamic program from Section 2.1, we can optimally fill a single slot. Doing so yields a candidate who is, in expectation, at least as good as any of the k candidates hired by the optimal strategy. In general, however, the optimal solution may display several non-monotonicities that make it difficult to extend the k = 1 solution.

Example 1. Consider the following instance with n = 4, t = 3, and k = 2:

(p_1, v_1) = (1, 1), (p_2, v_2) = (0.5, 1), (p_3, v_3) = (0.5, 1), (p_4, v_4) = (0.1, 2).

We will show that in the optimal strategy, the offers made are not necessarily monotone in acceptance probability, value, or expected value. First, note that any deterministic strategy can be represented as a binary decision tree, where each node in the tree corresponds to a candidate to whom an offer is made. The two branches are the resulting strategies if the offer is accepted or rejected. Taking the convention that the right branch corresponds to acceptance, the optimal solution for the above instance is as shown in Figure 1.

[Figure 1. An optimal solution to Example 1.]

Note that there are several counter-intuitive effects at play here. First, despite having the lowest acceptance probability and expected value, candidate 4 still receives an offer with probability 1/2. Second, the candidate with the highest expected value (candidate 1) receives an offer either last or not at all.
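Example 1 is small enough to check exhaustively. The sketch below (our illustration, not the paper's) computes the optimal adaptive value via dynamic programming over states (remaining candidates, slots left, time left); it confirms that the best first offer goes to candidate 2 (or 3), achieving expected value 1.8, versus 1.75 for offering candidate 1 first:

```python
from functools import lru_cache

# Example 1 candidates as (p_i, v_i), 0-indexed.
CANDS = [(1.0, 1.0), (0.5, 1.0), (0.5, 1.0), (0.1, 2.0)]

def optimal(cands, k, t):
    """Exact optimal adaptive value by exhaustive DP over subsets;
    exponential in n, fine for tiny instances."""
    @lru_cache(maxsize=None)
    def V(avail, k, t):
        if k == 0 or t == 0 or not avail:
            return 0.0
        best = 0.0
        for i in avail:
            p, v = cands[i]
            rest = avail - {i}
            # Offer to i: accepted w.p. p (fills a slot), rejected otherwise.
            best = max(best, p * (v + V(rest, k - 1, t - 1))
                             + (1 - p) * V(rest, k, t - 1))
        return best

    full = frozenset(range(len(cands)))

    def first_offer_value(i):
        p, v = cands[i]
        rest = full - {i}
        return p * (v + V(rest, k - 1, t - 1)) + (1 - p) * V(rest, k, t - 1)

    vals = [first_offer_value(i) for i in range(len(cands))]
    return max(vals), max(range(len(cands)), key=vals.__getitem__)

val, first = optimal(CANDS, k=2, t=3)
print(round(val, 6), first + 1)  # → 1.8 2 (first offer to candidate 2, not 1)
```

The same brute force also shows candidate 4 is probed exactly when candidate 2's offer is accepted, i.e., with probability 1/2, matching the tree in Figure 1.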
Finally, despite the fact that candidates 1, 2, and 3 all have the same value, it is strictly optimal to make an offer to candidate 2 (or 3) before candidate 1, even though candidate 1 accepts with higher probability. Thus, unlike in the k = 1 scenario, the optimal solution may not be value-ordered, so the dynamic programming approach discussed above cannot be optimal here. We conjecture that this problem is NP-hard for general k. In the remainder of this section, we present an approximation algorithm that runs in polynomial time and yields at least half of the optimal expected value. We first show that there exists a non-adaptive algorithm yielding a 2-approximation. Then, we show that a dynamic program similar to that in Section 2.1 gives an adaptive algorithm that is at least as good as any non-adaptive algorithm, and hence is also a 2-approximation.

2.2.1. ESTABLISHING AN ADAPTIVITY GAP OF 2

Gupta et al. (2017) study adaptivity gaps for stochastic probing problems in which the goal is to maximize a given submodular (or XOS) function over the set of active, probed elements. In this setting, each element e is active independently with probability p_e, and the set of elements that are probed must satisfy given prefix-closed constraints. The HIRING UNDER UNCERTAINTY problem does not quite fit into their framework, since their framework allows one to choose the best set of active, probed elements, while in our setting we are forced to hire the first k candidates that are active. Nevertheless, we can leverage some insights from Gupta et al. (2017) to show an adaptivity gap of 2 (as opposed to the factor of 3 they obtain for stochastic probing). Similar to the one shown in Figure 1, the optimal solution to any instance can be represented by a binary tree T. Each node u of T corresponds to a candidate i (denoted by cand(u)) and has two outgoing edges leading to subtrees for the cases in which candidate i is active (which happens with probability p_i) or inactive (which happens with probability 1 − p_i).
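To make the tree representation concrete, here is a minimal sketch (ours; the node layout and names are assumptions, not the paper's notation) of a strategy tree and of sampling a root-to-leaf walk from it according to the node probabilities:

```python
import random

class Node:
    """One node of a strategy tree T: an offer to candidate `cand`, with
    'yes'/'no' subtrees (None where the strategy stops)."""
    def __init__(self, cand, yes=None, no=None):
        self.cand, self.yes, self.no = cand, yes, no

def sample_path(root, cands, rng):
    """Random walk from the root: at each node, follow the yes edge with
    probability p_i (i = cand(u)) and the no edge otherwise; return the
    visited candidates and the edge labels taken."""
    path, edges = [], []
    node = root
    while node is not None:
        path.append(node.cand)
        p = cands[node.cand][0]
        if rng.random() < p:
            edges.append("yes")
            node = node.yes
        else:
            edges.append("no")
            node = node.no
    return path, edges

# With p = 1 everywhere, the walk deterministically follows yes edges.
cands = [(1.0, 1.0), (1.0, 2.0)]
tree = Node(0, yes=Node(1))
print(sample_path(tree, cands, random.Random(0)))  # → ([0, 1], ['yes', 'yes'])
```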
Any root-to-leaf path in this tree represents the sequence of offers made by the optimal algorithm in a particular realization. The tree T naturally defines a probability distribution π_T over root-to-leaf paths: start at the root and, at each node u, follow the yes edge with probability p_i, where i = cand(u), and the no edge otherwise. Since the optimal strategy can make offers to at most t candidates, any such path must have at most t nodes. Further, since any strategy can hire at most k candidates, any root-to-leaf path in T must have at most k − 1 yes edges. Thus any root-to-leaf path P can be decomposed into at most k segments, where a segment is a maximal sub-path composed of only no edges, as shown in Figure 2.

[Figure 2. A path with 3 segments: ABC, DE, and FGHI.]

Let segments(P) = {S_1, S_2, . . . , S_ℓ} denote the set of segments in P. For each segment S ∈ segments(P), let last(S) denote the last node on segment S. Given the optimal tree T, Procedure 1 samples a single path P according to the distribution π_T and then probes the candidates on each segment of P in descending order of value, hiring at most one candidate from each segment.

Procedure 1 Approx_Given_Tree(T)
1: P ← a random path sampled from π_T
2: S_1, . . . , S_ℓ ← P divided into at most k segments {each S_j is a list of candidates}
3: for j = 1, . . . , ℓ do
4:   S′_j ← S_j sorted in decreasing order of value
5:   for each candidate i in S′_j do
6:     make an offer to candidate i
7:     if i accepts then
8:       break
9:     end if
10:  end for
11: end for

In the rest of this section, we show that Procedure 1 yields, in expectation, at least half of the total expected value of T. Let val(T) = E_{P∼π_T}[ Σ_{j=1}^{ℓ} v_{last(S_j)} ] be the total expected value of the tree T (note that ℓ is a random variable). Similarly, let proc1(T) be the expected value obtained by Procedure 1 on tree T. For any segment S_j, we define segval(S_j) to be the expected value of the active candidate from S_j with the largest value.
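The segment decomposition and the probing loop of Procedure 1 can be transcribed directly; the sketch below is our illustration (a path is assumed to be given as candidate indices plus yes/no edge labels):

```python
import random

def split_segments(path, edges):
    """Decompose a root-to-leaf path into maximal runs of 'no' edges; each
    'yes' edge (an acceptance in the tree) closes the current segment."""
    segs, cur = [], []
    for cand, e in zip(path, edges):
        cur.append(cand)
        if e == "yes":
            segs.append(cur)
            cur = []
    if cur:
        segs.append(cur)
    return segs

def probe_segments(segments, cands, rng):
    """Procedure 1 after sampling: within each segment, probe candidates in
    decreasing order of value and hire at most one (the first to accept)."""
    total = 0.0
    for seg in segments:
        for i in sorted(seg, key=lambda j: -cands[j][1]):
            p, v = cands[i]
            if rng.random() < p:
                total += v
                break  # at most one hire per segment
    return total

# A path whose third offer was accepted splits into two segments:
print(split_segments([0, 1, 2, 3, 4], ["no", "no", "yes", "no", "no"]))
# → [[0, 1, 2], [3, 4]]
```

Since the sampled path has at most t nodes and the segments partition it, the procedure makes at most t offers in every realization, so it respects the deadline.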
Formally, if S_j consists of candidates {1, 2, . . . , |S_j|} sorted in non-increasing order of their values, then