# On Elicitation Complexity

Rafael Frongillo, University of Colorado, Boulder (raf@colorado.edu)
Ian A. Kash, Microsoft Research (iankash@microsoft.com)

**Abstract.** Elicitation is the study of statistics or properties which are computable via empirical risk minimization. While several recent papers have approached the general question of which properties are elicitable, we suggest that this is the wrong question: all properties are elicitable by first eliciting the entire distribution or data set, and thus the important question is *how* elicitable. Specifically, what is the minimum number of regression parameters needed to compute the property? Building on previous work, we introduce a new notion of elicitation complexity and lay the foundations for a calculus of elicitation. We establish several general results and techniques for proving upper and lower bounds on elicitation complexity. These results provide tight bounds for eliciting the Bayes risk of any loss, a large class of properties which includes spectral risk measures and several new properties of interest.

## 1 Introduction

Empirical risk minimization (ERM) is a dominant framework for supervised machine learning, and a key component of many learning algorithms. A statistic or property is simply a functional assigning a vector of values to each distribution. We say that such a property is elicitable if, for some loss function, it can be represented as the unique minimizer of the expected loss under the distribution. Thus, the study of which properties are elicitable can be viewed as the study of which statistics are computable via ERM [1, 2, 3].

The study of property elicitation began in statistics [4, 5, 6, 7], and is gaining momentum in machine learning [8, 1, 2, 3], economics [9, 10], and most recently, finance [11, 12, 13, 14, 15]. A sequence of papers starting with Savage [4] has looked at the full characterization of losses which elicit the mean of a distribution, or more generally the expectation of a vector-valued random variable [16, 3]. The case of real-valued properties is also now well in hand [9, 1]. The general vector-valued case is still largely open, with recent progress in [3, 2, 15].

Recently, a parallel thread of research has been underway in finance, to understand which financial risk measures, among several in use or proposed to help regulate the risks of financial institutions, are computable via regression, i.e., elicitable (cf. references above). More often than not, these papers have concluded that most risk measures under consideration are not elicitable, notable exceptions being generalized quantiles (e.g. value-at-risk, expectiles) and expected utility [13, 12].

Throughout the growing momentum of the study of elicitation, one question has been central: which properties are elicitable? It is clear, however, that all properties are indirectly elicitable if one first elicits the distribution using a standard proper scoring rule. Therefore, in the present work, we suggest replacing this question with a more nuanced one: *how* elicitable are various properties? Specifically, heeding the suggestion of Gneiting [7], we adapt to our setting the notion of elicitation complexity introduced by Lambert et al. [17], which captures how many parameters one needs to maintain in an ERM procedure for the property in question.
Indeed, if a real-valued property is found not to be elicitable, such as the variance, one should not abandon it, but rather ask how many parameters are required to compute it via ERM. Our work is heavily inspired by the recent progress along these lines by Fissler and Ziegel [15], who show that spectral risk measures of support $k$ have elicitation complexity at most $k+1$. Spectral risk measures are among those under consideration in the finance community, and this result shows that while they are not elicitable in the classical sense, their elicitation complexity is still low, and hence one can develop reasonable regression procedures for them. Our results extend to these and many other risk measures (see Section 4.6), often providing matching lower bounds on the complexity as well.

Our contributions are the following. We first introduce an adapted definition of elicitation complexity which we believe to be the right notion to focus on going forward. We establish a few simple but useful results which allow for a kind of calculus of elicitation; for example, conditions under which the complexity of eliciting two properties in tandem is the sum of their individual complexities. In Section 3, we derive several techniques for proving both upper and lower bounds on elicitation complexity, which apply primarily to Bayes risks from decision theory, i.e., optimal expected loss functions. This class includes spectral risk measures among several others; see Section 4. We conclude with brief remarks and open questions.

## 2 Preliminaries and Foundation

Let $\Omega$ be a set of outcomes and $\mathcal{P} \subseteq \Delta(\Omega)$ be a convex set of probability measures. The goal of elicitation is to learn something about the distribution $p \in \mathcal{P}$, specifically some function $\Gamma(p)$ such as the mean or variance, by minimizing a loss function.

Definition 1. A property is a function $\Gamma : \mathcal{P} \to \mathbb{R}^k$, for some $k \in \mathbb{N}$, which associates a desired report value to each distribution.¹ We let $\Gamma_r \doteq \{p \in \mathcal{P} \mid r = \Gamma(p)\}$ denote the set of distributions $p$ corresponding to report value $r$.

¹ We will also consider $\Gamma : \mathcal{P} \to \mathbb{R}^{\mathbb{N}}$.

Given a property $\Gamma$, we want to ensure that the best result is to reveal the value of the property, using a loss function that evaluates the report against a sample from the distribution.

Definition 2. A loss function $L : \mathbb{R}^k \times \Omega \to \mathbb{R}$ elicits a property $\Gamma : \mathcal{P} \to \mathbb{R}^k$ if for all $p \in \mathcal{P}$, $\Gamma(p) = \operatorname{arginf}_r L(r, p)$, where $L(r, p) \doteq \mathbb{E}_p[L(r, \omega)]$. A property is elicitable if some loss elicits it.

For example, when $\Omega = \mathbb{R}$, the mean $\Gamma(p) = \mathbb{E}_p[\omega]$ is elicitable via squared loss $L(r, \omega) = (r - \omega)^2$. A well-known necessary condition for elicitability is convexity of the level sets of $\Gamma$.

Proposition 1 (Osband [5]). If $\Gamma$ is elicitable, the level sets $\Gamma_r$ are convex for all $r \in \Gamma(\mathcal{P})$.

One can easily check that the mean $\Gamma(p) = \mathbb{E}_p[\omega]$ has convex level sets, yet the variance $\Gamma(p) = \mathbb{E}_p[(\omega - \mathbb{E}_p[\omega])^2]$ does not, and hence is not elicitable [9]. It is often useful to work with a stronger condition: not only is $\Gamma_r$ convex, but it is the intersection of a linear subspace with $\mathcal{P}$. This condition is equivalent to the existence of an identification function, a functional describing the level sets of $\Gamma$ [17, 1].

Definition 3. A function $V : \mathbb{R}^k \times \Omega \to \mathbb{R}^k$ is an identification function for $\Gamma : \mathcal{P} \to \mathbb{R}^k$, or identifies $\Gamma$, if for all $r \in \Gamma(\mathcal{P})$ it holds that $p \in \Gamma_r \iff V(r, p) = 0 \in \mathbb{R}^k$, where as with $L(r, p)$ above we write $V(r, p) \doteq \mathbb{E}_p[V(r, \omega)]$. $\Gamma$ is identifiable if there exists a $V$ identifying it.

One can check, for example, that $V(r, \omega) = \omega - r$ identifies the mean.
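To make the ERM view concrete, here is a minimal numerical sketch (ours, not from the paper): it recovers the mean of a sample by minimizing the empirical squared loss over a grid of reports, and checks that the identification function $V(r, \omega) = \omega - r$ vanishes at the minimizer. The distribution, grid range, and helper names are our choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.5, size=10_000)  # draws omega ~ p (toy choice)

def empirical_loss(r: float, omega: np.ndarray) -> float:
    """Empirical squared loss E_hat[(r - omega)^2], which elicits the mean."""
    return np.mean((r - omega) ** 2)

# One-parameter ERM: minimize over reports r on a grid.
grid = np.linspace(-5, 10, 2001)
r_hat = grid[np.argmin([empirical_loss(r, sample) for r in grid])]

# The identification function V(r, omega) = omega - r vanishes (in expectation)
# exactly at the elicited report.
v_at_rhat = np.mean(sample - r_hat)
print(f"ERM report: {r_hat:.3f}, sample mean: {sample.mean():.3f}, V(r_hat): {v_at_rhat:.4f}")
```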
We can now define the classes of identifiable and elicitable properties, along with the complexity of identifying or eliciting a given property. Naturally, a property is $k$-identifiable if it is the link of a $k$-dimensional identifiable property, and $k$-elicitable if it is the link of a $k$-dimensional elicitable property. The elicitation complexity of a property is then simply the minimum dimension $k$ needed for it to be $k$-elicitable.

Definition 4. Let $\mathcal{I}_k(\mathcal{P})$ denote the class of all identifiable properties $\Gamma : \mathcal{P} \to \mathbb{R}^k$, and $\mathcal{E}_k(\mathcal{P})$ the class of all elicitable properties $\Gamma : \mathcal{P} \to \mathbb{R}^k$. We write $\mathcal{I}(\mathcal{P}) = \bigcup_{k \in \mathbb{N}} \mathcal{I}_k(\mathcal{P})$ and $\mathcal{E}(\mathcal{P}) = \bigcup_{k \in \mathbb{N}} \mathcal{E}_k(\mathcal{P})$.

Definition 5. A property $\Gamma$ is $k$-identifiable if there exist $\hat\Gamma \in \mathcal{I}_k(\mathcal{P})$ and $f$ such that $\Gamma = f \circ \hat\Gamma$. The identification complexity of $\Gamma$ is $\operatorname{iden}(\Gamma) = \min\{k : \Gamma \text{ is } k\text{-identifiable}\}$.

Definition 6. A property $\Gamma$ is $k$-elicitable if there exist $\hat\Gamma \in \mathcal{E}_k(\mathcal{P})$ and $f$ such that $\Gamma = f \circ \hat\Gamma$. The elicitation complexity of $\Gamma$ is $\operatorname{elic}(\Gamma) = \min\{k : \Gamma \text{ is } k\text{-elicitable}\}$.

To make the above definitions concrete, recall that the variance $\sigma^2(p) = \mathbb{E}_p[(\mathbb{E}_p[\omega] - \omega)^2]$ is not elicitable, as its level sets are not convex, a necessary condition by Prop. 1. Note however that we may write $\sigma^2(p) = \mathbb{E}_p[\omega^2] - \mathbb{E}_p[\omega]^2$, which can be obtained via a link from the property $\hat\Gamma(p) = (\mathbb{E}_p[\omega], \mathbb{E}_p[\omega^2])$. It is well known [4, 7] that $\hat\Gamma$ is both elicitable and identifiable, as the expectation of the vector-valued random variable $X(\omega) = (\omega, \omega^2)$, using for example $L(r, \omega) = \|r - X(\omega)\|^2$ and $V(r, \omega) = r - X(\omega)$. Thus we can recover $\sigma^2$ as a link of the elicitable and identifiable $\hat\Gamma : \mathcal{P} \to \mathbb{R}^2$, and as no such $\hat\Gamma : \mathcal{P} \to \mathbb{R}$ exists, we have $\operatorname{iden}(\sigma^2) = \operatorname{elic}(\sigma^2) = 2$.
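A minimal sketch (ours) of this two-parameter ERM: elicit $\hat\Gamma(p) = (\mathbb{E}_p[\omega], \mathbb{E}_p[\omega^2])$ with the squared loss above, then apply the link $f(z_1, z_2) = z_2 - z_1^2$ once at the end. The sample distribution and optimizer are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
sample = rng.exponential(scale=2.0, size=10_000)  # omega ~ p (toy choice)

def empirical_loss(r: np.ndarray, omega: np.ndarray) -> float:
    """E_hat[ ||r - X(omega)||^2 ] with X(omega) = (omega, omega^2)."""
    x = np.stack([omega, omega**2], axis=1)
    return np.mean(np.sum((r - x) ** 2, axis=1))

# Two-parameter ERM for hat_Gamma = (E[omega], E[omega^2]).
res = minimize(empirical_loss, x0=np.zeros(2), args=(sample,))
z1, z2 = res.x

# One-time link f(z1, z2) = z2 - z1^2 recovers the (non-elicitable) variance.
var_erm = z2 - z1**2
print(f"ERM variance: {var_erm:.3f}, sample variance: {sample.var():.3f}")
```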
In this example, the variance has a stronger property than merely being 2-identifiable and 2-elicitable, namely that a single $\hat\Gamma$ satisfies both simultaneously. In fact this is quite common, and identifiability provides geometric structure that we make use of in our lower bounds. Thus, most of our results use this refined notion of elicitation complexity.

Definition 7. A property $\Gamma$ has (identifiable) elicitation complexity
$$\operatorname{elic}_{\mathcal{I}}(\Gamma) = \min\{k : \exists\, \hat\Gamma, f \text{ such that } \hat\Gamma \in \mathcal{E}_k(\mathcal{P}) \cap \mathcal{I}_k(\mathcal{P}) \text{ and } \Gamma = f \circ \hat\Gamma\}.$$

Note that restricting our attention to $\operatorname{elic}_{\mathcal{I}}$ effectively requires $\operatorname{elic}_{\mathcal{I}}(\Gamma) \geq \operatorname{iden}(\Gamma)$; specifically, if $\Gamma$ is derived from some elicitable $\hat\Gamma$, then $\hat\Gamma$ must be identifiable as well. This restriction is only relevant for our lower bounds, as our upper bounds give losses explicitly.² Note however that some restriction on $\mathcal{E}_k(\mathcal{P})$ is necessary, as otherwise pathological constructions giving injective mappings from $\mathbb{R}$ to $\mathbb{R}^k$ would render all properties 1-elicitable. To alleviate this issue, some authors require continuity (e.g. [1]) while others, as we do, require identifiability (e.g. [15]), which can be motivated by the fact that for any differentiable loss $L$ for $\Gamma$, the function $V(r, \omega) = \nabla_r L(r, \omega)$ will identify $\Gamma$ provided $L(\cdot, p)$ has no inflection points or spurious local minima. An important future direction is to relax this identifiability assumption, as there are very natural (set-valued) properties with $\operatorname{iden} > \operatorname{elic}$.³

² Our main lower bound (Thm. 2) merely requires $\Gamma$ to have convex level sets, which is necessary by Prop. 1.
³ One may take for example $\Gamma(p) = \operatorname{argmax}_i p(A_i)$ for a finite measurable partition $A_1, \ldots, A_n$ of $\Omega$.

Our definition of elicitation complexity differs from the notion proposed by Lambert et al. [17], in that the components of $\hat\Gamma$ above do not need to be individually elicitable. This turns out to have a large impact, as under their definition the property $\Gamma(p) = \max_{\omega \in \Omega} p(\{\omega\})$ for finite $\Omega$ has elicitation complexity $|\Omega| - 1$, whereas under our definition $\operatorname{elic}_{\mathcal{I}}(\Gamma) = 2$; see Section 4.3. Fissler and Ziegel [15] propose a closer but still different definition, with the complexity being the smallest $k$ such that $\Gamma$ is a component of a $k$-dimensional elicitable property. Again, this definition can lead to larger complexities than necessary; take for example the squared mean $\Gamma(p) = \mathbb{E}_p[\omega]^2$ when $\Omega = \mathbb{R}$, which has $\operatorname{elic}_{\mathcal{I}}(\Gamma) = 1$ with $\hat\Gamma(p) = \mathbb{E}_p[\omega]$ and $f(x) = x^2$, but is not itself elicitable and thus has complexity 2 under [15]. We believe that, modulo regularity assumptions on $\mathcal{E}_k(\mathcal{P})$, our definition is better suited to studying the difficulty of eliciting properties: viewing $f$ as a (potentially dimension-reducing) link function, our definition captures the minimum number of parameters needed in an ERM computation of the property in question, followed by a simple one-time application of $f$.

### 2.1 Foundations of Elicitation Complexity

In the remainder of this section, we make some simple, but useful, observations about $\operatorname{iden}(\Gamma)$ and $\operatorname{elic}_{\mathcal{I}}(\Gamma)$. We have already discussed one such observation after Definition 7: $\operatorname{elic}_{\mathcal{I}}(\Gamma) \geq \operatorname{iden}(\Gamma)$.

It is natural to start with some trivial upper bounds. Clearly, whenever $p \in \mathcal{P}$ can be uniquely determined by some number of elicitable parameters, the elicitation complexity of every property is at most that number. The following propositions give two notable applications of this observation.⁴

Proposition 2. When $|\Omega| = n$, every property $\Gamma$ has $\operatorname{elic}_{\mathcal{I}}(\Gamma) \leq n - 1$.

Proof. The probability distribution is determined by the probabilities of any $n - 1$ outcomes, and the probability associated with a given outcome is both elicitable and identifiable. ∎

Proposition 3. When $\Omega = \mathbb{R}$,⁵ every property $\Gamma$ has $\operatorname{elic}_{\mathcal{I}}(\Gamma) \leq \omega$ (countable).⁶

⁴ Note that these restrictions on $\Omega$ may easily be placed on $\mathcal{P}$ instead; e.g., finite $\Omega$ is equivalent to $\mathcal{P}$ having support on a finite subset of $\Omega$, or even being piecewise constant on some disjoint events.
⁵ Here and throughout, when $\Omega = \mathbb{R}^k$ we assume the Borel $\sigma$-algebra.
⁶ Omitted proofs can be found in the appendix of the full version of this paper.

One well-studied class of properties are those where $\Gamma$ is linear, i.e., the expectation of some vector-valued random variable. All such properties are elicitable and identifiable (cf. [4, 8, 3]), with $\operatorname{elic}_{\mathcal{I}}(\Gamma) \leq k$, but of course the complexity can be lower if the range of $\Gamma$ is not full-dimensional.

Lemma 1. Let $X : \Omega \to \mathbb{R}^k$ be $\mathcal{P}$-integrable and $\Gamma(p) = \mathbb{E}_p[X]$. Then $\operatorname{elic}_{\mathcal{I}}(\Gamma) = \dim(\operatorname{affhull}(\Gamma(\mathcal{P})))$, the dimension of the affine hull of the range of $\Gamma$.

It is easy to create redundant properties in various ways. For example, given elicitable properties $\Gamma_1$ and $\Gamma_2$, the property $\Gamma \doteq \{\Gamma_1, \Gamma_2, \Gamma_1 + \Gamma_2\}$ clearly contains redundant information. A concrete case is $\Gamma = \{\text{mean squared, variance, 2nd moment}\}$, which, as we have seen, has $\operatorname{elic}_{\mathcal{I}}(\Gamma) = 2$. The following definitions and lemma capture various aspects of a lack of such redundancy.

Definition 8. A property $\Gamma : \mathcal{P} \to \mathbb{R}^k$ in $\mathcal{I}(\mathcal{P})$ is of full rank if $\operatorname{iden}(\Gamma) = k$.

Note that there are two ways for a property to fail to be of full rank. First, as the examples above suggest, $\Gamma$ can be redundant, so that it is a link of a lower-dimensional identifiable property. Full rank can also be violated if more dimensions are needed to identify the property than to specify it. This is the case with, e.g., the variance, which is a 1-dimensional property but has $\operatorname{iden}(\sigma^2) = 2$.

Definition 9. Properties $\Gamma, \Gamma' \in \mathcal{I}(\mathcal{P})$ are independent if $\operatorname{iden}(\{\Gamma, \Gamma'\}) = \operatorname{iden}(\Gamma) + \operatorname{iden}(\Gamma')$.

Lemma 2. If $\Gamma, \Gamma' \in \mathcal{E}(\mathcal{P})$ are full rank and independent, then $\operatorname{elic}_{\mathcal{I}}(\{\Gamma, \Gamma'\}) = \operatorname{elic}_{\mathcal{I}}(\Gamma) + \operatorname{elic}_{\mathcal{I}}(\Gamma')$.

To illustrate the lemma, $\operatorname{elic}_{\mathcal{I}}(\text{variance}) = 2$, yet $\Gamma = \{\text{mean, variance}\}$ also has $\operatorname{elic}_{\mathcal{I}}(\Gamma) = 2$, so clearly the mean and variance are not both independent and full rank. (As we have seen, the variance is not full rank.) However, the mean and second moment satisfy both conditions by Lemma 1. Another important case is when $\Gamma$ consists of some number of distinct quantiles. Osband [5] essentially showed that quantiles are independent and of full rank, so their elicitation complexity is the number of quantiles being elicited.

Lemma 3. Let $\Omega = \mathbb{R}$ and let $\mathcal{P}$ be a class of probability measures with continuously differentiable and invertible CDFs $F$, which is sufficiently rich in the sense that for all $x_1, \ldots, x_k \in \mathbb{R}$, $\operatorname{span}(\{(F^{-1}(x_1), \ldots, F^{-1}(x_k)) : F \in \mathcal{P}\}) = \mathbb{R}^k$. Let $q_\alpha$ denote the $\alpha$-quantile function. Then if $\alpha_1, \ldots, \alpha_k$ are all distinct, $\Gamma = \{q_{\alpha_1}, \ldots, q_{\alpha_k}\}$ has $\operatorname{elic}_{\mathcal{I}}(\Gamma) = k$.
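A quick sketch (ours) of Lemma 3's upper bound in action: each quantile $q_{\alpha_i}$ is elicited by its own pinball loss, and summing the $k$ losses elicits the whole vector of quantiles with $k$ parameters. The sample distribution, quantile levels, and optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
sample = rng.lognormal(mean=0.0, sigma=0.8, size=20_000)  # y ~ p (toy choice)
alphas = [0.25, 0.5, 0.9]

def pinball(z: float, y: np.ndarray, alpha: float) -> float:
    """Pinball loss E_hat[(1{z >= y} - alpha)(z - y)], which elicits q_alpha."""
    return np.mean(((z >= y).astype(float) - alpha) * (z - y))

def joint_loss(z: np.ndarray, y: np.ndarray) -> float:
    # Sum of k independent pinball losses: k parameters for k quantiles.
    return sum(pinball(z[i], y, a) for i, a in enumerate(alphas))

res = minimize(joint_loss, x0=np.ones(len(alphas)), args=(sample,), method="Nelder-Mead")
print("ERM quantiles:", np.round(res.x, 3))
print("empirical quantiles:", np.round(np.quantile(sample, alphas), 3))
```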
The quantile example in particular allows us to see that all complexity classes, including $\omega$, are occupied. In fact, our results to follow will show something stronger: even for real-valued properties $\gamma : \mathcal{P} \to \mathbb{R}$, all classes are occupied; we give here the result that follows from our bounds on spectral risk measures in Section 4.4, but this holds for many other $\mathcal{P}$; see e.g. Section 4.2.

Proposition 4. Let $\mathcal{P}$ be as in Lemma 3. Then for all $k \in \mathbb{N}$ there exists $\gamma : \mathcal{P} \to \mathbb{R}$ with $\operatorname{elic}_{\mathcal{I}}(\gamma) = k$.

## 3 Eliciting the Bayes Risk

In this section we prove two theorems that provide our main tools for proving upper and lower bounds, respectively, on elicitation complexity. Of course, many properties are known to be elicitable, and the losses that elicit them provide an upper bound in those cases. We provide such a construction for properties that can be expressed as the pointwise minimum of an indexed set of functions. Interestingly, our construction does not elicit the minimum directly, but rather as a joint elicitation of the value and the function that realizes this value.

Theorem 1. Let $\{X_a : \Omega \to \mathbb{R}\}_{a \in A}$ be a set of $\mathcal{P}$-integrable functions indexed by $A \subseteq \mathbb{R}^k$. Then, provided $\inf_{a \in A} \mathbb{E}_p[X_a]$ is attained, the property $\gamma(p) = \min_{a \in A} \mathbb{E}_p[X_a]$ is $(k+1)$-elicitable. In particular, the loss
$$L((r, a), \omega) = H(r) + h(r)\,(X_a(\omega) - r) \qquad (1)$$
elicits $p \mapsto \{(\gamma(p), a) : \mathbb{E}_p[X_a] = \gamma(p)\}$ for any strictly decreasing $h : \mathbb{R} \to \mathbb{R}_+$ with $\tfrac{d}{dr}H = h$.

The form (1) is that of a scoring rule for the linear property $p \mapsto \mathbb{E}_p[X_a]$, except that here the index $a$ itself is also elicited.⁷

⁷ As we focus on elicitation complexity, we have not tried to characterize all ways to elicit this joint property, or other properties we give explicit losses for. See Section 4.1 for an example where additional losses are possible.

Proof. We will work with gains instead of losses, and show that $S((r, a), \omega) = g(r) + dg_r\,(X_a(\omega) - r)$ elicits $p \mapsto \{(\gamma(p), a) : \mathbb{E}_p[X_a] = \gamma(p)\}$ for $\gamma(p) = \max_{a \in A} \mathbb{E}_p[X_a]$; here $g$ is convex with a strictly increasing and positive subgradient $dg$. For any fixed $a$, we have by the subgradient inequality
$$S((r, a), p) = g(r) + dg_r\,(\mathbb{E}_p[X_a] - r) \leq g(\mathbb{E}_p[X_a]) = S((\mathbb{E}_p[X_a], a), p),$$
and as $dg$ is strictly increasing, $g$ is strictly convex, so $r = \mathbb{E}_p[X_a]$ is the unique maximizer. Now letting $S(a, p) = S((\mathbb{E}_p[X_a], a), p)$, we have
$$\operatorname*{argmax}_{a \in A} S(a, p) = \operatorname*{argmax}_{a \in A} g(\mathbb{E}_p[X_a]) = \operatorname*{argmax}_{a \in A} \mathbb{E}_p[X_a],$$
because $g$ is strictly increasing. We now have
$$\operatorname*{argmax}_{a \in A,\, r \in \mathbb{R}} S((r, a), p) = \big\{(\mathbb{E}_p[X_a], a) : a \in \operatorname*{argmax}_{a' \in A} \mathbb{E}_p[X_{a'}]\big\}. \qquad \blacksquare$$
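A small sketch (ours) of the loss (1) on a finite index set, with the illustrative choices $h(r) = e^{-r}$, $H(r) = -e^{-r}$, and $X_a(\omega) = (a - \omega)^2$: minimizing the empirical loss jointly over $(r, a)$ recovers both $\min_a \mathbb{E}_p[X_a]$ and a minimizing index.

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.gamma(shape=3.0, scale=1.0, size=5_000)  # omega ~ p (toy choice)

# Finite index family X_a(omega) = (a - omega)^2 over a grid A (our choice);
# here min_a E_p[X_a] is attained near a = E_p[omega].
A = np.linspace(0.0, 6.0, 61)
h = lambda r: np.exp(-r)    # strictly decreasing and positive
H = lambda r: -np.exp(-r)   # antiderivative of h

# Since (1) is affine in X_a, the empirical loss depends on X_a only through its mean.
mean_X = {a: np.mean((a - y) ** 2) for a in A}

def emp_loss(r: float, a: float) -> float:
    """Empirical version of (1): H(r) + h(r) * (mean(X_a) - r)."""
    return H(r) + h(r) * (mean_X[a] - r)

# Joint ERM over (r, a) by grid search, for clarity over speed.
r_grid = np.linspace(0.0, 15.0, 301)
_, r_hat, a_hat = min((emp_loss(r, a), r, a) for a in A for r in r_grid)
print(f"elicited value r_hat = {r_hat:.3f} (direct min_a mean(X_a) = {min(mean_X.values()):.3f})")
print(f"elicited index a_hat = {a_hat:.2f} (sample mean = {y.mean():.3f})")
```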
One natural way to obtain such an indexed set of functions is to take an arbitrary loss function $L(r, \omega)$, in which case the pointwise minimum corresponds to the Bayes risk, i.e., the minimum possible expected loss under the distribution $p$.

Definition 10. Given a loss function $L : A \times \Omega \to \mathbb{R}$ on some prediction set $A$, the Bayes risk of $L$ is defined as $\underline{L}(p) \doteq \inf_{a \in A} L(a, p)$.

One illustration of the power of Theorem 1 is that the Bayes risk of a loss eliciting a $k$-dimensional property is itself $(k+1)$-elicitable.

Corollary 1. If $L : \mathbb{R}^k \times \Omega \to \mathbb{R}$ is a loss function eliciting $\Gamma : \mathcal{P} \to \mathbb{R}^k$, then the loss
$$L^*((r, a), \omega) = L'(a, \omega) + H(r) + h(r)\,(L(a, \omega) - r) \qquad (2)$$
elicits $\{\underline{L}, \Gamma\}$, where $h : \mathbb{R} \to \mathbb{R}_+$ is any positive strictly decreasing function, $H(r) = \int_0^r h(x)\,dx$, and $L'$ is any surrogate loss eliciting $\Gamma$.⁸ If $\Gamma \in \mathcal{I}_k(\mathcal{P})$, then $\operatorname{elic}_{\mathcal{I}}(\underline{L}) \leq k + 1$.

⁸ Note that one could easily lift the requirement that $\Gamma$ be a function, and allow $\Gamma(p)$ to be the set of minimizers of the loss (cf. [18]). We will use this additional power in Section 4.4.

We now turn to our second theorem, which provides lower bounds on the elicitation complexity of the Bayes risk. A first observation, which follows from standard convex analysis, is that $\underline{L}$ is concave, and thus it is unlikely to be directly elicitable, as its level sets are likely to be non-convex. To show a lower bound greater than 1, however, we will need much stronger techniques. In particular, while $\underline{L}$ must be concave, it may not be strictly so, thus allowing level sets which are potentially amenable to elicitation; indeed, $\underline{L}$ must be flat between any two distributions which share a minimizer. Crucial to our lower bound is the fact that whenever the minimizer of the loss differs between two distributions, $\underline{L}$ is essentially strictly concave between them.

Lemma 4. Suppose the loss $L$, with Bayes risk $\underline{L}$, elicits $\Gamma : \mathcal{P} \to \mathbb{R}^k$. Then for any $p, p' \in \mathcal{P}$ with $\Gamma(p) \neq \Gamma(p')$, we have $\underline{L}(\lambda p + (1 - \lambda)p') > \lambda \underline{L}(p) + (1 - \lambda)\underline{L}(p')$ for all $\lambda \in (0, 1)$.

With this lemma in hand we can prove our lower bound. The crucial insight is that an identification function for the Bayes risk of a loss eliciting a property can, through a link, be used to identify the property itself. Corollary 1 tells us that $k + 1$ parameters suffice for the Bayes risk of a $k$-dimensional property, and our lower bound shows this is often necessary. Only $k$ parameters suffice, however, when the property value itself provides all the information required to compute the Bayes risk; for example, dropping the $y^2$ term from squared loss gives $L(x, y) = x^2 - 2xy$, with $\underline{L}(p) = -\mathbb{E}_p[y]^2$, so that $\operatorname{elic}(\underline{L}) = 1$. The theorem thus splits the lower bound into two cases.

Theorem 2. If a loss $L$ elicits some $\Gamma \in \mathcal{E}_k(\mathcal{P})$ with elicitation complexity $\operatorname{elic}_{\mathcal{I}}(\Gamma) = k$, then its Bayes risk $\underline{L}$ satisfies $\operatorname{elic}_{\mathcal{I}}(\underline{L}) \geq k$. Moreover, if we can write $\underline{L} = f \circ \Gamma$ for some function $f : \mathbb{R}^k \to \mathbb{R}$, then $\operatorname{elic}_{\mathcal{I}}(\underline{L}) = k$; otherwise, $\operatorname{elic}_{\mathcal{I}}(\underline{L}) = k + 1$.

Proof. Let $\hat\Gamma \in \mathcal{E}_\ell(\mathcal{P}) \cap \mathcal{I}_\ell(\mathcal{P})$ be such that $\underline{L} = g \circ \hat\Gamma$ for some $g : \mathbb{R}^\ell \to \mathbb{R}$. We show by contradiction that for all $p, p' \in \mathcal{P}$, $\hat\Gamma(p) = \hat\Gamma(p')$ implies $\Gamma(p) = \Gamma(p')$. Otherwise, we would have $p, p'$ with $\hat\Gamma(p) = \hat\Gamma(p')$, and thus $\underline{L}(p) = \underline{L}(p')$, but $\Gamma(p) \neq \Gamma(p')$. Lemma 4 would then give some $p_\lambda = \lambda p + (1 - \lambda)p'$ with $\underline{L}(p_\lambda) > \underline{L}(p)$. But as the level sets $\hat\Gamma_{\hat r}$ are convex by Prop. 1, we would have $\hat\Gamma(p_\lambda) = \hat\Gamma(p)$, which would imply $\underline{L}(p_\lambda) = \underline{L}(p)$, a contradiction. We can now conclude that there exists $h : \mathbb{R}^\ell \to \mathbb{R}^k$ such that $\Gamma = h \circ \hat\Gamma$. But as $\hat\Gamma \in \mathcal{E}_\ell(\mathcal{P}) \cap \mathcal{I}_\ell(\mathcal{P})$, this implies $\operatorname{elic}_{\mathcal{I}}(\Gamma) \leq \ell$, so we need $\ell \geq k$. Finally, if $\ell = k$ we have $\underline{L} = g \circ \hat\Gamma = g \circ h^{-1} \circ \Gamma$. The upper bounds follow from Corollary 1. $\blacksquare$
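Before turning to examples, here is a small sketch (ours) of the upper-bound construction (2), built generically from a base loss. We take the surrogate $L' = L$ (the corollary allows any surrogate eliciting $\Gamma$, and $L$ itself qualifies) and the illustrative choice $h(r) = e^{-r}$, $H(r) = 1 - e^{-r}$, applied to absolute loss so that ERM jointly recovers the median and its Bayes risk $\underline{L}(p) = \mathbb{E}_p|\mathrm{med}(p) - y|$.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
y = rng.lognormal(mean=0.0, sigma=1.0, size=20_000)  # y ~ p (toy choice)

def make_joint_loss(base_loss):
    """Builds the loss (2) with surrogate L' = L, h(r) = exp(-r), H(r) = 1 - exp(-r);
    minimizing jointly over (r, a) elicits (Bayes risk, property)."""
    def joint(params, y):
        r, a = params
        l_bar = np.mean(base_loss(a, y))  # expectation is linear, so average L first
        return l_bar + (1 - np.exp(-r)) + np.exp(-r) * (l_bar - r)
    return joint

abs_loss = lambda a, y: np.abs(a - y)   # elicits the median
joint = make_joint_loss(abs_loss)

res = minimize(joint, x0=np.array([1.0, 1.0]), args=(y,), method="Nelder-Mead")
r_hat, a_hat = res.x
med = np.median(y)
print(f"a_hat = {a_hat:.3f} (median {med:.3f}); r_hat = {r_hat:.3f} "
      f"(Bayes risk {np.mean(np.abs(med - y)):.3f})")
```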
## 4 Examples and Applications

We now give several applications of our results. Several of the upper bounds are novel, as are all lower bounds greater than 2. In the examples, unless we refer to $\Omega$ explicitly, we will assume $\Omega = \mathbb{R}$ and write $y \in \Omega$, so that $y \sim p$. In each setting we also make several standard regularity assumptions, which we suppress for ease of exposition; for example, for the variance and variantile we assume finite first and second moments (which must span $\mathbb{R}^2$), and whenever we discuss quantiles we will assume that $\mathcal{P}$ is as in Lemma 3, though we will not require as much regularity for our upper bounds.

### 4.1 Variance

In Section 2 we showed that $\operatorname{elic}_{\mathcal{I}}(\sigma^2) = 2$. As a warm-up, let us see how to recover this statement using our results on the Bayes risk. We can view $\sigma^2$ as the Bayes risk of squared loss $L(x, y) = (x - y)^2$, which of course elicits the mean:
$$\underline{L}(p) = \min_{x \in \mathbb{R}} \mathbb{E}_p[(x - y)^2] = \mathbb{E}_p[(\mathbb{E}_p[y] - y)^2] = \sigma^2(p).$$
This gives us $\operatorname{elic}_{\mathcal{I}}(\sigma^2) \leq 2$ by Corollary 1, with a matching lower bound by Theorem 2, as the variance is not simply a function of the mean. Corollary 1 gives losses such as
$$L((x, v), y) = e^{-v}\big((x - y)^2 - v\big) - e^{-v},$$
which elicit $\{\mathbb{E}_p[y], \sigma^2(p)\}$, but in fact there are losses which cannot be represented in the form (2), showing that we do not have a full characterization; for example,
$$\hat L((x, v), y) = v^2 + v(x - y)\big(2(x + y) - 1\big) + (x - y)^2\big((x + y)^2 - (x + y) + 1\big).$$
This $\hat L$ was generated from the squared loss $\big\|z - (y, y^2)\big\|^2$ with respect to the norm $\|z\|^2 = z^\top \left[\begin{smallmatrix} 1 & -1/2 \\ -1/2 & 1 \end{smallmatrix}\right] z$, which elicits the first two moments, composed with the link function $(z_1, z_2) \mapsto (z_1, z_2 - z_1^2)$.

### 4.2 Convex Functions of Means

Another simple example is $\gamma(p) = G(\mathbb{E}_p[X])$ for some strictly convex function $G : \mathbb{R}^k \to \mathbb{R}$ and $\mathcal{P}$-integrable $X : \Omega \to \mathbb{R}^k$. To avoid degeneracies, we assume $\dim \operatorname{affhull}\{\mathbb{E}_p[X] : p \in \mathcal{P}\} = k$, i.e., $\Gamma : p \mapsto \mathbb{E}_p[X]$ is of full rank. Letting $\{dG_r\}_r$ be a selection of subgradients of $G$, the loss $L(r, \omega) = -\big(G(r) + dG_r(X(\omega) - r)\big)$ elicits $\Gamma$ (cf. [3]), and moreover we have $\gamma(p) = -\underline{L}(p)$. By Lemma 1, $\operatorname{elic}_{\mathcal{I}}(\Gamma) = k$. One easily checks that $\underline{L} = -G \circ \Gamma$, so now by Theorem 2, $\operatorname{elic}_{\mathcal{I}}(\gamma) = k$ as well. Letting $\{X^k\}_{k \in \mathbb{N}}$ be a family of such full-rank random variables, this gives us a sequence of real-valued properties $\gamma_k(p) = \|\mathbb{E}_p[X^k]\|^2$ with $\operatorname{elic}_{\mathcal{I}}(\gamma_k) = k$, proving Proposition 4.

### 4.3 Modal Mass

With $\Omega = \mathbb{R}$, consider the property $\gamma_\beta(p) = \max_{x \in \mathbb{R}} p([x - \beta, x + \beta])$, namely the maximum probability mass contained in an interval of width $2\beta$. Theorem 1 easily shows $\operatorname{elic}_{\mathcal{I}}(\gamma_\beta) \leq 2$: the modal midpoint $\hat\gamma_\beta(p) = \operatorname{argmax}_{x \in \mathbb{R}} p([x - \beta, x + \beta])$ is elicited by $L(x, y) = \mathbb{1}_{|x - y| > \beta}$, and $\gamma_\beta(p) = 1 - \underline{L}(p)$. Similarly, in the case of finite $\Omega$, $\gamma(p) = \max_{\omega \in \Omega} p(\{\omega\})$ is simply the expected score (a gain rather than a loss) of the mode $\hat\gamma(p) = \operatorname{argmax}_{\omega \in \Omega} p(\{\omega\})$, which is elicitable for finite $\Omega$ (but not otherwise; see Heinrich [19]). In both cases, one can easily check that the level sets of $\gamma$ are not convex, so $\operatorname{elic}_{\mathcal{I}}(\gamma) = 2$; alternatively, Theorem 2 applies in the first case. As mentioned following Definition 7, the result for finite $\Omega$ differs from the definition of Lambert et al. [17], under which the elicitation complexity of $\gamma$ is $|\Omega| - 1$.

### 4.4 Expected Shortfall and Other Spectral Risk Measures

One important application of our results on the elicitation complexity of the Bayes risk is to the elicitability of various financial risk measures. One of the most popular financial risk measures is expected shortfall $ES_\alpha : \mathcal{P} \to \mathbb{R}$, also called conditional value at risk (CVaR) or average value at risk (AVaR), which we define as follows (cf. [20, eq. (18)], [21, eq. (3.21)]):
$$ES_\alpha(p) = \inf_{z \in \mathbb{R}} \mathbb{E}_p\Big[\tfrac{1}{\alpha}(z - y)\mathbb{1}_{z \geq y} - z\Big] = \inf_{z \in \mathbb{R}} \mathbb{E}_p\Big[\tfrac{1}{\alpha}(z - y)(\mathbb{1}_{z \geq y} - \alpha) - y\Big]. \qquad (3)$$
Despite the importance of elicitability to financial regulation [11, 22], $ES_\alpha$ is not elicitable [7]. It was recently shown by Fissler and Ziegel [15], however, that $\operatorname{elic}_{\mathcal{I}}(ES_\alpha) = 2$.
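A numerical sketch (ours) of eq. (3): minimizing the empirical objective over $z$ recovers the $\alpha$-quantile as the minimizer and $ES_\alpha$ as the minimum, illustrating the Bayes-risk structure that Corollary 1 exploits. The normal sample, level $\alpha$, and grid range are our choices.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(size=50_000)   # y ~ p (toy choice: standard normal "returns")
alpha = 0.05

def objective(z: float, y: np.ndarray, alpha: float) -> float:
    """Empirical version of (3): E_hat[(1/alpha)(z - y) 1{z >= y} - z]."""
    return np.mean((z - y) * (z >= y) / alpha - z)

z_grid = np.linspace(-4.0, 0.0, 801)
vals = [objective(z, y, alpha) for z in z_grid]
z_star, es_hat = z_grid[int(np.argmin(vals))], min(vals)

q_alpha = np.quantile(y, alpha)        # the minimizer should be the alpha-quantile
es_tail = -np.mean(y[y <= q_alpha])    # ES as a negative tail mean, for comparison
print(f"z* = {z_star:.3f} (q_alpha = {q_alpha:.3f}); "
      f"ES_hat = {es_hat:.3f} (tail-mean check: {es_tail:.3f})")
```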
Fissler and Ziegel [15] also consider the broader class of spectral risk measures, which can be represented as $\rho_\mu(p) = \int_{[0,1]} ES_\alpha(p)\, d\mu(\alpha)$, where $\mu$ is a probability measure on $[0, 1]$ (cf. [20, eq. (36)]). In the case where $\mu$ has finite support, $\mu = \sum_{i=1}^k \beta_i \delta_{\alpha_i}$ for point masses $\delta_{\alpha_i}$ and $\beta_i > 0$, we can rewrite $\rho_\mu$ using eq. (3) as
$$\rho_\mu(p) = \sum_{i=1}^k \beta_i\, ES_{\alpha_i}(p) = \inf_{z \in \mathbb{R}^k} \mathbb{E}_p\Big[\sum_{i=1}^k \tfrac{\beta_i}{\alpha_i}(z_i - y)(\mathbb{1}_{z_i \geq y} - \alpha_i) - y\Big]. \qquad (4)$$
They conclude that $\operatorname{elic}_{\mathcal{I}}(\rho_\mu) \leq k + 1$ unless $\mu(\{1\}) = 1$, in which case $\operatorname{elic}_{\mathcal{I}}(\rho_\mu) = 1$. We show how to recover these results together with matching lower bounds. It is well known that the infimum in eq. (4) is attained at the corresponding quantiles $z = (q_{\alpha_1}(p), \ldots, q_{\alpha_k}(p))$, so we conclude $\operatorname{elic}_{\mathcal{I}}(\rho_\mu) \leq k + 1$ by Theorem 1, and in particular the property $\{\rho_\mu, q_{\alpha_1}, \ldots, q_{\alpha_k}\}$ is elicitable. The family of losses from Corollary 1 coincides with the characterization of Fissler and Ziegel [15] (see Appendix D.1). For a lower bound, as $\operatorname{elic}_{\mathcal{I}}(\{q_{\alpha_1}, \ldots, q_{\alpha_k}\}) = k$ whenever the $\alpha_i$ are distinct by Lemma 3, Theorem 2 gives us $\operatorname{elic}_{\mathcal{I}}(\rho_\mu) = k + 1$ whenever $\mu(\{1\}) < 1$, and of course $\operatorname{elic}_{\mathcal{I}}(\rho_\mu) = 1$ if $\mu(\{1\}) = 1$.

### 4.5 Variantile

The $\tau$-expectile, a type of generalized quantile introduced by Newey and Powell [23], is defined as the solution $x = \mu_\tau$ to the equation $\mathbb{E}_p\big[|\mathbb{1}_{x \geq y} - \tau|(x - y)\big] = 0$. (This also shows $\mu_\tau \in \mathcal{I}_1(\mathcal{P})$.) Here we propose the $\tau$-variantile, an asymmetric variance-like measure with respect to the $\tau$-expectile: just as the mean is the solution $x = \mu$ to the equation $\mathbb{E}_p[x - y] = 0$ and the variance is $\sigma^2(p) = \mathbb{E}_p[(\mu - y)^2]$, we define the $\tau$-variantile $\sigma^2_\tau$ by $\sigma^2_\tau(p) = \mathbb{E}_p\big[|\mathbb{1}_{\mu_\tau \geq y} - \tau|(\mu_\tau - y)^2\big]$.

It is well known that $\mu_\tau$ can be expressed as the minimizer of an asymmetric least-squares problem: the loss $L(x, y) = |\mathbb{1}_{x \geq y} - \tau|(x - y)^2$ elicits $\mu_\tau$ [23, 7]. Hence, just as the variance turned out to be a Bayes risk for the mean, so is the $\tau$-variantile for the $\tau$-expectile:
$$\mu_\tau = \operatorname*{argmin}_{x \in \mathbb{R}} \mathbb{E}_p\big[|\mathbb{1}_{x \geq y} - \tau|(x - y)^2\big], \qquad \sigma^2_\tau = \min_{x \in \mathbb{R}} \mathbb{E}_p\big[|\mathbb{1}_{x \geq y} - \tau|(x - y)^2\big].$$
We now see that the pair $\{\mu_\tau, \sigma^2_\tau\}$ is elicitable by Corollary 1, and by Theorem 2 we have $\operatorname{elic}_{\mathcal{I}}(\sigma^2_\tau) = 2$.
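A brief sketch (ours): minimizing the empirical asymmetric least-squares loss recovers $\mu_\tau$, and the attained minimum is the $\tau$-variantile, exactly the Bayes-risk pair elicited above. The Student-t sample and $\tau$ are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(6)
y = rng.standard_t(df=5, size=50_000)   # y ~ p (toy choice)
tau = 0.8

def asym_sq_loss(x: float, y: np.ndarray, tau: float) -> float:
    """Empirical E[ |1{x >= y} - tau| (x - y)^2 ], which elicits the tau-expectile."""
    w = np.abs((x >= y).astype(float) - tau)
    return np.mean(w * (x - y) ** 2)

res = minimize_scalar(asym_sq_loss, args=(y, tau), bounds=(-5, 5), method="bounded")
mu_tau, var_tau = res.x, res.fun   # minimizer = tau-expectile, minimum = tau-variantile

# Identification check: E[ |1{x >= y} - tau| (x - y) ] should vanish at mu_tau.
v = np.mean(np.abs((mu_tau >= y).astype(float) - tau) * (mu_tau - y))
print(f"mu_tau = {mu_tau:.4f}, sigma2_tau = {var_tau:.4f}, V(mu_tau) = {v:.2e}")
```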
### 4.6 Deviation and Risk Measures

Rockafellar and Uryasev [21] introduce risk quadrangles, in which they relate a risk $\mathcal{R}$, deviation $\mathcal{D}$, error $\mathcal{E}$, and a statistic $\mathcal{S}$, all functions from random variables to the reals, as follows:
$$\mathcal{R}(X) = \min_C \{C + \mathcal{E}(X - C)\}, \quad \mathcal{D}(X) = \min_C \mathcal{E}(X - C), \quad \mathcal{S}(X) = \operatorname*{argmin}_C \mathcal{E}(X - C).$$
Our results provide tight bounds for many of the risk and deviation measures in their paper. The most immediate case is the expectation quadrangle, where $\mathcal{E}(X) = \mathbb{E}[e(X)]$ for some $e : \mathbb{R} \to \mathbb{R}$. In this case, if $\mathcal{S}(X) \in \mathcal{I}_1(\mathcal{P})$, Theorem 2 implies $\operatorname{elic}_{\mathcal{I}}(\mathcal{R}) = \operatorname{elic}_{\mathcal{I}}(\mathcal{D}) = 2$, provided $\mathcal{S}$ is non-constant and $e$ non-linear. This covers several of their examples, e.g., the truncated mean, log-exp, and rate-based quadrangles. Beyond the expectation case, the authors show a Mixing Theorem, in which they consider
$$\mathcal{D}(X) = \min_C \min_{B_1, \ldots, B_k} \Big\{\sum_{i=1}^k \lambda_i \mathcal{E}_i(X - C - B_i) \;\Big|\; \sum_i \lambda_i B_i = 0\Big\} = \min_{B'_1, \ldots, B'_k} \sum_{i=1}^k \lambda_i \mathcal{E}_i(X - B'_i).$$
Once again, if the $\mathcal{E}_i$ are all of expectation type and $\mathcal{S}_i \in \mathcal{I}_1(\mathcal{P})$, Theorem 1 gives $\operatorname{elic}_{\mathcal{I}}(\mathcal{D}) = \operatorname{elic}_{\mathcal{I}}(\mathcal{R}) \leq k + 1$, with a matching lower bound from Theorem 2 provided the $\mathcal{S}_i$ are all independent. Their Reverting Theorem for a pair $\mathcal{E}_1, \mathcal{E}_2$ can be seen as a special case of the above in which one replaces $\mathcal{E}_2(X)$ by $\mathcal{E}_2(-X)$. Consequently, we have tight bounds for the elicitation complexity of several other examples, including superquantiles (the same as spectral risk measures), the quantile-radius quadrangle, and the optimized certainty equivalents of Ben-Tal and Teboulle [24].

Our results offer an explanation for the existence of regression procedures for some of these risk/deviation measures. For example, a procedure called superquantile regression was introduced in Rockafellar et al. [25], which computes spectral risk measures. In light of Theorem 1, one could interpret their procedure as simply performing regression on the $k$ different quantiles as well as the Bayes risk. In fact, our results show that any risk/deviation generated by mixing several expectation quadrangles admits a similar procedure, in which the $B'_i$ variables are simply computed alongside the measure of interest. Even more broadly, such regression procedures exist for any Bayes risk.

## 5 Discussion

We have outlined a theory of elicitation complexity which we believe is the right notion of complexity for ERM, and provided techniques and results for upper and lower bounds. In particular, we now have tight bounds for the large class of Bayes risks, including several applications of note such as spectral risk measures. Our results also offer an explanation for why procedures like superquantile regression are possible, and extend this logic to all Bayes risks.

There are many natural open problems in elicitation complexity. Perhaps the most apparent are the characterizations of the complexity classes $\{\Gamma : \operatorname{elic}(\Gamma) = k\}$, and in particular, determining the elicitation complexity of properties which are known to be non-elicitable, such as the mode [19] and the smallest confidence interval [18].

In this paper we have focused on elicitation complexity with respect to the class of identifiable properties $\mathcal{I}$, which we denoted $\operatorname{elic}_{\mathcal{I}}$. This choice of notation was deliberate; one may define $\operatorname{elic}_{\mathcal{C}}(\Gamma) := \min\{k : \exists\, \hat\Gamma \in \mathcal{E}_k(\mathcal{P}) \cap \mathcal{C},\ \exists f,\ \Gamma = f \circ \hat\Gamma\}$ to be the complexity with respect to some arbitrary class of properties $\mathcal{C}$. Some examples of interest might be $\operatorname{elic}_{\mathcal{E}}$ for expected values, of interest to the prediction market literature [8], and $\operatorname{elic}_{\mathrm{cvx}}$ for properties elicitable by a loss which is convex in $r$, of interest for efficiently performing ERM.

Another interesting line of questioning follows from the notion of conditional elicitation: properties which are elicitable as long as the value of some other elicitable property is known. This notion was introduced by Emmer et al. [11], who showed that the variance and expected shortfall are both conditionally elicitable, on $\mathbb{E}_p[y]$ and $q_\alpha(p)$ respectively. Intuitively, knowing that $\Gamma'$ is elicitable conditional on an elicitable $\Gamma$ would suggest that perhaps the pair $\{\Gamma, \Gamma'\}$ is elicitable; Fissler and Ziegel [15] note that it is an open question whether this joint elicitability holds in general. The Bayes risk $\underline{L}$ for $\Gamma$ is elicitable conditioned on $\Gamma$, and as we saw above, the pair $\{\Gamma, \underline{L}\}$ is jointly elicitable as well. We give a counter-example in Figure 1, however, which also illustrates the subtlety of characterizing all elicitable properties.

Figure 1: Depictions of the level sets of two properties, one elicitable and the other not. The left is a Bayes risk together with its property, and thus elicitable, while the right is shown in [3] not to be elicitable. The planes illustrate that both properties are conditionally elicitable: the height of the plane (the intercept $(p_3, 0, 0)$, for example) is elicitable by the characterizations for scalar properties [9, 1], and conditioned on the plane, the properties are both linear and thus links of expected values, which are also elicitable.
## References

[1] Ingo Steinwart, Chloé Pasin, Robert Williamson, and Siyu Zhang. Elicitation and identification of properties. In Proceedings of the 27th Conference on Learning Theory, pages 482–526, 2014.
[2] A. Agarwal and S. Agrawal. On consistent surrogate risk minimization and property elicitation. In COLT, 2015.
[3] Rafael Frongillo and Ian Kash. Vector-valued property elicitation. In Proceedings of the 28th Conference on Learning Theory, pages 1–18, 2015.
[4] L.J. Savage. Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, pages 783–801, 1971.
[5] Kent Harold Osband. Providing Incentives for Better Cost Forecasting. University of California, Berkeley, 1985.
[6] T. Gneiting and A.E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.
[7] T. Gneiting. Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494):746–762, 2011.
[8] J. Abernethy and R. Frongillo. A characterization of scoring rules for linear properties. In Proceedings of the 25th Conference on Learning Theory, pages 1–27, 2012.
[9] N.S. Lambert. Elicitation and evaluation of statistical forecasts. Preprint, 2011.
[10] N.S. Lambert and Y. Shoham. Eliciting truthful answers to multiple-choice questions. In Proceedings of the 10th ACM Conference on Electronic Commerce, pages 109–118, 2009.
[11] Susanne Emmer, Marie Kratz, and Dirk Tasche. What is the best risk measure in practice? A comparison of standard measures. arXiv:1312.1645 [q-fin], December 2013.
[12] Fabio Bellini and Valeria Bignozzi. Elicitable risk measures. Quantitative Finance (doi: 10.1080/14697688.2014.946955), 2013.
[13] Johanna F. Ziegel. Coherence and elicitability. Mathematical Finance, 2014. arXiv:1303.1690.
[14] Ruodu Wang and Johanna F. Ziegel. Elicitable distortion risk measures: A concise proof. Statistics & Probability Letters, 100:172–175, May 2015.
[15] Tobias Fissler and Johanna F. Ziegel. Higher order elicitability and Osband's principle. arXiv:1503.08123 [math, q-fin, stat], March 2015.
[16] A. Banerjee, X. Guo, and H. Wang. On the optimality of conditional expectation as a Bregman predictor. IEEE Transactions on Information Theory, 51(7):2664–2669, July 2005.
[17] N.S. Lambert, D.M. Pennock, and Y. Shoham. Eliciting properties of probability distributions. In Proceedings of the 9th ACM Conference on Electronic Commerce, pages 129–138, 2008.
[18] Rafael Frongillo and Ian Kash. General truthfulness characterizations via convex analysis. In Web and Internet Economics, pages 354–370. Springer, 2014.
[19] C. Heinrich. The mode functional is not elicitable. Biometrika, page ast048, 2013.
[20] Hans Föllmer and Stefan Weber. The axiomatic approach to risk measures for capital determination. Annual Review of Financial Economics, 7(1), 2015.
[21] R. Tyrrell Rockafellar and Stan Uryasev. The fundamental risk quadrangle in risk management, optimization and statistical estimation. Surveys in Operations Research and Management Science, 18(1):33–53, 2013.
[22] Tobias Fissler, Johanna F. Ziegel, and Tilmann Gneiting. Expected Shortfall is jointly elicitable with Value at Risk: Implications for backtesting. arXiv:1507.00244 [q-fin], July 2015.
[23] Whitney K. Newey and James L. Powell. Asymmetric least squares estimation and testing. Econometrica, pages 819–847, 1987.
[24] Aharon Ben-Tal and Marc Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Mathematical Finance, 17(3):449–476, 2007.
[25] R.T. Rockafellar, J.O. Royset, and S.I. Miranda. Superquantile regression with applications to buffered reliability, uncertainty quantification, and conditional value-at-risk. European Journal of Operational Research, 234:140–154, 2014.