Learning to Rank Based on Analogical Reasoning

Mohsen Ahmadi Fahandar, Eyke Hüllermeier
Department of Computer Science, Paderborn University
Pohlweg 49-51, 33098 Paderborn, Germany
ahmadim@mail.upb.de, eyke@upb.de

Abstract

Object ranking or "learning to rank" is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects A, B, C, D, if object A is known to be preferred to B, and C relates to D as A relates to B, then C is (supposedly) preferred to D. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects come from different subdomains, and which hence require a kind of knowledge transfer.

1 Introduction

Preference learning has received increasing attention in machine learning in recent years (Fürnkranz and Hüllermeier 2011). Roughly speaking, the goal in preference learning is to induce preference models from observational (or experimental) data that reveal information about the preferences of an individual or a group of individuals in a direct or indirect way; these models typically serve the purpose of predictive modeling, i.e., they are then used to predict the preferences in a new situation. In general, a preference learning system is provided with a set of items (e.g., products) for which preferences are known, and the task is to learn a function that predicts preferences for a new set of items (e.g., new products not seen so far), or for the same set of items in a different situation (e.g., the same products but for a different user). Frequently, the predicted preference relation is required to form a total order, in which case we also speak of a ranking problem. In fact, among the problems in the realm of preference learning, the task of learning to rank has probably received the most attention in the literature so far, and a number of different ranking problems have already been introduced.

Figure 1: A is to B as C is to D (pictures from ImageNet).

Based on the type of training data and the required predictions, Fürnkranz and Hüllermeier distinguish between the problems of object ranking (Cohen, Schapire, and Singer 1999; Kamishima, Kazawa, and Akaho 2010), label ranking (Har-Peled, Roth, and Zimak 2002; Cheng, Hühn, and Hüllermeier 2009; Vembu and Gärtner 2010), and instance ranking (Fürnkranz, Hüllermeier, and Vanderlooy 2009). The focus of this paper is on object ranking. Given training data in the form of a set of exemplary rankings of subsets of objects, the goal in object ranking is to learn a ranking function that is able to predict the ranking of any new set of objects.
Our main contribution is a novel approach that is based on the idea of analogical reasoning, and essentially builds on the following inference pattern: If object A relates to object B as C relates to D, and A is known to be preferred to B, then we (hypothetically) infer that C is preferred to D. Figure 1 provides an illustration with cars as objects. Cars A and B are more or less the same, except that A is a cabriolet and has color red instead of black, and the same is true for cars C and D. Thus, knowing that someone likes car A more than B, we may conjecture that the same person prefers car C to D. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation.

The rest of the paper is organized as follows. In the next section, we recall the setting of object ranking and formalize the corresponding learning problem. Then, we present a formalization of analogical reasoning based on the concept of analogical proportion. Next, we introduce our approach to analogy-based learning to rank (able2rank). Finally, we present an experimental evaluation of this approach, prior to concluding the paper with a summary and an outline of future work.

2 Learning to Rank

Consider a reference set of objects, items, or choice alternatives $X$, and assume each item $x \in X$ to be described in terms of a feature vector; thus, an item is a vector $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ and $X \subseteq \mathbb{R}^d$. The goal in object ranking is to learn a ranking function $\rho$ that accepts any (query) subset $Q = \{x_1, \ldots, x_n\} \subseteq X$ of $n = |Q|$ items as input. As output, the function produces a ranking $\pi \in S_n$ of these items, where $S_n$ denotes the set of all permutations of length $n$, i.e., all mappings $[n] \to [n]$ (symmetric group of order $n$); $\pi$ represents the total order

$$x_{\pi^{-1}(1)} \succ x_{\pi^{-1}(2)} \succ \cdots \succ x_{\pi^{-1}(n)}, \quad (1)$$

i.e., $\pi^{-1}(k)$ is the index of the item on position $k$, while $\pi(k)$ is the position of the $k$-th item $x_k$ ($\pi$ is often called a ranking and $\pi^{-1}$ an ordering). Formally, a ranking function is thus a mapping

$$\rho : \mathcal{Q} \to \mathcal{R}, \quad (2)$$

where $\mathcal{Q} = 2^X \setminus \{\emptyset\}$ is the query space and $\mathcal{R} = \bigcup_{n \in \mathbb{N}} S_n$ the ranking space. The order relation $\succ$ is typically (though not necessarily) interpreted in terms of preferences, i.e., $x \succ y$ suggests that $x$ is preferred to $y$. A ranking function $\rho$ is learned on a set of training data that consists of a set of rankings

$$\mathcal{D} = \{(Q_1, \pi_1), \ldots, (Q_M, \pi_M)\}, \quad (3)$$

where each ranking $\pi_j$ defines a total order of the query set $Q_j$. Once a ranking function has been learned, it can be used for making predictions for new query sets $Q$ (see Figure 2). Such predictions are evaluated in terms of a suitable loss function or performance metric. A common choice is the (normalized) ranking loss, which counts the number of inversions between two rankings $\pi$ and $\pi'$:

$$d_{RL}(\pi, \pi') = \frac{\sum_{1 \le i, j \le n} [\![\pi(i) < \pi(j)]\!] \, [\![\pi'(i) > \pi'(j)]\!]}{n(n-1)/2},$$

where $[\![\cdot]\!]$ is the indicator function.
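To make the evaluation measure concrete, here is a minimal sketch (not taken from the paper's implementation; the function name is ours) of the normalized ranking loss $d_{RL}$, with rankings given as position vectors, i.e., `pi[i]` holds the position $\pi(i)$ of the $i$-th item:

```python
def ranking_loss(pi, pi_prime):
    """Normalized ranking loss d_RL: the fraction of item pairs that are
    ordered differently by the two rankings (given as position vectors)."""
    n = len(pi)
    inversions = sum(
        1
        for i in range(n)
        for j in range(n)
        if pi[i] < pi[j] and pi_prime[i] > pi_prime[j]
    )
    return inversions / (n * (n - 1) / 2)

# Identical rankings incur loss 0; completely reversed rankings incur loss 1.
assert ranking_loss([1, 2, 3], [1, 2, 3]) == 0.0
assert ranking_loss([1, 2, 3], [3, 2, 1]) == 1.0
```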
Figure 2: The setting of object ranking.

2.1 Methods

The ranking function (2) sought in object ranking is a complex mapping from the query to the ranking space. An important question, therefore, is how to represent a ranking-valued function of that kind, and how it can be learned efficiently. As will be seen later on, our approach avoids an explicit representation of this function, and instead implements a query-specific (transductive) inference, in very much the same way as instance-based learning represents a predictor (through the sample data) in an indirect way.

Most commonly, a ranking function is represented by means of an underlying scoring function $U : X \to \mathbb{R}$, so that $x \succ x'$ if $U(x) > U(x')$. In other words, a ranking-valued function is represented through a real-valued function. Obviously, $U$ can be considered as a kind of utility function, and $U(x)$ as a latent utility degree assigned to an item $x$. Seen from this point of view, the goal in object ranking is to learn a latent utility function on a reference set $X$.

The representation of a ranking function in terms of a real-valued (utility) function also suggests natural approaches to learning. In particular, two such approaches are prevailing in the literature. The first one reduces the original ranking problem to regression; as it seeks a model that assigns appropriate scores to individual items $x$, it is referred to as the pointwise approach in the literature. The second idea is to reduce the problem to binary classification; here, the focus is on pairs of items, which is why the approach is called the pairwise approach.

As a representative of the first category, we will include expected rank regression (ERR) in our experimental study later on (Kamishima and Akaho 2006; Kamishima, Kazawa, and Akaho 2010). ERR reduces object ranking to standard linear regression. To this end, every training example $(Q, \pi)$ is replaced by a set of data points $(x_i, y_i) \in X \times \mathbb{R}$. Here, the target $y_i$ assigned to object $x_i \in Q$ is given by

$$y_i = \frac{\pi(i)}{|Q| + 1}.$$

This is justified by taking an expectation over all (complete) rankings of $X$ and assuming a uniform distribution. In spite of this apparently oversimplified assumption, and the questionable transformation of ordinal ranks into numerical scores, ERR has shown quite competitive performance in empirical studies, especially when all rankings in the training data (3) are of approximately the same length (Melnikov et al. 2016).

Given a ranking (1) as training information, the pairwise approach extracts all pairwise preferences $x_{\pi^{-1}(i)} \succ x_{\pi^{-1}(j)}$, $1 \le i < j \le n$, and considers these preferences as examples for a binary classification task. This approach is especially simple if $U$ is a linear function of the form $U(x) = w^\top x$. In this case, $U(x) > U(x')$ if $w^\top x > w^\top x'$, which is equivalent to $w^\top z > 0$ for $z = x - x' \in \mathbb{R}^d$. Thus, from the point of view of binary classification (with a linear threshold model), $z$ can be considered as a positive and $-z$ as a negative example. In principle, any binary classification algorithm can be applied to learn the weight vector $w$ from the set of examples produced in this way. As a representative of this class of methods, we will use support vector machines in our experiments; more specifically, we include Ranking SVM (Joachims 2002) as a state-of-the-art baseline to compare with.
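For illustration, the following is a minimal sketch of the two reductions just described; it is not the original code of ERR or Ranking SVM, and the helper names are our own. `X` is an array of feature vectors for a query set and `pi` the corresponding position vector:

```python
import numpy as np

def pointwise_targets(pi):
    """ERR-style regression targets y_i = pi(i) / (|Q| + 1)."""
    pi = np.asarray(pi, dtype=float)
    return pi / (len(pi) + 1)

def pairwise_examples(X, pi):
    """Difference vectors z = x - x', labeled +1 if x is ranked above x'
    (smaller position), together with the mirrored negative examples -z."""
    Z, y = [], []
    n = len(pi)
    for i in range(n):
        for j in range(i + 1, n):
            z = X[i] - X[j]
            label = 1 if pi[i] < pi[j] else -1
            Z.extend([z, -z])
            y.extend([label, -label])
    return np.array(Z), np.array(y)
```

Any linear regressor (for the pointwise case) or binary classifier such as a linear SVM (for the pairwise case) can then be trained on the data produced by these transformations.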
3 Analogical Reasoning

Analogical reasoning has a long tradition in artificial intelligence research, and various attempts at formalizing analogy-based inference can be found in the literature. We are building on a recent approach that is based on the concept of analogical proportion (Miclet and Prade 2009; Prade and Richard 2017), which has already been used successfully in different problem domains, including classification (Bounhas, Prade, and Richard 2014a), recommendation (Hug et al. 2016), preference completion (Pirlot, Prade, and Richard 2016), decision making (Billingsley et al. 2017), and solving IQ tests (Beltran, Prade, and Richard 2016).

Consider four values $a, b, c, d$ from a domain $\mathbb{X}$. The quadruple $(a, b, c, d)$ is said to be in analogical proportion, denoted $a : b :: c : d$, if "a relates to b as c relates to d". A bit more formally, this idea can be expressed as

$$R(a, b) \sim R(c, d), \quad (4)$$

where the relation $\sim$ denotes the "as" part of the informal description. $R$ can be instantiated in different ways, depending on the underlying domain $\mathbb{X}$. Specifically relevant for us are the cases of Boolean variables, where $\mathbb{X} = \mathbb{B} = \{0, 1\}$, and numerical variables, where $\mathbb{X} = \mathbb{R}$. In the latter case, we will distinguish between arithmetic proportions, where $R(a, b) = a - b$, and geometric proportions, where $R(a, b) = a/b$. Independently of the concrete choice of $\mathbb{X}$ and $R$, the following axioms can reasonably be required (assuming an expression $a : b :: c : d$ to be either true or false):

- $a : b :: a : b$ (reflexivity)
- $a : b :: c : d \Rightarrow c : d :: a : b$ (symmetry)
- $a : b :: c : d \Rightarrow a : c :: b : d$ (central permutation)

3.1 Boolean Variables

Consider the case of Boolean variables with values in $\mathbb{B} = \{0, 1\}$. There are $2^4 = 16$ instantiations of the pattern $a : b :: c : d$. The smallest (and hence most informative) subset of logical proportions satisfying the above axioms consists of the following 6 instantiations:

| a | b | c | d |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

This formalization captures the idea that $a$ differs from $b$ (in the sense of being "equally true", "more true", or "less true") exactly as $c$ differs from $d$, and vice versa. This can also be expressed by the following formula:

$$\big((a \to b) \equiv (c \to d)\big) \wedge \big((b \to a) \equiv (d \to c)\big) \quad (5)$$

We remark that the above definition further satisfies the properties of independence with regard to positive or negative encoding of features ($a : b :: c : d \Rightarrow \bar{a} : \bar{b} :: \bar{c} : \bar{d}$) as well as transitivity ($a : b :: c : d$ and $c : d :: e : f \Rightarrow a : b :: e : f$). For an in-depth discussion of logical proportions, we refer to (Prade and Richard 2013).

3.2 Real-Valued Variables

To extend analogical proportions to real-valued variables, we assume these variables to take values in $[0, 1]$. Practically, this may require a normalization of the original variable in a preprocessing step (a discussion of this will follow below). Values in the unit interval can be interpreted as generalized (graded) truth degrees. For instance, if $x$ is the speed of a car, the normalized value $x = 0.8$ can be interpreted as the degree to which the property "high speed" applies, i.e., the truth degree of the proposition "the car has high speed".

The idea, then, is to generalize the formalization of analogical proportions in the Boolean case using generalized logical operators (Dubois, Prade, and Richard 2016). Naturally, the analogical proportion itself will then become a matter of degree, i.e., a quadruple $(a, b, c, d)$ can be in analogical proportion to some degree; in the following, we will denote this degree by $v(a, b, c, d)$. For example, taking (5) as a point of departure, and generalizing the logical conjunction $x \wedge y$ by the minimum operator $\min(x, y)$, the implication $x \to y$ by the Łukasiewicz implication $\min(1 - x + y, 1)$, and the equivalence $x \equiv y$ by $1 - |x - y|$, one arrives at the following expression:

$$v_A(a, b, c, d) = \begin{cases} 1 - |(a - b) - (c - d)| & \text{if } \operatorname{sign}(a - b) = \operatorname{sign}(c - d) \\ 1 - \max(|a - b|, |c - d|) & \text{otherwise} \end{cases}$$

We also consider a variant $v_{A'}$ of this expression, which is given by

$$v_{A'}(a, b, c, d) = \begin{cases} v_A(a, b, c, d) & \text{if } \operatorname{sign}(a - b) = \operatorname{sign}(c - d) \\ 0 & \text{otherwise} \end{cases}$$
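A direct transcription of $v_A$ and its strict variant $v_{A'}$ into code may help fix the definitions; this is our own sketch, not the authors' implementation, and the vector version anticipates the extension to feature vectors in the next subsection (all values are assumed to be normalized to [0, 1]):

```python
import numpy as np

def v_A(a, b, c, d):
    """Arithmetic analogical proportion on values in [0, 1]."""
    if np.sign(a - b) == np.sign(c - d):
        return 1.0 - abs((a - b) - (c - d))
    return 1.0 - max(abs(a - b), abs(c - d))

def v_A_prime(a, b, c, d):
    """Strict variant: zero whenever a - b and c - d differ in sign."""
    return v_A(a, b, c, d) if np.sign(a - b) == np.sign(c - d) else 0.0

def v_vector(a, b, c, d, v=v_A):
    """Aggregation over feature vectors with the arithmetic mean."""
    return float(np.mean([v(*quad) for quad in zip(a, b, c, d)]))

# A perfect proportion: 0.25 relates to 0.5 exactly as 0.5 relates to 0.75
# (both pairs differ by -0.25), so the degree is 1.
assert v_A(0.25, 0.5, 0.5, 0.75) == 1.0
```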
Based on alternative interpretations of $R$ and $\sim$ in (4), other proposals for measuring the degree of analogical proportion can be found in the literature:

- Geometric proportion (Beltran, Jaudoin, and Pivert 2014): $v_G(a, b, c, d) = \frac{\min(ad, bc)}{\max(ad, bc)}$ if $\operatorname{sign}(a - b) = \operatorname{sign}(c - d)$ and $\max(ad, bc) > 0$, and $0$ otherwise.

- Min-Max proportion (Bounhas, Prade, and Richard 2014b): $v_{MM}(a, b, c, d) = 1 - \max\big(|\min(a, d) - \min(b, c)|, |\max(a, d) - \max(b, c)|\big)$.

- Approximate equality proportion (Beltran, Jaudoin, and Pivert 2014): $v_{AE}(a, b, c, d) = \max\big([\![a \approx b]\!] \cdot [\![c \approx d]\!], [\![a \approx c]\!] \cdot [\![b \approx d]\!]\big)$, where $x \approx y$ is true if $|x - y| \le \epsilon$ for a fixed threshold $\epsilon \in [0, 1]$.

We will also use a graded variant of $v_{AE}(\cdot)$, denoted $v_{AE'}$, defined as above; to this end, we generalize the indicator function and evaluate the approximate equality of values $a, b$ as $a \approx b = \max(1 - |a - b|/\epsilon, 0)$.

3.3 Extension to Feature Vectors

The degree of analogical proportion defined above can be generalized from individual values to objects in the form of vectors $x = (x_1, \ldots, x_d)$, where $x_i \in \mathbb{X}_i \in \{\mathbb{B}, \mathbb{R}\}$; in the context of object ranking, $x$ is a feature vector characterizing a choice alternative. To this end, the degrees of analogical proportion for the individual entries are aggregated into a single degree:

$$v(a, b, c, d) = \operatorname{AGG}\big\{ v(a_i, b_i, c_i, d_i) \,\big|\, i = 1, \ldots, d \big\} \quad (6)$$

The aggregation operator AGG can be specified in different ways. For example, a conjunctive combination is obtained with AGG = min. This aggregation is very strict, however, and does not allow for compensating a low value of analogical proportion on a single feature by large values on other features. In our current implementation of analogy-based object ranking, we therefore use the arithmetic average as aggregation.

4 Analogy-based Object Ranking

Recall that, in the setting of learning to rank, we suppose to be given a set of training data in the form $\mathcal{D} = \{(Q_1, \pi_1), \ldots, (Q_M, \pi_M)\}$, where each $\pi_m$ defines a ranking of the set of objects $Q_m$. If $z_i, z_j \in Q_m$ and $\pi_m(i) < \pi_m(j)$, then $z_i \succ z_j$ has been observed as a preference. In the following, we will denote by [...]

Algorithm 1
[...]
9:  if s_ij > s_ji then
10:     scores.append(s_ij)
11: else
12:     scores.append(s_ji)
13: end if
14: X_{i≻j}.append( not(xor(z ≻ z′, s_ij > s_ji)) )
15: end for
16: // re-arrange X_{i≻j} based on positions of sorted scores
17: X_{i≻j} ← X_{i≻j}(arg sort(scores))
18: lst_{X_{i≻j}}.append(X_{i≻j})
19: end for
20: return lst_{X_{i≻j}}

4.3 Rank Aggregation

To turn pairwise preferences into a total order, we make use of a rank aggregation method. More specifically, we apply the Bradley-Terry-Luce (BTL) model, which is well known in the literature on discrete choice (Bradley and Terry 1952). It starts from the parametric model

$$P(x_i \succ x_j) = \frac{\theta_i}{\theta_i + \theta_j}, \quad (10)$$

where $\theta_i, \theta_j \in \mathbb{R}_+$ are parameters representing the (latent) utilities $U(x_i)$ and $U(x_j)$ of $x_i$ and $x_j$, respectively. Thus, according to the BTL model, the probability to observe a preference in favor of a choice alternative $x_i$, when being compared to any other alternative, is proportional to $\theta_i$. Given the data (9), i.e., the numbers $c_{i,j}$ informing about how often every preference $x_i \succ x_j$ has been observed, the parameter $\theta = (\theta_1, \ldots, \theta_n)$ can be estimated by likelihood maximization:

$$\hat{\theta} \in \arg\max_{\theta \in \mathbb{R}_+^n} \prod_{i \ne j} \left( \frac{\theta_i}{\theta_i + \theta_j} \right)^{c_{i,j}}$$

Finally, the predicted ranking $\hat{\pi}$ is obtained by sorting the items $x_i$ in descending order of their estimated (latent) utilities $\hat{\theta}_i$ (see Algorithm 2). We note that many other rank aggregation techniques have been proposed in the literature and could principally be used as well; see, e.g., (Ahmadi Fahandar, Hüllermeier, and Couso 2017). However, since BTL seems to perform very well, we did not consider any other method.

Algorithm 2 Rank Aggregation (RA)
Require: lst_{X_{i≻j}}, k
1: Initialize comparison matrix C_{n×n}
2: for all X_{i≻j} in lst_{X_{i≻j}} do
3:    C_{i,j} ← Σ_{r=1}^{k} ⟦X_{i≻j}(r) = 1⟧
4:    C_{j,i} ← Σ_{r=1}^{k} ⟦X_{i≻j}(r) = 0⟧
5: end for
6: θ̂ ← BTL(C)
7: π̂ ← arg sort(θ̂)
8: return π̂
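As an illustration of the aggregation step, here is a minimal sketch of BTL maximum-likelihood estimation. The paper does not specify the optimizer; we assume the standard minorization-maximization update (Hunter's MM algorithm), which is one common way to maximize this likelihood, and the function names are ours:

```python
import numpy as np

def btl_mle(C, iters=200):
    """BTL parameters theta from a comparison matrix C, where C[i, j]
    counts how often x_i was preferred to x_j.  MM update:
        theta_i <- W_i / sum_j N_ij / (theta_i + theta_j),
    with W_i the total wins of item i and N_ij = C_ij + C_ji."""
    n = C.shape[0]
    theta = np.ones(n)
    wins = C.sum(axis=1)
    N = C + C.T
    for _ in range(iters):
        pairwise = N / np.maximum(theta[:, None] + theta[None, :], 1e-12)
        theta = wins / np.maximum(pairwise.sum(axis=1), 1e-12)
        theta /= theta.sum()              # fix the scale (identifiability)
    return theta

def rank_aggregation(C):
    """Algorithm 2 in spirit: sort items by decreasing estimated utility."""
    return np.argsort(-btl_mle(C))        # ordering: best item first

# Toy example: item 0 wins most of its comparisons, item 2 the fewest.
C = np.array([[0, 8, 9], [2, 0, 6], [1, 4, 0]], dtype=float)
print(rank_aggregation(C))                # [0 1 2]
```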
5 Experiments

In order to study the practical performance of our proposed method, we conducted experiments on several real-world data sets, using ERR and SVM-Rank (cf. Section 2.1) as baselines for comparison.

5.1 Data

The data sets (available at https://cs.uni-paderborn.de/is/) are collected from various domains (e.g., sports, education, tourism) and comprise different types of features (e.g., numeric, binary, ordinal). Table 1 provides a summary of the characteristics of the data sets. Here is a detailed description:

Table 1: Properties of data sets.

| domain | data set | # instances | # features | numeric | binary | ordinal | name |
|---|---|---|---|---|---|---|---|
| Decathlon | Year 2005 | 100 | 10 | x | | | D1 |
| | Year 2006 | 100 | 10 | x | | | D2 |
| | Olympic games Rio de Janeiro 2016 | 24 | 10 | x | | | D3 |
| | Under-20 world championships 2016 | 22 | 10 | x | | | D4 |
| Bundesliga | Season 15/16 | 18 | 13 | x | | | B1 |
| | Season 16/17 | 18 | 13 | x | | | B2 |
| | Mid-Season 16/17 | 18 | 7 | x | | | B3 |
| | Season 16/17 Away | 18 | 6 | x | | | B4 |
| FIFA | Year 2016 | 100 | 40 | 36 | 1 | 3 | F1 |
| | Year 2017 | 100 | 40 | 36 | 1 | 3 | F2 |
| | Year 2016 (Position: Streaker) | 50 | 40 | 36 | 1 | 3 | F3 |
| | Year 2017 (Position: Streaker) | 50 | 40 | 36 | 1 | 3 | F4 |
| Hotels | Düsseldorf | 110 | 28 | x | x | x | H1 |
| | Frankfurt | 149 | 28 | x | x | x | H2 |
| Uni. Rankings | Year 2015 | 100 | 9 | x | | | U1 |
| | Year 2014 | 100 | 9 | x | | | U2 |
| Volleyball WL | Group 3 | 12 | 15 | x | | | V1 |
| | Group 1 | 12 | 15 | x | | | V2 |
| Netflix | Germany | 9 | 7 | x | x | | N1 |
| | USA | 13 | 7 | x | x | | N2 |

Decathlon: This data contains rankings of the top 100 men's decathletes worldwide in the years 2005 and 2006, with 10 numeric features associated with each athlete. Each feature is the performance achieved by the athlete in the corresponding discipline (e.g., the time in the 100 meter race). The ranking of the decathletes is based on a scoring scheme, in which each performance is first scored in terms of a certain number of points (the mapping from performance to scores is non-linear), and the total score is obtained as the sum of the points over all 10 disciplines. In addition, the results of the Olympic games Rio de Janeiro 2016 (24 instances) as well as the under-20 world championships 2016 (22 instances) are considered. The data are extracted from the Decathlon2000 web site (www.decathlon2000.com).

FIFA: The FIFA Index website (www.fifaindex.com) ranks the best football players in the world based on different metrics each year. These metrics belong to different categories, such as ball skills (ball control, dribbling, etc.), physical performance (acceleration, balance, etc.), defence (marking, slide tackling, etc.), and so on. We considered the lists of top 100 footballers in the years 2016 and 2017, where each player is described by 40 attributes. Since the metrics of different types of players are not comparable, the overall evaluation of a player depends on his position (goal keeper, defender, streaker, etc.). Obviously, predicting the overall ranking is therefore a difficult task. In addition to the full data sets, we therefore also considered two position-specific data sets, namely the rankings for players with position "streaker" in the years 2016 and 2017.

Hotels: This data set contains rankings of hotels in two major German cities, namely Düsseldorf (110 hotels) and Frankfurt (149 hotels). These rankings have been collected from TripAdvisor (www.tripadvisor.com) in September 2014.
Each hotel is described in terms of a feature vector of length 28 (e.g., distance to city center, number of stars, number of rooms, number of user ratings, etc.). The way in which a ranking is determined by TripAdvisor on the basis of these features is not known (and one cannot exclude that additional features may play a role).

University Rankings: This data includes the list of top 100 universities worldwide for the years 2014 and 2015. It is published by the Center for World University Rankings (CWUR). Each university is represented by 9 features, such as national rank, quality of education, alumni employment, etc. Detailed information about how the ranking is determined based on these features can be found on the CWUR website (www.cwur.org).

Bundesliga: This data set comprises table standings of the 18 football teams in the German Bundesliga (German football league, www.bundesliga.com) for the seasons 2015/16 and 2016/17. Each team is described in terms of 13 features, such as matches, win, loss, draw, goals-for, goals-against, etc. To study the ability of knowledge transfer, we also included the table standing for the season 2016/17 in which only the statistics for away matches are considered (with 6 features). Another case is the table standing in the mid-season 2016/17 (i.e., only the first half of the season) with 7 features.

Volleyball WL: This data contains the table standings for Group 1 (statistics of 12 teams divided into subgroups, each with 9 matches) and Group 3 (statistics of 12 teams, each with 6 matches) of the volleyball world league 2017, extracted from the FIVB website (worldleague.2017.fivb.com). There are 12 features in total, such as win, loss, number of (3-0, 3-1, 3-2, etc.) wins, sets won, sets lost, etc.

Netflix: This data set includes the Netflix ISP Speed Index (extracted from the Netflix website ispspeedindex.netflix.com in August 2017) for Germany and the USA, with 9 and 13 Internet providers, respectively. The Netflix ISP Speed Index is a measure of prime time Netflix performance on particular internet service providers (ISPs) around the globe. The rankings are represented by 7 (binary and numeric) features like speed, speed of the previous month, fiber, cable, etc.

5.2 Experimental Setting

Given the data sets for training and testing, able2rank first preprocesses the individual attributes as explained above (log-transformation and normalization). We also apply a standard normalization for the baseline methods (ERR and SVM), transforming each real-valued feature by standardization:

$$x \mapsto \frac{x - \mu}{\sigma},$$

where $\mu$ and $\sigma$ denote the empirical mean and standard deviation, respectively.

Recall that able2rank has two parameters to be tuned: the type of analogical proportion $v \in S_v$, where $S_v = \{v_A, v_{A'}, v_G, v_{MM}, v_{AE}, v_{AE'}\}$, and the number $k \in S_k$ of relevant proportions considered for estimating pairwise preferences, where $S_k = \{10, 15, 20\}$. We fixed these parameters in an (internal) 2-fold cross-validation (repeated 5 times) on the training data, using simple grid search on $S_v \times S_k$ (i.e., trying all combinations). The combination $(v^*, k^*)$ with the lowest cross-validated error $d_{RL}$ is eventually adopted and used to make predictions on the test data (using the entire training data). The complexity parameter $C$ of SVM is fixed in a similar way using an internal cross-validation.
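To make the selection protocol explicit, here is a hypothetical sketch of this grid search; `evaluate` is a placeholder (not from the paper) that is assumed to run able2rank with proportion `v` and neighborhood size `k` on one fold and return $d_{RL}$ on the other, and we further assume the folds are formed by splitting the objects of the training ranking:

```python
import numpy as np
from itertools import product

S_v = ["v_A", "v_A'", "v_G", "v_MM", "v_AE", "v_AE'"]   # candidate proportions
S_k = [10, 15, 20]                                      # candidate values of k

def select_parameters(n_objects, evaluate, repeats=5, seed=0):
    """Grid search over S_v x S_k with an internal 2-fold cross-validation,
    repeated `repeats` times; returns the pair (v*, k*) with the lowest
    mean cross-validated ranking loss."""
    rng = np.random.default_rng(seed)
    best, best_loss = None, np.inf
    for v, k in product(S_v, S_k):
        losses = []
        for _ in range(repeats):
            idx = rng.permutation(n_objects)
            half = n_objects // 2
            for fit, val in ((idx[:half], idx[half:]), (idx[half:], idx[:half])):
                losses.append(evaluate(v, k, fit, val))
        if np.mean(losses) < best_loss:
            best, best_loss = (v, k), float(np.mean(losses))
    return best
```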
5.3 Results

In our experiments, predictions were produced for certain parts $D_{test}$ of the data we collected, using other parts $D_{train}$ as training data; an experiment of that kind is denoted $D_{train} \to D_{test}$. The results of the conducted experiments are summarized in Table 2, with the best performance on each problem displayed in bold face. Additionally, we report the parameters used by able2rank. From the results, we conclude that able2rank is quite competitive in terms of predictive accuracy. In 8 of 11 cases, it achieves the best performance in terms of the average ranking loss $d_{RL}$ (which translates to a p-value of around 10% when comparing able2rank and SVM with a pairwise sign test, and of around 0.5% for the comparison with ERR).

Table 2: Results in terms of loss $d_{RL}$ on the test data.

| train → test | $(v^*, k^*)$ | able2rank | ERR | SVM |
|---|---|---|---|---|
| D1 → D2 | $v_{A'}$, 10 | 0.066 | 0.064 | **0.025** |
| D3 → D4 | $v_{MM}$, 10 | **0.139** | 0.147 | 0.152 |
| B1 → B2 | $v_A$, 15 | **0.046** | 0.144 | 0.059 |
| B4 → B2 | $v_A$, 20 | 0.078 | 0.078 | **0.026** |
| B3 → B2 | $v_A$, 10 | **0.000** | 0.033 | 0.013 |
| F1 → F2 | $v_G$, 20 | **0.225** | 0.259 | 0.310 |
| F3 → F4 | $v_G$, 10 | **0.158** | 0.364 | 0.169 |
| H1 → H2 | $v_A$, 20 | **0.060** | 0.084 | 0.072 |
| U1 → U2 | $v_{A'}$, 15 | **0.073** | 0.163 | 0.154 |
| V1 → V2 | $v_{MM}$, 10 | 0.030 | 0.485 | **0.015** |
| N1 → N2 | $v_A$, 10 | **0.013** | 0.090 | 0.077 |

6 Conclusion and Future Work

This paper advocates the use of analogical reasoning in the context of preference learning. Building on the notion of analogical proportion, we formalize the heuristic principle suggesting that, if an alternative A is preferred to B, and C relates to D as A relates to B, then C is likely to be preferred to D. Based on this formalization, we develop a concrete method, able2rank, for the problem of object ranking. First experimental results on real-world data from different domains are quite promising and suggest that able2rank is competitive with state-of-the-art methods for object ranking.

In future work, we plan to elaborate more closely on the setting of transfer learning, because we believe our analogical approach to preference learning, like analogical reasoning in general, to be specifically useful for this purpose. In fact, preference transfer as defined in this paper only requires the relation R to be evaluated separately for source objects A and B on the one side and target objects C and D on the other side, but never between sources and targets; in principle, different specifications of R could even be used for the source and the target. In the experiments conducted in this paper, some kind of knowledge transfer was already required, but source and target were still closely related and essentially from the same domain. As already said, we plan to extend these experiments toward problems where source and target are from different domains, and to generalize able2rank correspondingly.

Besides, the basic version of able2rank as presented in this paper ought to be improved and extended in different directions. For example, for the time being, the analogical proportion of feature vectors in able2rank is simply the average of the analogical proportions on the individual attributes. However, since different features may have a different influence on preferences and analogies, the selection or weighting of features appears to be quite important. Therefore, the development of methods for feature selection and weighting will be addressed in future work. Furthermore, the current version of able2rank ignores possible interactions between features. For example, while a person may prefer color red to black for T-shirts, this preference might be reversed in the case of pullovers.
Learning such interactions and capturing them in analogical inference is a highly non-trivial problem. Last but not least, issues related to computational efficiency ought to be addressed. Since able2rank follows the lazy learning paradigm (Aha 1997), just like instance-based learning and nearest neighbor prediction, it can learn very easily (simply by storing observed preferences) but is relatively costly at prediction time. In comparison to nearest neighbor classification, for example, the cost is even higher due to the need to iterate over pairs of objects in the training data to find the highest analogical proportions. Thus, implementing the search for the highest analogies in a naive way, the complexity scales quadratically with the size of the training data. Consequently, there is a need to reduce this complexity by means of suitable data structures and algorithms for the efficient retrieval of analogy pairs. Besides, other means for complexity reduction could be considered, such as instance selection or editing strategies like those used in case-based reasoning (McKenna and Smyth 2000; Delany and Cunningham 2004).

References

Aha, D., ed. 1997. Lazy Learning. Kluwer Academic Publishers.

Ahmadi Fahandar, M.; Hüllermeier, E.; and Couso, I. 2017. Statistical inference for incomplete ranking data: The case of rank-dependent coarsening. In Precup, D., and Teh, Y., eds., Proceedings ICML 2017, 34th International Conference on Machine Learning.

Beltran, W. C.; Jaudoin, H.; and Pivert, O. 2014. Analogical prediction of null values: The numerical attribute case. In Proc. ADBIS, 18th East European Conference on Advances in Databases and Information Systems, 323-336. Ohrid, Macedonia: Springer.

Beltran, W. C.; Prade, H.; and Richard, G. 2016. Constructive solving of Raven's IQ tests with analogical proportions. Int. J. Intell. Syst. 31(11):1072-1103.

Billingsley, R.; Prade, H.; Richard, G.; and Williams, M. 2017. Towards analogy-based decision: A proposal. In Proc. FQAS, 12th International Conference on Flexible Query Answering Systems, 28-35.

Bounhas, M.; Prade, H.; and Richard, G. 2014a. Analogical classification: A new way to deal with examples. In Proc. ECAI, 21st European Conference on Artificial Intelligence, 135-140.

Bounhas, M.; Prade, H.; and Richard, G. 2014b. Analogical classification: Handling numerical data. In Proc. SUM, 8th International Conference on Scalable Uncertainty Management, 66-79.

Bradley, R., and Terry, M. 1952. The rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika 39:324-345.

Cheng, W.; Hühn, J.; and Hüllermeier, E. 2009. Decision tree and instance-based learning for label ranking. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, 161-168. New York, NY, USA: ACM.

Cohen, W. W.; Schapire, R. E.; and Singer, Y. 1999. Learning to order things. J. Artif. Int. Res. 10(1):243-270.

Delany, S., and Cunningham, P. 2004. An analysis of case-base editing in a spam filtering system. In Proc. ECCBR 2004, European Conference on Case-Based Reasoning, 128-141. Heidelberg: Springer-Verlag.

Dubois, D.; Prade, H.; and Richard, G. 2016. Multiple-valued extensions of analogical proportions. Fuzzy Sets and Systems 292:193-202.

Fürnkranz, J., and Hüllermeier, E. 2011. Preference Learning. Springer-Verlag.

Fürnkranz, J.; Hüllermeier, E.; and Vanderlooy, S. 2009. Binary Decomposition Methods for Multipartite Ranking. Berlin, Heidelberg: Springer. 359-374.

Har-Peled, S.; Roth, D.; and Zimak, D. 2002.
Constraint Classification: A New Approach to Multiclass Classification. Berlin, Heidelberg: Springer. 365-379.

Hug, N.; Prade, H.; Richard, G.; and Serrurier, M. 2016. Analogy in recommendation. Numerical vs. ordinal: A discussion. In IEEE International Conference on Fuzzy Systems, 2220-2226.

Joachims, T. 2002. Optimizing search engines using clickthrough data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-02), 133-142. ACM Press.

Joanes, D. N., and Gill, C. A. 1998. Comparing measures of sample skewness and kurtosis. Journal of the Royal Statistical Society: Series D (The Statistician) 47:183-189.

Kamishima, T., and Akaho, S. 2006. Supervised ordering by regression combined with Thurstone's model. Artificial Intelligence Review 25(3):231-246.

Kamishima, T.; Kazawa, H.; and Akaho, S. 2010. A survey and empirical comparison of object ranking methods. In Fürnkranz, J., and Hüllermeier, E., eds., Preference Learning. Berlin, Heidelberg: Springer-Verlag. 181-202.

McKenna, E., and Smyth, B. 2000. Competence-guided editing methods for lazy learning. In Proceedings ECAI 2000, 14th European Conference on Artificial Intelligence, 60-64.

Melnikov, V.; Gupta, P.; Frick, B.; Kaimann, D.; and Hüllermeier, E. 2016. Pairwise versus pointwise ranking: A case study. Schedae Informaticae 25:73-83.

Miclet, L., and Prade, H. 2009. Handling Analogical Proportions in Classical Logic and Fuzzy Logics Settings. Berlin, Heidelberg: Springer. 638-650.

Pirlot, M.; Prade, H.; and Richard, G. 2016. Completing preferences by means of analogical proportions. In Proc. MDAI, 13th International Conference on Modeling Decisions for Artificial Intelligence, 135-147.

Prade, H., and Richard, G. 2013. From analogical proportion to logical proportions. Logica Universalis 7(4):441-505.

Prade, H., and Richard, G. 2017. Analogical proportions and analogical reasoning: An introduction. In Proceedings ICCBR, Int. Conf. on Case-Based Reasoning, 16-32.

Vembu, S., and Gärtner, T. 2010. Label ranking: A survey. In Fürnkranz, J., and Hüllermeier, E., eds., Preference Learning. Springer-Verlag.