Journal of Machine Learning Research 18 (2018) 1-5. Submitted 2/16; Revised 10/17; Published 4/18.

KeLP: a Kernel-based Learning Platform

Simone Filice (filice@info.uniroma2.it), DICII, University of Roma, Tor Vergata, Italy
Giuseppe Castellucci (castellucci@ing.uniroma2.it), DIE, University of Roma, Tor Vergata, Italy
Giovanni Da San Martino (gmartino@hbku.edu.qa), Qatar Computing Research Institute, HBKU, Qatar
Alessandro Moschitti (amosch@amazon.com), Amazon; Professor at the University of Trento, Italy
Danilo Croce (croce@info.uniroma2.it), DII, University of Roma, Tor Vergata, Italy
Roberto Basili (basili@info.uniroma2.it), DII, University of Roma, Tor Vergata, Italy

Editor: Cheng Soon Ong

Abstract

KeLP is a Java framework that enables fast and easy implementation of kernel functions over discrete data, such as strings, trees or graphs, and their combination with standard vectorial kernels. Additionally, it provides several kernel-based algorithms, e.g., online and batch kernel machines for classification, regression and clustering, and a Java environment for easy implementation of new algorithms. KeLP is a versatile toolkit, appealing both to experts and practitioners of machine learning and Java programming, who can find extensive documentation, tutorials and examples of increasing complexity on the accompanying website. Notably, KeLP can also be used without any knowledge of Java programming, through command-line tools and JSON/XML interfaces that enable the declaration and instantiation of articulated learning models using simple templates. Finally, the extensive use of modularity and interfaces in KeLP enables developers to easily extend it with their own kernels and algorithms.

Keywords: Kernel Machines, Structured Data and Kernels, Java Framework

©2018 Simone Filice, Giuseppe Castellucci, Giovanni Da San Martino, Alessandro Moschitti, Danilo Croce, Roberto Basili. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v18/16-087.html.

1. Introduction

Kernel methods for discrete structures (Shawe-Taylor and Cristianini, 2004) are popular and effective techniques for the design of learning algorithms on non-vectorial data, such as strings (Lodhi et al., 2002), trees (Collins and Duffy, 2002; Moschitti, 2006; Aiolli et al., 2009; Croce et al., 2011; Annesi et al., 2014) and graphs (Gärtner, 2003; Borgwardt and Kriegel, 2005; Shervashidze, 2011). These kernels are very valuable for modeling complex relations in real-world applications, where data naturally has a structured form: e.g., strings and graphs are used to represent DNA and chemical compounds, and parse trees can encode syntactic and semantic information expressed in text. However, current software for structural kernels is mainly limited to specific research, and is often not made publicly available or easily adaptable to new application domains. The SVM-Light-TK toolkit by Moschitti (2006) is one of the few exceptions, providing the user with different string and tree kernels, but no graph kernels. It is written in the C language; thus extending it with new kernels can be costly, especially when new data structures are required. This may also prevent non-programmers from using it for their specific applications.

In designing KeLP, we have capitalized on our previous experience with SVM-Light-TK and other toolkits to foster the reuse of previous software and models, as well as their extensibility. We provide a software platform for learning on structured data, which is both easy to use for inexperienced users and easily extendable for developers.
KeLP includes many standard kernel-based algorithms for classification, regression and clustering, as well as popular kernel functions for strings, trees and graphs. Additionally, it includes kernel functions for modeling relations between pairs of objects, which are required, e.g., in paraphrase detection, textual entailment and question answering (Moschitti and Zanzotto, 2007; Filice et al., 2015; Tymoshenko and Moschitti, 2015). Most importantly, new data structures, models, algorithms and kernels can be easily added on top of the existing code, facilitating and promoting the development of a library of kernel-based algorithms for structured data. The KeLP source code is distributed under the terms of the Apache 2.0 License. No additional software needs to be installed in order to use it: the Apache Maven project management tool resolves all module dependencies automatically. We also provide and maintain a website with updated tutorials and documentation.

2. The KeLP Framework: an Overview

KeLP is written in Java and uses three different Maven projects to logically separate its three main components: (i) the framework backbone, which implements classification, regression and clustering algorithms operating on vector-based kernels; these core modules, along with SVMs¹, are always part of any framework instantiation; (ii) additional-algorithm packages, e.g., online kernel machines, the Nyström method (Williams and Seeger, 2001) and label sequence learning (Altun et al., 2003); and (iii) additional-kernel packages, which include kernel functions for sequences, trees and graphs. A complete and up-to-date list of algorithms and kernel functions, full Javadoc API documentation in PDF, and tutorials for both end users and developers are hosted on the KeLP website, http://www.kelp-ml.org.
2.1 Machine Learning Algorithms

Learning algorithms in KeLP are implemented following implementation contracts provided by specific Java interfaces for different scenarios, i.e., classification, regression and clustering, according to two main learning paradigms, i.e., batch and online. New learning algorithms can implement these interfaces, thus becoming fully integrated with the other library functions. In more detail: (i) the ClassificationLearningAlgorithm interface supports the definition of classification learning methods, such as SVMs (Chang and Lin, 2011) or Dual Coordinate Descent (Hsieh et al., 2008). (ii) The RegressionLearningAlgorithm interface supports the definition of regressors, such as ε-SVR (Chang and Lin, 2011). (iii) The ClusteringAlgorithm interface enables the implementation of clustering algorithms, such as (Kulis et al., 2005). (iv) The OnlineLearningAlgorithm interface supports the definition of online learning algorithms, e.g., the Passive Aggressive (Crammer et al., 2006) or Soft Confidence-Weighted (Wang et al., 2012) algorithms. Finally, (v) the MetaLearningAlgorithm interface enables the design of committees, such as multi-classification schemas, e.g., One-VS-One and One-VS-All.

1. We include it because of its wide use.

```json
{
  "algorithm": "binaryCSvmClassification",
  "c": 10,
  "kernel": {
    "kernelType": "linearComb",
    "weights": [1, 1],
    "toCombine": [
      {
        "kernelType": "norm",
        "baseKernel": {
          "kernelType": "ptk",
          "representation": "constituent-tree",
          "mu": 0.4,
          "lambda": 0.4,
          "terminalFactor": 1.0
        }
      },
      {
        "kernelType": "linear",
        "representation": "wordspace"
      }
    ]
  }
}
```

Figure 1: A JSON description of an SVM classifier based on a linear combination of a normalized Partial Tree Kernel on a constituent tree and a linear kernel on a word-space vector.

2.2 Data Representation

In KeLP, data is represented by the Example class, which is constituted by (i) a set of Labels and (ii) a set of Representations.
The former enables the design of single-label or multi-label classifiers and multi-variate regressors. The latter models examples in terms of vectors (e.g., DenseVector and SparseVector) or structures (e.g., SequenceRepresentation, TreeRepresentation or GraphRepresentation). In particular, kernels can be defined over examples encoded by multiple representations (e.g., multiple parse trees, strings, graphs and feature vectors). This makes experimentation with multiple kernel combinations easy, requiring only negligible changes in the code or in the JSON description (see Section 2.4), without the need to modify the input data sets. Additionally, examples can be combined into more complex structures, e.g., ExamplePair, useful for learning relations between objects, e.g., pairs representing question and answer texts in QA, or text and hypothesis in textual entailment tasks. Building other types of data format is extremely simple; e.g., KeLP includes the SVM-Light-TK input format for trees and provides many scripts to use the popular gSpan format for graphs (and, indirectly, the 111 Open Babel formats²).

2.3 Building Kernels from Kernels

KeLP enables (i) kernel composition, i.e., deriving Kab(s1, s2) = (φa ∘ φb)(s1) · (φa ∘ φb)(s2) from Ka(s1, s2) = φa(s1) · φa(s2) and Kb(s1, s2) = φb(s1) · φb(s2); and (ii) kernel combinations, e.g., λ1 Ka(s1, s2) + λ2 Kb(s1, s2) · Ka(s1, s2). These operations are coded using the following abstractions of the Kernel class: (i) DirectKernel directly operates on a specified Representation object, derived from the Example object (e.g., implementing kernels for vectors, sequences, trees and graphs). (ii) The KernelComposition class composes Kernel objects, e.g., PolynomialKernel, RBFKernel and NormalizationKernel. (iii) The KernelCombination class enables the combination of different Kernels, e.g., the LinearKernelCombination class applies a weighted kernel sum. (iv) The KernelOnPair class operates
on ExamplePair, e.g., to learn similarity functions between sentences (Filice et al., 2015) or to implement ranking algorithms with the PreferenceKernel class.

2. http://openbabel.org

```java
public static void run(String trainPath, String testPath, String learningAlgoPath) {
  // Define (load) the learning algorithm (see the JSON in Fig. 1)
  JacksonSerializerWrapper serializer = new JacksonSerializerWrapper();
  ClassificationLearningAlgorithm learningAlgo;
  learningAlgo = serializer.readValue(new File(learningAlgoPath),
      ClassificationLearningAlgorithm.class);
  // Load the datasets
  SimpleDataset trainDataset = new SimpleDataset();
  trainDataset.populate(trainPath);
  SimpleDataset testDataset = new SimpleDataset();
  testDataset.populate(testPath);
  // Learn the classifier
  List
```
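The kernel-combination pattern of Section 2.3 can be illustrated in plain, self-contained Java (this is a sketch, not KeLP code; the Kernel interface and combine method below are hypothetical names introduced for illustration):

```java
// Sketch of the combination lambda1*Ka(s1,s2) + lambda2*Kb(s1,s2)*Ka(s1,s2)
// from Section 2.3, using a toy Kernel interface (not part of KeLP).
public class KernelComboSketch {

    // A kernel is any symmetric similarity function over a pair of inputs.
    interface Kernel<T> {
        double k(T a, T b);
    }

    // Weighted combination: lambda1*Ka(a,b) + lambda2*Kb(a,b)*Ka(a,b)
    static <T> Kernel<T> combine(Kernel<T> ka, Kernel<T> kb,
                                 double lambda1, double lambda2) {
        return (a, b) -> lambda1 * ka.k(a, b) + lambda2 * kb.k(a, b) * ka.k(a, b);
    }

    public static void main(String[] args) {
        // Two toy kernels on scalars: a linear kernel and an RBF-style kernel.
        Kernel<Double> linear = (a, b) -> a * b;
        Kernel<Double> rbf = (a, b) -> Math.exp(-(a - b) * (a - b));

        Kernel<Double> combo = combine(linear, rbf, 1.0, 0.5);
        // linear(2,3) = 6, rbf(2,3) = exp(-1), so combo = 6 + 0.5*exp(-1)*6 ≈ 7.1036
        System.out.println(combo.k(2.0, 3.0));
    }
}
```

In KeLP itself this role is played by KernelCombination subclasses such as LinearKernelCombination, which apply the weighted sum over arbitrary Kernel objects rather than over toy scalar kernels.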