# Artificial Intelligence Conferences Closeness

Sébastien Konieczny and Emmanuel Lonca
CRIL - CNRS, Université d'Artois, France
{konieczny, lonca}@cril.fr

We study the evolution of the closeness between Artificial Intelligence conferences, using the coscinus tool. coscinus computes the closeness between publication supports from the co-publication habits of authors: the more authors publish in two conferences, the closer these two conferences. In this paper we analyse the main Artificial Intelligence conferences through principal component analysis and clustering performed on this closeness relation.

1 Introduction

It is often hard and artificial to put boundaries on scientific domains. This is especially true for Artificial Intelligence, because of the highly diverse theoretical techniques that are used and the huge spectrum of application domains. Similarly, it can be difficult to judge the proximity or closeness of different Artificial Intelligence communities, and the historical evolution of this closeness. One possibility is to look at the contents of the main conferences of these communities, for instance by trying to analyse and match the titles of their sessions. But this process would require (subjective) interpretation and text analysis.

A more objective way of analysing the closeness between communities is to look at the publication habits of researchers. Starting from the hypothesis that most researchers are specialists of a single domain, or of a set of connected domains, the publication supports (conferences and journals) used by a researcher are connected together (around this researcher's expertise). This means that if many researchers publish papers in two distinct conferences, these conferences are clearly thematically connected. So, by looking at the publication habits of researchers, one can draw a landscape of publication supports, and obtain objective information about communities and domains.

This is the aim of the coscinus tool (http://www.coscinus.org), which performs such an analysis on publication data from DBLP (https://dblp.uni-trier.de), currently the largest and most stable database of computer science publications, with more than 4 million records (more than 1.5 million journal papers and more than 2 million conference and workshop papers) in 2018 (source: https://dblp.org/statistics/recordsindblp.html).

In order to study the historical evolution of Artificial Intelligence conference closeness, we have adapted this tool by adding a temporal window selection and by focusing on Artificial Intelligence publication supports. The adapted tool can be found at http://www.coscinus.org/ai. In this paper we sum up and analyse some observations that can be made using it.

In Section 2 we describe the coscinus methodology: how the closeness measure is defined, and how it is used for the principal component analysis and for the clustering. In Section 3 we study the closeness of the IJCAI conference with conferences from other domains, in order to get a global view of this closeness with other main computer science conferences. In Section 4 we perform a similar analysis focusing only on IJCAI-related AI conferences.
2 Coscinus

The data used in the coscinus project (http://www.coscinus.org) come from two sources:

- the DBLP database (https://dblp.uni-trier.de)
- the CORE ranking (http://www.core.edu.au/conference-portal)

The CORE ranking classifies publication supports (conferences and journals) into several categories. In coscinus we restrict ourselves to publication supports in categories A*, A, B and C. Here is the meaning of the categories as summed up at http://www.core.edu.au/conference-portal:

- A* - flagship conference, a leading venue in a discipline area
- A - excellent conference, and highly respected in a discipline area
- B - good conference, and well regarded in a discipline area
- C - other ranked conference venues that meet minimum standards

For replicability, the data used for the experiments described in this paper are available at http://coscinus.org/data. The version of the DBLP database used for these experiments is the one of 22 May 2018. We use the latest versions of the CORE ranking, i.e. CORE 2018 for conferences and CORE 2010 for journals. We also provide the CSV file, used for coscinus and for the reported experiments, that links the publication support entries from DBLP to the ones from the CORE ranking. Since there is no standardized naming of publication support keys (they are generally based on the acronym of the support, but not always, and with possible variations), since some conference names changed during their history, and since DBLP puts some conferences under the same key, some publication supports are missing because of a mismatch of entry keys between DBLP and CORE. For the data used in the coscinus project and in this paper, we have:

- 1 941 999 papers (CORE A*: 339 691 / A: 583 255 / B: 647 668 / C: 371 385)
- 1 172 474 authors
- 925 conferences (CORE A*: 63 / A: 201 / B: 308 / C: 353)
- 480 journals (CORE A*: 51 / A: 112 / B: 137 / C: 180)

Now let us turn to the way we exploit these data. The fundamental idea is that researchers in a field tend to publish in the same conferences/journals, and that most of these researchers have at most one or two fields of competence. Hence, it is possible to draw connections between conferences/journals by looking at how many researchers have papers in both of them. So we first build a matrix of the number of publications of each author in each support (conference/journal). Then, we build a [supports × supports] matrix, where we count the number of authors that publish in both publication supports. To avoid too much noise we use a threshold to decide which authors are counted. It may happen that an author published once in a publication support that is not in his field. While this may be significant, there is also a fair chance that this publication is an "anomaly" that is not related at all to the mainstream work of this author. This phenomenon induces some noise in the data. As an example, suppose that an author has 9 papers in conference A, 7 in conference B, 4 in conference C, and 1 in conference D. While we can suppose that conferences A, B and C are close, the conclusion that D is related to these three conferences is much weaker. The MC (Minimal Count) threshold n works as follows: we add 1 to the matrix entry [support A, support B] for each author that has published at least n papers in support A and at least n papers in support B.
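To make this counting step concrete, here is a minimal Python sketch, assuming a simple in-memory layout (a mapping from each author to per-support paper counts); the data layout and all names are our illustrative assumptions, not the actual coscinus implementation, which is not described here.

```python
from collections import defaultdict
from itertools import combinations

def counting_matrix(papers_per_author, mc=3):
    """Build the [supports x supports] counting matrix.

    papers_per_author: dict mapping an author id to a dict
    {support_key: number of papers of this author in this support}.
    mc: the Minimal Count threshold n. An author contributes 1 to
    [A, B] only if she has at least n papers in A and at least n in B.
    """
    counts = defaultdict(int)
    for supports in papers_per_author.values():
        # supports where this author passes the MC threshold
        kept = sorted(s for s, k in supports.items() if k >= mc)
        for a, b in combinations(kept, 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
        for s in kept:
            counts[(s, s)] += 1  # diagonal: authors with >= mc papers in s
    return counts

# toy example: one author with 9/7/4/1 papers in A/B/C/D (as in the text)
authors = {"author1": {"A": 9, "B": 7, "C": 4, "D": 1}}
m = counting_matrix(authors, mc=3)
print(m[("A", "B")])          # 1: the author passes the threshold for A and B
print(m.get(("A", "D"), 0))   # 0: a single paper in D stays below MC=3
```

With MC = 3, the single paper in conference D of the example above is simply ignored, which is exactly the intended noise filtering.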
Choosing high values for n allows us to focus on the most prolific authors only, and reduces the noise produced by anomalies. The obtained counting matrix is the basic information we use as closeness measure between publication supports. There is still one issue: some publication supports publish many more papers than others, so this matrix is biased by these differences in the total numbers of accepted papers. In order to fix this issue, we use a normalization step, which turns the first (counting) matrix into a second (proportion) matrix. After this normalization step, we obtain as final information the proportion of authors of a publication support that also published in another one. There are three possible standard normalizations, since to normalize the entry [support1, support2] one can normalize on the total number of papers of support1 (nsupport1 = [support1, support1]) or on the total number of papers of support2 (nsupport2 = [support2, support2]):

- min: [support1, support2] / min(nsupport1, nsupport2)
- max: [support1, support2] / max(nsupport1, nsupport2)
- avg: (min + max) / 2

Let us illustrate this process on a small part of the matrix, with some AI conferences. The first matrix, the counting matrix, is given in Table 1. It is obtained from the real coscinus data, using MC = 3, i.e. each unit in a matrix cell [support A, support B] corresponds to an author who published at least 3 papers in support A and at least 3 papers in support B. So one can see that 1826 authors published at least 3 papers at IJCAI. Among them, 935 also published at least 3 papers at AAAI, whereas only 54 published at least 3 papers at JELIA. We then give the corresponding min-normalized matrix in Table 2. One of these three matrices (which we will call nmatrix), depending on the chosen normalization, is used as input of both the principal component analysis and the clustering methods.

We perform a PCA (Principal Component Analysis) on the nmatrix matrix. PCA is a classical statistical method used to reduce the number of dimensions of a set of data [Pearson, 1901; Jackson, 2003; Jolliffe, 2002]. We consider our matrix as giving a profile of each publication support in a space of 1405 dimensions (the total number of publication supports), and we project these publication supports onto the 2-dimensional space chosen by the PCA method (the aim of PCA being to find the best choice for a reduced number of dimensions). This result is computed from the nmatrix for different values of MC (Minimal Count threshold), for the three normalization functions (min, max, avg), and for different levels of publication supports according to the CORE ranking classification (A*, A*+A, A*+A+B, A*+A+B+C). It is also possible, for example, to display only A*, A*+A or A*+A+B publication supports when the nmatrix is based on A*+A+B+C publication supports.

We also perform clustering on the nmatrix matrix in order to identify clusters of publication supports. We propose two clustering methods. The first one is the k-means method, one of the main clustering methods [Macqueen, 1967] (it is not hierarchical, so clusters can be rather different when we increase the number of clusters). The second one is an agglomerative clustering method, meaning that when we increase the number of clusters by one, one cluster is split in two. More precisely, we use the average-link clustering method [Ward, 1963; Rokach and Maimon, 2005].
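As a rough sketch of this pipeline, under the assumption of a dense numpy counting matrix (again, coscinus internals are not described here), the three normalizations and the PCA + k-means steps could look as follows, using scikit-learn for PCA and k-means; all function names are ours:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def normalize(counts, mode="min"):
    """Turn the counting matrix into a proportion matrix (nmatrix).

    counts: square numpy array; counts[i, i] is n_support_i, the number
    of authors with at least MC papers in support i. Assumes a nonzero
    diagonal (every kept support has at least one author above MC).
    """
    n = np.diag(counts).astype(float)      # n_support_i for every support
    mn = counts / np.minimum.outer(n, n)   # min normalization
    mx = counts / np.maximum.outer(n, n)   # max normalization
    if mode == "min":
        return mn
    if mode == "max":
        return mx
    return (mn + mx) / 2                   # avg normalization

def landscape(nmatrix, n_clusters=12, seed=0):
    """2D PCA projection and k-means clusters of the supports."""
    points = PCA(n_components=2).fit_transform(nmatrix)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(nmatrix)
    return points, labels
```

The agglomerative variant could be obtained in the same way with scikit-learn's AgglomerativeClustering(n_clusters=k, linkage="average").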
The coscinus project allows computing the results for conferences only, for journals only, or for both. From now on, in this study we focus on conferences. In the following we also use the min-normalization and the k-means clustering method.

Table 1: Counting Matrix

Table 2: Min-normalized Matrix

Let us illustrate what kind of information and visualization we can obtain from the general coscinus website, before focusing on AI conferences and on the historical study.

Figure 1: A* conferences (MC=3, norm=min, 12 clusters)

Figure 1 shows the 2-dimensional representation obtained by PCA (with MC=3, normalization=min), and the clusters obtained by k-means (for 12 clusters), for the A* conferences in the system (63 conferences). One can note that the IJCAI cluster is composed of IJCAI, ICAPS, AAAI, AAMAS and KR, and that there is a machine learning cluster composed of NIPS, ICML, COLT, UAI and RSS. One can also note a clear Artificial Intelligence region (these two clusters); the other distinct region is the Database + Web one, which lies clearly outside the main region containing all other A* conferences.

These results are obtained by taking only the 63 A* conferences into account. We can obtain more detailed results by additionally considering A conferences. Figure 2 shows (with the same parameters) the A+A* conferences in the system (264 conferences). Figure 3 is a zoom on the IJCAI area, which gives a better visualization of the AI-related part.

Figure 2: A+A* conferences (MC=3, norm=min, 12 clusters)

Figure 3: A+A* conferences - IJCAI area

In this case the IJCAI cluster (still with 12 clusters for the k-means method) is composed of IJCAI, LPNMR, PKDD, JELIA, AISTATS, ICAPS, ICDM, AAAI, KDD, KR, NIPS, ICML, UAI, ECAI, CIKM, KCAP, WSDM, WWW, PAKDD, IDA, SIGIR, SDM and CVPR. So we obtain an unsurprising result, with, besides purely AI conferences, conferences on data mining, the web, and vision.

These results are obtained on the whole set of DBLP data. One interesting question is to study the historical evolution of conference closeness. This is what we address in the following (for a better visualization and more information, the reader can go to http://www.coscinus.org/ai).

3 IJCAI Closest A* Conferences

We use a temporal window selection in order to study the historical evolution of the closeness between conferences. We have chosen a 5-year window in order to have at least 2 editions of biennial conferences, and we identify each window by its central year: in the following, when we refer for instance to 2015, this means the 2013-2017 period.

With a small temporal selection we cannot use a large value for MC (Minimal Count). In the standard case (without temporal selection) we use MC=3 by default. Here, with 5-year windows, we select MC=1 in order to obtain enough data. One can consider that on a small temporal window the anomalies (i.e. a researcher publishing a paper in a conference outside her domain of expertise) will not be numerous, so the introduced noise will be small.

Let us illustrate in Table 3 the closeness between IJCAI and other A* conferences, by looking at the conferences that are in the same cluster as IJCAI for 5, 10, 20 and 30 clusters.
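Concretely, such a temporal selection amounts to restricting the records to the chosen window before the counting step. Here is a minimal sketch, assuming (year, author, support) records and the counting_matrix function of the earlier sketch; all_papers is a hypothetical record list:

```python
from collections import defaultdict

def window_filter(papers, center, half_width=2):
    """Keep the papers of a (2 * half_width + 1)-year window.

    papers: iterable of (year, author, support) records.
    center: central year of the window; e.g. 2015 selects 2013-2017.
    """
    lo, hi = center - half_width, center + half_width
    return [p for p in papers if lo <= p[0] <= hi]

def per_author(papers):
    """Group records into the {author: {support: count}} layout
    expected by counting_matrix from the earlier sketch."""
    out = defaultdict(lambda: defaultdict(int))
    for _, author, support in papers:
        out[author][support] += 1
    return out

# e.g. the "2015" window of the paper, counted with MC=1:
# m = counting_matrix(per_author(window_filter(all_papers, 2015)), mc=1)
```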
For 20 and 30 clusters, the conferences that appear most often are AAAI and KR, and the ones that appear more than once are AAAI, KR, ICAPS, AAMAS and UAI, so they can be considered the closest conferences to IJCAI. But one cannot really see a temporal evolution on this set (still, one can notice that in 2015 LICS and ISSAC appear with 20 clusters).

For 10 clusters, one can see mostly AI-related conferences. This means that the contours of AI form a clear boundary. There are often discussions about what AI is, with a common saying that "AI is everywhere", i.e. that AI does not exist per se, and that it is closely intertwined with other communities. But this shows that, as a publishing community, AI has a clear contour. This singularity can also be seen directly on the PCA result, since the AI conferences are grouped in one of the branches of the obtained representation.

For 5 and 10 clusters we can see conferences from other communities. In 1980 this is quite diverse, since a conference on information systems (ICIS), one on theoretical computer science (ISSAC), and one on cryptography (EUROCRYPT) appear. In 1985 the only non-AI conferences are theoretical computer science conferences (LICS, POPL) and a distributed computing one (PODC); this may be related to the development of multi-agent systems and of object/actor languages. In 1990 a verification conference (CAV) and an information retrieval one (SIGIR) appear. In 1995 there are visualisation/vision/interface conferences (INFOVIS, ICCV, CHI, ISWC). Then, from 2000 on, one can observe a convergence towards data mining and web conferences (KDD, SIGIR, WWW, WSDM) that lasts until now.

4 IJCAI-related AI Conferences

In this section we focus on a selection of AI conferences: a set of A* and A conferences from the CORE ranking. The 33 conferences we consider are the following ones: AIML, FUZZIEEE, NIPS, ICAPS, PKDD, AAAI, IJCAI, UAI, ICONIP, ECAI, EMNLP, JELIA, TARK, ACL, TABLEAUX, CIKM, AISTATS, ICLP, ICDM, RSS, ICML, LPAR, ISRR, CP, KR, AAMAS, ESWS, IJCNN, CADE, COLT, SAT, LPNMR, FOGA.

The idea here is to try to identify AI sub-communities and their historical evolution. So in Table 4 we give all the clusters of these 33 conferences, using the k-means clustering method with 20 clusters and with 30 clusters. The cluster computation is made on all computer science A* and A conferences (239 conferences), and we then focus only on these 33 conferences and their clusters. This explains why fewer than 20 or 30 clusters appear in the 20/30 clusters columns: some clusters do not contain any of these 33 conferences.

A global comment on these results is that they illustrate the real singularity of artificial intelligence within the computer science community, since the clusters containing the main AI conferences are mostly composed of AI conferences. Let us first try to find conferences that are most of the time in the same cluster, which would suggest stable sub-communities.
Table 3: A* conferences in the same cluster as IJCAI (k-means clustering), for each 5-year window (year ±2), with 5, 10, 20 and 30 clusters.

2015: ACL WSDM ICAPS ICDM AAAI WWW KDD SIGIR KR JCDL MM UAI ICCV AAMAS ICAPS AAAI KR AAMAS ISSAC KR CADE LICS ICAPS AAAI KR AAMAS
2010: ACL WSDM ICAPS ICDM AAAI WWW KDD SIGIR KR NIPS ICML COLT UAI AAMAS ACL WSDM ICAPS ICDM AAAI WWW KDD SIGIR KR FOGA NIPS ICML COLT UAI AAMAS ICAPS AAAI AAAI KR
2005: ICAPS AAAI KDD KR NIPS ICML COLT UAI RSS ICAPS AAAI KR NIPS ICML COLT UAI ICCV AAMAS AAAI KR ICAPS UAI
2000: ICAPS AAAI KDD SIGIR KR NIPS ICML COLT UAI RTSS CAV ICDM AAAI WWW KR ICML CADE MM UAI AAMAS ICAPS AAAI UAI AAMAS
1995: ACL KR CHI INFOVIS CADE ICCV AAMAS ISWC ICAPS AAAI KDD KR FOGA NIPS ICML COLT UAI ICAPS AAAI KR AAMAS
1990: ACL CAV ICAPS AAAI SIGIR KR ICML CADE COLT UAI ACL ICML CADE UAI AAAI KR ICML AAAI KR
1985: ACL CADE LICS ACL PODC POPL AAAI STOC FOCS CADE LICS
1980: AAAI ISSAC ICIS CADE EUROCRYPT

We can identify a first big group composed of AAAI, IJCAI, ECAI, KR and AAMAS. One can also identify a machine learning group (NIPS, COLT, ICML, AISTATS, UAI), a natural language one (ACL, EMNLP), an automated reasoning one (LPAR, CADE, TABLEAUX, LPNMR, SAT), and lastly a data mining one (PKDD, CIKM, ICDM), though this last one seems less strongly connected than the others. One can note that the neural network conferences IJCNN and ICONIP are not closely connected to NIPS. One can also note that the robotics conference group (RSS, ISRR) is often connected to machine learning and neural network conferences, but not to the planning and scheduling one (AIPS). It is also interesting to note that in the 30 clusters column, ICLP, CP and JELIA appear sometimes with the IJCAI group (AAAI, IJCAI, ECAI, KR, AAMAS) and sometimes in the automated reasoning group (LPAR, CADE, TABLEAUX, LPNMR, SAT), so they can be interpreted as being at the articulation between these two sub-communities (this articulation is clear in the 20 clusters column, where these two groups are usually merged into a single one). It could be interesting to examine whether some thematic changes could explain the transitions from one group to the other.

We do not have space to include here the corresponding landscapes obtained with PCA, but the reader can explore them at http://www.coscinus.org/ai.

5 Conclusion

We described the coscinus project, which exploits DBLP data in order to compute the proximity between publication supports, and allows studying the contours of computer science communities. For this paper we added a temporal window selection feature and the possibility to focus on AI conferences only. We think this tool can prove useful for studying the evolution of computer science in general, and of artificial intelligence in particular. We made a very basic use of it in this paper, but we believe it could prove much more useful when combined with other observations.

Acknowledgments

This work has been partly supported by the project CPER DATA from the Hauts-de-France Region.

References

[Jackson, 2003] J. Edward Jackson. A User's Guide to Principal Components. Wiley Series in Probability and Statistics. Wiley-Interscience, 2003.

[Jolliffe, 2002] Ian Jolliffe. Principal Component Analysis. Springer Verlag, 2002.

[Macqueen, 1967] James Macqueen. Some methods for classification and analysis of multivariate observations.
In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297, 1967.

[Pearson, 1901] Karl Pearson. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(6):559–572, 1901.

[Rokach and Maimon, 2005] Lior Rokach and Oded Maimon. Clustering methods. In Oded Maimon and Lior Rokach, editors, The Data Mining and Knowledge Discovery Handbook, pages 321–352. Springer, 2005.

[Ward, 1963] Joe H. Ward. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.

Table 4: Clusters of A and A* AI conferences (k-means clustering), for each 5-year window (year ±2), with 20 and with 30 clusters.

2015, 20 clusters: ACL LPNMR JELIA AISTATS ICAPS AAAI TARK KR CP ICLP UAI ECAI EMNLP AAMAS IJCAI NIPS ICML COLT RSS ISRR CIKM ICDM FOGA AIML LPAR CADE SAT TABLEAUX PKDD ESWS FUZZIEEE IJCNN ICONIP
2015, 30 clusters: AISTATS NIPS ICML COLT UAI RSS ISRR CP LPAR CADE SAT TABLEAUX AIML FOGA LPNMR JELIA ICAPS AAAI TARK KR ICLP ECAI AAMAS IJCAI ACL PKDD ICDM CIKM EMNLP IJCNN ESWS ICONIP FUZZIEEE
2010, 20 clusters: LPNMR JELIA CP LPAR AAAI KR ICLP SAT TABLEAUX ECAI IJCAI AIML IJCNN ICONIP FOGA FUZZIEEE CADE ICDM ESWS ICAPS TARK AAMAS ACL EMNLP CIKM RSS PKDD NIPS ICML ISRR COLT UAI
2010, 30 clusters: FOGA PKDD ICDM NIPS ICML COLT UAI ACL EMNLP RSS ISRR CIKM SAT CADE ICAPS AAAI TARK ECAI AAMAS IJCAI LPNMR JELIA CP LPAR KR ICLP TABLEAUX IJCNN ESWS ICONIP FUZZIEEE AIML
2005, 20 clusters: PKDD AISTATS ICDM NIPS ICML COLT UAI LPNMR JELIA CP LPAR ICAPS AAAI TARK KR ICLP TABLEAUX ECAI AAMAS IJCAI AIML RSS IJCNN ICONIP ISRR SAT FUZZIEEE FOGA ESWS ACL EMNLP CIKM CADE
2005, 30 clusters: AISTATS NIPS ICML UAI CIKM FOGA ACL EMNLP COLT LPAR CADE SAT TABLEAUX PKDD ICDM RSS ISRR IJCNN ESWS FUZZIEEE ICONIP LPNMR JELIA CP ICAPS AAAI TARK KR ICLP ECAI AAMAS IJCAI AIML
2000, 20 clusters: CIKM LPAR CADE ACL EMNLP ISRR LPNMR JELIA CP ICAPS AAAI KR ICLP TABLEAUX ECAI AAMAS IJCAI PKDD IJCNN AISTATS ICDM NIPS ICML COLT UAI ICONIP TARK FOGA FUZZIEEE AIML
2000, 30 clusters: CIKM LPAR CADE TABLEAUX AIML FOGA ICAPS AAAI KR ECAI IJCAI CP TARK AAMAS LPNMR JELIA ICLP ACL EMNLP ICONIP ISRR FUZZIEEE IJCNN PKDD AISTATS ICDM NIPS ICML COLT UAI
1995, 20 clusters: LPNMR ILPS LPAR ICLP CADE TABLEAUX JELIA ICAPS AAAI KR ECAI AAMAS IJCAI AISTATS TARK ICML UAI ACL EMNLP PKDD CP FOGA NIPS COLT CIKM
1995, 30 clusters: LPNMR JELIA ILPS LPAR ICLP CADE TABLEAUX ICAPS AAAI TARK KR UAI ECAI AAMAS IJCAI CIKM COLT CP ACL EMNLP FOGA NIPS AISTATS ICML PKDD
1990, 20 clusters: LPNMR JELIA ILPS LPAR KR ICLP CADE TABLEAUX ECAI IJCAI CIKM FOGA TARK ACL AAAI ICAPS IJCNN NIPS ICML COLT UAI
1990, 30 clusters: LPNMR JELIA AAAI TARK KR ECAI IJCAI CIKM IJCNN NIPS COLT FOGA ICML UAI ACL ILPS LPAR ICLP CADE TABLEAUX
1985, 20 clusters: ACL AAAI UAI ILPS TARK ICLP CADE NIPS ECAI
1985, 30 clusters: ILPS CADE UAI AAAI IJCAI ICLP NIPS
1980, 20 clusters: AAAI IJCAI ECAI ACL CADE ICLP
1980, 30 clusters: AAAI IJCAI ECAI ACL CADE ICLP