Global Optimization of K-Center Clustering

Mingfei Shi*¹, Kaixun Hua*¹, Jiayang Ren¹, Yankai Cao¹

*Equal contribution. ¹Department of Chemical and Biological Engineering, University of British Columbia, Vancouver, British Columbia, Canada. Correspondence to: Yankai Cao.

Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

Abstract

The k-center problem is a well-known clustering method and can be formulated as a mixed-integer nonlinear programming problem. This work provides a practical global optimization algorithm for this task based on a reduced-space spatial branch and bound scheme. This algorithm can guarantee convergence to the global optimum by only branching on the centers of clusters, which is independent of the dataset's cardinality. In addition, a set of feasibility-based bounds tightening techniques are proposed to narrow down the domain of centers and significantly accelerate the convergence. To demonstrate the capacity of this algorithm, we present computational results on 32 datasets. Notably, for the dataset with 14 million samples and 3 features, the serial implementation of the algorithm can converge to an optimality gap of 0.1% within 2 hours. Compared with a heuristic method, the global optimum obtained by our algorithm can reduce the objective function on average by 30.4%.

1. Introduction

Clustering is a fundamental unsupervised machine learning task that plays a vital role in various fields of application, such as customer grouping (Aggarwal et al., 2004), data summarization (Kleindessner et al., 2019; Hesabi et al., 2015), and facility location determination (Hansen et al., 2009). Clustering aims to aggregate similar data into one cluster and separate dissimilar data into different clusters (Pan et al., 2013). Cluster analysis can be formulated as an optimization problem, and various objective functions lead to different algorithms for finding clusters (Madhulatha, 2012).

This paper focuses on one of the most fundamental centroid-based clustering problems, the k-center problem. It picks a subset of k samples as centers to represent k clusters, and each sample is assigned to its closest center to form a cluster. The distance from a sample to its closest center is called the within-cluster distance. The objective of the k-center problem is to minimize the maximum within-cluster distance of the dataset (Kaufman & Rousseeuw, 2009). Since the problem is NP-hard (Gonzalez, 1985), no algorithm can attain an optimal solution in polynomial time unless P = NP (Garey & Johnson, 1979). Therefore, many heuristic approximation algorithms have been developed to obtain a near-optimal solution quickly. The 2-approximation greedy algorithm is one of the most effective heuristics for the k-center problem (Gonzalez, 1985): it starts from a randomly selected center, and each new center is chosen as the point farthest from the previously selected centers. Alpert & Kahng (1997) applied a complete-linkage-based algorithm that can outperform traditional algorithms with proper ordering heuristics for samples; however, the experiments in (Aloise & Contardo, 2018) showed that it seldom finds optimal solutions. Parametric pruning (Hochbaum & Shmoys, 1985) recast the k-center problem as finding a minimum dominating set in a pruned graph and developed corresponding heuristics to solve the dominating set problems.
An experimental comparison of these heuristic algorithms indicated that the 2-approximation greedy algorithm is the fastest in practice (Mihelič & Robič, 2005). Nevertheless, none of the above algorithms can guarantee a globally optimal solution for the k-center problem.

Exact algorithms that attempt to solve the k-center problem to global optimality follow several directions. Early seminal works focus on reducing the search space for the optimal solution. For example, Daskin (2000) proposed an iterative algorithm that solves the k-center problem by performing a binary search over possible solution values, solving a set-covering problem in each iteration. Elloumi et al. (2004) leveraged the binary search scheme, designed a new integer linear programming formulation of the k-center problem, and generated tighter lower bounds than pure LP relaxations; it was the first algorithm that could solve datasets of up to 1817 samples. Another direction toward the global optimum adopts a constraint programming framework: Duong et al. (2017) proposed a general constraint clustering algorithm that can solve datasets with up to 5000 samples in 50 seconds. Calik & Tansel (2013) introduced a block of covering constraints for the formulation of the k-center problem and updated the lower and upper bounds in the same iteration; however, their algorithm does not perform well on large datasets (Contardo et al., 2019).

Recently, the strategy of iteratively solving reduced subproblems has been proposed to solve the k-center problem to global optimality on large datasets. Aloise & Contardo (2018) presented a sampling-based exact algorithm, which alternates between an exact procedure on a small sample of points and a heuristic procedure to test the optimality of the current solution. Their computational experiments show that this algorithm can obtain a reasonable solution for a dataset containing 581,012 observations within 4 hours; however, this work does not report the optimality gap, an important index for evaluating solution quality. Chen & Handler (1987) and Chen & Chen (2009) proposed iterative algorithms based on relaxations that only consider a small subset of the data samples, with heuristics to iteratively update the subset so as to approach the global optimum. Moreover, Contardo et al. (2019) designed a row generation algorithm that iteratively solves a smaller subproblem, and reported the solution of a dataset with 1 million samples to a gap of 6% within 9 hours. However, none of these methods provides a guarantee that the algorithm will converge to an arbitrarily small gap within a finite number of steps. Therefore, these methods often leave a nontrivial optimality gap, especially on large datasets.

In this paper, we propose a different approach to solve the k-center problem on extremely large datasets, by adopting a reduced-space BB scheme recently proposed for two-stage stochastic programming problems (Cao & Zavala, 2019). This reduced-space BB scheme has already been successfully applied to the k-means clustering problem (Hua et al., 2021), in which it is combined with Lagrangian decomposition to guarantee convergence to global ϵ-optimal solutions. That method can solve the k-means problem on a dataset with 210,000 samples (200 cores, 2.6% optimality gap within a runtime of 4 hours).
Because the objective of k-center is to minimize the maximum within-cluster distance, instead of the average within-cluster distance, we cannot adopt the Lagrangian decomposition approach to compute the lower bound. Also, because of the "centers on samples" constraint in the k-center problem (k-means clustering does not have this constraint), the direct application of the algorithm of Hua et al. (2021) would lead to infeasible solutions. To address these challenges, we propose a tailored reduced-space BB clustering algorithm for the k-center problem and design several feasibility-based bounds tightening (FBBT) methods to efficiently reduce the search space of our BB algorithm. The novelty of our BB clustering algorithm is that, by only branching on the space of centers, we can assure convergence to the exact global optimum within a finite number of steps for the task of k-center clustering. Note that a classic BB algorithm is unable to solve the k-center problem with large datasets because it needs to branch on all integer variables, the dimension of which scales with the number of samples. We highlight our contributions as follows:

- We design an exact global optimization algorithm based on a reduced-space spatial branch and bound scheme for the k-center clustering problem.
- We propose a decomposable approach to compute lower bounds. The lower bound can be derived analytically in closed form, so the whole algorithm does not need to rely on solvers to solve any optimization sub-problems.
- We prove that our algorithm is guaranteed to converge to the global optimum by only branching on the centers of clusters.
- We design several bounds tightening techniques to significantly reduce the search space of the BB algorithm and accelerate the solution process. We demonstrate that the assignment of clusters can be determined for many samples without knowing the optimal solution. Bounds tightening coupled with the decomposable lower bounding techniques makes our algorithm extremely scalable.
- We provide an open-source Julia implementation of the algorithm and perform extensive computational experiments on 32 datasets. Our algorithm is significantly faster than off-the-shelf global optimization solvers and provides much better solutions than heuristic approaches. Notably, for the dataset with 14 million samples and 3 features, the serial implementation of the algorithm can obtain the global solution to an optimality gap of 0.1% within 2 hours.

2. k-center Formulation

To formulate the k-center problem, consider a dataset with $S$ samples and $A$ attributes, denoted as $X = \{x_1, \ldots, x_S\}$, in which $x_i = [x_{i,1}, \ldots, x_{i,A}] \in \mathbb{R}^A$ is the $i$th sample and $x_{i,a}$ is the $a$th attribute of the $i$th sample. The k-center problem can be defined as:

$$\min_{\mu \subset X} \max_{s \in \mathcal{S}} \min_{k \in \mathcal{K}} \|x_s - \mu^k\|_2^2 \qquad (1)$$

where $s \in \mathcal{S} := \{1, \ldots, S\}$ indexes the data sample set, $k \in \mathcal{K} := \{1, \ldots, K\}$ indexes the cluster set, and $\mu := [\mu^1, \ldots, \mu^K]$ represents the centers of the clusters; $\mu$ are the variables to be determined. We use $\mu \subset X$ to denote the "centers on samples" constraint, i.e., that the center of each cluster is restricted to the data samples.

Problem (1) can be reformulated as a problem of the form:

$$z = \min_{\mu \in M_0 \cap X} \max_{s \in \mathcal{S}} Q_s(\mu) \qquad (2)$$

where $Q_s(\mu) = \min_{k \in \mathcal{K}} \|x_s - \mu^k\|_2^2$. We denote the closed set $M_0 := \{\mu \mid \underline{\mu} \le \mu \le \overline{\mu}\}$ in Equation (2) as the domain of centers, where $\underline{\mu}$ and $\overline{\mu}$ are the lower and upper bounds of the centers, i.e., $\underline{\mu}^k_a = \min_{s \in \mathcal{S}} x_{s,a}$ and $\overline{\mu}^k_a = \max_{s \in \mathcal{S}} x_{s,a}$, $\forall k \in \mathcal{K},\ a \in \{1, \ldots, A\}$.
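To make the formulation concrete, here is a minimal Julia sketch that evaluates $Q_s(\mu)$ and the objective of Problem (2) for fixed centers; the toy dataset and the choice of center indices are illustrative assumptions, not from the paper.

```julia
# Minimal sketch: evaluate the k-center objective of Problem (2) for fixed centers.
using LinearAlgebra

# Q_s(mu): squared distance from sample x_s to its closest center (columns of mu).
Q(x_s, mu) = minimum(norm(x_s - mu[:, k])^2 for k in axes(mu, 2))

# Objective of Problem (2): maximum within-cluster (squared) distance over samples.
kcenter_objective(X, mu) = maximum(Q(X[:, s], mu) for s in axes(X, 2))

# Toy example (assumed data): 5 two-dimensional samples; the "centers on samples"
# constraint is respected by picking the centers among the columns of X.
X  = [0.0 1.0 0.5 5.0 6.0;
      0.0 0.0 1.0 5.0 5.0]
mu = X[:, [3, 4]]            # K = 2 candidate centers chosen on samples 3 and 4
@show kcenter_objective(X, mu)
```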
Here the constraint $\mu \in M_0$ can be inferred directly from the data and thus does not affect the optimal solution; it is introduced only to simplify the discussion of the BB framework. Since $\mu \in M_0 \cap X$, which is a finite set, the value of $z$ in Problem (2) is attainable. This formulation will be utilized in Section 3.1 to introduce the lower bounding problem in the branch and bound scheme.

Alternatively, the k-center problem can also be represented as a standard optimization problem in the following extensive form:

$\min_{\mu, d, b, \lambda}\ d$ (3a)
s.t. $d_s^k \ge \|x_s - \mu^k\|_2^2$ (3b)
$-N_1(1 - b_s^k) \le d_s - d_s^k \le 0$ (3c)
$d \ge d_s$ (3d)
$\sum_{k \in \mathcal{K}} b_s^k = 1$ (3e)
$b_s^k \in \{0, 1\}$ (3f)
$-N_2(1 - \lambda_s^k) \le x_s - \mu^k \le N_2(1 - \lambda_s^k)$ (3g)
$\sum_{s \in \mathcal{S}} \lambda_s^k = 1$ (3h)
$\lambda_s^k \in \{0, 1\}$ (3i)
$b_s^k \ge \lambda_s^k$ (3j)
$\forall s \in \mathcal{S},\ k \in \mathcal{K}$ (3k)

where $d_s^k$ represents the distance between sample $x_s$ and center $\mu^k$, $d_s$ denotes the distance between $x_s$ and the center of its cluster, and $N_1$ and $N_2$ are both sufficiently large constants. $b_s^k$ and $\lambda_s^k$ are two binary variables: $b_s^k$ is equal to 1 if sample $x_s$ belongs to the $k$th cluster, and 0 otherwise; $\lambda_s^k$ is equal to 1 if $\mu^k$, the center of the $k$th cluster, is placed on $x_s$, and 0 otherwise. Constraint (3c) is a big-M formulation that ensures $d_s = d_s^k$ if $b_s^k = 1$ and $d_s \le d_s^k$ otherwise, and Constraint (3e) guarantees that sample $x_s$ belongs to exactly one cluster. We adopt Constraints (3g), (3h) and (3j) to represent $\mu \subset X$, the restriction that each center is selected among the data samples. Specifically, Constraint (3g) uses a big-M formulation to make sure that $\mu^k = x_s$ if $\lambda_s^k = 1$, and Constraint (3h) ensures that each center is selected on exactly one sample. Constraint (3j) ensures that if $x_s$ is the center of the $k$th cluster, then it is assigned to the $k$th cluster. Problem (3) is a convex mixed-integer nonlinear problem and can be solved by optimization solvers such as CPLEX (Cplex, 2020) or Gurobi (Gurobi Optimization, 2020). However, within a certain time limit (e.g., 4 hours), these all-purpose solvers can only deal with small datasets.

3. Reduced-space Branch and Bound Scheme

In this section, we introduce a reduced-space BB algorithm for the k-center problem with tailored lower bounding and upper bounding methods.

3.1. Lower Bounds

At each node of a BB algorithm, we deal with a subset of $M_0$, denoted as $M$, and solve the following problem with respect to $M$:

$$z(M) = \min_{\mu \in M \cap X} \max_{s \in \mathcal{S}} Q_s(\mu) \qquad (4)$$

This problem is referred to as the primal node problem. It can be equivalently reformulated by duplicating $\mu$ across samples and enforcing the copies to be equal, giving the lifted form:

$$\min_{\mu_s \in M \cap X} \max_{s \in \mathcal{S}} Q_s(\mu_s) \qquad \text{(5a)}$$
$$\text{s.t.}\ \mu_s = \mu_{s+1},\ s \in \{1, \ldots, S-1\} \qquad \text{(5b)}$$

By removing the "centers on samples" constraint $\mu \subset X$ and the Constraints (5b), we obtain the following lower bounding formulation:

$$\beta(M) := \min_{\mu_s \in M} \max_{s \in \mathcal{S}} Q_s(\mu_s). \qquad (6)$$

With these constraints relaxed, the feasible region of Problem (6) is a superset of the primal feasible region; therefore, $\beta(M) \le z(M)$. In Problem (6), each sample owns an independent copy $\mu_s$, so the minimization decouples across samples and the min and max operators can be interchanged:

$$\beta(M) = \max_{s \in \mathcal{S}} \min_{\mu_s \in M} Q_s(\mu_s). \qquad (7)$$

Clearly, this can be decomposed into $S$ subproblems of the form:

$$\beta_s(M) = \min_{\mu \in M} Q_s(\mu), \qquad (8)$$

with $\beta(M) = \max_{s \in \mathcal{S}} \beta_s(M)$. Denote $M^k := \{\mu^k : \underline{\mu}^k \le \mu^k \le \overline{\mu}^k\}$, where $\underline{\mu}^k$ and $\overline{\mu}^k$ are the lower and upper bounds of $\mu^k$, respectively. Since $Q_s(\mu) = \min_{k \in \mathcal{K}} \|x_s - \mu^k\|_2^2$, we have

$$\beta_s(M) = \min_{k \in \mathcal{K}} \min_{\mu^k \in M^k} \|x_s - \mu^k\|_2^2, \qquad (9)$$

which can be further decomposed into $K$ subsubproblems:

$$\beta_s^k(M^k) = \min_{\mu^k \in M^k} \|x_s - \mu^k\|_2^2, \qquad (10)$$

with $\beta_s(M) = \min_k \beta_s^k(M^k)$.
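To illustrate how cheap and decomposable this bound is, the following hedged Julia sketch computes $\beta(M)$ from Equations (7)-(10); the box bounds mu_lo and mu_hi (one column per cluster) are assumed inputs, and the inner minimizer uses the closed-form projection derived next (Julia's clamp is exactly the mid operator).

```julia
# Sketch of the decomposable lower bound beta(M) of Eqs. (7)-(10), assuming
# samples are the columns of X and (mu_lo, mu_hi) are A x K matrices defining M.
function lower_bound(X, mu_lo, mu_hi)
    A, S = size(X)
    K = size(mu_lo, 2)
    beta = -Inf
    for s in 1:S
        # beta_s(M) = min_k beta_s^k(M^k): clamp projects x_s onto the box M^k
        beta_s = minimum(
            sum((X[a, s] - clamp(X[a, s], mu_lo[a, k], mu_hi[a, k]))^2 for a in 1:A)
            for k in 1:K)
        beta = max(beta, beta_s)   # beta(M) = max_s beta_s(M), Eq. (7)
    end
    return beta
end
```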
It can be easily derived that the analytical solution to Problem (10) is $\mu^k_a = \operatorname{mid}\{\underline{\mu}^k_a,\ x_{s,a},\ \overline{\mu}^k_a\}$, $\forall a \in \{1, \ldots, A\}$; that is, the minimizer is simply the projection of $x_s$ onto the box $M^k$, taken attribute by attribute.

3.2. Upper Bounds

An upper bound of Problem (4) can be obtained by fixing the centers at a candidate feasible solution $\hat{\mu} \in M \cap X$. In this way, we can compute the upper bound from the following equation:

$$\alpha(M) = \max_{s \in \mathcal{S}} Q_s(\hat{\mu}). \qquad (11)$$

It is easy to see that $z(M) \le \alpha(M)$ for any $\hat{\mu} \in M \cap X$. Since the closed-form expression of $Q_s(\hat{\mu})$ is given, computing the upper bound for a given candidate solution is cheap, with no need to solve any optimization problem.

In our implementation, we use two methods to obtain candidate solutions. At the root node, we use a heuristic method called Farthest First Traversal to obtain a candidate solution $\hat{\mu} \in M_0 \cap X$: we first randomly pick an initial point and then select each following point to be as far as possible from the set of previously selected points. Algorithm 1 describes the details, where $d(x_s, T)$ represents the minimum distance from sample $x_s$ to any sample in set $T$. We use $\text{FFT}(M_0)$ to denote the upper bound obtained with this approach. At a child node with center region $M$, for each cluster $k$ we select the data sample closest to the middle point of $M^k$ as $\hat{\mu}^k$ and obtain the corresponding upper bound $\alpha(M)$.

Algorithm 1 Farthest First Traversal
  Initialization: randomly pick $s \in \mathcal{S}$; denote $T$ as the set of $K$ points selected by farthest first traversal; set $T \leftarrow \{x_s\}$;
  while $|T| < K$ do
    Compute $x_{s'} \in \arg\max_{x_s \in X} d(x_s, T)$, the sample farthest away from set $T$;
    $T \leftarrow T \cup \{x_{s'}\}$;
  end while
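A concrete companion to Algorithm 1: a minimal Julia sketch of farthest first traversal, assuming samples are stored as the columns of a matrix X; squared distances are used, which leaves the argmax unchanged.

```julia
# Sketch of Algorithm 1 (Farthest First Traversal): returns the indices of K seeds.
using LinearAlgebra, Random

function farthest_first_traversal(X, K; rng = Random.default_rng())
    S = size(X, 2)
    seeds = [rand(rng, 1:S)]                    # random initial point
    # d[s] tracks the squared distance d(x_s, T)^2 from x_s to the selected set T
    d = [norm(X[:, s] - X[:, seeds[1]])^2 for s in 1:S]
    while length(seeds) < K
        s_far = argmax(d)                       # farthest sample from T
        push!(seeds, s_far)
        for s in 1:S                            # update d(x_s, T) with the new seed
            d[s] = min(d[s], norm(X[:, s] - X[:, s_far])^2)
        end
    end
    return seeds
end
```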
3.3. Branch and Bound Scheme and Convergence Analysis

We tailor the reduced-space branch and bound scheme to solve Problem (2). The details of the algorithm are given in Algorithm 2, in which we denote $\operatorname{relint}(\cdot)$ as the relative interior of a set.

Algorithm 2 Branch and Bound Scheme
  Initialization: initialize the iteration index $i \leftarrow 0$; set $\mathcal{M} \leftarrow \{M_0\}$ and tolerance $\epsilon > 0$; compute initial upper and lower bounds $\alpha_i = \text{FFT}(M_0)$, $\beta_i = \beta(M_0)$;
  while $\mathcal{M} \neq \emptyset$ do
    Node Selection: select a set $M$ satisfying $\beta(M) = \beta_i$ from $\mathcal{M}$ and delete it from $\mathcal{M}$; update $i \leftarrow i + 1$;
    Branching: find two subsets $M_1$ and $M_2$ s.t. $\operatorname{relint}(M_1) \cap \operatorname{relint}(M_2) = \emptyset$ and $M_1 \cup M_2 = M$;
      update $\mathcal{M} \leftarrow \mathcal{M} \cup \{M_1\}$ if $M_1^k \cap X \neq \emptyset,\ \forall k \in \mathcal{K}$;
      update $\mathcal{M} \leftarrow \mathcal{M} \cup \{M_2\}$ if $M_2^k \cap X \neq \emptyset,\ \forall k \in \mathcal{K}$;
    Bounding: compute upper and lower bounds $\alpha(M_1), \beta(M_1), \alpha(M_2), \beta(M_2)$;
      let $\beta_i \leftarrow \min\{\beta(M') \mid M' \in \mathcal{M}\}$ and $\alpha_i \leftarrow \min\{\alpha_{i-1}, \alpha(M_1), \alpha(M_2)\}$;
      remove every $M'$ from $\mathcal{M}$ with $\beta(M') \ge \alpha_i$;
    if $\alpha_i - \beta_i \le \epsilon$, STOP;
  end while

We can also establish the convergence of the branch-and-bound scheme in Algorithm 2. Along the BB process, it generates a monotonically nonincreasing sequence $\{\alpha_i\}$ and a monotonically nondecreasing sequence $\{\beta_i\}$, and we can show that both converge to $z$ in a finite number of steps.

Lemma 3.1. Algorithm 2 terminates in a finite number of steps.

The proof becomes obvious by noticing that the number of feasible solutions is finite because of the "centers on samples" constraint $\mu \subset X$. Since we add a partition $M$ to the node set $\mathcal{M}$ only if $M^k \cap X \neq \emptyset,\ \forall k \in \mathcal{K}$, the BB procedure will not generate any infinitely decreasing sequence of successively refined partition elements. In the worst case, a sequence of successively refined partition elements ends at a leaf node in which $|M^k \cap X| = 1,\ \forall k \in \mathcal{K}$. Since each leaf node corresponds to a feasible solution, the number of leaf nodes is finite, and so the total number of nodes visited by the BB procedure is finite. Moreover, at a leaf node $M$, it can be shown that $\beta(M) = \alpha(M)$. Therefore, the BB procedure terminates in a finite number of steps by only branching on $\mu$, with the lower and upper bounds converging to the optimal value.

Note that this convergence result for the k-center problem is stronger than the convergence analysis in (Cao & Zavala, 2019) for two-stage nonlinear optimization problems and the convergence proof in (Hua et al., 2021) for the k-means clustering problem. Both Cao & Zavala (2019) and Hua et al. (2021) guarantee convergence only in the sense of $\lim_{i \to \infty} \alpha_i = \lim_{i \to \infty} \beta_i = z$; in a finite number of steps, they can only produce a global $\epsilon$-optimal solution. For the k-center problem, in contrast, the algorithm can obtain an exact optimal solution (i.e., $\epsilon = 0$) in a finite number of steps, because the number of feasible solutions is finite.

4. Bounds Tightening Techniques

Although the lower bound introduced in Section 3.1 is enough to guarantee convergence, it might not be very tight, which can lead to a tremendous number of iterations. Therefore, we propose bounds tightening techniques to reduce the search space and speed up the BB procedure. Since Algorithm 2 only branches on the space of centers $\mu$, we focus on reducing the region of $\mu$ to accelerate the solution process while guaranteeing that the optimal solution of the problem is not excluded.

4.1. Cluster Assignment

In this subsection, we propose several strategies to decide the assignment of some samples (i.e., which cluster a sample is assigned to), that is, to determine the value of $b_s^k$ in Problem (3) for some $s$ and $k$ before finding the optimal solution of the whole problem. This information will be used in the next subsection to reduce the region of $\mu$.

Denote $\alpha$ as the current best upper bound on the optimal value achieved using the methods described in Section 3.2. Then from Objective (3a) and Constraint (3d) in Formulation (3), we have $d_s \le \alpha$. Based on Constraints (3b) and (3c), we can conclude that if $b_s^k = 1$, then $\|x_s - \mu^k\|_2^2 \le \alpha$. Therefore, we have Lemma 4.1.

Lemma 4.1. If sample $x_s$ is in the $k$th cluster, then $\|x_s - \mu^k\|_2^2 \le \alpha$, where $\alpha$ is an upper bound of the k-center problem.

Lemma 4.1 tells us that if $\|x_s - \mu^k\|_2^2 > \alpha$, then we can infer $b_s^k = 0$; that is, sample $x_s$ cannot be assigned to the $k$th cluster. Besides inferring from the distance between samples and centers, cluster assignments may also be determined from the distance between two data samples, as shown in the following Lemma 4.2.

Lemma 4.2. If two samples $x_i$ and $x_j$ are in the same cluster, then $\|x_i - x_j\|_2^2 \le 4\alpha$, where $\alpha$ is an upper bound of the k-center problem.

Lemma 4.2 follows by noticing that if $x_i$ and $x_j$ both belong to the $k$th cluster, then based on Lemma 4.1 we have $\|x_i - \mu^k\|_2^2 \le \alpha$ and $\|x_j - \mu^k\|_2^2 \le \alpha$. Thus, by the triangle inequality, $\|x_i - x_j\|_2^2 = \|x_i - \mu^k + \mu^k - x_j\|_2^2 \le (\|x_i - \mu^k\|_2 + \|\mu^k - x_j\|_2)^2 \le 4\alpha$. Lemma 4.2 tells us that if $\|x_i - x_j\|_2^2 > 4\alpha$, then $x_i$ and $x_j$ are not in the same cluster. Based on these two lemmas, the following are three approaches to assign samples to clusters.

4.1.1. K FARTHEST POINTS

By Lemma 4.2, if there are $K$ points such that every pair $x_i$ and $x_j$ among them satisfies $\|x_i - x_j\|_2^2 > 4\alpha$, then each of these points belongs to a distinct cluster. If we can find $K$ points with this property at the root node, we can arbitrarily assign them to different clusters; in other words, we can denote the cluster containing the $k$th point as the $k$th cluster. We call these $K$ points initial seeds. To find such initial seeds, every two samples should be as far apart as possible.
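Whatever heuristic proposes the candidate points, the Lemma 4.2 condition itself is a cheap pairwise check. A minimal Julia sketch, assuming candidate seed indices `seeds` and current upper bound `alpha`:

```julia
# Sketch: certify K initial seeds via Lemma 4.2. `seeds` holds candidate sample
# indices (e.g., from farthest first traversal); alpha is the current upper bound.
function valid_seeds(X, seeds, alpha)
    for i in eachindex(seeds), j in 1:i-1
        # every pair must satisfy ||x_i - x_j||_2^2 > 4*alpha
        if sum(abs2, X[:, seeds[i]] - X[:, seeds[j]]) <= 4alpha
            return false
        end
    end
    return true
end
```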
In our implementation, we therefore use the heuristic Farthest First Traversal (FFT) (Algorithm 1) to obtain $K$ far-apart points. For about half of the case studies shown in Section 5, we can obtain the initial seeds using FFT. For the other datasets, initial seeds cannot be obtained using FFT, or may not even exist.

Figure 1. K farthest points with 3 clusters. In this example, $\|x_1 - x_2\|_2^2 > 4\alpha$, $\|x_2 - x_3\|_2^2 > 4\alpha$ and $\|x_3 - x_1\|_2^2 > 4\alpha$. Therefore, we can arbitrarily assign $x_1$, $x_2$, $x_3$ to 3 different clusters.

4.1.2. CENTER-BASED ASSIGNMENT

By Lemma 4.1, if $\|x_s - \mu^k\|_2^2 > \alpha$, then we can conclude that $x_s$ is not in cluster $k$, i.e., $b_s^k = 0$. If we can determine that $b_s^k = 0$ for all $k \in \mathcal{K} \setminus \{k'\}$, then $b_s^{k'} = 1$. However, the value of $\mu$ is not known before solving the overall problem. One observation is that if the node $M$ contains the optimal solution $\mu^*$, then

$$\beta_s^k(M^k) = \min_{\mu^k \in M^k} \|x_s - \mu^k\|_2^2 \le \|x_s - \mu^{*k}\|_2^2.$$

Therefore, if $\beta_s^k(M^k) > \alpha$, then by Lemma 4.1 sample $x_s$ is definitely not in the $k$th cluster and $b_s^k = 0$. In summary, for sample $x_s$, if $\beta_s^k(M^k) > \alpha$ for all $k \in \mathcal{K} \setminus \{k'\}$, then $x_s$ is assigned to cluster $k'$ with $b_s^{k'} = 1$. Figure 2 illustrates an example in two-dimensional space with a total of three clusters.

This center-based method can be adopted at every node of the BB scheme. Since $\beta_s^k(M^k)$ is already computed when obtaining the lower bound, there is no additional cost of distance computation or of solving any optimization problem. Nevertheless, we do not need to apply this method at the root node, since $M^1 = \cdots = M^K$ initially. As the BB scheme continues branching on $\mu$, $M^k$ becomes more and more different from the boxes of the other clusters, and the cluster of more samples can be determined.

Figure 2. Center-based assignment with 3 clusters. In this example, $\beta_s^2(M^2) > \alpha$ ($b_s^2 = 0$) and $\beta_s^3(M^3) > \alpha$ ($b_s^3 = 0$). Therefore, we assign $x_s$ to the first cluster ($b_s^1 = 1$).

4.1.3. SAMPLE-BASED ASSIGNMENT

Besides using centers as a benchmark to allocate data points, already-assigned samples also give us the assignment of undetermined samples. By Lemma 4.2, if $\|x_i - x_j\|_2^2 > 4\alpha$, then $x_i$ and $x_j$ are not in the same cluster. If we already know that $x_j$ belongs to cluster $k$, then obviously $x_i$ cannot be assigned to cluster $k$, i.e., $b_i^k = 0$. Using this relation, if all the other $K - 1$ clusters are excluded, $x_i$ is assigned to the only cluster left. An example of sample-based assignment is depicted in Figure 3. There is a prerequisite for using this method: for each cluster, at least one sample must already have been assigned to it. Hence, sample-based assignment is utilized only after the algorithm has determined at least one sample for each cluster.

Figure 3. Sample-based assignment with 3 clusters. Assume we already know that $x_1$, $x_2$, $x_3$ belong to clusters 1, 2 and 3, respectively, and $x_s$ is the sample to be determined. In this example, $\|x_s - x_1\|_2^2 > 4\alpha$ ($b_s^1 = 0$) and $\|x_s - x_2\|_2^2 > 4\alpha$ ($b_s^2 = 0$). Therefore, $x_s$ is assigned to cluster 3 ($b_s^3 = 1$).
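The assignment logic of Sections 4.1.2 and 4.1.3 reduces to simple elimination tests. The following hedged Julia sketch shows the center-based variant; beta_sk is assumed to be an S x K matrix of the bounds $\beta_s^k(M^k)$ already computed for the lower bound of the current node.

```julia
# Sketch of center-based assignment (Section 4.1.2): clusters whose lower bound
# beta_s^k(M^k) exceeds alpha are excluded by Lemma 4.1; if exactly one cluster
# remains, sample s must belong to it. Returns the cluster index or `nothing`.
function center_based_assignment(beta_sk, s, alpha)
    K = size(beta_sk, 2)
    candidates = [k for k in 1:K if beta_sk[s, k] <= alpha]
    return length(candidates) == 1 ? first(candidates) : nothing
end
```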
4.2. Feasibility-Based Bounds Tightening

In this subsection we adopt the Feasibility-Based Bounds Tightening (FBBT) technique to reduce the space of $\mu$.

4.2.1. BALL-BASED BOUNDS TIGHTENING

For a sample $x_j$, denote $B_\alpha(x_j) = \{x \mid \|x - x_j\|_2^2 \le \alpha\}$ as the ball with center $x_j$ and radius $\sqrt{\alpha}$. Using the methods described in Subsection 4.1, assume we already know that sample $x_j$ belongs to cluster $k$; then by Lemma 4.1, $\mu^k \in B_\alpha(x_j)$. We use $J_A^k$ to denote the index set of all samples assigned to cluster $k$, i.e., $J_A^k = \{j \in \mathcal{S} \mid b_j^k = 1\}$; then $\mu^k \in B_\alpha(x_j),\ \forall j \in J_A^k$. Besides this, we also know that $\mu^k \in M^k \cap X$. Denote $S_+^k$ as the index set of samples satisfying all these constraints:

$$S_+^k(M) := \{s \in \mathcal{S} \mid x_s \in M^k,\ x_s \in B_\alpha(x_j)\ \forall j \in J_A^k\}.$$

In this way, we obtain a tightened box containing all feasible solutions of the $k$th medoid, $\hat{M}^k = \{\mu^k \mid \underline{\hat{\mu}}^k \le \mu^k \le \overline{\hat{\mu}}^k\}$, with the bounds of the $a$th attribute of the $k$th medoid being $\underline{\hat{\mu}}^k_a = \min_{s \in S_+^k(M)} x_{s,a}$ and $\overline{\hat{\mu}}^k_a = \max_{s \in S_+^k(M)} x_{s,a}$. Figure 4 gives an example of bounds tightening using this method.

One challenge of this ball-based bounds tightening method is that it needs to compute the distance between $x_s$ and $x_j$ for all $s \in \mathcal{S}$ and $j \in J_A^k$. If we know the assignments of most of the samples, this requires up to $S^2$ distance calculations, whereas computing a lower bound requires only $S \times K$ distance calculations. To reduce the computational time, our implementation sets a threshold on the maximum number of balls (default: 50) used to tighten bounds.

Figure 4. Ball-based bounds tightening in two-dimensional space. In this example, suppose it is determined that two points $x_i$ and $x_j$ belong to the $k$th cluster. We first compute the index set of samples within all balls and the original box, $S_+^k(M) := \{s \in \mathcal{S} \mid x_s \in M^k \cap B_\alpha(x_i) \cap B_\alpha(x_j)\}$. We then generate the smallest box containing the samples in $S_+^k(M)$. The red rectangle is the tightened bound we obtain.

4.2.2. BOX-BASED BOUNDS TIGHTENING

Another strategy to reduce the computational burden is based on a relaxation of $B_\alpha(x_j)$. For any ball $B_\alpha(x_j)$, the closed set $R_\alpha(x_j) = \{x \mid x_j - \sqrt{\alpha} \le x \le x_j + \sqrt{\alpha}\}$ is the smallest box containing $B_\alpha(x_j)$. Then we have $\mu^k \in R_\alpha(x_j),\ \forall j \in J_A^k$. Since $R_\alpha(x_j)$ and $M^k$ are all boxes, we can easily compute the tightened bounds $\hat{M}^k = \bigcap_{j \in J_A^k} R_\alpha(x_j) \cap M^k$. Figure 5 gives an example of box-based bounds tightening.

Obviously, the bounds generated in Figure 4 are much tighter, while the method in Figure 5 is much faster. Consequently, if $|J_A^k|$ is small for all clusters, the ball-based bounds tightening method gives more satisfactory results, while if $|J_A^k|$ is large for some $k$, box-based bounds tightening provides a cheaper alternative.

Figure 5. Box-based bounds tightening in two-dimensional space. In this example, we first generate two boxes $R_\alpha(x_i) := \{x \mid x_i - \sqrt{\alpha} \le x \le x_i + \sqrt{\alpha}\}$ and $R_\alpha(x_j) := \{x \mid x_j - \sqrt{\alpha} \le x \le x_j + \sqrt{\alpha}\}$. We then create the tightened bounds $\hat{M}^k = R_\alpha(x_i) \cap R_\alpha(x_j) \cap M^k$. The red rectangle is the tightened bound we obtain.

4.3. Symmetry Breaking

Another way to get tighter bounds is based on symmetry breaking constraints. We add the condition $\mu_1^1 \le \mu_1^2 \le \cdots \le \mu_1^K$ to BB Algorithm 2, in which $\mu_a^k$ denotes the $a$th attribute of the $k$th center. This constraint can also help the algorithm reduce the search space. For example, with $M^k = \{\mu \mid \underline{\mu} \le \mu \le \overline{\mu}\}$ as the original box of $\mu$, we can update $\underline{\mu}_1^K$ to $\max(\underline{\mu}_1^1, \underline{\mu}_1^2, \ldots, \underline{\mu}_1^K)$. Note that both the symmetry breaking constraints and the FFT-based initial seeds serve the function of breaking symmetry by imposing a certain order on the clusters, so they cannot be combined. In our implementation, symmetry breaking is used only when initial seeds are not found by FFT at the root node.
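Before moving to the experiments, here is a minimal Julia sketch, under an assumed data layout (A x K bound matrices, samples as the columns of X), of the box-based FBBT of Section 4.2.2 together with the symmetry-breaking update of Section 4.3:

```julia
# Sketch of box-based FBBT: intersect M^k with R_alpha(x_j) for every sample j
# already assigned to cluster k (assigned[k] plays the role of J_A^k).
function box_tighten!(mu_lo, mu_hi, X, assigned, alpha)
    r = sqrt(alpha)                          # half-width of the box R_alpha(x_j)
    for k in axes(mu_lo, 2), j in assigned[k]
        mu_lo[:, k] .= max.(mu_lo[:, k], X[:, j] .- r)
        mu_hi[:, k] .= min.(mu_hi[:, k], X[:, j] .+ r)
    end
    return all(mu_lo .<= mu_hi)              # false => the node can be pruned
end

# Sketch of the symmetry-breaking update: under mu_1^1 <= ... <= mu_1^K, each
# center inherits the largest lower bound on attribute 1 seen so far.
function symmetry_tighten!(mu_lo)
    for k in 2:size(mu_lo, 2)
        mu_lo[1, k] = max(mu_lo[1, k], mu_lo[1, k-1])
    end
    return mu_lo
end
```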
5. Computational Experiments

The branch and bound scheme is tested with different methods, including the closed-form solution (BB+CF) and the closed-form solution with FBBT (BB+CF+FBBT). We compare the numerical results of our algorithm with the state-of-the-art global optimizer CPLEX 20.1.0 (Cplex, 2020) and the heuristic algorithm Farthest First Traversal. Since the initial point of FFT is selected randomly, we run 100 trials of FFT with different initializations to account for this randomness and report the best (FFT (BEST)) and average (FFT (AVERAGE)) results. We test the performance of our algorithm and CPLEX on the Niagara cluster of Compute Canada with a time limit of 4 hours. In all the experiments, the number of clusters $K$ is set to 3. All experiments are implemented in Julia 1.6.1, and the complete code can be found at https://github.com/YankaiGroup/Kcenter.

The computational results are compared by three criteria: upper bound (UB), optimality gap, and the number of solved BB nodes. UB measures the best feasible solution. The optimality gap represents the relative difference between the best lower bound (LB) and UB, defined as $\text{Gap} = (UB - LB)/LB$. The optimality gap is a property unique to deterministic global optimization algorithms; the heuristic algorithm (FFT) does not have such a property. The number of solved nodes is the number of BB iterations before termination.

5.1. Numerical Results on Synthetic Datasets

We first consider numerical results on artificially generated datasets from (Hua et al., 2021). To illustrate the performance of the algorithms, we consider synthetic datasets with different numbers of samples. All datasets are generated randomly from 3 Gaussian clusters, and each data sample has two attributes.

Table 1 compares the performance of our reduced-space BB algorithm, the heuristic algorithm FFT, and CPLEX. FFT (BEST) fails to reach the lowest UB on all the datasets. The direct use of CPLEX on Problem (3) (CPLEX+Q) cannot converge to a small optimality gap (≤ 0.1%) within 4 hours on any of the synthetic datasets.

Table 1. Computational results on synthetic datasets

| Dataset | Method | UB | Nodes | Gap (%) | Time (s) |
|---|---|---|---|---|---|
| - | FFT (average) | 152.79 | - | - | - |
| | FFT (best) | 118.72 | - | - | - |
| | CPLEX+Q | 82.71 | 86880 | 100 | 14400 |
| | CPLEX+L3 | 66.97 | 36797 | 8.55 | 73 |
| | CPLEX+L3+CUT | 61.75 | 1330122 | 0.83 | 7668 |
| | BB+CF | 61.75 | 55191 | 0.1 | 46 |
| | BB+CF+FBBT | 61.75 | 17 | 0.1 | 12 |
| - | FFT (average) | 216.58 | - | - | - |
| | FFT (best) | 135.83 | - | - | - |
| | CPLEX+Q | 84.81 | 1343190 | 1.61 | 14400 |
| | CPLEX+L3 | 92.68 | 10021665 | 15.78 | 14400 |
| | CPLEX+L3+CUT | 92.68 | 336524 | 16.55 | 14400 |
| | BB+CF | 84.81 | 1155375 | 0.1 | 3609 |
| | BB+CF+FBBT | 84.81 | 1³ | 0.1 | 11 |
| - | FFT (average) | 191.36 | - | - | - |
| | FFT (best) | 153.22 | - | - | - |
| | CPLEX+Q | 216.85 | 4083121 | 71.89 | 14400 |
| | CPLEX+L3 | 144.78 | 2566049 | 36.53 | 14400 |
| | CPLEX+L3+CUT | 235.4 | 153026 | 100 | 14400 |
| | BB+CF | 95.10 | 1495899 | 0.1 | 11606 |
| | BB+CF+FBBT | 95.10 | 1³ | 0.1 | 11 |
| SYN-42,000² | FFT (average) | 290.13 | - | - | - |
| | FFT (best) | 194.67 | - | - | - |
| | CPLEX+Q | no feasible solution | - | - | - |
| | CPLEX+L3 | 378.14 | 39353 | 100 | 14400 |
| | CPLEX+L3+CUT | 800.45 | 14 | 100 | 14400 |
| | BB+CF | 142.33 | 172702 | 7.14 | 14400 |
| | BB+CF+FBBT | 142.33 | 103 | 0.1 | 18 |
| SYN-210,000¹ | FFT (average) | 300.98 | - | - | - |
| | FFT (best) | 215.00 | - | - | - |
| | CPLEX+Q | no feasible solution | - | - | - |
| | CPLEX+L3 | no feasible solution | - | - | - |
| | CPLEX+L3+CUT | no feasible solution | - | - | - |
| | BB+CF | 168.57 | 43626 | 7.56 | 14400 |
| | BB+CF+FBBT | 168.57 | 1³ | 0.1 | 16 |

¹ Can assign initial clusters through FFT at the root node. ² Cannot assign initial clusters through FFT at the root node. ³ Solved at the root node.
Compared with FFT and CPLEX, our algorithms obtain the best upper bounds and reach a satisfactory gap (≤ 0.1%) on all the datasets within the 4-hour run time.

Table 2. Computational results on small-scale datasets (S ≤ 1,000)

| Dataset | Samples | Dim. | Method | UB | Nodes | Gap (%) | Time (s) |
|---|---|---|---|---|---|---|---|
| IRIS¹ | 150 | 4 | FFT (average) | 4.79 | - | - | - |
| | | | FFT (best) | 3.66 | - | - | - |
| | | | CPLEX+Q | 2.04 | 1512876 | 0.1 | 548 |
| | | | BB+CF | 2.04 | 12967 | 0.1 | 17 |
| | | | BB+CF+FBBT | 2.04 | 1³ | 0.1 | 14 |
| SEEDS¹ | 210 | 7 | FFT (average) | 24.54 | - | - | - |
| | | | FFT (best) | 14.95 | - | - | - |
| | | | CPLEX+Q | 10.44 | 1093710 | 8.11 | 14430 |
| | | | BB+CF | 10.44 | 7155 | 0.1 | 17 |
| | | | BB+CF+FBBT | 10.44 | 19 | 0.1 | 15 |
| GLASS² | 214 | 9 | FFT (average) | 41.23 | - | - | - |
| | | | FFT (best) | 27.52 | - | - | - |
| | | | CPLEX+Q | out of memory | - | - | - |
| | | | BB+CF | 27.52 | 5559 | 0.1 | 15 |
| | | | BB+CF+FBBT | 27.52 | 191 | 0.1 | 14 |
| - | | | FFT (average) | 22890.61 | - | - | - |
| | | | FFT (best) | 15245.0 | - | - | - |
| | | | CPLEX+Q | no feasible solution | - | - | - |
| | | | BB+CF | 10539.0 | 14187 | 0.1 | 22 |
| | | | BB+CF+FBBT | 10539.0 | 47 | 0.1 | 15 |
| - | | | FFT (average) | 0.96 | - | - | - |
| | | | FFT (best) | 0.72 | - | - | - |
| | | | CPLEX+Q | out of memory | - | - | - |
| | | | BB+CF | 0.53 | 315495 | 0.1 | 258 |
| | | | BB+CF+FBBT | 0.53 | 17770 | 0.1 | 25 |
| - | | | FFT (average) | 4.82×10¹⁰ | - | - | - |
| | | | FFT (best) | 2.53×10¹⁰ | - | - | - |
| | | | CPLEX+Q | no feasible solution | - | - | - |
| | | | BB+CF | 1.72×10¹⁰ | 339 | 0.1 | 10 |
| | | | BB+CF+FBBT | 1.72×10¹⁰ | 1³ | 0.1 | 12 |
| - | | | FFT (average) | 5.77×10⁹ | - | - | - |
| | | | FFT (best) | 4.50×10⁹ | - | - | - |
| | | | CPLEX+Q | no feasible solution | - | - | - |
| | | | BB+CF | 3.49×10⁹ | 3407 | 0.1 | 15 |
| | | | BB+CF+FBBT | 3.49×10⁹ | 383 | 0.1 | 15 |
| HCV² | 602 | 12 | FFT (average) | 216190.32 | - | - | - |
| | | | FFT (best) | 174716.17 | - | - | - |
| | | | CPLEX+Q | 141440.68 | - | 0.1 | 965 |
| | | | BB+CF | 141440.68 | 291 | 0.1 | 10 |
| | | | BB+CF+FBBT | 141440.68 | 39 | 0.1 | 14 |
| ABS² | 740 | 21 | FFT (average) | 26395.93 | - | - | - |
| | | | FFT (best) | 18311.19 | - | - | - |
| | | | CPLEX+Q | 18647.59 | 940299 | 55.33 | 14400 |
| | | | BB+CF | 13905.40 | 33449 | 0.1 | 153 |
| | | | BB+CF+FBBT | 13905.40 | 585 | 0.1 | 16 |
| - | | | FFT (average) | 10.44 | - | - | - |
| | | | FFT (best) | 8.01 | - | - | - |
| | | | CPLEX+Q | 8.32 | 1556105 | 54.48 | 14400 |
| | | | BB+CF | 5.94 | 741851 | 0.1 | 2953 |
| | | | BB+CF+FBBT | 5.94 | 39118 | 0.1 | 128 |
| SGC¹ | 1000 | 21 | FFT (average) | 1.96×10⁷ | - | - | - |
| | | | FFT (best) | 1.33×10⁷ | - | - | - |
| | | | CPLEX+Q | 1.44×10⁷ | 48015 | 100 | 14400 |
| | | | BB+CF | 9.45×10⁶ | 411 | 0.1 | 12 |
| | | | BB+CF+FBBT | 9.45×10⁶ | 1³ | 0.1 | 12 |

¹ Can assign initial clusters through FFT at the root node. ² Cannot assign initial clusters through FFT at the root node. ³ Solved at the root node.

To make the comparison fair, we also test two other versions of CPLEX. In the first version (CPLEX+L3), Equation (3b) in Problem (3) is linearized by adding three supporting hyperplanes for each sample. However, after linearization, CPLEX still struggles on datasets with more than 1,200 samples. Moreover, three supporting hyperplanes are not enough to obtain a solution close to the original optimal solution, while adding more supporting hyperplanes increases the computational time. In the second version (CPLEX+L3+CUT), we manually add a callback of user cuts to enforce $\mu^k \in M^k$ at each BB node, but CPLEX's performance is not improved; we suspect that CPLEX has already inferred this information from Constraint (3g). As these two versions do not significantly improve the performance of CPLEX, we omit them from the subsequent numerical experiments.

5.2. Numerical Results on Real-World Datasets

We then evaluate the performance of our algorithms on 30 datasets from the UCI Machine Learning Repository (Dua & Graff, 2017), one dataset called PR2392 from (Padberg & Rinaldi, 1991), and one dataset called HEMI from (Wang et al., 2022). The number of samples varies from 150 to 14,057,567, and the number of attributes ranges from 2 to 68. Table 2 demonstrates the computational performance on datasets with up to 1,000 samples, Table 3 shows the results on datasets with more than 1,000 samples, and Table 4 illustrates the numerical results on datasets with millions of samples.
In these tables, even the best results of FFT can be far from optimal, even for very small datasets. For example, on the IRIS dataset, FFT (BEST) obtains a best UB of 3.66, while our algorithm and CPLEX+Q give a UB of 2.04 with a 0.1% gap. Compared with FFT (BEST), our algorithm reduces the UB by 30.4% on average over these 32 datasets.

For small datasets, our algorithms obtain the same UB as CPLEX. However, CPLEX+Q takes significantly more computational time than our algorithms. For all datasets with more than 740 samples, CPLEX+Q cannot even close the optimality gap to 50% within 4 hours. In comparison, our algorithms BB+CF and BB+CF+FBBT both generate the best UB and a satisfactory gap (≤ 0.1%) for the majority of the datasets. The comparison of these two versions highlights the importance of FBBT, which significantly reduces the computational time and the number of BB nodes; remarkably, it can even solve several datasets at the root node (NODES = 1³). Moreover, the number of nodes needed to reach the optimal solution is much smaller when initial seeds are found, which shows that the initial seeds can be essential for cluster assignment and bounds tightening. For about half of the datasets, initial seeds can be obtained from FFT. Both versions of our algorithm maintain a balanced memory usage during the whole execution process, while for datasets with more than 50,000 samples, CPLEX+Q returns an out-of-memory error.

For most of the datasets with millions of samples in Table 4, BB+CF+FBBT can converge to small gaps (≤ 0.1%) and provide the best optimal solution after 4 hours of running. To the best of our knowledge, it is the first time that the k-center problem has been solved to a relatively small gap (≤ 0.1%) within 4 hours on datasets with over 14 million samples.

Table 3. Computational results on large-scale datasets (1,000 < S ...)

Illustrative Example

... Since the pairwise distance of the selected seed points exceeds $4\alpha$, we can conclude that each point belongs to a distinct cluster. We can arbitrarily assign $x_1$ to cluster 1 and $x_5$ to cluster 2.

STEP 2.2: Sample-Based Assignment. Since $\|x_2 - x_5\|_2^2 = 16 > 4\alpha$, $x_2$ and $x_5$ are not in the same cluster. Given that $x_5$ was assigned to cluster 2, $x_2$ is in cluster 1. Similarly, $x_3$ is in cluster 1, and $x_4$ and $x_6$ are in cluster 2. Note that the assignments of all samples are determined at the root node.

STEP 3: Ball-Based Bounds Tightening. Since $x_4$ and $x_6$ are in cluster 2, using ball-based bounds tightening as illustrated in Figure 6, $M^2 = \{(3, 0)\}$, i.e., the center of cluster 2 is $(3, 0)$. For cluster 1, we can only attain $M^1 = \{\mu^1 \mid -1 \le \mu^1_1 \le 0,\ 0 \le \mu^1_2 \le 1\}$.

Figure 6. Ball-based bounds tightening for cluster 2.

STEP 4: Compute the Initial Lower Bound. Following Equation (7), we get $\beta = 1$.

STEP 5: Branch and Bound. ITERATION 1: For cluster 1, find two subsets $M_1^1 = \{\mu^1 \mid -1 \le \mu^1_1 \le -0.5,\ 0 \le \mu^1_2 \le 1\}$ and $M_2^1 = \{\mu^1 \mid -0.5 \le \mu^1_1 \le 0,\ 0 \le \mu^1_2 \le 1\}$. For the first child node, $M_1^1 \cap X = \{x_1, x_2\}$. Randomly pick $x_2$ as the center of cluster 1, giving $\alpha = 1$. With $\alpha = \beta = 1$, we can conclude that the optimal value is 1 and the optimal centers are $x_2$ and $x_5$.