# Scalable Constraint-Based Virtual Data Center Allocation

Sam Bayless, Nodir Kodirov, Ivan Beschastnikh, Holger H. Hoos, Alan J. Hu
University of British Columbia, Canada; Universiteit Leiden, The Netherlands
{sbayless, knodir, bestchai, hoos, ajh}@cs.ubc.ca

Constraint-based techniques can solve challenging problems arising from highly diverse applications. This paper considers the problem of virtual data center (VDC) allocation, an important, emerging challenge for modern data center operators. To solve this problem, we introduce NETSOLVER, which is based on the general-purpose constraint solver MONOSAT. NETSOLVER represents a major improvement over existing approaches: it is sound, complete, and scalable, providing support for end-to-end, multi-path bandwidth guarantees across all the layers of hosting infrastructure, from servers to top-of-rack switches to aggregation switches to access routers. NETSOLVER scales to realistic data center sizes and VDC topologies, typically requiring just seconds to allocate VDCs of 5-15 virtual machines to physical data centers with 1000+ servers, and it maintains this efficiency even when the data center is nearly saturated. In many cases, NETSOLVER can allocate 150%-300% as many total VDCs to the same physical data center as previous methods. Essential to the efficiency of our solution is our formulation of VDC allocation using monotonic theories, illustrating the practical value of the recently proposed SAT modulo monotonic theories approach.

## 1 Introduction

Constraint-based techniques, such as SAT and SAT modulo theory (SMT) solvers, play a key role in state-of-the-art approaches for solving challenging problems across a wide range of applications (see, e.g., [Prasad et al., 2005; Cadar et al., 2008; Rintanen, 2011]). In this work, we demonstrate how virtual data center (VDC) allocation, a prominent and increasingly important problem arising in the operation of modern data centers, can be tackled using a high-performance SMT solver, MONOSAT [Bayless et al., 2015]. Using a novel approach for handling multi-commodity flow constraints, we obtain substantial improvements in performance and functionality over previous VDC allocation techniques.

A VDC consists of multiple communicating virtual machines (VMs), each with individual server resource requirements (e.g., CPU or RAM), along with a virtual network of pair-wise bandwidth requirements between the VMs. The VDC allocation problem is to find a valid allocation of VMs to servers and of links in the virtual network to links in the physical network.

Figure 1: (Top) Example physical data center topology with three physical servers, two top-of-rack (ToR) switches, and two aggregation switches (AggSw). Circled numbers on links denote available bandwidth in Gbps. (Bottom) Example Hadoop VDC with one master (m) and three slave VMs (s1-s3) with a required throughput of 2 Gbps between each slave and the master (shown in circles). Each VM also requires a certain number of CPU cores and RAM. The problem is to find an allocation of the VDC to the physical data center, for example, as illustrated with the dashed lines. Note that s1 and m are mapped to the same physical server, while the virtual link m-s2 is allocated a multi-path route.
A valid allocation satisfies the compute, memory, and network bandwidth requirements of each VM across the entire data center infrastructure, including servers, top-of-rack (ToR) switches, and aggregation switches [Popa et al., 2012; Lee et al., 2014]. Figure 1 shows a simple instance of the VDC allocation problem and one solution.

The key insight behind our approach is that VDC allocation can be formulated in terms of monotonic theories, a recently developed concept that lies at the heart of the SMT solver MONOSAT [Bayless et al., 2015]. Exploiting this insight, we constructed NETSOLVER, a virtual data center allocation procedure that is scalable, sound, and complete, with support for end-to-end, multi-path bandwidth guarantees across all the layers of the networking infrastructure, from servers to ToR switches to aggregation switches to access routers. NETSOLVER harnesses MONOSAT to solve our novel formulation of the VDC allocation problem and efficiently allocates VDCs with a dozen or more VMs to full-size physical data centers (with 1000+ servers), typically in seconds per allocation. In many cases, NETSOLVER can allocate 150%-300% as many total VDCs to the same physical data center as state-of-the-art heuristic methods, such as SecondNet's VDCAlloc algorithm [Guo et al., 2010], while offering the flexibility and extensibility characteristic of a constraint-based approach.

| Algorithm | Complete | Multi-path | VDC Topology | Data Center Topology |
|---|---|---|---|---|
| SecondNet [Guo et al., 2010] | ✗ | ✗ | All | All |
| Importance Sampling [Tantawi, 2012] | ✗ | ✗ | All | Tree |
| Oktopus [Ballani et al., 2011] | ✗ | ✗ | Star | All |
| VDCPlanner [Zhani et al., 2013] | ✗ | ✗ | All | All |
| HVC-ACE [Rost et al., 2015] | ✗ | ✓ | Hose | All |
| VirtualRack [Huang et al., 2014] | ✗ | ✗ | Hose | All |
| Z3-AR [Yuan et al., 2013] | ✓ | ✗ | All | Tree |
| NETSOLVER (this work) | ✓ | ✓ | All | All |

Table 1: Characteristics of sound VDC allocation algorithms from the literature and the NETSOLVER approach introduced in this work.

## 2 Related Work

We now survey prior work with respect to features of VDC allocation that are relevant to modern data centers. As can be seen from Table 1, all prior approaches have important limitations relative to the one we present here.

**1. Soundness.** Sound VDC allocation tools respect end-to-end bandwidth guarantees, while unsound tools only attempt to minimize data center network traffic, without a guarantee that VMs will have sufficient dedicated bandwidth. Examples of unsound approaches to VDC allocation include [Meng et al., 2010; Kakadia et al., 2013], which dynamically identify VM communication patterns through network traffic analysis. This prior work is in contrast to the approaches discussed in this paper, all of which, including our contribution, NETSOLVER, are sound and assume that VDCs and their communication requirements are explicitly known to the allocator.

**2. Completeness.** Most VDC allocation tools that respect bandwidth guarantees are incomplete: they can fail to find feasible VDC allocations in cases where such allocations exist (even when given unlimited runtime). Oktopus [Ballani et al., 2011], VDCPlanner [Zhani et al., 2013], HVC-ACE [Rost et al., 2015], and SecondNet [Guo et al., 2010] are examples of incomplete allocation algorithms. For example, SecondNet's algorithm is greedy in that it maps VMs to servers before checking for available paths, and it allocates bandwidth one path at a time; if either of these steps fails, it will fail to allocate the VDC.[^1]
In contrast, the constraint-solver-based approaches described in [Yuan et al., 2013] and NETSOLVER are both complete: they are guaranteed to (eventually) find a feasible allocation if one exists. We will show in our experiments that completeness does not merely represent a theoretical benefit, but can translate into substantial gains in practical allocation capability. NETSOLVER is the first sound and complete VDC allocator that can be applied to any VDC and data center topology without simplifying abstractions.

[^1]: In fact, SecondNet will try this process several times on different subsets of the data center before giving up.

**3. Multi-path allocations.** Many data centers use multi-path allocations to maximize bandwidth and to provide fault-tolerance and load-balancing [Raiciu et al., 2011; Alizadeh et al., 2014]. Lack of multi-path support in traditional L2/L3-based networks was a primary motive for data center operators to develop networking stacks with multi-path support [Vahdat, 2015]. There are now multiple efforts underway to eliminate this restriction, which include using architectures specifically designed for multi-path routing, e.g., BCube [Guo et al., 2009] and VL2 [Greenberg et al., 2009], and making the data center networking fabric itself multi-path [IETF, 2016]. Despite the increasing importance of multi-path routing, to the best of our knowledge, there is only one previous VDC allocator that supports multi-path communication between VMs: HVC-ACE [Rost et al., 2015], a sound but incomplete allocator that uses a hose model for VDCs. There are also several incomplete algorithms for virtual network embedding that support multi-path allocation for smaller physical networks with 50-150 servers [Yu et al., 2008; Chowdhury et al., 2009; Cheng et al., 2011]. NETSOLVER is the first sound and complete multi-path tool for VDC allocation.

**4. Unrestricted topologies.** Many VDC allocators simplify the problem, either by abstracting VDC topologies into simpler ones that are easier to allocate, or by restricting the physical data center to simpler topologies. For example, the abstraction-refinement encodings from [Yuan et al., 2013] only apply to tree-topology data centers. Oktopus [Ballani et al., 2011] abstracts VDCs into virtual clusters, which are VMs connected to a central virtual switch in a star topology. VirtualRack [Huang et al., 2014] and HVC-ACE [Rost et al., 2015] use a less-restricted hose-model [Duffield et al., 1999] abstraction for VDCs. NETSOLVER is the first sound and complete VDC allocation approach that supports arbitrary VDC and data center topologies.

## 3 The Multi-path VDC Allocation Problem

The problem we consider in this work is defined as follows. We are given the description of a physical network (PN) and a virtual data center (VDC). The PN is specified through a set of servers $S$, switches $N$, and a directed (or undirected) graph $(S \cup N, L)$, with capacities $c(u, v)$ for each link in $L$. The VDC consists of a set of virtual machines $VM$ and a set $R \subseteq VM \times VM \times \mathbb{Z}^{+}$ of directed (or undirected) bandwidth requirements between those machines. For each server $s \in S$, we have CPU core, RAM, and storage capacity specifications, $cpu(s)$, $ram(s)$, $storage(s)$, and for each virtual machine $v \in VM$, we are given CPU core, RAM, and storage requirements $cpu(v)$, $ram(v)$, $storage(v)$. The sketch below makes these inputs concrete.
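As a minimal illustration of the problem data, the following Python sketch is our own (the class and field names, and the example's resource numbers, are assumptions for illustration, not taken from the paper):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalNetwork:
    servers: set      # S
    switches: set     # N
    links: dict       # L: (u, v) -> link capacity c(u, v), e.g. in Gbps
    cpu: dict         # cpu(s): CPU cores available on server s
    ram: dict         # ram(s): RAM available on server s, in GB
    storage: dict     # storage(s): storage available on server s, in GB

@dataclass(frozen=True)
class VDC:
    vms: set          # VM
    requirements: set # R: (u, v, bandwidth) triples, bandwidth in Z+
    cpu: dict         # cpu(v): CPU cores required by VM v
    ram: dict         # ram(v): RAM required by VM v, in GB
    storage: dict     # storage(v): storage required by VM v, in GB

# Roughly the Hadoop VDC of Figure 1: 2 Gbps from each slave to the master m,
# 1-core/2 GB slaves, a 2-core/4 GB master (storage requirements assumed zero).
hadoop = VDC(
    vms={"m", "s1", "s2", "s3"},
    requirements={("s1", "m", 2), ("s2", "m", 2), ("s3", "m", 2)},
    cpu={"m": 2, "s1": 1, "s2": 1, "s3": 1},
    ram={"m": 4, "s1": 2, "s2": 2, "s3": 2},
    storage={"m": 0, "s1": 0, "s2": 0, "s3": 0},
)
```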
The objective in the multi-path VDC allocation problem is to find an assignment $A : VM \to S$ of virtual machines $v \in VM$ to servers $s \in S$, along with an assignment of non-negative bandwidth $B_{u,v}(l)$ to links $l \in L$ for each bandwidth requirement $(u, v, b) \in R$, satisfying the following constraints:

Local VM allocation constraints (L) ensure that each virtual machine is assigned to exactly one server, and that each server provides sufficient CPU core, RAM, and storage resources to accommodate the requirements of all VMs allocated to it:

$$\forall s \in S: \quad \sum_{v \in V(s)} cpu(v) \leq cpu(s), \quad \sum_{v \in V(s)} ram(v) \leq ram(s), \quad \sum_{v \in V(s)} storage(v) \leq storage(s),$$

where $V(s) = \{v \in VM \mid A(v) = s\}$. Resource requirements are modelled using integer values, and VMs do not share resources.

Global bandwidth allocation constraints (G) ensure that sufficient bandwidth is available in the physical network to satisfy all bandwidth requirements between pairs of VMs. We formalise this by requiring that for all $(u, v, b) \in R$, the assignments $B_{u,v}(l)$ form a valid $A(u)$-$A(v)$ network flow no smaller than $b$, and that none of the link capacities $c(l)$ in the physical network is exceeded:

$$\forall l \in L: \quad \sum_{(u,v,b) \in R} B_{u,v}(l) \leq c(l).$$

Bandwidths are represented by integer values; bandwidth between VMs allocated on the same server is unlimited.

It has been previously observed [Gupta et al., 2001; Szeto et al., 2003; Yu et al., 2008; Chowdhury et al., 2009] that when allowing path-splitting, the global bandwidth allocation constraints give rise to a multi-commodity flow problem, which is strongly NP-complete even for undirected integral flows [Even et al., 1975]. Conversely, any multi-commodity flow problem maps directly into bandwidth constraints, establishing the NP-hardness of the multi-path VDC allocation problem [Chowdhury et al., 2009].

## 4 NETSOLVER

We will now show how multi-commodity integral flow problems can be encoded as a conjunction of maximum flow constraints over graphs with symbolic edge weights. By combining this encoding for the global constraints G with a pseudo-Boolean encoding of the local constraints L, we are able to tackle the full multi-path VDC allocation problem using MONOSAT, a SAT modulo theory (SMT) solver that extends quantifier-free first-order Boolean logic with highly efficient, built-in support for a wide set of finite monotonic predicates [Bayless et al., 2015]. Intuitively, a finite monotonic predicate is a predicate for which increasing the value of its arguments can never change the value of the predicate from true to false; e.g., adding links to a network can only increase the connectedness of the network. MONOSAT supports many common graph constraints, such as reachability, shortest paths, minimum spanning trees, and maximum flows. MONOSAT also supports a subset of the theory of fixed-width bitvectors.

MONOSAT accepts formulas with one or more directed symbolic graphs, each of which comprises a fixed set of nodes and symbolic edges $(u, v)$. Each edge has an integer capacity, $c(u, v)$, which may be either a constant or a variable (a fixed-width bitvector). Finally, MONOSAT supports a number of common graph predicates, of which only one is relevant here: $\mathrm{maxFlow}_{s,t,G} \geq f$, where $G$ is a directed graph, $s$ and $t$ are nodes in $G$, and $f$ is a constant integer or a bitvector term.
This predicate is TRUE iff the maximum $s$-$t$ flow in $G$, under the assignment to the edge capacities associated with $G$, is greater than or equal to $f$.

Figure 2: (a) Example symbolic graph $G$, with variable capacities $c(u, v)$ on each edge. (b) A formula constraining the graph: $(c(1, 2) > c(2, 3)) \wedge (c(2, 3) + c(1, 3) = 2) \wedge (\mathrm{maxFlow}_{1,3} \geq 2)$. (c) A solution, assigning a flow and a capacity (f/c) to each edge; each edge label includes the capacity assigned to that edge, as well as the flow along it.

As an example, consider the directed graph $G$ shown in Figure 2a, with variable integer capacities $c(u, v)$, and the formula in Figure 2b. In this example, MONOSAT finds edge capacities that satisfy the constraints and also produces a flow satisfying the maximum flow predicate (Figure 2c). In the remainder of this section, we will first describe how we model integer-valued multi-commodity flow in terms of the built-in maximum flow predicates supported by MONOSAT; then we will show how to use these multi-commodity flow constraints to express VDC allocation.

### 4.1 Multi-Commodity Flow in MONOSAT

SMT solvers have not traditionally been applied to large multi-commodity flow problems; instead, such problems are usually solved using integer-arithmetic solvers, or are approximated using linear programming. MONOSAT is different from other SMT solvers in that it has built-in predicates for (single-commodity) $s$-$t$ maximum flow. While this does not directly provide support for multi-commodity flows, we will show that by expressing multi-commodity flows as a combination of single-commodity maximum flow predicates, we can use MONOSAT to solve large multi-commodity flow problems, a first for SMT solvers.

We consider this formulation of integer-valued multi-commodity flows in terms of maximum flow predicates to be a key contribution of our work. While there are many obvious ways to encode multi-commodity flows in SMT solvers, the one we present here is, to the best of our knowledge, the only SMT encoding to scale to multi-commodity flow problems with thousands of nodes. As there are many applications to which SMT solvers are better suited than ILP solvers (and vice versa), this SMT formulation has many potential applications beyond VDC allocation.

We are given a directed graph $G = (V, E)$, an integer capacity $c(u, v)$ for each edge $(u, v) \in E$, and a set of commodity demands $K$, where a commodity demand $i \in K$ is a tuple $(s_i, t_i, d_i)$, representing an integer flow demand of $d_i$ between source $s_i \in V$ and target $t_i \in V$. The integral multi-commodity flow problem is to find a feasible flow such that each demand $d_i$ is satisfied, while for each edge $(u, v)$ the total flow of all commodities (summed) is at most $c(u, v)$:

$$f_i(u, v) \geq 0, \quad \forall (u, v) \in E,\ \forall i \in K$$
$$\sum_{i \in K} f_i(u, v) \leq c(u, v), \quad \forall (u, v) \in E$$
$$\sum_{v \in V} f_i(u, v) - \sum_{v \in V} f_i(v, u) = \begin{cases} 0 & \text{if } u \notin \{s_i, t_i\} \\ d_i & \text{if } u = s_i \\ -d_i & \text{if } u = t_i \end{cases}, \quad \forall i \in K$$

We instantiate symbolic graphs $G_1, \ldots, G_{|K|}$ with the same topology as $G$. We set the capacity of each edge $(u, v)_i \in G_i$ to a new integer variable, $c(u, v)_i$, with the constraint $0 \leq c(u, v)_i \leq c(u, v)$. Next, we assert that the capacities in each graph sum to no more than the original edge capacity: $\sum_i c(u, v)_i \leq c(u, v)$. Together, these constraints partition the original capacity graph into $|K|$ separate graphs, one for each demand.
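Combined with the per-demand maximum-flow assertion described next, this construction is short enough to sketch directly. The sketch below uses MONOSAT's Python bindings; the API names (`Graph`, `BitVector`, `Assert`, `maxFlowGreaterOrEqualTo`, `Solve`) follow the project's Python tutorial, but the exact signatures are our assumption, so treat this as illustrative rather than definitive:

```python
from monosat import *  # MONOSAT Python bindings; API names are assumptions

def encode_multicommodity(nodes, edges, demands, width=8):
    """Encode an integral multi-commodity flow instance.
    nodes: iterable of node ids; edges: {(u, v): int capacity};
    demands: list of (s, t, d) tuples. Builds one symbolic graph per demand."""
    cap_vars = {e: [] for e in edges}        # c(u, v)_i for each demand i
    for (s, t, d) in demands:
        g = Graph()                          # graph G_i, same topology as G
        node = {n: g.addNode() for n in nodes}
        for (u, v), cap in edges.items():
            c_i = BitVector(width)           # unsigned, so 0 <= c_i holds for free
            Assert(c_i <= cap)               # c(u, v)_i <= c(u, v)
            g.addEdge(node[u], node[v], c_i)
            cap_vars[(u, v)].append(c_i)
        # demand i must be routable as an s_i -> t_i flow of at least d_i in G_i
        Assert(g.maxFlowGreaterOrEqualTo(node[s], node[t], d))
    # the per-demand capacities must partition the original capacity of each edge
    for (u, v), cs in cap_vars.items():
        total = cs[0]
        for c_i in cs[1:]:
            total = total + c_i
        Assert(total <= edges[(u, v)])

# e.g., the graph of Figure 2 with two unit demands from node 1 to node 3:
encode_multicommodity([1, 2, 3], {(1, 2): 2, (2, 3): 1, (1, 3): 1},
                      [(1, 3, 1), (1, 3, 1)])
print(Solve())  # True iff both demands can be routed simultaneously
```

Because the capacities are unsigned bitvectors, the lower bound $0 \leq c(u, v)_i$ holds automatically; only the upper bounds and the partition constraint need to be asserted.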
To complete the encoding, for each commodity demand $(s_i, t_i, d_i)$, we use MONOSAT's built-in maximum flow constraints to assert that the maximum $s_i$-$t_i$ flow in $G_i$ is at least $d_i$.

### 4.2 Encoding Multi-path VDC Allocation

The global constraints G (Section 3) can be encoded as a multi-commodity flow as described above, with up to $|VM|^2$ commodity demands (one for each bandwidth tuple $(u, v, bandwidth) \in R$).[^2] However, we can greatly improve on this by merging bandwidth constraints that share a common source into a single commodity demand: given a set of bandwidth constraints $(u, v_i, bandwidth_i) \in R$ with the same source $u$, we can convert these into a single commodity demand by adding an extra node $w \notin VM$, along with edges $(v_i, w)$ with capacity $bandwidth_i$. The commodity demands $(u, v_i, bandwidth_i)$ can then be replaced by a single commodity demand $(u, w, \sum_i bandwidth_i)$. As there are at most $|VM|$ distinct sources in $R$, this reduces the number of demands from $|VM|^2$ in the worst case to $|VM|$ demands[^3] (the sketch at the end of this subsection illustrates the transformation).

[^2]: Note that in our approach, bandwidth values are required to be integers, as we are restricted to finding integer maximum flows. In practice, this is not a limitation, as data centers typically only offer a small number of (integer-valued) bandwidth/CPU choices to clients.

[^3]: Converting a single-source, multi-destination flow problem into a single-source, single-destination maximum flow problem is a well-known transformation, and it safely preserves the maximum possible flow to each destination.

In cases where the VDC is undirected, we improve on this further by swapping sources and sinks in communication requirements so as to maximize the number of requirements with common sources. This can be done efficiently even for large networks by finding an approximate minimum-cost vertex cover of $R$ (e.g., using the 2-approximation from [Bar-Yehuda and Even, 1985]). We first construct an undirected graph of communication requirements, with an undirected edge $(u, v)$ of weight $bandwidth$ for each bandwidth requirement, and find an approximate minimum-cost vertex cover. Necessarily, each edge, and hence each communication requirement, will have at least one covering vertex. For each requirement $(u, v, bandwidth)$, if $v$ is a covering vertex and $u$ is not, we replace the requirement with $(v, u, bandwidth)$, swapping $u$ and $v$. After swapping all uncovered source vertices in this way, we then proceed to merge requirements with common sources as above. For cases where the VDC is directed, we skip this vertex-cover optimization and only merge connection requirements with the same (directed) source in the input description.

Given this set of commodity demands, we construct an undirected (or directed) graph $G$ consisting of the physical network $(S \cup N, L)$ and one node for each virtual machine in $VM$. If any VDC communication requirements $(u, v_i, bandwidth_i)$ have been merged into combined requirements $(u, w, \sum_i bandwidth_i)$ as above, we add additional, directed edges $(v_i, w)$ with capacity $bandwidth_i$ to $G$. For each $v \in VM$ and each server $s \in S$, we add a directed symbolic edge $e_{vs}$ from $v$ to $s$ with unlimited capacity to $G$; this edge controls the server to which each VM is allocated. Next, we assert (using a cardinality constraint) that for each VM $v$, exactly one edge $e_{vs}$ is enabled, so that the VM is allocated to exactly one server: $\forall v \in VM: \sum_s e_{vs} = 1$. Using the multi-commodity flow encoding described above, we assert that the multi-commodity flow in $G$ satisfies $(u, v, bandwidth)$ for each commodity requirement.
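The source-merging transformation referenced above involves no solver machinery, so it can be illustrated in a few lines of plain Python (a sketch with our own names; the solver is not involved at this stage):

```python
from collections import defaultdict

def merge_common_sources(requirements):
    """Merge bandwidth requirements (u, v, bw) that share a source u into a
    single commodity demand (u, w_u, sum of bw), where w_u is a fresh node.
    Returns (demands, extra_edges): extra_edges are the (v_i, w_u, bw_i)
    capacity edges that must be added to the flow graph."""
    by_source = defaultdict(list)
    for (u, v, bw) in requirements:
        by_source[u].append((v, bw))
    demands, extra_edges = [], []
    for u, targets in by_source.items():
        w = ("sink", u)                      # fresh node w, not in VM
        for (v, bw) in targets:
            extra_edges.append((v, w, bw))   # edge (v_i, w) with capacity bw_i
        demands.append((u, w, sum(bw for _, bw in targets)))
    return demands, extra_edges

# Figure 1's Hadoop VDC: 2 Gbps from each slave to the master m.
reqs = [("s1", "m", 2), ("s2", "m", 2), ("s3", "m", 2)]
# In the undirected case, the vertex-cover step orients each requirement so
# that its covering vertex (here m) becomes the common source:
swapped = [(v, u, bw) for (u, v, bw) in reqs]
print(merge_common_sources(swapped))
# -> one demand ('m', ('sink', 'm'), 6) instead of three separate demands
```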
The above constraints together enforce the global constraints G; to enforce the local constraints L, we use pseudo-Boolean constraints (as described in [Eén and Sörensson, 2006]) to assert, for each server $s \in S$ (treating each edge variable $e_{vs}$ as a 0/1 value):

$$\sum_{v \in VM} e_{vs} \cdot cpu(v) \leq cpu(s), \quad \sum_{v \in VM} e_{vs} \cdot ram(v) \leq ram(s), \quad \sum_{v \in VM} e_{vs} \cdot storage(v) \leq storage(s).$$

To allow multiple different VDC topologies to be allocated, the assumption mechanism [Eén and Sörensson, 2003] found in many SAT solvers, including MONOSAT, can be used to dynamically restrict a generic graph. Note that these encodings are novel contributions and critical to NETSOLVER's performance; however, they are empirically efficient only because MONOSAT (unlike other SMT solvers) has built-in support for graph constraints. As we show next, carefully crafted encodings alone, such as the one developed in [Yuan et al., 2013], are not competitive. Instead, fundamental improvements in the constraint solver, such as the ones we use in MONOSAT, are necessary.

## 5 Evaluation

We now present results from an extensive empirical evaluation demonstrating that our approach offers substantial advantages compared to state-of-the-art methods for VDC allocation. Specifically, we compare the performance of NETSOLVER to that of SecondNet's VDCAlloc [Guo et al., 2010], a seminal, sound VDC allocation algorithm with end-to-end bandwidth guarantees, and the Z3-based abstraction-refinement procedure from [Yuan et al., 2013], which resembles our approach in that it makes use of an SMT solver. In each experiment, the algorithms repeatedly allocate VDCs to the data center until they are unable to make further allocations (or until a 1 CPU hour timeout is reached). This metric was also used in prior work [Guo et al., 2010; Yuan et al., 2013] and is important in practice, as it captures data center utilization.

Figure 3: Total number of consecutive VDCs allocated by different algorithms and time required per allocation on various tree topologies from [Yuan et al., 2013]: (a) 200 servers, 4 cores; (b) 2000 servers, 16 cores. We report the median running time for allocating individual VDCs.

SecondNet's VDCAlloc algorithm ("SecondNet", except where ambiguous) is an incomplete, heuristic-driven algorithm that can find VDC allocations for physical networks with hundreds of thousands of servers. As SecondNet is based on bipartite matching, it fundamentally cannot allocate more than one VM in each VDC to any given server. Furthermore, because it performs allocation in an incomplete, greedy fashion, it can fail to find an existing feasible allocation, especially in heavily utilized networks. As we will demonstrate, under many realistic circumstances this happens quite frequently, leading to substantially lower data center utilization than can be achieved with a complete method, such as NETSOLVER.

[Yuan et al., 2013] introduced two approaches for performing single-path VDC allocation with bandwidth guarantees that use the general-purpose SMT solver Z3 [De Moura and Bjørner, 2008]. Unlike the MONOSAT solver at the core of NETSOLVER, Z3 has no built-in support for graph predicates. Therefore, in order to use Z3 for VDC allocation, the global bandwidth and connectivity constraints have to be expressed using a lower-level logical formulation.
The first such encoding presented by [Yuan et al., 2013] (which we call Z3-generic) can handle any data center topology but scales extremely poorly [Yuan et al., 2013]. The second approach (which we call Z3-AR) makes use of an optimized abstraction-refinement technique; while substantially more scalable than the generic encoding, it is restricted to data centers with tree topologies. In preliminary experiments (not reported here), we confirmed that Z3-generic performed poorly, often failing to find any allocations within a 1-hour timeout on the benchmarks used in our experiments.

### Comparison on Trees from [Yuan et al., 2013]

Our first experiment reproduces and extends an experiment from [Yuan et al., 2013], in which a series of identical VDCs is allocated one-by-one to tree-structured data centers, until the solver is unable to make further allocations (or a timeout of 1 CPU hour is reached). We obtained the original implementation of Z3-AR from the authors for this experiment, along with a version of SecondNet they implemented with support for the tree-structured data centers considered here. In this experiment, the VDCs always have identical structure; this is a limitation introduced here for compatibility with the solvers from [Yuan et al., 2013]. In our subsequent experiments, below, we lift this constraint. Except where noted, all experiments were run on a machine with a 2.66 GHz (12 MB L3 cache) Intel x5650 processor, running Ubuntu 12.04 and limited to 16 GB RAM.

Figure 3 summarizes our results, showing the total number of consecutive VDCs allocated within 1 CPU hour. On the left, we used the 200-server, 4-cores-per-server physical data center from [Yuan et al., 2013]. On the right, we considered a larger data center with 2000 16-core servers. We note that, although all three solvers perform similarly on the small tree-structured data centers (with SecondNet, being heuristic and incomplete, faster than NETSOLVER, and NETSOLVER faster than Z3-AR), on the larger data center, NETSOLVER greatly outperforms SecondNet and Z3-AR, often allocating two or even three times as many VDCs on the same infrastructure. Importantly, we see that NETSOLVER scales to thousands of servers, with median per-instance allocation times of a few seconds or less per VDC.

### Comparison on BCube and Fat Tree from [Guo et al., 2010]

The second experiment we conducted is a direct comparison against the original SecondNet implementation (which we also used for all comparisons reported later). Note that the implementation of Z3-generic, and both the theory and implementation of Z3-AR, are restricted to tree topologies, so they could not be included in these experiments. The SecondNet benchmark instances are extremely large (in one case exceeding 100,000 servers) but also extremely easy to allocate: the available bandwidth per link is typically 50× the requested communication bandwidths in the VDC, so with only 16 cores per server, the bandwidth constraints are mostly irrelevant. For such easy allocations, the fast, incomplete approach that SecondNet uses is the better solution. Accordingly, we scaled the SecondNet instances down to 432-1024 servers, a realistic size for many real-world data centers. For these experiments, we generated sets of 10 VDCs each of several sizes (6, 9, 12, and 15 VMs), following the methodology described in [Yuan et al., 2013]. These VDCs have proportionally greater bandwidth requirements than those originally considered by SecondNet, requiring 5-10% of the smallest link-level capacities.
The resulting VDC instances are large enough to be representative of many real-world use cases, while also exhibiting non-trivial bandwidth constraints. For each of these sets of VDCs, we then repeatedly allocated instances (in random order) until the data center was saturated.

Figure 4: Total number of consecutive VDCs allocated by different algorithms and time required per allocation on two real sets of benchmarks: (a) Fat Tree (432 16-core servers) and BCube (512 16-core servers) topologies from [Guo et al., 2010], and (b) commercial data center topologies (US-Middle, 800 servers; US-West, 1200 servers). We report median running times for allocating individual VDCs.

Figure 4a shows allocations made by SecondNet and NETSOLVER on two data centers, one with a Fat Tree topology with 432 servers (top), and one with a BCube topology with 512 servers (bottom). As in our previous experiment, SecondNet was much faster than NETSOLVER, but NETSOLVER ran fast enough to be practical for data centers with hundreds of servers, with typical allocation times of a few seconds per VDC (however, in a minority of cases, NETSOLVER did require tens or even hundreds of seconds for individual allocations). In many cases, NETSOLVER was able to allocate more than twice as many VDCs as SecondNet on these data centers, a substantial improvement in data center utilization.

### Comparison on commercial networks

The above comparisons consider how NETSOLVER compares to existing VDC allocation tools on several artificial (but representative) network topologies from the VDC literature. To address whether there are real-world VDC applications where NETSOLVER not only performs better than existing tools but is also fast enough to be used in practice, we also considered a deployment of a standard Hadoop virtual cluster on a set of actual data center topologies. We collaborated with the private cloud provider ZeroStack Inc. to devise an optimal virtual Hadoop cluster to run Terasort.[^4] Each Hadoop virtual network consists of a single master VM connected to 3-11 slave VMs.[^5] We considered 5 different VM sizes, ranging from 1 CPU and 1 GB RAM to 8 CPUs and 16 GB of RAM; for our experiments, the slave VMs were selected at random from this set, with the master VM also randomized but always at least as large as the largest slave VM. The Hadoop master has tree connectivity with all slaves, with either 1 or 2 Gbps links connecting the master to each slave (like the VDC in Figure 1).

[^4]: The Sort Benchmark Committee: http://sortbenchmark.org/

[^5]: Many industrial VDCs have fewer than 15 VMs; e.g., [Bodík et al., 2012] states that 80% of Bing services use fewer than 10 VMs. NETSOLVER performs well with up to 30 VMs.

The physical data center topology was provided by another company, which requested to remain anonymous. This company uses a private cloud deployed across four data centers in two geographic availability zones (AZs): us-west and us-middle. Each data center contains between 280 and 1200 servers, spread across 1 to 4 clusters with 14 to 40 racks. Each server has 16 cores, 32 GB RAM, and 20 Gbps network bandwidth (via two 10 Gbps links).
The network in each data center has a leaf-spine topology, where each ToR switch connects to two distinct aggregation switches over 40 Gbps links (a total of 2 links with 80 Gbps, one to each aggregation switch), and aggregation switches are interconnected with four 40 Gbps links each. For each cluster, there is a gateway switch with a 240 Gbps link connected to each aggregation switch. All data centers use equal-cost multi-path (ECMP) routing to take advantage of multiple paths. A VDC is allocated inside one AZ: VMs in one VDC can be split across two clusters in an AZ, but not across two AZs. Figure 4b shows VDC allocation results per AZ.[^6] More generally, executing NETSOLVER on distinct physical network units, such as an AZ, improves its scalability. This also works well in practice, as modern data centers are modular by design. For example, one of the largest data center operators in the world, Facebook, designed its Altoona data center with over 100,000 servers using pods, with each pod containing fewer than 1000 servers [Andreyev, 2014].

[^6]: NETSOLVER is not limited to allocating to a single AZ and can support multi-AZ allocation, assuming it is aware of the capacity of each AZ, including inter-AZ network bandwidth.

We applied SecondNet and NETSOLVER in this setting, consecutively allocating Hadoop master-slave VDCs of several sizes, ranging from 4 to 12 VMs, until no further allocations could be made. Note that, in addition to using a realistic physical topology, the CPU/memory and bandwidth values, as well as the VDCs being allocated, are all derived from real Hadoop jobs. By contrast, our previous experiments used artificial VDCs from the Z3-AR paper [Yuan et al., 2013]. Again, we could not run Z3-AR in this setting, as it is restricted to tree-topology data centers.

In Figure 4b, we show the results for the largest of these data centers (results for the smaller data centers were similar). As observed in our previous experiments, although SecondNet was much faster than NETSOLVER, NETSOLVER's per-instance allocation time was typically just a few seconds, which is reasonable for long-running applications, such as the Hadoop jobs considered here. Again, NETSOLVER was able to allocate many more VDCs than SecondNet (here, 1.5-2 times as many), across a range of data center and VDC sizes, including a commercial data center with more than 1000 servers. Moreover, with increasing virtual network size, NETSOLVER was able to allocate many more virtual machines, while respecting end-to-end bandwidth constraints. Often NETSOLVER allocated several times as many VDCs as SecondNet, and in extreme cases, it found hundreds of allocations while SecondNet was unable to make any allocations (not shown for brevity). Similarly, keeping the virtual network the same size but doubling the bandwidth requirements of each virtual machine greatly decreased the number of allocations made by SecondNet, while NETSOLVER showed considerably more robust performance in these more congested settings. Overall, NETSOLVER was not only able to find many more allocations than SecondNet in this realistic setting, but NETSOLVER's median allocation time of 1-30 CPU seconds shows that it can be practically useful in a real, commercial setting, for data centers and VDCs of this size.
This provides strong evidence that NETSOLVER can find practical use in realistic settings where large or bandwidth-hungry VDCs need to be allocated. It also demonstrates the practical advantage of a (fast) complete algorithm like NETSOLVER over a much faster but incomplete algorithm like SecondNet: for bandwidth-heavy VDCs, even with arbitrary running time, SecondNet's VDCAlloc was unable to find the majority of the feasible allocations. This generally reinforces our observations from our earlier experiments with artificial topologies: NETSOLVER improves greatly on state-of-the-art VDC allocation for bandwidth-constrained data centers with as many as 1000 servers.

### Allocation Robustness

In the above experiments, we showed that across many conditions, NETSOLVER was able to make many more allocations (often hundreds more) than SecondNet or Z3-AR. One may wonder whether these additional allocations are the result of NETSOLVER having a better ability to solve challenging allocations quickly, or whether NETSOLVER is somehow making good allocation decisions early on that leave more space for later VDC instances. In the experiments where Z3-AR makes many fewer allocations (Figure 3b), Z3-AR's problem is its excessively slow running times: it allocates only a handful of VDCs in data centers with room for hundreds or thousands. In those cases, both NETSOLVER and SecondNet can make hundreds of further allocations starting from where Z3-AR was cut off.

Figure 5: Additional VDC allocations made by NETSOLVER (red), after SecondNet (blue) has allocated its maximum number of VDCs. These experiments used the same VDCs and physical topologies as in Figure 4a. In many cases, NETSOLVER allocated hundreds of additional VDCs after SecondNet could not make further allocations.

The robustness question is more pertinent with respect to SecondNet. We found conclusive evidence that good early allocations cannot be entirely responsible for NETSOLVER's performance, by observing that NETSOLVER can continue to allocate VDCs in cases where SecondNet can no longer make any further allocations. We repeated the experiments from Figure 4a by first using SecondNet to allocate as many VDCs as it could into the data center. Then, starting from that already partially utilized data center, we used NETSOLVER to allocate further VDCs. The results of this experiment are shown in Figure 5. Similarly to the earlier experiment, NETSOLVER can still allocate hundreds of additional VDCs starting from SecondNet's final allocation.

## 6 Conclusions and Future Work

We introduced a new, SMT-based VDC allocation method, NETSOLVER, for multi-path VDC allocation with end-to-end bandwidth guarantees. NETSOLVER scales well to data centers with 1000 or more servers, while substantially improving data center utilization as compared to current methods. Notably, we have demonstrated that in several realistic settings, NETSOLVER allocates up to 3 times as many virtual data centers as previous approaches, with a runtime that is fast enough for practical use.

NETSOLVER overcomes major limitations of current state-of-the-art approaches for VDC allocation with hard bandwidth guarantees: unlike SecondNet, our approach is complete and, as a result, is able to continue making allocations in bandwidth-constrained networks; unlike the abstraction-refinement techniques from [Yuan et al., 2013], NETSOLVER supports arbitrary data center topologies (as well as being much faster).
Our constraint-based approach represents the first complete VDC allocation algorithm supporting multi-path bandwidth allocation for arbitrary network topologies, an important capability in modern data centers. Because NETSOLVER is built on top of MONOSAT, a general-purpose SMT solver with high-performance support for graph constraints, it is easily extensible. For example, it can handle additional VDC allocation constraints not supported by other approaches, yet relevant in real-world scenarios, such as maximization of data locality (VM affinity), minimization of the total number of utilized servers, and load balancing (hot-spot avoidance). While these constraints make VDC allocation substantially more challenging, preliminary results indicate that our approach can still efficiently allocate VDCs with up to 10 VMs on real-world data centers (typical running times per allocation are less than 1 CPU second). In future work, we will explore extensions to NETSOLVER with these and other constraints. We will also apply NETSOLVER to the related problems of virtual network embedding (VNE) [Belbekkouche et al., 2012; Fischer et al., 2013] and network function virtualization (NFV) [Palkar et al., 2015; Heorhiadi et al., 2016].

More broadly, we believe that the combination of network flow constraints with state-of-the-art SMT solvers underlying NETSOLVER is a promising approach for a variety of problem types. This combination may enable fast and flexible algorithms that can benefit a broad range of applications.

## Acknowledgements

We thank the authors of [Guo et al., 2010] and [Yuan et al., 2013] for making their implementations available to us, and especially Yifei Yuan for his extensive assistance. This research was funded by NSERC discovery grants to Ivan Beschastnikh, Holger H. Hoos and Alan J. Hu. Nodir Kodirov is supported by a UBC four-year doctoral fellowship (4YF).

## References

[Alizadeh et al., 2014] Mohammad Alizadeh, Tom Edsall, Sarang Dharmapurikar, Ramanan Vaidyanathan, Kevin Chu, Andy Fingerhut, Vinh The Lam, Francis Matus, Rong Pan, Navindra Yadav, and George Varghese. CONGA: Distributed congestion-aware load balancing for datacenters. In Proceedings of the ACM Conference on SIGCOMM, 2014.

[Andreyev, 2014] Alexey Andreyev. Introducing data center fabric, the next-generation Facebook data center network, 2014. [Online at code.facebook.com/posts/360346274145943; accessed 05-23-2017].

[Ballani et al., 2011] Hitesh Ballani, Paolo Costa, Thomas Karagiannis, and Ant Rowstron. Towards predictable datacenter networks. ACM SIGCOMM Computer Communication Review, 2011.

[Bar-Yehuda and Even, 1985] Reuven Bar-Yehuda and Shimon Even. A local-ratio theorem for approximating the weighted vertex cover problem. North-Holland Mathematics Studies, 1985.

[Bayless et al., 2015] Sam Bayless, Noah Bayless, Holger Hoos, and Alan Hu. SAT modulo monotonic theories. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.

[Belbekkouche et al., 2012] Abdeltouab Belbekkouche, Md. Mahmud Hasan, and Ahmed Karmouch. Resource discovery and allocation in network virtualization. IEEE Communications Surveys & Tutorials, 2012.

[Bodík et al., 2012] Peter Bodík, Ishai Menache, Mosharaf Chowdhury, Pradeepkumar Mani, David A. Maltz, and Ion Stoica. Surviving failures in bandwidth-constrained datacenters. In Proceedings of the ACM Conference on SIGCOMM, 2012.
[Cadar et al., 2008] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, 2008.

[Cheng et al., 2011] Xiang Cheng, Sen Su, Zhongbao Zhang, Hanchi Wang, Fangchun Yang, Yan Luo, and Jie Wang. Virtual network embedding through topology-aware node ranking. ACM SIGCOMM Computer Communication Review, 2011.

[Chowdhury et al., 2009] N. M. Mosharaf Kabir Chowdhury, Muntasir Raihan Rahman, and Raouf Boutaba. Virtual network embedding with coordinated node and link mapping. In IEEE International Conference on Computer Communications, 2009.

[De Moura and Bjørner, 2008] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2008.

[Duffield et al., 1999] N. G. Duffield, Pawan Goyal, Albert Greenberg, Partho Mishra, K. K. Ramakrishnan, and Jacobus E. van der Merwe. A flexible model for resource management in virtual private networks. In Proceedings of the ACM Conference on SIGCOMM, 1999.

[Eén and Sörensson, 2003] Niklas Eén and Niklas Sörensson. An extensible SAT-solver. In Theory and Applications of Satisfiability Testing, 2003.

[Eén and Sörensson, 2006] Niklas Eén and Niklas Sörensson. Translating pseudo-Boolean constraints into SAT. Journal on Satisfiability, Boolean Modeling and Computation, 2006.

[Even et al., 1975] Shimon Even, Alon Itai, and Adi Shamir. On the complexity of time table and multi-commodity flow problems. In 16th Annual Symposium on Foundations of Computer Science, 1975.

[Fischer et al., 2013] Andreas Fischer, Juan Felipe Botero, Michael Till Beck, Hermann de Meer, and Xavier Hesselbach. Virtual network embedding: A survey. IEEE Communications Surveys & Tutorials, 2013.

[Greenberg et al., 2009] Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta. VL2: A scalable and flexible data center network. In Proceedings of the ACM Conference on SIGCOMM, 2009.

[Guo et al., 2009] Chuanxiong Guo, Guohan Lu, Dan Li, Haitao Wu, Xuan Zhang, Yunfeng Shi, Chen Tian, Yongguang Zhang, and Songwu Lu. BCube: A high performance, server-centric network architecture for modular data centers. ACM SIGCOMM Computer Communication Review, 2009.

[Guo et al., 2010] Chuanxiong Guo, Guohan Lu, Helen J. Wang, Shuang Yang, Chao Kong, Peng Sun, Wenfei Wu, and Yongguang Zhang. SecondNet: A data center network virtualization architecture with bandwidth guarantees. In The International Conference on emerging Networking Experiments and Technologies, 2010.

[Gupta et al., 2001] Anupam Gupta, Jon Kleinberg, Amit Kumar, Rajeev Rastogi, and Bulent Yener. Provisioning a virtual private network: A network design problem for multicommodity flow. In Proceedings of the Thirty-third Annual ACM Symposium on Theory of Computing, 2001.

[Heorhiadi et al., 2016] Victor Heorhiadi, Michael K. Reiter, and Vyas Sekar. Simplifying software-defined network optimization using SOL. In Proceedings of the 13th USENIX Conference on Networked Systems Design and Implementation, 2016.

[Huang et al., 2014] Tianlin Huang, Chao Rong, Yazhe Tang, Chengchen Hu, Jinming Li, and Peng Zhang.
VirtualRack: Bandwidth-aware virtual network allocation for multi-tenant datacenters. In IEEE International Conference on Communications, 2014.

[IETF, 2016] IETF. TRILL: Transparent interconnection of lots of links, 2016. [Online at en.wikipedia.org/wiki/TRILL; accessed 05-23-2017].

[Kakadia et al., 2013] Dharmesh Kakadia, Nandish Kopri, and Vasudeva Varma. Network-aware virtual machine consolidation for large data centers. In Proceedings of the Third International Workshop on Network-Aware Data Management, 2013.

[Lee et al., 2014] Jeongkeun Lee, Yoshio Turner, Myungjin Lee, Lucian Popa, Sujata Banerjee, Joon-Myung Kang, and Puneet Sharma. Application-driven bandwidth guarantees in datacenters. In Proceedings of the ACM Conference on SIGCOMM, 2014.

[Meng et al., 2010] Xiaoqiao Meng, Vasileios Pappas, and Li Zhang. Improving the scalability of data center networks with traffic-aware virtual machine placement. In IEEE International Conference on Computer Communications, 2010.

[Palkar et al., 2015] Shoumik Palkar, Chang Lan, Sangjin Han, Keon Jang, Aurojit Panda, Sylvia Ratnasamy, Luigi Rizzo, and Scott Shenker. E2: A framework for NFV applications. In Proceedings of the 25th Symposium on Operating Systems Principles, 2015.

[Popa et al., 2012] Lucian Popa, Gautam Kumar, Mosharaf Chowdhury, Arvind Krishnamurthy, Sylvia Ratnasamy, and Ion Stoica. FairCloud: Sharing the network in cloud computing. In Proceedings of the ACM Conference on SIGCOMM, 2012.

[Prasad et al., 2005] Mukul R. Prasad, Armin Biere, and Aarti Gupta. A survey of recent advances in SAT-based formal verification. International Journal on Software Tools for Technology Transfer, 2005.

[Raiciu et al., 2011] Costin Raiciu, Sebastien Barre, Christopher Pluntke, Adam Greenhalgh, Damon Wischik, and Mark Handley. Improving datacenter performance and robustness with multipath TCP. In Proceedings of the ACM Conference on SIGCOMM, 2011.

[Rintanen, 2011] Jussi Rintanen. Madagascar: Efficient planning with SAT. International Planning Competition, 2011.

[Rost et al., 2015] Matthias Rost, Carlo Fuerst, and Stefan Schmid. Beyond the stars: Revisiting virtual cluster embeddings. ACM SIGCOMM Computer Communication Review, 2015.

[Szeto et al., 2003] Wayne Szeto, Youssef Iraqi, and Raouf Boutaba. A multi-commodity flow based approach to virtual network resource allocation. In IEEE Global Telecommunications Conference, 2003.

[Tantawi, 2012] Asser N. Tantawi. A scalable algorithm for placement of virtual clusters in large data centers. In 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 2012.

[Vahdat, 2015] Amin Vahdat. A look inside Google's data center network, 2015. [Online at youtube.com/watch?v=FaAZAII2x0w; accessed 05-23-2017].

[Yu et al., 2008] Minlan Yu, Yung Yi, Jennifer Rexford, and Mung Chiang. Rethinking virtual network embedding: Substrate support for path splitting and migration. ACM SIGCOMM Computer Communication Review, 2008.

[Yuan et al., 2013] Yifei Yuan, Anduo Wang, Rajeev Alur, and Boon Thau Loo. On the feasibility of automation for bandwidth allocation problems in data centers. In Formal Methods in Computer-Aided Design, 2013.

[Zhani et al., 2013] Mohamed Faten Zhani, Qi Zhang, Gael Simon, and Raouf Boutaba. VDC Planner: Dynamic migration-aware virtual data center embedding for clouds. In IFIP/IEEE International Symposium on Integrated Network Management, 2013.