# congestion_graphs_for_automated_time_predictions__6be01f4d.pdf The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) Congestion Graphs for Automated Time Predictions Arik Senderovich, J. Christopher Beck Mechanical and Industrial Engineering University of Toronto Canada sariks@mie.utoronto.ca jcb@mie.utoronto.ca Avigdor Gal Industrial Engineering and Management Technion-Israel Institute of Technology Israel avigal@technion.ac.il Matthias Weidlich Dept. of Computer Science Humboldt University zu Berlin Germany matthias.weidlich@hu-berlin.de Time prediction is an essential component of decision making in various Artificial Intelligence application areas, including transportation systems, healthcare, and manufacturing. Predictions are required for efficient resource allocation and scheduling, optimized routing, and temporal action planning. In this work, we focus on time prediction in congested systems, where entities share scarce resources. To achieve accurate and explainable time prediction in this setting, features describing system congestion (e.g., workload and resource availability), must be considered. These features are typically gathered using process knowledge, (i.e., insights on the interplay of a system s entities). Such knowledge is expensive to gather and may be completely unavailable. In order to automatically extract such features from data without prior process knowledge, we propose the model of congestion graphs, which are grounded in queueing theory. We show how congestion graphs are mined from raw event data using queueing theory based assumptions on the information contained in these logs. We evaluate our approach on two real-world datasets from healthcare systems where scarce resources prevail: an emergency department and an outpatient cancer clinic. Our experimental results show that using automatic generation of congestion features, we get an up to 23% improvement in terms of relative error in time prediction, compared to common baseline methods. We also detail how congestion graphs can be used to explain delays in the system. Introduction Accurate time prediction is important in domains where having an accurate estimate of resource availability and the duration of tasks is critical for planning, scheduling, resource allocation, and coordination. In healthcare, the time until a patient sees a provider in an emergency department is crucial for ambulance routing and provider scheduling (Ang et al. 2015). Similarly, in smart cities, predicted travel and arrival times of public transportation feed directly into routing and dispatching (Botea, Nikolova, and Berlingerio 2013; Copyright c 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Wilkie et al. 2011). In manufacturing, in turn, predictions of cycle times for a product are used to set customer due dates and anticipate job completion times (Backus et al. 2006). An effective approach to solve a time prediction problem is to formulate it as a supervised learning task, where future time points are predicted based on raw event data (Senderovich et al. 2015). This data is commonly available in the form of event logs, recordings of the behavior of a system, which contain temporal information. For example, every visit to the emergency department is associated with a sequence of timestamped events that the patient experienced (e.g., start of triage and end of treatment). Previous work has shown that congestion has a substantial impact on the total time spent in a system (Gal et al. 2017) and hence on the quality of time prediction. However, event logs lack explicit information on the load imposed by arriving entities that are processed by shared (and scarce) resources. State-of-the-art methods, therefore, consider additional features that capture congestion of shared resources. These features are elicited by gathering extensive knowledge about the underlying process (e.g., by conducting interviews with stakeholders) and subsequently computed from the event logs (Ang et al. 2015). However, process knowledge is expensive to gather and not always easy to elicit as stakeholders often lack a global view of the process. It is wellknown that elicitation of process knowledge is hindered by its fragmentation across stakeholders, their focus on individual entities, and a general lack of conceptualization capabilities (Rosemann 2006; Frederiks and van der Weide 2006; Dumas et al. 2018). In addition, manual feature elicitation is often time consuming and prone to biases and errors. The process of feature generation is considered an art, making it difficult to automate (Khurana, Samulowitz, and Turaga 2018). In this work, we address the challenge of automatically generating congestion features based on the information available in event logs, thus removing the need for prior process knowledge. To this end, we propose a data-driven method rooted in queueing theory, a sub-field in Operations Research that analyzes the impact of congestion on a system s performance (Bolch et al. 2006). Our contribution is threefold. 1. We introduce congestion graphs, dynamic networks that capture queueing information. 2. We present a declarative mining procedure that automatically constructs congestion graphs from event data without the need for process knowledge. 3. We show how to extract congestion-related features from congestion graphs. We empirically test our approach using event logs from two real-world healthcare systems, predicting the time to meet the first physician in an emergency department and the total time spent in an outpatient cancer clinic. Incorporating our congestion features improves the relative error of prediction by up to 23% and 14%, respectively, compared to baseline prediction methods using the same process knowledge. Data-Driven Time Predictions In this section, we define our data model in the form of event logs and then pose the problem of automated time prediction via supervised learning. We conclude the section with an overview of our approach to generate congestionrelated features from an event log in order to solve the time prediction problem. Event Logs As our data model, we consider event data as collected by modern information systems (i.e., event logs) that trace the events that occur in the underlying system (van der Aalst 2016). For example, in a hospital setting, an event log will comprise patient pathways, represented by a sequence of timestamped services that denote treatment steps (e.g., XRAY ordering, start of physical examination, etc.). 1 is a sample from the event log of an emergency department. Here, the handling of a specific entity (i.e., a patient) is represented by the notion of a case that is encoded by a case identifier present in all log entries. Event logs represent raw data for individual cases, but do not contain explicit system-level information (e.g., the number of available resources and the number of cases waiting for a service). Table 1: An event log of an emergency department. Case Id Event Name Timestamp 11 Registration 7:30:04 11 Nurse Admission Start 7:35:52 13 Additional Vitals End 7:36:07 13 Lab Tests Results Start 7:40:32 11 Nurse Admission End 7:47:12 13 Lab Tests Results End 7:51:02 12 Additional Vitals Start 7:52:48 11 Order Blood Test 8:05:10 11 Additional Vitals Start 8:36:22 11 Additional Vitals End 8:48:37 12 Additional Vitals End 8:57:45 13 Doctor Admission Start 8:59:08 11 Doctor Admission Start 9:12:45 To formalize the notions of cases and their traces in event logs, let I, E, and T be the finite universes of case identifiers, event names, and timestamps, respectively. Then, an event log L (I E T ) is a set of log entries, triplets that combine a case, an event name, and a timestamp. We define some short-hand notation to refer to the log entries of a single case. Given a log L, the trace σi of case i I comprises all the related log entries in order: σi = (i, ei 1, ti 1), (i, ei 2, ti 2), . . . , (i, ei n, ti n) with σi(q) = (i, ei q, ti q) L, 1 q n, such that tq < tr for 1 q < r n (log entries are ordered by their timestamp) and S 1 q n{(i, ei q, ti q)} = {(ir, er, tr) L | ir = i} (the trace contains all log entries of case i). As such, ei q and ti q denote the event name and timestamp of the q-th log entry of the trace of case i, respectively. We assume that the first event ei 1 is an arrival event of case i into the system. In what follows, we shall omit suband superscripts whenever clear from the context. Moreover, we write |σi| = ni to denote the length of the i-th trace. Time Prediction with Supervised Learning We are interested in predicting the timestamp of an event related to a specific case. For example, in an emergency department, we are interested in the time that a patient sees a physician for the first time, as it is crucial information for online ambulance routing (for acute patients) and for a patient s choice of an emergency department (for low-acuity patients) (Ang et al. 2015). In other contexts, such as the treatment of cancer patients, the time until the end of the last treatment step is an important indicator for quality-of-service. The prediction target is therefore the time te, when a patient first reaches a specific event e E, conditioned on time t1 of arrival (e1). Using supervised learning, every log entry in the training set is given a label y = te t1 for the prediction target, denoting the universe of such labels by Y. Consequently, the input for the learning algorithm is a labeled event log denoted by Ly L Y. We aim to obtain a function h : L Y, which maps log entries to corresponding labels (Shalev-Shwartz and Ben-David 2014). A main challenge when applying supervised learning to solve prediction problems is to obtain features that explain the target variable. However, in systems with shared and scarce resources, raw event recordings do not contain congestion related information, such as system load and the number of available resources. In this work, we therefore propose a feature transformation function φ : L Y X Y that maps raw labeled event recordings into a set of features (X) with the following two capabilities: (i) The proposed method is automatically applicable without prior knowledge of the system under investigation or the specific semantics of events recorded in the log; (ii) The proposed method is grounded in well-established results from queueing theory, thereby guiding the feature generation procedure with insights on the impact of congestion on the system s temporal behavior. Approach Overview We use a model-driven approach to automatically generate congestion-related features, as illustrated in 1. Given an event Graph Mining Event Log Enriched Event Log Feature Extraction Congestion Figure 1: Our solution to generate congestion features. log, we first mine congestion graphs, graphical representations of the dynamics observed in the system. These dynamic graphs represent the flow of entities in terms of events and are labeled with performance information that is extracted from the event log. Extraction of such performance information is grounded in some general assumptions on the system dynamics: in this work, on a state representation of an underlying queueing system. Lastly, we create a transformation function φG that encodes the labels of a congestion graph into respective features. This feature creation yields an enriched event log, which can be used as input for a supervised learning method. Congestion Graphs We start the section with an overview of a general queueing model that serves as our theoretical basis, before introducing the model of congestion graphs. Then, we demonstrate mining of congestion graphs and show how these graphs are used for feature extraction. Generalized Jackson Networks For time prediction in queueing networks, we consider the model of a Generalized Jackson Network (GJN), the most general model in single-server queueing theory (Gamarnik and Zeevi 2006). A GJN describes a network of queueing stations, where entities wait for a particular service (e.g., a treatment step) that is conducted by or uses shared resources. The entities are assumed to be non-distinguishable and may arrive exogenously into any of the queueing stations according to a renewal process. Upon arrival, entities are served in a First-Come First-Served order by a single resource, with service times being independent and identically distributed. Hence, the length-of-stay (or sojourn time) at a queueing station is the sum of waiting time and service time. When entities complete service at a station, they are either routed to the next station or depart the system. Routing is assumed to be Bernoulli distributed: a coin is flipped at the end of service to decide on the next station (or departure). As a GJN model postulates that each station has a single resource, multiple resources are modeled by an increased processing rate of a station: the service rate is multiplied by the number of resources. The state of the GJN corresponds to a Markov process, known as the Markov state representation (MSR), that comprises three components: the queue length, the elapsed time since the most recent arrival, and the time since the start of the most recent service (Gamarnik and Zeevi 2006; Chen and Yao 2013). To capture the state at time t, the three components must be measured just prior to time t. The Model of Congestion Graphs A congestion graph is a fully-connected, vertex-labeled, directed graph, G = (V, F, ω) with V being the vertices and F = V V being the edges. The labeling is based on a universe Ωof vertex labels and is time-varying. With T as the universe of timestamps (as introduced above for event logs), function ω : V T Ωassigns a label to vertices at particular points in time. We denote ωt(v) the label of vertex v V at time t T . In our work, we define congestion graph labels using the MSR of a GJN. Specifically, a congestion graph can be thought of as a GJN where each edge represents a queueing station. The time that cases spend on edges of the congestion graph represent service times, while events (in the event log) correspond to congestion graph vertices. Hence, given a point in time t and an edge (v, v ) of the congestion graph, its MSR is given by a triplet that consists of: (1) the number of cases traveling on edge (v, v ); (2) the time elapsed since the most recent arrival of a case into edge (v, v ); and (3) the time elapsed since the start of the most recent service at (v, v ). However, we cannot determine the edge of an ongoing case at time t as this information is not directly accessible in event logs. At a time point t, we only know the last event observed for each case (v), without knowing the next event (v ). Thus, we label the vertices of the congestion graph rather than its edges. Following this idea, we construct the congestion graph G = (V, F, ω) by setting the vertices V to be the set of all events observed in the log and by assigning time-dependent vertex labels, as approximations of the MSR. Specifically, we set ωt(v) to be a tuple (n(v, t), ϵ(v, t), τ(v, t)), where n(v, t) is the number of cases for which v is the most recent event (i.e., the number of cases that are in transition to the service after v); ϵ(v, t) is the total time since these cases visited v (i.e., the accumulated partial transition delays); and τ(v, t) is the time between the two most recent occurrences of the respective event v. Feature Extraction from Mined Congestion Graphs We conclude this section by providing the declarative procedure to derive the approximated MSR from an event log L and demonstrating how to extract features from the mined congestion graph. Given an event log L, mining of a congestion graph involves the extraction of events that yield the vertices of the graph, V = {e E | (i, e, t) L}, the identification of dependencies between the events that yield the edges, F = {(ei q, ei q+1) (E E) | i I, 1 q < |σi|}, and the definition of the labeling function. As explained above, these labels are defined for particular points in time. However, in practice, the labeling function does not need to be defined for every timestamp in T , but may be limited to the timestamps that appear in the event log (T = {t T | (i, e, t) L}). We derive the labels in terms of the approximated MSR as follows. The number of cases in transition at time t, for Registration Lab tests results Doctor admission Nurse admission Order blood test Figure 2: A part of the congestion graph constructed using the event log of 1. which the last event was v is given by: n(v, t) = i I | 1 q |σi| : ei q = v ti q < t < ti q+1 . The total elapsed time ϵ(v, t) for cases, for which event v has just been observed, is calculated as: ϵ(v, t) = X i I t ti q | 1 q |σi| : ei q = v ti q < t < ti q+1. Finally, the time between the two most recent occurrences of events v prior to time t is defined as: τ(v, t) = t t , with t = max i I,1 q |σi| ei q=v ti q