# Traffic Flow Prediction with Vehicle Trajectories

Mingqian Li,1,3* Panrong Tong,1,2 Mo Li,1,2 Zhongming Jin,3 Jianqiang Huang,1,3 Xian-Sheng Hua3

1 Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore
2 School of Computer Science and Engineering, Nanyang Technological University, Singapore
3 Alibaba Group

{mingqian001, tong0091, limo}@ntu.edu.sg, {zhongming.jinzm, jianqiang.hjq, xiansheng.hxs}@alibaba-inc.com

Abstract

This paper proposes a spatiotemporal deep learning framework, Trajectory-based Graph Neural Network (TrGNN), that mines the underlying causality of flows from historical vehicle trajectories and incorporates it into road traffic prediction. Vehicle trajectory transition patterns are studied to explicitly model spatial traffic demand via graph propagation along the road network; an attention mechanism is designed to learn temporal dependencies based on neighborhood traffic status; and finally, a fusion of multi-step predictions is integrated into the graph neural network design. The proposed approach is evaluated with a real-world trajectory dataset. Experiment results show that the proposed TrGNN model achieves over 5% error reduction compared with state-of-the-art approaches across all metrics for normal traffic, and up to 14% for atypical traffic during peak hours or abnormal events. The advantage of trajectory transitions manifests itself especially in inferring highly fluctuating flows as well as non-recurrent flow patterns.

1 Introduction

Robust and accurate prediction of vehicular traffic conditions (e.g., flow, speed, density), either short-term or long-term, is necessary for transportation services such as traffic control and route planning. The challenge of traffic prediction primarily stems from the complex nature of spatiotemporal interactions among vehicles and the road network.
Data-driven approaches have been extensively exploited in predicting vehicular traffic on the road network. Early attempts leverage time series analysis (Williams and Hoel 2003), which primarily models the temporal correlations of traffic. Conventional machine learning models (Sun, Zhang, and Yu 2006) are applied to learn the spatiotemporal correlations of traffic from historical data. The latest works apply deep learning to traffic prediction, and they typically follow a spatiotemporal framework, e.g., Graph Neural Networks (GNNs) (Li et al. 2018), which demonstrate superior capability in learning complicated spatiotemporal correlations.

*Mingqian Li is under the joint Ph.D. program with Alibaba Group and Alibaba-NTU Singapore Joint Research Institute, Nanyang Technological University, Singapore.
Copyright 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

[Figure 1: The challenge in predicting non-recurrent traffic flows and how vehicle trajectory information may help. Panel (a): when only flow observations are available; panel (b): when vehicle trajectories are available. Each panel contrasts a regular pattern with a non-recurrent pattern.]

According to studies in the transportation domain (Skabardonis, Varaiya, and Petty 2003), road traffic contains two parts: recurrent traffic, which often arises from periodic traffic demand such as daily commuters during morning and evening rush hours, and non-recurrent traffic, which is triggered by unexpected causes such as public transit disruptions or accidents. Existing approaches learn spatiotemporal traffic correlations from patterns that were seen in history, and thus are favourable for predicting recurrent traffic. In predicting non-recurrent traffic, however, existing approaches may fail to achieve the same level of accuracy, mainly due to insufficient observations of similar flow patterns in history.
Figure 1(a) illustrates such an issue with an example: when only flow observations (i.e., the number of vehicles passing each road segment) are available, existing approaches may learn spatiotemporal correlations of recurrent flow patterns among different road segments across different times, but they cannot effectively reason about how a previously unseen part of the traffic contributes to future road traffic. This paper targets the current challenge with non-recurrent traffic flow prediction: historical flow data cannot provide insights on how non-recurrent traffic flows correlate in time and space.

The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

Thus, complementary to conventional spatiotemporal modeling using aggregated traffic flow observations, we also exploit intact vehicle trajectories to infer short-term traffic dependency. As suggested in Figure 1(b), trajectory data provide information on how each portion of a traffic flow transits from one road segment to another, and thus imply the dependency between upstream and downstream flows. This dependency embeds knowledge of how downstream traffic is caused by upstream traffic and thus may help infer traffic flow patterns that have not been seen before. Incorporating trajectories into traffic flow prediction entails the following challenges: 1) trajectories are only observed from historical data; 2) traffic patterns at the trajectory level need to be aggregated properly to reflect the traffic patterns at the flow level; and 3) future flows may deviate from observed patterns due to influence from the environment. To address those challenges, this paper proposes a novel model, Trajectory-based Graph Neural Network (TrGNN), which learns trajectory transition patterns from historical trajectories and incorporates them into a spatiotemporal graph-based deep learning framework.
The main contributions of this paper are summarized as follows:

- We identify the challenge of predicting non-recurrent traffic flows, and we propose to incorporate vehicle trajectory data into traffic flow prediction. To the best of our knowledge, this is the first study that attempts to leverage vehicle trajectory data to mine the underlying causality of flows among roads.
- We design an end-to-end spatiotemporal graph-based deep learning model to predict traffic flows of the entire road network. Our model embeds trajectory transition into graph propagation along the road network to model spatial traffic demand; it learns temporal dependencies with an attention mechanism based on neighborhood traffic status; and finally it fuses multi-step predictions.
- We conduct extensive experiments¹ with real-world vehicle trajectory data, and the results suggest that our model outperforms state-of-the-art approaches in terms of prediction errors across various scenarios (over 5% error reduction), and is especially superior in predicting non-recurrent flow patterns, e.g., during abnormal events (up to 14% error reduction).

The rest of the paper is organized as follows. Section 2 discusses related work on vehicular traffic prediction. Section 3 defines the problem and introduces some preliminary knowledge. The proposed model TrGNN is introduced in Section 4 and experimentally evaluated in Section 5. Section 6 concludes the paper.

¹Code and dummy data are available at https://github.com/mingqian000/TrGNN.

2 Related Work

For decades, data-driven approaches have been exploited to predict road traffic conditions, such as flow (Lv et al. 2014), speed (Li et al. 2018), density (Raj, Bahuleyan, and Vanajakshi 2016), accident rate (Sun and Sun 2015), and arrival time (Zhou, Zheng, and Li 2012). Early attempts with time series analysis model the temporal correlations of traffic, such as SARIMA (Williams and Hoel 2003) and VAR (Chandra and Al-Deek 2009).
Those approaches rely on strong assumptions of linearity and stationarity, and often ignore the spatial impact from neighboring traffic. Another line of research focuses on conventional machine learning models, such as k-NN (Davis and Nihan 1991), Bayesian networks (Sun, Zhang, and Yu 2006), and SVR (Vanajakshi and Rilett 2004). In those models, spatiotemporal features are manually designed and extracted, and the models are often shallow in structure with limited learning capability. Recent advances in deep learning have motivated its application to traffic prediction (Liu et al. 2018). Earlier neural network architectures include SAEs (Lv et al. 2014) and DBNs (Jia, Wu, and Du 2016). State-of-the-art approaches typically follow a spatiotemporal framework: they model the spatial correlations by CNNs (Yu, Yin, and Zhu 2018), viewing the map as an image, or by GCNs (Li et al. 2018), viewing the road network as a graph; and they model the temporal evolution of traffic as a sequence of signals (Zhao et al. 2019). The spatiotemporal framework makes it flexible to embed auxiliary information such as weather conditions (Yao et al. 2018), accident data (Yu et al. 2017), map query records (Liao et al. 2018), and POIs (Geng et al. 2019). Similar to these works, our model follows a graph-based spatiotemporal deep learning framework; in addition, we incorporate vehicle trajectory data into the design to address the challenge of predicting non-recurrent flows. Among deep learning approaches, Graph WaveNet (Wu et al. 2019) and SLC (Zhang et al. 2020) mine latent graph structures to capture long-range dependencies, and MResRGNN (Chen et al. 2019) designs a multiple-hop scheme to capture long-term periodic patterns; the design goals of those methods deviate from our key objective of predicting non-recurrent traffic, which is often caused by sudden local disruptions.
Other methods explore temporal building blocks and combine them with graph convolution, e.g., attention in ASTGCN (Guo et al. 2019) and GMAN (Zheng et al. 2020), gated recurrent units in T-GCN (Zhao et al. 2019), and temporal convolution in STGCN (Yu, Yin, and Zhu 2018). Most of these methods aim at reducing overall prediction errors without a specific focus on non-recurrent traffic. Few existing works leverage trajectory data in traffic flow prediction. Zhang et al. (2019) leverage trajectories in traffic state estimation, but only to calibrate the embedding of road intersections. In comparison, our work utilizes trajectories to mine the traffic flow transitions among road segments. Some work leverages trajectories for other purposes, such as map generation (Ruan et al. 2020), which is less relevant to the topic of our study.

3 Preliminaries

We first define the problem of traffic flow prediction, and then introduce some preliminary knowledge of Graph Convolutional Networks (GCNs) and the attention mechanism.

3.1 Problem Definition

In traffic flow prediction, the target is to predict future traffic flows given historical traffic flows on a static road network.

Definition 1 (Road Graph). The road network can be formulated as a directed road graph G = (V, E, A). V = {v_i}, i = 1, 2, ..., M, is a finite set of nodes where each node v_i represents a road segment i; E = {e_ij} is a set of directed edges where each edge e_ij = (v_i, v_j) indicates that road segment i is an immediate upstream of road segment j; and A ∈ [0, 1]^{M×M} represents the weighted road adjacency matrix. Each node v_i has a self-loop, i.e., e_ii ∈ E.

Definition 2 (Traffic Flows). Traffic flow is defined as the number of vehicles passing by a road segment during a specific time interval. Given a road graph G = (V, E, A), we use X ∈ ℝ^{T×|V|} to represent the time series of traffic flows, where for each interval t = 1, 2, ..., T, X_t ∈ ℝ^{|V|} represents the traffic flows of all road segments in the road network during time interval t.
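Definition 2 can be made concrete with a small sketch. The record format (one `(vehicle, road, interval)` tuple per observed segment visit) and the function name are illustrative assumptions, not part of the paper:

```python
import numpy as np

def aggregate_flows(visits, T, M):
    """Build the flow matrix X in R^{T x |V|} (Definition 2):
    X[t, v] counts distinct vehicles observed on road segment v
    during time interval t."""
    seen = set()
    X = np.zeros((T, M), dtype=int)
    for vehicle, road, interval in visits:
        if (vehicle, road, interval) not in seen:  # count each vehicle once per interval
            seen.add((vehicle, road, interval))
            X[interval, road] += 1
    return X
```

Each row X_t of this matrix is the per-interval flow vector that the model consumes.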
Definition 3 (Trajectory). Given a road graph G = (V, E, A), we use T = (v_1, v_2, ..., v_I) to represent a trajectory of a vehicle, where each v_i ∈ V represents a road segment in the trajectory, satisfying (v_i, v_{i+1}) ∈ E and v_i ≠ v_{i+1} for i = 1, 2, ..., I−1.

Problem 1 (Traffic Flow Prediction). Given a road graph G = (V, E, A), find a prediction function f̂ with parameters Θ such that, given traffic flows X_{(t−T_in+1):t} within a historical window of T_in intervals up to time interval t, f̂ estimates the most likely traffic flows X̂_{t+1} for the next time interval t+1, i.e.,

X̂_{t+1} := f̂_Θ(X_{t−T_in+1}, ..., X_t) ≈ arg max_{X_{t+1}} log p(X_{t+1} | X_{(t−T_in+1):t}).   (1)

3.2 Graph Convolutional Networks (GCNs)

To model spatial traffic demand, we leverage the idea of graph propagation from GCNs. A GCN is defined over a graph G = (V, E, A). It applies convolutional operations on graph signals in the spectral domain (Kipf and Welling 2017; Defferrard, Bresson, and Vandergheynst 2016). Given a graph signal X ∈ ℝ^{|V|×N}, where N is the number of features, a typical formulation of a K-hop graph convolutional layer is

GCN_G(X; W, θ) = σ(Σ_{k=0}^{K} θ_k L^k X) W,   (2)

where L ∈ [0, 1]^{|V|×|V|} is the graph Laplacian, a variant of A, which controls the graph propagation across nodes; θ ∈ ℝ^{K+1} controls the fusion of different hops; W ∈ ℝ^{N×M} (M is the output feature dimension) controls the fusion of different features; and σ is the activation function.

In this paper, to model traffic demand and traffic status, we adopt graph propagation (GraphProp), a simplified variant of GCN, with one single input feature (i.e., traffic flow), thus ignoring the feature-wise parameters W:

GraphProp(X, A; K) := [X ⊕ AX ⊕ A²X ⊕ ... ⊕ A^K X],   (3)

where ⊕ denotes concatenation. Instead of directly learning θ to fuse different hops of traffic demand, we define an attention mechanism to learn the temporal weights, as illustrated in Section 3.3.

3.3 Attention Mechanism

In learning a weighted sum of values, an attention mechanism (Vaswani et al. 2017; Veličković et al.
2018) replaces the weight parameters with a learning module in which the same set of parameters (called keys) is shared across all values in calculating the weights. A typical formulation of an attention mechanism, given queries Q ∈ ℝ^{N×d_k}, keys K ∈ ℝ^{M×d_k}, and values V ∈ ℝ^{M×d_v}, is

Attention(Q, K, V) := softmax(QKᵀ)V.   (4)

In this paper, we apply the attention mechanism for a more flexible fusion of traffic demand based on traffic status.

4 Methodology

In this section, we introduce our proposed model, TrGNN, to address the problem of traffic flow prediction (Problem 1). The model follows a spatiotemporal framework, leveraging trajectory transition patterns. The extraction of trajectory transition is illustrated in Figure 2. An overview of the model architecture is illustrated in Figure 3. We elaborate each part of the architecture in the subsections below.

4.1 Trajectory Transition

Figure 1 illustrates the extra information gained from historical trajectory data when inferring non-recurrent traffic patterns. Compared to flows, trajectories provide essential knowledge about drivers' origins and destinations (O-D), and help infer their choices of routes at road intersections. Figure 1(b) visualizes a contrast between green and dark red trajectories, which indicates that vehicles coming from different upstream road segments (or origins) may differ in their distributions of downstream road segments (or destinations). Hence, for non-recurrent traffic patterns, we may infer flows on a trajectory basis: first obtain the origins of the existing vehicles, and then, based on their origins, infer the distribution of their destinations. In Figure 1(b), for example, we may infer that the extra spike of flow is more likely caused by extra vehicle flow along dark red trajectories instead of green trajectories. We utilize historical trajectories to explicitly model the transition of flows between upstream and downstream road segments. Figure 2 illustrates the extraction of trajectory transition.
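The dot-product attention of Eq. (4) can be sketched as follows (a minimal version without the 1/√d_k scaling that Vaswani et al. also apply; TrGNN later applies this pattern road-wise):

```python
import numpy as np

def softmax(z):
    """Numerically stabilized softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Eq. (4): each of the N queries attends over all M values;
    the keys K parameterize the weights, which sum to 1 per query."""
    return softmax(Q @ K.T) @ V
```

For Q of shape (N, d_k), K of shape (M, d_k), and V of shape (M, d_v), the output has shape (N, d_v).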
We view historical trajectories as Markov processes, and by aggregating the trajectories of all vehicles, we may infer the transition of flows from one road segment to others. The rest of this subsection elaborates the extraction of trajectory transition.

The trajectory generation of a vehicle can be modeled as a first-order Markov process, assuming that the transition probability from each upstream road segment to each downstream road segment is stationary across days. We define a trajectory transition tensor P ∈ ℝ^{T_d×|V|×|V|}, where each

[Figure 2: Learning trajectory transition from historical trajectories.]

P_t ∈ ℝ^{|V|×|V|} represents the trajectory transition probabilities for the t-th time interval of the day. The trajectory generation process can be represented as

P(T | t) = P((v_1, v_2, ..., v_I) | t) = ∏_{i=1}^{I−1} P(v_{i+1} | v_i; t) = ∏_{i=1}^{I−1} P_{t,v_i,v_{i+1}}.   (5)

Alternatively, P could be derived from a higher-order Markov process, which would require a larger sample size and higher computational complexity. To estimate the tensor P, we collect the historical trajectories of all vehicles from the training set, and aggregate the cumulative transition probability with respect to time of day, upstream road segment, and downstream road segment:

P̂_{t,v_i,v_j} = (#vehicles(v_i → v_j | t) + 1[v_j ∈ N(v_i)]) / (#vehicles(v_i | t) + |N(v_i)|),   (6)

where t stands for the t-th time interval of the day, and N(v_i) denotes the set of downstream neighbors of road segment v_i. To mitigate data sparsity, P̂ is smoothed by adding a constant 1 for every pair of consecutive road segments. The trajectory transition tensor P summarizes the probability distribution of drivers' choices of routes. In a macroscopic view, it approximates the transition of flows from upstream to downstream road segments in the near future, and serves as a lookup table in the proposed TrGNN model.
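The add-one-smoothed estimate of Eq. (6) can be sketched as below, for one time-of-day interval t. The input format is an assumption: `counts[i]` maps each downstream segment j to the observed transition count for i → j, and `neighbors[i]` is N(i); with self-loops in the road graph, every N(i) is non-empty:

```python
import numpy as np

def estimate_transition(counts, neighbors):
    """Eq. (6): add-one-smoothed probability of moving from segment i
    to each downstream neighbor j within one time-of-day interval."""
    M = len(neighbors)
    P = np.zeros((M, M))
    for i, nbrs in enumerate(neighbors):
        # #vehicles(i|t) approximated by total observed transitions out of i
        denom = sum(counts[i].values()) + len(nbrs)
        for j in nbrs:
            P[i, j] = (counts[i].get(j, 0) + 1) / denom
    return P
```

By construction each row sums to 1 over N(i), so P is row-stochastic, which is what lets it serve as a Markov transition lookup table.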
4.2 Spatial Modeling of Traffic Demand

Based on trajectory transition, we design a graph propagation mechanism to infer the traffic demand in the spatial domain. Traffic demand refers to the short-range and long-range destinations of existing vehicles on the road network. We leverage graph propagation from Graph Convolutional Networks (Section 3.2) to simulate the transition of vehicles along the road network. We perform graph propagation in d hops, resulting in a graph of traffic demand for each hop. For each input time interval t, we derive traffic demand D_t ∈ ℝ^{|V|×(d+1)} via graph propagation:

D_t = GraphProp(X_t, P_tᵀ; d) = [X_t ⊕ P_tᵀ X_t ⊕ (P_tᵀ)² X_t ⊕ ... ⊕ (P_tᵀ)^d X_t],   (7)

where ⊕ denotes concatenation, ᵀ denotes matrix transpose, and the parameter d stands for the demand hop, controlling the farthest possible destination. The graph propagation simulates the propagation of flows along the road network; as a result, the traffic demand D is an aggregation of the short-range and long-range destinations (in different hops) of all vehicles in existing flows.

4.3 Temporal Modeling of Traffic Demand Based on Traffic Status

The modeling of traffic demand in Section 4.2, however, does not consider the propagation speed of flows, which should depend on traffic status. Traffic status refers to the overall traffic volume in the neighborhood of each road segment. If the traffic status is congested around a road segment (i.e., a high volume of flows on the neighboring road segments), the propagation of flows along that road segment should be slow, and vice versa. A temporal module is thus designed to infer how each hop of traffic demand, from short range to long range, corresponds to the future traffic flow in the targeted time interval. This is done by assigning a weight to each hop of traffic demand via an attention mechanism (Section 3.3) based on traffic status. Thus, we first obtain traffic status, and then build an attention mechanism based on traffic status.
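The graph propagation of Eqs. (3) and (7) can be sketched as below. A useful property to observe: when P_t is row-stochastic, propagating along its transpose redistributes, and therefore conserves, the current flow mass across downstream segments at every hop (function names are illustrative):

```python
import numpy as np

def graph_prop(X, A, K):
    """Eq. (3): concatenate the K+1 hops [X, AX, A^2 X, ..., A^K X]."""
    hops = [X]
    for _ in range(K):
        hops.append(A @ hops[-1])      # one more hop of propagation
    return np.stack(hops, axis=-1)     # shape (|V|, K+1)

def traffic_demand(X_t, P_t, d):
    """Eq. (7): traffic demand is graph propagation of the current flows
    along the transposed transition matrix, D_t = GraphProp(X_t, P_t^T; d)."""
    return graph_prop(X_t, P_t.T, d)
```

Column i of D_t is the expected placement of today's vehicles after i transitions, which is exactly the "i-hop-ahead destination" reading of traffic demand in the text.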
For each input time interval t, we obtain traffic status S_t ∈ ℝ^{|V|×(2^{s+1}−1)} via graph propagation in s hops from neighboring road segments. The graph propagation is done via a dual random walk to incorporate both upstream and downstream traffic:

S_t = GraphProp_dual(X_t, Ã; s) = [X_t ⊕ Ãᵀ X_t ⊕ Ã X_t ⊕ Ãᵀ Ãᵀ X_t ⊕ ... ⊕ Ã^s X_t],   (8)

where Ã is a normalized variant of the weighted road adjacency matrix A, all products of up to s factors of Ã and Ãᵀ are enumerated (hence the 2^{s+1}−1 terms), and the parameter s stands for the status hop, controlling the radius of the neighborhood.

[Figure 3: Trajectory-based Graph Neural Network (TrGNN). The framework models spatial traffic demand via graph propagation based on trajectory transition, and models temporal dependencies via an attention mechanism based on neighborhood traffic status. The final prediction is a fusion of multi-step predictions.]

We apply a road-wise attention mechanism (referring to the dot-product attention in (Vaswani et al. 2017)), parameterized by keys K ∈ ℝ^{|V|×(2^{s+1}−1)×(d+1)}, taking traffic status S_t as queries and traffic demand D_t as values, to assign weights α ∈ [0, 1]^{|V|×(d+1)} to the different hops in traffic demand D_t, and take the weighted sum as an initial prediction of flows H_t ∈ ℝ^{|V|}:

H_t = Attention(S_t, D_t; K) = Σ_{i=0}^{d} α_{:,i} ⊙ D_{t,:,i} = Σ_{i=0}^{d} [softmax(S_t ⊛ K)]_{:,i} ⊙ D_{t,:,i},   (9)

where softmax(·) is applied over the dimension of the demand hop, ⊛ denotes road-wise matrix product, and ⊙ denotes element-wise (Hadamard) product.

4.4 Multi-Step Fusion

From a sequence of input flows {X_i}, i = t−T_in+1, ..., t, we obtain a sequence of initial predictions H ∈ ℝ^{T_in×|V|}. The final layer of the model is a temporal fusion of H. We adopt a road-wise fully connected layer. For each road segment v,

y_v := X̂_{t+1,v} = FullyConnected(H_{:,v}; Θ) = Θ H_{:,v}.
(10)

Alternatively, this layer can be replaced by any RNN cell, such as LSTM (Hochreiter and Schmidhuber 1997) or GRU (Chung et al. 2014), or by sequence modeling (Sutskever, Vinyals, and Le 2014), for longer-term prediction. As a side note, the conservation of vehicles on the road network does not hold in practice. Future flows not only depend on trajectory transition within the road network, but also on new vehicles entering the road network (e.g., entering from the boundary of the region, or entering from a local road onto an arterial road) and on existing vehicles leaving the road network, which we call boundary flows. Since the boundary flows are strongly associated with drivers' O-D demand, which is periodic, we embed some periodic features (e.g., time of day, is-working-day) into the multi-step fusion module to model the boundary flows.

5 Experimental Evaluation

5.1 Dataset Description

We evaluate our model with SG-TAXI, a real-world dataset comprising GPS mobility traces from over 20,000 taxis in Singapore. The dataset is provided by the Singapore Land Transport Authority. We collect the GPS readings of all active taxis for a period of 8 weeks (14th Mar to 8th May 2016). Each GPS reading comprises vehicle id, longitude, latitude, and timestamp. The road network comprises 2,404 road segments, covering all expressways in Singapore.

5.2 Data Preprocessing

We preprocess the SG-TAXI dataset in 4 steps:

1. Road graph formulation. For the road graph G = (V, E, A), we calculate the weighted road adjacency matrix A as the exponential decay of distance between roads.
2. Map matching. We apply the hidden Markov map matching algorithm (Newson and Krumm 2009) to correct GPS readings to their corresponding road segments.
3. Trajectory cleansing.
Given a sequence of mapped GPS points, we cleanse the vehicle's trajectory as follows: 1) eliminate duplicate records; 2) if the GPS reading is off for over 10 minutes (e.g., the driver turns off the sensing device), split the trajectory; 3) if the driver stays on the same road segment for over 2 minutes, split the trajectory; 4) if no path exists between two consecutive GPS points (e.g., the driver drives off the road network), split the trajectory; and 5) remove GPS points with extreme speed (i.e., speed derived from two consecutive GPS points exceeds 120 km/h). Finally, we recover the full trajectory via Dijkstra's algorithm (Dijkstra 1959).

4. Flow aggregation. We aggregate trajectories into flows per road segment per 15-minute interval. We calibrate flows to correct the daily fluctuation in taxi arrangement and better represent the overall traffic flows in Singapore.

5.3 Experiment Settings

The model is trained on the preprocessed SG-TAXI dataset. The train-validate-test split is 5-1-2 weeks. Each data point consists of input flows for 4 intervals (i.e., 1 hour) and output flows for 1 interval (i.e., 15 minutes). Flows are normalized before being input into the model. For hyperparameters, the demand hop d is set to 75, i.e., the maximum number of road segments that a vehicle at normal speed could traverse within a 15-minute interval, and the status hop s is set to 3. The model is implemented in PyTorch (Paszke et al. 2019) on a single Tesla P100 GPU and is trained using the Adam optimizer (Kingma and Ba 2014) to minimize the MSE loss. The learning rate is initially set to 0.004 and is halved every 30 epochs. The maximum number of epochs is set to 100. Early stopping is applied on validation MAE. The training takes less than 4 GB RAM and less than 1 GB GPU memory.
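The splitting rules of the trajectory-cleansing step (Section 5.2) can be sketched as below. The record format `(timestamp_seconds, road_id)` and the `path_exists` hook are assumptions; the extreme-speed filter (rule 5) is omitted since it needs raw coordinates:

```python
def split_trajectory(points, gap_s=600, dwell_s=120, path_exists=lambda a, b: True):
    """Sketch of cleansing rules 1-4: drop duplicates, and split the mapped
    GPS sequence on a >10-min signal gap, a >2-min dwell on one road
    segment, or a missing path between consecutive points."""
    trajs, cur = [], []
    last_t = last_road = road_since = None
    for t, road in points:
        if cur and t == last_t and road == last_road:
            continue                                   # rule 1: duplicate record
        split = bool(cur) and (
            t - last_t > gap_s                         # rule 2: GPS off > 10 min
            or (road == last_road and t - road_since > dwell_s)  # rule 3: dwell > 2 min
            or not path_exists(last_road, road)        # rule 4: no connecting path
        )
        if split:
            trajs.append(cur)
            cur = []
        if not cur or road != last_road:
            road_since = t                             # just entered this segment
        cur.append((t, road))
        last_t, last_road = t, road
    if cur:
        trajs.append(cur)
    return trajs
```

Each resulting sub-trajectory can then be completed via shortest paths between consecutive mapped points, as the paper does with Dijkstra's algorithm.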
5.4 Baseline Approaches

Our model TrGNN is compared to representative baseline methods of each type, including naive methods (HA, MA), time series analysis (VAR), conventional machine learning (RF), and deep learning (T-GCN, STGCN, DCRNN). Specifically, (i) HA (Historical Average) is the average flow at the same time on the same day over the past four weeks; (ii) MA (Moving Average) is the average flow over the previous 1 hour; (iii) VAR (Vector Auto-Regression) (Hamilton 1994) models the future flow as a linear combination of historical flows in a 5-hop neighborhood, implemented in StatsModels (Seabold and Perktold 2010); (iv) RF (Random Forest) is a decision-tree-based ensemble method that fits a piece-wise function on historical flows in a 5-hop neighborhood, implemented in Scikit-learn (Pedregosa et al. 2011) with 100 trees; (v) T-GCN (Temporal Graph Convolutional Network) (Zhao et al. 2019) is a graph-based neural network that integrates GCN with GRU, implemented in TensorFlow²; (vi) STGCN (Spatio-Temporal Graph Convolutional Networks) (Yu, Yin, and Zhu 2018) is a graph-based neural network that models both spatial and temporal dependencies via convolution, implemented in PyTorch³; and (vii) DCRNN (Diffusion Convolutional Recurrent Neural Network) (Li et al. 2018) is a graph-based neural network that integrates diffusion convolution on graphs with sequence learning, implemented in PyTorch⁴.

²https://github.com/lehaifeng/T-GCN
³https://github.com/FelixOpolka/STGCN-PyTorch
⁴https://github.com/chnsh/DCRNN_PyTorch

5.5 Evaluation Metrics

We evaluate prediction results by three error metrics: MAE (Mean Absolute Error), MAPE (Mean Absolute Percentage Error), and RMSE (Root Mean Squared Error), the same as in (Li et al. 2018). Lower errors indicate better performance.

5.6 Results and Analysis

Table 1 summarizes the evaluation of different approaches for traffic flow prediction on the SG-TAXI dataset.
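The three metrics of Section 5.5 can be computed as below. Masking out zero-flow intervals in MAPE is one common convention and an assumption here; the paper does not state its exact handling of zeros:

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean Absolute Percentage Error, skipping zero-flow ground truths."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    mask = y != 0
    return np.mean(np.abs((y[mask] - y_hat[mask]) / y[mask]))
```

MAE and RMSE are in the same unit as flow (vph here), while MAPE is unitless, which is why the paper reports values like 26.43/0.30/38.65 on very different scales.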
The comparison covers overall testing as well as specific scenarios including peak hours, non-peak hours, and an MRT breakdown.

Overall performance. According to Table 1, the overall prediction errors of our model TrGNN are 26.43/0.30/38.65 vph for MAE/MAPE/RMSE, and TrGNN achieves over 5% error reduction from the baselines across all metrics. The naive baselines generally give high errors, as they consider only temporal correlations of flows; MA is more accurate than HA, indicating that near-past flows play a stronger role than periodicity. VAR and RF perform better than the naive baselines, as they incorporate neighborhood flows to model spatial correlations; in particular, RF performs better than VAR, implying that flows are not linearly correlated. For deep learning, DCRNN achieves the best results of all baselines, indicating the capability of graph-based deep learning in capturing spatiotemporal correlations. Finally, TrGNN outperforms all existing baselines on all metrics, which verifies the effectiveness of learning the spatiotemporal transition of flows from trajectories. The line plot in Figure 4 visualizes the predicted flows of TrGNN and a few representative baselines. HA fits the ground truths worst, implying high variation of flows from week to week, while TrGNN and DCRNN are more sensitive to the real-time fluctuations of flows. If we look further into the peak hours indicated in the dashed box, TrGNN captures the fluctuations of flows slightly earlier than DCRNN.

Peak hours and non-peak hours. We select two typical periods for experiments: peak hours (8-10pm on working days, when public transport services become limited and the demand for taxis increases, thus with high fluctuation of flows), and non-peak hours (2-4pm on working days, when people stay at offices and the demand for taxis stabilizes, thus with low fluctuation of flows). Results are summarized in Table 1 (under the "Peak hours" and "Non-peak hours" columns).
In peak hours, absolute errors (MAE/RMSE) are consistently higher than in overall testing, while in non-peak hours the results are the opposite. This meets our expectation, as predicting peak-hour flows is more challenging due to the higher fluctuation in traffic demand. In both peak hours and non-peak hours, TrGNN outperforms all baselines, and the error reduction of TrGNN is more significant during peak hours (6-13% reductions on the performance metrics). Figure 4 displays heatmap snapshots of the prediction errors of HA, DCRNN, and TrGNN on the entire road network during selected peak hours. The color indicates increasing prediction error from green to red. A comparison of the heatmap snapshots suggests the robustness of TrGNN in capturing the periodic fluctuation of flows in peak hours.

Table 1: Performance of different approaches for traffic flow prediction on the SG-TAXI dataset. Each column group (Overall, Peak hours, Non-peak hours, MRT breakdown) reports MAE / MAPE / RMSE. %diff denotes the error reduction of TrGNN from the best baseline performance.

| Method | Overall MAE | Overall MAPE | Overall RMSE | Peak MAE | Peak MAPE | Peak RMSE | Non-peak MAE | Non-peak MAPE | Non-peak RMSE | MRT MAE | MRT MAPE | MRT RMSE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HA | 33.74 | 0.34 | 52.58 | 36.83 | 0.25 | 55.02 | 32.53 | 0.28 | 48.67 | 40.07 | 0.27 | 59.34 |
| MA | 31.55 | 0.35 | 47.69 | 36.14 | 0.26 | 53.18 | 28.18 | 0.27 | 39.41 | 44.85 | 0.30 | 71.43 |
| VAR | 29.27 | 0.33 | 43.22 | 34.23 | 0.24 | 49.71 | 28.10 | 0.26 | 39.28 | 40.68 | 0.27 | 64.41 |
| RF | 29.26 | 0.33 | 43.38 | 34.13 | 0.24 | 49.75 | 27.53 | 0.26 | 38.53 | 42.28 | 0.28 | 66.53 |
| T-GCN | 31.12 | 0.35 | 45.69 | 36.57 | 0.27 | 52.91 | 30.03 | 0.29 | 41.53 | 42.38 | 0.30 | 67.39 |
| STGCN | 29.88 | 0.33 | 44.51 | 34.86 | 0.24 | 50.86 | 27.94 | 0.27 | 39.05 | 42.19 | 0.28 | 66.40 |
| DCRNN | 29.01 | 0.31 | 43.12 | 33.74 | 0.25 | 48.88 | 27.75 | 0.27 | 38.74 | 40.39 | 0.28 | 64.28 |
| TrGNN- | 27.34 | 0.31 | 40.05 | 31.35 | 0.23 | 45.11 | 26.61 | 0.26 | 37.20 | 38.57 | 0.27 | 59.53 |
| **TrGNN** | **26.43** | **0.30** | **38.65** | **29.81** | **0.23** | **42.62** | **25.65** | **0.25** | **35.68** | **34.56** | **0.25** | **54.31** |
| %diff | -9% | -5% | -10% | -12% | -6% | -13% | -7% | -4% | -7% | -14% | -8% | -8% |

Abnormal event: MRT breakdown. We analyze an abnormal event in Singapore, an MRT (Mass Rapid Transit)
breakdown, when train services were disrupted due to a power fault (Chew 2016).

[Figure 4: Line plot of predicted flows on the road network over a working day (HA, DCRNN, TrGNN; peak and non-peak hours marked), and heatmap snapshots of prediction errors (vph) during peak hours.]

The disruption fell on a Monday night, lasting for more than one hour, and it affected 52 train stations on 4 train lines, covering the whole area of west Singapore. In Figure 5, the heatmap visualizes the abnormal spike of flows in the affected region due to the increase in taxi demand during the MRT breakdown period, and the line plot visualizes the predicted flows on a sample abnormal road segment; TrGNN fits the ground truths best. We select road segments within a 3 km neighborhood of any affected train station, and summarize their prediction results during the breakdown period in Table 1 (under the "MRT breakdown" column). Compared to "Overall", we observe a significant increase in MAE for all baselines, ranging from 19% to 44%, which demonstrates the performance drop in predicting abnormal flows. Nevertheless, TrGNN outperforms the baselines by a significant error reduction of 14%. The result suggests the capability of TrGNN in capturing spatiotemporal causality even for non-recurrent flow patterns,

[Figure 5: Heatmap of abnormal flows in west Singapore due to the MRT breakdown, and line plots of abnormal flows and predicted flows on a road segment. In the heatmap, the color scale indicates the amount of extra flow compared to that of a normal day.]

instead of simply memorizing historical flow patterns.

Component analysis: trajectory transition. To analyze the utility of trajectories, we build a variant of TrGNN, TrGNN-, that replaces the trajectory transition with the static road network.
Results in Table 1 show that, compared to TrGNN-, TrGNN reduces the prediction errors on all metrics in all scenarios, especially in the MRT breakdown scenario, where MAE drops from 38.57 vph to 34.56 vph. This verifies the effectiveness of trajectory transition in capturing flow dependency.

6 Conclusion and Future Work

This paper proposes a spatiotemporal deep learning model, Trajectory-based Graph Neural Network (TrGNN), to solve the traffic flow prediction problem. The architecture leverages historical trajectory transitions as an input to the graph-based deep learning framework. TrGNN is evaluated on the SG-TAXI dataset. Results show that TrGNN outperforms state-of-the-art approaches, and is especially superior in predicting non-recurrent traffic flows such as those during the MRT breakdown event. Potential future work includes expansion to a higher-order Markov model, longer-term prediction, and optimization of the computational complexity of extracting trajectories. Moreover, our work points out a promising direction in incorporating trajectory data into traffic prediction.

Acknowledgments

This research is supported, in part, by NRF Singapore under its grant SDSC-2019-001, Alibaba Group through the Alibaba Innovative Research (AIR) Program and the Alibaba-NTU Singapore Joint Research Institute (JRI), and Singapore MOE Tier 1 grant RG18/20. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the funding agencies.

References

Chandra, S. R.; and Al-Deek, H. 2009. Predictions of Freeway Traffic Speeds and Volumes Using Vector Autoregressive Models. Journal of Intelligent Transportation Systems 13(2): 53–72.

Chen, C.; Li, K.; Teo, S. G.; Zou, X.; Wang, K.; Wang, J.; and Zeng, Z. 2019. Gated Residual Recurrent Graph Neural Networks for Traffic Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 485–492.

Chew, H. M. 2016. Train Service Disruptions on Three MRT Lines and Bukit Panjang LRT Due to Power Fault: SMRT. The Straits Times. URL https://www.straitstimes.com/singapore/transport/traindisruption-on-east-west-line-and-north-south-line-due-totraction-power.

Chung, J.; Gulcehre, C.; Cho, K.; and Bengio, Y. 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555.

Davis, G. A.; and Nihan, N. L. 1991. Nonparametric Regression and Short-term Freeway Traffic Forecasting. Journal of Transportation Engineering 117(2): 178–188.

Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Lee, D.; Sugiyama, M.; Luxburg, U.; Guyon, I.; and Garnett, R., eds., Advances in Neural Information Processing Systems, volume 29, 3844–3852. Curran Associates, Inc.

Dijkstra, E. W. 1959. A Note on Two Problems in Connexion with Graphs. Numerische Mathematik 1(1): 269–271.

Geng, X.; Li, Y.; Wang, L.; Zhang, L.; Yang, Q.; Ye, J.; and Liu, Y. 2019. Spatiotemporal Multi-Graph Convolution Network for Ride-Hailing Demand Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 3656–3663.

Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 922–929.

Hamilton, J. D. 1994. Time Series Analysis, volume 2. Princeton University Press, Princeton, NJ.

Hochreiter, S.; and Schmidhuber, J. 1997. Long Short-term Memory. Neural Computation 9(8): 1735–1780.

Jia, Y.; Wu, J.; and Du, Y. 2016. Traffic Speed Prediction Using Deep Learning Method. In 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), 1217–1222. IEEE.

Kingma, D. P.; and Ba, J. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.

Kipf, T. N.; and Welling, M. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations (ICLR).

Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations (ICLR).

Liao, B.; Zhang, J.; Wu, C.; McIlwraith, D.; Chen, T.; Yang, S.; Guo, Y.; and Wu, F. 2018. Deep Sequence Learning with Auxiliary Information for Traffic Prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 537–546.

Liu, Z.; Li, Z.; Wu, K.; and Li, M. 2018. Urban Traffic Prediction from Mobility Data Using Deep Learning. IEEE Network 32(4): 40–46.

Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; and Wang, F.-Y. 2014. Traffic Flow Prediction With Big Data: A Deep Learning Approach. IEEE Transactions on Intelligent Transportation Systems 16(2): 865–873.

Newson, P.; and Krumm, J. 2009. Hidden Markov Map Matching Through Noise and Sparseness. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 336–343.

Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems, 8026–8037.

Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; and Duchesnay, E. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12: 2825–2830.

Raj, J.; Bahuleyan, H.; and Vanajakshi, L. D. 2016. Application of Data Mining Techniques for Traffic Density Estimation and Prediction. Transportation Research Procedia 17: 321–330.

Ruan, S.; Long, C.; Bao, J.; Li, C.; Yu, Z.; Li, R.; Liang, Y.; He, T.; and Zheng, Y. 2020. Learning to Generate Maps from Trajectories. Proceedings of the AAAI Conference on Artificial Intelligence 34(01): 890–897.

Seabold, S.; and Perktold, J. 2010. Statsmodels: Econometric and Statistical Modeling with Python. In 9th Python in Science Conference.

Skabardonis, A.; Varaiya, P.; and Petty, K. F. 2003. Measuring Recurrent and Nonrecurrent Traffic Congestion. Transportation Research Record 1856(1): 118–124.

Sun, J.; and Sun, J. 2015. A Dynamic Bayesian Network Model for Real-Time Crash Prediction Using Traffic Speed Conditions Data. Transportation Research Part C: Emerging Technologies 54: 176–186.

Sun, S.; Zhang, C.; and Yu, G. 2006. A Bayesian Network Approach to Traffic Flow Forecasting. IEEE Transactions on Intelligent Transportation Systems 7(1): 124–132.

Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems.

Vanajakshi, L.; and Rilett, L. R. 2004. A Comparison of the Performance of Artificial Neural Networks and Support Vector Machines for the Prediction of Traffic Speed. In IEEE Intelligent Vehicles Symposium, 194–199. IEEE.

Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention Is All You Need. Advances in Neural Information Processing Systems 30: 5998–6008.

Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; and Bengio, Y. 2018. Graph Attention Networks. In International Conference on Learning Representations (ICLR).

Williams, B. M.; and Hoel, L. A. 2003. Modeling and Forecasting Vehicular Traffic Flow as a Seasonal ARIMA Process: Theoretical Basis and Empirical Results. Journal of Transportation Engineering 129(6): 664–672.

Wu, Z.; Pan, S.; Long, G.; Jiang, J.; and Zhang, C. 2019. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), 1907–1913.

Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; and Li, Z. 2018. Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence.

Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI).

Yu, R.; Li, Y.; Shahabi, C.; Demiryurek, U.; and Liu, Y. 2017. Deep Learning: A Generic Approach for Extreme Condition Traffic Forecasting. In Proceedings of the 2017 SIAM International Conference on Data Mining, 777–785. SIAM.

Zhang, Q.; Chang, J.; Meng, G.; Xiang, S.; and Pan, C. 2020. Spatio-Temporal Graph Structure Learning for Traffic Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence 34(01): 1177–1185.

Zhang, X.; Xie, L.; Wang, Z.; and Zhou, J. 2019. Boosted Trajectory Calibration for Traffic State Estimation. In 2019 IEEE International Conference on Data Mining (ICDM), 866–875. IEEE.

Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; and Li, H. 2019. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Transactions on Intelligent Transportation Systems.

Zheng, C.; Fan, X.; Wang, C.; and Qi, J. 2020. GMAN: A Graph Multi-Attention Network for Traffic Prediction. Proceedings of the AAAI Conference on Artificial Intelligence 34(01): 1234–1241.

Zhou, P.; Zheng, Y.; and Li, M. 2012. How Long to Wait? Predicting Bus Arrival Time With Mobile Phone Based Participatory Sensing. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, 379–392.