# selecting_compliant_agents_for_optin_microtolling__4384d60f.pdf

The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

Selecting Compliant Agents for Opt-in Micro-Tolling

Josiah P. Hanna,1 Guni Sharon,2 Stephen D. Boyles,1 Peter Stone1

1 The University of Texas at Austin 2 Texas A & M University jphanna@cs.utexas.edu, guni@tamu.edu, sboyles@mail.utexas.edu, pstone@cs.utexas.edu

This paper examines the impact of tolls on social welfare in the context of a transportation network in which only a portion of the agents are subject to tolls. More specifically, this paper addresses the question: which subset of agents provides the most system benefit if they are compliant with an approximate marginal cost tolling scheme? Since previous work suggests this problem is NP-hard, we examine a heuristic approach. Our experimental results on three real-world traffic scenarios suggest that evaluating the marginal impact of a given agent serves as a particularly strong heuristic for selecting an agent to be compliant. Results from using this heuristic for selecting 7.6% of the agents to be compliant achieved an increase of up to 10.9% in social welfare over not tolling at all. The presented heuristic approach and conclusions can help practitioners target specific agents to participate in an opt-in tolling scheme.

1 Introduction

Within road networks, the latest advances in GPS-based tolling technology and electronic road pricing systems are making it possible to charge specific tolls on each link of the network (Numrich, Ruja, and Voß 2012; Chen et al. 2018). Such micro-tolling systems dynamically update the tolls in response to real-time observations of traffic conditions. From the other end, vehicles (either autonomous or human drivers with computer-based routing) may re-route according to how they value paying tolls compared to taking an alternative path. A large body of research has shown that appropriately setting tolls can lead to improved and even optimal social welfare in transportation networks (Pigou 1920; Roughgarden and Tardos 2002; Braess 1969). Despite the promise of such tolling systems, political factors might deter public officials from allowing such a microtolling scheme to be realized. Road-pricing is known to cause a great deal of public unrest and is thus opposed by governmental institutions (Schaller 2010). Motivated by the unpopularity of road-pricing, we propose an opt-in microtolling system. An opt-in micro-tolling system is a system where agents may voluntarily join the system and pay tolls. Naturally, such a system would require an initial incentive

Copyright c 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

(e.g., a lump-sum payment) for rational agents to gain utility by opting-in. The system benefits in turn by being able to influence the routes selected by these agents through setting tolls. We refer to agents that join the system as compliant agents; otherwise they are non-compliant agents. Noncompliant agents plan their route to minimize travel time and are not subject to tolls. Both sets of agents may choose to take any link in the road network.

A key problem towards implementing such an opt-in micro-tolling system is the problem of identifying the set of agents who provide the most benefit to the system if they opt-in. Consider a single compliant agent that must take a congested road to reach its destination. Being able to influence this user has little value for reducing congestion as the agent has no alternative but to take the congested road. While previous work has shown that it is possible to achieve system optimal performance with partial compliance (Sharon et al. 2018), it is an open question as to how the set of compliant agents should be selected. Solving this problem would allow practitioners to identify the set of agents that maximizes system benefit and target them with specific incentives to become compliant. This paper addresses this problem by answering the question, Given that we can select n compliant agents, how should we select the n agents such that the system social welfare is maximized?

Prior work on a related problem suggests that computing the optimal set of compliant agents is NP-hard (Sharon et al. 2018). Thus, we develop a heuristic method that selects compliant agents based on a heuristic estimate of the agent s marginal impact on the system. While this exact quantity cannot be computed in practice, we show that it can be approximated. We present experimental results obtained from a dynamic traffic assignment simulation of three real-world traffic scenarios. The results show that assigning the compliant set according to our presented heuristic results in better overall performance compared to baseline approaches for selecting the compliant set. Moreover, the results suggest that a significant improvement in traffic flow can be achieved when as little as 7.6% of the agents are compliant. Additionally, our experimental results suggest, counter-intuitively, that having a moderate number of compliant agents can yield higher social welfare compared to a larger number of compliant agents in an adaptive tolling scheme.

2 The Traffic Model

We consider a scenario where a set of agents, A, must be routed across a traffic network given as a directed graph, G(V, E). Each link e E has a latency, le, defined to be the amount of time needed to traverse e. A path, p, is an ordered set of adjacent links. The latency of p is defined to be lp =

e p le. Each link e is also assigned a toll value τe that may change at every discrete time-step t. For any path p we define the total tolls along p as τp =

e p τe. Each agent a A begins from a source node, sa V at time ta and travels towards destination node, da V . A path, p, is valid for a given agent, a, if it leads from sa to da. We denote the value-of-time for agent a as va, i.e., the agent s valuation of a delay of 1 time unit. Agents are assumed to be self-interested and, hence, follow the least cost path leading from sa to da. In this work, we define two types of agents:

Compliant compliant agents are subject to tolls. As a result a compliant agent, a, seeks to minimize the generalized cost of its route: Cg(a, p) = lp va + τp.

Non-compliant non-compliant agents are not subject to tolls. As a result a non-compliant agent, a, seeks to minimize only the latency component of its route: Cl(a, p) = lp va.

Since latency and toll values change, we assume agents continually re-optimize their chosen route according to current conditions.1 As a result, an agent might change its planned route at every node along its path. This work considers that we can select a set of compliant agents. We assume selected agents always opt-in to the system. The compliant set must be chosen in a way that maximizes social welfare, defined to be

where la is the actual travel time experienced by agent a. Toll costs are considered as transfer payments and thus excluded from total social welfare. A flow assignment specifies a path for each agent in A. A flow is valid if all agents are assigned valid paths. Let X be the set of valid flow assignments. A valid flow is defined as strongly optimal if it yields the maximal social welfare among all flows in X. Define Xl X as the set of valid flow assignments where each non-compliant agent (a) travels on minimal latency paths (arg minp[Cl(a, p)]). A valid flow that is composed from compliant and non-compliant agents is defined as weakly optimal if it yields the maximal social welfare among all flows in Xl. Intuitively, weakly optimal flows are optimal for a given number of compliant agents but could potentially be improved if more agents would become compliant. Strongly optimal flows cannot be improved.

1In principle, agents may predict changing latencies and toll values, however including prediction in our work requires assuming a model for how agents would predict latency and tolls.

3 Background In this section, we first define the marginal cost of links and paths in the network. We then review the conclusions drawn from previous work on computing the exact set of compliant agents required to achieve strong optimality. These conclusions serve as a starting point for our work.

Marginal Cost Paths and Tolls Define the marginal cost of link e, me, to be the marginal impact on the system (difference in social welfare) from adding one more agent to link e and define the marginal cost of a path to be mp =

e p me. A path p leading from s to d is said to be of minimal marginal cost if there is no other path p leading from s to d such that mp > mp . Note that computing the marginal cost requires counterfactual knowledge of both the latency of the link and the latency of the link with an additional agent. If all agents choose minimal marginal cost paths then the resulting flow is strongly optimal (Roughgarden and Tardos 2002). One way to encourage self-interested agents to follow minimal marginal cost paths is by imposing marginal cost tolls (MCT) (Pigou 1920; Beckmann, Mc Guire, and Winston 1956; Braess 1969). When applying MCT each agent is charged a toll that is equivalent to the damage it inflicts on the system (i.e., the decrease it causes to social welfare). Applying a marginal-cost tolling scheme requires knowing in advance the marginal delay that each agent will impose on all others. This, in turn, requires exact knowledge of future demand and roadway capacity conditions, as well as counter-factual knowledge of the network state without each driver. Assuming such knowledge is unreasonable in many traffic models as well as in real-life traffic.

Computing the Optimal Set of Compliant Agents Sharon et al. (2018) studied the problem of partial compliance with a micro-tolling system within an idealized macroscopic model of a traffic network. This work proposed an algorithm for computing the minimal set of compliant agents that is required for achieving a strongly optimal flow. That work drew three conclusions that are of particular relevance to this paper:

1. Achieving a strongly optimal flow might require 100% of the agents to be compliant.

2. If the set of compliant agents is sufficient for achieving a strongly optimal flow then diverting those agents towards minimal marginal cost paths would lead to a flow that is both weakly and strongly optimal.

3. If the set of compliant agents is insufficient for achieving a strongly optimal flow then computing the weakly optimal flow is NP-hard.

4 Selecting Compliant Agents This paper focuses on traffic scenarios where a subset of the agents are compliant (i.e., pay tolls) and are thus traveling on a path, p, that minimizes Cg(a, p) over all valid paths. The rest of the agents are considered as non-compliant and are thus traveling on a path, p, that minimizes Cl(a, p) over

all valid paths. Both sets of agents may include any network link in their path; non-compliant agents do not pay tolls even when traveling on a link with a non-zero toll. Compliant agents that are taking the same link at the same time pay the same toll. In contrast to Sharon et al. (2018), this paper also considers scenarios where the set of compliant agents is insufficient to achieve a strongly optimal flow. Specifically, we address the question, Given limited resources that allow recruiting n agents to be compliant, which set of n compliant agents will lead to the best system performance? In addressing this question, we assume that tolls are set in a way that approximates the marginal cost toll (MCT). We motivate this assumption from Conclusion 2 of section 3.2: diverting agents towards minimal marginal cost paths leads to a flow that is both weakly and strongly optimal. Though this conclusion applies for the specific case of sufficient compliant agents, we assume it for the general case due to the fact that computing the exact path assignment is NP-Hard. We study the setting where tolls are set to approximate the marginal cost toll since (if done exactly) marginal cost tolling would lead to the optimal solution if all agents were compliant. In principle, an opt-in micro-tolling scheme could be applied alongside any underlying tolling mechanism. In order to answer our main question, we aim to develop a function that approximates the benefit of having a given agent as compliant. Specifically, we aim to develop heuristic measures of the benefit of selecting an agent as compliant versus non-compliant. More formally, this function should assign each agent, a A, a value h(a) R, such that h(a1) < h(a2) implies that the system will benefit more from agent a2 being compliant than a1 being compliant. After defining such a function, we require a mechanism that uses this function to select n compliant agents. If we have knowledge of h(a) for all a A then we could select the n agents with the highest h(a) values. However, knowing h(a) for all agents requires knowing in advance what agents will enter the system. A more reasonable assumption is that we can estimate the distribution of the h(a) values. Our proposed mechanism can then approximately select n compliant agents using the inverse of the cumulative distribution function (CDF) of the estimated heuristic values distribution. Formally, let a A be an agent sampled from A with uniform probability on all agents. Define F to be the CDF of h(a), i.e., F(x) is the probability that h(a) < x. The proposed mechanism selects n compliant agents by selecting all agents with h(a) > F 1( |A| n

|A| ) to be compliant where F 1 is the inverse of F. In practice, F 1 is likely unknown but can be estimated empirically.

5 Heuristic functions

The previous section discussed a general mechanism for selecting n compliant agents but it made the assumption that we have a function, h, that assesses the impact of assigning an agent as compliant. This section suggests such a function. Development of a good heuristic estimate of the value of assigning an agent to be compliant requires us to identify agents attributes that correlate with the impact of as-

signing the given agent as compliant. More formally, let pl(a) = arg minp Cl(a, p) be the shortest non-compliant path (minimal latency path) for a given agent, a. Let pg(a) = arg minp Cg(a, p) be the optimal path with regard to the generalized cost for agent a. The desired heuristic function is one that correlates to: h (a) = mpl(a) mpg(a) + (lpl(a) lpg(a)) va (1) Intuitively, the desired heuristic is the decrease in the marginal cost imposed by an agent on the system plus the utility loss suffered by that agent by changing from being non-compliant to being compliant. Assuming knowledge of the agents value of time as well as the source, destination, and departure time for each agent, we can compute pl(a) and pg(a) for any agent, a. Unfortunately, computing the marginal cost of either paths is infeasible in real time (Sharon et al. 2017b). As a result we must turn to approximation methods.

Difference between Marginal Cost Paths We cannot compute the marginal cost and thus we cannot use our desired heuristic h . Instead we suggest approximating h using approximate marginal cost tolling. Recall from Section 3 that the sum of marginal cost tolls on links in a path are equal to the marginal cost of that path. Thus, if we can approximate the marginal cost toll on each link then we can use the sum of the approximate tolls along a path, p, to approximate mp. Recall that we have assumed that tolls on links are set in a way that approximates the MCT. Thus the tolls along a path, p, approximate mp. We define the difference in marginal cost path heuristic function as: h DMCP(a) = τpl τpg + (lpl(a) lpg(a)) va Computing this heuristic requires computing the compliant and non-compliant shortest paths (time complexity of O(|E| + |V | log |V |)). One limitation of the DMCP heuristic is that it sometimes fails to differentiate between two agents. When a given agent, a, has the same choices of pg and pl (i.e., pg(a) = pl(a) then h DMCP(a) = 0. As a result, h DMCP will fail to differentiate between all agents with pg = pl and it will be impossible to determine which agent is a better choice for influencing.2 We next consider an alternative that has a weaker relationship with h but will better differentiate between two agents with similar h DMCP values. We will then show how this alternative can be combined with DMCP to better differentiate between two agents with identical DMCP values.

Time Evaluation Given this limitation, we now turn to examine agents attributes that better distinguish between agents while still correlating with mpl(a) mpg(a). The agents value of time

2Differentiating between agents with pg = pl might seem unnecessary. However, as apparent in the experimental results, such a differentiation is meaningful due to the fact that agents constantly reoptimize their route. An agent that seems not to be affected by being compliant at the beginning of its journey might later be affected due to the dynamic nature of traffic.

(VOT) is one such attribute. First, since agents represent heterogeneous drivers each agent s VOT is drawn from some random distribution making this heuristic less likely to group compliant agents. Second, there is a direct correlation between the VOT for a given agent a (va) and the value of mpg(a). This correlation is because moving an agent with a lower va to a path with higher latency will impact social welfare less than moving an agent with a higher va. This correlation between va and mpg(a) implies a correlation between va and mpl(a) mpg(a). Building on this understanding, we define the Time Evaluation heuristic function as:

h TE(a) = va.

This heuristic prefers to select agents with lower valueof-time because they are easier to influence with tolls.

Combining DMCP and TE The Time Evaluation heuristic allows us to differentiate between agents at a finer-grained level than DMCP. However, in contrast to DMCP, it ignores information about the path of the agent. DMCP also implicitly factors in latency on links since tolls will be higher on more congested links. Our final proposed heuristic uses a weighted combination of the two heuristics to obtain the strengths of both. This final heuristic, denoted DMCP+TE, makes primary use of the DMCP heuristic but also use the TE heuristic to better differentiate agents with similar hdmcp values. The DMCP+TE heuristic function is defined as:

h DMCP+TE = (1 α)h DMCP + αh TE(a)

where α is a small, positive constant (we use 0.01). The purpose of α is so that if h DMCP is equal for two agents, h DMCP+TE will still be different provided the agents have different value of time values. Making α small means that the term only breaks ties when h DMCP is close to equal for two agents. Computing this heuristic has the same time-complexity of DMCP (O(|E| + |V | log |V |)).

6 Approximating MCT with the -Tolling Algorithm We have made the assumption that tolls are set using an approximate marginal cost tolling scheme and that these tolls are in turn used to compute the DMCP+TE and DMCP heuristic. In our experiments, we will use the -tolling algorithm and thus we briefly introduce -tolling. We choose -tolling since we are unaware of any other approximation scheme for MCT that is model-free. -tolling (Sharon et al. 2017b; Sharon et al. 2017a) was introduced as a model-free scheme for approximating marginal cost tolling. As opposed to true marginal-cost tolling, -tolling only requires observing the average travel time on each link and makes no assumptions on the underlying traffic model. -tolling changes the toll on each link proportional to the the difference between the current observed travel time and the free-flow travel time. At time step t, -tolling updates each link e E by first computing the difference ( ) between the current latency (lt e) and its free flow travel time (denoted by Te).

Next, -tolling updates the toll for e at the next time step (τ t+1 e ) to be a weighted average of times a parameter, β, and the current toll value. A second parameter, R (0 < R 1), gives the weight assigned to the β term. Deploying -tolling requires tuning these parameters. The R parameter determines the rate in which toll values react to observed traffic conditions. When R = 1 the network s tolls respond immediately to changes in traffic but leave the system susceptible to spikes in toll values that cause traffic flow to oscillate between paths. By contrast, as R 0 the tolls are stable, but are also unresponsive to changes in traffic conditions. Sharon et al. (2017b) showed that the performance of -tolling is sensitive to the values of both the R and β parameters.

7 Empirical Study We compare the relative performance of the proposed heuristics in several simulated traffic scenarios. In contrast to prior work on partial compliance (Sharon et al. 2018), we use a more realistic cell-transmission model simulator and use -tolling as a real-time approximation method to MCT. We design our empirical study to address the following questions: 1. Do DMCP, TE, and DMCP+TE improve over a random assignment of compliant and non-compliant agents? 2. Which of the proposed heuristic methods performs best and under what compliance levels? Analyzing the results of the initial experiments led us to suspect that the optimal number of compliant agents changes as a function of how fast the tolling scheme reacts to the current latencies on links in the network. This understanding, in turn, led us to a second set of experiments, aiming to address the question: 3. How does the compliance level relate to the optimal R parameter in -tolling? In all experiments, our metric of interest is total system welfare as defined in Section 2.

Model Specification We compare the relative performance of the proposed heuristics within a Dynamic Traffic Assignment (DTA) simulator (Levin and Boyles 2015b) which models traffic through the cell transmission model (CTM)(Daganzo 1994; Daganzo 1995). A CTM simulator is a discrete, explicit solution method for the hydrodynamic theory of traffic flow proposed by Lighthill et al. (1955) and Richards (1956). CTM divides each link in a given traffic network into a set of cells, each of which has a length equal to the distance a vehicle would travel in one time step at free-flow conditions. This choice of cell length ensures stability of the cell transmission model (it satisfies the Cournout-Friedrich Lewy conditions (Courant, Friedrichs, and Lewy 1928) for the underlying system of partial differential equations). While CTM was originally formulated for highway networks, it is now used for urban traffic simulation and can be used with a variety of intersection models (Tamp ere et al. 2011). Modeling such intersections allows CTM to simulate

inter-link effects such as queue spillbacks. The simulations reported here use intersection models reflecting a mixture of traffic signals and stop signs (based on real world data).

Traffic Scenario Specification

We evaluated the performance of the different heuristics using three traffic scenarios: Sioux Falls, Austin, and San Antonio. Each scenario is specified by a network and a demand table that provides the source node (sa), start time (ta), and destination node (da) for each agent. The network and demand table sizes for each scenario are:

Sioux Falls - (Le Blanc, Morlok, and Pierskalla 1975) this scenario is widely used in the transportation research literature (Bar-Gera, Hellman, and Patriksson 2013; Levin and Boyles 2015a), and consists of 76 directed links, 24 nodes (intersections) and 28,835 agents spanning 3 hours.

Austin - (Levin et al. 2015) this network consists of 1,247 directed links, 546 nodes and 62,836 agents spanning 2 hours during the morning peak.

San Antonio - this network consists of 1,662 directed links, 864 nodes, and 10,858 agents.

The networks for all scenarios are depicted in Figure 1. The traffic scenarios are available online at: https://goo.gl/ Syv V5m. During simulation, agents respond to changing link travel times and toll values by adapting their routes at each node. In particular, agents compute the minimum cost path from their current node n to their destination da according to their cost function (Cg if compliant; Cl if non-compliant). Similar to Sharon et al. (2017b) an additional rule was added to prevent gridlocks which can arise in dynamic traffic models when a cycle of links are at jam density: if a vehicle is unable to enter a link because the link s receiving flow is zero for more than 96 seconds (16 time steps), the vehicle attempts to reroute itself through the least cost path leading to its destination that avoids the jammed link. The same waiting time (96 seconds) was used by Sharon et al. (2017b) and was justified as the best performing value. Following the experimental setting in Sharon et al. (2017b) the distribution of agent value-of-time follows a Dagum distribution with parameters ˆa = 22020.6, ˆb = 2.7926, and ˆc = 0.2977, reflecting the distribution of personal income in the United States (Łukasiewicza, Karpioa, and Orłowskia 2012). At the beginning of each simulation, the value va is sampled from this distribution for each a A. For each compliance level and heuristic method we run 20 trials and average the resulting total social welfare values.

Determining Heuristic Thresholds

Our mechanism for selecting compliant agents requires the empirical cumulative distribution function (CDF) of heuristic values over agents. When using the Time Evaluation (TE) heuristic, we simply use the inverse CDF of the Dagum distribution. For the Difference between Marginal Cost Paths (DMCP) heuristics we estimate the inverse CDF by running

(a) Sioux Falls

(c) San Antonio

Figure 1: Traffic scenarios used in the experiments.

the simulation with all vehicles as non-compliant.3 When an agent, a, enters the system we compute h(a) and store this value. We then sort the stored heuristic values of all agents once the simulation is complete. If the sorted h values are indexed as h0...hi...h|A| then the empirical CDF is defined as F 1(x) = h|A| x for x [0, 1]. Due to stochasticity in the value-of-time of agents, using the empirical inverse CDF may result in greater or fewer than n compliant agents. When plotting results we use the true compliance level but then aggregate results to the nearest 5% of compliance level when averaging performance for each compliance level. For example, if a threshold results in 16% agents being compliant then we record and present the compliance level as 15% when averaging results.

Heuristics comparison

Our main empirical analysis compares the DMCP, TE, and DMCP+TE heuristic methods for various levels of compliance. We also include a baseline (denoted RANDOM) that selects compliant agents randomly. The top row of Figure 2 shows results for each heuristic as we vary the compliance level with R = 1 10 4, β = 4 (the parameter settings used by Sharon et al. (2017b)). We first note that in all scenarios and for all heuristics (Figure 2, top row), the system s performance increases to an optimum and then remains constant or decreases. We hypothesize that the decrease in performance is most likely related to an R value that is too high causing performance to deteriorate as more agents become susceptible to spiking toll values and oscillation. We test this hypothesis by repeating the same set of experiments with R = 1 10 5. We display results for these experiments in the bottom row of Figure 2. These results suggest that the system can benefit from a higher R value when fewer agents are compliant. In the following subsection we will revisit this observation. We observe the DMCP+TE heuristic to perform best in Sioux Falls and San Antonio it reaches the maximal or near maximal observed performance with approximately 20% of agents compliant when R = 1 10 4. In Austin (R = 1 10 4), DMCP+TE requires 40% of agents to be compliant to reach optimal social welfare half as many as TE or the baseline. With R = 1 10 5, DMCP+TE also

3In real-life scenarios, a CDF function can be approximated for the DMCP heuristics through sampling of real-life observations.

(a) Sioux Falls (1 10 4)

(b) Austin (1 10 4)

(c) San Antonio (1 10 4)

(d) Sioux Falls (1 10 5)

(e) Austin (1 10 5)

(f) San Antonio (1 10 5)

Figure 2: Each figure shows the average social welfare for each heuristic method. The x-axis gives the fraction of agents who are compliant and the y-axis gives the total social welfare:

a A va la. The No Tolls baseline corresponds to zero compliant agents. The ideal result is to have as high a social welfare value as possible with a small number of compliant agents. Each heuristic is averaged over 20 random seeds. Error bars show a 95% confidence interval.

leads to greater social welfare with less agents compliant. Using agent s value-of-time (TE) is a small improvement over randomly selecting compliant agents (RANDOM). We also note that RANDOM perform slightly better than the proposed heuristics in the San Antonio (R = 1 10 4) experiment for high compliance levels. This result may indicate that it is possible for our heuristics to find local optima since they are selecting compliant agents greedily.

Compliance and Tolling Scheme Reactivity

As noted in the previous subsection, we observe lower compliance levels giving better system performance than full compliance when the tolling scheme is more reactive to the current travel time on links. This observation suggests that more reactive adaptive tolling systems may in fact benefit from having fewer agents pay tolls. First, we note that a reactive scheme is desirable for quickly reducing the flow of agents on links that have high travel time. But if the tolling scheme overestimates the toll needed to achieve the desired flow then too many agents may decide to choose a different path. The change in many agents route choices will in turn cause the travel time (and tolls) to spike on the new paths and then more agents may switch back to the original path (whose toll may have fallen too low by this time). While in principle, the tolling scheme distributes agents correctly, in practice, excessive switching of paths may introduce inefficiency into the system. One solution to the problem of oscillation of agents between paths is to use a less reactive tolling scheme. The

problem with a less reactive scheme is that tolls should reflect current travel conditions and a less reactive scheme fails to update tolls fast enough. In our final experiment, we study if partial compliance can mitigate the oscillation problem.

While the oscillation problem may exist for any reactive tolling scheme, we investigate it with the specific choice of -tolling. Specifically, we hypothesize that a higher R value (> 10 4) with fewer compliant agents will improve system performance relative to full compliance. The reasoning behind this hypothesis is that only agents that are affected by the tolls are susceptible to oscillation, and so fewer compliant agents would result in less oscillation of traffic. Moreover, a higher R value contributes to a toll value that is more reactive to observed traffic.

To test this hypothesis, we evaluate different values of R for each of our heuristics at different compliance levels. We also compare different R values for our RANDOM baseline. We set β = 4 in all experiments. Figure 3 contains the results for each method.

Across heuristics we see that the higher R values lead to worse social welfare as the number of compliant agents increases. In Figure 3a, we see that the maximal performance obtained by the DMCP+TE heuristic is sensitive to the R parameter. For R 1 10 4, social welfare peaks at approximately the 20% compliance level and then remains constant or decreases. The height of the peak is greatest for R = 1 10 3.

Figure 3: Social welfare (y-axis) as a function of compliance level (x-axis) for different R values and different heuristics. Each figure shows 6 different R parameter values for -tolling. Results are averaged over 20 random seeds. Error bars show a 95% confidence interval.

8 Discussion and Future Work

In this section we discuss the results presented in the previous section and suggest directions for future research. We first observe in our results that across all traffic scenarios and all heuristics, any number of compliant agents is better than none. This result indicates that if even a small number of agents can be incentivized to participate in an approximate marginal cost tolling system (such as -tolling) we may see an improvement in the system s performance. Furthermore, this result demonstrates feasibility of opt-in micro-tolling system when only a subset of agents opt-in. While any number of compliant agents is better than none, we show that our proposed DMCP+TE heuristic lead to further improvements in system performance compared to assigning a random subset of agents to be compliant. In particular, across all traffic scenarios we see that the DMCP+TE heuristic can obtain close to the performance of 100% compliance. In fact, in the San Antonio scenario with 7.6% compliant agents we see an improvement of 10.9% and in Sioux Falls with 18.7% compliant agents we see an improvement of 21.1%. In our empirical analysis we make two assumptions that could be relaxed in future work. First, we assume that agents selected by one of our heuristic methods become compliant with probability one. In the real world it is unlikely that all selected agents will decide to opt-in. Future work should consider the robustness of the presented heuristic methods when selected agents may remain non-compliant with some probability. Second, we considered traffic scenarios where each agent makes a single trip through the network while in the real world, agents may make multiple trips every day. In such a setting, an effective heuristic may need to consider the frequency of trips that an agent makes. Finally, it is also important to consider how to incentivize agents to participate in a micro-tolling system. A first step towards addressing this problem could be to investigate the

loss in utility to an agent who switches from non-compliant to compliant.

9 Conclusion This paper has considered an opt-in micro-tolling system as an alternative to a full micro-tolling system. We propose the problem of how to select compliant agents in such a system since the benefits from tolling may be sensitive to which users comply with tolls. Since selecting the optimal set of compliant agents has been suggested to be NP-hard, we introduce a heuristic method for doing so. In experiments with a dynamic traffic assignment simulator we demonstrate that 1) even randomly selecting agents to be compliant can lead to improvement over having no compliant agents and 2) our proposed heuristic method leads to more improvement with fewer compliant agents than the baseline. These contributions demonstrate that, with our heuristic method, the benefits of an approximate marginal cost tolling system can be realized with only a subset of agents responding to tolls.

Acknowledgments We would like to thank the anonymous reviewers for suggestions that improved the final presentation of the work. A portion of the work has taken place in the Learning Agents Research Group (LARG) at the Artificial Intelligence Laboratory, The University of Texas at Austin. LARG research is supported in part by grants from the National Science Foundation (IIS-1637736, IIS-1651089, IIS-1724157), Intel, Raytheon, and Lockheed Martin. Josiah Hanna is supported by an IBM Ph D Fellowship. Peter Stone serves on the Board of Directors of Cogitai, Inc. The terms of this arrangement have been reviewed and approved by the University of Texas at Austin in accordance with its policy on objectivity in research. The authors also acknowledge the support of the Data-Supported Transportation Operations & Planning Center and the National Science Foundation under Grant Nos. CMMI-1254921 and CNS-1739964.

References Bar-Gera, H.; Hellman, F.; and Patriksson, M. 2013. Computational precision of traffic equilibria sensitivities in automatic network design and road pricing. Transportation Research Part B: Methodological 57:485 500. Beckmann, M. J.; Mc Guire, C. B.; and Winston, C. B. 1956. Studies in the Economics of Transportation. New Haven, CT: Yale University Press. Braess, D. 1969. Uber ein Paradoxon aus der Verkehrsplanung. Unternehmensforschung 12:258 268. Chen, H.; An, B.; Sharon, G.; Hanna, J. P.; Stone, P.; Miao, C.; and Soh, Y. C. 2018. Dyetc: Dynamic electronic toll collection for traffic congestion alleviation. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI18). Courant, R.; Friedrichs, K.; and Lewy, H. 1928. Uber die partiellen Differenzengleichungen der mathematischen Physik. Mathematische Annalen 100:32 74. Daganzo, C. F. 1994. The cell transmission model: a dynamic representation of highway traffic consistent with the hydrodynamic theory. Transportation Research Part B 28(4):269 287. Daganzo, C. F. 1995. The cell transmission model, part II: network traffic. Transportation Research Part B 29(2):79 93. Le Blanc, L. J.; Morlok, E. K.; and Pierskalla, W. P. 1975. An efficient approach to solving the road network equilibrium traffic assignment problem. Transportation Research 9(5):309 318. Levin, M. W., and Boyles, S. D. 2015a. Intersection auctions and reservation-based control in dynamic traffic assignment. In Transportation Research Board 94th Annual Meeting, number 15-2149. Levin, M. W., and Boyles, S. D. 2015b. A multiclass cell transmission model for shared human and autonomous vehicle roads. Transportation Research Part C: Emerging Technologies. Levin, M. W.; Pool, M.; Owens, T.; Juri, N. R.; and Waller, S. T. 2015. Improving the convergence of simulation-based dynamic traffic assignment methodologies. Networks and Spatial Economics 15(3):655 676. Lighthill, M., and Whitham, G. 1955. On kinematic waves. II. A theory of traffic flow on long crowded roads. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 317 345. Łukasiewicza, P.; Karpioa, K.; and Orłowskia, A. 2012. The models of personal incomes in USA. In Proceedings of the 5th Symposium on Physics in Economics and Social Sciences. Numrich, J.; Ruja, S.; and Voß, S. 2012. Global navigation satellite system based tolling: state-of-the-art. NETNOMICS: Economic Research and Electronic Networking 13(2):93. Pigou, A. C. 1920. The Economics of Welfare. Palgrave Macmillan.

Richards, P. 1956. Shock waves on the highway. Operations Research 4(1):42 51. Roughgarden, T., and Tardos, E. 2002. How bad is selfish routing? Journal of the ACM (JACM) 49(2):236 259. Schaller, B. 2010. New York City s congestion pricing experience and implications for road pricing acceptance in the United States. Transport Policy 17(4):266 273. Sharon, G.; Hanna, J. P.; Rambha, T.; Levin, M. W.; Albert, M.; Boyles, S. D.; and Stone, P. 2017a. Real-time adaptive tolling scheme for optimized social welfare in traffic networks. In Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS2017). Sharon, G.; Levin, M. W.; Hanna, J. P.; Rambha, T.; Boyles, S. D.; and Stone, P. 2017b. Network-wide adaptive tolling for connected and automated vehicles. Transportation Research Part C 84:142 157. Sharon, G.; Albert, M.; Rambha, T.; Boyles, S.; and Stone, P. 2018. Traffic optimization for a mixture of self-interested and compliant agents. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI-18). Tamp ere, C. M.; Corthout, R.; Cattrysse, D.; and Immers, L. H. 2011. A generic class of first order node models for dynamic macroscopic simulation of traffic flows. Transportation Research Part B: Methodological 45(1):289 309.