# Managing Overstaying Electric Vehicles in Park-and-Charge Facilities

Arpita Biswas, Ragavendran Gopalakrishnan, Partha Dutta
Xerox Research Centre India
{Arpita.Biswas, Ragavendran.Gopalakrishnan, Partha.Dutta}@xerox.com

With the increase in adoption of Electric Vehicles (EVs), proper utilization of the charging infrastructure is an emerging challenge for service providers. Overstaying of an EV after a charging event is a key contributor to low utilization. Since overstaying is easily detectable by monitoring the power drawn from the charger, managing this problem primarily involves designing an appropriate penalty during the overstaying period. Higher penalties do discourage overstaying; however, due to uncertainty in parking duration, fewer people would find such penalties acceptable, leading to decreased utilization (and revenue). To analyze this central tradeoff, we develop a novel framework that integrates models for realistic user behavior into queueing dynamics to locate the optimal penalty from the points of view of utilization and revenue, for different values of the external charging demand. Next, when the model parameters are unknown, we show how an online learning algorithm, such as UCB, can be adapted to learn the optimal penalty. Our experimental validation, based on charging data from London, shows that an appropriate penalty can increase both utilization and revenue while significantly reducing overstaying.

1 Introduction

As the number of on-road Electric Vehicles (EVs) increases rapidly, lack of adequate charging infrastructure is an area of growing concern. For example, a recent New York Times article reports that in California, scarcity of park-and-charge spots is a chronic problem: electric-vehicle owners are unplugging one another's cars, trading insults, and creating black markets and side deals to trade spots in corporate parking lots [Richtel, 2015]. While investing in a widespread deployment of charging stations by both public and private service providers is needed to address the infrastructure problem [Gopalakrishnan et al., 2016] in the long term, effective use of the existing charging infrastructure can significantly reduce this problem. In particular, curbing improper utilization of park-and-charge spots can improve their availability. Two prominent causes of utilization degradation are (i) the overstaying problem, where an EV continues to occupy a park-and-charge spot even after it is fully charged, and (ii) the icing problem, where a gas-powered car (Internal Combustion Engine or ICE vehicle) occupies a park-and-charge spot. While icing and overstaying in park-and-charge spots are illegal in increasingly many jurisdictions, there is little or no enforcement [Loveday, 2013], as a result of which frustrated users resort to ad-hoc measures ranging from mild (e.g., leaving courtesy notices) [Blink, 2012; 2015] to drastic (e.g., publicly shaming the violator by posting a picture of the violation showing the violator's license plate on blogs and social media) [Tumblr, 2015; Zapatag, 2015]. Extension cables that can help charge a vehicle parked a few spots away from an occupied park-and-charge spot are available in the market, but are very expensive [Moloughney, 2015]. A longer-term solution to these problems, however, will require both enforcement and an appropriate penalty for these events.
While enforcement with a heavy penalty may help curb icing, a gentler approach is prudent to manage overstaying EVs. Depending on the demand for charging, overstaying EVs potentially block access to other EVs that might need charging, and so, it is important to discourage such behavior by imposing penalties. But imposing too high a penalty might turn EVs away from using the park-and-charge facility altogether due to the increased risk of a steep fine, since EV users may not know their parking duration exactly beforehand. Our central contribution in this paper is a novel framework that combines a realistic user behavior model with traditional queueing dynamics to capture this trade-off and study the optimal penalty from the points of view of both utilization and revenue. Next, when the model parameters are unknown, we show how the well-known UCB [Auer et al., 2002] learning algorithm can be adapted to learn the optimal penalty over a period of time. Our experiments, based on charging data from London, show that an appropriate penalty results in increased utilization and significantly increased revenue. Also, perhaps surprisingly, we observe that the utilization achieved by imposing the revenue-maximizing penalty is very close to the maximum utilization.

1.1 Related Work

Dynamic pricing of parking has been an area of active study in the transportation literature. In [Zoeter et al., 2014; Rowe and Fiorucci, 2011], the authors present dynamic pricing schemes for regular parking based on estimated demand to reduce both congestion and underuse. On the other hand, significant work has also been done on dynamic pricing of electric vehicle charging. In [Tushar et al., 2012], the authors model the problem as a Stackelberg game between the smart grid as the leader setting the price, and the vehicle owners as the followers deciding their charging strategies. In [Anglin et al., 2013; Gadh et al., 2012; Hafner et al., 2009], dynamic pricing of electricity for charging EVs is proposed based on the usage data in a given location and time. None of the above works, however, considers the problem of overstaying EVs. To the best of our knowledge, this is the first paper to investigate designing penalty schemes for park-and-charge facilities to combat overstaying EVs. To address the situation where the probability distributions required for modelling user behaviour remain unavailable to the system until the users are actually observed, and the user behaviour needs to be learned over time, we model our problem as a Multi-Armed Bandit (MAB) problem, which has been well studied [Bubeck and Cesa-Bianchi, 2012; Auer et al., 2002; Cesa-Bianchi, 2006; Agrawal, 1995] and applied to a wide variety of domains such as crowdsourcing, online advertising, dynamic pricing, and smart grids. However, we are not aware of any application of MABs to the park-and-charge scenario.

2 Park-and-Charge System Model

We adopt a simple three-part model for a parking area: (a) Section 2.1 introduces the pricing function (during charging) and penalty function (during overstaying) for an EV, (b) Section 2.2 models how EV users respond to the posted penalty scheme, i.e., whether they agree to its terms and enter the parking area, and if so, how long they stay, and (c) Section 2.3 models the flow of EV users in and out of the parking area using queueing dynamics. We end the section by defining some performance measures of interest in Section 2.4.
2.1 Pricing and Penalty Functions

Let the price function while actively charging be $p_c(t)$ and the penalty function for overstaying at a park-and-charge spot after charging is complete be $p_o(t)$, where $p_c(0) = p_o(0) = 0$, and $p_c(t)$ and $p_o(t)$ are continuous, nondecreasing functions. In addition, we assume that $p_o(t)$ is a strictly increasing function, since we will need its inverse function, $p_o^{-1}$, to be well defined.¹ To be realistically implementable (easily understandable by an average user), these functions should not be more complicated than simple piecewise linear functions; however, our framework is general and can accommodate arbitrary functions. These functions are illustrated in Figure 1.

Footnote 1: The framework can be extended to the case where $p_o(t)$ is not strictly increasing, by defining $p_o^{-1}(x) = \sup\{t : p_o(t) = x\}$. This could accommodate, for example, an initial grace period (no penalty) after charging is complete.

Figure 1: Pricing and penalty functions for an example scenario.

2.2 EV User Behavior

We define the following parameters, for each EV user:
$T_c$, the duration of time it would take to fully charge their electric vehicle from the initial state,
$T_a$, the length of their appointment, i.e., the EV user's preferred parking duration, and
$C_{max}$, the penalty threshold, i.e., the maximum penalty due to overstaying that they would bear for the convenience of an uninterrupted appointment.
Let $T_c$, $T_a$, $C_{max}$ be independent, nonnegative random variables with cumulative distribution functions $F_c$, $F_a$, $F_{max}$.

Penalty Acceptance Probability: When a user arrives at the parking area and inspects the posted penalty scheme, they know their $T_c$ and $C_{max}$, but they may not know exactly how long their appointment will last; hence $T_a$ is not yet realized. However, the user can still estimate the likelihood that their penalty would not exceed $C_{max}$, as follows:

$q = \Pr\left(p_o(T_a - T_c) \le C_{max}\right) = F_a\!\left(T_c + p_o^{-1}(C_{max})\right).$ (1)

Thus, we assume that an arriving user will enter the parking lot with probability $q$. Let $\bar{q}$ denote its mean, given by:

$\bar{q} = \int\!\!\int F_a\!\left(T_c + p_o^{-1}(C_{max})\right) dF_c \, dF_{max}.$ (2)

Actual Parking Duration: If a user decides to enter the parking area, they park their EV at a park-and-charge spot and leave for their appointment. $T_a$ is then realized, and if it exceeds $T_c + p_o^{-1}(C_{max})$, we assume that the user reluctantly interrupts their appointment to avoid paying more than $C_{max}$ penalty. Thus, the actual parking duration is given by:²

$T_{pc} = \min\left\{T_c + p_o^{-1}(C_{max}),\; T_a\right\},$ (3)

and hence, the duration of time the EV overstays is given by:

$T_o = (T_{pc} - T_c)^+ = \max\{T_{pc} - T_c,\, 0\}.$ (4)

Footnote 2: If the realized value of $T_a$ is less than $T_c$, we assume that the user leaves immediately after $T_a$ time units, without waiting for their EV to finish charging.

Revenue Collected: The revenue from an EV's time at the park-and-charge spot (the cost to the EV user) is given by:

$R = p_c(T_{pc} - T_o) + p_o(T_o).$ (5)

2.3 Queueing Model

Since there is a finite number of park-and-charge spots in the parking area, the impact of overstaying depends on the arrival process to the parking lot, e.g., if the inter-arrival times are uniformly distributed with a low arrival rate, a limited amount of overstaying may be harmless, as opposed to a high arrival rate during weekends at a mall. We capture such phenomena using an M/G/N/N queueing model [Kleinrock, 1975], where the EVs that accept the posted penalty scheme and enter the parking lot are the "jobs", and the N park-and-charge spots are the "servers". The parameters and assumptions are:
EVs arrive at the parking area according to a Poisson process with rate $\lambda$.
However, only those EVs whose users accept the posted penalty scheme (with probability $q$) and enter the parking area are counted for further analysis. Using the total probability theorem and the well-known Poisson splitting property, it can be shown [Biswas et al., 2016] that this filtered arrival process is still Poisson, with rate $\lambda \bar{q}$.
An arriving EV that finds all N park-and-charge spots occupied leaves the system (there is no waiting). Otherwise, the EV stays at a park-and-charge spot for a duration of $T_{pc}$; thus, the service times are i.i.d. according to the cumulative distribution function of $T_{pc}$.³

Footnote 3: This could be a general distribution; however, in an M/G/N/N system, the stationary distribution of its Markov chain (that tracks the number of EVs) depends on the service time distribution only through its mean.

If $N_{pc}$ is the random variable denoting the number of EVs in the parking area, then its steady state distribution is:

$\Pr(N_{pc} = i) = \frac{\rho^i / i!}{\sum_{j=0}^{N} \rho^j / j!}, \quad 0 \le i \le N,$ (6)

where $\rho = \lambda \bar{q}\, E[T_{pc}]$ is the "offered load". Thus, in steady state, the mean number of EVs in the parking area is:

$E[N_{pc}] = \sum_{i=0}^{N} i \Pr(N_{pc} = i).$ (7)

2.4 Performance Measures

We define the following performance measures of interest. Throughout, $T > 0$ denotes an arbitrary duration of time.

(a) Throughput: The throughput of the park-and-charge system is the average rate at which EV users leave the park-and-charge area; by Little's law, it is given by:

$\text{Throughput} = \frac{E[N_{pc}]}{E[T_{pc}]}.$ (8)

(b) Overstay: The fraction of time spent overstaying at park-and-charge spots, denoted by $\gamma_o$, is given by:

$\gamma_o = \frac{\text{Overstaying Time}}{\text{Total Time}} = \frac{\text{Throughput} \cdot T \cdot E[T_o]}{N\, T} = \frac{E[N_{pc}]\, E[T_o]}{N\, E[T_{pc}]}.$ (9)

(c) Utilization: The utilization, denoted by $\gamma_u$, is defined as the fraction of time the park-and-charge spots were used for charging, and is given by:

$\gamma_u = \frac{\text{Charging Time}}{\text{Total Time}} = \frac{\text{Throughput} \cdot T \cdot E[T_{pc} - T_o]}{N\, T} = \frac{E[N_{pc}]\left(E[T_{pc}] - E[T_o]\right)}{N\, E[T_{pc}]}.$ (10)

(d) Revenue Rate: The revenue rate is defined as the average rate at which revenue is accrued from the park-and-charge spots, and is given by:

$\text{Revenue Rate} = \frac{\text{Total Revenue}}{\text{Total Time}} = \frac{\text{Throughput} \cdot T \cdot E[R]}{T} = \frac{E[N_{pc}]\, E[R]}{E[T_{pc}]}.$ (11)

When $T_c$, $T_a$, $C_{max}$ are generally distributed, and the price/penalty functions $p_c(t)$, $p_o(t)$ are general, the above quantities do not admit closed form expressions, and we defer their mathematical derivations to the full version [Biswas et al., 2016]. However, in Section 3, we investigate a simple special case where the price/penalty functions are linear, $T_c$ and $T_a$ are Exponentially distributed, and $C_{max}$ is a constant, and derive closed form expressions for these performance measures.

Benchmarking Performance: In order to measure the effectiveness of our penalty scheme, we benchmark it against a system with ideal human behavior, one in which the EV users, on their own, do not overstay, i.e., $T_{pc} = \min\{T_c, T_a\}$. Here, $q = 1$, and the revenue from (5) is simply $p_c(T_{pc})$.

3 Example Scenario: Linear Functions, Exponential Distributions

In this section, we consider a simplified scenario for which we derive closed form expressions for the performance measures, which allows better analysis of the effectiveness of imposing simple, linear penalties for overstaying. In particular, we assume:
Linearity: Let $p_c(t) = c\,t$ and $p_o(t) = o\,t$, where $c > 0$ and $o \ge 0$ are the parameters.
$T_c$ and $T_a$ are Exponentially distributed with parameters $\mu_c$ and $\mu_a$ respectively.
$C_{max}$ is not a random variable, but a constant.
Due to limited space, we defer the often nontrivial intermediate steps in evaluating the expressions to the full version [Biswas et al., 2016].
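As a point of reference for the closed-form expressions derived next, the following minimal Python sketch (ours, not the authors' implementation; names such as simulate_users and performance_measures are illustrative) estimates E[Tpc], E[To], and E[R] by Monte Carlo for this special case, using the user-behavior model of Section 2.2, and then plugs them into the M/G/N/N formulas (6)-(11).

```python
# Minimal Monte Carlo sketch of the linear/Exponential special case (illustrative only).
import math
import random

def simulate_users(o, c=2.0, C_max=4.0, mu_c=60/45, mu_a=60/105, n=200_000, seed=0):
    """Estimate mean acceptance probability and E[T_pc], E[T_o], E[R] (hours, $)."""
    rng = random.Random(seed)
    grace = C_max / o if o > 0 else float("inf")   # p_o^{-1}(C_max) for p_o(t) = o*t
    accepted, sum_tpc, sum_to, sum_rev = 0, 0.0, 0.0, 0.0
    for _ in range(n):
        T_c = rng.expovariate(mu_c)
        # Acceptance probability (1): q = F_a(T_c + C_max/o); the user enters w.p. q.
        q = 1.0 - math.exp(-mu_a * (T_c + grace)) if grace < float("inf") else 1.0
        if rng.random() > q:
            continue
        T_a = rng.expovariate(mu_a)                 # appointment length, realized later
        T_pc = min(T_c + grace, T_a)                # actual parking duration (3)
        T_o = max(T_pc - T_c, 0.0)                  # overstay duration (4)
        accepted += 1
        sum_tpc += T_pc
        sum_to += T_o
        sum_rev += c * (T_pc - T_o) + o * T_o       # revenue (5)
    q_bar = accepted / n
    return q_bar, sum_tpc / accepted, sum_to / accepted, sum_rev / accepted

def performance_measures(lam, N, q_bar, E_tpc, E_to, E_rev):
    """Erlang-loss (M/G/N/N) steady state, then measures (8)-(11)."""
    rho = lam * q_bar * E_tpc                       # offered load
    weights = [rho**i / math.factorial(i) for i in range(N + 1)]
    Z = sum(weights)
    E_npc = sum(i * w for i, w in zip(range(N + 1), weights)) / Z
    throughput = E_npc / E_tpc
    overstay = E_npc * E_to / (N * E_tpc)
    utilization = E_npc * (E_tpc - E_to) / (N * E_tpc)
    revenue_rate = E_npc * E_rev / E_tpc
    return throughput, utilization, overstay, revenue_rate

if __name__ == "__main__":
    q_bar, E_tpc, E_to, E_rev = simulate_users(o=2.37)
    print(performance_measures(lam=8, N=10, q_bar=q_bar,
                               E_tpc=E_tpc, E_to=E_to, E_rev=E_rev))
```

Such a sketch can be used to sanity-check the closed-form expressions below for particular parameter values; it makes no assumptions beyond the model already stated.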
Under the above assumptions, the acceptance probability $q$ from (1) and its mean $\bar{q}$ from (2) can be evaluated to obtain $q = 1 - \beta e^{-\mu_a T_c}$ and $\bar{q} = 1 - \frac{\beta \mu_c}{\mu_a + \mu_c}$ respectively, where $\beta = e^{-\mu_a C_{max}/o}$.

Conditional Distribution of $T_c$: While $T_c$ is Exponentially distributed by assumption, given that (with probability $q$) an EV accepts the posted penalty and enters the parking lot (call this event $E$), the conditional random variable $T_c|E$ need not be Exponential. ($T_a$ remains unchanged, since $E$ does not depend on it.) Thus, we first compute the conditional probability density function, $f_{c|E}$, using Bayes's rule, to obtain:

$f_{c|E}(T_c) = \frac{q\, f_c(T_c)}{\bar{q}} = \frac{\left(1 - \beta e^{-\mu_a T_c}\right) \mu_c e^{-\mu_c T_c}}{\bar{q}}.$

3.1 Distribution and Mean of $T_{pc}$

The complementary cumulative distribution function of $T_{pc}$, defined as $\bar{F}_{pc}(t) = \Pr(T_{pc} > t)$, can be evaluated as:

$\bar{F}_{pc}(t) = \begin{cases} e^{-\mu_a t}, & t \le \frac{C_{max}}{o} \\[4pt] \frac{1}{\bar{q}}\, e^{-\mu_a t - \mu_c\left(t - \frac{C_{max}}{o}\right)} \left(1 - \frac{\mu_c}{\mu_a + \mu_c}\, e^{-\mu_a t}\right), & t > \frac{C_{max}}{o}. \end{cases}$

Thus, the mean, given by $\int_0^\infty \bar{F}_{pc}(t)\,dt$, can be evaluated as:

$E[T_{pc}] = \frac{1}{\mu_a} - \frac{\beta\, \mu_c \left[(2-\beta)\mu_a + (1-\beta)\mu_c\right]}{\mu_a (2\mu_a + \mu_c)\left(\mu_a + (1-\beta)\mu_c\right)}.$

3.2 Distribution and Mean of $T_o$

The complementary cumulative distribution function of $T_o$, defined as $\bar{F}_o(t) = \Pr(T_o > t)$, can be evaluated as:

$\bar{F}_o(t) = \begin{cases} \frac{1}{\bar{q}}\left(\frac{\mu_c}{\mu_a + \mu_c} - \frac{\beta \mu_c}{2\mu_a + \mu_c}\right) e^{-\mu_a t}, & 0 \le t < \frac{C_{max}}{o} \\[4pt] 0, & t \ge \frac{C_{max}}{o}. \end{cases}$

Thus, the mean, given by $\int_0^\infty \bar{F}_o(t)\,dt$, can be evaluated as:

$E[T_o] = \frac{(1-\beta)\, \mu_c \left[(2-\beta)\mu_a + (1-\beta)\mu_c\right]}{\mu_a (2\mu_a + \mu_c)\left(\mu_a + (1-\beta)\mu_c\right)}.$

3.3 Mean Revenue

When the pricing/penalty functions are linear, the revenue defined in (5) is given by $R = c(T_{pc} - T_o) + o\,T_o$, and by linearity of expectations, its mean can be evaluated to obtain:

$E[R] = \frac{c}{2\mu_a + \mu_c}\left(1 + \frac{\mu_a}{\mu_a + (1-\beta)\mu_c}\right) + o\, E[T_o].$ (15)

3.4 Performance Measures

By inspecting the expressions, it can be seen that $\beta$ is increasing in $o$. Hence, with increasing penalty, $E[T_{pc}]$ increases and $E[T_o]$ decreases. The other quantities, including the performance measures (8)-(11), are more complicated, and their behavior is not obvious from inspection. We omit a detailed analysis with plots due to space constraints. As an example, when we set $N = 10$ slots, $\lambda = 8$ EVs per hour, $\mu_c = 60/45$ per hour, $\mu_a = 60/105$ per hour, $C_{max} = \$4$, and $c = \$2$ per hour, we find that setting the penalty rate $o = \$2.37$ per hour provides the maximum value of utilization (30%), whereas the utilization with no penalty is about 26%. (The ideal utilization when there is no overstaying is 42%.) From the point of view of revenue, setting the penalty rate $o = \$3.07$ per hour provides the maximum revenue rate of \$15.36 per hour, and the utilization at this penalty rate is only slightly less, at 29.5%. (The ideal revenue when there is no overstaying is merely \$8.34 per hour.) Thus, the revenue-maximizing penalty rate provides increased utilization (by 4%), as well as significantly increased revenue (by 85%).

4 Dynamic Environment

In this section, we consider a real-world setting where no prior information about the distributions $F_c$, $F_a$, $F_{max}$ is known, and thus, the expressions for the performance measures are also unknown to the framework. We assume that these distributions are unknown, but fixed for the day; hence, the utilization and revenue for the day depend only on $o$. In our model, on each day, a penalty rate $o$ is declared at the entrance to the parking area. Arriving customers behave according to their $T_c$, $T_a$, $C_{max}$ values (which are unknown to the system, but known to the customers), and the posted penalty rate $o$; their behavior is as described in Section 2.2. The system observes the customer behavior only after they enter the parking area. Thus, the performance measures (such as revenue rate, utilization, etc.) corresponding to penalty rate $o$ are only observed at the end of the day.
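This daily feedback loop can be viewed as a black-box environment: a penalty rate o is posted, one day of arrivals unfolds, and only the end-of-day revenue is returned. The sketch below is ours and purely illustrative; simulate_day is a hypothetical name, and the distributions inside it are placeholders standing in for the unknown Fc, Fa, Fmax. The learning algorithm of Section 4.1 needs nothing more than the returned revenue.

```python
# Illustrative one-day environment (not from the paper).
import random

def simulate_day(o, hours=6.0, lam=10.0, N=10, c=2.0, rng=None):
    """Post penalty rate o ($/hour of overstay) for one day; return observed revenue."""
    rng = rng or random.Random()
    t, revenue, departures = 0.0, 0.0, []             # departures: times spots free up
    while True:
        t += rng.expovariate(lam)                     # Poisson arrivals
        if t > hours:
            break
        departures = [d for d in departures if d > t] # free up finished spots
        if len(departures) >= N:                      # lot full: arrival is lost
            continue
        T_c = rng.expovariate(60 / 45)                # placeholder charging time (hours)
        C_max = rng.choice([4.0, 8.0, 10.0, 20.0])    # placeholder penalty threshold ($)
        grace = C_max / o if o > 0 else float("inf")  # p_o^{-1}(C_max)
        # Acceptance probability (1) for a placeholder Uniform(0.5, 3.0)-hour appointment:
        q = 1.0 if grace == float("inf") else min(max((T_c + grace - 0.5) / 2.5, 0.0), 1.0)
        if rng.random() > q:
            continue                                  # user rejects the posted penalty
        T_a = rng.uniform(0.5, 3.0)                   # appointment length realized
        T_pc = min(T_c + grace, T_a)                  # parking duration (3)
        T_o = max(T_pc - T_c, 0.0)                    # overstay (4)
        departures.append(t + T_pc)
        revenue += c * (T_pc - T_o) + o * T_o         # payment (5), observed end of day
    return revenue
```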
The task is to find an optimal penalty rate $o$ for each day, such that the total revenue over all days is maximized.⁴ This problem is an example of sequential decision making in an unknown and dynamic environment, where the system seeks to optimize the average revenue over all days, while continuously gathering more information about the revenue obtained by using different penalty rates $o$. This leads to a trade-off between exploration (imposing each penalty rate $o$ sufficiently often to obtain better estimates of the corresponding revenue accumulated) and exploitation (frequently imposing the optimal penalty for which the observed revenue is maximized). Such problems naturally fall into the category of stochastic multi-armed bandit (MAB) problems [Bubeck and Cesa-Bianchi, 2012]. Each penalty rate $o$ is considered as an arm, and the daily revenue obtained by imposing that penalty rate is analogous to the reward obtained by pulling the corresponding arm. The goal in the MAB setting is to determine the arm to be pulled each time in order to maximize the total reward obtained.

Footnote 4: Variants of the technique presented here can be used for other performance measures such as utilization or overstay.

4.1 Learning Algorithm UCB-PC

Let $A$ denote a finite, ordered set of penalty rates, and let $R_i$ denote the expected daily revenue corresponding to penalty rate $A[i]$, which is unknown. The (unknown) optimal penalty rate is thus given by $A[i^*]$, where $i^* = \arg\max_i R_i$. In order to estimate $\{R_i\}$, on each day, we impose a penalty rate and observe the daily revenue, which is the sum of the payments made by the customers who choose to enter the parking area that day.⁵ In doing so, we keep track of the observed average daily revenues $\{\hat{R}_i\}$, which are then used to gradually learn the optimal penalty rate $A[i^*]$ for the parking area. Algorithm 1 is based on the techniques used in the UCB1 algorithm [Auer et al., 2002], which is a well-known tool for solving stochastic multi-armed bandit problems. However, UCB1 assumes that the system is allowed to run only for a finite number of trials $T$ ("finite horizon" multi-armed bandit), whereas we operate under the assumption that the system runs for infinite time, and must therefore tackle the challenges involved in adapting UCB1 to the infinite horizon setting.⁶

Footnote 5: We observe, for each customer, whether they choose to enter the parking area or not, and if they do, their charging and overstaying times, as well as their final payment.

Footnote 6: While UCB1 was our choice to illustrate the technique, we believe that our framework can also accommodate modified versions of other stochastic multi-armed bandit algorithms (such as EXP3).

Algorithm 1 UCB-PC
1: Input: A finite set $A$ of penalty rates
2: $TR_i$ stores the total observed revenue for $A[i]$
3: $K_i$ stores the number of days $A[i]$ is imposed
4: for $t \leftarrow 1$ to $|A|$ do
5:   Choose penalty rate $A[t]$ on the $t$-th day; observe revenue earned $r$;
6:   Set $TR_t \leftarrow r$; set $K_t \leftarrow 1$; set $\hat{R}_t \leftarrow TR_t / K_t$;
7: Find $i^* = \arg\max_i \left(\hat{R}_i + \sqrt{2 \ln t / K_i}\right)$;
8: for $t \leftarrow |A| + 1, |A| + 2, \ldots$ do
9:   Choose penalty rate $A[i^*]$ on the $t$-th day; observe revenue earned $r$;
10:  Update $TR_{i^*} \leftarrow TR_{i^*} + r$; update $K_{i^*} \leftarrow K_{i^*} + 1$;
11:  Update $\hat{R}_{i^*} \leftarrow TR_{i^*} / K_{i^*}$;
12:  Find $i^* = \arg\max_i \left(\hat{R}_i + \sqrt{2 \ln t / K_i}\right)$ for the $(t+1)$-th day;

4.2 Performance Analysis

In this section, we compare the performance of our learning algorithm with an optimal algorithm that knows the expected revenues $\{R_i\}$ beforehand, and can therefore choose the optimal penalty $A[i^*]$ every day.
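To make Algorithm 1 concrete, here is a minimal Python sketch of UCB-PC (ours, not the authors' implementation). It assumes an environment function such as the simulate_day sketch above, which returns one day's observed revenue for a posted penalty rate, and it uses the standard UCB1 index (empirical mean plus sqrt(2 ln t / K_i)).

```python
# Minimal sketch of UCB-PC (Algorithm 1); illustrative only.
import math

def ucb_pc(penalty_rates, observe_revenue, num_days):
    """penalty_rates: ordered list A; observe_revenue(o) returns one day's revenue."""
    n = len(penalty_rates)
    total_rev = [0.0] * n      # TR_i
    days_used = [0] * n        # K_i
    history = []
    for t in range(1, num_days + 1):
        if t <= n:
            i = t - 1                                   # play each arm once first
        else:
            i = max(range(n), key=lambda j: total_rev[j] / days_used[j]
                    + math.sqrt(2 * math.log(t) / days_used[j]))
        r = observe_revenue(penalty_rates[i])           # post A[i], observe day's revenue
        total_rev[i] += r
        days_used[i] += 1
        history.append((penalty_rates[i], r))
    return history

# Example usage with the simulate_day sketch from Section 4:
# history = ucb_pc([0, 1, 2, 3, 4, 5, 6], simulate_day, num_days=100)
```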
The loss incurred by the learning algorithm over $K$ days is termed the regret $L_K$, and can be written as

$L_K = \sum_{i \ne i^*} E[K_i(K)]\left(R_{i^*} - R_i\right),$ (16)

where $E[K_i(K)]$ denotes the expected number of days (out of $K$) that penalty rate $A[i]$ is chosen by our learning algorithm. In the literature, the performance of multi-armed bandit algorithms is measured by obtaining an upper bound on this regret. In [Auer et al., 2002], the authors show that the upper bound on the regret using UCB1 is sublinear within a finite, fixed horizon $T$; in particular, they show that the regret is bounded above by a term that is $O(\log T)$. Next, in Theorem 1, we show that the regret for UCB-PC is upper bounded by $O(\log K)$ after the algorithm runs for $K$ days, for all $K$.

Theorem 1. The expected regret of UCB-PC, after the algorithm runs for $K$ days, is at most

$L_K \le \sum_{i=1,\, i \ne i^*}^{|A|} \left(\frac{8 \ln K}{(R_{i^*} - R_i)^2} + O(1)\right)\left(R_{i^*} - R_i\right).$

Proof Sketch: The proof follows from [Auer et al., 2002], where the upper bound for regret is obtained by bounding the expected number of times an arm is pulled. In our case, the upper bound on $E[K_i(K)]$ for each $i \ne i^*$ is given by:

$E[K_i(K)] \le \ell + \sum_{t} \Pr\left(A[i] \text{ is imposed on day } (t+1),\; K_i(t) \ge \ell\right)$ (17)

for any finite positive integer $\ell$. Next, in Lemma 1, we show that, for any suboptimal penalty index $i \ne i^*$, the term $\Pr\left(A[i] \text{ is imposed on day } (t+1),\; K_i(t) \ge \ell\right)$ is decreasing in $t$ after sufficient exploration.

Lemma 1. For any $i \ne i^*$, if penalty rate $A[i]$ is chosen for at least $\frac{8 \ln t}{(R_{i^*} - R_i)^2}$ days among the first $t$ days, then

$\Pr\left(A[i] \text{ is imposed on day } (t+1),\; K_i(t) \ge \ell\right) \le \frac{2}{t^4}.$

Proof. We begin by writing

$\Pr\left(A[i] \text{ is imposed on day } (t+1),\; K_i(t) \ge \ell\right) \le \max_{s \ge \ell,\, s_0 \ge 1} \Pr\left(A[i] \text{ is imposed on day } (t+1) \mid K_i(t) = s,\, K_{i^*}(t) = s_0\right).$ (18)

After running UCB-PC for $t$ days, a suboptimal penalty index $i \ne i^*$ is chosen only when $\hat{R}_{i^*} + \sqrt{\tfrac{2\ln t}{s_0}} \le \hat{R}_i + \sqrt{\tfrac{2\ln t}{s}}$. Let this event be denoted by $W$. Also, let $U$ denote the event $\hat{R}_{i^*} \le R_{i^*} - \sqrt{\tfrac{2\ln t}{s_0}}$, and $V$ denote the event $\hat{R}_i \ge R_i + \sqrt{\tfrac{2\ln t}{s}}$. Now, the probability of choosing penalty rate $A[i]$ on the $(t+1)$-th day can be written as:

$\Pr(W) \le \Pr(W \mid U)\Pr(U) + \Pr(W \mid V)\Pr(V) + \Pr(W \mid \neg U \cap \neg V)\Pr(\neg U \cap \neg V).$ (19)

Using the Chernoff-Hoeffding inequality, it can be shown that $\Pr(U) \le \tfrac{1}{t^4}$ and $\Pr(V) \le \tfrac{1}{t^4}$. Next, we show that $\Pr(W \mid \neg U \cap \neg V) = 0$ if $s \ge \tfrac{8 \ln t}{(R_{i^*} - R_i)^2}$. First, we observe that the events $\neg U$ and $\neg V$ can be equivalently written as:

$\neg U \iff \hat{R}_{i^*} > R_{i^*} - \sqrt{\tfrac{2\ln t}{s_0}} \iff R_{i^*} < \hat{R}_{i^*} + \sqrt{\tfrac{2\ln t}{s_0}}, \qquad \neg V \iff \hat{R}_i < R_i + \sqrt{\tfrac{2\ln t}{s}}.$ (20)

Then, we show that if (20) holds and $s \ge \tfrac{8 \ln t}{(R_{i^*} - R_i)^2}$, then the event $W$ leads to a contradiction:

$R_{i^*} < \hat{R}_{i^*} + \sqrt{\tfrac{2\ln t}{s_0}} \le \hat{R}_i + \sqrt{\tfrac{2\ln t}{s}} < R_i + 2\sqrt{\tfrac{2\ln t}{s}} \le R_i + (R_{i^*} - R_i) \implies R_{i^*} < R_{i^*},$

which is a contradiction. Finally, substituting the results of the above evaluations in (19), we get $\Pr(W) \le \tfrac{2}{t^4}$, which in turn, can be substituted in (18) to obtain $\Pr\left(A[i] \text{ imposed on day } (t+1),\; K_i(t) \ge \ell\right) \le \tfrac{2}{t^4}$. This completes the proof.

Using Lemma 1 and substituting $\ell = \left\lceil \tfrac{8 \ln K}{(R_{i^*} - R_i)^2} \right\rceil$ in (17), we obtain the required upper bound on regret that is specified by Theorem 1. It can be shown that $L_K = O(\ln K)$ for a fixed set of penalty rates. Thus, the average daily regret, given by $\tfrac{L_K}{K}$, vanishes as $K \to \infty$.

5 Experimental Results

For simulating the behavior of electric vehicles, we study real-world data for the city of London, obtained from [www.gov.uk, 2013], which consists of 9961 charging events, including usage data of charge points and the duration of stay for EVs. We find it difficult to fit a single parametric model to these data without any restrictions.⁷ Therefore, we only look at data entries that correspond to the standard charging point type and a specific duration of parking time (between 30 minutes and 180 minutes), in order to fit them in a single parametric model.
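As a note on this data preparation step, the restriction could be expressed as a simple filter along the following lines; this sketch is ours, and the column names ("chargepoint_type", "plugin_duration_minutes") are hypothetical placeholders, since the exact schema of the Plugged-in Places CSV is not specified here.

```python
# Hypothetical filtering sketch; column names are placeholders, not the real schema.
import pandas as pd

events = pd.read_csv("plugged-in-places.csv")
standard = events[
    (events["chargepoint_type"] == "Standard")
    & (events["plugin_duration_minutes"].between(30, 180))
]
print(len(standard), "of", len(events), "charging events retained")
```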
We assume that the charging duration data corresponds to $T_c$ and the parking duration data corresponds to $T_a$, the duration for which customers would park their car in the absence of any penalties. The histograms for the restricted parking duration data and charging duration data are shown in Figures 2 and 3.

Footnote 7: Mixture models can be effective, since they reveal the hidden heterogeneity that arises from a latent categorical variable such as charging point type, parking duration (very short, standard, long), time of arrival (peak hours and non-peak hours), etc.

Figure 2: Histogram of parking duration data
Figure 3: Histogram of charging duration data

The distributions that best fit these histograms (using Mathematica) correspond to a uniform distribution between 30 minutes and 180 minutes for $T_a$, and a generalized Gamma distribution with parameter $\mu = 1.35188$, scale parameter 33.7831, and shape parameters 1.44212 and 1.19403, for $T_c$. We assume that the penalty threshold $C_{max}$ is a discrete random variable that takes values \$4, \$8, \$10, \$20 with probabilities 0.4, 0.3, 0.2, 0.1 respectively. We also assume a linear pricing function during charging, with rate $c = \$2$ per hour. We simulate vehicles arriving at a parking area with 10 charging slots over a 6-hour time period, assuming that the arrivals follow a Poisson process with rate 10 per hour. We also assume that when all the slots of a parking lot are occupied, arriving users do not wait for a slot to be free and immediately leave the parking area. We perform two sets of experiments, discussed next.

Experiment 1: We simulate the parking area for a 6-hour time period per day for 100 days. Then, we observe the utilization and revenue obtained for each day by employing a linear penalty function $p_o(t) = o \cdot t$ with $o \in \{0, 1, 2, 3, 4, 5, 6\}$. Note that $o = 0$ corresponds to the no-penalty scenario. We compare these results against an ideal benchmark where EV users do not overstay at all. Figure 4 shows the results averaged over the 100 days.

Figure 4: Comparing the utilization and total revenue obtained by imposing a penalty, no penalty, and when no customer overstays

Utilization: The utilization of the parking area increases (dark blue line) after imposing a small penalty, as compared to when no penalty (red dotted line) is charged. However, when a high penalty is imposed, most of the EV users decide not to enter the parking area, resulting in a steep drop in utilization. With a penalty rate of \$4 per hour, the utilization is maximum, and also very close to the ideal utilization when there is no overstaying by the EV users (green line). Thus, imposing just a small penalty helps improve the utilization of charging spots in the parking area significantly.

Revenue: The daily revenue from the parking area increases quickly once even a small penalty for overstaying is imposed. The daily revenue obtained in the ideal situation of no overstaying is more than when no penalty is imposed (albeit only slightly so), since more customers are served in the former situation than in the latter. As the penalty rate increases, the daily revenue follows the same pattern as the utilization, and for the same reasons. It should be noted that the penalty rate of \$4 per hour also maximizes the daily revenue.
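The Experiment 1 loop can be sketched as a sweep over the candidate penalty rates, reusing the illustrative simulate_day environment from Section 4 (our sketch, not the authors' code). To mirror the paper's setup, the placeholder samplers inside simulate_day would be replaced by the fitted Uniform and generalized Gamma distributions and the weighted C_max distribution above; utilization can be tracked analogously, and no particular numerical results are implied by this sketch.

```python
# Sketch of the Experiment 1 sweep; relies on the simulate_day sketch from Section 4.
import random

def experiment_1(penalty_rates=(0, 1, 2, 3, 4, 5, 6), days=100, seed=0):
    rng = random.Random(seed)
    averages = {}
    for o in penalty_rates:
        revenues = [simulate_day(o, rng=rng) for _ in range(days)]
        averages[o] = sum(revenues) / days          # average daily revenue per rate
    return averages

if __name__ == "__main__":
    for o, rev in experiment_1().items():
        print(f"penalty rate ${o}/hour: average daily revenue ${rev:.2f}")
```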
Experiment 2: We simulate the parking area under two different settings: (a) the optimal penalty rate $o^*$ is chosen on all days, where $o^*$ is calculated by (approximately) maximizing the total expected revenue with complete knowledge of all the underlying distributions that are used, and (b) our learning algorithm UCB-PC determines the best $o$ for each day, based on the observed parameters for all previous days. We compare the convergence of the average revenue obtained by our learning method to the average revenue obtained by the (approximately) optimal method. Figure 5 shows the results.

Figure 5: Comparing the performance of UCB-PC with respect to the optimal algorithm

We observe that the penalty rates chosen by our learning algorithm UCB-PC are almost equivalent to those of the optimal algorithm after 15 days, in terms of the daily revenue obtained. We also show the difference between the total revenue obtained by the optimal algorithm and by UCB-PC, averaged over the number of days, along with the theoretical upper bound given in Theorem 1. As expected, it can be seen that this average difference, that is, $\frac{L_K}{K}$, decreases as the number of days increases, after sufficient exploration (15 days).

6 Concluding Remarks

In this paper, we undertake a formal study, for the first time, of the problem of overstaying EVs in park-and-charge spots. We establish a novel framework in which we bring together an interdisciplinary mix of models and techniques: probabilistic user behaviour, queueing dynamics, and online learning. This framework can be extended to accommodate different user behaviour models, queueing dynamics, and other learning techniques. One can imagine instantiating it for multiple parking areas in a city with a model for the population of EV users and studying the interaction at a higher level, e.g., competition between parking lots. When viewed as a comprehensive model for park-and-charge, our framework could serve as a useful tool for future research.

References

[Agrawal, 1995] Rajeev Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, pages 1054-1078, 1995.

[Anglin et al., 2013] Howard N. Anglin, Irgelkha Mejia, Nicholas J. Ruegger, and Yvonne M. Young. Recharging of battery electric vehicles on a smart electrical grid system. United States Patent Application US 20130173326 A1, 2013.

[Auer et al., 2002] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.

[Biswas et al., 2016] Arpita Biswas, Ragavendran Gopalakrishnan, and Partha Dutta. Managing overstaying electric vehicles in park-and-charge facilities, 2016. arXiv preprint arXiv:1604.05471.

[Blink, 2012] Blink. The Blink Courtesy Notice, 2012. http://www.blinknetwork.com/file/7741/Blink+Courtesy+Notices.pdf.

[Blink, 2015] Blink. The Blink Courtesy Notice, 2015. http://www.blinknetwork.com/file/18760/Blink Courtesy Notices.pdf.

[Bubeck and Cesa-Bianchi, 2012] Sébastien Bubeck and Nicolò Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandit problems, 2012. arXiv preprint arXiv:1204.5721.

[Cesa-Bianchi, 2006] Nicolò Cesa-Bianchi. Prediction, Learning, and Games. Cambridge University Press, 2006.

[Gadh et al., 2012] Rajit Gadh et al. Smart electric vehicle (EV) charging and grid integration apparatus and methods. United States Patent US 9026347 B2, 2012.
[Gopalakrishnan et al., 2016] Ragavendran Gopalakrishnan, Arpita Biswas, Alefiya Lightwala, Skanda Vasudevan, Partha Dutta, and Abhishek Tripathi. Demand prediction and placement optimization for electric vehicle charging stations, 2016. arXiv preprint arXiv:1604.05472.

[Hafner et al., 2009] James Lee Hafner, Melissa Wiltsey O'Mara, and Paul Stuart Williamson. Managing incentives for electric vehicle charging transactions. United States Patent Application 20090313104, 2009.

[Kleinrock, 1975] Leonard Kleinrock, editor. Queueing Systems Volume 1: Theory. Wiley Interscience, 1975.

[Loveday, 2013] Eric Loveday. Electric Vehicle Parking Law Coming Under Fire for Not Being Enforced in Hawaii, 2013. http://insideevs.com/electric-vehicle-parking-lawcoming-under-fire-for-notbeing-enforced-in-hawaii/.

[Moloughney, 2015] Tom Moloughney. Featured EV Product: The JLong, 2015. http://bmwi3.blogspot.co.uk/2015/03/featured-ev-product-jlong.html.

[Richtel, 2015] Matt Richtel. In California, Electric Cars Outpace Plugs, and Sparks Fly, 2015. http://www.nytimes.com/2015/10/11/science/in-california-electric-cars-outpace-plugs-and-sparks-fly.html.

[Rowe and Fiorucci, 2011] Rick Rowe and Jean-Louis Fiorucci. Parking locator system providing variably priced parking fees. United States Patent 8688509, 2011.

[Tumblr, 2015] Tumblr. ICE Shaming, 2015. http://iceshaming.tumblr.com/.

[Tushar et al., 2012] W. Tushar, W. Saad, H. V. Poor, and D. B. Smith. Economics of electric vehicle charging: A game theoretic approach. IEEE Transactions on Smart Grid, 3(4):1767-1778, 2012.

[www.gov.uk, 2013] www.gov.uk. Plugged in places data, 2013. https://www.gov.uk/government/uploads/system/uploads/attachment data/file/236776/plugged-in-places.csv/previewl.

[Zapatag, 2015] Zapatag. Zapatag, 2015. http://www.zapatag.com/.

[Zoeter et al., 2014] Onno Zoeter, Christopher Dance, Stéphane Clinchant, and Jean-Marc Andreoli. New algorithms for parking demand management and a city-scale deployment. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, 2014.