# Additional Multi-Touch Attribution for Online Advertising

Wendi Ji, Xiaoling Wang
Shanghai Key Laboratory of Trustworthy Computing, East China Normal University
3663 North Zhongshan Road, Shanghai, China
wendyg8886@gmail.com, xlwang@sei.ecnu.edu.cn

Multi-Touch Attribution studies the effects of various types of online advertisements on purchase conversions. It is a very important problem in computational advertising, because it allows marketers to assign credit for conversions to different advertising channels and to optimize advertising campaigns. In this paper, we propose an additional multi-touch attribution model (AMTA) based on two intuitive assumptions: (1) the effect of an ad exposure fades with time, and (2) the effects of the ad exposures on a user's browsing path are additive. AMTA borrows techniques from survival analysis and uses the hazard rate to measure the influence of an ad exposure. In addition, we take both the conversion time and the intrinsic conversion rate of users into consideration when generating the probability of a conversion. Experimental results on a large real-world advertising dataset show that our proposed method is superior to state-of-the-art techniques in conversion rate prediction and that the credit allocation based on AMTA is reasonable.

## Introduction

With the growth of computational advertising, targeting techniques have made personalized advertising possible. Based on contextual information and user feedback data, online advertising systems deliver ads to the users who are most likely to respond. Nowadays, companies launch advertising campaigns through various channels, such as display ads, video ads, social ads and paid search ads. Attribution technology is designed to help marketers understand how particular channels contribute to user conversions, and it is now seen as integral to the future of digital advertising.
An effective attribution model is of great help to marketing managers in interpreting the influence of channels and optimizing their advertising strategies. In an online advertising campaign, users are exposed to ads through various channels, as illustrated in Figure 1. Suppose that a company X launches an advertising campaign through three channels: display ads, social ads and paid search ads. User 1 saw X's display ad at $t^1_1$ when browsing a webpage and then saw X's social ad at $t^1_2$. Later, she/he searched for products and clicked X's paid search ad link at $t^1_3$. Finally, she/he made a purchase on X's website at time $T^1$. How should we evaluate the contribution of the three ads to the conversion?

Figure 1: Customer journeys in an advertising campaign. Each journey is composed of a chronological sequence of actions by a user on three advertising channels: display ads, social ads and paid search ads.

The corresponding author is Xiaoling Wang. Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Post-click attribution is one of the earliest and simplest attribution models; it assigns all credit to the last ad clicked before a conversion. It has been considered the standard attribution model in the digital advertising industry. Under this last-click rule, all credit for User 1's conversion goes to the paid search ad, and the effects of the previously viewed ads are totally ignored. Despite its simplicity, this attribution mechanism overestimates the contribution of search ads and neglects the influence of the ads seen before the last click. In fact, some queries that trigger paid search ads are themselves induced by previously viewed ads. Furthermore, in many cases users never click before converting. A reliable attribution mechanism should consider the contributions of all relevant ads in the consumer journey.
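To make the contrast concrete, here is a minimal sketch that computes channel credit for User 1's journey under the last-click rule and, for comparison, under the linear, time-decay and position-based rules mentioned in the next paragraph. Following the narrative, the display and social ads were only viewed and the paid search ad was clicked. The timestamps (in days), the 7-day half-life and the 40/20/40 position split are hypothetical choices, not values from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical touch record: (channel, timestamp in days, was_clicked).
Touch = Tuple[str, float, bool]


def _accumulate(journey: List[Touch], weights: List[float]) -> Dict[str, float]:
    """Sum per-touch weights into per-channel credit."""
    credit: Dict[str, float] = defaultdict(float)
    for (channel, _, _), w in zip(journey, weights):
        credit[channel] += w
    return dict(credit)


def last_click(journey: List[Touch]) -> Dict[str, float]:
    """All credit to the last clicked ad; no credit at all if the user never clicked."""
    weights = [0.0] * len(journey)
    for i in range(len(journey) - 1, -1, -1):
        if journey[i][2]:
            weights[i] = 1.0
            break
    return _accumulate(journey, weights)


def linear(journey: List[Touch]) -> Dict[str, float]:
    """Equal credit to every touch."""
    n = len(journey)
    return _accumulate(journey, [1.0 / n] * n)


def time_decay(journey: List[Touch], conversion_time: float,
               half_life: float = 7.0) -> Dict[str, float]:
    """Credit halves for every `half_life` days between a touch and the conversion."""
    raw = [0.5 ** ((conversion_time - t) / half_life) for _, t, _ in journey]
    total = sum(raw)
    return _accumulate(journey, [w / total for w in raw])


def position_based(journey: List[Touch], ends: float = 0.4) -> Dict[str, float]:
    """`ends` to the first and the last touch each; the remainder split over the middle."""
    n = len(journey)
    if n == 1:
        return _accumulate(journey, [1.0])
    if n == 2:
        return _accumulate(journey, [0.5, 0.5])
    middle = (1.0 - 2 * ends) / (n - 2)
    return _accumulate(journey, [ends] + [middle] * (n - 2) + [ends])


# User 1 from Figure 1, with hypothetical timestamps and conversion at T^1 = 10.
user1 = [("display", 0.0, False), ("social", 3.0, False), ("search", 9.0, True)]
for rule in (last_click, linear, position_based):
    print(rule.__name__, rule(user1))
print("time_decay", time_decay(user1, conversion_time=10.0))
```

Only the last-click rule gives the display and social ads zero credit, and it gives no credit to any channel for a journey without a click.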
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

Users' behaviors are caused by the combined effect of the ads they are exposed to within the journey. Multi-touch attribution (MTA) allows marketers to capture the real Return on Investment (ROI) of multiple advertising touch points. It has become a significant research topic and has been explored by several online marketing analytics companies (e.g., Google Analytics¹, Nielsen²). By comparing the ROI under different MTA models, advertisers evaluate the contribution of different channels and then decide how to allocate their budget across channels in the next stage. Some rule-based MTA models have been proposed in practice, e.g., the linear attribution model, the time decay attribution model and the position-based attribution model. However, the main drawback of these rule-based models is the subjectivity of their hypotheses. In recent years, several data-driven attribution models have been proposed in computational advertising (Shao and Li 2011; Dalessandro et al. 2012; Zhang, Wei, and Ren 2014) and marketing analytics (Xu, Duan, and Whinston 2014; Wooff and Anderson 2015). However, these existing models consider either the time-independent conversion rate of a user or the conversion time, but not both. First, the influence of an ad is highly related to time, because a user is more likely to be affected by more recent ads. Second, the actual conversion also depends on the intrinsic conversion rate of the user, because there is no conversion delay at all if the user has no interest in the ads. In fact, conversions are extremely rare events (the actual conversion rate is as low as about 0.01%), so conversion prediction based solely on conversion time is biased. In this paper, we propose a data-driven model for multi-touch attribution and conversion prediction, which we call the additional multi-touch attribution model (AMTA).
First, we assume that the effect of an ad exposure fades with time and that the effects of multiple ad exposures on a user's browsing path are additive. Inspired by survival analysis, we use the hazard rate to model the effect of an ad exposure on the conversion, which reflects the ability of that exposure to trigger a conversion. The hazard rate of an ad exposure is determined by its influence strength and its decay speed. It is built for each ad channel individually, to avoid the bias introduced by different advertising forms and layouts. The distribution of the conversion time can then be calculated from the additive hazard of all relevant ads. Next, we focus on how to predict the conversion rate with the proposed AMTA. Conversion prediction based on an MTA model gives advertisers valuable guidance for allocating the budget among channels when starting an advertising campaign. When generating the probability of a conversion, we take into account both whether the user will convert and when she/he will convert. Finally, we evaluate the AMTA model on a real-world dataset obtained from Miaozhen³, a leading marketing technology company in China. The experiments demonstrate the effectiveness of the proposed model in both conversion rate prediction and attribution analysis.

¹ http://analytics.google.com

² http://www.nielsen.com

³ http://www.miaozhen.com/en/index.html

## Related Works

In the domain of computational advertising, some recent research has been devoted to MTA for ad conversions through data-driven approaches. A bagged logistic regression method was proposed to predict the conversion rate based on the ads a user has viewed (Shao and Li 2011); this is the first study in this field. This approach characterizes the user journey by the counts of ad exposures and uses the learned weights to measure the credits of different channels.
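A simplified, unbagged sketch of this count-based idea: each journey is reduced to per-channel exposure counts, a logistic regression is fit with plain gradient descent, and the learned weights are read as channel credits. The channel names, the toy journeys and labels, and the hyperparameters `lr`, `l2` and `epochs` are made up for illustration; this is not Shao and Li's implementation, which additionally uses bagging.

```python
import math
from typing import List, Sequence, Tuple

CHANNELS = ["display", "social", "search"]  # hypothetical channel set


def count_features(journey: Sequence[str]) -> List[float]:
    """Per-channel exposure counts; the order and timing of the touches are discarded."""
    return [float(sum(1 for c in journey if c == ch)) for ch in CHANNELS]


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 0.1,
                 l2: float = 0.01, epochs: int = 2000) -> Tuple[float, List[float]]:
    """Batch gradient descent on an L2-regularized logistic loss; returns (bias, weights)."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return b, w


# Toy journeys and conversion labels, made up for illustration only.
journeys = [["display"], ["display", "search"], ["social"], ["social", "search"],
            ["display", "social"], ["search"], ["display", "display"],
            ["social", "search", "search"]]
labels = [0, 1, 0, 1, 0, 1, 0, 1]

bias, weights = fit_logistic([count_features(j) for j in journeys], labels)
credits = dict(zip(CHANNELS, weights))  # weights read as channel credits
print(credits)
```

Because `count_features` discards order and timing, `["display", "search"]` and `["search", "display"]` map to the same vector, which is the loss of temporal information discussed next.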
The drawbacks of this work include: (1) the temporal factor is ignored; (2) the attribution based on logistic regression is difficult to interpret. Dalessandro et al. formulate MTA as a causal estimation problem to achieve interpretable attribution and use the additive marginal lift of each ad to represent its credit for the conversion (Dalessandro et al. 2012). However, the unbiased estimation of the causal parameters is too complicated to implement, so the authors developed much simpler approximate methods for practical use that rely on subjective assumptions. Zhang et al. proposed an additive hazard model based on survival theory (Zhang, Wei, and Ren 2014). They modeled the temporal influence of an advertising channel by defining a decay function, without considering the intrinsic conversion rate of users or contextual features. However, since it is unknown whether users are interested in the advertising campaign at all, modeling the impact of an ad exposure without accounting for this is arbitrary. Given the extreme sparsity of user conversions, it is all the more necessary to consider the intrinsic conversion rate of users. Ji et al. model the conversion delay with a Weibull distribution and use the corresponding hazard rate to reflect the influence of an ad exposure (Ji, Wang, and Zhang 2016). This method does not directly measure the combined effect of ad exposures; instead, it generates the multi-touch conversion rate as one minus the probability that none of the relevant ads takes effect. Some studies also focus on MTA in marketing analytics (Gupta and Zeithaml 2006; Li and Kannan 2014). A proportional hazard model was used to predict the conversion time based on the ads users have viewed (Manchanda et al. 2006). It is similar to the logistic regression method (Shao and Li 2011), except that it targets the conversion time, whereas Shao's model targets the conversion rate. Inspired by first-touch attribution and last-touch attribution, Wooff et al.
used a beta distribution to model the influence of an ad exposure, which attributes most credit to the first and the last ads (Wooff and Anderson 2015). The common drawback of these models is that they ignore the intrinsic conversion rate of users. Therefore, these methods fail to provide conversion rate predictions. This work is also related to studies of the time-aware dynamics of ad exposures and conversions based on survival analysis. In marketing analytics, Bolton et al. and Gönül et al. predicted the probability of a customer switching to a competitor with proportional hazard models (Bolton 1998; Gönül, Kim, and Shi 2000), where different specifications of the baseline hazard rate are determined by different duration models, such as the exponential and Weibull models. In recommender systems, the same method was used to predict the right time to recommend a product (Wang and Zhang 2013). Chapelle used an exponential distribution to model the delayed feedback of clicked ads (Chapelle 2014). However, these models are all based on last-touch attribution. The idea of modeling the combined effect of ads by additive hazards is inspired by exciting point processes. Yan et al. formulated pipe failure events as a self-exciting stochastic process model, which has been deployed as an industrial computational system for pipe failure prediction (Yan et al. 2013). Li and Zha proposed a probabilistic model based on mixtures of Hawkes processes that simultaneously tackles event attribution and network parameter inference to solve the problem of dyadic event attribution (Li and Zha 2013). Yan et al. developed a profile-specific two-dimensional Hawkes process model to capture the influence of sellers' activities on the win outcomes of their leads in sales pipeline analytics (Yan et al. 2015). Xu et al.
proposed an MTA model based on mutually exciting point processes, which treats ad clicks and purchases as independent random events in continuous time (Xu, Duan, and Whinston 2014). The main difference between survival analysis and exciting point processes is that survival analysis handles censored data (cases in which the event has not yet occurred), whereas an exciting point process only considers the occurrences of events. However, the conversion rate is extremely low in online advertising, so it is necessary to take users who have not yet converted into consideration. The drawback of modeling customer journeys in an advertising campaign with an exciting point process is that it fails to utilize the ad exposures that did not lead to conversions. The advantage of our proposed AMTA model is that it combines survival analysis and exciting point processes, considering both censored data and the additive effects of ads.

## Survival analysis

Survival analysis is a widely used approach for fine-grained modeling of observed survival times in various fields, including biology, technical reliability, econometrics and sociology (Nelson 2005). As a generic term, the survival time denotes the time from an initiating event to the event of interest. In this work, we treat the conversion delay $T$ between an ad exposure and the eventual conversion as the survival time. Two basic concepts pervade the whole theory of survival analysis: the hazard rate and the survival function. The hazard rate $h(t)$ represents the occurrence rate of the conversion at timestamp $t$, conditional on the user not having converted before $t$, and is defined as (Lawless 2011):

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t}. \tag{1}$$

The survival function $S(t) = \Pr(T > t)$ is defined as the expected proportion of users for whom the conversion has not yet occurred by a specified timestamp $t$.
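Because $\Pr(t < T \le t + \Delta t \mid T > t) = (S(t) - S(t + \Delta t))/S(t)$, the hazard rate can be approximated from any survival function with a small finite $\Delta t$. The sketch below does this for two hypothetical conversion-delay distributions: an exponential one, whose hazard is constant, and a Weibull one, whose hazard changes with $t$ and can be checked against its closed form. The rate, shape and scale values are arbitrary.

```python
import math
from typing import Callable

Survival = Callable[[float], float]


def exponential_survival(rate: float) -> Survival:
    """S(t) = exp(-rate * t); the hazard is the constant `rate`."""
    return lambda t: math.exp(-rate * t)


def weibull_survival(shape: float, scale: float) -> Survival:
    """S(t) = exp(-(t / scale) ** shape)."""
    return lambda t: math.exp(-((t / scale) ** shape))


def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Closed-form Weibull hazard (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def hazard_from_survival(S: Survival, t: float, dt: float = 1e-6) -> float:
    """Finite-difference hazard: Pr(t < T <= t + dt | T > t) / dt."""
    return (S(t) - S(t + dt)) / (dt * S(t))


exp_S = exponential_survival(rate=0.3)
wb_S = weibull_survival(shape=1.5, scale=4.0)
for t in (0.5, 2.0, 10.0):
    print(f"t={t:5.1f}  exponential h~{hazard_from_survival(exp_S, t):.5f}  "
          f"weibull h~{hazard_from_survival(wb_S, t):.5f} "
          f"(exact {weibull_hazard(t, 1.5, 4.0):.5f})")
```

The exponential estimate stays at the chosen rate for every $t$, while the Weibull hazard with shape above 1 grows with $t$.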
The mathematical connection among the survival function $S(t)$, the hazard rate $h(t)$ and the probability density function $\phi(t)$ of the survival time $T$ is:

$$h(t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,\frac{S(t) - S(t + \Delta t)}{S(t)} = \frac{\phi(t)}{S(t)}. \tag{2}$$

By integration, using $S(0) = 1$, we get

$$\log S(t) = -\int_0^t h(\nu)\,d\nu, \tag{3}$$

and it follows that

$$S(t) = \exp\left(-\int_0^t h(\nu)\,d\nu\right). \tag{4}$$

The probability density function of a conversion occurring at time $t$ is

$$\phi(t) = h(t)\,S(t). \tag{5}$$

Therefore, the survival function $S(t)$ and the probability density function $\phi(t)$ are both determined by the hazard rate $h(t)$. The relationship between the survival function $S(t)$ and the probability density function $\phi(t)$ is

$$S(t) = 1 - \int_0^t \phi(\nu)\,d\nu = 1 - F(t), \tag{6}$$

where $F(t)$ is the cumulative distribution function corresponding to $\phi(t)$.

## Additional Multi-touch Attribution

In this paper, we aim to build a probabilistic MTA model that analyzes the contribution of each ad exposure to the conversion based on the historical behaviors of users. We assume that the effects of ad exposures on the eventual conversion are additive and that their influence fades with time. The proposed model is named the Additional Multi-touch Attribution model (AMTA for short).

### Additional Effects of Ad Exposures

Before going into the details of the proposed model, we introduce the notation used in this paper. We denote users as $\{1, \dots, U\}$ and advertising channels as $\{1, \dots, K\}$. As shown in Figure 1, we define a behavior $\{a^u_i, t^u_i\}$ as user $u$ viewing or clicking an ad on advertising channel $a^u_i$ at timestamp $t^u_i$. An ad browsing path $b^u$ of user $u$ is $\{\{a^u_i, t^u_i, x^u_i\}_{i=1}^{l^u}, Y^u, T^u_c\}$, where $l^u$ is the length of the ad browsing path $b^u$, $x^u_i$ is a set of features, and $Y^u \in \{0, 1\}$ indicates whether a conversion has already occurred. If $Y^u = 1$, $T^u_c$ is the conversion time. If $Y^u = 0$, $T^u_c$ is the last timestamp of the observation window. If a user does not convert within the observation window, it is either because the user will never convert or because she/he will convert later.
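The browsing-path notation can be made concrete with a small record type for $b^u$. The sketch also assumes one possible additive hazard, in which each earlier touch on channel $k$ contributes an exponentially decaying term $\beta_k e^{-\omega_k (t - t_i)}$, and it evaluates the standard censored survival log-likelihood: a converted path ($Y^u = 1$) contributes $\log \phi(T^u_c) = \log h(T^u_c) + \log S(T^u_c)$ and an unconverted one contributes $\log S(T^u_c)$. The decay kernel, the parameters `BETA` and `OMEGA`, and every name here are hypothetical rather than the paper's exact formulation. The features $x^u_i$ are omitted, and this plain likelihood does not separate users who will never convert from those who will convert later, which is the ambiguity the variable $C^u$ in the following text addresses.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BrowsingPath:
    """Hypothetical record for b^u (features x^u_i omitted)."""
    touches: List[Tuple[str, float]]  # chronological (channel a_i, timestamp t_i)
    converted: bool                   # Y^u
    end_time: float                   # T^u_c: conversion time, or end of observation window


# Hypothetical per-channel influence strength (beta_k) and decay speed (omega_k).
BETA: Dict[str, float] = {"display": 0.05, "social": 0.08, "search": 0.3}
OMEGA: Dict[str, float] = {"display": 0.5, "social": 0.4, "search": 1.0}


def hazard(path: BrowsingPath, t: float) -> float:
    """Assumed additive hazard: decaying effects of all touches before t."""
    return sum(BETA[a] * math.exp(-OMEGA[a] * (t - ti))
               for a, ti in path.touches if ti < t)


def cumulative_hazard(path: BrowsingPath, t: float) -> float:
    """Closed-form integral of `hazard` from 0 to t."""
    return sum(BETA[a] / OMEGA[a] * (1.0 - math.exp(-OMEGA[a] * (t - ti)))
               for a, ti in path.touches if ti < t)


def log_likelihood(path: BrowsingPath) -> float:
    """Converted: log h(T) + log S(T); censored: log S(T), with S = exp(-cumulative hazard)."""
    log_s = -cumulative_hazard(path, path.end_time)
    if path.converted:
        return math.log(hazard(path, path.end_time)) + log_s
    return log_s


user1 = BrowsingPath([("display", 0.0), ("social", 3.0), ("search", 9.0)],
                     converted=True, end_time=10.0)
unconverted = BrowsingPath([("display", 1.0)], converted=False, end_time=30.0)
print(log_likelihood(user1), log_likelihood(unconverted))
```

Because each term decays, the cumulative hazard here stays bounded, so the closed form avoids numerical integration when evaluating $S(T^u_c)$.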
Therefore, an extra variable $C^u \in \{0, 1\}$ should be considered, which indicates whether a user will eventually convert. In addition, $x^u_{c,i}$, $x^u_{e,i}$ and $x^u_{d,i}$ are three subsets of $x^u_i$, which include contextual information such as user preferences and recent impressions and clicks. $x^u_{c,i}$ determines whether the conversion will occur when $t^u_i < t < t^u_{i+1}$, $x^u_{e,i}$ determines the effect of the ad exposure $\{a^u_i, t^u_i\}$, and $x^u_{d,i}$ determines its decay speed. We use the hazard rate to model the additive influence of the ads in the browsing path on the final conversion, which is inspired by the construction of the conditional intensity in exciting point processes (Aalen, Borgan, and Gjessing 2008). If the user will convert ($C^u = 1$), the hazard rate of the conversion at time $t$ for user $u$ is: tu i