
A State Transition Model for Mobile Notifications
via Survival Analysis

Yiping Yuan (LinkedIn Corporation) ypyuan@linkedin.com, Jing Zhang (University of Maryland) jzhang86@umd.edu, Shaunak Chatterjee (LinkedIn Corporation) shchatte@linkedin.com, Shipeng Yu (LinkedIn Corporation) siyu@linkedin.com, and Romer Rosales (LinkedIn Corporation) rrosales@linkedin.com
Abstract.

Mobile notifications have become a major communication channel for social networking services to keep users informed and engaged. As more mobile applications push notifications to users, they constantly face decisions on what to send, when, and how. A lack of research and methodology commonly leads to heuristic decision making. Many notifications arrive at an inappropriate moment or introduce too many interruptions, failing to provide value to users and spurring users’ complaints. In this paper we explore unique features of interactions between mobile notifications and user engagement. We propose a state transition framework to quantitatively evaluate the effectiveness of notifications. Within this framework, we develop a survival model for badging notifications assuming a log-linear structure and a Weibull distribution. Our results show that this model achieves greater flexibility for applications and superior prediction accuracy compared to a logistic regression model. In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.

Mobile notifications; survival analysis; Weibull distribution; accelerated failure-time model
CCS Concepts: Information systems → Data mining; Mathematics of computing → Survival analysis

1. Introduction

Social networking services (e.g., Facebook, LinkedIn, Instagram, Twitter, WeChat) actively push information to their users through mobile notifications. As the content ecosystem and users’ connection networks grow, more and more information worth bringing to users’ attention is generated on the social networking site. On the other hand, users have a limited attention span, regardless of how much value notifications could deliver. This discrepancy between increasing content and limited user attention is the challenge many mobile applications face, especially social networking applications.

A mobile notification is a message displayed to the user either through the mobile application user interface (UI) itself, or through the operating system’s push notification services, such as Apple Push Notification Service (APNs). Instances of such messages include a user-to-user communication, a friend or connection request, an update from a friend or connection (e.g., birthday or job change), an article posted by a connection, etc. These notifications help keep the users informed of what is happening in their network. In addition, notifications also serve the purpose of promotions and product marketing for many mobile applications.

Compared with email communication, mobile notifications are more time sensitive and more promptly responded to (Fischer et al., 2011; Pielot et al., 2014a). Without an established way to determine delivery time, mobile notifications often arrive at inconvenient moments, failing to provide value to a user. Moreover, due to the pervasive nature of smartphones, such inconvenience may lead to complaints or even the disabling of future notification deliveries, causing a permanent loss to both service providers and users. In short, sending notifications at the right time with the right content is in many cases critical.

In this paper, we focus on a quantitative way to measure the effectiveness of a mobile notification and to learn the pattern of how the effectiveness differs from user to user and from time to time. The overall objective is to improve personalization and ensure better delivery time and volume optimization.

The interaction of a user with mobile notifications can be very complex and depends on numerous aspects (Mehrotra et al., 2015; Mehrotra et al., 2014; Xu et al., 2013). It is common to link a notification event to one or more rewards to evaluate the effectiveness of a notification. For engagement, a typical reward is a visit from the user to the app. One challenge for such a study is how to attribute a reward, because users may receive multiple notifications before they open and visit the app. Simple strategies could be to attribute the reward to the most recent notification or to several notifications delivered within a look-back time period. Such strategies are hard to justify theoretically and could introduce significant bias in learning. Our strategy is to leverage survival analysis to attribute a reward without ambiguity and bias (Buckley and James, 1979; James and Smith, 1984).

Survival analysis is commonly used within medical and epidemiological research to analyze data where the outcome variable is the time until the occurrence of an event of interest. For example, if the event of interest is heart attack, then the time to event or survival time can be the time in years until a person develops a heart attack. In survival analysis, subjects are usually followed over a specified time period and the focus is on the time at which the event of interest occurs. Survival time has two components that must be clearly defined: a beginning point and an endpoint that is reached either when the event occurs or when the follow-up time has ended. If the event does not occur by the follow-up time, the observation is called censored. The censored observations are known to have a certain amount of time where the event of interest did not occur and it is not clear whether the event would have occurred if the follow-up time were longer. Such censoring is very common in observational notification data.

We introduce survival analysis to notification modeling as a new domain. The beginning point in this case is the delivery time of a notification, and the endpoint is the reward time (e.g., the time of a visit) or the next notification delivery time, whichever happens first. When the next notification occurs first, the observation is censored. In this paper, we apply an accelerated failure-time model (Wei, 1992; Keiding et al., 1997) with a Weibull distribution to our large-scale user data for reward prediction. This turns out to be not just novel, but also superior in prediction performance compared to baseline models in our offline analysis.

We provide two example formulations for notification volume optimization (VO) and delivery time optimization (DTO) separately. We then present an online use case on notification DTO, where our model is used to make send decisions. The A/B test results show significant improvement on user engagement and content consumption over a non-DTO control and a baseline DTO model.

The major contributions of this paper can be summarized as follows.

  • We develop a state transition model to measure the effectiveness of a notification through a delta effect $\Delta F(W_0, T)$ in Section 3.

  • We propose to estimate the delta effect in the presence of censored data, using a log-linear survival structure and a Weibull distribution.

  • We conduct offline evaluations with real-world notification data to demonstrate the accuracy and flexibility of our engagement prediction.

  • We carry out an online use case of determining the delivery time and show superiority of the proposed method with A/B tests in Section 7.

2. Related work

Email communication as a channel has a long history for social networking services. A volume optimization framework (Gupta et al., 2016; Gupta et al., 2017) can simultaneously minimize the number of emails sent, control the negative complaints, and maximize user engagement. While we share similar goals for mobile notifications, there are unique mobile aspects to be considered. Moreover, the volume optimization framework focuses on solving a Multi-Objective Optimization (MOO) problem (Agarwal et al., 2011, 2012), in which multiple objectives are optimized under given constraints. We focus on the probabilistic nature of interactions between a user and a notification. Our work can be leveraged as a utility prediction model, which would be one of the utilities of interest in a MOO formulation for mobile notifications.

As more mobile applications push information to users, several studies have been carried out to understand how to make effective use of notifications. Sahami et al. (Sahami Shirazi et al., 2014) collect close to 200 million notifications from more than 40,000 users, including users’ subjective perceptions, and present the first large-scale analysis of mobile notifications. A number of findings about the nature of notifications, such as shorter response times, have shed light on how to use them effectively. Pielot et al. (Pielot et al., 2014a) carry out a one-week, in-situ study involving 15 mobile phone users and suggest that an increase in the number of notifications is associated with an increase in negative emotions. Neither work attempts to model the interactions probabilistically.

Xu et al. (Xu et al., 2013) developed an app usage prediction model that leverages the user’s day-to-day activities, app preferences and the surrounding environment. Mehrotra et al. (Mehrotra et al., 2015) developed a classification model to predict notification acceptance by considering both content and context information. Pielot et al. (Pielot et al., 2014b) proposed a machine learning model to predict whether or not the user will view a message within the next few minutes after a notification is delivered. Their study also suggests that indicators of availability, such as the last time the user was online, not only create social pressure, but are also weak predictors of attentiveness to the message. Pielot et al. (Pielot et al., 2017) carried out a field study with hundreds of mobile phone users and built a machine-learning model to predict whether a user will click on a notification and subsequently engage with the content. The model can be used to determine opportune moments to send notifications. These studies focus on cross-application settings with complete device information, yet the scale of notifications and users is not comparable to our case.

On general user engagement, extensive studies (Attenberg et al., 2009; Dave et al., 2014; Wang et al., 2013; Ashley and Tuten, 2015; Khan, 2017) have focused on promoting relevant and high-quality content to users to maximize long-term user engagement with the platform. Other works (Goldstein et al., 2013; Yoo and Kim, 2005) show that low-quality advertising has a detrimental effect on long-term user engagement. Zhou et al. (Zhou et al., 2016) developed an ad quality model based on logistic regression to identify offensive ads that affect user engagement. The focus has been on quality instead of timing.

Most applications of survival analysis in the literature have been in medicine, biology, or public health, but there is increasing interest in its applications to social behavior. Survival techniques based on Weibull distributions have been applied to understanding and predicting dwell time on web services (Liu et al., 2010; Vasiloudis et al., 2017). Yu et al. (Yu et al., 2017) proposed a temporally heterogeneous survival framework to model social behavior dynamics, whose model parameters can be solved by maximum likelihood estimation. The model can be applied to user-to-user communication. Gomez-Rodriguez et al. (Gomez-Rodriguez et al., 2013) studied the formation of an information cascade in a network based on survival theory. Last but not least, Li et al. (Li et al., 2017) applied survival analysis to modeling career paths. They formulated the problem as multi-task learning and achieved favorable performance against several other state-of-the-art machine learning methods.

3. State transition model via survival analysis

A mobile notification may be delivered through one or many channels such as a sound, a badge count update on the app icon, and an alert shown on the lock screen or as a banner. A UI push notification, shown in Figure 1, refers to one with an alert showing the content of the message. Such notifications are more effective at drawing a user’s attention, but they can also be intrusive or even annoying. As suggested in studies (Sahami Shirazi et al., 2014; Pielot et al., 2014a), the UI push channel is better for time-sensitive and potentially important notifications, e.g., a connection invitation or a user-to-user message. Other less time-sensitive ones, e.g., a connection’s birthday or work anniversary, can be served as badging notifications, meaning we only push a badge count update as shown in Figure 2, and a user has to open the app to see the content as an in-app notification within the mobile application’s UI in Figure 3. Unlike UI push notifications, such badging notifications are much less intrusive. On the other hand, their effect is more subtle. Users are not able to view and interact with the notification content without opening the app. It usually takes longer for a user to respond to badging than to UI push, and it is harder to separate the effect of notifications from organic visits. Attribution challenges also arise when multiple badging notifications have been delivered with more than one badge count. This challenge is further elaborated in Section 3.2 as data censoring. The content of badging notifications is usually less time-sensitive, and hence we have more flexibility in their delivery time.

Figure 1. An example of UI push notifications


Figure 2. Visual appearance of badge counts
Figure 3. An example of in-app notification

In this section, we develop a state transition model to describe and predict the interactions between users and notifications. We focus on badging notifications, whose effect is more subtle to model and which are less studied in the literature. The methodology can be extended to UI push notifications, possibly with different distribution assumptions, as they are responded to more quickly.

3.1. State Transition Model

We aim to learn how notifications as interventions promote user engagement and bring more value to users. Notifications may change users’ mobile context state in various ways. For badging notifications, the most obvious one would be the change of the outer badge count. They may also change the notification inventory within the app.

Let $M$ be a notification event, $s$ a mobile context state, and $t_s$ the time to the next visit since the start time of state $s$. Figure 4 shows how the state transition model works. After a notification $M_0$, the mobile context state stays at $s_0$. Then at any evaluation time, we consider whether or not to send a notification $M_1$ to a user who has stayed in state $s_0$ for time $W_0$. The mobile context state will change to $s_1$ if $M_1$ is received. Note that a user’s visit can also change the state, so $s_0$ may start from the most recent visit event or the most recent notification event, whichever comes later.

Figure 4. Illustration of state transition

In our state transition model, we assume that users’ engagement behaviors depend on both their mobile context states and the users’ characteristics. If $M_1$ is sent, then state $s_1$ kicks in and the probability of the user visiting our app within the next $T$ time would be

(1) \quad P(\text{visit} \mid \text{send}) = \Pr(t_{s_1} < T \mid \boldsymbol{z}, s_1) = F_{t_{s_1} \mid \boldsymbol{z}, s_1}(T),

where $s_1$ represents this user’s new mobile context state after the notification is sent, $\boldsymbol{z}$ represents this user’s features, and $F_{t_{s_1}\mid\boldsymbol{z},s_1}$ is the cumulative distribution function of the time-to-visit $t_{s_1}$ given $(\boldsymbol{z}, s_1)$. $T$ is the prediction window, whose value is usually chosen based on the specific problem instance. For example, we can set it to be 24 hours if we want to focus on daily active users.

If we decide not to send notification $M_1$, the user will stay in the current state $s_0$. Then the probability of the next visit within the next $T$ time is

\begin{align*}
P(\text{visit} \mid \text{not send}) &= \Pr(t_{s_0} \leq T + W_0 \mid \boldsymbol{z}, s_0, t_{s_0} > W_0) \\
&= \frac{\Pr(W_0 < t_{s_0} \leq T + W_0 \mid \boldsymbol{z}, s_0)}{\Pr(t_{s_0} > W_0 \mid \boldsymbol{z}, s_0)} \\
(2) \quad &= \frac{F_{t_{s_0}\mid\boldsymbol{z},s_0}(T + W_0) - F_{t_{s_0}\mid\boldsymbol{z},s_0}(W_0)}{1 - F_{t_{s_0}\mid\boldsymbol{z},s_0}(W_0)},
\end{align*}

which is the probability of the time-to-visit from the last state, $t_{s_0}$, being less than or equal to $T + W_0$, given that $t_{s_0}$ is already greater than $W_0$.

We name the difference between (1) and (2) the delta effect, which is a function of $T$ and $W_0$ given $\boldsymbol{z}$, $s_0$ and $s_1$:

ΔF(W0,T)\displaystyle\Delta F(W_{0},T) =\displaystyle= P(visit—send)P(visit—not send)\displaystyle P(\text{visit|send})-P(\text{visit|not send})
(3) =\displaystyle= Fts1𝒛,s1(T)Fts0𝒛,s0(T+W0)Fts0𝒛,s0(W0)1Fts0𝒛,s0(W0).\displaystyle F_{t_{s_{1}}\mid\boldsymbol{z},s_{1}}(T)-\frac{F_{t_{s_{0}}\mid\boldsymbol{z},s_{0}}(T+W_{0})-F_{t_{s_{0}}\mid\boldsymbol{z},s_{0}}(W_{0})}{1-F_{t_{s_{0}}\mid\boldsymbol{z},s_{0}}(W_{0})}.

The delta effect predicts the additional probability of a visit in the next $T$ time from sending a notification at the moment. The larger the delta effect, the more motivation we have to deliver a notification.

In (3), we need to learn the distribution of users’ time to visit in each state, $F_{t\mid\boldsymbol{z},s}(T)$, to predict the delta effect. We explain how we estimate this distribution in Section 3.2.

3.2. Time-to-visit Forecasting

One of the challenges in learning $F_{t\mid\boldsymbol{z},s}(T)$ is that we cannot always observe the time to visit after a notification send event, because we may send out another notification before the user’s next visit. Figure 5 illustrates the mobile activities of a user. A time $T_1$ after notification event $M_1$, we observe a visit $V_1$, and a time $T_4$ after notification event $M_4$ we observe a visit $V_2$. We do not observe a visit after $M_2$ or $M_3$ before their next notification events $M_3$ and $M_4$, respectively; in these cases, the two observations are censored. A censored observation only tells us that the visit event has not happened before the next notification arrives. Such censored observations are very common in notification training data, especially for less active users, since their average time-to-visit after a notification delivery is longer.

Figure 5. Right-censoring

Therefore, we observe either a visit time $T_i$ or a censored time $U_i$. An observation in survival analysis can be conveniently represented by a triplet $(X_i; T_i; \delta_i)$. Here $X_i$ is a feature vector containing both user features $\boldsymbol{z}$ and state features $s$; $\delta_i$ is the censoring indicator, with $\delta_i = 1$ for an uncensored instance and $\delta_i = 0$ for a censored one; and $T_i$ denotes the observed visit time if $\delta_i = 1$ and the censoring time if $\delta_i = 0$.

While it is possible to avoid or alleviate such censoring by collecting data from a controlled experiment, we argue that this is impractical in many cases. Suppose that in a controlled study we send a mobile notification to every user in the treatment group at the beginning of the experiment, and then monitor the next visit event without sending more notifications in between. First, the treated users may have a very negative experience without being promptly notified. Second, the experiment may take a long time to observe a visit event for less active users. Lastly, it becomes too costly to repeat the experiment frequently when the model has to be re-trained over time with updated user bases and features.

Survival methods correctly incorporate information from both censored and uncensored observations by maximizing the following likelihood function:

\begin{align*}
L &= \prod_{i:\delta_i=1} f(t = T_i \mid X_i) \prod_{i:\delta_i=0} \Pr(t > T_i \mid X_i) \\
(4) \quad &= \prod_{i=1}^{n} \left(f_{t\mid X_i}(T_i)\right)^{\delta_i} \left(1 - F_{t\mid X_i}(T_i)\right)^{1-\delta_i},
\end{align*}

where $f_{t\mid X_i}$ is the probability density function.
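As an illustration, the censored likelihood in (4) can be evaluated numerically for any parametric choice of $f$ and $F$. A minimal sketch in Python, using an exponential distribution (a special case of the Weibull adopted later); the function names are ours:

```python
import numpy as np

def censored_log_likelihood(T, delta, pdf, cdf):
    """Log of the likelihood in (4): uncensored points contribute
    log f(T_i), censored points contribute log(1 - F(T_i))."""
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta)
    uncensored = delta == 1
    ll = np.sum(np.log(pdf(T[uncensored])))
    ll += np.sum(np.log(1.0 - cdf(T[~uncensored])))
    return ll

# Example with an exponential distribution of rate 0.5.
lam = 0.5
pdf = lambda t: lam * np.exp(-lam * t)
cdf = lambda t: 1.0 - np.exp(-lam * t)

T = [1.0, 2.0, 0.5, 3.0]   # observed visit or censoring times
delta = [1, 0, 1, 0]       # 1 = visit observed, 0 = censored
print(censored_log_likelihood(T, delta, pdf, cdf))
```

The same function applies unchanged once the Weibull density and CDF of Section 3.2 are plugged in.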

A well-known survival model is the Cox proportional hazards model (Cox, 1992; Kapoor et al., 2014). It is a semi-parametric model built upon the assumption of proportional hazards; in other words, it assumes that the effects of the predictor variables on survival are constant over time and additive on the log-hazard scale. This assumption may not be realistic for our application. In addition, the nonparametric baseline hazard function of the Cox model is difficult to interpret and to conduct statistical inference with. A popular alternative is the parametric log-linear model, also known as the accelerated failure-time (AFT) model (Wei, 1992; Keiding et al., 1997). In this model, the effect of changing a covariate is to accelerate or decelerate the time-to-event by some factor. The parametric form makes it much easier to evaluate $\Delta F(W_0, T)$ in (3). In addition, the property in Lemma 3.1 works well in practice to space notifications. Therefore, we use the AFT model for our time-to-event forecasting.

The AFT model proposes the following relationship between a random time-to-visit $T_i$ and covariates $\boldsymbol{X}_i$:

(5) \quad \log T_i = \boldsymbol{b}\boldsymbol{X}_i + \sigma\epsilon_i,

where the $\epsilon_i$ are independent and identically distributed (i.i.d.) random errors.

Popular distributions for $\epsilon_i$ are the logistic, Gaussian, and extreme value distributions, leading to log-logistic, log-Gaussian, and Weibull distributions for $T_i$, respectively. Based on our data analysis and prior knowledge, the time-to-visit for badging notifications, given the user’s features and state features, does not depend much on how much time has elapsed already, which is the memoryless property. The distribution may thus be close to an exponential distribution, a special case of the Weibull distribution. Therefore, we assume a Weibull distribution for $t\mid\boldsymbol{z},s$. The Weibull distribution is a flexible model for time-to-event data (Klein and Moeschberger, 2005). Its probability density function and cumulative distribution function are

(6) \quad f(T;\lambda,\alpha) = \begin{cases} \alpha\lambda T^{\alpha-1} e^{-\lambda T^{\alpha}} & T \geq 0, \\ 0 & T < 0, \end{cases}

and

(7) \quad F(T;\lambda,\alpha) = \Pr(t \leq T) = 1 - e^{-\lambda T^{\alpha}}, \quad T \geq 0.

The exponential distribution is the special case with $\alpha = 1$.
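A small numeric sketch of (6) and (7), including a check of the exponential special case at $\alpha = 1$ (the function names are ours):

```python
import numpy as np

def weibull_pdf(T, lam, alpha):
    """Density in (6): alpha * lam * T^(alpha-1) * exp(-lam * T^alpha), T >= 0."""
    T = np.asarray(T, dtype=float)
    return alpha * lam * T ** (alpha - 1) * np.exp(-lam * T ** alpha)

def weibull_cdf(T, lam, alpha):
    """CDF in (7): Pr(t <= T) = 1 - exp(-lam * T^alpha), T >= 0."""
    T = np.asarray(T, dtype=float)
    return 1.0 - np.exp(-lam * T ** alpha)

# With alpha = 1 the Weibull reduces to an exponential with rate lam.
print(weibull_cdf(2.0, 0.5, 1.0))        # equals 1 - exp(-0.5 * 2.0)
print(1.0 - np.exp(-0.5 * 2.0))
```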

Assume $\epsilon_i$ in (5) follows an extreme value distribution with

(8) \quad f_{\epsilon}(t) = e^{t - e^{t}}, \quad F_{\epsilon}(t) = 1 - e^{-e^{t}};

then $T_i$ follows a Weibull distribution (Klein and Moeschberger, 2005) with

(9) \quad \lambda_i = e^{-\mu_i/\sigma}; \quad \alpha = 1/\sigma,

where $\mu_i = \boldsymbol{b}\boldsymbol{X}_i$. Note that the model assumes no heteroscedasticity for simplicity, which implies that $\sigma$, and hence $\alpha$, are constants. It is possible to allow heteroscedasticity and model $\sigma$ as a function of the features $\boldsymbol{X}_i$, adding more personalization in estimating the distribution of the time-to-visit $T_i$ for different users in different states. On the other hand, the maximum likelihood estimation then becomes more computationally challenging.

Lemma 3.1.

If $t_{s_0}\mid\boldsymbol{z},s_0$ follows a Weibull distribution $f(T;\lambda_0,\alpha_0)$ with $\alpha_0 \in (0,1)$, then $P(\text{visit} \mid \text{not send})$ in (2) is decreasing, and thus $\Delta F(W_0,T)$ in (3) is increasing, in $W_0$, the time already elapsed in state $s_0$, for any given $T > 0$.

Proof.

With the Weibull distribution in (7), $\Delta F(W_0,T)$ in (3) becomes

\begin{align*}
\Delta F(W_0, T) &= F_{t_{s_1}\mid\boldsymbol{z},s_1}(T) - \frac{F_{t_{s_0}\mid\boldsymbol{z},s_0}(T + W_0) - F_{t_{s_0}\mid\boldsymbol{z},s_0}(W_0)}{1 - F_{t_{s_0}\mid\boldsymbol{z},s_0}(W_0)} \\
&= 1 - e^{-\lambda_1 T^{\alpha_1}} - \frac{\{1 - e^{-\lambda_0(T+W_0)^{\alpha_0}}\} - \{1 - e^{-\lambda_0 W_0^{\alpha_0}}\}}{e^{-\lambda_0 W_0^{\alpha_0}}} \\
&= e^{-\lambda_0(T+W_0)^{\alpha_0} + \lambda_0 W_0^{\alpha_0}} - e^{-\lambda_1 T^{\alpha_1}}.
\end{align*}

Taking the derivative with respect to $W_0$,

\frac{\partial\Delta F(W_0,T)}{\partial W_0} = e^{-\lambda_0(T+W_0)^{\alpha_0} + \lambda_0 W_0^{\alpha_0}}\left\{-\lambda_0\alpha_0(T+W_0)^{\alpha_0-1} + \lambda_0\alpha_0 W_0^{\alpha_0-1}\right\}.

Since $(T+W_0)^{\alpha_0-1} < W_0^{\alpha_0-1}$ for $\alpha_0 \in (0,1)$ and $T > 0$, we have $\frac{\partial\Delta F(W_0,T)}{\partial W_0} > 0$ for $\alpha_0 \in (0,1)$ and $T > 0$. ∎

Lemma 3.1 shows that if $t_{s_0}\mid\boldsymbol{z},s_0$ follows a Weibull distribution with $\alpha_0 \in (0,1)$, the delta effect in (3) can be calculated as

(10) \quad \Delta F(W_0,T) = e^{-\lambda_0(T+W_0)^{\alpha_0} + \lambda_0 W_0^{\alpha_0}} - e^{-\lambda_1 T^{\alpha_1}},

and is increasing in $W_0$. This suggests we can bring more value to a user by sending a notification when the user has stayed in a state for a longer time. In other words, incorporating the delta effect into decision making reduces the frequency of sending notifications, because short intervals between notifications do not engage the user’s attention effectively. The model we learn from data in Section 4 gives $\alpha \in (0,1)$, which is in line with our conjecture.
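The monotonicity in Lemma 3.1 is also easy to check numerically. A sketch evaluating (10) on a grid of $W_0$ values, under illustrative parameters with $\alpha_0 \in (0,1)$:

```python
import numpy as np

def delta_effect(W0, T, lam0, alpha0, lam1, alpha1):
    """Delta effect in (10)."""
    return (np.exp(-lam0 * (T + W0) ** alpha0 + lam0 * W0 ** alpha0)
            - np.exp(-lam1 * T ** alpha1))

# Illustrative parameter values with alpha0 in (0, 1).
T, lam0, alpha0, lam1, alpha1 = 24.0, 0.1, 0.6, 0.1, 0.6
W0_grid = np.linspace(0.5, 48.0, 200)
values = delta_effect(W0_grid, T, lam0, alpha0, lam1, alpha1)

# Consistent with Lemma 3.1, the delta effect increases in W0.
assert np.all(np.diff(values) > 0)
```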

Following (4), the likelihood function becomes

(11) \quad L = \prod_{i=1}^{n}\left(f_{\epsilon}\left(\frac{\log T_i - \boldsymbol{b}\boldsymbol{X}_i}{\sigma}\right)\right)^{\delta_i}\left(1 - F_{\epsilon}\left(\frac{\log T_i - \boldsymbol{b}\boldsymbol{X}_i}{\sigma}\right)\right)^{1-\delta_i},

where $f_{\epsilon}$ and $F_{\epsilon}$ are from the extreme value distribution in (8). Finally, the AFT model parameters $(\boldsymbol{b},\sigma)$ can be estimated by maximizing the above likelihood function.
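The paper fits this likelihood at scale with Spark MLlib (Section 4); a small-scale sketch of the same maximum likelihood estimation on synthetic right-censored data, using SciPy rather than the authors' production stack (all parameter values are illustrative), might look like:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, true_b, true_sigma = 5000, np.array([1.5, 0.8]), 1.4

# Synthetic AFT data per (5): log T = bX + sigma * eps, with eps drawn from
# the extreme value distribution in (8); censoring by a next-notification time.
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
eps = np.log(-np.log(1.0 - rng.uniform(size=n)))   # inverse of F_eps
T_visit = np.exp(X @ true_b + true_sigma * eps)
C = rng.exponential(scale=10.0, size=n)            # censoring times
T = np.minimum(T_visit, C)
delta = (T_visit <= C).astype(float)

def neg_log_lik(params):
    """Negative log of (11); the extra -log(sigma) is the scale Jacobian
    term needed when also maximizing over sigma in log-time."""
    b, sigma = params[:-1], np.exp(params[-1])     # optimize log(sigma) > 0
    z = (np.log(T) - X @ b) / sigma
    ll = delta * (z - np.exp(z) - np.log(sigma)) + (1.0 - delta) * (-np.exp(z))
    return -np.sum(ll)

fit = minimize(neg_log_lik, x0=np.zeros(3), method="L-BFGS-B",
               bounds=[(-5, 5), (-5, 5), (-3, 3)])
b_hat, sigma_hat = fit.x[:-1], np.exp(fit.x[-1])
print(b_hat, sigma_hat)   # close to true_b and true_sigma for large n
```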

3.3. Calculation of the Delta Effect

Once we obtain the parameter estimates $(\hat{\boldsymbol{b}}, \hat{\sigma})$ from model training, we can calculate the delta effect for user $i$ at a given time as follows.

  • Get all the features $\boldsymbol{X}_{0,i}$, including the state at the moment and the time since the last state change (i.e., badge count update) $W_{0,i}$, for member $i$.

  • Derive the new features $\boldsymbol{X}_{1,i}$ given that a notification is sent at the moment, which updates the badge count as a state feature along with the state interaction features.

  • According to (9), calculate

    \hat{\lambda}_{0,i} = e^{-\hat{\boldsymbol{b}}\boldsymbol{X}_{0,i}/\hat{\sigma}}, \quad \hat{\alpha}_0 = 1/\hat{\sigma}; \qquad \hat{\lambda}_{1,i} = e^{-\hat{\boldsymbol{b}}\boldsymbol{X}_{1,i}/\hat{\sigma}}, \quad \hat{\alpha}_1 = 1/\hat{\sigma}.
  • Apply the above values to (10) and calculate the delta effect

    (12) \quad \Delta F_i(W_{0,i},T) = e^{-\hat{\lambda}_{0,i}(T+W_{0,i})^{\hat{\alpha}_0} + \hat{\lambda}_{0,i}(W_{0,i})^{\hat{\alpha}_0}} - e^{-\hat{\lambda}_{1,i} T^{\hat{\alpha}_1}}.
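The steps above can be sketched as follows. The coefficients and feature vectors are hypothetical, with $\boldsymbol{X}_{1,i}$ differing from $\boldsymbol{X}_{0,i}$ only in the badge-count feature:

```python
import numpy as np

def delta_effect(x0, x1, W0, T, b_hat, sigma_hat):
    """Delta effect (12) for one user, with the Weibull parameters
    derived from the AFT estimates via (9)."""
    alpha = 1.0 / sigma_hat                        # (9): alpha = 1/sigma
    lam0 = np.exp(-np.dot(b_hat, x0) / sigma_hat)  # (9): lambda = e^(-mu/sigma)
    lam1 = np.exp(-np.dot(b_hat, x1) / sigma_hat)
    term_not_send = np.exp(-lam0 * (T + W0) ** alpha + lam0 * W0 ** alpha)
    term_send = np.exp(-lam1 * T ** alpha)
    return term_not_send - term_send

b_hat, sigma_hat = np.array([2.0, -0.3]), 1.5   # hypothetical estimates
x0 = np.array([1.0, 1.0])   # intercept, current badge count
x1 = np.array([1.0, 2.0])   # badge count incremented by the notification
print(delta_effect(x0, x1, W0=12.0, T=24.0, b_hat=b_hat, sigma_hat=sigma_hat))
```

Consistent with Lemma 3.1 (here $\hat{\sigma} > 1$, so $\hat{\alpha} \in (0,1)$), the returned value grows as $W_0$ increases.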

4. Data collection and model training

Collecting large-scale unbiased training data is challenging, especially in the case of observational data. We collect data at LinkedIn from hundreds of millions of users for a given week, including all badging notification events delivered to users and all user visit events. For each notification event, we include three broad categories of features in $\boldsymbol{X}_i$:

  • User profile features such as locale and network size.

  • State features such as the badge count.

  • User activity features such as the user’s last visit time, the number of site visits over the past week, and the number of notifications received over the past week.

In addition, we also include interactions between the above features, such as interaction terms between the badge count and the profile features so that we can learn different sensitivity to the badge count from different users.

To get the response $T_i$ and censoring indicator $\delta_i$, we sort the notification and visit events in temporal order for each user, so that we can obtain the next event type and next event time $T_i$. If the next event is a visit, then $\delta_i = 1$; otherwise $\delta_i = 0$. Note that the next event and next visit may extend beyond the given week, and thus the following week’s data may be needed and joined accordingly. We then remove potential outliers by discarding records from users who receive too many notifications or visit too many times; such records may come from erroneous tracking or abnormal accounts. Next, we split the week’s notification data into training and test data with a ratio of 4:1. The test data are held out for evaluation in Section 6.
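A simplified sketch of this labeling step (the event records and field names are ours; feature extraction, the next-week join, and outlier filtering are omitted):

```python
from dataclasses import dataclass

# Hypothetical per-user event log: notification sends and app visits.
@dataclass
class Event:
    time: float   # hours since the start of the observation week
    kind: str     # "notification" or "visit"

def label_notifications(events):
    """For each notification, the response is the time to the next event;
    delta = 1 if that event is a visit, 0 if it is another notification."""
    events = sorted(events, key=lambda e: e.time)
    samples = []
    for cur, nxt in zip(events, events[1:]):
        if cur.kind == "notification":
            samples.append((nxt.time - cur.time, 1 if nxt.kind == "visit" else 0))
    return samples

log = [Event(0.0, "notification"), Event(5.0, "visit"),
       Event(8.0, "notification"), Event(20.0, "notification"),
       Event(26.0, "visit")]
print(label_notifications(log))   # [(5.0, 1), (12.0, 0), (6.0, 1)]
```

The middle notification is censored ($\delta = 0$) because another notification arrived before the next visit.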

We train the AFT model on the training data using Spark MLlib (Meng et al., 2016) and obtain $\hat{\boldsymbol{b}}$ and $\hat{\sigma}$. The parameters of the conditional Weibull distribution can then be calculated as $\hat{\alpha} = 1/\hat{\sigma}$, $\hat{\lambda}_i = e^{-\hat{\boldsymbol{b}}\boldsymbol{X}_i/\hat{\sigma}}$.

The model we learned from the training data suggests very different feature importance from that of a notification click-through rate (CTR) model. For example, the badge count is a strong predictor: most people are more sensitive to a one-unit badge count increase when the badge count is low, and become indifferent when the badge count is high. On the other hand, the badge count and the time since the last notification send are usually not strong signals for a notification CTR model, based on our previous experience. The two models can be complementary to each other in the MOO setup described in Section 5.1, since they seem to capture different aspects of notifications.

The $\hat{\sigma}$ we learned is greater than $1$, so $\hat{\alpha} \in (0,1)$, suggesting that $\Delta F_i(W_{0,i},T)$ in (3) is increasing in $W_0$ according to Lemma 3.1. This is aligned with our intuition that the longer the spacing from the previous notification send time, the more incentive we have to send another notification.

The model we train also suggests that the marginal effect on user engagement diminishes as the badge count increases. The interactions between the badge count and user features are significant, meaning different users have different sensitivities to badging.

5. Applications and Thresholds

In this section, we show how our model can be leveraged by different notification decision systems.

5.1. Notification MOO Problems

The model works well as a utility function in notification MOO problems. Consider a typical example where we have notifications available to send to $N$ users and we would like to maximize the total engagement gain while increasing the total notification clicks and controlling the send volume. Let $y_i$ be the decision variable for notification $M_i$, with 1 indicating send and 0 not send. $\Delta F_i(W_{0,i},T)$ is the predicted session gain, where $W_{0,i}$ is the time since the last badge update and $T$ is the prediction time window we are interested in, e.g., the next 24 hours. Assuming we have another model that predicts the probability of a click $P_i(\text{click})$ for a notification available for user $i$ given it is sent, we can formulate a MOO problem,

(13) Maximize $\sum_{i=1}^{N}\Delta F_{i}(W_{0,i},T)\,y_{i}$
subject to $\sum_{i=1}^{N}P_{i}(\text{click})\,y_{i}\geq C_{click}$,
$\sum_{i=1}^{N}y_{i}\leq C_{send}$,
$0\leq y_{i}\leq 1$.

The objective above is to maximize user visits due to notifications, quantified by $\Delta F_i(W_{0,i},T)$ if notification $i$ is sent. The first constraint requires the total number of clicks on notifications to be greater than or equal to $C_{click}$, ensuring that the notifications sent are relevant to users. The second requires the total number of notifications sent to be less than or equal to $C_{send}$, controlling the send volume to avoid notification overload.

By considering the duality of the linear programming problem, the resulting decision rule would be

(14) yi=1ΔFi(W0,i,T)+κ1Pi(click)>κ2,y_{i}=1\iff\Delta F_{i}(W_{0,i},T)+\kappa_{1}P_{i}(click)>\kappa_{2},

where $\kappa_1$ and $\kappa_2$ correspond to the dual variables of the first two constraints. The decision rule applies a global threshold $\kappa_2$ across all users to a linear combination of the engagement effect $\Delta F_i(W_{0,i},T)$ and the notification quality $P_i(\text{click})$. Similar volume optimization problems for emails can be found in (Gupta et al., 2016; Gupta et al., 2017).
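As a sketch with made-up scores and budgets, the LP relaxation (13) can be solved directly; in a basic optimal solution, at most as many $y_i$ are fractional as there are volume constraints, so the solution is essentially the threshold rule (14).

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative MOO instance (all values are made up): maximize total
# predicted session gain subject to a click floor and a send cap.
delta_f = np.array([0.9, 0.1, 0.5, 0.7, 0.2, 0.4])    # Delta F_i(W_0i, T)
p_click = np.array([0.05, 0.30, 0.20, 0.10, 0.25, 0.15])  # P_i(click)
C_click, C_send = 0.5, 3

res = linprog(
    c=-delta_f,                               # linprog minimizes, so negate
    A_ub=np.vstack([-p_click,                 # sum_i p_i y_i >= C_click
                    np.ones_like(p_click)]),  # sum_i y_i <= C_send
    b_ub=np.array([-C_click, C_send]),
    bounds=[(0, 1)] * len(delta_f),
    method="highs",
)
y = res.x  # relaxed send decisions in [0, 1]
```

The dual variables of the two volume constraints play the roles of $\kappa_1$ and $\kappa_2$ in (14).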

5.2. Delivery Time Optimization (DTO)

Mobile notifications are time-sensitive. Sending notifications at a better time may increase user engagement and improve user experience. A major advantage of our model is that it provides a utility that can be evaluated along the time dimension, through two channels. The first is the real-time features in the model itself, such as the current badge count. The second is the time since the last badge update $W$, which affects the calculation of $\Delta F_i(W_{0,i},T)$. Under the Weibull distribution assumption, $\Delta F_i(W_{0,i},T)$ is increasing in $W$ according to Lemma 3.1, which means we have less incentive to send a notification if we already sent one shortly before, and more if we have not sent one in a long time. This makes the model effective for DTO and notification spacing. A straightforward strategy is to send a notification to user $i$ only when $\Delta F_i(W_{0,i},T)$ is above a certain threshold,

(15) $\Delta F_i(W_{0,i},T) > \kappa.$

In practice, we find that the following modification can improve performance in some cases when we optimize user engagement,

(16) $\dfrac{\Delta F_i(W_{0,i},T)}{P_i(\text{visit}\mid\text{not send})} > \kappa,$

where $P_i(\text{visit}\mid\text{not send})$ is defined in (2) for user $i$. The latter decision rule (16) can be viewed as a personalized version of (15), where the personalization is based on a per-member constraint on the number of notification sends.
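To make the DTO rule concrete, the sketch below computes $\Delta F_i(W_{0,i},T)$ under one consistent reading of (2) and (3): sending resets the time-to-visit clock to a fresh Weibull, while not sending leaves the visit time conditioned on no visit up to $W_0$. All numeric values are illustrative.

```python
import numpy as np

def weibull_cdf(t, lam, alpha):
    # Assumed form of (7): F(t; lambda, alpha) = 1 - exp(-lambda * t**alpha)
    return 1.0 - np.exp(-lam * np.power(t, alpha))

def delta_F(w0, T, lam, alpha):
    """Delta effect under the assumed reading of (3):
    P(visit within T | send now) - P(visit within T | not send),
    where the latter is P_i(visit | not send) in (2)."""
    p_send = weibull_cdf(T, lam, alpha)
    surv_w0 = 1.0 - weibull_cdf(w0, lam, alpha)
    p_no_send = (weibull_cdf(w0 + T, lam, alpha) - weibull_cdf(w0, lam, alpha)) / surv_w0
    return p_send - p_no_send, p_no_send

# Decision rule (16): send only when delta_F / P(visit | not send) > kappa.
lam, alpha, T, kappa = 0.3, 0.7, 4.0, 0.5   # made-up parameters, alpha in (0,1)
for w0 in (1.0, 8.0, 24.0):
    d, p_ns = delta_F(w0, T, lam, alpha)
    send = d / p_ns > kappa
```

With $\alpha\in(0,1)$ the delta effect grows with $W_0$, reproducing the spacing behavior of Lemma 3.1.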

6. Offline evaluation

We compare our proposed survival-based approach with a conventional baseline, logistic regression. While there are potentially more accurate baseline models, such as tree models and deep models, survival models can likewise be extended beyond a linear structure (Ishwaran et al., 2008; Ranganath et al., 2016). This comparison isolates the impact of data censoring and the survival approach from that of feature engineering. For any given time frame $T$, we train a logistic regression with the same set of features, including their interactions $\boldsymbol{X}_{1,i}$, and a response indicating whether a user's visit occurs within $T$ after the notification is delivered. One advantage of our formulation over a classification task is that the same model can predict a user's probability of visiting within any time frame $T$ through the Weibull distribution $F(T;\lambda,\alpha)$ in (7). Therefore, the same model can be deployed across applications with different prediction windows. In contrast, we need to train a separate logistic regression model for each $T$, since the response variables differ.

To evaluate the prediction performance, we calculate the area under the receiver operating characteristic curve (AUC) for selected $T$ values, treating each as a binary classification problem. For the AFT model, we calculate $F(T;\hat{\lambda},\hat{\alpha})$ in (7) and use it in the same way as the logistic prediction for the AUC. Figure 6 shows how our model compares with the baseline model in terms of AUC as a function of the prediction time window $T$. The shorter the time window, the harder the prediction as a binary classification, since the randomness of users' engagement behavior tends to dominate in the short term. At a 4-hour prediction window, the model already gives a reasonable AUC of about 0.74, while the logistic regression model only gives 0.58. At 24 hours, for daily engagement prediction, the model gives an AUC as high as 0.85 while the baseline model reaches 0.73. Interestingly, as we further increase the prediction window, the AUC of the logistic regression model starts to fall, while our model reaches 0.89 at 48 hours. The decline of the logistic regression could come from bias introduced by attributing a visit event to multiple notification events within the time window $T$. This bias becomes more severe as $T$ increases and likely covers more notifications. The AFT model, on the other hand, avoids such bias by correctly incorporating information from both censored and uncensored observations.
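The evaluation step can be sketched as follows: use $F(T;\hat{\lambda}_i,\hat{\alpha})$ as a classification score for "visit within $T$" and compute a rank-based AUC. The per-record parameters and labels below are made up for illustration.

```python
import numpy as np
from scipy.stats import rankdata

def weibull_cdf(t, lam, alpha):
    # Assumed form of (7): F(t; lambda, alpha) = 1 - exp(-lambda * t**alpha)
    return 1.0 - np.exp(-lam * np.power(t, alpha))

def auc(scores, labels):
    """Rank-based AUC (equivalent to the Mann-Whitney U statistic)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    ranks = rankdata(scores)              # average ranks handle ties
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Toy held-out records: per-record Weibull rates and the observed
# outcome "visited within T" (illustrative values, not the paper's data).
T = 24.0
lam = np.array([0.9, 0.05, 0.6, 0.02])
y = np.array([1, 0, 1, 0])
scores = weibull_cdf(T, lam, 0.8)         # F(T; lambda_i, alpha) as the score
print(auc(scores, y))                     # -> 1.0 on this toy data
```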

The results show that properly handling censoring is crucial for mobile notification data. In addition, our model achieves great flexibility in $T$ and superior prediction power compared with the logistic classification model at every given $T$.

Figure 6. AUC as a function of T
Figure 7. System architecture

7. Online use case and experiment

In this section, we present a case study deployed at LinkedIn to show how we improve our decision making for mobile notifications with our model.

7.1. Delivery Time Optimization for Less Time-sensitive Notifications

Social networking services send both time-sensitive and less time-sensitive notifications to users. Time-sensitive ones are usually triggered by user-to-user messages or connections' activities, such as sharing an article. These notifications need to be sent immediately when triggered to keep users well informed. Other types of notifications are less time-sensitive. For example, birthday notifications reminding a user of a connection's upcoming birthday can be sent on the day of the birthday or several days ahead. In this use case, the less time-sensitive notifications are first filtered by a click-through-rate (CTR) prediction model, dropping notifications with low predicted CTR to ensure high notification quality. The filtered notifications are then queued to be sent within a valid send time window for each individual notification. The send time window ranges from a few hours to a few days depending on the nature of the notification. In this application, "when to send" is decoupled from "whether to send," since the latter decision is already made at the filtering stage. This makes it a good use case for the delivery time optimization described in Section 5.2.

In this application, we apply the decision rule in (16), where $T$ is set to 4 hours and $\kappa$ is chosen from offline analysis and online tuning to optimize the performance. For comparison, we set up a control treatment, in which notifications are sent immediately when available, and a baseline treatment, in which we send a notification to a user only if their badge count is less than or equal to 1. For users who have notifications in the queue, we evaluate send decisions every 4 hours.

7.2. Online Experiment and Results

Table 1 shows the A/B test experiment results comparing the DTO based on our model with the control and baseline models described above. We are mostly interested in user engagement and notification interactions, which can be characterized by the following metrics.

  • Sessions: A session is a collection of full page views made by a single user on the same device type. Two sessions are separated by 15 minutes of zero activity.

  • Engaged Feed Sessions: This metric counts the number of sessions where the user engaged with the newsfeed (either by interacting with feed updates, or by viewing at least 10 feed updates).

  • Notification Sessions: This metric counts the number of sessions in which the user viewed or interacted with the notification page.

  • Notification Daily Unique Send CTR: This metric measures the average click-through-rate of notifications sent to a user in a day.

The experiment ran for a full week, and the numbers in the table are all statistically significant. Compared with the control, which has no DTO, our model increased total sessions by 1.86%, notification sessions by 6.19%, and engaged feed sessions by 1.78%. The larger lift in notification sessions was expected, since we are optimizing notification send time. The roughly proportional gain in engaged feed sessions suggests that the additional sessions are of similar quality to existing ones. In addition, the notification daily unique send CTR increased by 2.51% against the control, suggesting notifications were delivered at better times, resulting in increased user engagement. Compared with the baseline model, the proposed model showed healthy gains in all four metrics. One interesting observation is that the CTR gain against the baseline (+4.48%) is higher than that against the control (+2.51%). This suggests that although the badge-count baseline increases user engagement, it reduces CTR relative to the control, implying that it may not provide a desirable user experience.

Table 1. Online A/B results for delivery time optimization
Metric | vs. Control | vs. Baseline
Sessions | +1.86% | +0.67%
Engaged Feed Sessions | +1.78% | +0.69%
Notification Sessions | +6.19% | +1.51%
Notification Daily Unique Send CTR | +2.51% | +4.48%

7.3. System Architecture

We outline the design of a notification decision system using the state transition model in Figure 7. Since the model takes a few real-time features (e.g., current badge count, time since last badge count update) as important signals, an online scoring system is ideal for model performance. To avoid maintaining all features in an online database, we include an offline component for the more static features, such as user profile features. In this offline component, offline features are retrieved from tracking data on our HDFS system and a partial score is calculated from the trained model coefficients. We push the partial scores to an online feature store daily through Apache Kafka (Kreps et al., 2011). The online component maintains real-time features and makes real-time decisions based on the real-time $\Delta F_i(W_{0,i},T)$ score.
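A minimal sketch of the offline/online score split (all names and values below are illustrative): the offline job computes the static part of the linear score $\hat{\boldsymbol{b}}\boldsymbol{X}_i$, and the online component completes it with real-time features before deriving $\lambda_i$.

```python
import numpy as np

# Illustrative split of the trained coefficients into static and
# real-time parts (hypothetical values, not production coefficients).
b_offline = np.array([0.2, -0.5])    # coefficients for static features
b_online = np.array([0.8, -0.1])     # coefficients for real-time features
sigma = 1.3                          # learned scale

# Offline component (daily, on HDFS): partial linear score from
# static features, pushed to the online feature store via Kafka.
x_static = np.array([1.0, 0.3])      # e.g. user profile features
partial_score = b_offline @ x_static

# Online component: complete the score with real-time features
# (e.g. current badge count, hours since last badge update).
x_realtime = np.array([2.0, 5.0])
full_linear = partial_score + b_online @ x_realtime
lam_i = np.exp(-full_linear / sigma)  # real-time Weibull rate for scoring
```

Because the score is linear in the features, the precomputed partial sum is exactly equivalent to scoring all features online.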

8. Discussion

To the best of our knowledge, this is the first work on probabilistic modeling of interactions between mobile notifications and user engagement at scale. We develop a state transition model and derive a delta effect to measure the effectiveness of a notification. Since censoring is common in observational mobile notification data, we estimate the delta effect through an AFT regression with a Weibull distribution. The prediction from this survival regression is both flexible to apply and superior in prediction accuracy compared to baseline models with the same feature set.

Our state transition model is generalizable and has broader applications. While we focus on modeling badging notifications, the model is applicable to all types of mobile notifications. For example, UI push notifications can be modeled with a distribution possibly different from the Weibull.

We consider a user’s visit as the reward for a mobile notification. However, the reward can be generalized to a purchase event for online shopping apps such as Amazon, or a content creation event for question-and-answer apps such as Quora. In cases where data censoring is a major concern for modeling mobile notifications, we provide a general framework to evaluate the effectiveness of a notification in driving a reward.

Acknowledgement

We would sincerely like to thank Rupesh Gupta, Matthew Walker, Kinjal Basu, Yan Gao, Haoyu Wang, Myunghwan Kim, Guangde Chen, Ajith Muralidharan for their detailed and insightful feedback during the development of this model.

References

  • Agarwal et al. (2011) Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, and Xuanhui Wang. 2011. Click shaping to optimize multiple objectives. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 132–140.
  • Agarwal et al. (2012) Deepak Agarwal, Bee-Chung Chen, Pradheep Elango, and Xuanhui Wang. 2012. Personalized click shaping through lagrangian duality for online recommendation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval. ACM, 485–494.
  • Ashley and Tuten (2015) Christy Ashley and Tracy Tuten. 2015. Creative strategies in social media marketing: An exploratory study of branded social content and consumer engagement. Psychology & Marketing 32, 1 (2015), 15–27.
  • Attenberg et al. (2009) Josh Attenberg, Sandeep Pandey, and Torsten Suel. 2009. Modeling and predicting user behavior in sponsored search. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1067–1076.
  • Buckley and James (1979) Jonathan Buckley and Ian James. 1979. Linear regression with censored data. Biometrika 66, 3 (1979), 429–436.
  • Cox (1992) David R Cox. 1992. Regression models and life-tables. In Breakthroughs in statistics. Springer, 527–541.
  • Dave et al. (2014) Kushal Dave, Vasudeva Varma, et al. 2014. Computational advertising: Techniques for targeting relevant ads. Foundations and Trends® in Information Retrieval 8, 4–5 (2014), 263–418.
  • Fischer et al. (2011) Joel E Fischer, Chris Greenhalgh, and Steve Benford. 2011. Investigating episodes of mobile phone activity as indicators of opportune moments to deliver notifications. In Proceedings of the 13th international conference on human computer interaction with mobile devices and services. ACM, 181–190.
  • Goldstein et al. (2013) Daniel G Goldstein, R Preston McAfee, and Siddharth Suri. 2013. The cost of annoying ads. In Proceedings of the 22nd international conference on World Wide Web. ACM, 459–470.
  • Gomez-Rodriguez et al. (2013) Manuel Gomez-Rodriguez, Jure Leskovec, and Bernhard Schölkopf. 2013. Modeling information propagation with survival theory. In International Conference on Machine Learning. 666–674.
  • Gupta et al. (2017) Rupesh Gupta, Guanfeng Liang, and Romer Rosales. 2017. Optimizing Email Volume For Sitewide Engagement. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. ACM, 1947–1955.
  • Gupta et al. (2016) Rupesh Gupta, Guanfeng Liang, Hsiao-Ping Tseng, Ravi Kiran Holur Vijay, Xiaoyu Chen, and Romer Rosales. 2016. Email volume optimization at LinkedIn. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 97–106.
  • Ishwaran et al. (2008) Hemant Ishwaran, Udaya B Kogalur, Eugene H Blackstone, Michael S Lauer, et al. 2008. Random survival forests. The annals of applied statistics 2, 3 (2008), 841–860.
  • James and Smith (1984) Ian R James and PJ Smith. 1984. Consistency results for linear regression with censored data. The Annals of Statistics (1984), 590–600.
  • Kapoor et al. (2014) Komal Kapoor, Mingxuan Sun, Jaideep Srivastava, and Tao Ye. 2014. A hazard based approach to user return time prediction. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 1719–1728.
  • Keiding et al. (1997) Niels Keiding, Per Kragh Andersen, and John P Klein. 1997. The role of frailty models and accelerated failure time models in describing heterogeneity due to omitted covariates. Statistics in medicine 16, 2 (1997), 215–224.
  • Khan (2017) M Laeeq Khan. 2017. Social media engagement: What motivates user participation and consumption on YouTube? Computers in Human Behavior 66 (2017), 236–247.
  • Klein and Moeschberger (2005) John P Klein and Melvin L Moeschberger. 2005. Survival analysis: techniques for censored and truncated data. Springer Science & Business Media.
  • Kreps et al. (2011) Jay Kreps, Neha Narkhede, Jun Rao, et al. 2011. Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB. 1–7.
  • Li et al. (2017) Huayu Li, Yong Ge, Hengshu Zhu, Hui Xiong, and Hongke Zhao. 2017. Prospecting the career development of talents: A survival analysis perspective. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 917–925.
  • Liu et al. (2010) Chao Liu, Ryen W White, and Susan Dumais. 2010. Understanding web browsing behaviors through Weibull analysis of dwell time. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval. ACM, 379–386.
  • Mehrotra et al. (2015) Abhinav Mehrotra, Mirco Musolesi, Robert Hendley, and Veljko Pejovic. 2015. Designing content-driven intelligent notification mechanisms for mobile applications. In Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. ACM, 813–824.
  • Mehrotra et al. (2014) Abhinav Mehrotra, Veljko Pejovic, and Mirco Musolesi. 2014. SenSocial: a middleware for integrating online social networks and mobile sensing data streams. In Proceedings of the 15th International Middleware Conference. ACM, 205–216.
  • Meng et al. (2016) Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael J. Franklin, Reza Zadeh, Matei Zaharia, and Ameet Talwalkar. 2016. MLlib: Machine Learning in Apache Spark. J. Mach. Learn. Res. 17, 1 (Jan. 2016), 1235–1241. http://dl.acm.org/citation.cfm?id=2946645.2946679
  • Pielot et al. (2017) Martin Pielot, Bruno Cardoso, Kleomenis Katevas, Joan Serrà, Aleksandar Matic, and Nuria Oliver. 2017. Beyond interruptibility: Predicting opportune moments to engage mobile phone users. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3 (2017), 91.
  • Pielot et al. (2014a) Martin Pielot, Karen Church, and Rodrigo De Oliveira. 2014a. An in-situ study of mobile phone notifications. In Proceedings of the 16th international conference on Human-computer interaction with mobile devices & services. ACM, 233–242.
  • Pielot et al. (2014b) Martin Pielot, Rodrigo de Oliveira, Haewoon Kwak, and Nuria Oliver. 2014b. Didn’t you see my message?: predicting attentiveness to mobile instant messages. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 3319–3328.
  • Ranganath et al. (2016) Rajesh Ranganath, Adler Perotte, Noémie Elhadad, and David Blei. 2016. Deep Survival Analysis. In Proceedings of the 1st Machine Learning for Healthcare Conference (Proceedings of Machine Learning Research), Finale Doshi-Velez, Jim Fackler, David Kale, Byron Wallace, and Jenna Wiens (Eds.), Vol. 56. PMLR, Children’s Hospital LA, Los Angeles, CA, USA, 101–114.
  • Sahami Shirazi et al. (2014) Alireza Sahami Shirazi, Niels Henze, Tilman Dingler, Martin Pielot, Dominik Weber, and Albrecht Schmidt. 2014. Large-scale assessment of mobile notifications. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 3055–3064.
  • Vasiloudis et al. (2017) Theodore Vasiloudis, Hossein Vahabi, Ross Kravitz, and Valery Rashkov. 2017. Predicting Session Length in Media Streaming. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’17). ACM, New York, NY, USA, 977–980. https://doi.org/10.1145/3077136.3080695
  • Wang et al. (2013) Taifeng Wang, Jiang Bian, Shusen Liu, Yuyu Zhang, and Tie-Yan Liu. 2013. Psychological advertising: exploring user psychology for click prediction in sponsored search. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 563–571.
  • Wei (1992) Lee-Jen Wei. 1992. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statistics in medicine 11, 14-15 (1992), 1871–1879.
  • Xu et al. (2013) Ye Xu, Mu Lin, Hong Lu, Giuseppe Cardone, Nicholas Lane, Zhenyu Chen, Andrew Campbell, and Tanzeem Choudhury. 2013. Preference, context and communities: a multi-faceted approach to predicting smartphone app usage patterns. In Proceedings of the 2013 International Symposium on Wearable Computers. ACM, 69–76.
  • Yoo and Kim (2005) Chan Yun Yoo and Kihan Kim. 2005. Processing of animation in online banner advertising: The roles of cognitive and emotional responses. Journal of Interactive Marketing 19, 4 (2005), 18–34.
  • Yu et al. (2017) Linyun Yu, Peng Cui, Chaoming Song, Tianyang Zhang, and Shiqiang Yang. 2017. A Temporally Heterogeneous Survival Framework with Application to Social Behavior Dynamics. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1295–1304.
  • Zhou et al. (2016) Ke Zhou, Miriam Redi, Andrew Haines, and Mounia Lalmas. 2016. Predicting pre-click quality for native advertisements. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 299–310.