Foreword
Customer churn is a feature of subscription services. Churn is the event of a customer opting out of their subscription. Customers may do so for different reasons.
The drivers behind this decision are multiple. Businesses want to understand why their customers churn. Why?
Churn is related to duration. Subscription services bring in recurring revenues (through periodic billing for example). Long durations mean long income streams: as long as the customer subscribes to the service, the company can rely on a steady income, so keeping customers is important. Telecom services come to mind, but we can find the subscription business model in many sectors.
A lower churn generates more Customer Lifetime Value (CLV).
The lower the average churn, the longer the average subscriber remains, and the more cumulative revenues a company can collect. Here is the trickle-down effect of churn on CLV or Lifetime Value (LTV):
\[Churn \rightarrow Duration \rightarrow Recurring~revenues\]
\[Recurring~revenues - Recurring~costs = Gross~Contribution~Margin\]
\[Gross~Contribution~Margin - Marketing~costs = Net~margin~for~single~event\]
\[Net~margin~for~single~event \times Expected~number~of~purchases~in~lifetime = Accumulated~margin\]
\[Present~value(Accumulated~margin - Acquisition~cost) = LTV\]
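The chain from churn to LTV can be sketched numerically. The Python sketch below is an illustration under stated assumptions, not the computation behind this article: it assumes a constant monthly churn rate, monthly discounting, and made-up amounts. The function name `lifetime_value` and all of its parameters are hypothetical.

```python
def lifetime_value(revenue, cost, marketing, churn, acquisition_cost,
                   discount_rate=0.0, horizon=120):
    """Discounted margin collected from one subscriber, net of acquisition cost.

    All amounts are per month. `churn` is an assumed constant monthly
    probability of opting out; `horizon` caps the number of months summed.
    """
    net_margin = revenue - cost - marketing  # net margin for a single billing event
    survival = 1.0                           # probability the subscriber is still active
    accumulated = 0.0
    for month in range(1, horizon + 1):
        # Margin expected this month, discounted back to today.
        accumulated += net_margin * survival / (1.0 + discount_rate) ** month
        survival *= 1.0 - churn
    return accumulated - acquisition_cost


# Hypothetical figures: same subscriber economics, two churn levels.
low_churn = lifetime_value(50, 20, 5, churn=0.02, acquisition_cost=100,
                           discount_rate=0.005)
high_churn = lifetime_value(50, 20, 5, churn=0.05, acquisition_cost=100,
                            discount_rate=0.005)
print(round(low_churn, 2), round(high_churn, 2))
```

With everything else equal, the lower churn rate yields the larger value, which is the trickle-down effect described above.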
Survival analysis is one way to model churn. Survival analysis has been traditionally used in medicine and in life sciences to analyze how long it takes before a person dies, hence ‘survival’. What are the kinds of insights we can get from survival analysis? By charting the results, we can visualize how the likelihood of churn changes over time.
Source: Barry Analytics
For one thing, we can estimate duration. We can also show how the likelihood of customer churn changes over time and we can determine the optimal intervention point. We are interested in knowing:
Survival regressions allow us to model the relationship between customer churn, time, and customer characteristics. We can find and measure factors driving churn.
Using the covariates such as gender and age, we can measure the probability that ‘female non-senior subscribers with dependents’ will stay for 24 months.
Categorical variables: gender, SeniorCitizen, Partner, Dependents, PhoneService, MultipleLines, InternetService, OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, StreamingMovies, Contract, PaperlessBilling, PaymentMethod, and Churn.
Numeric variables: tenure, MonthlyCharges, and TotalCharges.
We will use a dataset from IBM Watson Analytics that describes a fictitious telco’s subscriber base. The dataset counts 7043 customers and 20 variables (covariates). Most of the variables are categorical. We will perform survival analysis using subscription duration (tenure). During the time window, some of the subscribers opted out (Churn=1), others remained (Churn=0).
In the figure below, we can see how monthly charges are distributed across customers. A large proportion of customers is paying around $20 per month.
Histogram of monthly charges.
The survival curve is fundamental in survival analysis. It tells us the probability that a customer will still subscribe over a period of time.
However, the longer the period, the smaller the probability of surviving (the cumulative risk of churn increases over time).
One way of estimating the survival curve is with the Weibull distribution. The Weibull distribution is a natural choice for modeling time-to-death data or machine failure.
We start with a simple PH model (weibull_ph_model) with two covariates: the subscriber’s Contract and if the subscriber has MultipleLines.
We read each result holding the other covariates constant (ceteris paribus). The lower the coefficient (low positive values, high negative values), the lower the risk of event (the event is churn) and the longer the duration.
Keep in mind the models are tentative, the dataset is fictitious… so are the results.
| | Coef. |
|---|---|
| ContractOne year | -2.1077429 |
| ContractTwo year | -3.7239890 |
| MultipleLinesNo phone service | -0.2891042 |
| MultipleLinesYes | -0.3604447 |
| log(scale) | 3.5809741 |
| log(shape) | -0.1757588 |
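From the coefficients, we can see that two-year contracts, -3.72, have the strongest impact on churn: the most negative coefficient means the largest reduction in the hazard.
The log coefficients, log(scale) and log(shape), describe the baseline Weibull distribution and therefore the shape of the curves we can plot from the results. We will come back to these notions.
The exponentiated coefficients are the hazard ratios, a relative measure.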
| | exp(Coef.) |
|---|---|
| ContractOne year | 0.1215119 |
| ContractTwo year | 0.0241375 |
| MultipleLinesNo phone service | 0.7489342 |
| MultipleLinesYes | 0.6973662 |
The hazard of event is not the probability of event. A hazard ratio of 1.10 means the instantaneous rate of the event is 10 % higher, controlling for other variables (ceteris paribus), but the probability of occurrence is not 10 %.
Keeping everything else constant, two-year contracts bring the hazard of event (the hazard of churn) down to 2.4 % of the hazard under the reference contract type, a 97.6 % reduction, while one-year contracts bring it down to 12.2 %, an 87.8 % reduction.
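The conversion from coefficients to hazard ratios is a plain exponentiation. As a minimal sketch, the following Python snippet recomputes the hazard ratios from the weibull_ph_model coefficients listed above and expresses each one as a percentage change in the hazard relative to the reference level. The helper names `hazard_ratio` and `percent_change_in_hazard` are illustrative, not part of any package.

```python
import math

# Coefficients copied from the weibull_ph_model table above.
coefficients = {
    "ContractOne year": -2.1077429,
    "ContractTwo year": -3.7239890,
    "MultipleLinesNo phone service": -0.2891042,
    "MultipleLinesYes": -0.3604447,
}


def hazard_ratio(coef):
    """Hazard ratio relative to the reference level of the covariate."""
    return math.exp(coef)


def percent_change_in_hazard(coef):
    """Change in hazard versus the reference level, in percent."""
    return (math.exp(coef) - 1.0) * 100.0


for name, coef in coefficients.items():
    print(f"{name}: HR = {hazard_ratio(coef):.4f}, "
          f"change in hazard = {percent_change_in_hazard(coef):+.1f} %")
```

A hazard ratio below 1 gives a negative percentage change, that is, a lower hazard of churn than the reference level.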
It is also important to check if the survival curve is consistent with the business understanding of their customer lifecycle.
weibull_ph_model
As time passes, there are fewer ‘survivors’ or subscribers. After 24 months (2 years), 50 % of the subscriber base is gone. The estimation begins losing significance beyond that point; 2 years is a long time. It becomes difficult to predict precisely whether less than 20 % of subscribers will remain after 70 months (more than 5 years).
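The figures read from the survival curve can be approximated from the fitted parameters. The sketch below assumes the usual Weibull parametrization, where the baseline survival function is S(t) = exp(-(t / scale)^shape), and plugs in the log(scale) and log(shape) estimates from weibull_ph_model. It describes the baseline subscriber only (reference level of every covariate), and the function names are hypothetical.

```python
import math

LOG_SCALE = 3.5809741   # log(scale) from weibull_ph_model
LOG_SHAPE = -0.1757588  # log(shape) from weibull_ph_model


def baseline_survival(months, log_scale=LOG_SCALE, log_shape=LOG_SHAPE):
    """Probability that a baseline subscriber is still subscribed after `months`."""
    scale, shape = math.exp(log_scale), math.exp(log_shape)
    return math.exp(-((months / scale) ** shape))


def baseline_median(log_scale=LOG_SCALE, log_shape=LOG_SHAPE):
    """Tenure (in months) at which half of the baseline subscribers have churned."""
    scale, shape = math.exp(log_scale), math.exp(log_shape)
    return scale * math.log(2.0) ** (1.0 / shape)


for months in (12, 24, 48, 70):
    print(months, round(baseline_survival(months), 3))
print("median tenure:", round(baseline_median(), 1))
```

Under this assumption, the survival probability is close to one half around 24 months and falls below 20 % by 70 months, in line with the curve.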
weibull_ph_model
One thing is sure, the hazard is not constant over time. The shape coefficient, \(\rho\) = log(shape) = -0.176, is negative, which means the shape parameter is below 1 (exp(-0.176) ≈ 0.84): the hazard curve is decreasing, but the decrease slows down over time. Churn is high during the first 7-10 months. Then, following an ‘inflection’ point around the 8th month, the hazard decreases more slowly and flattens out. This explains the survival curve: a sharply decreasing curve at the beginning that decelerates as time passes. We can rely on another model to confirm the shape coefficient or \(\rho\): the accelerated failure time model.
With the Weibull model, the hazard curve is decreasing when the shape parameter is below 1 (ρ < 0) and increasing when it is above 1 (ρ > 0). A decreasing hazard always decelerates; an increasing hazard decelerates when the shape is between 1 and 2 and accelerates when the shape is above 2. There is a special case of the Weibull model called the Exponential model where the hazard is constant over time (shape = 1, that is ρ = 0).
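To make the role of the shape parameter concrete, here is a small sketch of the Weibull hazard function, h(t) = (shape / scale) · (t / scale)^(shape − 1), evaluated for three shapes. The scale of 36 months and the shapes 0.84, 1.0, and 1.5 are illustrative values chosen for the example, and `weibull_hazard` is a hypothetical helper.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard at time t for the given shape and scale."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)


months = [1, 6, 12, 24, 48]
for shape in (0.84, 1.0, 1.5):
    hazards = [round(weibull_hazard(t, shape, scale=36.0), 4) for t in months]
    print(f"shape = {shape}: {hazards}")
```

The first series decreases, the second stays flat (the Exponential case), and the third increases.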
We cross-check the results with an AFT model (weibull_aft_model) with the same two covariates: the subscriber’s Contract and whether the subscriber has MultipleLines.
| | Coef. |
|---|---|
| (Intercept) | 3.5809741 |
| ContractOne year | 2.5127468 |
| ContractTwo year | 4.4395554 |
| MultipleLinesNo phone service | 0.3446557 |
| MultipleLinesYes | 0.4297043 |
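For a Weibull model, the PH and AFT coefficients are two views of the same fit: each AFT coefficient equals the PH coefficient with the sign flipped and divided by the shape parameter. The sketch below checks this relationship against the two coefficient tables reported in this article; the dictionary names are illustrative.

```python
import math

log_shape = -0.1757588  # log(shape) from weibull_ph_model

ph_coefficients = {
    "ContractOne year": -2.1077429,
    "ContractTwo year": -3.7239890,
    "MultipleLinesNo phone service": -0.2891042,
    "MultipleLinesYes": -0.3604447,
}
aft_reported = {
    "ContractOne year": 2.5127468,
    "ContractTwo year": 4.4395554,
    "MultipleLinesNo phone service": 0.3446557,
    "MultipleLinesYes": 0.4297043,
}

shape = math.exp(log_shape)
# AFT coefficient = -(PH coefficient) / shape
aft_from_ph = {name: -coef / shape for name, coef in ph_coefficients.items()}

for name, value in aft_from_ph.items():
    print(f"{name}: derived = {value:.6f}, reported = {aft_reported[name]:.6f}")
```

The derived and reported values agree to several decimals, and the AFT intercept (3.5809741) is the PH log(scale). A negative PH coefficient (lower hazard) therefore always maps to a positive AFT coefficient (longer duration).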
We compute the percentage change: \(\tfrac{1}{coefficient} - 1\).
Two-year contracts still have the highest impact at 4.44.
| | %Change |
|---|---|
| (Intercept) | -0.7207464 |
| ContractOne year | -0.6020291 |
| ContractTwo year | -0.7747522 |
| MultipleLinesNo phone service | 1.9014462 |
| MultipleLinesYes | 1.3271817 |
In an AFT model, the coefficients act on the time to event rather than on the hazard: a positive coefficient stretches the expected time to churn by a factor of exp(coefficient). Therefore, subscribers under contract stay longer, and longer still with two-year contracts than with one-year contracts. This is consistent with the weibull_ph_model hazard and survival curves.
The interpretations of the parameter estimates for PH and AFT models differ. The PH assumption is useful for a comparison of hazards. The AFT assumption is useful for a comparison of survival times.
The PH model (from the eha package) can generate graphics. The AFT model (from the survival package) does not, but computes one important coefficient, \(\rho\), that goes hand in hand with the PH graphics. The hazard curve is decreasing and decelerating.
ρ determines the shape of the hazard function, which has an impact on the survival curve.
The weibull_aft_model scale = 1.19 can be transformed into a percentage change: \(\tfrac{1}{scale} - 1\) = -0.161. The scale is also the reciprocal of the Weibull shape, so on the log scale log(1/scale) is approximately -0.176 with the unrounded scale, which agrees with the weibull_ph_model estimate: \(\rho\) = -0.176.
In business terms, if their customers are churning much earlier or later than the business perceives them to be, then the firm may have to tweak its customer lifecycle management.
It also may be a good idea to intervene and incentivise customers who have already stayed longer. Since their probability of staying is dipping over time, without intervention, they are more likely to churn.
We can also compare survival curves by customer segments.
For example, do females and males churn differently?
We can first look at total counts. How many females have churned versus males?
Churn=Yes, absolute and percentage values, by gender.
| | Female | Male | Female (%) | Male (%) |
|---|---|---|---|---|
| No | 2549 | 2625 | 36.19 | 37.27 |
| Yes | 939 | 930 | 13.33 | 13.20 |
Churn=Yes, by gender
It looks like male and female churn is not much different.
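The percentages in the table are shares of the whole subscriber base. A within-group churn rate makes the comparison more direct; the short sketch below derives it from the counts in the table (the variable names are illustrative).

```python
# Counts copied from the table above.
counts = {
    "Female": {"No": 2549, "Yes": 939},
    "Male": {"No": 2625, "Yes": 930},
}


def churn_rate(group):
    """Share of a group's subscribers who churned."""
    return group["Yes"] / (group["No"] + group["Yes"])


total = sum(sum(group.values()) for group in counts.values())
rates = {gender: churn_rate(group) for gender, group in counts.items()}

print("total subscribers:", total)
for gender, rate in rates.items():
    print(f"{gender}: {rate:.1%} churned")
```

Both rates land within about one percentage point of each other, which supports the visual impression.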
We run another model (weibull_ph_model2). This time, we stratify the results by gender.
We are interested in log coefficients (log(scale):1 vs. log(scale):2 or male vs. female). They should be significantly different if gender had any impact on churn. This is not the case.
weibull_ph_model2
| | Coef. |
|---|---|
| ContractOne year | -2.1075900 |
| ContractTwo year | -3.7242056 |
| MultipleLinesNo phone service | -0.2893374 |
| MultipleLinesYes | -0.3604831 |
| log(scale):1 | 3.5725884 |
| log(shape):1 | -0.1930597 |
| log(scale):2 | 3.5891540 |
| log(shape):2 | -0.1581253 |
The survival curves are almost identical as well. Furthermore, they mimic the survival curve of the whole population.
Now, what would be the difference between customers who have Dependents or not, i.e. children (weibull_ph_model3)?
weibull_ph_model3
Those with dependents are more loyal. For any duration, the survival rate is higher; the overall curve is higher. Is this statistically significant?
The log-rank test will confirm that these survival curves are statistically different from each other. The result of the test should be a rejection of the null hypothesis of ‘similarity’.
The statistic reported here is \(-\tfrac{loglik}{df}\), where loglik is the maximum log(likelihood) of the model. Note that the standard log-rank statistic is computed differently, by comparing observed and expected churn counts in each group at every event time.
From the results, the model has a maximum log likelihood of \(-9289.5417673\) with 4 covariates (\(4\) degrees of freedom).
The test statistic is \(2322.3854418\), giving a p-value that rounds to \(0\), way below 5 %. The test is significant and we can reject the null hypothesis of ‘similar curves’.
For any duration, the survival rate of those with dependents is statistically higher.
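For readers who want to see what a textbook two-group log-rank test computes, here is a minimal sketch in plain Python. It is not the computation used in this article: at each churn time it compares the churn count observed in the first group with the count expected if both groups shared one survival curve, then refers the standardized total to a chi-square distribution with 1 degree of freedom. The function `logrank_test` and the toy tenures are hypothetical.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. events_* hold 1 for churn, 0 for censored."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for x in times_a if x >= t)
        at_risk_b = sum(1 for x in times_b if x >= t)
        at_risk = at_risk_a + at_risk_b
        churn_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        churn_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        churn = churn_a + churn_b
        # Churn expected in group A if both groups had the same hazard.
        observed_minus_expected += churn_a - churn * at_risk_a / at_risk
        if at_risk > 1:
            variance += (churn * (at_risk_a / at_risk) * (at_risk_b / at_risk)
                         * (at_risk - churn) / (at_risk - 1))

    if variance == 0.0:
        return 0.0, 1.0
    statistic = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Toy tenures in months (made up for illustration only).
with_dependents = [5, 12, 20, 30, 40, 60, 72, 72]
with_dependents_churned = [1, 0, 1, 0, 0, 1, 0, 0]
without_dependents = [1, 2, 3, 5, 8, 10, 15, 24]
without_dependents_churned = [1, 1, 1, 1, 0, 1, 1, 1]

statistic, p_value = logrank_test(with_dependents, with_dependents_churned,
                                  without_dependents, without_dependents_churned)
print(round(statistic, 3), round(p_value, 4))
```

A small p-value leads to rejecting the null hypothesis that the two groups share the same survival curve.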
The marketing department would be well-advised to target those customers with children and waste no energy on gender composition of their market segmentation.
Knowing how long customers stay is all well and good, but what if we wanted to find out about the factors that influence churn?
We will focus on Cox Proportional Hazard regressions.
The hazard of event is not the probability of event. A hazard ratio of 1.10 means the instantaneous rate of the event is 10 % higher, controlling for other variables (ceteris paribus).
The main insight the Cox model provides is the coefficients. The exponentials of these coefficients are the hazard ratios.
For example, let’s say we’ve fitted a Cox regression model to our example telco dataset, and one of the variables is gender. This variable takes on two values: 1 for male, and 0 for female.
What does it mean if the hazard ratio of gender is 1.10? It means that at any time, whether it has been 6 months since signing up or 12 months, the hazard of churn for males is 10 % higher than for females. Hazard ratios over 1 increase the hazard, ratios below 1 decrease it. 1 is the baseline.
Cox models make a very important assumption: proportional hazard. It means that the hazard ratio of all variables should be constant over time. We saw that notion with the Weibull models and the \(\rho\). There can only be one \(\rho\) coefficient. \(\rho\) does not change over time.
For example, we have the variable Dependents that is 1 if the customer has children and 0 if not. If the proportional hazards assumption holds, then, at any moment in time, those without children are more likely to churn. This hazard ratio should be more or less the same across time.
Hazard Functions. Customers with Dependents (red) display a lower hazard curve over time. On the left, the hazard is constant over time. On the right, the hazard shifts over time: customers with Dependents (red) would display an increasing hazard. With these types of increasing hazard curves, we can expect a decreasing and accelerating survival curve (a curve that plunges over time rather than flattening out). These curves are observed in health science: as time passes, the survival rate of cancer patients plummets.
It is possible to test this assumption using a statistical test.
The test says that only the following variables satisfy the proportional hazards assumption: PaperlessBilling, SeniorCitizen, Dependents, and gender. Let’s now use these binary variables to fit a model.
An independent variable is a covariate; they are the same. A variable can be binary (like gender) or continuous (like age). Age is a fixed continuous covariate: people age due to the passing of time. Disposable income is also a continuous variable, but a time-varying covariate: it can change, but not because of time.
We leave out the other variables, but there are other methods that can incorporate variables that don’t follow the proportional hazards assumption:
We cannot interpret Cox PH models as we interpret Weibull PH models or Weibull AFT models. Cox models are semi-parametric methods while Weibull models are parametric methods that rely on the Weibull distribution.
After filtering the variables, we can fit a time-constant model (cox_model).
The model (partial) results.
| | Coef. | p |
|---|---|---|
| genderMale | -0.0207941 | 6.5e-01 |
| SeniorCitizen | 0.2649922 | 1.2e-06 |
| DependentsYes | -0.7820715 | 0.0e+00 |
| PaperlessBillingYes | 0.6120632 | 0.0e+00 |
gender is not a good indicator of churn since the p-value is over 5 %, confirming what we saw above in section 2.
SeniorCitizen, Dependents, and PaperlessBilling are good indicators of churn (p-values < 5 %).
We can also quantify their effects on the hazard of churning with the exponentiated coefficients (the hazard ratios).
| | exp(Coef.) |
|---|---|
| genderMale | 0.9794206 |
| SeniorCitizen | 1.3034208 |
| DependentsYes | 0.4574574 |
| PaperlessBillingYes | 1.8442325 |
The more the hazard ratio strays away from 1, the higher the impact on the survival rate, controlling for the other covariates. The ratio can have a positive or a negative impact on the risk of event.
We do not interpret continuous variables as we would interpret binary variables. If the exp(Coef.) of a continuous covariate such as age was 1.30, first, we can say the covariate increases the risk of churn. Second, we can transform the hazard ratio: (1.30 - 1)*100 = 30 %. For every additional year, the hazard of churn increases by 30 %. However, our model only contains binary covariates.
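Under proportional hazards, a hazard ratio translates into survival through a simple relation: the survival probability of a group equals the baseline survival probability raised to the power of the hazard ratio. The sketch below applies this to the cox_model hazard ratios, using a baseline survival of 50 % at some point in time. That baseline value is made up for illustration, and `survival_with_covariate` is a hypothetical helper.

```python
# Hazard ratios copied from the cox_model exp(Coef.) table.
hazard_ratios = {
    "genderMale": 0.9794206,
    "SeniorCitizen": 1.3034208,
    "DependentsYes": 0.4574574,
    "PaperlessBillingYes": 1.8442325,
}


def survival_with_covariate(baseline_survival, hazard_ratio):
    """Survival probability when a single binary covariate is switched on."""
    return baseline_survival ** hazard_ratio


baseline = 0.50  # hypothetical baseline survival probability at a given month
for name, ratio in hazard_ratios.items():
    print(f"{name}: {survival_with_covariate(baseline, ratio):.3f}")
```

Ratios below 1 lift the survival probability above the baseline, and ratios above 1 push it below.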
There is no estimate of the intercept in Cox models; it is already incorporated in the baseline hazard function.
cox_model
Plotting the estimated survival function.
cox_model
We saw three significant factors of churn. Let’s display them in absolute terms (no negative values to set the results on an equal footing).
Factors affecting churn
We can leave out the SeniorCitizen factor. Its hazard ratio is the closest to 1 of the three significant covariates, so its impact is the lowest. We should look at more important factors to curb churn.
Paperless billing.
The PaperlessBilling factor is not surprising since ‘techno-savvy’ customers may prefer billing by e-mail, but these are the kinds of people that are more mercurial. As soon as new devices come to the market, they have a propensity to break their contract to opt for a new contract and… a new phone with the latest gadgets. Nonetheless, the business should investigate the cause further. Also, as the majority of the customers are under paperless billing, this is an important issue.
Dependents.
We see from the graph there are more than twice as many customers without Dependents as with Dependents. At the same time, those with children have less than half the hazard of churn: the hazard ratio of 0.457 means the hazard of event is \(54.3\) % lower, controlling for other variables. This means that there should be more efforts to retain this segment of customers. Should the company treat them as ‘members of the family’ as they take care of their own family?
These are just some of the ways that survival analysis can be used to address business situations:
Ultimately, these issues all have an impact on LTV. In addition, we want to consider ‘win-backs’ (known as 2LTV).
survival package documentation.
eha package documentation about event history analysis.
The lattice package is useful for grid layouts; ggplot2 does facet layouts.
The reshape2 or tidyr packages are useful for reshaping wide data frames into long data frames.
With the mstate package, we can model multistate and competing risks. 2LTV or win-backs imply multistates.
lifelines package documentation; along with the SciPy and scikit-learn stacks, it is the alternative to R.