Combating attrition with sequence analysis and survival analysis | Eleven

01 August 2021

Data science

Customer attrition – or churn – is a major concern for many companies: because acquiring customers is expensive in most industries, retaining them is essential to profitability. An already engaged customer continues to generate business with far less effort on the company’s part. This holds not only for businesses built on recurring contracts but also for one-off sales models, since the probability of selling again to a satisfied customer is higher than that of selling to a new prospect.

Attrition is sometimes natural – the customer no longer needs the product or service – but more often than not it is the result of a departure to a competing offer. This phenomenon is now amplified by the disruption brought about by the arrival of new digital players, sometimes more agile, that can meet customers’ aspirations in terms of digitalisation (for example in banking and insurance).

Why study customer attrition?

Predicting and understanding attrition is necessary for several reasons. On the one hand, anticipating attrition provides useful business insights that can be used to plan for possible volume reductions, to adapt operations, or to focus on obtaining new customers. On the other hand, identifying the factors responsible for attrition enables preventive action to be taken to avoid customers leaving. Indeed, while some causes of attrition are external to the company (crises such as Covid in 2020), others can be easily anticipated (seasonality, demographic factors such as ageing, changing needs) or acted upon (a one-off bad experience, a less engaging offer), making it possible to better anticipate and reduce customer departures, with direct results on operational costs and benefits.

Practical issues of attrition analysis

Predicting attrition at the level of the individual customer is, however, a major challenge: when attrition results from dissatisfaction or a move to a competitor, the customer’s decision is made before he or she actually terminates the contract, and this delay between decision and realisation leaves little room for a timely reaction. To add to this difficulty, the strong signals available are often markers of the realisation and not of the decision. They thus become available too late, leaving no opportunity for the company to put in place preventive actions. Generally speaking, the longer the prediction horizon, the lower the accuracy of the predictions: the most revealing signals are those that occur closest to actual attrition.

It is therefore particularly important to develop an approach that makes it possible to anticipate the departure of customers well in advance, or even better, to directly identify the sources and dynamics of attrition in order to take corrective action upstream.

There are several possible approaches to conducting such a study:

A set of classic machine learning predictive models, each aiming to predict attrition at a different time horizon; analysing these models with interpretability methods allows certain factors to be highlighted.
Survival analysis, which uses a single model to predict how the volume of attrition (the percentage of customers entering attrition) evolves over time, and to compare across segments of the customer base (by offer, year of arrival, or type of customer experience) how the probability of attrition evolves over time.
Sequence analysis, which looks at the sequence of events experienced by churning customers in order to identify which series of events increase or decrease the probability and speed of attrition.

Survival analysis and sequence analysis are two highly complementary approaches that rely on different representations of customers – the first based on endogenous customer characteristics, the second on the customer journey.

Survival analysis approach

Survival analysis was originally developed to estimate the life expectancy of individuals. Given a sample from a population, where for each individual it is known whether the “death” event has been observed and, if so, after how long, it makes it possible to model the “survival function”, i.e. the probability that a given individual is still alive at each moment.

This framework looks far removed from attrition at first glance, but it is in fact well suited: we simply treat the attrition of a customer as the “death” event. Survival analysis then models how the probability of attrition changes over the time horizon.

The result of a survival analysis is richer than that of a linear regression which would aim at predicting the life span of the individual. It models not only the life expectancy, but the probability of surviving to each time horizon. The following example illustrates this nuance: if a customer has a 30% chance of leaving after 1 month and a 70% chance of staying for 5 years, this information about the distribution is much more valuable than simply knowing that the customer has a “life expectancy” of about 42 months (0.3 × 1 + 0.7 × 60 ≈ 42). This is particularly important for attrition management: it is necessary to be able to simulate the actual times of departure, not just their expectation.
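The arithmetic behind this example can be checked in a few lines. The sketch below assumes a deliberately simplified customer who either leaves after 1 month or stays 60 months; the names `lifetimes` and `survival_probability` are illustrative and do not come from any library.

```python
# Toy lifetime distribution: lifetime in months -> probability.
# Assumption: the customer either leaves after 1 month or stays 60 months.
lifetimes = {1: 0.30, 60: 0.70}

# Expected lifetime: the single number a regression would target.
expected_lifetime = sum(months * p for months, p in lifetimes.items())

def survival_probability(horizon_months):
    """Probability that the customer is still present after `horizon_months`."""
    return sum(p for months, p in lifetimes.items() if months > horizon_months)

print(round(expected_lifetime, 1))            # 42.3
for horizon in (0, 12, 42, 60):
    print(horizon, round(survival_probability(horizon), 2))
```

The expectation of roughly 42 months hides the fact that almost a third of the risk is concentrated in the very first month, which the survival probabilities at each horizon make explicit.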

The survival analysis approach is not only richer, it also avoids an important bias caused by the censoring phenomenon.

Censoring and survival analysis

When modelling a “survival” phenomenon such as attrition from customer data, an important characteristic of the data must be taken into account: the event under study has probably not been observed for all customers. For example, not all current customers have experienced attrition. A regression approach would require transforming this data to avoid anomalies in the model (infinite average life expectancy), for example by considering only customers who have already left. This transformation biases the model and its results in an uncontrolled way: all individuals used to train the model have experienced the “death” event, which is far from the desired behaviour, especially if the normal life span of an individual is longer than the period over which the data has been collected. To see how this is problematic, consider a doctor studying the life expectancy of patients over a 10-year period; a doctor who ignores all patients still alive at the end of the period will never reach the right conclusions.

Survival analysis does not suffer from this bias: because it models the survival function rather than life expectancy directly, it can be trained on censored samples, i.e. those for which the death event has not yet occurred. In our example, it allows the doctor to take surviving patients into account.
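As an illustration of how censored customers can be kept in the estimate, here is a minimal sketch of the Kaplan-Meier estimator, a standard non-parametric survival estimator that the article does not name explicitly. The data is invented for the example, and `kaplan_meier` is a hypothetical helper written from scratch rather than a library function.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time where a departure occurs.

    durations: months each customer has been observed.
    observed:  True if the customer left at that time, False if the customer
               was still active (censored) when the data was extracted.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        departures = sum(1 for d, e in zip(durations, observed) if d == t and e)
        exiting = sum(1 for d in durations if d == t)  # departures + censored at t
        if departures:
            survival *= 1 - departures / at_risk
            curve.append((t, survival))
        at_risk -= exiting
    return curve

# Hypothetical data: six customers, three of them still active after 12 months.
durations = [2, 4, 6, 12, 12, 12]
observed = [True, True, True, False, False, False]

curve = kaplan_meier(durations, observed)
print([(t, round(s, 2)) for t, s in curve])   # [(2, 0.83), (4, 0.67), (6, 0.5)]

# Naive alternative: average lifetime of the customers who already left.
churned = [d for d, e in zip(durations, observed) if e]
naive_mean = sum(churned) / len(churned)
print(naive_mean)                             # 4.0
```

On this toy data the naive average suggests a lifetime of 4 months, while the survival curve shows that half of the customers are still present after 6 months and beyond, which is exactly the information the censored customers carry.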

Practical use of survival analysis

Survival analysis makes it possible to estimate for each customer, according to his or her characteristics, the probability that he or she will still be present for any time horizon. This information can then be transformed to predict attrition at a fixed time horizon (for example, by triggering an alert if the probability of the customer’s departure at six months exceeds a certain threshold). But it can also be used to visualise and estimate the differences in attrition between several groups of individuals – depending on the acquisition channel, the period of arrival, or the presence or absence of an event in their customer journey – and thus identify the similarities and differences in the attrition dynamics of the groups. Finally, some models, such as the Cox model, are directly interpretable: once trained, their parameters reflect the influence of the individuals’ descriptive variables on the survival function.
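The fixed-horizon alert and the interpretability of the Cox model can be sketched as follows. Under the proportional hazards assumption, the survival function of a customer with characteristics x is S(t | x) = S0(t) ** exp(beta · x), where S0 is the baseline survival function. All numbers and feature names below (baseline survival, coefficients, threshold) are hypothetical values chosen for illustration; in practice they would come from a model fitted on real data.

```python
import math

# Hypothetical fitted quantities (illustrative values, not from real data).
BASELINE_SURVIVAL_6M = 0.90      # S0(6): baseline probability of still being a customer at 6 months
COEFFICIENTS = {                 # Cox coefficients, i.e. log hazard ratios
    "open_dispute": 0.9,         # positive: raises the attrition hazard
    "has_mobile_app": -0.5,      # negative: lowers the attrition hazard
}
ALERT_THRESHOLD = 0.20           # alert if P(departure within 6 months) exceeds 20%

def departure_probability_6m(customer):
    """P(departure within 6 months) = 1 - S0(6) ** exp(beta . x)."""
    linear_predictor = sum(
        beta * customer.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1 - BASELINE_SURVIVAL_6M ** math.exp(linear_predictor)

def needs_alert(customer):
    """True if the six-month departure probability exceeds the threshold."""
    return departure_probability_6m(customer) > ALERT_THRESHOLD

# Hazard ratios: the multiplicative effect of each variable on the attrition hazard.
hazard_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}

for customer in ({"open_dispute": 1}, {"has_mobile_app": 1}, {}):
    print(customer, round(departure_probability_6m(customer), 3), needs_alert(customer))
```

With these assumed values, only the customer with an open dispute crosses the alert threshold, and the hazard ratios (greater than 1 for the dispute, less than 1 for the mobile app) give the direction and size of each variable's influence.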

Using sequence analysis to refine the study

In practice, attrition analysis sometimes requires additional factors to be taken into account:

On the one hand, customer attrition may not show up in the company’s data until very late in the process, independently of the actual end of the customer experience. For example, the formal closure of an account may occur long after the customer has ceased all activity. It is therefore sometimes necessary to broaden the notion of an attrition event, or to look at several phases of attrition.
On the other hand, attrition may be caused by a succession of factors – which will be difficult to detect by conventional approaches, as these take little or no account of temporality in the description of a customer.

This is why it may be relevant to complement this approach by defining a set of key events in the customer journey (requests, new orders, disputes, etc.) and by looking at the succession of these events and the way in which this succession influences attrition: this is sequence analysis.

Sequence analysis allows us to visualise and study the way in which individuals evolve over time, and in particular to understand the conditioning between events (for example: the majority of individuals who experienced event C first experienced events A and then B). By representing the probabilities of successive transitions between events in the form of a tree, it is possible to identify which successions of events have a major influence on the risk of attrition. In particular, it is possible to detect events that are synonymous with actual attrition (those that invariably lead to detected attrition), or pivotal events (those after which the possible paths differ radically in terms of attrition risk), on which action is most likely to reduce attrition. Working with a large French bank, Eleven has found that detecting these pivotal events through sequence analysis is particularly valuable: they can occur long before attrition and are not easily detected by a classical detection model, which will focus on the events most correlated with attrition (i.e. the later ones), whereas the tree allows us to study the source events first. This approach also has a direct predictive component: for an individual customer, this modelling makes it possible to assess the empirical probability of each of the paths based on his or her current situation.
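A minimal sketch of such a tree, assuming each customer journey is available as an ordered list of events plus a flag indicating whether attrition was eventually detected. The event names, the data and the helper functions are all hypothetical; this is an illustration of the idea, not Eleven’s implementation.

```python
from collections import defaultdict

def build_prefix_stats(journeys):
    """Map each event prefix to [customers who followed it, customers who later churned]."""
    stats = defaultdict(lambda: [0, 0])
    for events, churned in journeys:
        for i in range(1, len(events) + 1):
            node = stats[tuple(events[:i])]
            node[0] += 1
            node[1] += int(churned)
    return dict(stats)

def churn_probability(stats, prefix):
    """Empirical probability of attrition given the events seen so far (None if unseen)."""
    node = stats.get(tuple(prefix))
    if node is None:
        return None
    total, churned = node
    return churned / total

def child_probabilities(stats, prefix):
    """Attrition probability for each possible next event after `prefix`."""
    prefix = tuple(prefix)
    return {
        path[-1]: churn_probability(stats, path)
        for path in stats
        if len(path) == len(prefix) + 1 and path[:len(prefix)] == prefix
    }

# Hypothetical journeys: (ordered events, attrition eventually detected?)
journeys = [
    (["order", "dispute", "complaint"], True),
    (["order", "dispute", "complaint"], True),
    (["order", "dispute", "refund"], False),
    (["order", "dispute", "refund"], False),
    (["order", "request"], False),
    (["order", "request"], False),
]

stats = build_prefix_stats(journeys)
print(churn_probability(stats, ["order", "dispute"]))    # 0.5
print(child_probabilities(stats, ["order", "dispute"]))  # {'complaint': 1.0, 'refund': 0.0}
```

In this toy data, "dispute" behaves like a pivotal event: the attrition risk after it is 50%, but the next event splits customers into a branch that always leads to attrition and one that never does, which suggests where an intervention would matter most.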

These two tools make it possible to carry out a detailed study and to put in place preventive measures

With the help of these tools, the attrition of the company’s customers is no longer inevitable, and its impact can be greatly reduced:

On the one hand, survival analysis enables better anticipation, at a fine level of granularity both over time and across customer segments.
On the other hand, sequence analysis enables the prediction of a customer’s journey to be refined by enriching it with several key events in order to take targeted action. Its main value, however, lies in quickly identifying the events and sequences of events that have the greatest impact on the risk of attrition, in order to guide the actions to be taken to prevent it across the entire customer base.

The value of sequence analysis for the company does not stop there: it can also be used to study acquisition and sales paths or, more generally, the entire customer journey with the company.

Louis Dumont, Charafeddine Mouzouni
