Customer attrition, or churn, is a major analytical issue for many companies: since acquiring customers is expensive in most industries, retaining them is necessary to ensure good profitability. Indeed, an already engaged customer continues to generate business with much less effort on the part of the company. This is true not only for businesses based on a recurring contract model but also for one-off sales models, as the probability of selling again to a satisfied customer is higher than that of selling to a new prospect (some additional statistics).
Attrition is sometimes natural (the customer no longer needs the product or service), but more often than not it is the result of a move to a competing offer. This phenomenon is now amplified by the disruption brought about by new digital players, sometimes more agile, which are better able to meet customers' expectations in terms of digitalisation (for example in banking and insurance).
Predicting and understanding attrition is necessary for several reasons. On the one hand, anticipating attrition provides useful business insights that can be used to plan for possible volume reductions, to adapt operations, or to focus on acquiring new customers. On the other hand, identifying the factors responsible for attrition makes it possible to take preventive action before customers leave. While some causes of attrition are external to the company (crises such as Covid in 2020), others can be easily anticipated (seasonality, demographic factors such as ageing or changing needs) or acted upon (a one-off bad experience, a less engaging offer). This makes it possible to better anticipate and reduce customer departures, with a direct impact on operational costs and profits.
Predicting attrition at the level of the individual customer is, however, a major challenge. When attrition results from dissatisfaction or a move to a competitor, the customer usually makes the decision well before actually terminating the contract, and by the time the termination is observed it is too late to react. To add to this difficulty, the strong signals available are often markers of the termination itself rather than of the decision. They therefore become available too late, leaving the company no opportunity to take preventive action. Generally speaking, the further ahead the prediction is made, the lower its accuracy: the most revealing signals are those that occur closest to the actual departure.
It is therefore particularly important to develop an approach that makes it possible to anticipate the departure of customers well in advance, or even better, to directly identify the sources and dynamics of attrition in order to take corrective action upstream.
There are several possible approaches to conducting such a study:
Survival analysis and sequence analysis are two highly complementary approaches that rely on different representations of customers: the first on endogenous customer characteristics, the second on the customer journey.
Survival analysis (introductory course and Python example) was originally developed to estimate the life expectancy of individuals. Given a sample from a population in which, for each individual, it is known whether the “death” event has been observed and, if so, after how long, it models the “survival function” S(t), i.e. the probability that a given individual is still alive at time t.
This framework looks very different from attrition at first glance, but in fact it is perfectly adequate: we just have to consider the attrition of a customer as the “death” event. Survival analysis then models how the probability of attrition changes over the time horizon.
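Concretely, treating attrition as the “death” event means turning each customer record into a duration and an event flag. The sketch below assumes a hypothetical input format of (start date, end date) pairs, where an end date of None means the customer is still active; such customers are kept as right-censored observations measured up to an observation cutoff date rather than discarded.

```python
from datetime import date


def to_survival_data(customers, cutoff):
    """Convert (start, end) customer records into survival-analysis inputs.

    Hypothetical input format: end is None for customers still active.
    Returns (durations in days, observed flags).
    """
    durations, observed = [], []
    for start, end in customers:
        if end is not None and end <= cutoff:
            # Attrition (the "death" event) happened before the cutoff.
            durations.append((end - start).days)
            observed.append(True)
        else:
            # Still a customer at the cutoff: right-censored observation.
            durations.append((cutoff - start).days)
            observed.append(False)
    return durations, observed


customers = [
    (date(2022, 1, 1), date(2022, 3, 1)),  # left after 59 days
    (date(2022, 6, 1), None),              # still active
]
durations, observed = to_survival_data(customers, cutoff=date(2023, 1, 1))
print(durations, observed)  # [59, 214] [True, False]
```

A departure recorded after the cutoff is also treated as censored, since it was not yet known at the time of the analysis.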
The result of a survival analysis is richer than that of a linear regression aimed at predicting an individual's lifespan. It models not only the life expectancy but the whole probability distribution of survival at each time horizon. The following example illustrates this nuance: if a customer has a 30% chance of leaving within the first month and a 70% chance of staying for 5 years, this information about the distribution is much more valuable than simply knowing that the customer has a “life expectancy” of about 42 months (0.7 × 60 months). This is particularly important for attrition management: it is necessary to be able to simulate the actual times of departure, not just their expectation.
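The arithmetic behind this example can be checked with a toy discrete distribution. As a simplifying assumption, the first customer either leaves at month 1 (probability 0.3) or at month 60 (probability 0.7); a second, hypothetical customer who always leaves at month 42.3 has the same expected lifetime but a very different survival profile.

```python
def expected_lifetime(dist):
    """Mean of a discrete lifetime distribution {months: probability}."""
    return sum(t * p for t, p in dist.items())


def survival(dist, t):
    """S(t) = P(lifetime > t) for a discrete distribution."""
    return sum(p for months, p in dist.items() if months > t)


bimodal = {1: 0.3, 60: 0.7}  # leaves almost immediately, or stays 5 years
constant = {42.3: 1.0}       # always leaves at month 42.3

print(round(expected_lifetime(bimodal), 1), round(expected_lifetime(constant), 1))  # 42.3 42.3
print(survival(bimodal, 6), survival(constant, 6))  # 0.7 1.0
```

Both toy customers share an expected lifetime of about 42 months, yet after six months one has a 30% chance of already being gone and the other none: this is precisely what the survival function captures and a single predicted lifespan hides.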
The survival analysis approach is not only richer, it also avoids an important bias related to censoring.
When modelling a “survival” phenomenon such as attrition from customer data, an important characteristic of the data must be taken into account: the event under study has probably not been observed for all customers. For example, current customers have not (yet) experienced attrition; their observations are said to be censored. A regression approach would require transforming these data to avoid anomalies in the model (an infinite lifespan for customers who have not left), for example by keeping only customers who have already left. This transformation biases the model and its results in an uncontrolled way: every individual used to train the model has experienced the “death” event, which is far from the behaviour of the actual population, especially if the typical lifespan of a customer is longer than the period over which data has been collected! To see why this is problematic, consider a doctor studying the survival of patients over 10 years: if the doctor ignores all patients who are still alive, he or she will never reach the right conclusions.
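Survival analysis does not suffer from this problem, because it handles censoring explicitly: a customer who is still present contributes the information that he or she has “survived” at least until the end of the observation period. The model can therefore be trained on samples in which the death event has not occurred for every individual; in our example, it allows the doctor to take surviving patients into account.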
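The Kaplan-Meier estimator is the classic non-parametric way of estimating the survival function while keeping censored customers. The sketch below is written from scratch for illustration (in practice a library such as lifelines, with its KaplanMeierFitter class, would typically be used), and the tenures and churn flags are made-up data. It also computes the naive average tenure of customers who have already left, to show how far off discarding censored customers can be.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of S(t), returned as [(event time, S(t)), ...].

    Censored customers (observed=False) stay in the risk set up to their
    censoring time, so their partial information is not thrown away.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # customers leaving the risk set at each time
    at_risk = len(durations)
    s, curve = 1.0, []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            s *= 1 - d / at_risk  # conditional probability of surviving t
            curve.append((t, s))
        at_risk -= exits[t]
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Made-up data: months of tenure, True if the customer has left.
durations = [2, 3, 3, 5, 8, 12, 12, 12, 12, 12]
observed = [True, True, False, True, False, False, False, False, False, False]

curve = kaplan_meier(durations, observed)
naive_mean = sum(t for t, e in zip(durations, observed) if e) / sum(observed)
print(round(survival_at(curve, 12), 3))  # 0.686
print(round(naive_mean, 2))              # 3.33
```

On this toy sample, the estimated probability of still being a customer after 12 months is about 69%, whereas the average tenure of departed customers is only 3.3 months: dropping the seven censored customers would paint a far bleaker picture than the data supports.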
Survival analysis makes it possible to estimate for each customer, according to his or her characteristics, the probability that he or she will still be present at any time horizon. This information can then be turned into a fixed-horizon attrition prediction (for example, by triggering an alert if the probability that the customer leaves within six months exceeds a certain threshold). It can also be used to visualise and estimate differences in attrition between several groups of individuals (depending on the acquisition channel, the period of arrival, or the presence or absence of an event in their customer journey) and thus identify the similarities and differences in the attrition dynamics of these groups. Finally, some models, such as the Cox proportional hazards model, are directly interpretable: once trained, their coefficients indicate by how much each descriptive variable of the individuals multiplies the instantaneous risk of attrition, and hence how it shifts the survival function.
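The Cox model mentioned above can be sketched for a single binary covariate in a few dozen lines. Everything here is illustrative: the covariate (1 for a hypothetical “online” acquisition channel), the data and the 50% alert threshold at six months are assumptions, and tied event times are handled with the Breslow approximation. The coefficient beta is fitted by Newton-Raphson on the partial likelihood, so exp(beta) is the hazard ratio between the two groups; the baseline hazard is then estimated with the Breslow estimator to turn the model into a per-customer probability of leaving within six months. In production, a tested library such as lifelines (CoxPHFitter) would usually be preferred.

```python
import math


def fit_cox_1d(durations, observed, x, tol=1e-10, max_iter=100):
    """Fit beta in h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson
    on the Cox partial log-likelihood (Breslow handling of ties)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for ti, ei, xi in zip(durations, observed, x):
            if not ei:
                continue  # censored customers only appear in risk sets
            # Risk set: customers still present just before time ti.
            risk = [(math.exp(beta * xj), xj)
                    for tj, xj in zip(durations, x) if tj >= ti]
            s0 = sum(w for w, _ in risk)
            mean_x = sum(w * xj for w, xj in risk) / s0
            mean_x2 = sum(w * xj * xj for w, xj in risk) / s0
            score += xi - mean_x            # gradient of the log partial likelihood
            info += mean_x2 - mean_x ** 2   # minus its second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def churn_probability(durations, observed, x, beta, horizon, x_new):
    """1 - S(horizon | x_new), using a Breslow estimate of the baseline hazard."""
    event_times = sorted({t for t, e in zip(durations, observed) if e and t <= horizon})
    cum_hazard = 0.0
    for ti in event_times:
        deaths = sum(1 for t, e in zip(durations, observed) if e and t == ti)
        denom = sum(math.exp(beta * xj) for tj, xj in zip(durations, x) if tj >= ti)
        cum_hazard += deaths / denom
    return 1 - math.exp(-cum_hazard * math.exp(beta * x_new))


# Made-up data: tenure in months, churn flag, 1 = hypothetical "online" channel.
durations = [1, 2, 3, 4, 5, 6, 7, 8, 12, 12]
observed = [True] * 8 + [False, False]
online = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]

beta = fit_cox_1d(durations, observed, online)
print(f"hazard ratio online vs other: {math.exp(beta):.1f}")
for channel in (0, 1):
    p = churn_probability(durations, observed, online, beta, horizon=6, x_new=channel)
    alert = p > 0.5  # hypothetical alert threshold
    print(channel, round(p, 2), "ALERT" if alert else "ok")
```

With this toy data the online group gets a hazard ratio well above 1 and is the only one flagged. Because the partial log-likelihood is concave in beta, Newton-Raphson converges quickly on data like this; if the two groups were perfectly separated, the estimate would diverge, which is one more reason to rely on a tested library in practice.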
In practice, attrition analysis sometimes requires additional factors to be taken into account:
This is why it may be relevant to complement this approach by defining a set of key events in the customer journey (requests, new orders, disputes, etc.) and by studying the order in which these events occur and how that order influences attrition: this is sequence analysis (see this more detailed presentation).
Sequence analysis makes it possible to visualise and study how individuals evolve over time, and in particular to understand the dependencies between events (for example: most individuals who experienced event C had first experienced event A and then event B). By representing the probabilities of moving from one event to the next in the form of a tree, it is possible to identify which sequences of events have a major influence on the risk of attrition. In particular, it is possible to detect events that are markers of actual attrition (those that invariably lead to observed attrition) and pivotal events (those after which the possible paths differ radically in terms of attrition risk), the latter being where action is most likely to reduce attrition. Working with a large French bank, eleven found that detecting these pivotal events through sequence analysis is particularly valuable: they can occur long before attrition and are not easily picked up by a classical detection model, which tends to focus on the events most correlated with attrition (i.e. the latest ones), whereas the tree makes it possible to study the source events first. This approach also has a direct predictive component: for an individual customer, this modelling makes it possible to estimate the empirical probability of each possible path given his or her current situation.
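A minimal sketch of such a tree, assuming each customer journey is available as an ordered tuple of hypothetical event labels together with a churn flag (this illustrates the idea and is not eleven's implementation). Each prefix of a journey is a node storing how many customers went through it and how many eventually left. A node is flagged as pivotal when its children lead to churn rates that differ by at least a chosen gap, and as an attrition marker when every customer who reached it churned; the gap of 0.6 and the minimum count of 2 are arbitrary illustrative parameters. The same counts give the empirical probability of each next step for a customer at a given point of the journey.

```python
from collections import defaultdict


def build_tree(journeys):
    """Map each journey prefix to [customers through it, customers who churned]."""
    counts = defaultdict(lambda: [0, 0])
    for events, churned in journeys:
        for k in range(len(events) + 1):
            node = counts[tuple(events[:k])]
            node[0] += 1
            node[1] += int(churned)
    return dict(counts)


def children(tree, prefix):
    # Linear scan for clarity; a real tree would store child links.
    return [p for p in tree if len(p) == len(prefix) + 1 and p[:-1] == prefix]


def churn_rate(tree, prefix):
    n, churned = tree[prefix]
    return churned / n


def pivotal_nodes(tree, min_gap=0.6, min_count=2):
    """Prefixes after which the possible next steps carry very different churn risk."""
    pivots = []
    for prefix in tree:
        rates = [churn_rate(tree, c) for c in children(tree, prefix) if tree[c][0] >= min_count]
        if len(rates) >= 2 and max(rates) - min(rates) >= min_gap:
            pivots.append(prefix)
    return pivots


def attrition_markers(tree, min_count=2):
    """Prefixes after which every observed customer eventually churned."""
    return [p for p, (n, churned) in tree.items() if p and n >= min_count and churned == n]


def next_step_probabilities(tree, prefix):
    """Empirical probability of each next event for a customer currently at prefix
    (any remaining probability mass means no further event so far)."""
    n = tree[prefix][0]
    return {c[-1]: tree[c][0] / n for c in children(tree, prefix)}


journeys = [
    (("request", "dispute", "unresolved"), True),
    (("request", "dispute", "unresolved"), True),
    (("request", "dispute", "resolved"), False),
    (("request", "dispute", "resolved"), False),
    (("request", "new_order"), False),
    (("request", "new_order"), False),
    (("new_order",), False),
]
tree = build_tree(journeys)
print(pivotal_nodes(tree))      # [('request', 'dispute')]
print(attrition_markers(tree))  # [('request', 'dispute', 'unresolved')]
probs = next_step_probabilities(tree, ("request",))
print({e: round(p, 2) for e, p in probs.items()})  # {'dispute': 0.67, 'new_order': 0.33}
```

In this toy data the dispute is the pivotal event: its outcome splits customers into a path where everyone leaves and one where no one does, even though the dispute itself occurs well before any departure.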
With the help of these tools, the attrition of the company’s customers is no longer inevitable, and its impact can be greatly reduced:
The value of sequence analysis for the company does not stop there: it can also be used to study acquisition and sales paths or, more generally, the entire customer journey.
Louis Dumont, Charafeddine Mouzouni