Thursday, November 15, 2012

Uplift (Netlift) Modeling


This post describes the basic concepts, benefits, and challenges of implementing net lift models in direct marketing campaigns. Net lift models predict which customer segments are likely to make a purchase ONLY if prompted by a marketing undertaking. The modeling work was conducted using stepwise logistic regression in SAS Enterprise Miner ®.

The post shows how net lift probability decomposition models leverage differences between purchasers in the test group and purchasers in the control group to predict which customer segments need a marketing contact and which customer segments are likely to make a purchasing decision without a nudge.

TRADITIONAL APPROACH TO DIRECT MARKETING LIST MODELING

The majority of direct marketing campaigns are based on purchase propensity models, which select customer email, paper mail, or other marketing contact lists based on each customer’s probability of making a purchase.

 

 
Scoring Rank    Response Rate    Lift
1               28.1%            3.41
2               17.3%            2.10
3               9.6%             1.17
4               8.4%             1.02
5               4.8%             0.58
6               3.9%             0.47
7               3.3%             0.40
8               3.4%             0.41
9               3.5%             0.42
10              0.1%             0.01
Total           8.2%
 

 

Table 1. Example of standard purchase propensity model output used to generate direct campaign mailing list at 1800Flowers.com

This purchase propensity model had a ‘nice’ lift (a rank’s response rate divided by the total response rate) for the top 4 ranks on the validation data set. Consequently, we would contact the customers in the top 4 ranks. After the catalog campaign was completed, we conducted a post analysis of mailing list performance vs. the control group. The control group consisted of customers who were not contacted, grouped by the same purchase probability scoring ranks.
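The lift calculation behind Table 1 can be sketched in a few lines of Python. This is only an illustration: the rates are the rounded values printed in the table, so the results can differ from the table's lift column in the second decimal (the table was presumably computed from unrounded rates), and the names `lift` and `ranks_to_contact` are made up for this example.

```python
# Response rates by scoring rank, copied from Table 1 (percent, rounded).
response_rate = {1: 28.1, 2: 17.3, 3: 9.6, 4: 8.4, 5: 4.8,
                 6: 3.9, 7: 3.3, 8: 3.4, 9: 3.5, 10: 0.1}
total_response_rate = 8.2  # "Total" row of Table 1


def lift(rank_rate, overall_rate):
    """Lift = response rate of a rank divided by the overall response rate."""
    return rank_rate / overall_rate


lift_by_rank = {rank: lift(rate, total_response_rate)
                for rank, rate in response_rate.items()}

# Ranks that respond better than the population as a whole (lift above 1).
ranks_to_contact = [rank for rank, value in lift_by_rank.items() if value > 1.0]

for rank, value in lift_by_rank.items():
    print(f"rank {rank:2d}: lift {value:.2f}")
print("ranks with lift above 1:", ranks_to_contact)
```

With the rounded inputs, the ranks with lift above 1 are the same top 4 ranks selected for the mailing.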

Sample campaign post analysis results:



 
 
Scoring Rank    Mailing Group Response Rate    Control Group Response Rate    Incremental Response Rate
1               27.0%                          27.9%                          -0.91%
2               20.3%                          20.9%                          -0.56%
3               10.7%                          10.0%                          0.66%
4               8.9%                           7.5%                           1.38%
Total           16.7%                          16.5%                          0.15%


 
 








Table 2. Campaign Post analysis

As shown in Table 2, the top four customer ranks selected by the propensity model performed well in both the mailing group and the control group. However, even though the mailing/test group response rate was at a decent level (16.7%), the incremental response rate (mailing group net of control group) for the combined top 4 ranks was only 0.15%. With such a low incremental response rate, our undertaking would likely generate a negative ROI.
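As a rough illustration of the post analysis, the incremental response rate is simply the mailing group rate minus the control group rate for the same scoring rank. The sketch below uses the rounded rates printed in Table 2, so its results match the table only to one decimal; the function and variable names are invented for this example.

```python
# Response rates (percent) for the top four ranks, copied from Table 2.
mailing = {1: 27.0, 2: 20.3, 3: 10.7, 4: 8.9}
control = {1: 27.9, 2: 20.9, 3: 10.0, 4: 7.5}


def incremental_response(mailed_rate, control_rate):
    """Incremental response = mailing group rate net of control group rate."""
    return mailed_rate - control_rate


incremental = {rank: round(incremental_response(mailing[rank], control[rank]), 2)
               for rank in mailing}

# Ranks where the contacted customers responded no better than the control.
no_gain_ranks = [rank for rank, value in incremental.items() if value <= 0]

for rank, value in incremental.items():
    print(f"rank {rank}: incremental response {value:+.1f} points")
print("ranks with no incremental gain:", no_gain_ranks)
```

The two highest propensity ranks are exactly the ones where the mailing added nothing over the control group.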

Why did our campaign show such poor incremental results? The purchase propensity model did its job well, and we did send an offer to people who were likely to make a purchase. Apparently, modeling based on expected purchase propensity is not always the right solution for a successful direct marketing campaign. Since there was no increase in response rate over the control group, we could have been contacting customers who would have bought our product without promotional direct mail. Customers in the top ranks of the purchase propensity model may not need a nudge, or they may be buying in response to a contact via other channels. If that is the case, the customers in the lower purchase propensity ranks would be more ‘responsive’ to a marketing contact.

We should be predicting incremental impact – additional purchases generated by a campaign, not purchases that would be made without the contact. Our marketing mailing can be substantially more cost efficient if we don’t mail customers who are going to buy anyway.

Since customers very rarely use promo codes from catalogs or click on web display ads, it is difficult to identify undecided, swing customers based on promotion codes or web display clickthroughs.

Net lift models predict which customer segments are likely to make a purchase ONLY if prompted by a marketing undertaking.

Purchasers from the mailing group include customers who needed a nudge. However, none of the purchasers in the holdout/control group needed our catalog to make their purchasing decision, so all purchasers in the control group can be classified as ‘need no contact’. Since we need a model that separates ‘need contact’ purchasers from ‘no contact’ purchasers, net lift models look at the differences between purchasers in the mailing (contact) group and purchasers in the control group.

In order to classify our customers into these groups, we need mailing group and control group purchase results from similar prior campaigns. If there are no comparable historic undertakings, we have to run a small-scale trial before the main rollout.

All models described in this project used stepwise logistic regression on data partitioned into test and validation sets. All data prep work was done in base SAS ® and all modeling was done in SAS Enterprise Miner ®.

NET LIFT MODELS

There have been recent mentions of a target selection (i.e., case selection) technique referred to as net lift, uplift, incremental response, differential response, and possibly other names.  When posed as a return maximization problem, net lift and the usual target selection practice coincide.  Net lift applies to target selection in situations with a binary treatment; return maximization provides direction on how to handle problems in situations with more than one treatment.

Definition of uplift modeling: analytical modeling to predict the influence on a customer's buying behavior that results from choosing one marketing treatment (customer-facing action) over another. The secondary treatment is often passive (make no contact), as evaluated over a control group. The uplift model answers the question, “How much more likely is this treatment to generate the desired outcome than the alternative treatment?” For each customer, the model's prediction drives the decision of which treatment to apply [3].

Problem statement
Given the following data [2]:
· cases $P = \{1, \dots, n\}$,
· treatments $J = \{1, \dots, U\}$,
· the expected return $R_{ij}$ for each case $i \in P$ and treatment $j \in J$,
· non-negative integers $n_1, \dots, n_U$ such that
$$n_1 + \dots + n_U = n,$$
find a treatment assignment
$$f: P \to J$$
so that the total return
$$\sum_{i=1}^{n} R_{i f(i)}$$
is maximized, subject to the constraints that the number of cases assigned to treatment $j$ does not exceed $n_j$ ($j = 1, \dots, U$) [2].
Example 1: Mailing campaign
· $P$: a group of customers,
· two treatments:
1. treatment 1: send a promotional coupon; $R_{i1}$ is the expected return if a coupon is sent to customer $i$,
2. treatment 2: no coupon is sent; the expected return is zero: $R_{i2} = 0$.
Solution to the maximization problem:
· assign treatment 1 to the customers with the $n_1$ largest values of $R_{i1}$,
· assign treatment 2 to the remaining customers.
This solution can also be derived from the Neyman-Pearson lemma.
Example 2: Marketing action case
· $P$: a group of customers,
· two treatments:
1. treatment 1: exercise some marketing action; $R_{i1}$ is the expected return if treatment 1 is given to customer $i$,
2. treatment 2: exercise no marketing action; $R_{i2}$ is the expected return if treatment 2 is given to customer $i$.
Solution to the maximization problem: the total return can be written as
$$\sum_{i=1}^{n} R_{i f(i)} = \sum_{i:\, f(i)=1} R_{i1} + \sum_{i:\, f(i)=2} R_{i2} = \sum_{i:\, f(i)=1} (R_{i1} - R_{i2}) + \sum_{i=1}^{n} R_{i2}.$$
The second sum does not involve $f$, so maximizing total return is equivalent to maximizing the first term
$$\sum_{i:\, f(i)=1} (R_{i1} - R_{i2}).$$
As for the solution to Example 1, to attain the maximum return:
· assign treatment 1 to the customers with the $n_1$ largest values of $R_{i1} - R_{i2}$,
· assign treatment 2 to the remaining customers.
The difference $R_{i1} - R_{i2}$ is called net lift, uplift, incremental response, differential response, etc.
If one considers only the response to treatment 1 and bases targeting on a model built from responses to previous marketing actions, one is proceeding as if the situation were as in Example 1. One would mistakenly maximize
$$\sum_{i:\, f(i)=1} R_{i1}.$$
Such maximization would not yield the maximum return. One needs to consider the return from cases subjected to no marketing action.
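The difference between the two selection rules can be seen on a small made-up example. The sketch below is illustrative only: the four customers and their expected returns are hypothetical numbers invented for this example, and `top_n` and `total_return` are helper names that do not come from the original material.

```python
def top_n(scores, n):
    """Return the n customer ids with the largest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def total_return(treated, r1, r2):
    """Total return when `treated` gets treatment 1 and everyone else treatment 2."""
    return sum(r1[i] if i in treated else r2[i] for i in r1)


# Hypothetical expected returns, invented for illustration only.
r1 = {"a": 10.0, "b": 9.0, "c": 6.0, "d": 3.0}  # return if the action is taken
r2 = {"a": 9.5, "b": 8.0, "c": 2.0, "d": 0.5}   # return if no action is taken
n1 = 2  # at most two customers can receive treatment 1

uplift = {i: r1[i] - r2[i] for i in r1}

by_response = top_n(r1, n1)    # Example 1 logic: rank on R_i1 only
by_uplift = top_n(uplift, n1)  # Example 2 logic: rank on R_i1 - R_i2

print("target by response:", sorted(by_response),
      "total return", total_return(by_response, r1, r2))
print("target by uplift:  ", sorted(by_uplift),
      "total return", total_return(by_uplift, r1, r2))
```

In this made-up case, ranking on the treatment 1 return alone picks the two customers who would have produced almost the same return without any action, while ranking on the difference picks the two customers whose return changes most.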


Example 3: A toy example
Consider the following toy example with a population of n = 3 cases, and U = 3 treatments, n1 = n2 = n3 = 1  and returns:
This  assignment is one that maximizes total return under the given constraints:

 
 
Note that neither case 2 nor case 3 was assigned the treatment that maximizes its return.


Although the possibility of a return of 18 exists, this possibility is not realized, since case 2 is not assigned treatment 2.


(In a case like this, one would probably advise that more resources be allocated to treatment 2, so that $n_2 > 1$.)
 
Example 4: General case

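The problem can be cast as a standard integer linear programming problem. If we let

$$x_{ij} = \begin{cases} 1 & \text{if case } i \text{ is assigned treatment } j, \\ 0 & \text{otherwise,} \end{cases}$$

then the problem can be written as:

$$\max_{x} \sum_{i=1}^{n} \sum_{j=1}^{U} R_{ij} x_{ij}$$

subject to the constraints:

$$\sum_{j=1}^{U} x_{ij} = 1 \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} x_{ij} \le n_j \quad (j = 1, \dots, U), \qquad x_{ij} \in \{0, 1\}.$$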

Note:

In general, the best assignment that solves the linear programming problem does not vary continuously with the coefficients:

· small changes in the returns $R_{ij}$ result in only small changes in the best total return,

· but the assignment that yields the best return may vary considerably.
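For very small instances, the assignment problem of Example 4 can be solved by simply trying every assignment, which also makes the note above easy to check. The sketch below is not a practical solver (a real problem would call for an integer programming or assignment routine), and the returns matrix is hypothetical: it is invented for this illustration and is not the matrix of the toy example in Example 3. Cases and treatments are numbered from 0 in the code.

```python
from itertools import product


def best_assignment(returns, capacity):
    """Exhaustively solve the assignment problem for a tiny instance.

    returns[i][j] is the expected return of giving treatment j to case i;
    capacity[j] is the maximum number of cases allowed on treatment j.
    """
    n_cases, n_treatments = len(returns), len(capacity)
    best_total, best_f = float("-inf"), None
    for f in product(range(n_treatments), repeat=n_cases):
        # Skip assignments that violate a capacity constraint.
        if any(f.count(j) > capacity[j] for j in range(n_treatments)):
            continue
        total = sum(returns[i][f[i]] for i in range(n_cases))
        if total > best_total:
            best_total, best_f = total, f
    return best_f, best_total


# Hypothetical returns, invented for illustration only.
returns = [[5.0, 4.0, 1.0],
           [1.0, 4.0, 5.0],
           [2.9, 1.0, 3.0]]
capacity = [1, 1, 1]

base_f, base_total = best_assignment(returns, capacity)

# Change a single coefficient slightly: 2.9 becomes 3.1.
perturbed = [row[:] for row in returns]
perturbed[2][0] = 3.1
new_f, new_total = best_assignment(perturbed, capacity)

print("original: ", base_f, round(base_total, 2))
print("perturbed:", new_f, round(new_total, 2))
```

In this made-up instance the best total return moves by only about 0.1, yet every case ends up on a different treatment, which is the discontinuity the note describes.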

Example 5: An (almost real) example and variation

Each week, a call center is responsible for contacting a group of customers. The length n of the list is not fixed, but it does not vary much from week to week.

Based on what is known of the customers, and on historical observations, it is possible to estimate the expected probability of successfully contacting each customer at different combinations of time of the day and call type (“home” or “other”).

Un-adjusted probabilities of successful contact are not constant in time…

Problem: make a (calling time, weekday) assignment so that the expected total number of contacts is maximized, subject to the constraint that the call center capacity is limited.

Remarks:

· in general, we will only know an estimate of $R_{ij}$, which suggests that insisting on solving the full maximization problem is overkill,

· in practice, proper call optimization is carried out dynamically.

A solution sketch:

· segment customers, including the probabilities of successful contact at different times as segmentation variables, so that the probability of contact is approximately constant within each segment,

· solve the optimization problem for the fraction of each segment that has to be contacted at each time.

NET LIFT MODELING APPROACH – PROBABILITY DECOMPOSITION MODELS

Segments used in probability decomposition models:

 
                                  Contacted Group    Control Group
Purchasers prompted by contact    A                  D
Purchasers not needing contact    B                  E
Non-purchasers                    C                  F

 
Figure 2. Segments in probability decomposition models

Standard purchase propensity models are only capable of predicting all purchasers (segments A and B combined). The probability decomposition model predicts the purchaser segment that needs to be contacted (segment A) by leveraging two logistic regression models, as shown in the formula below [1].


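$$P(A \mid A \cup B \cup C) = P(A \cup B \mid A \cup B \cup C) \times \left(2 - \frac{1}{P(A \cup B \mid A \cup B \cup E)}\right)$$
where
· $P(A \mid A \cup B \cup C)$ is the probability of a purchase prompted by contact,
· $P(A \cup B \mid A \cup B \cup C)$ is the probability of a purchase within the contact group,
· $P(A \cup B \mid A \cup B \cup E)$ is the probability that a purchaser is in the contact group, out of all purchasers.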
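The factor in parentheses can be motivated as follows. This is a sketch, under the assumption that the contact and control groups are the same size and randomly assigned, so that segments B and E contain roughly the same number of customers. The symbols $a$, $b$, and $e$ (the sizes of segments A, B, and E) are introduced here for illustration and are not taken from [1].

$$\begin{aligned}
P(A \cup B \mid A \cup B \cup E) &= \frac{a + b}{a + b + e}, \\
2 - \frac{1}{P(A \cup B \mid A \cup B \cup E)} &= 2 - \frac{a + b + e}{a + b} = \frac{a + b - e}{a + b} \approx \frac{a}{a + b} = P(A \mid A \cup B) \quad (e \approx b), \\
P(A \mid A \cup B \cup C) &= P(A \cup B \mid A \cup B \cup C) \, P(A \mid A \cup B).
\end{aligned}$$

Under this reading, a purchaser who is equally likely to come from either group (probability 0.5) gets a factor of zero, and the factor approaches one as purchasers with that profile come almost exclusively from the contact group.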

 

Summary of probability decomposition modeling process:

1. Build a stepwise logistic regression purchase propensity model (M1) and record the model score for every customer in the modeled population.

2. Use past campaign results or small-scale trial campaign results to create a dataset with two equal-size sections of purchasers, one from the contact group and one from the control group. Build a stepwise logistic regression model predicting which purchasers are from the contact group. The main task of this model is to penalize the score of the model built in step 1 when a purchaser is not likely to need contact.

3. Calculate the net purchaser score using the probability decomposition formula.
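Step 3 can be sketched in Python as below. This is a minimal illustration, not the SAS implementation used in the project: the two input probabilities are assumed to come from the models built in steps 1 and 2, the function name `net_lift_score` is invented here, and the three customer profiles are hypothetical numbers.

```python
def net_lift_score(p_purchase, p_contact_given_purchase):
    """Probability decomposition score for one customer.

    p_purchase: score of model M1, the probability of purchase when contacted.
    p_contact_given_purchase: score of the second model, the probability
        that a purchaser comes from the contact group rather than control.
    """
    for name, p in (("p_purchase", p_purchase),
                    ("p_contact_given_purchase", p_contact_given_purchase)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability, got {p}")
    if p_contact_given_purchase == 0.0:
        raise ValueError("p_contact_given_purchase must be greater than zero")
    return p_purchase * (2.0 - 1.0 / p_contact_given_purchase)


# Hypothetical model scores for three customers, invented for illustration.
customers = {
    "likely buyer, needs no nudge": (0.30, 0.50),
    "moderate buyer, responsive": (0.15, 0.80),
    "unlikely buyer": (0.02, 0.55),
}
scores = {name: net_lift_score(m1, m2) for name, (m1, m2) in customers.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.4f}")
```

Note that the customer with the highest purchase propensity ends up last in this made-up ranking, because the second model gives no evidence that the contact matters. The score becomes negative when the second model's probability falls below 0.5; how to treat such cases (for example, as do-not-contact) is a design choice not covered by the formula itself.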

Results of the probability decomposition modeling process for marketing offer mailing.

 

 
Scoring Rank    Contact Group Response %    Control Group Response %    Incremental Response Rate
1               18.8%                       12.9%                       5.9%
2               7.8%                        5.4%                        2.4%
3               6.9%                        4.5%                        2.5%
4               4.3%                        3.6%                        0.7%
5               3.9%                        3.5%                        0.4%
6               4.1%                        4.1%                        0.0%
7               3.7%                        4.0%                        -0.2%
8               4.7%                        4.1%                        0.6%
9               5.0%                        6.7%                        -1.7%
10              11.0%                       15.7%                       -4.7%

 

Table 3. Post analysis of campaign leveraging probability decomposition model

Scoring ranks 1 through 6 show non-negative incremental response rates (rank 6 rounds to 0.0%). The scoring ranks are ordered by the model's predicted incremental response.



CONCLUSION

The probability decomposition model is just one in a group of methods known as net lift models. Net lift models help maximize the ROI of marketing campaigns because they let us avoid contacting customers or prospects who are highly likely to buy a product or service anyway. The traditional purchase propensity model may do a good job of ranking customers by their probability of making a purchase, but it cannot select the true responders, the customers who will make a purchase only if contacted. The probability decomposition model has its challenges: it is relatively difficult to interpret because it combines the scores of two separate models. The following conditions are required for a net lift model:

· presence of a randomized control group

· the analyzed marketing contact is not the only communication leading to purchase

· purchase rate is not correlated with lift, so a purchase propensity model is not sufficient

· presence of similar/repetitive marketing campaigns or small-scale tests

· variation in average lift across scoring ranks

References

1. Jun Zhong, VP Targeting and Analytics, Card Services Customer Marketing, Wells Fargo, in the presentation “Predictive Modeling & Today’s Growing Data Challenges” at Predictive Analytics World, San Francisco, CA, 2009.

2. Lo, Victor S.Y., “The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing”, SIGKDD Explorations, Volume 4 (2002), Issue 2, pp. 78-86.

3. Siegel, Eric, “Uplift Modeling: Predictive Analytics Can’t Optimize Marketing Decisions Without It”, Predictive Impact, Inc., 2011.
