Simulation Educators: Uplift (Netlift) Modeling

This post describes the basic concepts, benefits, and challenges of implementing net lift models in direct marketing campaigns. Net lift models predict which customer segments are likely to make a purchase ONLY if prompted by a marketing undertaking. The modeling work was conducted using stepwise logistic regression in SAS Enterprise Miner ®.

It provides examples of how net lift probability decomposition models leverage differences between purchasers in the test group and the control group to predict which customer segments need a marketing contact and which customer segments are likely to make a purchasing decision without a nudge.

TRADITIONAL APPROACH TO DIRECT MARKETING LIST MODELING

The majority of direct marketing campaigns are based on purchase propensity models, which select customer email, paper mail, or other marketing contact lists based on customers’ probability of making a purchase.

Scoring Rank	Response Rate	Lift
1	28.1%	3.41
2	17.3%	2.10
3	9.6%	1.17
4	8.4%	1.02
5	4.8%	0.58
6	3.9%	0.47
7	3.3%	0.40
8	3.4%	0.41
9	3.5%	0.42
10	0.1%	0.01
Total	8.2%

Table 1. Example of standard purchase propensity model output used to generate direct campaign mailing list at 1800Flowers.com

This purchase propensity model had a ‘nice’ lift (rank’s response rate over total response rate) for the top 4 ranks on the validation data set. Consequently, we would contact customers included in top 4 ranks. After the catalog campaign had been completed, we conducted post analysis of mailing list performance vs. control group. The control group consisted of customers who were not contacted, grouped by the same purchase probability scoring ranks.
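The lift definition above (a rank's response rate over the total response rate) can be sketched in a few lines of Python. The rates are copied from Table 1; because those rates are already rounded, the computed lifts may differ from the table's lift column in the last digit. The function and variable names are illustrative assumptions, not part of the original SAS work.

```python
# Response rates by scoring rank, copied from Table 1 (percent).
response_rates = [28.1, 17.3, 9.6, 8.4, 4.8, 3.9, 3.3, 3.4, 3.5, 0.1]
total_response_rate = 8.2  # "Total" row of Table 1


def lift(rank_rate, overall_rate):
    """Lift = rank's response rate divided by the overall response rate."""
    return rank_rate / overall_rate


lifts = [lift(rate, total_response_rate) for rate in response_rates]

# Ranks with lift above 1 respond better than the average customer.
ranks_above_average = [
    rank for rank, value in enumerate(lifts, start=1) if value > 1.0
]

for rank, value in enumerate(lifts, start=1):
    print(f"rank {rank}: lift {value:.2f}")
print("ranks with lift > 1:", ranks_above_average)
```

With these inputs, the ranks with lift above 1 are the same top 4 ranks that were selected for the mailing.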

Sample campaign post analysis results:

Scoring Rank	Mailing Group Response Rate	Control Group Response Rate	Incremental Response Rate
1	27.0%	27.9%	-0.91%
2	20.3%	20.9%	-0.56%
3	10.7%	10.0%	0.66%
4	8.9%	7.5%	1.38%
Total	16.7%	16.5%	0.15%

Table 2. Campaign Post analysis

As shown in Table 2, the top four customer ranks selected by the propensity model perform well in both the mailing group and the control group. However, even though the mailing/test group response rate was at a decent level, the incremental response rate (mailing group net of control group) for the combined top 4 ranks was low. With such a low incremental response rate, our undertaking would likely generate a negative ROI.
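A minimal sketch of the post analysis arithmetic, using the rounded rates from Table 2. Since the inputs are rounded, the computed differences may be slightly off from the table's incremental column. The margin and cost figures are purely hypothetical and only illustrate why a small incremental response rate can mean a negative return per contact.

```python
# Response rates by scoring rank from Table 2 (percent).
mailing = {"1": 27.0, "2": 20.3, "3": 10.7, "4": 8.9, "Total": 16.7}
control = {"1": 27.9, "2": 20.9, "3": 10.0, "4": 7.5, "Total": 16.5}

# Incremental response = mailing group rate net of control group rate.
incremental = {
    rank: round(mailing[rank] - control[rank], 2) for rank in mailing
}


def net_value_per_contact(incremental_pct, margin_per_order, cost_per_contact):
    """Expected incremental margin per contact minus the cost of the contact."""
    return incremental_pct / 100.0 * margin_per_order - cost_per_contact


# HYPOTHETICAL economics, for illustration only (not from the campaign).
value = net_value_per_contact(
    incremental["Total"], margin_per_order=30.0, cost_per_contact=0.50
)

print(incremental)
print(f"net value per contact: {value:.2f}")
```

Under these assumed economics the net value per contact is negative, even though the mailing group's raw response rate looks healthy.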

What was the reason that our campaign showed such poor incremental results? The purchase propensity model did its job well and we did send an offer to people who were likely to make a purchase. Apparently, modeling based on expected purchase propensity is not always the right solution for a successful direct marketing campaign. Since there was no increase in response rate over the control group, we could have been contacting customers who would have bought our product without promotional direct mail. Customers in the top ranks of the purchase propensity model may not need a nudge, or they may be buying in response to a contact via other channels. If that is the case, the customers in the lower purchase propensity ranks would be more ‘responsive’ to a marketing contact.

We should be predicting incremental impact – additional purchases generated by a campaign, not purchases that would be made without the contact. Our marketing mailing can be substantially more cost efficient if we don’t mail customers who are going to buy anyway.

Since customers very rarely use promo codes from catalogs or click on web display ads, it is difficult to identify undecided, swing customers based on promotion codes or web display clickthroughs.

Net lift models predict which customer segments are likely to make a purchase ONLY if prompted by a marketing undertaking.

Purchasers from the mailing group include customers who needed a nudge; however, none of the purchasers in the holdout/control group needed our catalog to make their purchasing decision. All purchasers in the control group can be classified as ‘need no contact’. Since we need a model that separates ‘need contact’ purchasers from ‘no contact’ purchasers, the net lift models look at differences between purchasers in the mailing (contact) group and purchasers in the control group.

In order to classify our customers into these groups, we need mailing group and control group purchase results from similar prior campaigns. If there are no comparable historic undertakings, we have to create a small scale trial before the main rollout.

All models described in this project used stepwise logistic regression on data partitioned into test and validation sets. All data prep work was done in base SAS ® and all modeling was done in SAS Enterprise Miner ®.

NET LIFT MODELS

There have been recent mentions of a target selection (i.e., case selection) technique referred to as net lift, uplift, incremental response, differential response, and possibly other names. When posed as a return maximization problem, net lift and the usual target selection practice coincide. Net lift applies to target selection in situations with a binary treatment; return maximization provides direction on how to handle problems in situations with more than one treatment.

Definition of Uplift modeling: Analytically modeling to predict the influence on a customer's buying behavior that results from choosing one marketing treatment (customer-facing action) over another. The secondary treatment is often passive – make no contact – as evaluated over a control group. The uplift model answers the question, “How much more likely is this treatment to generate the desired outcome than the alternative treatment?” For each customer, the model's prediction drives the decision of which treatment to apply [3].

Problem statement

Given the following data [2]:

· Cases P = {1,…,n},

· Treatments J = {1,…,U},

· expected return R_ij for each case i and treatment j,

· non-negative integers n₁,…,n_U such that

n₁ + … + n_U = n

find a treatment assignment

f: P→J

so that the total return

∑_{i=1 to n} R_{i,f(i)}

is maximized, subject to the constraints that the number of cases assigned to treatment j is not to exceed n_j (j=1,…,U) [2].

Example 1: Mailing campaign

· P: a group of customers,

· two treatments:

1. treatment 1: send a promotional coupon; R_i₁ is the expected return if a coupon is sent to customer i,

2. treatment 2: no coupon is sent; the expected return is zero: R_i₂= 0

Solution to the maximization problem:

• assign treatment 1 to the customers with the n₁ largest values of R_i₁

• assign treatment 2 to the remaining customers

This solution can also be derived from the Neyman-Pearson lemma.

Example 2: Marketing action case

• P: a group of customers,

• two treatments:

• treatment 1: exercise some marketing action; R_i₁ is the expected return if treatment 1 is given to customer i,

• treatment 2: exercise no marketing action; let R_i₂ be the expected return if treatment 2 is given to customer i

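Solution to the maximization problem:

Writing the total return as a sum over the customers assigned to each treatment,

∑_{i: f(i)=1} R_i₁ + ∑_{i: f(i)=2} R_i₂ = ∑_{i: f(i)=1} (R_i₁ – R_i₂) + ∑_{i=1 to n} R_i₂

The second sum does not involve f, so maximizing total return is equivalent to maximizing the first term.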

As in the solution to Example 1, to attain the maximum return:

• assign treatment 1 to the customers with the n₁ largest values of R_i₁ – R_i₂

• assign treatment 2 to the remaining customers

The difference R_i₁ – R_i₂ is called net lift, uplift, incremental response, differential response, etc.

If one considers only the response to treatment 1, and bases targeting on a model built out of responses to previous marketing actions, one is proceeding as if the situation were as in Example 1. One would mistakenly maximize

∑_{i: f(i)=1} R_i₁

Such maximization would not yield the maximum return. One needs to consider the return from cases subjected to no marketing action.
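The difference between the two targeting rules in Examples 1 and 2 can be sketched with a handful of customers. The returns below are hypothetical and chosen only so that the two rankings disagree; `top_n` and `total_return` are illustrative helper names.

```python
# HYPOTHETICAL expected returns per customer:
# (R_i1 if the marketing action is taken, R_i2 if it is not).
returns = {
    "a": (10.0, 9.0),
    "b": (8.0, 2.0),
    "c": (6.0, 1.0),
    "d": (4.0, 4.0),
}
n1 = 2  # at most n1 customers can receive treatment 1


def total_return(treated):
    """Total return when customers in `treated` get treatment 1, others treatment 2."""
    return sum(r1 if c in treated else r2 for c, (r1, r2) in returns.items())


def top_n(score, n):
    """Customers with the n largest values of the given score."""
    ranked = sorted(returns, key=score, reverse=True)
    return set(ranked[:n])


# Targeting on response to treatment 1 only (as if in Example 1).
by_response = top_n(lambda c: returns[c][0], n1)
# Targeting on net lift R_i1 - R_i2 (Example 2).
by_net_lift = top_n(lambda c: returns[c][0] - returns[c][1], n1)

print(sorted(by_response), total_return(by_response))
print(sorted(by_net_lift), total_return(by_net_lift))
```

In this made-up data, customer "a" has the highest response but almost no net lift, so ranking by response alone spends a contact on someone who would have returned nearly as much anyway.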

Example 3: A toy example

Consider the following toy example with a population of n = 3 cases, and U = 3 treatments, n₁ = n₂ = n₃ = 1 and returns:

This assignment is one that maximizes total return under the given constraints:

Note that neither case 2 nor case 3 was assigned the treatment that maximizes its return.

Although the possibility of a return of 18 exists, this possibility is not realized, since case 2 is not assigned treatment 2.

(In a case like this, one would probably advise that more resources be allocated to treatment 2, so that n₂ > 1.)
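A problem of this size can be solved by brute force, which makes the effect easy to see. The returns matrix below is hypothetical (the original figures are not reproduced here); it is constructed only so that, as in the text, case 2 could return 18 under treatment 2 but is not assigned it in the best overall assignment.

```python
from itertools import permutations

# HYPOTHETICAL returns R[i][j] for case i (row) and treatment j (column).
# Rows are cases 1..3, columns are treatments 1..3 (zero-based in code).
R = [
    [5, 30, 4],
    [10, 18, 3],
    [9, 12, 8],
]


def best_assignment(R):
    """Try every one-to-one assignment (n_1 = n_2 = n_3 = 1) and keep the best."""
    n = len(R)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(R[i][perm[i]] for i in range(n))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


total, assignment = best_assignment(R)
for case, treatment in enumerate(assignment, start=1):
    print(f"case {case} -> treatment {treatment + 1}")
print("total return:", total)
```

Here treatment 2 goes to case 1, where it is worth the most, so cases 2 and 3 end up with treatments that are not their individual best. Enumerating permutations is only practical for very small n.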

Example 4: General case

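The problem can be cast as a standard integer linear programming problem. If we let

x_ij = 1 if case i is assigned treatment j, and x_ij = 0 otherwise,

then the problem can be written as:

maximize ∑_{i=1 to n} ∑_{j=1 to U} R_ij x_ij

subject to the constraints:

∑_{j=1 to U} x_ij = 1 for each case i = 1,…,n,

∑_{i=1 to n} x_ij ≤ n_j for each treatment j = 1,…,U,

x_ij ∈ {0, 1}.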

Note:

In general, the best assignment that solves the linear programming problem does not vary continuously with the coefficients:

• small changes in the returns R_ij result in only small changes in the best total return,

• but, the assignment that yields the best return may vary considerably.

Example 5: An (almost real) example and variation

Each week, a call center is responsible for contacting a group of customers. The length n of the list is not fixed, but it does not vary much from week to week.

Based on what is known of the customers, and on historical observations, it is possible to estimate the expected probability of successfully contacting each customer at different combinations of time of the day and call type (“home” or “other”).

Un-adjusted probabilities of successful contact are not constant in time…

Problem: make a (calling time, weekday) assignment so that the expected total number of contacts is maximized, subject to the constraint that the call center capacity is limited.

Remarks:

• in general, we will only know an estimate of R_ij:

which suggests that insisting on solving the full maximization problem is overkill

• in practice, proper call optimization is carried out dynamically

A solution sketch:

• segment customers, including the probabilities of successful contact at different times as segmentation variables, so that the probability of contact is approximately constant for the segment

• solve the optimization problem for the fraction of each segment that has to be contacted at each time
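One way to approximate the segment-level step is a simple greedy allocation: fill the (segment, time slot) pairs with the highest estimated contact probability first, until segment sizes or slot capacities run out. This is a heuristic sketch, not an exact solver, and it is not guaranteed to find the optimum in general. All segment names, sizes, capacities, and probabilities are hypothetical.

```python
# HYPOTHETICAL inputs: segment sizes, time-slot capacities, and the estimated
# probability of a successful contact for each (segment, slot) pair.
segment_size = {"s1": 100, "s2": 50}
slot_capacity = {"morning": 80, "evening": 80}
p_contact = {
    ("s1", "morning"): 0.2,
    ("s1", "evening"): 0.5,
    ("s2", "morning"): 0.3,
    ("s2", "evening"): 0.6,
}


def greedy_call_plan(segment_size, slot_capacity, p_contact):
    """Allocate calls to (segment, slot) pairs in order of contact probability."""
    remaining = dict(segment_size)   # customers not yet scheduled, per segment
    capacity = dict(slot_capacity)   # calls still available, per slot
    plan = {}
    pairs = sorted(p_contact.items(), key=lambda kv: kv[1], reverse=True)
    for (segment, slot), _prob in pairs:
        calls = min(remaining[segment], capacity[slot])
        if calls > 0:
            plan[(segment, slot)] = calls
            remaining[segment] -= calls
            capacity[slot] -= calls
    expected = sum(calls * p_contact[key] for key, calls in plan.items())
    return plan, expected


plan, expected_contacts = greedy_call_plan(segment_size, slot_capacity, p_contact)
for (segment, slot), calls in plan.items():
    print(f"{segment} @ {slot}: {calls} calls")
print(f"expected successful contacts: {expected_contacts:.1f}")
```

Because the returns are only estimates, a cheap heuristic of this kind may be good enough in practice; an exact answer would require solving the linear program over segment fractions.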

NET LIFT MODELING APPROACH – PROBABILITY DECOMPOSITION MODELS

Segments used in probability decomposition models:

	Contacted Group	Control Group
Purchasers prompted by contact	A	D
Purchasers not needing contact	B	E
NonPurchasers	C	F

Figure 2. Segments in probability decomposition models

Standard purchase propensity models are only capable of predicting all purchasers (combined segments A and B). The probability decomposition model predicts the purchaser segment that needs to be contacted (segment A) by leveraging two logistic regression models, as shown in the formula below [1].

P(A | A∪B∪C) =	P(A∪B | A∪B∪C) ×	(2 – 1/P(A∪B | A∪B∪E))
Probability of purchase prompted by contact	Probability of purchase out of contact group	Probability of purchaser being in contact group out of all purchasers
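One way to see where the factor (2 – 1/P(A∪B | A∪B∪E)) can come from is sketched below. It is an illustrative derivation rather than the one given in [1], and it assumes that the contact and control groups are randomized and of equal size (or reweighted to be), so that the purchasers who need no contact are about equally numerous in both groups, |B| ≈ |E|. The symbol q is introduced here for brevity.

```latex
\begin{align*}
q &= P(A \cup B \mid A \cup B \cup E) = \frac{|A \cup B|}{|A \cup B| + |E|}
  \quad\Longrightarrow\quad \frac{|E|}{|A \cup B|} = \frac{1}{q} - 1, \\
P(A \mid A \cup B) &= 1 - \frac{|B|}{|A \cup B|}
  \approx 1 - \frac{|E|}{|A \cup B|} = 2 - \frac{1}{q}, \\
P(A \mid A \cup B \cup C) &= P(A \cup B \mid A \cup B \cup C)\, P(A \mid A \cup B)
  \approx P(A \cup B \mid A \cup B \cup C)\left(2 - \frac{1}{q}\right).
\end{align*}
```

Under this reading, q = 0.5 means a purchaser is equally likely to come from either group, so the contact added nothing and the factor is zero; q above 0.5 gives a positive factor.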

Summary of probability decomposition modeling process:

1. Build a stepwise logistic regression purchase propensity model (M1) and record the model score for every customer in the modeled population.

2. Use past campaign results or small scale trial campaign results to create a dataset with two equal size sections of purchasers from the contact group and the control group. Build a stepwise logistic regression model predicting which purchasers are from the contact group. The main task of this model is to penalize the score of the model built in step 1 when a purchaser is not likely to need contact.

3. Calculate the net purchaser score based on the probability decomposition formula.
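Step 3 can be sketched as follows, assuming the two model scores are already available for each customer. The original work was done in SAS; this Python version is only an illustration, and the customer IDs, scores, and function name are hypothetical.

```python
def net_lift_score(p_purchase, p_contact_given_purchase):
    """Probability decomposition score.

    p_purchase: step 1 score, P(A∪B | A∪B∪C).
    p_contact_given_purchase: step 2 score, P(A∪B | A∪B∪E).
    """
    if not 0.0 < p_contact_given_purchase <= 1.0:
        raise ValueError("second-model score must be in (0, 1]")
    return p_purchase * (2.0 - 1.0 / p_contact_given_purchase)


# HYPOTHETICAL scores per customer: (step 1 score, step 2 score).
scores = {
    "cust_1": (0.30, 0.50),
    "cust_2": (0.20, 0.80),
    "cust_3": (0.10, 0.40),
}

net = {cust: net_lift_score(p1, p2) for cust, (p1, p2) in scores.items()}

# Rank customers by net score, highest first, to build the contact list.
ranked = sorted(net, key=net.get, reverse=True)
print(ranked)
```

Note how cust_1 has the highest purchase propensity but a net score of zero, because the second model sees no difference between contact and control purchasers for that profile. A second-model score below 0.5 makes the net score negative.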

Results of the probability decomposition modeling process for marketing offer mailing.

Scoring Rank	Contact Group Response %	Control Group Response %	Incremental Response Rate
1	18.8%	12.9%	5.9%
2	7.8%	5.4%	2.4%
3	6.9%	4.5%	2.5%
4	4.3%	3.6%	0.7%
5	3.9%	3.5%	0.4%
6	4.1%	4.1%	0.0%
7	3.7%	4.0%	-0.2%
8	4.7%	4.1%	0.6%
9	5.0%	6.7%	-1.7%
10	11.0%	15.7%	-4.7%

Table 3. Post analysis of campaign leveraging probability decomposition model

Scoring ranks 1 through 5 show positive incremental response rates, and rank 6 is roughly break-even at 0.0%. The scoring ranks are ordered based on the incremental response rates.

CONCLUSION

The probability decomposition model is just one in a group of methods known as net lift models. Net lift models help maximize the ROI of marketing campaigns because they let us avoid contacting customers or prospects who are highly likely to buy a product or service anyway. The traditional purchase propensity model may do a good job of ranking customers by their probability of making a purchase, but it cannot select the true responders, the customers who will only make a purchase if contacted. The probability decomposition model has its challenges; it is relatively difficult to interpret because it combines the scores of two separate models. The following conditions are required for a net lift model:

• presence of randomized control group

• analyzed marketing contact is not the only communication leading to purchase

• purchase rate is not correlated with lift, so a purchase propensity model is not sufficient

• presence of similar/repetitive marketing campaigns or small scale tests

• variation in average lift across scoring ranks

References

1. Zhong, Jun (VP Targeting and Analytics, Card Services Customer Marketing, Wells Fargo), “Predictive Modeling & Today’s Growing Data Challenges”, presentation at Predictive Analytics World, San Francisco, CA, 2009.

2. Lo, Victor S.Y., “The True Lift Model - A Novel Data Mining Approach to Response Modeling in Database Marketing”, SIGKDD Explorations, Volume 4 (2002), Issue 2, pp. 78-86.

3. Siegel, Eric, “Uplift Modeling: Predictive Analytics Can’t Optimize Marketing Decisions Without It”, Predictive Impact, Inc., 2011.

Simulation Educators

Thursday, November 15, 2012
