For marketing models, a few measures are important, and they are related to each other:
1. The r-squared in regular regression (or a pseudo r-squared in logistic regression) - sometimes presented as the 'measure of fit' in an Inductive Decision Tree tool - is a key measure. An r-squared of '0' means that your model does not capture any of the variation in your dependent variable, while a '1' reflects that your model perfectly captures (predicts or classifies) the variability (whether categorical or numerical) in your dependent variable.
2. Another important measure is 'Lift' - this statistic compares the response rate among the people your model selects with the response rate you would get by targeting at random. Put the other way around, it tells you how many fewer people must be targeted to reach the same number of responders.
3. You can also calculate the $ marketing savings that would accrue from applying your marketing model.
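To make points 2 and 3 concrete, here is a minimal sketch of how lift and the dollar savings might be computed. The function names, the toy data and the cost per piece are all made up for illustration, and the savings figure only counts mailing cost avoided, not responders lost by mailing fewer people.

```python
def lift_at(scores, responded, fraction):
    """Response rate in the top `fraction` of customers (ranked by model
    score) divided by the overall response rate."""
    ranked = sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)
    n_top = max(1, int(round(len(ranked) * fraction)))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    base_rate = sum(responded) / len(responded)
    return top_rate / base_rate


def mailing_savings(n_customers, fraction_mailed, cost_per_piece):
    """Mailing cost avoided by contacting only `fraction_mailed` of the file."""
    return n_customers * (1 - fraction_mailed) * cost_per_piece


# Hypothetical toy data: 10 customers, 3 of whom responded.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

print("lift in top 50%:", lift_at(scores, responded, 0.5))
print("savings:", mailing_savings(100_000, 0.5, 0.60))
```

A lift above 1 means the model-selected group responds better than a random selection of the same size.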
I have more than 4 years of experience building risk as well as marketing models using logistic regression. In my experience, KS is what is actually used for marketing models, because based on the supremum (maximum) difference between the old model and the new model we can decide at which decile to stop. Suppose KS is highest at the 5th decile: then we can cut the mailing volume by 50%. In practice, Gini and rank ordering are checked at the same time to finalize the model.
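A rough sketch of the decile check described above, assuming KS is computed in the usual way, as the largest gap between the cumulative share of responders and the cumulative share of non-responders when customers are ranked by score. The function name and the toy data are hypothetical.

```python
def ks_by_decile(scores, responded, n_bins=10):
    """Return (ks, decile): the largest gap between cumulative % responders
    and cumulative % non-responders, and the decile where it occurs.
    Assumes there is at least one responder and one non-responder."""
    ranked = sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    total_resp = sum(r for _, r in ranked)
    total_non = n - total_resp
    cum_resp = cum_non = 0
    best_ks, best_bin = 0.0, 0
    for b in range(n_bins):
        start, end = b * n // n_bins, (b + 1) * n // n_bins
        for _, r in ranked[start:end]:
            cum_resp += r
            cum_non += 1 - r
        ks = cum_resp / total_resp - cum_non / total_non
        if ks > best_ks:
            best_ks, best_bin = ks, b + 1
    return best_ks, best_bin


# Hypothetical toy data: 20 customers already ranked best to worst.
scores = list(range(20, 0, -1))
responded = [1, 1, 1, 0, 1, 0, 0, 1] + [0] * 12

ks, decile = ks_by_decile(scores, responded)
print(f"KS = {ks:.2f} at decile {decile}")
```

Mailing only down to the decile where KS peaks is the cut-off rule described in the post; whether that is the most profitable cut-off still depends on cost and margin per response.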
Well, so, KS is actually like a 'Lift in Response' statistic that can be realized by applying a model. For example, would the KS statistic be 7 if 5% of a random sample of our customer file were 'responders' but our logistic model identified a segment with a 35% response rate?
Or, does the KS statistic express a comparison between what (smaller) percentage of a population must be targeted to achieve a certain (higher) response rate based on the logistic model results? I assume that, if only certain deciles are being used, this is the case?
Thanks Tom and Subhadip!
Your replies are really helpful.
So if I have got it correct, KS is the measure used in marketing models to target a smaller base of customers and get a higher response rate than we would get from a random selection.
Basically an optimum solution where we are reducing cost and maximizing the profit at the same time. (Please correct me if I am wrong).
Yes, you are right. For Risk modeling we also use KS, but here we use the KS stat to find the risk score where the difference between %good and %bad is at its maximum (you basically plot cumulative %good and %bad on the y axis and the risk score derived from the model on the x axis, and then check at what risk score the two lines are furthest apart). Other than that, Gini is the most crucial stat for Risk models.
KS (Kolmogorov-Smirnov) is the test statistic of a nonparametric test. I do not know if it can be used in this case, as the problem is not completely specified. If validation of the model is the goal, then KS is one of the methods.
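The plot described above can also be read off numerically. Here is a small sketch that walks up the risk score, accumulates %bad and %good, and reports the score where the two curves are furthest apart. The function name, the score values and the good/bad flags are invented for illustration.

```python
def ks_cutoff(scores, is_bad):
    """Return (ks, score): the widest gap between cumulative %bad and
    cumulative %good, and the risk score at which it occurs.
    Assumes there is at least one bad and one good account."""
    total_bad = sum(is_bad)
    total_good = len(is_bad) - total_bad
    pairs = sorted(zip(scores, is_bad))  # ascending risk score
    cum_bad = cum_good = 0
    best_ks, best_score = 0.0, None
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Consume every account tied at this score before measuring the gap.
        while i < len(pairs) and pairs[i][0] == score:
            cum_bad += pairs[i][1]
            cum_good += 1 - pairs[i][1]
            i += 1
        gap = abs(cum_bad / total_bad - cum_good / total_good)
        if gap > best_ks:
            best_ks, best_score = gap, score
    return best_ks, best_score


# Hypothetical scores where a low score means higher risk.
scores = [300, 350, 400, 450, 500, 550, 600, 650]
is_bad = [1, 1, 0, 1, 0, 0, 0, 0]

ks, cutoff = ks_cutoff(scores, is_bad)
print(f"KS = {ks:.2f} at score {cutoff}")
```

Because the gap is taken as an absolute value, the same function works whether a high score means high risk or low risk.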
What's the objective? How well does the model meet that objective?
Let me give some examples:
1) Model bank deposits based on demographics to get an understanding of what the expected deposits for each customer are, and their differences from the expected. Here, R^2 is appropriate.
2) We want to make prospecting mailings, but we can only mail to 5% of our potential list. Here the lift on the top 5% is the right criterion.
3) We want to understand customer attrition, and be able to make strategies across the range of attrition risks. Here, an overall metric like KS is appropriate, although I use ROC.
Three different purposes, three different metrics.
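For the third case, here is a minimal sketch of the area under the ROC curve, computed as the share of (positive, negative) pairs that the model ranks in the right order. The Gini mentioned earlier in the thread is commonly taken as 2 * AUC - 1, which is the assumption used here. The function names and the toy data are illustrative, and the pairwise loop is only practical for small samples.

```python
def roc_auc(scores, labels):
    """Probability that a randomly chosen positive scores higher than a
    randomly chosen negative (ties count as half). Assumes both classes
    are present."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient under the common convention Gini = 2 * AUC - 1."""
    return 2 * roc_auc(scores, labels) - 1


# Hypothetical attrition scores; label 1 means the customer left.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]

print("AUC :", roc_auc(scores, labels))
print("Gini:", gini(scores, labels))
```

An AUC of 0.5 (Gini of 0) corresponds to random ranking, and 1.0 to perfect separation across the whole score range, which is why it suits a strategy that spans all risk levels rather than a single cut-off.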