# AnalyticBridge

# Data Mining Cookbook

This is a good book, but it lacks detail. Here is a question.

In Chapter 10, on identifying high-risk customers, the author selects a sample of customers and brings in 300+ variables from the credit bureau. After applying logistic regression, 62 variables are retained. However, the author doesn't say how to apply this model to the whole customer base. Should the company buy the 62 variables for all customers from the credit bureau in order to use the model to produce risk scores for them?

### Replies to This Discussion

You do not have to buy any additional database. If the logistic regression model turns out to be reliable and valid, its coefficients can be used to formulate an equation that gives P(Y). By default, a probability greater than or equal to 0.5 means category 1; otherwise, category 0. Hope it helps. Ravi Sangal
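As a rough illustration of the equation mentioned above, here is a minimal Python sketch of turning fitted coefficients into P(Y) and applying the 0.5 cutoff. The variable names and coefficient values are made up for the example and do not come from the book. Note that the equation still needs each customer's values for the retained variables as inputs.

```python
import math

# Hypothetical intercept and coefficients, for illustration only.
# A real model would have one coefficient per retained variable
# (62 in the book's example).
INTERCEPT = -2.0
COEFS = {"num_late_payments": 0.8, "utilization_ratio": 1.5}

def risk_probability(customer, intercept=INTERCEPT, coefs=COEFS):
    """Return P(Y = 1) for one customer from logistic regression coefficients."""
    # Linear predictor: intercept + sum of coefficient * variable value
    z = intercept + sum(beta * customer[name] for name, beta in coefs.items())
    # Logistic function maps the linear predictor to a probability
    return 1.0 / (1.0 + math.exp(-z))

def risk_class(p, cutoff=0.5):
    """Category 1 if the probability is at or above the cutoff, else 0."""
    return 1 if p >= cutoff else 0

customer = {"num_late_payments": 2, "utilization_ratio": 0.4}
p = risk_probability(customer)
print(round(p, 3), risk_class(p))
```

The 0.5 cutoff is only the default; for risk scoring, a different cutoff may suit the business better.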

Jason, the author is implying that it would be too costly and computationally inefficient to purchase all 300+ variables for the whole population and run a logistic regression on it just to determine the significant variables. So she chose a smaller sample. Once these variables are identified, they can be purchased for the entire population, which can then be scored.

-Ralph Winters
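The second step of that workflow, scoring the whole base once the retained variables are available, might look roughly like the sketch below. It assumes the model was already fitted on the sample. The function name `score_population`, the field names, and all numbers are hypothetical, and customers missing a retained variable are set aside rather than scored.

```python
import math

def logistic(z):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def score_population(customers, intercept, coefs):
    """Score every customer that has all retained variables.

    Returns (scores, missing): scores maps customer id to P(Y = 1);
    missing lists ids that lack at least one retained variable.
    """
    scores, missing = {}, []
    for cust_id, record in customers.items():
        # Only the variables retained by the sample model are needed.
        if any(name not in record for name in coefs):
            missing.append(cust_id)
            continue
        z = intercept + sum(beta * record[name] for name, beta in coefs.items())
        scores[cust_id] = logistic(z)
    return scores, missing

# Hypothetical coefficients from the sample model and a toy customer base.
coefs = {"num_late_payments": 0.8, "utilization_ratio": 1.5}
population = {
    "A": {"num_late_payments": 0, "utilization_ratio": 0.0},
    "B": {"num_late_payments": 3},  # utilization_ratio not purchased yet
}
scores, missing = score_population(population, -2.0, coefs)
print(scores, missing)
```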

Can anyone tell me how a credit bureau normally sells its data? Do they charge by number of records, by number of variables, or both?

Jason-

Typically by number of records. Sometimes it is custom pricing based on the project.