
Hi All,

 

I am currently building a response model for an MPE client. The sample data set has 3,800 variables and 50,000 cases. How can I reduce the number of variables (approx. 2,500 continuous and 1,300 categorical) before fitting a logistic regression, and how should I take it forward?

 

Regards,

Ravi


Replies to This Discussion

You could use principal components analysis, or try variable selection techniques such as stepwise selection or genetic algorithms.

Thanks Neil. I will try.

I'd also recommend trying principal component analysis.

 

You could also use a decision tree algorithm (like C4.5) to generate a tree with limited depth, to figure out which variables are giving you the most information.  Then you can throw out the rest.

 

A third option would be to see if you have any variables that are highly correlated, and keep only one out of each set of correlated variables.
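To make the third option concrete, here is a minimal sketch in Python with NumPy. The helper name `drop_correlated`, the 0.9 threshold, and the synthetic data are all assumptions for illustration, not something from the original poster's data. It walks through the columns in order and keeps one only if it is not strongly correlated with any column already kept.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.9):
    """Greedy filter: keep a column only if its absolute Pearson
    correlation with every already-kept column is <= threshold.
    Assumes numeric columns with no missing values and no constant columns."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic data (made up): x2 is almost an exact copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.01 * rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

kept_names = drop_correlated(X, ["x1", "x2", "x3"])
print(kept_names)
```

Which member of a correlated set survives depends on column order here, so you may want to sort the columns first (for example by fill rate or by univariate strength against the response) before filtering.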

Thanks Mike. I will try these methodologies from my end and keep you posted.

I would say principal component analysis is a good way of reducing the number of variables, but the components are hard to interpret when you implement a logistic regression model or decide on strategies. In my experience, it is very hard to explain principal components to business partners and get them to accept that each component is a combination of all the variables.
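A small sketch of why the interpretation is hard, in Python with NumPy. The data is synthetic and the 95% variance cutoff is an assumption chosen only for illustration. It runs PCA via SVD on standardized columns and then looks at the loadings of the first component: every original variable gets a nonzero weight.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 1000, 6

# Synthetic data (made up): two hidden factors drive six observed variables.
factors = rng.normal(size=(n, 2))
mixing = rng.normal(size=(2, p))
X = factors @ mixing + 0.1 * rng.normal(size=(n, p))

# Standardize each column, then do PCA through the SVD of the data matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of total variance carried by each component.
explained = s**2 / np.sum(s**2)

# Smallest number of components whose cumulative share reaches 95%.
n_keep = int(np.searchsorted(np.cumsum(explained), 0.95) + 1)

loadings = Vt[:n_keep]   # one row per component, one weight per variable
scores = Z @ loadings.T  # the reduced inputs you would feed to the model

print("components kept:", n_keep)
print("variables with nonzero weight in component 1:",
      np.count_nonzero(loadings[0]), "of", p)
```

The `scores` columns are what would go into the logistic regression, which is exactly the problem for the business side: a coefficient on a component cannot be read as the effect of any single original variable.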

 

You can try stepwise logistic regression, use VIF to reduce multicollinearity, check correlations between variables, and check fill rates, removing variables with less than a 60% fill rate, since you would otherwise need to impute the missing observations. You could also try variable clustering.
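The fill-rate and VIF checks could be sketched roughly as follows in Python with NumPy. The function names `fill_rate` and `vif`, the synthetic data, and the VIF cutoff of 10 are assumptions (10 is only a common rule of thumb). The 60% fill-rate cutoff is the one suggested above. VIF for a variable is computed as 1 / (1 - R^2), where R^2 comes from regressing that variable on all the others.

```python
import numpy as np

def fill_rate(X):
    """Fraction of non-missing (non-NaN) values in each column."""
    return 1.0 - np.isnan(X).mean(axis=0)

def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2), with R^2 from an
    OLS regression of that column on all other columns plus an intercept.
    Assumes X has no missing values."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic data (made up): c is nearly a + b, and d is about 70% missing.
rng = np.random.default_rng(2)
n = 400
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + 0.1 * rng.normal(size=n)
d = rng.normal(size=n)
d[rng.random(n) < 0.7] = np.nan
X = np.column_stack([a, b, c, d])

# Step 1: drop columns with less than a 60% fill rate.
rates = fill_rate(X)
keep = rates >= 0.6
X_kept = X[:, keep]

# Step 2: compute VIFs on the remaining (complete) columns.
vifs = vif(X_kept)
print("fill rates:", np.round(rates, 2))
print("VIFs:", np.round(vifs, 1))
```

In this toy case all three remaining columns show a high VIF because any one of them is almost determined by the other two. A usual way to proceed is to drop the variable with the highest VIF, recompute, and repeat until everything is under the chosen cutoff.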

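In R, something like this:

install.packages('randomForest')  # only needed once
library(randomForest)

# Xvars: data frame of predictors; Yvar: response (a factor for classification)
# tuneRF searches over mtry; with doBest=TRUE it returns the forest fitted at the best mtry
rf <- tuneRF(Xvars, Yvar, stepFactor = 1.2, doBest = TRUE)

# First column of the importance matrix, as a named vector
rfi <- rf$importance[, 1]

# Plot variables from most to least important, keeping their names on the axis
barplot(sort(rfi, decreasing = TRUE), las = 2)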

 


© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC