Can we find out which variables are important for a logistic regression before carrying out the logistic regression?


Comment by RockyRambo on March 8, 2012 at 11:26am

Variable transformations are usually applied if the relationship between the independent and dependent variables is not linear.

Comment by Tom Wolfer on November 1, 2010 at 7:52am
Minethedata. A quick response for now: whether or not your dependent variable is dichotomous, it is always a good idea to run your data through multiple techniques: logistic regression, neural net, decision tree. Then compare your results and choose the technique that gives you the best accuracy and reliability for classification or prediction.
Comment by Minethedata on November 1, 2010 at 7:20am
Tom, thanks for the input. Yes, I was about to ask when to use decision trees, logistic regression, clustering or neural nets when the dependent variable is dichotomous. Can you please elaborate on this, as well as on why to use other data mining techniques when the relationship is non-linear? Is this rule applicable even when there is an exponential relationship between variables and logistic regression seems apt for modeling? Does it apply when the non-linear relationship is between the dependent variable and just one independent variable, even if the other independent variables are not non-linearly related? Also, please let me know if you have reading material on this.
Comment by Minethedata on October 29, 2010 at 6:00am
Yes, I was about to ask when to use decision trees, logistic regression, clustering or neural nets when the dependent variable is dichotomous. Can you please elaborate on this, as well as on why to use other data mining techniques when the relationship is non-linear? Is this rule applicable even when there is an exponential relationship between variables and logistic regression seems apt for modeling? Does it apply when the non-linear relationship is between the dependent variable and just one independent variable, even if the other independent variables are not non-linearly related? Also, please let me know if you have reading material on this.
Comment by Tom Wolfer on October 28, 2010 at 7:46am
Minethedata. If you believe that there is a predictive relationship that is non-linear in nature, then that is where other data mining techniques become important: inductive decision trees, clustering. These techniques analyze data for relationships that are not linear. After you run your logistic regression model, try inputting the same variables (even ones not included in the regression) into a decision tree.
Comment by Minethedata on October 28, 2010 at 4:41am
Thanks, Tom, for the answer on concatenating 2 variables. A question that comes to mind: we consider a variable fit for use in logistic regression if it has a high correlation with the dependent variable. A high correlation coefficient also tells us that there is a linear relationship between the variables. Suppose 2 variables have a non-linear relationship; then the correlation coefficient may not capture this and we may end up neglecting the variable for logistic regression. Or do we have to check other correlation coefficients, and if so, which one should be used for numeric variables?
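The concern raised here can be illustrated with a small sketch. The data below is made up for illustration (it is not from the thread): `y` is fully determined by `x`, but through a U-shaped curve, so the Pearson coefficient on the raw variable is zero while the coefficient on a squared term is 1. The helper name `pearson` is an assumption of this sketch.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y depends on x exactly, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

# Correlation of y with the raw variable: the U shape cancels out.
r_raw = pearson(x, y)

# Correlation of y with a transformed copy of the variable (x squared).
r_transformed = pearson([v ** 2 for v in x], y)

print(f"raw x:      r = {r_raw:.3f}")
print(f"x squared:  r = {r_transformed:.3f}")
```

A near-zero linear correlation therefore does not by itself show that a variable is useless; plotting the variable against the outcome, or trying a transformation, may reveal a relationship the coefficient misses.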
Comment by Tom Wolfer on October 26, 2010 at 7:51am
Minethedata. If, after doing your exploratory analysis, you find that both gender and education have an impact on, say, response to a campaign, then you can concatenate them and create variables to feed into the logistic regression. Let's say males are most likely to respond and those with a university degree are also most likely to respond. You can create a set of dummy variables from these two: the variables would represent combinations of the categories across the gender and education variables. For example, the most important dummy variable would be named 'Male_University' (a dichotomous variable that would have a value of '1' if true and '0' if false). This variable would be an input into your logistic regression. The other dummy variables would be 'Male_No_University', 'Female_University', 'Female_No_University'... assuming that the education variable had just two categories. Does this make sense?
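As a rough sketch of the concatenation described above, the snippet below builds one 0/1 indicator per gender-by-education combination. The category lists and the function name `combined_dummies` are assumptions made for illustration; only the resulting variable names follow the pattern given in the comment.

```python
from itertools import product

# Assumed category values; education is taken to have just two levels.
GENDERS = ["Male", "Female"]
EDUCATION = ["University", "No_University"]

def combined_dummies(gender, education):
    """Return one 0/1 indicator per gender-by-education combination."""
    return {
        f"{g}_{e}": int(gender == g and education == e)
        for g, e in product(GENDERS, EDUCATION)
    }

# One hypothetical record: a male respondent with a university degree.
row = combined_dummies("Male", "University")
print(row)
```

Exactly one of the four indicators is 1 for any record, so the four always sum to 1. If the regression includes an intercept, one of them would normally be left out as the reference category to avoid perfect collinearity.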
Comment by Ralph Winters on October 25, 2010 at 2:33pm
To properly do correlation (and not association) you need the number of rows to be equal to the number of columns. Then, assuming the data is at least ordinal, I would perform a Spearman rank correlation test. If you are talking about simple association, then you could perform a test like Cramér's V.
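For a table that is not 2*2, such as the 3-column, 2-row case asked about in this thread, Cramér's V can be computed from the Pearson chi-square statistic as V = sqrt(chi2 / (n * (k - 1))), where n is the total count and k is the smaller of the number of rows and columns. The sketch below is a minimal pure-Python version; the function names and the example counts are made up for illustration.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and total count for a table of counts.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramér's V: 0 means no association, 1 means perfect association."""
    stat, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k - 1)))

# Hypothetical crosstab with 2 rows and 3 columns of counts.
counts = [
    [10, 20, 30],
    [30, 20, 10],
]
stat, n = chi_square(counts)
v = cramers_v(counts)
print(f"chi-square = {stat:.2f}, n = {n}, Cramér's V = {v:.3f}")
```

For a 2*2 table, k - 1 equals 1 and V reduces to the absolute value of the phi coefficient, so the same function covers both cases.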

-Ralph Winters
Comment by Minethedata on October 25, 2010 at 2:45am
Tom, using crosstabs I get the frequency of the data. Suppose there are 3 columns and 2 rows (this is all multinomial data): what is the formula used for getting the phi coefficient value? (I assume you get the correlation coefficient using the phi coefficient only.) Also, in general, if there is not a 2*2 contingency table, what is the formula used?

Also please elaborate how to concatenate 2 variables?
Comment by Tom Wolfer on October 22, 2010 at 1:18pm
Minethedata. Well, you could use simple crosstabs and a chi-square test to determine whether categorical variables (binomial or multinomial) are correlated. Picking up on Idielle's point: if you find two categorical variables that are highly correlated, you could concatenate the two variables into one.

© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC