Mark
  • Male
  • Johannesburg
  • South Africa

Mark's Friends

  • Ruan Metembo
  • Cristina Priego Crespo
  • Joscelyne Gray
  • Dana Mullarkey
  • Karen Alexandre, Ph.D.
  • Manish Mohan
  • Nathan Och
  • George Lee
  • Pooneh Varasteh
  • Nicolene Benecke
  • Paul Swanepoel
  • Payel Bhattacharya
  • Semkiwa G. L
  • Abhinav Jain
  • dabsy

Mark's Discussions

Handling Imbalanced data when building regression models
13 Replies

Dear colleagues and friends, I would like to know how you go about handling a dataset with imbalanced groups when it is modelled using a classification model, e.g. logistic regression. As an example, fitting…

Started this discussion. Last reply by Mark Apr 20.

Fitting a Pareto Distribution on SAS

Hi everyone, I would like some assistance with fitting a Pareto distribution in SAS. I have functions for other continuous distributions, e.g. lognormal, exponential and Weibull, but I haven't been able to find…

Started Nov 27, 2012

Excluding variables from a logistic regression model based on correlation
2 Replies

Hi, is it necessary to exclude independent variables from a regression model because they are correlated? I am working on a logistic regression model built from a very large dataset…

Started this discussion. Last reply by Ralph Winters Feb 28, 2012.

Are financial institutions investing sufficiently on Statistical/Quantitative methods for fraud detection and prevention?

Dear all, I am doing research in South Africa on fraud detection and prevention using quantitative approaches. I would love a global view on whether financial institutions, e.g. banks, are investing…

Started May 20, 2011

 

Mark's Page

Latest Activity

Mark replied to Mark's discussion Handling Imbalanced data when building regression models
"Many thanks Abhijit, I agree with Steven Finlay that this paper gives a comprehensive review of how to deal with imbalances in datasets when modelling. Mark"
Apr 20
Steven Finlay replied to Mark's discussion Handling Imbalanced data when building regression models
"Abhijit, thanks. I've not seen this paper before - a very comprehensive literature review of the methods to date. Steve"
Apr 19
Abhijit Kulkarni replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi, Please find below the link for a very good review paper which addresses this problem. I hope you will find it interesting: http://www.ele.uri.edu/faculty/he/PDFfiles/ImbalancedLearning.pdf Best, abhijit"
Apr 19
Steven Finlay replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi Mark In my experience only 230 bads is quite difficult to work with for a credit scoring type problem. One problem with bootstrapping is that you only use about 2/3 of the data for each model which in your case may be a problem. To put it…"
Apr 19
Mark replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi Abhijit, thank you very much for your response. I think a lot of us developing predictive models have always used the undersampling technique when dealing with unbalanced datasets, and you are right, there is a likelihood of loss of information. I…"
Apr 19
Abhijit Kulkarni replied to Mark's discussion Handling Imbalanced data when building regression models
"Hello Mark, Handling imbalanced data sets in classification is a tricky job. As suggested in other replies, you can handle it with a few sampling tricks. Under-sampling the majority class in my view is not advisable as it is normally…"
Apr 19
Mark replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi Bala, I worked on a fraud model with a dataset which had 99.95% non-frauds and 0.05% frauds. I used an undersampling technique to adjust the dataset so that the ratio of frauds to non-frauds in the model development dataset was 1:10. This…"
Apr 19
Steven Finlay replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi DataMiner   That's interesting to know.   When assessing the performance of a model you should always use a data set that matches your real world application to decide how good it is. So in the above example, if you…"
Apr 19
Mark replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi Steven, thank you very much for the detailed response. I have been developing scoring models and have always undersampled the larger group of the responses. Is there a minimum number of observations you need to have in a group? As an example I…"
Apr 19
Steven Finlay replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi Bala Most modelling software (e.g. SAS) allows you to create a weight variable. In the above example you would set the weight variable to 1 for the non responders (Majority class) and 50 for the minority class; i.e. the responders (My…"
Apr 19
BR Deshpande replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi Steven Thanks for your explanation. We have also struggled with this issue and when we tried support vector machines, they were quite sensitive to imbalanced data. We balanced the data by undersampling but the results were sub-par. I am not sure…"
Apr 18
Steven Finlay replied to Mark's discussion Handling Imbalanced data when building regression models
"Hi Mark, this is a good question, and one that seems to get raised time and time again. A colleague (Sven Crone from Lancaster University in the UK) and I published a paper on this issue last year in the International Journal of Forecasting.…"
Apr 18
Mark's discussion was featured

Handling Imbalanced data when building regression models

Dear colleagues and friends, I would like to know how you go about handling a dataset with imbalanced groups when it is modelled using a classification model, e.g. logistic regression. As an example, consider fitting a logistic regression model to a dataset whose dependent variable is made up of 5% bads and 95% goods.
Apr 16
Mark posted a discussion

Fitting a Pareto Distribution on SAS

Hi everyone, I would like some assistance with fitting a Pareto distribution in SAS. I have functions for other continuous distributions, e.g. lognormal, exponential and Weibull, but I haven't been able to find one for the Pareto that finds the estimates and adds the curve to a histogram in SAS.
Nov 27, 2012
Mark replied to Mark's discussion Error on proc import in the group SAS Network
"Hey Timothy, I used Ajay's suggestion and followed your advice on formatting the variables and it worked perfectly. Many thanks. Mark"
Jun 21, 2012
Mark replied to Mark's discussion Error on proc import in the group SAS Network
"Hey Ajay, thank you very much. It is working now. Regards"
Jun 21, 2012
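
For readers following the imbalanced-data thread above, here is a minimal Python sketch of the 1:10 undersampling Mark describes for his fraud model. It is an illustration under stated assumptions, not the method any poster actually used: the function `undersample`, the `is_fraud` field, the `weight` field and the synthetic data are all hypothetical. The weights are one common way to scale the sample back to the original class proportions, in the spirit of Steven Finlay's remarks about weight variables and about assessing a model on data that matches the real-world application.

```python
import random

def undersample(rows, label_key="is_fraud", ratio=10, seed=0):
    """Keep every minority row and draw `ratio` majority rows per minority row.

    Hypothetical helper: each returned row carries a `weight` so that the
    weighted sample totals match the original class counts.
    """
    rng = random.Random(seed)
    minority = [r for r in rows if r[label_key] == 1]
    majority = [r for r in rows if r[label_key] == 0]

    # Never ask for more majority rows than exist.
    n_keep = min(len(majority), ratio * len(minority))
    kept = rng.sample(majority, n_keep)

    # Each kept majority row stands in for this many original majority rows.
    majority_weight = len(majority) / n_keep if n_keep else 0.0

    sample = [dict(r, weight=1.0) for r in minority]
    sample += [dict(r, weight=majority_weight) for r in kept]
    rng.shuffle(sample)
    return sample

# Synthetic data (an assumption): 20,000 rows with 10 frauds, i.e. 0.05% frauds.
rows = [{"id": i, "is_fraud": 1 if i < 10 else 0} for i in range(20000)]
sample = undersample(rows)

n_fraud = sum(r["is_fraud"] for r in sample)
n_non_fraud = len(sample) - n_fraud
weighted_total = sum(r["weight"] for r in sample)
```

The development sample keeps all frauds and ten non-frauds per fraud. Passing the `weight` column to a modelling routine that supports observation weights, or evaluating on an untouched hold-out set, are two ways to avoid judging the model on the artificially balanced proportions.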

Profile Information

Short Bio:
Quantitative Analyst at First National Bank in South Africa.
Field of Expertise:
Predictive Modeling, Data Mining, Statistical Programming
Years of Experience in Analytical Role:
3
Professional Status:
Technical
Interests:
Networking
Your Company:
First National Bank
Industry:
banking
How did you find out about AnalyticBridge?
Online

© 2013   AnalyticBridge.com is a subsidiary and dedicated channel of Data Science Central LLC