
What is a Prediction Problem?

A prediction problem is a business problem that involves predicting future events by extracting patterns from historical data. Prediction problems are solved using statistical techniques, mathematical models, or machine learning methods.

For example: forecasting the stock price for the next week, or predicting which football team will win the World Cup.

What is Regression analysis, where is it applicable?

When dealing with a prediction problem, one of the simplest, most widely used, and yet most powerful techniques is linear regression. Regression analysis is used for modeling the relationship between a response variable and one or more input variables.

In simpler terms, regression analysis helps us to:

  • Predict future observations
  • Find associations and relationships between variables
  • Identify which variables contribute most towards predicting future outcomes

Types of regression problems:

Simple Linear Regression:

If the model deals with one input variable, called the independent or predictor variable, and one output variable, called the dependent or response variable, then it is called simple linear regression. This type of regression assumes that there exists a linear relationship between the predictor and the response variable of the form:

Y = β0 + β1X + e


In the above equation, β0 and β1 are unknown constants that represent the intercept and slope of a straight line, which we learned in high school. These unknown constants are called the model coefficients or parameters. X is the known input variable, so if we can estimate β0 and β1 by some method, then Y can be predicted. In order to predict future outcomes, we use the training data to estimate the unknown model parameters (β̂0, β̂1) via the equation:

ŷ = β̂0 + β̂1x + ê, where ŷ, β̂0, and β̂1 are the estimates.
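As a quick sketch of the idea (using synthetic data, since the power plant dataset is only introduced later), a simple linear regression can be fitted in R with lm():

```r
# Synthetic example: generate data from a known line y = 2 + 3x + noise
set.seed(42)
x <- runif(50, 0, 10)
y <- 2 + 3 * x + rnorm(50, sd = 1)

# Fit the simple linear regression of y on x
model <- lm(y ~ x)

# The estimated coefficients approximate the true beta0 = 2 and beta1 = 3
coef(model)

# Predict the response for a new value of x
predict(model, newdata = data.frame(x = 5))
```

The estimates β̂0 and β̂1 returned by coef() will be close to, but not exactly, the true values 2 and 3, because of the added noise.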


Multiple Linear Regression:

If the problem contains more than one input variable and one response variable, then it is called multiple linear regression.
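Continuing with synthetic data as an illustration, a multiple linear regression with two predictors is fitted the same way in R, simply by adding terms to the formula:

```r
# Synthetic example with two predictors: y = 5 + 2*x1 - 1.5*x2 + noise
set.seed(1)
x1 <- runif(100, 0, 10)
x2 <- runif(100, 0, 10)
y  <- 5 + 2 * x1 - 1.5 * x2 + rnorm(100, sd = 1)

# Fit the multiple linear regression of y on x1 and x2
model <- lm(y ~ x1 + x2)

# Coefficient estimates, standard errors, and R-squared
summary(model)
```

The summary() output reports one coefficient estimate per predictor plus the intercept, along with their standard errors and the model's R-squared.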

How do we apply Regression analysis using R?

Let us apply regression analysis to the power plant dataset available here. The dataset contains 9,568 data points collected from a Combined Cycle Power Plant over six years (2006-2011), while the plant was operating at full load. The features are the hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH), and Exhaust Vacuum (V), used to predict the net hourly electrical energy output (EP) of the plant.
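A minimal sketch of the fit in R follows. The column names (AT, V, AP, RH, PE) are those used in the UCI repository version of this dataset; the inline rows below are just illustrative stand-in values so the snippet runs on its own, and in practice you would read the full downloaded file instead:

```r
# In practice, load the downloaded dataset, e.g.:
# plant <- read.csv("CCPP.csv")

# Illustrative stand-in rows (assumed values, not the real dataset):
plant <- data.frame(
  AT = c(14.96, 25.18, 5.11, 20.86, 10.82),    # ambient temperature
  V  = c(41.76, 62.96, 39.40, 57.32, 37.50),   # exhaust vacuum
  AP = c(1024.1, 1020.0, 1012.2, 1010.2, 1009.2),  # ambient pressure
  RH = c(73.17, 59.08, 92.14, 76.64, 96.62),   # relative humidity
  PE = c(463.26, 444.37, 488.56, 446.48, 473.90)   # net energy output
)

# Regress energy output on the four ambient variables
model <- lm(PE ~ AT + V + AP + RH, data = plant)
coef(model)

# Predict output for new ambient conditions
predict(model, newdata = data.frame(AT = 20, V = 50, AP = 1012, RH = 70))
```

With the full 9,568-row dataset, summary(model) would also give standard errors and R-squared for judging the fit.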

Please visit here for the full blog post with R code.




Comment by Khurram on January 1, 2015 at 7:54pm

As you mentioned, regression helps us find associations and relationships between variables. As per my understanding, association relates to correlation, where the variables are not completely dependent on one another. This is also true of regression, which lets you know the relation between variables.

Comment by Prof. Dr. Diego Kuonen on December 30, 2014 at 4:53am

Note that the assumptions on the errors of the multiple linear regression model are not satisfied!

For example, huge residuals clearly demand a robust fit (e.g. using MM-estimation as in "lmRob" of R package "robust", or "lmrob" of package "robustbase"), and the ACF of the residuals may show autocorrelation with such dependent (time-ordered) data (e.g. using "acf(resid(model2))" in R).

It makes sense to base inferences or conclusions only on valid models. In other words, any conclusion is only as sound as the model on which it is based.

Comment by suresh kumar Gorakala on December 29, 2014 at 3:07am

Thanks Justice Moses for the explanation. Will take care of such things in future.

Comment by JUSTICE MOSES K. AHETO on December 27, 2014 at 1:39pm

Hi Suresh,

Many thanks for throwing more light on some basics of regression models/analysis, well done.

The general regression model formula you presented at the top of the graph is correct.

However, the estimated regression model below the graph is not quite right. Once you introduce the cap on top of y (y with a hat), you have a fitted regression model whose expected error must be zero, i.e. E(e) = 0. This means that you should have ŷ = β̂0 + β̂1x instead of ŷ = β̂0 + β̂1x + ê, unless in a multilevel model in which, apart from allowing the observations at a higher level to be correlated within the group, you are also allowing for complex level-1 residuals.

Hope this helps.

Once again, well done. 
