Reply by Balázs Bárány on July 24, 2012 at 2:47am
Hi,
you need to separate the subsetting from the plotting. There are many ways to do that.
# Setting up random data
RateA <- (rnorm(50) + 2) * 100
RateB <- (rnorm(50) + 2) * 100
data.table <- data.frame(RateA=RateA, RateB=RateB)
# Using the formula interface of plot()
# Plot everything
plot(RateA ~ RateB, data=data.table)
# Plot the subset
plot(RateA ~ RateB, data=subset(data.table, RateA > 150 | RateB > 150))
# Or by filtering the variables directly - but then you need to make sure that there are as many RateA-s as RateB-s
ra <- RateA[RateA > 150]
rb <- RateB[RateB > 150]
length(ra); length(rb)
plot(ra, rb) # gives an error when the two lengths differ
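One way to keep the two vectors the same length is to build a single logical index and apply it to both. This is only a minimal sketch on the simulated data from above; the names keep, ra and rb are illustrative, and set.seed() is added only so the result is reproducible.

```r
# Simulated data, as in the example above (set.seed is an addition for reproducibility)
set.seed(1)
RateA <- (rnorm(50) + 2) * 100
RateB <- (rnorm(50) + 2) * 100

# One logical index for both vectors, so the same positions are kept in each
keep <- RateA > 150 | RateB > 150
ra <- RateA[keep]
rb <- RateB[keep]

# Both vectors were filtered with the same index, so their lengths match
length(ra) == length(rb)
plot(ra, rb)
```

Because both vectors are filtered by the same index, each plotted point still pairs a RateA value with the RateB value from the same row.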
Reply by jadelim on July 24, 2012 at 3:13am
Hi. May I know which method this is?
I want to identify the relationship between RateA and RateB, i.e. how they are linked. Which method would be more suitable in this case?
Anyway, thanks for the help.
Reply by jadelim on July 24, 2012 at 3:55am
Hi again. I'm sorry, I am still very new to R and unfamiliar with the scripting.
May I know what the commands below mean? I have thousands of rows in my dataset, but when I followed the commands you gave me, the graph came out with fewer than a hundred points, I guess?
RateA <- (rnorm(50) + 2) * 100
RateB <- (rnorm(50) + 2) * 100
As for the last command, plot(ra, rb), it does give an error. Any idea how to solve the error so that I can plot?
Thanks in advance!
Reply by Balázs Bárány on July 24, 2012 at 4:35am
rnorm(50) gives you 50 normally distributed random numbers.
It was an attempt to create a dataset that has the same structure as your dataset.
You can't plot vectors with differing lengths. This is why the plot(ra, rb) command fails. Your original attempt, if it worked, would fail in the same way.
Use the subset method with your original data.table for the plot.
Reply by Balázs Bárány on July 24, 2012 at 4:33am
Hi, you can use the cor() function to determine whether a linear correlation exists. Or you can try to model the relationship with a linear model (lm), a generalized linear model (glm) etc.
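As a rough illustration of the cor() suggestion, the sketch below computes the correlation on the simulated data from earlier in the thread. cor.test() and the method = "spearman" option are standard parts of R's stats package, but they go beyond what the reply mentions, and the variable names r, ct and rs are illustrative.

```r
# Simulated data with the same structure as in the earlier example
set.seed(1)
RateA <- (rnorm(50) + 2) * 100
RateB <- (rnorm(50) + 2) * 100

# Pearson correlation coefficient, always between -1 and 1
r <- cor(RateA, RateB)

# Test of the null hypothesis that the true correlation is zero
ct <- cor.test(RateA, RateB)
ct$estimate   # the same coefficient as r
ct$p.value    # a small p-value suggests a non-zero linear correlation

# Rank-based alternative that does not assume a linear relationship
rs <- cor(RateA, RateB, method = "spearman")
```

With independent random data like this, the coefficient should be close to zero; on real data the same calls apply to the two columns of the data frame.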
Reply by jadelim on July 25, 2012 at 12:30am
I have these commands here that use lm:
>attach(data.table)
>plot(RateA, RateB)
>ahmetrics = lm(RateA~RateB)
>summary(ahmetrics)
>abline(ahmetrics)
Can the above commands find the relationship between RateA and RateB? This uses the linear regression method.
Reply by Balázs Bárány on July 25, 2012 at 6:25am
Hi!
Yes, this is exactly how you use linear regression to analyse the (linear) relationship between two variables.
The most interesting output from summary() is the (Intercept) line and the one below it, starting with RateB. The Estimate column shows the intercept and the slope of the regression line, and the Pr(>|t|) column shows the p-value of each estimate: a small value means the coefficient differs significantly from zero.
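The coefficient table that summary() prints can also be read programmatically. A minimal sketch, assuming the simulated data.table from earlier in the thread; the names fit and ctab are illustrative.

```r
# Simulated data with the same structure as in the earlier example
set.seed(1)
RateA <- (rnorm(50) + 2) * 100
RateB <- (rnorm(50) + 2) * 100
data.table <- data.frame(RateA = RateA, RateB = RateB)

fit <- lm(RateA ~ RateB, data = data.table)

# Coefficient table: one row per term,
# columns Estimate, Std. Error, t value, Pr(>|t|)
ctab <- coef(summary(fit))
ctab["(Intercept)", "Estimate"]  # intercept of the regression line
ctab["RateB", "Estimate"]        # slope of the regression line
ctab["RateB", "Pr(>|t|)"]        # p-value for the slope
```

Passing data = data.table to lm() avoids the need for attach().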
Please read one of the many R introductions, online or in a book, to learn what the summary output from lm models means and how regression diagnostics work.
It is very hard to google for R-specific information. There is a dedicated search engine on www.r-project.org, along with FAQs and manuals.
I think the following command will do the plot:
plot(RateA[RateA<150 & RateB<150], RateB[RateA<150 & RateB<150], data=data.table)
I am not sure whether you can use data=data.table in plot() the same way it is used in regression functions. If not, you will have to use the full variable names.
Hope this is helpful.
Margot
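On the data= question: in base R, the data argument is understood by the formula method of plot(), not by plot(x, y). The sketch below shows two ways to plot the same subset. It uses 200 simulated rows rather than 50, an assumption made only so that the subset is not nearly empty; keep is an illustrative name.

```r
# Simulated data; 200 rows is an assumption so the subset is not nearly empty
set.seed(1)
data.table <- data.frame(RateA = (rnorm(200) + 2) * 100,
                         RateB = (rnorm(200) + 2) * 100)

# Formula interface: data= and subset= are both evaluated inside the data frame
plot(RateB ~ RateA, data = data.table,
     subset = RateA < 150 & RateB < 150)

# Plain vectors: with() looks the column names up in the data frame
with(data.table, {
  keep <- RateA < 150 & RateB < 150
  plot(RateA[keep], RateB[keep])
})
```

Both calls draw RateA on the x axis and RateB on the y axis for the rows where both rates are below 150.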
Reply by jadelim on July 26, 2012 at 9:16pm
Hi. What if I now want to plot the rates between 40 and 50? How do I type the command?
I'm a little confused about which operators to use for "between".
Also, other than linear regression, I can use an association or classification method to find the relationship between RateA and RateB, right?
Reply by Balázs Bárány on July 27, 2012 at 12:15am
You can use more conditions in the subsetting.
e.g. RateA[RateA > 40 & RateA < 50 & RateB > 40 & RateB < 50]
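To plot the filtered rows, the same conditions have to be applied to both columns, which subset() does in one step. A minimal sketch with hypothetical data: the rates here are drawn uniformly between 30 and 60 (an assumption, so that some rows fall in the 40 to 50 band), and band is an illustrative name.

```r
# Hypothetical data: rates spread between 30 and 60 so some fall in the 40-50 band
set.seed(1)
data.table <- data.frame(RateA = runif(200, 30, 60),
                         RateB = runif(200, 30, 60))

# "Between" is written as two comparisons joined with &
band <- subset(data.table,
               RateA > 40 & RateA < 50 & RateB > 40 & RateB < 50)

nrow(band)                        # number of rows meeting all four conditions
plot(RateA ~ RateB, data = band)
```

Use >= and <= instead of > and < if the boundary values 40 and 50 should be included.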
You didn't tell us anything about your problem. What is the target variable?
Regression methods are for numeric prediction: Which numeric value will the target variable get, given the input variables?
Classification is for separating test cases into two or more groups. Will it rain tomorrow? Which species is an iris flower?
Association methods predict the value of some variables in a data set using other variables. The classical example is the market basket: Given "sausage" and "cheese", will the customer also buy "bread" and "milk"?
Reply by jadelim on July 30, 2012 at 9:03pm
Hi again!
I wanted to find the correlation between RateA and RateB. I've tried the regression method. Is there any other method I can use to find the correlation?
Here is another scenario. Above, RateA and RateB were both numeric.
Now I would like to plot a graph of RateA against its name. How can that be done? I tried, but got an error:
Error: unexpected ')' in "plot
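An "unexpected ')'" message is a syntax error, so the call itself probably has an extra or misplaced parenthesis. For plotting a numeric rate against a name, one possible approach is sketched below. The data frame df and its Name column are hypothetical, since the thread does not show the real data.

```r
# Hypothetical data: Name is an assumed label column, not from the original data set
set.seed(1)
df <- data.frame(Name  = rep(c("A", "B", "C"), each = 10),
                 RateA = (rnorm(30) + 2) * 100)

# With a factor on the right-hand side, plot() draws one box plot per name
plot(RateA ~ factor(Name), data = df)

# Mean rate per name, shown as a bar chart
means <- tapply(df$RateA, df$Name, mean)
barplot(means, ylab = "Mean RateA")
```

If every name occurs only once, barplot(df$RateA, names.arg = df$Name) shows the individual values instead.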