Relevant R Help

This wiki is meant for people who are comfortable in Matlab and need some of the functionality of R. It covers only the R you need for regression and tries to keep your time in R to a minimum. This means that all data pre-processing must occur outside of R (for example, in Matlab).

This wiki will cover the following relevant topics in R.

  1. Parsing data files in Matlab for R
  2. Loading data files into R
  3. Linear Regression
  4. Non-Linear Regression
  5. Logistic Regression
  6. Multiple Regression
  7. Multiple Non-Linear (Logistic) Regression
  8. Regression Summaries

Parsing data files in Matlab for R

R can read most text files quite easily. The text files can come with any combination of headers and/or indexes:

Header1  Header2  ...  Headern
Data1    Data2    ...  Datan

or

       Header1  Header2  ...  Headern
Index  Data1    Data2    ...  Datan

or

Data1    Data2    ...  Datan

So, to read data into R, write your data out of Matlab as a text file in one of these formats, with the columns separated by spaces or tabs.

If you are going to do any statistics, make sure that you remove outliers and omitted points BEFORE loading into R, because I'm not going to tell you how to omit points once you are in R.

Loading data files into R

Entering data into R is quite easy. Type:

DataArray <- read.table("filename", header=TRUE)
if the file has a header row, or
DataArray <- read.table("filename", header=FALSE)
if it does not.

Once the data is in R, you can refer to a column by its header name like this:

DataArray$Header1

There is a way to avoid typing DataArray$ every time, but I don't recommend it for our purposes.
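As a quick check of the three file layouts described above, here is a minimal sketch that writes tiny example files and reads them back. The file contents, column names, and row labels (r1, r2) are made up for illustration; tempfile() is used only so the example does not touch your real files.

```r
# Layout 1: header row plus data (all values here are invented).
path <- tempfile(fileext = ".txt")
writeLines(c("Header1 Header2 Header3",
             "1.0 2.5 10",
             "2.0 3.5 20",
             "3.0 4.5 30"), path)
DataArray <- read.table(path, header = TRUE)
print(DataArray$Header2)   # columns are referenced by header name

# Layout 2: the header row has one fewer field than the data rows,
# so R uses the first column as row names (the "index").
path.indexed <- tempfile(fileext = ".txt")
writeLines(c("Header1 Header2",
             "r1 1 2",
             "r2 3 4"), path.indexed)
Indexed <- read.table(path.indexed, header = TRUE)
print(rownames(Indexed))   # "r1" "r2"

# Layout 3: no header row; R names the columns V1, V2, ...
path.plain <- tempfile(fileext = ".txt")
writeLines(c("1.0 2.5 10",
             "2.0 3.5 20"), path.plain)
NoHeader <- read.table(path.plain, header = FALSE)
print(names(NoHeader))     # "V1" "V2" "V3"
```

If a column does not come out as numeric, the usual cause is stray text in the file, which is one more reason to clean the data before it leaves Matlab.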

Linear Regression

This is exceedingly easy in R, once you know what all the variables are.

Let y and x be columns of data values that are all the same length. The linear regression of y on x is:

fitted.model <- lm(y ~ x, data=DataArray)
This model estimates the intercept as well as the slope.

fitted.model <- lm(I(y - a) ~ 0 + x, data=DataArray)
This model fixes the intercept at a known number a and estimates only the slope. (Writing y ~ x + a does not do this; it adds a as a second predictor.)

Note that x and y must be the names of columns in the data frame. That is, DataArray$x and DataArray$y must exist.
If this is not the case, change the names of the variables in the formula to match your column names.
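To see what a linear fit looks like end to end, here is a minimal sketch on simulated data. The data, the true slope and intercept (2 and 3), and the name fixed.model are all made up for the example. For the fixed-intercept case it uses one common approach: subtract the known intercept a from y and drop the intercept term from the formula.

```r
# Simulated data: y = 3 + 2*x plus a little noise (values are invented).
set.seed(1)
DataArray <- data.frame(x = 1:20)
DataArray$y <- 3 + 2 * DataArray$x + rnorm(20, sd = 0.5)

# Ordinary linear regression: estimates intercept and slope.
fitted.model <- lm(y ~ x, data = DataArray)
print(coef(fitted.model))

# Intercept held at a known value a: fit (y - a) against x with no intercept.
# I() makes R treat "y - a" as arithmetic rather than formula syntax,
# and "0 +" removes the intercept term.
a <- 3
fixed.model <- lm(I(y - a) ~ 0 + x, data = DataArray)
print(coef(fixed.model))   # a single coefficient: the slope
```

The first fit returns two coefficients, named (Intercept) and x; the second returns only x.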

Multiple Regression

This is similar to linear regression. All we have to do now is add the extra columns to our data and our formula.

Example:

fitted.model <- lm(y ~ x1 + x2, data=DataArray)
In this case, the data frame has columns named y, x1, and x2. It can have more columns than that as well; lm ignores the ones that are not in the formula.
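A minimal sketch of a multiple regression on simulated data follows. The column named extra is a made-up unused column, included only to show that columns missing from the formula are ignored; the true coefficients (1, 2, -3) are invented as well.

```r
# Simulated data: y = 1 + 2*x1 - 3*x2 plus noise (values are invented).
set.seed(2)
DataArray <- data.frame(x1 = runif(50), x2 = runif(50), extra = runif(50))
DataArray$y <- 1 + 2 * DataArray$x1 - 3 * DataArray$x2 + rnorm(50, sd = 0.1)

# Two predictors; the "extra" column is not in the formula, so it is not used.
fitted.model <- lm(y ~ x1 + x2, data = DataArray)
print(coef(fitted.model))
```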

Non-Linear Regression

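Similarly to linear regression, all we have to do now is switch from lm to nls and change our formula, which is the first argument. We can transform BOTH sides of the formula. Unlike lm, nls needs at least one named parameter in the formula to estimate, plus a starting value for each parameter.

Examples:

fitted.model <- nls(y ~ a*x1*log(x1), data=DataArray, start=list(a=1))
or
fitted.model <- nls(log(y) ~ a*x1*exp(x1), data=DataArray, start=list(a=1))

To estimate more than one parameter, use the following code:
fitted.model <- nls(y ~ a*x1*log(b*x2), data=DataArray, start=list(a=1,b=1))
Do not start this one at a=0, b=0: log(0) is not finite, so the fit fails immediately. Pick starting values near where you expect the answer to be.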
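Here is a minimal sketch of a complete nls fit on simulated data. The model y = a*exp(b*x1) is not one of the formulas above; it is a hypothetical model chosen for illustration because it behaves well numerically. The true values (a = 2, b = 0.5) and the starting values are invented too.

```r
# Simulated data: y = 2*exp(0.5*x1) plus noise (values are invented).
set.seed(3)
DataArray <- data.frame(x1 = seq(1, 5, length.out = 40))
DataArray$y <- 2 * exp(0.5 * DataArray$x1) + rnorm(40, sd = 0.1)

# Every parameter named in the formula (a and b) needs a starting value.
# Starting values should be in the right neighborhood; nls can fail to
# converge if they are far off.
fitted.model <- nls(y ~ a * exp(b * x1),
                    data = DataArray,
                    start = list(a = 1.5, b = 0.4))
print(coef(fitted.model))
```

If nls reports a convergence failure or a singular gradient, the first thing to try is usually a different set of starting values.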

Multiple Non-Linear Regression

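To do both a multiple regression and a non-linear regression, the generalization is the same as from linear regression to multiple regression: add the extra columns to the formula.

Example:

fitted.model <- nls(y ~ ((a*x1)^p + (b*x2)^p)^(1/p), data=DataArray, start=list(a=a0, b=b0, p=p0))
Here a0, b0, and p0 stand for your starting values. This fits y^p = (a*x1)^p + (b*x2)^p, written with all of the parameters on the right-hand side. Note that nlm is a general-purpose minimizer that takes a function rather than a formula, so the formula syntax used on this page goes with nls, not nlm.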
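As a runnable illustration of a non-linear fit with more than one predictor, here is a minimal sketch using nls. It deliberately uses a different, hypothetical model (a power law, y = a * x1^b * x2^c) instead of the formula above, because the power law is easy to fit reliably on simulated data. The data, true values, and starting values are all invented.

```r
# Simulated data: y = 2 * x1^1.5 * x2^0.5 plus noise (values are invented).
set.seed(5)
DataArray <- data.frame(x1 = runif(60, min = 1, max = 3),
                        x2 = runif(60, min = 1, max = 3))
DataArray$y <- 2 * DataArray$x1^1.5 * DataArray$x2^0.5 + rnorm(60, sd = 0.05)

# Three parameters (a, b, c), two predictors (x1, x2).
fitted.model <- nls(y ~ a * x1^b * x2^c,
                    data = DataArray,
                    start = list(a = 1.5, b = 1.2, c = 0.8))
print(coef(fitted.model))
```

The pattern is the same as in the single-predictor case: name every parameter in the formula and give each one a starting value.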

Regression Summaries

There are MANY ways to output the results of your regression. I recommend:

summary(fitted.model)
This is a comprehensive summary of the entire fit.

Other possibilities are:

formula(fitted.model)
This extracts the model formula.

deviance(fitted.model)
Residual sum of squares of the model.

coef(fitted.model)
Extracts the estimated regression coefficients (as a named vector).

fitted.model
This prints a concise version of the model.

resid(fitted.model)
Returns the residuals, one per data point.
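The summary functions above can be tried together on one small fit. This is a minimal sketch on simulated data (the data and the names rss and coef.table are made up for the example). It also shows coef(summary(...)), which for an lm fit returns the coefficient table as a matrix of estimates, standard errors, t values, and p-values.

```r
# Simulated data and a simple linear fit to summarize (values are invented).
set.seed(4)
DataArray <- data.frame(x = 1:10)
DataArray$y <- 1 + 0.5 * DataArray$x + rnorm(10, sd = 0.2)
fitted.model <- lm(y ~ x, data = DataArray)

print(summary(fitted.model))    # comprehensive summary of the fit
print(formula(fitted.model))    # the model formula: y ~ x
print(coef(fitted.model))       # named vector of coefficients

# deviance() is the residual sum of squares, i.e. the sum of squared residuals.
rss <- deviance(fitted.model)
print(rss)
print(sum(resid(fitted.model)^2))

# The coefficient table as a matrix (one row per coefficient).
coef.table <- coef(summary(fitted.model))
print(dim(coef.table))
```

Saving a result to a variable, as with rss and coef.table, is handy if you want to write the numbers out to a file and carry them back to Matlab.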

U Penn R Study Group Notes

Shiny

Shiny provides for interactive R widgets on websites:

http://www.rstudio.com/shiny/

public/relevant_r_help.txt · Last modified: 2012/11/21 15:43 by aguirreg