
Online Learning Resources: Linear Regression in R

This post is part of a series of posts on online learning resources for data science and programming.

By Andre Archer, Data Science Research Consultant

Linear regression is a popular technique in quantitative fields. Linear regression, often referred to as OLS (ordinary least squares) or linear fitting, predicts a response variable from explanatory variables and provides a means of interpreting the effects of those explanatory variables. Linear regression is a special case of a broader class of statistical techniques called generalized linear models (GLMs). In this post, we discuss free online resources that can help the Northwestern community get started and become comfortable with linear regression in R.

Given a response variable Y, linear regression predicts Y using the formula

Y = b0 + b1*(explanatory variable 1) + b2*(explanatory variable 2) + ... + bm*(explanatory variable m)

where b0 is the intercept and b1, b2, ..., bm are the coefficients of the linear model. The coefficients are estimated by fitting the model to data.
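In R, models of this form are fit with the built-in lm() function. As a minimal sketch (our own illustration, using R’s built-in mtcars dataset rather than data from any of the resources below), a model with two explanatory variables looks like this:

# Fit a linear model: mpg predicted by weight (wt) and horsepower (hp)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The estimated intercept (b0) and coefficients (b1, b2)
coef(fit)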

As with other guides in this series, we’re focusing on resources that members of the Northwestern community can access for free, and on resources other than full-length online courses.

Getting Started

R-Linear Regression
This tutorial provides a reasonable introduction to linear regression, especially for those without a significant mathematical background. It briefly describes what a linear model is in the case of a single explanatory variable (the theory of a straight line). It then discusses how to fit that straight line to data (i.e., how to do linear regression) in R and how to use the fitted line to predict unseen data.
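To give a feel for what that looks like in practice, here is a brief sketch of fitting and predicting with a single explanatory variable; it is our own illustration using the built-in mtcars data, not an excerpt from the tutorial:

# Straight-line fit: mpg as a function of weight (wt)
line_fit <- lm(mpg ~ wt, data = mtcars)

# Predict mpg for two unseen weights (in thousands of lbs)
predict(line_fit, newdata = data.frame(wt = c(2.5, 3.5)))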

Linear Regression
UC Business Analytics R Programming Guide
Sections 1 (Replication requirements) to 3 (Simple linear regression) provide a great first step toward linear modeling in R. They cover the foundational idea of splitting data into training and test sets and fitting a regression with a single predictor. The guide also picks up where the previous tutorial left off, walking the user through interpreting model coefficients and using and interpreting goodness-of-fit diagnostics.
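As a rough sketch of the train/test workflow the guide describes (again our own illustration with mtcars, not the guide’s example data):

set.seed(123)  # make the random split reproducible

# Put roughly 70% of the rows in a training set, the rest in a test set
train_idx <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit on the training data, then inspect coefficients and goodness-of-fit diagnostics
train_fit <- lm(mpg ~ wt, data = train)
summary(train_fit)  # coefficient estimates, R-squared, residual standard error

# Out-of-sample mean squared error on the held-out test set
mean((test$mpg - predict(train_fit, newdata = test))^2)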

Getting Better

Linear Regression
UC Business Analytics R Programming Guide
Sections 4 (Multiple Regression) to 6 (Additional considerations) provide next steps for those already familiar with basic line fitting and ready to take on linear regression with multiple predictors. These sections extend many of the ideas discussed in the single-predictor case, such as coefficient interpretation and goodness of fit, to the multiple-predictor case. Additionally, in sections 5 and 6, the guide spends a significant amount of time on regression with interaction terms and categorical variables, which can be tricky to specify and interpret.
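For a flavor of the syntax involved (a hedged sketch with mtcars, not the guide’s own code), interaction terms and categorical variables are specified directly in the model formula:

# wt * hp expands to wt + hp + their interaction;
# factor(cyl) treats cylinder count as a categorical variable
fit_multi <- lm(mpg ~ wt * hp + factor(cyl), data = mtcars)
summary(fit_multi)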

An Introduction to Statistical Learning with Applications in R
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
This textbook is pretty much the standard for teaching statistical learning, which includes linear regression. Chapter 3 gives a very good mix of the theory behind linear regression and how to carry it out in R. Its treatment goes beyond including multiple predictors: it discusses how to include interaction terms, non-linear transformations of the predictors, and categorical variables. Some mathematical background will be required to use this textbook.
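As a small taste of those extensions (our own sketch, not one of the book’s labs), a non-linear transformation of a predictor can be added with poly():

# Quadratic fit: mpg modeled as a second-degree polynomial in weight
fit_poly <- lm(mpg ~ poly(wt, 2), data = mtcars)
summary(fit_poly)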

Stuck?

If you have a question about doing linear regression in R, don’t know which resource to start with, or need to learn something not covered above, remember you can always request a free consultation with our data science consultants. We’re more than happy to answer questions and point you in the right direction.