Resource Guide: Linear Regression

Last Updated: January 2024

Linear regression is a popular, simple, and flexible technique for modeling phenomena in a variety of fields. It predicts a response variable from one or more explanatory variables and provides a means of interpreting their association. Linear regression is a special case of a broader class of statistical techniques called generalized linear models (GLMs). Ordinary least squares (OLS), which chooses the coefficients that minimize the sum of squared residuals, is the most common way to fit a linear regression model, but there are others.
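As a rough illustration of what an OLS fit computes, here is a minimal sketch for the one-predictor case, using only the Python standard library. The function name fit_ols and the data are made up for this example; in practice you would use one of the packages covered in the resources below.

```python
from statistics import mean


def fit_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals.

    Hypothetical helper for a single explanatory variable; it uses the
    textbook closed-form solution for simple linear regression.
    """
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up illustrative data: hours studied vs. exam score
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 68.0]

intercept, slope = fit_ols(hours, score)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope is read as the expected change in the response for a one-unit increase in the explanatory variable, which is the "interpreting their association" part of the description above.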

This guide focuses on resources for learning how to implement linear regression in R and Python, but linear regression is available in nearly every data analysis program (even Excel!). Linear regression is used both in statistical analysis, where you are likely interested in estimating the relationships between variables and testing hypotheses about coefficients in the model, and in machine learning, where the primary focus is on building a model that accurately predicts an outcome given new input data. The model is the same in both cases, but the approaches differ in how the models are used, evaluated, and optimized.
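The contrast between the two uses can be sketched as follows. This is a simplified illustration using only the Python standard library and simulated data. The helper names (fit_ols, slope_standard_error, mean_squared_error) are hypothetical, and the 80/20 split is an arbitrary choice for the example.

```python
import math
import random
from statistics import mean


def fit_ols(x, y):
    """Closed-form OLS fit for one predictor; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def slope_standard_error(x, y, intercept, slope):
    """Standard error of the slope under the usual OLS assumptions."""
    n = len(x)
    x_bar = mean(x)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return math.sqrt(rss / (n - 2) / sxx)


def mean_squared_error(x, y, intercept, slope):
    """Average squared prediction error on the given data."""
    return mean((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Simulated data (an assumption for this sketch): y = 2 + 0.5 * x + noise
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

# Statistical analysis: fit on all the data, then examine the coefficient
# and its uncertainty (here, a t-statistic for the hypothesis slope = 0).
b0, b1 = fit_ols(x, y)
se = slope_standard_error(x, y, b0, b1)
t_stat = b1 / se
print(f"slope = {b1:.3f}, standard error = {se:.3f}, t = {t_stat:.1f}")

# Machine learning: fit on a training subset, then judge the model by how
# well it predicts held-out observations it has not seen.
split = int(0.8 * len(x))
b0_tr, b1_tr = fit_ols(x[:split], y[:split])
test_mse = mean_squared_error(x[split:], y[split:], b0_tr, b1_tr)
print(f"held-out mean squared error = {test_mse:.3f}")
```

The fitting step is identical in both halves. What changes is which data the model is fit on and which quantity is reported afterward.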

Getting Started 

Data Analysis Examples
UCLA Statistical Consulting Center
This site provides examples for implementing and interpreting multiple types of regression analysis in Stata, SPSS, Mplus, SAS, and R. It is always a good place to start when you have questions about implementing a statistical model in one of these programs.

R 

Linear Regression and ANOVA
Princeton Library Guides
Simple and brief overview of linear regression using base R (just standard libraries).  

Linear Regression
UC Business Analytics R Programming Guide
Somewhat more detailed and comprehensive introduction to linear regression in R. This tutorial adopts a supervised machine learning approach; for example, it discusses how to split the data into training and test sets. The code uses the tidyverse packages, which some people may appreciate.

Linear Regression and ANOVA
James D. Long and Paul Teetor, R Cookbook
Comprehensive introduction to linear modeling in R. This tutorial discusses linear regression alongside ANOVA, which can be helpful for people with a background in the latter. The tutorial also points to many other helpful resources.

Python 

Linear Regression in Python using Statsmodels
GeeksforGeeks
Brief and simple introduction to linear regression using the statsmodels package, including how to install the package, load and visualize the data, and fit and understand models. More generally, GeeksforGeeks tutorials tend to be good.

Linear Regression in Python
Real Python
Comprehensive yet approachable introduction to linear regression and how to implement it using Python. This tutorial shows the implementation with two popular packages: scikit-learn and statsmodels. Real Python tutorials are typically very good resources (and more comprehensive than GeeksforGeeks).

Linear Regression
Ott Toomet, Machine Learning in Python
Explanation of linear regression from a machine learning perspective. Section 10.2 of the book discusses how to implement linear regression using both the statsmodels and scikit-learn packages. 

Getting Better

An Introduction to Statistical Learning
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
This textbook is a well-known and approachable introduction to statistical/machine learning, which includes linear regression. Originally, the book provided code only in R, but it is now available in both R and Python editions.

Data Analysis Using Regression and Multilevel/Hierarchical Models
Andrew Gelman and Jennifer Hill
Although focused on hierarchical models, chapters 3 and 4 discuss linear regression, providing a great overview from a statistical perspective. 

Applied Linear Statistical Models
Michael H. Kutner, Christopher J. Nachtsheim, John Neter, and William Li
Very comprehensive and detailed book about linear models, including linear regression (parts one and two). In contrast to the previous book, this one adopts a more traditional statistical perspective. It is a great resource for people who want to learn regression in more depth.