A quick and easy function to plot lm() results with ggplot2 in R

Posted on August 9, 2012 by susanejohnston

Updated Apr 2015:

Sometimes it's nice to quickly visualise the data that went into a simple linear regression, especially when you are performing lots of tests at once. Here is a quick and dirty solution with ggplot2 to create the following plot:

Let's try it out using the iris dataset in R:

data(iris)
head(iris)

##   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
## 1     5.1           3.5          1.4           0.2         setosa
## 2     4.9           3.0          1.4           0.2         setosa
## 3     4.7           3.2          1.3           0.2         setosa
## 4     4.6           3.1          1.5           0.2         setosa
## 5     5.0           3.6          1.4           0.2         setosa
## 6     5.4           3.9          1.7           0.4         setosa

Create fit1, a linear regression of Sepal.Length and Petal.Width. Normally we would quickly plot the data in R base graphics:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
summary(fit1)

## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Width, data = iris)
## 
## Residuals:
##    Min       1Q      Median     3Q       Max 
## -1.3882   -0.2936   -0.0439   0.2643   1.3452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|) 
## (Intercept)    4.7776    0.0729    65.5   <2e-16 ***
## Petal.Width    0.8886    0.0514    17.3   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.478 on 148 degrees of freedom
## Multiple R-squared: 0.669, Adjusted R-squared: 0.667 
## F-statistic: 299 on 1 and 148 DF, p-value: <2e-16 
##

plot(Sepal.Length ~ Petal.Width, data = iris)
abline(fit1)

This can be plotted in ggplot2 using stat_smooth(method = "lm"):


library(ggplot2)

ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red")

However, we can create a quick function that will pull the data out of a linear regression, and return important values (R-squares, slope, intercept and P value) at the top of a nice ggplot graph with the regression line.

ggplotRegression <- function (fit) {

require(ggplot2)

ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                     "Intercept =",signif(fit$coef[[1]],5 ),
                     " Slope =",signif(fit$coef[[2]], 5),
                     " P =",signif(summary(fit)$coef[2,4], 5)))
}

After specifying this function, all you would have to run is:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplotRegression(fit1)

or even just

ggplotRegression(lm(Sepal.Length ~ Petal.Width, data = iris))

Disclaimer: This is not suitable for non-normal distributions or categorical variables!

39 thoughts on “A quick and easy function to plot lm() results with ggplot2 in R”

John

May 22, 2013 at 1:40 pm Reply

This is very useful. Thank you. Just one question, is there a way of putting the model descriptive inside the graph?
Safi Ullah

June 13, 2013 at 11:56 pm Reply

Extremely thankful to you Susane Johnston for this post… you made my work very easy..
May

August 14, 2014 at 9:43 pm Reply

Hi, This function has worked earlier but it seems some of the changes in the theming system in ggplot2 necessitate updating it. Can you please post the updated version? I am an R beginner and my trials in doing this have been unsuccessful.
Reference: https://github.com/wch/ggplot2/wiki/New-theme-system

Thanks a lot!
M
- Nick
  
  August 17, 2014 at 3:50 pm Reply
  
  You can fix this by simply replacing:
  opts(title =
  with:
  ggtitle(
  
  So that the last element of the function reads:
  ggtitle(paste(“Adj R2 = “,signif(summary(fit)$adj.r.squared, 5),
  “; Intercept =”,signif(fit$coef[[1]],5 ),
  “; Slope =”,signif(fit$coef[[2]], 5),
  “; P =”,signif(summary(fit)$coef[2,4], 5)))
biogeographia

March 26, 2015 at 9:12 pm Reply

Thanks for posting this Susan. Very helpful!
Hamish Innes-Brown

January 28, 2016 at 9:13 am Reply

Thanks for posting this, it’s so useful! ANy ideas on what to do if you want to have a facet on your plot, with seperate lm’s for each facet?
Phil Kalina (@pkalina)

February 4, 2016 at 9:28 pm Reply

What a help this is! I immediately adapted ggPlotRegression() for my particular needs and it worked perfectly. Until I had your code to use as an example I didn’t know how to write a function; didn’t even know I *could* write a function. Thank you very much, Susan!
sam

April 12, 2016 at 9:07 am Reply

I was looking for method to obtain residuals and do other kind of regression using ggplot, which brought me here, I learned few things about regression. But my goal is still unfulfilled, you have not mentioned anywhere, how to find residual and plot residuals using ggplot without taking using ‘lm’ command. Would you throw some light on it. While going through your post meticulously, I found that there is no need to define a function and this line // ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) // is redudant, as you have used the fit$model in place of iris, (fit$model)[2] in place of iris$Sepal.Length ……In spite of going through a cumbersome function method, we can simply write // ggplot(iris,aes( Petal.Width,Sepal.Length))+geom_point()+geom_smooth(method=lm, col=’red’)// . Please correct me, if have missed something. Thank you
- susanejohnston
  
  April 12, 2016 at 2:01 pm Reply
  
  Sam, the function is plotting based on the model object, not the data itself, that is why aes_string and the model parameters are in there. I have outlined in the post already the code to plot with the data alone. This post is not for the residuals, merely visualisation of the regression itself. A quick Google of plotting residuals in ggplot2 gives plenty of hits, for example: https://rpubs.com/therimalaya/43190
  - Sawaid Abbas
    
    July 1, 2016 at 11:56 am
    
    Dear Susan,
    thanks for illusteration, its very useful.
    Can we plot a Multiple regression plot using the above code ? if not – can you please guide me how to do that.
    Thanks.
Sachesh (@sachesh)

April 13, 2016 at 8:56 am Reply

how does it work in facets? How to work on facets?
Pingback: Validity by Linear Regression in R | Sport Statistics R Sweet!
michacampbell

July 22, 2016 at 6:37 am Reply

Very nice! Never thought about writing functions for plots – inspiring 🙂
Hakim Abdi (@HakimAbdi)

August 26, 2016 at 10:29 am Reply

Great work Susan! Any advice on how to adapt it to a multiple regression model (e.g. y~x+z+c)?
Danilo

November 2, 2016 at 7:40 am Reply

Hi, nice post and thanks for the info.
I have problem to plot dose-response curves from a create model of class “drm” from a “drc” package using “ggplot”.
do you have any tips?

Thanks,

Danilo
Benjamin Cowley

November 17, 2016 at 11:31 am Reply

This is great, thanks! It may be inappropriate statistically to use lm, but just for plotting categorical variables, I modified it thus:

ggplotRegression <- function (fit, jit=FALSE) {

require(ggplot2)

if(jit){
fit$model[,1] <- jitter(fit$model[,1])
fit$model[,2] <- jitter(fit$model[,2])
}

ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
Emma

November 18, 2016 at 9:08 pm Reply

Hi,

What if you want to use your own defined fitted model which includes more than 1 input variable instead of the geom_smooth?
(i.e. myfit = lm(y ~ x + z + d + x:d, mydata)

Is there a patch or equivalent function that allows using “myfit” in geom_smooth instead of formula = …. ?
- susanejohnston
  
  November 18, 2016 at 9:23 pm Reply
  
  Hey, you can specify formulas in geom_smooth(), see http://docs.ggplot2.org/0.9.3.1/stat_smooth.html
HishamAlhawal

December 6, 2016 at 9:26 pm Reply

This is easy to understand and apply. Great illustration. Thanks.

What about if you have two different datasets that have differnt variables and you need all the variables to be in your regression, how would you do it?

I tired cbind function but when I run my regression, I had same names for all of the varialbles that are in my regrssion.

Thanks again
Hisham Alhawal
jcanistrum

April 10, 2017 at 8:04 pm Reply

Reblogged this on it 4 developers.
DaveP

April 30, 2017 at 5:38 pm Reply

Thanks for a useful script. One suggestion: unless you’re comparing models with different numbers of predictors, you should display R^2 rather than adjusted R^2. R^2 has a nice interpretation which adjusted R^2 does not have.
- Paz Montenegro
  
  April 24, 2019 at 3:51 pm Reply
  
  Thanks Susane! I’m a beginner at R and I was wondering Dave’s same question.
  How can the code be modified to show R2 and not adj R2?
Rosario

December 8, 2017 at 11:25 am Reply

You made my day!!!! Thanks a lot!
Miguel Rayón

April 2, 2018 at 2:52 pm Reply

Hi Susane,
Very nice example. I particularly like the way to automatically label the plot with the coefficient of the regression. I have used it in an study I am doing on F1 races. Thanks a lot!
Andrew

June 5, 2018 at 5:47 pm Reply

Nice very helpful!
serraogx

September 25, 2018 at 6:52 pm Reply

Hi. This changes values from R2 when i try to use cubic regressions.
fzslq

October 9, 2018 at 6:54 am Reply

Extremely awesome fzslq http://fzslq.ga/ for the winter! They continue to keep me vogue and toasty!
iuey

October 26, 2018 at 11:36 am Reply

Is there any way to print the results instead of putting them in the label?
Erin

December 20, 2018 at 9:24 pm Reply

Hello, is there a way to change the size of the label title? I have tried to adjust by changing the number “5”, or by putting in a “size =” but have been unsuccessful. I am also having problems with including the “slope =”, where my console warning is that the dimensions “[2,4]” are incorrect. Here is my code:
KCCPIexportcorn %>% ggplot(aes(y = SIYieldNonirr, x = OM)) +
geom_point(size = 2, color = “purple4″) + geom_smooth(method=”lm”, se = FALSE, color = “black”, size = 1) +
labs(x = ‘OM Subrule’) + labs(y = expression(bold(paste(‘Dryland Corn Yield (kg ha’^-1,’)’))) ) +
labs(title = paste(“Adj R2 =”,signif(summary(OMsubrule)$adj.r.squared,5))) +
theme_classic() +
theme(
axis.title.x = element_text(color=”black”, size=8, face=”bold”),
axis.title.y = element_text(color=”black”, size=8, face=”bold”),
axis.text = element_text(color=”black”, size = 7, face=”bold”),
axis.line = element_line(color = “black”, size = 1))
- Erin
  
  December 20, 2018 at 10:10 pm Reply
  
  Nevermind, solved! Thank you for the great post!
sarah

January 17, 2019 at 8:30 pm Reply

How could you remove the grey area around the lm regression line?

Thank you
Sarah
Em

March 7, 2019 at 12:37 pm Reply

What does the grey area represent? Does it represent anything like standard error, or…? What does it visualise?
Dong-Mei Wu

June 3, 2019 at 9:37 am Reply

Thanks. It’s very helpful.
Dong-Mei Wu

June 3, 2019 at 9:40 am Reply

Hi, nice post and thanks for the info.
I have problem to plot dose-response curves from a create model of class “drm” from a “drc” package using “ggplot”.
do you have any tips?

Thanks,
Dong-Mei
casparwynne

July 1, 2019 at 9:52 am Reply

Great tutorial! Does what I need, but can anyone tell me how to swap out the legend from geom_smooth back to geom_point?
yf569 (@yf569)

September 20, 2019 at 4:38 pm Reply

What if I need a polynomial function instead of just linear? And I have a group by variables, which means I will show several functions, each for one group?
AnonGuru

July 6, 2020 at 3:04 am Reply

Hello, and thank you for this helpful code. Why is the p-value: <2e-16 in the raw output above, but P=2.3255e-37 at the top of the graph in the final output? Thank you.
barragan1234

February 5, 2021 at 5:51 am Reply

Gracias amigo, It is really helpful
Pingback: ggplot에 회귀선 추가 - 스택플로우