A quick and easy function to plot lm() results with ggplot2 in R

Updated Apr 2015:

Sometimes it's nice to quickly visualise the data that went into a simple linear regression, especially when you are performing lots of tests at once. Here is a quick and dirty solution with ggplot2 to create the following plot:

Let's try it out using the iris dataset in R:

data(iris)
head(iris)
##   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
## 1     5.1           3.5          1.4           0.2         setosa
## 2     4.9           3.0          1.4           0.2         setosa
## 3     4.7           3.2          1.3           0.2         setosa
## 4     4.6           3.1          1.5           0.2         setosa
## 5     5.0           3.6          1.4           0.2         setosa
## 6     5.4           3.9          1.7           0.4         setosa

Create fit1, a linear regression of Sepal.Length and Petal.Width. Normally we would quickly plot the data in R base graphics:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
summary(fit1)
## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Width, data = iris)
## 
## Residuals:
##    Min       1Q      Median     3Q       Max 
## -1.3882   -0.2936   -0.0439   0.2643   1.3452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|) 
## (Intercept)    4.7776    0.0729    65.5   <2e-16 ***
## Petal.Width    0.8886    0.0514    17.3   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.478 on 148 degrees of freedom
## Multiple R-squared: 0.669, Adjusted R-squared: 0.667 
## F-statistic: 299 on 1 and 148 DF, p-value: <2e-16 
## 
plot(Sepal.Length ~ Petal.Width, data = iris)
abline(fit1)

This can be plotted in ggplot2 using stat_smooth(method = "lm"):


library(ggplot2)

ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red")

However, we can create a quick function that will pull the data out of a linear regression, and return important values (R-squares, slope, intercept and P value) at the top of a nice ggplot graph with the regression line.

ggplotRegression <- function (fit) {

require(ggplot2)

ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                     "Intercept =",signif(fit$coef[[1]],5 ),
                     " Slope =",signif(fit$coef[[2]], 5),
                     " P =",signif(summary(fit)$coef[2,4], 5)))
}

After specifying this function, all you would have to run is:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplotRegression(fit1)

or even just

ggplotRegression(lm(Sepal.Length ~ Petal.Width, data = iris))

Disclaimer: This is not suitable for non-normal distributions or categorical variables!


Comments

39 responses to “A quick and easy function to plot lm() results with ggplot2 in R”

  1. This is very useful. Thank you. Just one question, is there a way of putting the model descriptive inside the graph?

  2. Extremely thankful to you Susane Johnston for this post… you made my work very easy..

  3. Hi, This function has worked earlier but it seems some of the changes in the theming system in ggplot2 necessitate updating it. Can you please post the updated version? I am an R beginner and my trials in doing this have been unsuccessful.
    Reference: https://github.com/wch/ggplot2/wiki/New-theme-system

    Thanks a lot!
    M

    1. You can fix this by simply replacing:
      opts(title =
      with:
      ggtitle(

      So that the last element of the function reads:
      ggtitle(paste(“Adj R2 = “,signif(summary(fit)$adj.r.squared, 5),
      “; Intercept =”,signif(fit$coef[[1]],5 ),
      “; Slope =”,signif(fit$coef[[2]], 5),
      “; P =”,signif(summary(fit)$coef[2,4], 5)))

  4. Thanks for posting this Susan. Very helpful!

  5. Thanks for posting this, it’s so useful! ANy ideas on what to do if you want to have a facet on your plot, with seperate lm’s for each facet?

  6. What a help this is! I immediately adapted ggPlotRegression() for my particular needs and it worked perfectly. Until I had your code to use as an example I didn’t know how to write a function; didn’t even know I *could* write a function. Thank you very much, Susan!

  7. I was looking for method to obtain residuals and do other kind of regression using ggplot, which brought me here, I learned few things about regression. But my goal is still unfulfilled, you have not mentioned anywhere, how to find residual and plot residuals using ggplot without taking using ‘lm’ command. Would you throw some light on it. While going through your post meticulously, I found that there is no need to define a function and this line // ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) // is redudant, as you have used the fit$model in place of iris, (fit$model)[2] in place of iris$Sepal.Length ……In spite of going through a cumbersome function method, we can simply write // ggplot(iris,aes( Petal.Width,Sepal.Length))+geom_point()+geom_smooth(method=lm, col=’red’)// . Please correct me, if have missed something. Thank you

    1. Sam, the function is plotting based on the model object, not the data itself, that is why aes_string and the model parameters are in there. I have outlined in the post already the code to plot with the data alone. This post is not for the residuals, merely visualisation of the regression itself. A quick Google of plotting residuals in ggplot2 gives plenty of hits, for example: https://rpubs.com/therimalaya/43190

      1. Sawaid Abbas Avatar
        Sawaid Abbas

        Dear Susan,
        thanks for illusteration, its very useful.
        Can we plot a Multiple regression plot using the above code ? if not – can you please guide me how to do that.
        Thanks.

  8. how does it work in facets? How to work on facets?

  9. […] found a cool piece of code from another R blog that plotted the above figure with the model summary data as a title. I made a few tweaks, to […]

  10. Very nice! Never thought about writing functions for plots – inspiring 🙂

  11. Great work Susan! Any advice on how to adapt it to a multiple regression model (e.g. y~x+z+c)?

  12. Hi, nice post and thanks for the info.
    I have problem to plot dose-response curves from a create model of class “drm” from a “drc” package using “ggplot”.
    do you have any tips?

    Thanks,

    Danilo

  13. This is great, thanks! It may be inappropriate statistically to use lm, but just for plotting categorical variables, I modified it thus:

    ggplotRegression <- function (fit, jit=FALSE) {

    require(ggplot2)

    if(jit){
    fit$model[,1] <- jitter(fit$model[,1])
    fit$model[,2] <- jitter(fit$model[,2])
    }

    ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
    geom_point() +
    stat_smooth(method = "lm", col = "red") +
    labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
    "Intercept =",signif(fit$coef[[1]],5 ),
    " Slope =",signif(fit$coef[[2]], 5),
    " P =",signif(summary(fit)$coef[2,4], 5)))
    }

  14. Hi,

    What if you want to use your own defined fitted model which includes more than 1 input variable instead of the geom_smooth?
    (i.e. myfit = lm(y ~ x + z + d + x:d, mydata)

    Is there a patch or equivalent function that allows using “myfit” in geom_smooth instead of formula = …. ?

    1. Hey, you can specify formulas in geom_smooth(), see http://docs.ggplot2.org/0.9.3.1/stat_smooth.html

  15. HishamAlhawal Avatar
    HishamAlhawal

    This is easy to understand and apply. Great illustration. Thanks.

    What about if you have two different datasets that have differnt variables and you need all the variables to be in your regression, how would you do it?

    I tired cbind function but when I run my regression, I had same names for all of the varialbles that are in my regrssion.

    Thanks again
    Hisham Alhawal

  16. Thanks for a useful script. One suggestion: unless you’re comparing models with different numbers of predictors, you should display R^2 rather than adjusted R^2. R^2 has a nice interpretation which adjusted R^2 does not have.

    1. Paz Montenegro Avatar
      Paz Montenegro

      Thanks Susane! I’m a beginner at R and I was wondering Dave’s same question.
      How can the code be modified to show R2 and not adj R2?

  17. You made my day!!!! Thanks a lot!

  18. Hi Susane,
    Very nice example. I particularly like the way to automatically label the plot with the coefficient of the regression. I have used it in an study I am doing on F1 races. Thanks a lot!

  19. Andrew Avatar
    Andrew

    Nice very helpful!

  20. Hi. This changes values from R2 when i try to use cubic regressions.

  21. Extremely awesome fzslq http://fzslq.ga/ for the winter! They continue to keep me vogue and toasty!

  22. Is there any way to print the results instead of putting them in the label?

  23. Hello, is there a way to change the size of the label title? I have tried to adjust by changing the number “5”, or by putting in a “size =” but have been unsuccessful. I am also having problems with including the “slope =”, where my console warning is that the dimensions “[2,4]” are incorrect. Here is my code:
    KCCPIexportcorn %>% ggplot(aes(y = SIYieldNonirr, x = OM)) +
    geom_point(size = 2, color = “purple4″) + geom_smooth(method=”lm”, se = FALSE, color = “black”, size = 1) +
    labs(x = ‘OM Subrule’) + labs(y = expression(bold(paste(‘Dryland Corn Yield (kg ha’^-1,’)’))) ) +
    labs(title = paste(“Adj R2 =”,signif(summary(OMsubrule)$adj.r.squared,5))) +
    theme_classic() +
    theme(
    axis.title.x = element_text(color=”black”, size=8, face=”bold”),
    axis.title.y = element_text(color=”black”, size=8, face=”bold”),
    axis.text = element_text(color=”black”, size = 7, face=”bold”),
    axis.line = element_line(color = “black”, size = 1))

    1. Nevermind, solved! Thank you for the great post!

  24. How could you remove the grey area around the lm regression line?

    Thank you
    Sarah

  25. What does the grey area represent? Does it represent anything like standard error, or…? What does it visualise?

  26. Dong-Mei Wu Avatar
    Dong-Mei Wu

    Thanks. It’s very helpful.

  27. Dong-Mei Wu Avatar
    Dong-Mei Wu

    Hi, nice post and thanks for the info.
    I have problem to plot dose-response curves from a create model of class “drm” from a “drc” package using “ggplot”.
    do you have any tips?

    Thanks,
    Dong-Mei

  28. Great tutorial! Does what I need, but can anyone tell me how to swap out the legend from geom_smooth back to geom_point?

  29. What if I need a polynomial function instead of just linear? And I have a group by variables, which means I will show several functions, each for one group?

  30. Hello, and thank you for this helpful code. Why is the p-value: <2e-16 in the raw output above, but P=2.3255e-37 at the top of the graph in the final output? Thank you.

  31. barragan1234 Avatar
    barragan1234

    Gracias amigo, It is really helpful

  32. […] 블로그 에서이 기능을 찾았습니다 […]

Leave a reply to Benjamin Cowley Cancel reply