A quick and easy function to plot lm() results with ggplot2 in R

Updated Apr 2015:

Sometimes it's nice to quickly visualise the data that went into a simple linear regression, especially when you are performing lots of tests at once. Here is a quick and dirty solution with ggplot2 to create the following plot:

Let's try it out using the iris dataset in R:

data(iris)
head(iris)
##   Sepal.Length  Sepal.Width  Petal.Length  Petal.Width    Species
## 1     5.1           3.5          1.4           0.2         setosa
## 2     4.9           3.0          1.4           0.2         setosa
## 3     4.7           3.2          1.3           0.2         setosa
## 4     4.6           3.1          1.5           0.2         setosa
## 5     5.0           3.6          1.4           0.2         setosa
## 6     5.4           3.9          1.7           0.4         setosa

Create fit1, a linear regression of Sepal.Length and Petal.Width. Normally we would quickly plot the data in R base graphics:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
summary(fit1)
## 
## Call:
## lm(formula = Sepal.Length ~ Petal.Width, data = iris)
## 
## Residuals:
##    Min       1Q      Median     3Q       Max 
## -1.3882   -0.2936   -0.0439   0.2643   1.3452 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|) 
## (Intercept)    4.7776    0.0729    65.5   <2e-16 ***
## Petal.Width    0.8886    0.0514    17.3   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 0.478 on 148 degrees of freedom
## Multiple R-squared: 0.669, Adjusted R-squared: 0.667 
## F-statistic: 299 on 1 and 148 DF, p-value: <2e-16 
## 
plot(Sepal.Length ~ Petal.Width, data = iris)
abline(fit1)

This can be plotted in ggplot2 using stat_smooth(method = "lm"):


library(ggplot2)

ggplot(iris, aes(x = Petal.Width, y = Sepal.Length)) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red")

However, we can create a quick function that will pull the data out of a linear regression, and return important values (R-squares, slope, intercept and P value) at the top of a nice ggplot graph with the regression line.

ggplotRegression <- function (fit) {

require(ggplot2)

ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) + 
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
                     "Intercept =",signif(fit$coef[[1]],5 ),
                     " Slope =",signif(fit$coef[[2]], 5),
                     " P =",signif(summary(fit)$coef[2,4], 5)))
}

After specifying this function, all you would have to run is:

fit1 <- lm(Sepal.Length ~ Petal.Width, data = iris)
ggplotRegression(fit1)

or even just

ggplotRegression(lm(Sepal.Length ~ Petal.Width, data = iris))

Disclaimer: This is not suitable for non-normal distributions or categorical variables!

21 thoughts on “A quick and easy function to plot lm() results with ggplot2 in R

  1. This is very useful. Thank you. Just one question, is there a way of putting the model descriptive inside the graph?

    • You can fix this by simply replacing:
      opts(title =
      with:
      ggtitle(

      So that the last element of the function reads:
      ggtitle(paste(“Adj R2 = “,signif(summary(fit)$adj.r.squared, 5),
      “; Intercept =”,signif(fit$coef[[1]],5 ),
      “; Slope =”,signif(fit$coef[[2]], 5),
      “; P =”,signif(summary(fit)$coef[2,4], 5)))

  2. What a help this is! I immediately adapted ggPlotRegression() for my particular needs and it worked perfectly. Until I had your code to use as an example I didn’t know how to write a function; didn’t even know I *could* write a function. Thank you very much, Susan!

  3. I was looking for method to obtain residuals and do other kind of regression using ggplot, which brought me here, I learned few things about regression. But my goal is still unfulfilled, you have not mentioned anywhere, how to find residual and plot residuals using ggplot without taking using ‘lm’ command. Would you throw some light on it. While going through your post meticulously, I found that there is no need to define a function and this line // ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) // is redudant, as you have used the fit$model in place of iris, (fit$model)[2] in place of iris$Sepal.Length ……In spite of going through a cumbersome function method, we can simply write // ggplot(iris,aes( Petal.Width,Sepal.Length))+geom_point()+geom_smooth(method=lm, col=’red’)// . Please correct me, if have missed something. Thank you

    • Sam, the function is plotting based on the model object, not the data itself, that is why aes_string and the model parameters are in there. I have outlined in the post already the code to plot with the data alone. This post is not for the residuals, merely visualisation of the regression itself. A quick Google of plotting residuals in ggplot2 gives plenty of hits, for example: https://rpubs.com/therimalaya/43190

      • Dear Susan,
        thanks for illusteration, its very useful.
        Can we plot a Multiple regression plot using the above code ? if not – can you please guide me how to do that.
        Thanks.

  4. Pingback: Validity by Linear Regression in R | Sport Statistics R Sweet!

  5. Hi, nice post and thanks for the info.
    I have problem to plot dose-response curves from a create model of class “drm” from a “drc” package using “ggplot”.
    do you have any tips?

    Thanks,

    Danilo

  6. This is great, thanks! It may be inappropriate statistically to use lm, but just for plotting categorical variables, I modified it thus:

    ggplotRegression <- function (fit, jit=FALSE) {

    require(ggplot2)

    if(jit){
    fit$model[,1] <- jitter(fit$model[,1])
    fit$model[,2] <- jitter(fit$model[,2])
    }

    ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
    geom_point() +
    stat_smooth(method = "lm", col = "red") +
    labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
    "Intercept =",signif(fit$coef[[1]],5 ),
    " Slope =",signif(fit$coef[[2]], 5),
    " P =",signif(summary(fit)$coef[2,4], 5)))
    }

  7. Hi,

    What if you want to use your own defined fitted model which includes more than 1 input variable instead of the geom_smooth?
    (i.e. myfit = lm(y ~ x + z + d + x:d, mydata)

    Is there a patch or equivalent function that allows using “myfit” in geom_smooth instead of formula = …. ?

  8. This is easy to understand and apply. Great illustration. Thanks.

    What about if you have two different datasets that have differnt variables and you need all the variables to be in your regression, how would you do it?

    I tired cbind function but when I run my regression, I had same names for all of the varialbles that are in my regrssion.

    Thanks again
    Hisham Alhawal

  9. Thanks for a useful script. One suggestion: unless you’re comparing models with different numbers of predictors, you should display R^2 rather than adjusted R^2. R^2 has a nice interpretation which adjusted R^2 does not have.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s