Discover how to conduct linear regression analysis in Google Sheets, interpret your findings, and explore advanced techniques for examining variable relationships.
Linear regression allows you to analyze the relationship between a predictor (i.e., independent variable) and an outcome (i.e., dependent variable). By analyzing how changes in your independent variables affect the dependent variable, you can identify trends, forecast values, and make data-informed decisions. For example, you could explore how increasing your advertising budget impacts predicted revenue or how the duration of a marketing campaign influences customer acquisition rates.
Learn how to perform linear regression in Google Sheets, including advanced techniques and related concepts to explore.
Linear regression is a form of regression analysis that models the relationship between a dependent variable and one or more independent variables. This analysis helps you estimate the association between variables and reveals patterns suggesting how adjusting controllable factors could influence an outcome.
In its simplest form, linear regression fits a straight line (known as the regression line) through your data points to help you predict the value of a dependent variable based on your independent variables.
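In the ordinary least squares (OLS) approach, the regression line y = mx + b is chosen to minimize the sum of squared vertical distances between the data points and the line. For one independent variable with n observations and sample means x̄ and ȳ, the standard textbook formulas for the slope m and intercept b are:

```latex
\hat{y} = m x + b, \qquad
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad
b = \bar{y} - m\,\bar{x}
```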
In general, linear regression is an analytical technique used to determine the relationship between two types of variables: independent and dependent variables. You can also use it to estimate the expected value of the dependent variable at a given value of the independent variable, such as the expected number of sales at a certain product price.
For linear regression to be an appropriate option, your data must meet several assumptions. In addition to proper data collection and sample selection, linear regression requires that:
The relationship between your variables is linear.
Your outcome variable (dependent variable) is continuous.
The observations are independent.
The errors (residuals) have roughly constant variance across all values of the independent variable (homoscedasticity of errors).
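If you want a quick numeric companion to eyeballing a chart, the Python sketch below compares the spread of the residuals in the lower and upper halves of the independent variable. It is a rough, informal heuristic, not a formal test, and the function names are hypothetical. A ratio near one is consistent with constant error variance, while a ratio far from one suggests the errors fan out or narrow along the line.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def residual_spread_ratio(xs, ys):
    """Hypothetical helper: residual spread in the upper half of x divided
    by the residual spread in the lower half of x (rough heuristic only)."""
    m, b = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))                     # order observations by x
    residuals = [y - (m * x + b) for x, y in pairs]
    half = len(residuals) // 2                      # middle point dropped if n is odd
    lower, upper = residuals[:half], residuals[-half:]
    return pstdev(upper) / pstdev(lower)
```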
Google Sheets offers a convenient and collaborative platform for you and your team to store and analyze data. When getting started, performing a simple linear regression (with one dependent and one independent variable) can help you gain confidence in building your functions.
To perform a simple linear regression, follow these steps:
To fit a regression line, create separate columns for your independent and dependent variables. Ensure your columns contain the same number of data points and have no missing values.
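The checks in this step can be sketched in Python, treating each spreadsheet column as a list. This is an illustration only; `validate_columns` is a hypothetical helper, and it assumes blank cells arrive as `None` or an empty string. It flags the same problems (unequal lengths, blanks, text in numeric columns) that commonly cause regression formulas to fail.

```python
import math


def validate_columns(x_col, y_col):
    """Hypothetical helper: list problems that would likely break a regression fit."""
    problems = []
    if len(x_col) != len(y_col):
        problems.append(f"length mismatch: {len(x_col)} x values vs {len(y_col)} y values")
    for row, (x, y) in enumerate(zip(x_col, y_col), start=1):
        for name, value in (("x", x), ("y", y)):
            if value is None or value == "":
                problems.append(f"row {row}: missing {name} value")
            elif isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"row {row}: non-numeric {name} value {value!r}")
            elif math.isnan(value):
                problems.append(f"row {row}: {name} value is NaN")
    return problems


print(validate_columns([1, None, 3], [4, 5, "n/a"]))
```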
While not essential for creating your regression line, a scatter plot helps you visualize the relationship between your independent and dependent variables, allowing you to check whether the data suggest a linear trend.
To create one in Google Sheets, highlight your data range, select the Insert menu option, and choose Chart. Once the Chart Editor appears, you can select the Scatter chart type.
Once you’re confident your data suits this type of analysis, you can use the LINEST function to fit your regression line. Enter the LINEST formula in a blank cell where you want the regression statistics to appear; the results fill that cell and the adjacent cells, so leave enough empty space around it. The syntax for the LINEST function is:
LINEST(known_data_y, [known_data_x], [calculate_b], [verbose])
In this equation:
known_data_y is a one- or two-dimensional array that includes your dependent variable values.
known_data_x includes the values of independent variable(s) corresponding to the known_data_y measures.
calculate_b determines whether to calculate the intercept “b” in the linear equation y = mx + b. It should be set to TRUE in most cases; setting it to FALSE forces the intercept to zero.
verbose specifies whether to return additional regression statistics (set to TRUE or FALSE).
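To show what the calculate_b argument changes, here is a minimal Python sketch of a one-predictor least squares fit. The function `linest_simple` is hypothetical, not part of Google Sheets; it returns the slope and intercept in the same order as the first row of LINEST output. With `calculate_b=False`, the line is forced through the origin, so the slope changes to compensate.

```python
from statistics import mean


def linest_simple(known_y, known_x, calculate_b=True):
    """Hypothetical one-predictor fit returning [m, b] like LINEST's first row."""
    if calculate_b:
        x_bar, y_bar = mean(known_x), mean(known_y)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_x, known_y))
        sxx = sum((x - x_bar) ** 2 for x in known_x)
        m = sxy / sxx
        return [m, y_bar - m * x_bar]
    # Through-origin fit: minimize sum((y - m*x)^2) with b fixed at 0.
    m = sum(x * y for x, y in zip(known_x, known_y)) / sum(x * x for x in known_x)
    return [m, 0.0]


print(linest_simple([3, 5, 7, 9], [1, 2, 3, 4]))         # data follow y = 2x + 1
print(linest_simple([3, 5, 7, 9], [1, 2, 3, 4], False))  # intercept forced to zero
```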
For example, if your independent variable is in cells A1:A10 and your dependent variable is in cells B1:B10, your formula would be:
=LINEST(B1:B10, A1:A10, TRUE, TRUE)
Using LINEST, you will receive key regression statistics, including the coefficient (slope) for each independent variable and the y-intercept. With verbose set to TRUE, you will also obtain the standard errors of each coefficient and of the intercept, the coefficient of determination (R-squared), the standard error of the y estimate, the F-statistic, the degrees of freedom, the regression sum of squares, and the residual sum of squares.
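The verbose statistics come from standard least squares formulas. The Python sketch below computes them for a single predictor and lays them out in five rows of two, following the layout Excel documents for LINEST (coefficients, standard errors, R-squared and standard error of y, F-statistic and degrees of freedom, sums of squares). It assumes Google Sheets uses the same layout, so compare it against your own sheet; `linest_verbose` is a hypothetical name.

```python
import math
from statistics import mean


def linest_verbose(known_y, known_x):
    """Hypothetical one-predictor equivalent of LINEST(y, x, TRUE, TRUE)."""
    n = len(known_x)
    x_bar, y_bar = mean(known_x), mean(known_y)
    sxx = sum((x - x_bar) ** 2 for x in known_x)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(known_x, known_y))
    m = sxy / sxx
    b = y_bar - m * x_bar
    ss_resid = sum((y - (m * x + b)) ** 2 for x, y in zip(known_x, known_y))
    ss_total = sum((y - y_bar) ** 2 for y in known_y)
    ss_reg = ss_total - ss_resid
    df = n - 2                         # residual degrees of freedom (two fitted parameters)
    se_y = math.sqrt(ss_resid / df)    # standard error of the y estimate
    se_m = se_y / math.sqrt(sxx)
    se_b = se_y * math.sqrt(1 / n + x_bar ** 2 / sxx)
    r2 = ss_reg / ss_total
    f_stat = ss_reg / (ss_resid / df)  # one regression degree of freedom
    return [
        [m, b],
        [se_m, se_b],
        [r2, se_y],
        [f_stat, df],
        [ss_reg, ss_resid],
    ]


for row in linest_verbose([2, 4, 5, 4, 5], [1, 2, 3, 4, 5]):
    print(row)
```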
Correctly interpreting your results allows you to make informed decisions within your organization. The two main measures to interpret are the slope and intercept. The slope represents how much you can expect your dependent variable to change for each one-unit change in your independent variable. The intercept indicates the value of the dependent variable when the independent variable is set to zero.
For example, if you are examining projected sales based on the duration of your advertising campaign (in weeks), and you have a slope coefficient of 1,000 and an intercept of 15,000, this means:
If your advertising campaign lasts zero weeks, you would expect 15,000 sales.
For each additional week of the advertising campaign, you can anticipate an increase of 1,000 sales.
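The arithmetic behind this interpretation is a single line: predicted sales equal the intercept plus the slope times the number of weeks. A small Python sketch using the example coefficients (the function name is illustrative):

```python
def predicted_sales(weeks, slope=1_000, intercept=15_000):
    """Example line from the text: sales = 15,000 + 1,000 * weeks."""
    return intercept + slope * weeks


for weeks in (0, 1, 4):
    print(weeks, predicted_sales(weeks))  # 15000, 16000, 19000
```

Each extra week adds exactly the slope (1,000) to the prediction, and zero weeks returns the intercept.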
Once you’re confident in building a simple linear regression model, you can expand to other regression analyses, such as multiple linear regression, and interpret more advanced statistics, including model fit.
In multiple linear regression, you use two or more independent variables to predict your dependent variable. This is useful when you have outcomes influenced by several factors. For example, you might analyze how both your advertising budget and campaign duration affect total sales.
You can use the LINEST function to perform multiple regression by passing a known_data_x range that spans more than one column, with each independent variable in its own adjacent column. Note that LINEST returns the slope coefficients in reverse column order: the coefficient for the last column appears first, and the intercept appears last.
If your independent variables were in cells A1:A10 and B1:B10, and your dependent variable was in cells C1:C10, your formula would be:
=LINEST(C1:C10, A1:B10, TRUE, TRUE)
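Under the hood, multiple regression solves the least squares "normal equations" for all coefficients at once. The pure-Python sketch below (hypothetical helpers `solve` and `linest_multi`, no external libraries) builds those equations, solves them with Gaussian elimination, and returns the slopes in reversed column order with the intercept last, matching the LINEST convention described above. Dedicated statistics libraries use more numerically robust methods; this is only meant to show the idea.

```python
def solve(a, rhs):
    """Solve a small linear system with Gaussian elimination and partial pivoting."""
    n = len(a)
    aug = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def linest_multi(known_y, known_x_rows):
    """Fit y = m1*x1 + ... + mk*xk + b; rows look like [[x1, x2], ...].
    Returns [mk, ..., m1, b]: slopes reversed, intercept last."""
    design = [list(row) + [1.0] for row in known_x_rows]  # trailing 1s estimate b
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, known_y)) for i in range(k)]
    *slopes, intercept = solve(xtx, xty)
    return slopes[::-1] + [intercept]
```

Passing rows of [weeks, budget] would return [budget coefficient, weeks coefficient, intercept].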
In terms of interpretation, assume you were looking at projected sales based on your advertising campaign duration (in weeks) and marketing budget (in $1,000 increments). If LINEST returned a campaign duration coefficient of 2,000, a marketing budget coefficient of 3,000, and an intercept of 15,000, this would tell you:
When both campaign duration and marketing budget are zero (an unlikely scenario), you can expect 15,000 sales.
For each additional $1,000 you spend on marketing, you can expect 3,000 more sales, assuming your campaign duration stays the same.
For each additional week of your campaign, you can expect 2,000 more sales, assuming your marketing budget stays the same.
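The "assuming the other variable stays the same" part of this interpretation can be checked directly: change one input by one unit, hold the other fixed, and the prediction moves by exactly that variable's coefficient. A Python sketch using the example coefficients (the function name and the starting point of 4 weeks and $10,000 are illustrative):

```python
def sales_two_factor(weeks, budget_thousands):
    """Example model: sales = 15,000 + 2,000 * weeks + 3,000 * budget (in $1,000s)."""
    return 15_000 + 2_000 * weeks + 3_000 * budget_thousands


base = sales_two_factor(weeks=4, budget_thousands=10)
print(sales_two_factor(5, 10) - base)  # one more week, same budget: 2000
print(sales_two_factor(4, 11) - base)  # $1,000 more budget, same duration: 3000
```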
After building your regression model, you can evaluate how well it explains the relationship between your variables. An effective way to assess model fit is to look at the R-squared (coefficient of determination) value returned by LINEST, which represents the proportion of variance in the dependent variable explained by the independent variables.
This value ranges from zero to one. For example, an R-squared of 0.85 means that your independent variables (e.g., campaign duration and marketing budget) account for 85 percent of the variation in your outcome (e.g., sales).
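R-squared can be computed from the actual and predicted values as one minus the ratio of the residual sum of squares to the total sum of squares. For a least squares fit that includes an intercept, this matches the R-squared that LINEST reports. A short Python sketch (the function name is illustrative):

```python
def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`: 1 - SS_resid / SS_total."""
    y_bar = sum(actual) / len(actual)
    ss_resid = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_total = sum((y - y_bar) ** 2 for y in actual)
    return 1 - ss_resid / ss_total


# Predictions from the least squares line y = 0.6x + 2.2 for x = 1..5
print(r_squared([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2]))
```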
In addition to linear regression, you can explore other ways to examine relationships between variables and make predictions. Some to explore include:
If one variable doesn’t necessarily predict the other, but you still want to know how they relate, you can instead find the correlation. For example, liking fries might not predict whether someone likes burgers, but you can see how the two food preferences correlate.
In Google Sheets, the function for this is:
=CORREL(data_y, data_x)
You’ll return a value between negative one and one, with zero representing no correlation, and negative one or one representing a perfect negative or positive relationship, respectively.
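CORREL reports the Pearson correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. A pure-Python sketch of the same calculation (the function name simply mirrors the spreadsheet function and is not a Google Sheets API):

```python
import math


def correl(data_y, data_x):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(data_x)
    x_bar, y_bar = sum(data_x) / n, sum(data_y) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(data_x, data_y))
    sxx = sum((x - x_bar) ** 2 for x in data_x)
    syy = sum((y - y_bar) ** 2 for y in data_y)
    return sxy / math.sqrt(sxx * syy)


print(correl([2, 4, 6], [1, 2, 3]))  # perfect positive relationship
print(correl([3, 2, 1], [1, 2, 3]))  # perfect negative relationship
```

With a single predictor, squaring this value gives the R-squared of the simple linear regression.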
If you have partial data and still want to predict future values of your dependent variable, you can use the TREND function. For example, if you tracked monthly sales data for half the year, you can use TREND to forecast sales for the remaining months. You could do this with:
=TREND(known_ys, known_xs, new_xs)
known_ys: your dependent variable values (e.g., sales)
known_xs: your independent variable values (e.g., months one to six)
new_xs: the values for which you want predictions (e.g., months seven to 12)
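TREND fits a least squares line to the known points and evaluates that line at the new x values. The Python sketch below does the same for one predictor; the function name mirrors the spreadsheet function, and the six months of sales figures are hypothetical, used only to show the call.

```python
from statistics import mean


def trend(known_y, known_x, new_x):
    """Fit a least squares line to the known points and evaluate it at new_x."""
    x_bar, y_bar = mean(known_x), mean(known_y)
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(known_x, known_y))
         / sum((x - x_bar) ** 2 for x in known_x))
    b = y_bar - m * x_bar
    return [m * x + b for x in new_x]


months = [1, 2, 3, 4, 5, 6]
sales = [120, 132, 129, 151, 158, 166]       # hypothetical monthly sales
forecast = trend(sales, months, range(7, 13))  # months 7 to 12
print([round(v, 1) for v in forecast])
```

Because the forecast extends a straight line, it is most trustworthy close to the observed months and assumes the linear trend continues.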
When using Google Sheets for linear regression, keep the following common issues and limitations in mind to avoid pitfalls.
If your R-squared value is low: This may indicate a weak relationship between your variables or that a linear model isn’t the best fit. However, the R-squared value alone doesn’t determine the quality of your model, so evaluate carefully how to proceed. Selecting your independent variables thoughtfully can improve the model. When you use several independent variables, also check for multicollinearity, which occurs when at least two of your independent variables are highly correlated; it doesn’t necessarily lower R-squared, but it makes individual coefficients unstable and hard to interpret.
If your data points do not align with your regression line: This may indicate that the relationship isn’t truly linear. In such cases, explore other types of analysis that may be more appropriate.
If your LINEST function is returning errors: You may encounter a LINEST function error due to mismatched data ranges, missing cells, or non-numeric values. Carefully check that your data is clean, correctly formatted, and free of missing values.
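For the multicollinearity check mentioned in the first tip, one common diagnostic is the variance inflation factor (VIF). With exactly two independent variables, each one's VIF reduces to 1 / (1 - r²), where r is the correlation between them. The Python sketch below assumes that two-predictor case; the function names are hypothetical, and thresholds such as 5 or 10 are commonly cited rules of thumb rather than fixed cutoffs.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    saa = sum((x - a_bar) ** 2 for x in a)
    sbb = sum((y - b_bar) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """Hypothetical helper: VIF shared by both predictors in a two-predictor model."""
    r = pearson(x1, x2)
    if abs(r) >= 1 - 1e-12:  # perfectly collinear: coefficients cannot be separated
        return math.inf
    return 1 / (1 - r ** 2)


weeks = [1, 2, 3, 4, 5, 6]
budget_thousands = [10, 12, 11, 15, 14, 18]
print(vif_two_predictors(weeks, budget_thousands))
```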
To perform a linear regression in Google Sheets, follow these steps:
Prepare your data set and ensure correct formatting.
Check assumptions and linearity with diagnostic plots.
Write your LINEST function.
Interpret your results.
Linear regression in Google Sheets allows you to assess how your independent variables predict the movement of your dependent variable, helping you make informed predictions and business decisions. You can explore more analytic techniques to maximize your data with the IBM Data Analytics with Excel and R Professional Certificate. In this Professional Certificate program, you’ll have the opportunity to perform data analysis, including data preparation, statistical analysis, and predictive modeling using R, RStudio, and Jupyter.
Editorial Team
Coursera’s editorial team is made up of highly experienced professional editors, writers, and fact...