Plot X-Y data as a scatter plot and instantly get the line of best fit, slope, y-intercept, Pearson r, and R-squared. Everything runs locally; your data is never sent anywhere.
Scatter Plot with Line of Best Fit
Add at least 2 data points to see the regression line.
Regression Equation
y = mx + b
Slope (m)
Y-Intercept (b)
Pearson r
R-Squared (R²)
Enter Your Data
Supports comma, tab, or space delimiters. Extra whitespace and blank lines are ignored. Parsing instantly populates the manual grid.
Correlation Strength Guide
Use this table to interpret your R-squared value in real-world terms. The bands are rules of thumb; what counts as a strong fit varies by field.
R-Squared Range | Pearson r Range | Strength | What It Means in Practice
0.90 to 1.00 | 0.95 to 1.00 (or -0.95 to -1.00) | Very Strong | The line fits the data almost perfectly. Rare in real-world behavioral or social data; common in controlled physical experiments.
0.70 to 0.89 | 0.84 to 0.94 (or -0.84 to -0.94) | Strong | Most of the variance in Y is explained by X. A reliable model for forecasting within your data range.
0.50 to 0.69 | 0.71 to 0.83 (or -0.71 to -0.83) | Moderate | A meaningful relationship exists, but other variables also influence Y. Useful for trend direction, not precise prediction.
0.30 to 0.49 | 0.55 to 0.70 (or -0.55 to -0.70) | Weak | Some linear signal, but the model explains less than half the variance. Other factors likely dominate.
0.00 to 0.29 | 0.00 to 0.54 (or 0.00 to -0.54) | Very Weak / None | Little to no linear relationship between X and Y. A straight-line model has little predictive value for this data.
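The bands above can be expressed as a simple lookup. This is a minimal Python sketch, not the calculator's own code: the names `BANDS`, `strength_label`, and `abs_r_from_r_squared` are hypothetical, and values that fall between two listed ranges (for example 0.895) are assigned to the lower band. It also shows that |r| is just the square root of R-squared, which is how the Pearson r column is derived.

```python
import math

# Lower bound of each R-squared band and its label, copied from the guide.
# These thresholds are rules of thumb, not universal standards.
BANDS = [
    (0.90, "Very Strong"),
    (0.70, "Strong"),
    (0.50, "Moderate"),
    (0.30, "Weak"),
    (0.00, "Very Weak / None"),
]


def strength_label(r_squared: float) -> str:
    """Map an R-squared value in [0, 1] to the guide's strength label."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R-squared must lie between 0 and 1")
    for lower, label in BANDS:
        if r_squared >= lower:
            return label
    return BANDS[-1][1]  # unreachable for valid input


def abs_r_from_r_squared(r_squared: float) -> float:
    """|r| is the square root of R-squared; the sign must come from r or the slope."""
    return math.sqrt(r_squared)


for r2 in (0.95, 0.81, 0.55, 0.35, 0.10):
    print(f"R^2 = {r2:.2f} -> |r| = {abs_r_from_r_squared(r2):.2f}, {strength_label(r2)}")
```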
Key Terms Explained
Linear Regression
A statistical method that models the relationship between two variables using a straight line. Given a set of data points, it finds the line that best predicts Y from X.
Scatter Plot
A graph that displays individual data points on a two-dimensional plane, with one variable on each axis. Reveals the shape, direction, and spread of a relationship.
Line of Best Fit
The single straight line that minimizes the sum of the squared vertical distances (residuals) between the data points and the line. Also called the regression line or trend line.
Slope (m)
How steeply the line rises or falls. A slope of 3 means Y increases by 3 units for every 1-unit increase in X. A negative slope means Y decreases as X increases.
Y-Intercept (b)
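The value of Y when X equals zero: the point where the regression line crosses the Y-axis. In many real-world datasets, the y-intercept represents a baseline or starting value. If X = 0 lies far outside your observed data, the intercept is itself an extrapolation and may have no practical meaning.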
Pearson Correlation Coefficient (r)
A number from -1 to 1 measuring the strength and direction of the linear relationship. Values near 1 or -1 are strong; values near 0 indicate little linear association.
R-Squared (R²)
The proportion of variance in Y explained by the regression model. R² = 0.80 means 80% of the variation in Y is accounted for by its linear relationship with X.
Least Squares Method
The principle behind Ordinary Least Squares (OLS) regression: choose the slope and intercept that minimize the sum of the squared vertical residuals (errors) between the observed Y values and the line.
Outlier
A data point that lies far from the general pattern of the other points. Outliers can have an outsized influence on the slope and intercept of the regression line.
Extrapolation
Using the regression equation to predict Y for X values outside the range of the observed data. Extrapolation is risky because the linear relationship may not hold beyond the data range.
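To make "minimize the sum of squared vertical residuals" concrete, the sketch below fits a line by least squares and checks that nudging the slope or intercept always increases the sum of squared residuals (SSE). The data points and the helper names `sse` and `ols_fit` are hypothetical; the fit uses the deviations-from-the-mean form of OLS, which is algebraically equivalent to the sum formulas in the guide.

```python
def sse(xs, ys, m, b):
    """Sum of squared vertical residuals y - (m*x + b) for a candidate line."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def ols_fit(xs, ys):
    """Ordinary least squares slope and intercept via deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar


# Hypothetical measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, b = ols_fit(xs, ys)
best = sse(xs, ys, m, b)

# Moving the slope or intercept in any direction should never lower the SSE.
for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1), (0.05, -0.05)]:
    assert sse(xs, ys, m + dm, b + db) > best

print(f"m = {m:.3f}, b = {b:.3f}, SSE = {best:.4f}")
```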
The Complete Guide to Linear Regression and Scatter Plot Analysis
Whether you are a student running a science experiment, an analyst building a business forecast, or a researcher looking for patterns in survey data, linear regression is one of the most widely used tools in quantitative analysis. This guide explains the math, the interpretation, and the pitfalls, so you can go beyond plugging numbers into a formula and understand what the output actually means.
How to Use This Calculator
There are two ways to get your data in. Use the Manual Grid tab to type X and Y pairs row by row, clicking "Add Row" to expand the table as needed. Use the Bulk Import tab to paste two columns of X and Y values directly from Excel, Google Sheets, or any plain-text source; comma, tab, and space delimiters all work. Pasting into the bulk textarea instantly populates the manual grid, so you can fine-tune individual points after importing. The scatter plot, regression line, and all statistical outputs update in real time as you type, with no calculate button needed.
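Bulk parsing of this kind can be sketched in a few lines of Python. This is an assumed implementation for illustration, not the tool's actual parser: `parse_pairs` is a hypothetical name, each non-blank line is expected to hold exactly one X and one Y, a header row such as "X,Y" raises an error, and a comma used as a decimal separator would be misread as a delimiter.

```python
import re


def parse_pairs(text: str) -> list[tuple[float, float]]:
    """Parse X/Y pairs separated by commas, tabs, or spaces; skip blank lines."""
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are ignored
        # Treat any run of commas, tabs, or spaces as a single delimiter.
        fields = [f for f in re.split(r"[,\t ]+", line) if f]
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 values, got {len(fields)}")
        # float() raises ValueError on non-numeric text such as a header row.
        pairs.append((float(fields[0]), float(fields[1])))
    return pairs


pasted = "1,2.5\n2\t4.1\n\n  3   6.0  \n4, 7.9\n"
print(parse_pairs(pasted))
```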
The Math Behind the Line
This tool uses Ordinary Least Squares (OLS), the standard method for simple linear regression. Given n data points (x1, y1), (x2, y2), ..., (xn, yn), the formulas are:
m = (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
b = (sum(y) - m * sum(x)) / n
The Pearson Correlation Coefficient (r) is computed from the same building blocks, and R-squared is simply r multiplied by itself. When R-squared is 1.0, all points fall exactly on the line. When it is 0, the line explains none of the variation in Y.
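The same sums give Pearson r, and for a simple linear regression with an intercept, r squared equals 1 - SS_res / SS_tot, the "share of variance explained" definition of R-squared. The Python sketch below computes both on made-up data so you can see they agree; the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the running sums used for the slope."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if den == 0:
        raise ValueError("r is undefined when all x or all y values are identical")
    return num / den


def r_squared_from_residuals(xs, ys):
    """R-squared as 1 - SS_res / SS_tot, using the OLS line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b = y_bar - m * x_bar
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.5, 5.1, 8.2, 9.0, 12.5]  # made-up, roughly linear
r = pearson_r(xs, ys)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, "
      f"1 - SS_res/SS_tot = {r_squared_from_residuals(xs, ys):.4f}")
```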
Reading the Scatter Plot
Look at the plot before you trust the numbers. How the points scatter around the regression line (in red) tells you a lot: if they cluster tightly around the line, R-squared will be high. If they form a curve rather than a straight line, linear regression is the wrong model for your data; a curved pattern means the linear assumption is violated, no matter how high R-squared appears. Also look for outliers: a single point far from the cluster can have more influence on the slope than a dozen well-behaved points.
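The Python sketch below illustrates why the visual check matters. It fits a line to points that lie exactly on the curve y = x^2 for x = 1 to 10 (made-up data). R-squared comes out at about 0.95, which the strength guide would call Very Strong, yet the residuals are positive at both ends and negative through the middle, the systematic pattern that signals curvature. The helper names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """OLS slope and intercept (deviation form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar


def r_squared(xs, ys, m, b):
    """1 - SS_res / SS_tot for the line y = m*x + b."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = list(range(1, 11))
ys = [x ** 2 for x in xs]  # a perfect curve, not a line
m, b = fit_line(xs, ys)
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

print(f"y = {m:.1f}x {b:+.1f}, R^2 = {r_squared(xs, ys, m, b):.3f}")
# A run of same-signed residuals is a warning sign that the true shape is curved.
print("residual signs:", "".join("+" if e > 0 else "-" for e in residuals))
```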
Frequently Asked Questions
What is the difference between r and R-squared?
r is the Pearson Correlation Coefficient: a value between -1 and 1 that measures the strength and direction of the linear relationship. R-squared is r multiplied by itself, always between 0 and 1. While r tells you direction (positive or negative), R-squared tells you the proportion of variance in Y that the model explains. For example, r = 0.9 means a strong positive correlation; R-squared = 0.81 means 81% of the variation in Y is accounted for by the linear model. R-squared loses the sign, so you need r to know whether the relationship is positive or negative.
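A quick way to see that R-squared discards the sign is to mirror a dataset vertically and compare. In this Python sketch (hypothetical data and function name), the rising and falling versions have r of +0.80 and -0.80 but the same R-squared of 0.64.

```python
import math


def pearson_r(xs, ys):
    """Pearson r via deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [1.0, 3.0, 2.0, 5.0, 4.0]
falling = [-y for y in rising]  # mirror image: same scatter, opposite direction

for name, ys in (("rising", rising), ("falling", falling)):
    r = pearson_r(xs, ys)
    print(f"{name}: r = {r:+.2f}, R^2 = {r * r:.2f}")
```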
How do outliers affect the regression line?
A single outlier can dramatically shift the regression line, especially in small datasets. The Ordinary Least Squares method minimizes the sum of squared residuals. Because errors are squared, a point far from the cluster contributes a disproportionately large penalty and pulls the line toward itself. In a dataset of 5 points, one extreme outlier can swing the slope by 50% or more and artificially inflate or deflate R-squared. Always inspect your scatter plot for outliers before trusting a regression equation: the visual check catches problems the numbers alone miss.
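Here is a small worked example of the effect, using hypothetical data: five points that lie exactly on y = 2x, then the same points with only the last Y value raised from 10 to 20. That single changed point doubles the slope from 2 to 4 and drops R-squared from 1.00 to 0.80. The helper `fit` is an illustrative name, not part of the calculator.

```python
def fit(xs, ys):
    """Return (slope, intercept, R^2) for an OLS line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    syy = sum((y - y_bar) ** 2 for y in ys)
    m = sxy / sxx
    # For simple regression, R^2 = r^2 = sxy^2 / (sxx * syy).
    return m, y_bar - m * x_bar, sxy ** 2 / (sxx * syy)


xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]         # exactly y = 2x
with_outlier = [2, 4, 6, 8, 20]  # last point pushed far above the pattern

for label, ys in (("clean", clean), ("one outlier", with_outlier)):
    m, b, r2 = fit(xs, ys)
    print(f"{label:12s} slope = {m:.2f}, intercept = {b:.2f}, R^2 = {r2:.2f}")
```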
What is extrapolation, and why is it risky?
Extrapolation means predicting Y for X values outside the range of your observed data. The regression line is calibrated only to the data you have. Beyond that range, the relationship may curve, plateau, reverse direction, or behave in ways your model cannot predict. A model trained on population growth from 2000 to 2010 will not reliably forecast 2050. Extrapolated predictions are often wildly wrong, and the further you go beyond the observed X range, the wider the uncertainty becomes. Stick to interpolation (predicting within the observed range) whenever possible.
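The Python sketch below makes the risk concrete with an assumed "true" process, y = x^2, observed only for x = 1 to 10. The fitted line (y = 11x - 22) stays within about 12 units of the truth inside that range, but misses by 202 at x = 20 and by 1,972 at x = 50. The function names are hypothetical, and real data will rarely be this tidy.

```python
def fit_line(xs, ys):
    """OLS slope and intercept (deviation form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    return m, y_bar - m * x_bar


def true_curve(x):
    """Hypothetical 'real' process: quadratic, which a line only approximates."""
    return x ** 2


xs = list(range(1, 11))  # observed range: x = 1..10
ys = [true_curve(x) for x in xs]
m, b = fit_line(xs, ys)

for x_new in (5, 10, 20, 50):
    predicted = m * x_new + b
    actual = true_curve(x_new)
    kind = "interpolation" if xs[0] <= x_new <= xs[-1] else "extrapolation"
    print(f"x = {x_new:>4}: predicted {predicted:7.1f}, actual {actual:7.1f} ({kind})")
```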
Does a high R-squared prove that X causes Y?
No. Correlation measures how closely two variables move together, but it cannot tell you why. A high R-squared only means the regression model explains a large share of the variance in Y using X. Both variables might be driven by a third confounding variable, or the relationship might be coincidental. For example, ice cream sales and drowning rates are both high in summer: they correlate strongly, but ice cream does not cause drowning. Establishing causation requires controlled experiments or rigorous study design, not regression statistics alone.
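A confounder is easy to simulate. In this hypothetical Python sketch, temperature alone generates both ice cream sales and drownings (plus small fixed noise terms), and neither series is computed from the other, yet their correlation comes out above 0.99. All numbers are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson r via deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented numbers: temperature drives both series; neither depends on the other.
temperature = [10, 14, 18, 22, 26, 30]
ice_cream_noise = [1, -2, 0, 2, -1, 0]
drowning_noise = [0.5, -0.5, 0, 0.5, 0, -0.5]

ice_cream_sales = [3 * t + e for t, e in zip(temperature, ice_cream_noise)]
drownings = [0.5 * t + e for t, e in zip(temperature, drowning_noise)]

r = pearson_r(ice_cream_sales, drownings)
print(f"r(ice cream, drownings) = {r:.3f}")
```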
What does a negative slope mean?
A negative slope means that as X increases, Y tends to decrease: the two variables move in opposite directions. For example, as outdoor temperature rises, heating costs tend to fall, which gives a negative slope. The magnitude of the slope tells you the rate of change: a slope of -2.5 means Y drops by 2.5 units for every 1-unit increase in X. A negative slope does not imply weakness. Strength is measured by r and R-squared independently of direction. A slope of -50 with R-squared near 1.0 is an extremely strong negative relationship.
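To see that slope magnitude and strength are separate, the Python sketch below fits three made-up datasets: two that lie exactly on lines with slopes of -50 and -0.5, and one noisy downward trend. Both exact datasets have r = -1 regardless of how steep they are, while the noisy one has a slope of about -1.8 but a much weaker r (about -0.68). The helper name `fit` is hypothetical.

```python
def fit(xs, ys):
    """Return (slope, r) for an OLS line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5


xs = [0, 1, 2, 3, 4]
datasets = {
    "steep, exact": [200 - 50 * x for x in xs],    # slope -50, no noise
    "shallow, exact": [10 - 0.5 * x for x in xs],  # slope -0.5, no noise
    "moderate, noisy": [20, 14, 19, 10, 13],       # downward trend with scatter
}

for name, ys in datasets.items():
    m, r = fit(xs, ys)
    print(f"{name:16s} slope = {m:7.2f}, r = {r:+.3f}, R^2 = {r * r:.3f}")
```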