Historical background.
Francis Galton coined "regression" in the 1880s studying how children's heights regress toward the population mean.
Karl Pearson formalised it and defined r as the correlation coefficient.
R² fell out naturally: once you have r, squaring it tells you how much of the variation
in one variable is accounted for by the other. It became the standard goodness-of-fit
measure for linear regression — so standard that most people encounter R² before they
ever think carefully about r.
What R² literally is.
R² is exactly what the name says: r squared.
More precisely: it is the square of the Pearson correlation between the
predicted values (ŷ, from your regression line) and the observed values (y).
In simple linear regression those two definitions are identical.
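A quick numerical check of that equivalence. This is an illustrative sketch, not anything from the text: the synthetic data, the use of np.polyfit for the fit, and all variable names are assumptions.

```python
import numpy as np

# Hypothetical synthetic data: a linear signal plus noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Simple linear regression via least squares (degree-1 polynomial fit).
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

r_xy = np.corrcoef(x, y)[0, 1]          # Pearson r between x and y
r_yhat_y = np.corrcoef(y_hat, y)[0, 1]  # Pearson r between ŷ and y

# The two squared correlations coincide in simple linear regression.
print(np.isclose(r_xy**2, r_yhat_y**2))  # True
```

Because ŷ is just a linear function of x, its correlation with y equals that of x with y up to sign, so the squares agree exactly (up to floating-point tolerance).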
Why squaring gives a clean interpretation.
r ranges from −1 to +1, so it can be negative. Squaring it does two things:
it removes the sign (direction no longer matters, only strength does),
and it maps every value into [0, 1].
That [0, 1] range has a precise meaning: it is the
fraction of the total variance in Y that your model explains.
The remainder, 1 − R², is the fraction of variance left unexplained, driven by noise or variables you didn't measure.
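That variance-explained reading can be verified directly. The sketch below, under the same kind of assumed synthetic setup as before, computes R² from the sum-of-squares decomposition (1 − SS_res / SS_tot) and confirms it matches r² and that the explained and unexplained fractions sum to one.

```python
import numpy as np

# Hypothetical synthetic data and a simple least-squares fit.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(size=200)

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

ss_res = np.sum((y - y_hat) ** 2)       # residual (unexplained) variation
ss_tot = np.sum((y - y.mean()) ** 2)    # total variation in y
r_squared = 1.0 - ss_res / ss_tot       # fraction of variance explained

r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_squared, r**2))      # True
```

Here ss_res / ss_tot is exactly the unexplained fraction 1 − R², so the two pieces always add up to the total variance.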