Sum of Squares: Calculation, Types, and Examples

The sum of squares (SS) is a basic statistical measure of variability: it quantifies how far data points are spread around their mean or around fitted values in a regression. It’s central to variance, standard deviation, and least-squares regression.

Key takeaways

  • Sum of squares equals the sum of squared deviations from a reference value (usually the mean or fitted value).
  • Larger SS indicates greater variability; smaller SS indicates data points cluster more tightly.
  • Variance = SS divided by the number of observations (or by n−1 for sample variance). Standard deviation is the square root of the variance.
  • In regression, SS is decomposed into explained and unexplained parts (SSR and SSE).

How it works

For a dataset, the difference between each observation and a reference value (commonly the mean) is squared so negative and positive deviations do not cancel out. Summing those squared deviations yields a positive measure of total variation. Regression methods choose parameters that minimize the sum of squared residuals (least squares), producing the best-fit line or curve under that criterion.
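
The cancellation point is easy to see in code. A minimal sketch in Python, using four made-up values:

```python
# Raw deviations from the mean cancel out; squared deviations do not.
data = [4.0, 7.0, 5.0, 8.0]          # hypothetical values for illustration
mean = sum(data) / len(data)          # 6.0

deviations = [x - mean for x in data]
print(sum(deviations))                   # 0.0: positives and negatives cancel
print(sum(d ** 2 for d in deviations))   # 10.0: the sum of squares
```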

Formula

For a dataset X = {X1, X2, …, Xn} with mean X̄:
SS (total) = Σ (Xi − X̄)², summed over i = 1, …, n

Relationship to variance and standard deviation:
Population variance σ² = SS / n
Sample variance s² = SS / (n − 1) (when estimating population variance)
Standard deviation = √variance
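
These relationships are easy to verify in Python; the following sketch reuses the made-up values from above and cross-checks against the standard library's statistics module:

```python
import statistics

data = [4.0, 7.0, 5.0, 8.0]               # hypothetical values for illustration
n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)   # sum of squares = 10.0

pop_var = ss / n           # population variance σ² = 2.5
samp_var = ss / (n - 1)    # sample variance s² ≈ 3.333
pop_sd = pop_var ** 0.5    # population standard deviation ≈ 1.581

assert abs(pop_var - statistics.pvariance(data)) < 1e-12
assert abs(samp_var - statistics.variance(data)) < 1e-12
```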

In regression the total sum of squares (SST) is decomposed:
SST = SSR + SSE
where:
SSR (regression sum of squares) = Σ (ŷi − ȳ)² — variation explained by the model
SSE (residual or error sum of squares) = Σ (yi − ŷi)² — unexplained variation (residuals)
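
A small numerical check of the decomposition, using NumPy and made-up data (np.polyfit performs the least-squares line fit; the identity SST = SSR + SSE holds for ordinary least squares with an intercept):

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, 1)   # least-squares fit of a line
y_hat = slope * x + intercept

sst = np.sum((y - y.mean()) ** 2)        # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)    # explained by the model
sse = np.sum((y - y_hat) ** 2)           # unexplained (residual)

assert np.isclose(sst, ssr + sse)        # SST = SSR + SSE
```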

How to calculate (step-by-step)

  1. Gather the data points.
  2. Compute the reference value (usually the mean).
  3. For each observation, subtract the reference value to get the deviation.
  4. Square each deviation.
  5. Sum the squared deviations — this is the sum of squares.
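
The five steps translate directly into a short function (a sketch, reusing the toy values from earlier):

```python
def sum_of_squares(data):
    """Sum of squared deviations from the mean (steps 2-5)."""
    mean = sum(data) / len(data)             # step 2: reference value
    deviations = [x - mean for x in data]    # step 3: deviations
    squared = [d ** 2 for d in deviations]   # step 4: square each one
    return sum(squared)                      # step 5: total

print(sum_of_squares([4.0, 7.0, 5.0, 8.0]))  # 10.0
```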

Types of sum of squares

  • Total Sum of Squares (SST): total variation of observed values around their mean.
  • Regression Sum of Squares (SSR): portion of SST explained by the regression model (variation of fitted values around the mean).
  • Residual Sum of Squares (SSE): portion of SST not explained by the model (variation of actual values around fitted values).

Interpretation:
  • Small SSE → the model fits the data well.
  • Large SSR relative to SST → the model explains a large share of total variation.
  • R² = SSR / SST measures the proportion of variance explained by the model.
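
In code, R² is typically computed as 1 − SSE/SST, which equals SSR/SST for ordinary least squares with an intercept. A sketch reusing the made-up data from the decomposition example:

```python
import numpy as np

def r_squared(y, y_hat):
    """R² as 1 - SSE/SST (equals SSR/SST for OLS with an intercept)."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
slope, intercept = np.polyfit(x, y, 1)
print(r_squared(y, slope * x + intercept))  # ≈ 0.997: a strong linear fit
```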

Example

Data (closing prices): 374.01, 374.77, 373.94, 373.61, 373.40
Sum = 1,869.73 → mean = 1,869.73 / 5 = 373.946

Compute deviations and squares:
(374.01 − 373.946)² ≈ 0.0041
(374.77 − 373.946)² ≈ 0.6790
(373.94 − 373.946)² ≈ 0.0000
(373.61 − 373.946)² ≈ 0.1129
(373.40 − 373.946)² ≈ 0.2981

Sum of squares ≈ 1.094, a small value indicating that these five closing prices cluster tightly around their mean.
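
The same arithmetic in a few lines of Python, as a quick check of the worked example:

```python
prices = [374.01, 374.77, 373.94, 373.61, 373.40]
mean = sum(prices) / len(prices)
ss = sum((p - mean) ** 2 for p in prices)
print(round(mean, 3), round(ss, 3))  # 373.946 1.094
```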

Practical uses

  • Measuring variability (input to variance and standard deviation).
  • Assessing volatility in finance (e.g., comparing stability of asset prices).
  • Quantifying model fit in regression and computing R².
  • Underpinning least-squares estimation for linear and nonlinear models.

Limitations and cautions

  • SS grows with the number of observations and with scale (not directly comparable across datasets with different units or sizes unless normalized).
  • Squaring amplifies the influence of outliers.
  • SS and derived measures are based on historical data and do not guarantee future performance.
  • Interpretation in regression relies on appropriate model specification and assumptions (e.g., independent residuals, correct functional form).

Bottom line

The sum of squares is a foundational measure of variation used across statistics and regression analysis. It quantifies how much data deviate from a reference value or model, supports calculation of variance and standard deviation, and is central to least-squares model fitting. Use SS alongside robust diagnostics and appropriate context when making inferences or investment decisions.