Sum of Squares: Formula - Definition - Calculation

What Is the Sum of Squares?

The Sum of squares is a statistical technique to measure how dispersed the numbers in a dataset are and their deviation from a mean data point

The Sum of squares is used in statistics for regression analysis to determine the dispersion of data points. In a regression analysis, the objective is to determine how well a data series can fit a particular function. In turn, this provides clues to help explain how the data series was generated. Ultimately, the sum of squares is a mathematical way to find the function that best fits the data. In regression analysis, it is a way to measure variance. That means how dispersed and spread out the numbers in a dataset are. The sum of squares gets its name from the way you calculate it. By summing up the squared difference between an observation and the target value.

The total sum of squares measures deviation from a mean data point. Those deviations are a combination of numbers above and below the target value. As a result, some numbers are positive, and some are negative. By definition, the simple sum of those deviations is always zero. To describe how spread out those numbers are from each other, they are first squared. Squaring the numbers makes them all positive values. Then, they are added together resulting in the sum of squares. Regression analysis estimates one variable using another. In that application, the sum of squares measures how well an estimate fits the available data.

Sum of Squares Formula

Sum of Squares = Σ(x_i + x̄)²

∑ = sum
x_i = each value in the set
x̄ = mean
x_i – x̄ = deviation

What Does the Sum of Squares Tell You?

The sum of squares measures deviation from the mean. In statistics, the mean is the average of a set of numbers. It measures the central tendency of the group. The arithmetic mean is simply calculated by summing up the values in the data set and dividing by the number of values.

For example, assume the closing prices of FaceBook Inc (FB) in the last five days were 274.01, 274.77, 273.94, 273.61, and 273.40 in US dollars. The sum of the total prices is $1369.73. As a result, the mean or average price would be $1369.73 / 5 = $273.95.

Knowing the mean of a data set is a good starting point. But, sometimes it is helpful to know how much variation there is in a set of measurements as well. How far apart the individual values are from the mean may give some insight into how accurate the values are to a particular regression model. The smaller the residual sum of squares, the better your model fits your data. The greater the residual sum of squares, the poorer the model fits the data. A value of zero means the model is a perfect fit.

Comparing Share Prices

For example, assume an analyst wants to know whether the share price of Microsoft (MSFT) moves in tandem with the price of Apple (AAPL). In this case, he can list out the set of observations for the process of both stocks for a certain period. Maybe 1, 2, or even 5 years. He can create a linear model with each of the observations or measurements recorded. If the relationship between both variables (i.e., the price of AAPL and the price of MSFT) is not a straight line, then variations in the data set need to be more closely examined.

If the line in the linear model created does not pass through all the measurements of value, then some of the variability that has been observed in the share prices is unexplained. The sum of squares is used to calculate whether a linear relationship exists between two variables. Also, if any unexplained variability is referred to as the residual sum of squares. The sum of squares is the sum of the square of variation. And, variation is defined as the spread between each value and the mean. To determine the sum of squares, the distance between each data point and the line of best fit is squared and then summed up. The line of best fit will minimize this value. (Source: investopedia.com)

How to Calculate the Sum of Squares

The measurement is called the sum of squared deviations, or the sum of squares for short. Using Facebook (FB) as an example, the sum of squares can be calculated as:

(274.01 – 273.50)² + (274.77 – 273.95)² + (273.94 – 273.95)² + (273.61 – 273.95)² + (273.40 – 273.95)²
(0.56)² + (0.82)² + (-0.01)² + (-0.34)² + (-0.55)²
Sum of Squares = 1.4042

Adding the sum of the deviations alone without squaring will result in a number equal to or close to zero. This is because the negative deviations will very closely offset the positive deviations. To get a meaningful value, the sum of deviations must be squared. Squaring any number will always result in a positive number. This is because any number multiplied by itself, whether positive or negative, is always positive.

Example of How to Use the Sum of Squares

A high sum of squares indicates that most of the values are farther away from the mean. Therefore, there is large variability in the data. A low sum of squares indicates low variability in the set of observations. In the FaceBook example above, 1.4042 shows that the variability in the stock price in the last five days is very low. Investors looking to invest in stocks with stable prices and low volatility may find FaceBook appealing. Of course, a week is not enough data to draw a meaningful conclusion. A year or 52 weeks of data would give a much more accurate picture.

NVIDIA vs APPLE

Assume you wanted to compare the stock prices of two companies — NVIDIA and Apple. They trade at nearly equal values, but they are not the same. Consider the trading week of May 4th through 8th, 2020.

	NVIDIA	Apple
5/8/2020	$312.50	$310.13
5/7/2020	$304.87	$302.92
5/6/2020	$297.79	$299.82
5/5/2020	$293.74	$296.76
5/4/2020	$291.29	$292.37
Average	$300.04	$300.40

The average closing price for the week for both stocks is nearly identical. However, NVIDIA’s stock price moved around more than the Apple stock price. This information doesn’t show up when investors only look at average values. The sum of squares gives additional information about how much movement there is in the data. In this example, NVIDIA has a sum of squares value of 300, while Apple’s is 179. The lower number for Apple indicates less dispersion and less volatility. (Source: learn.robinhood.com)

Limitations

Making an investment decision on what stock to purchase requires many more observations than the ones listed above. An analyst may have to work with years of data to gain a higher certainty of how high or low the variability is for an asset. As more data points are added to the set, the sum of squares becomes larger. Also, the values will be more spread out. The most widely used measurements of variation are the standard deviation and variance. The variance is the average of the sum of squares. This can be found by taking the sum of squares divided by the number of observations. The standard deviation is the square root of the variance.

There are two methods of regression analysis that use the sum of squares. The linear least-squares method and the non-linear least-squares method. The least-squares method refers to the fact that the regression function minimizes the sum of the variance from the actual data points. In this way, it is possible to draw a function that statistically provides the best fit for the data. A regression function can either be linear resulting in a straight line or non-linear resulting in a curving line.

Up Next: What Is Delivered-at-Place (DAP)?

DAP DAP is an international trade term where the seller is responsible for all costs of shipment from the origin to a specified destination.

Delivered-at-place is an international logistics term. It describes a deal where a seller agrees to bear all shipping costs and potential losses when moving goods sold to a specific location. In delivered-at-place agreements, the buyer is responsible for paying import duties and any applicable taxes. This includes clearance and local taxes once the shipment has arrived at the specified destination. The term delivered-at-place (DAP) was adopted in the International Chamber of Commerce’s (ICC) eighth publication of its international commercial terms in 2010.

Under DAP Incoterms, the seller is responsible for all risks and costs from the origin to the agreed-upon delivery location. This includes proper packing and all necessary documents. When the shipment reaches the destination specified in the contract of sale, the buyer becomes responsible and assumes all further risks. The buyer is responsible for unloading costs and risks at the named destination as well as any transportation after. Additionally, the buyer is responsible for customs clearance in the importing country. This includes providing import permits, proof of payment, customs taxes, and duties.

Sum of Squares: Formula – Definition – Calculation – Examples

What Is the Sum of Squares?