What Is Stepwise Regression?
Stepwise regression is a statistical technique that builds a regression model through step-by-step iteration, adding significant explanatory variables and deleting insignificant ones along the way. Independent variables are selected for the final model through a series of automated steps. At every step, the candidate variables are evaluated, and the most significant variable is added or the least significant variable is removed. The procedure does not consider all alternative models; when it finishes, it returns a single regression model.
The selection process typically uses the t statistics for the coefficients of the variables under consideration at each iteration. The t statistic is a computation used during a t-test to evaluate whether a null hypothesis should be rejected; a t-test can assess whether a population differs from some value of interest or whether two samples come from separate populations. Statistical software packages make stepwise regression practical even for models with hundreds of candidate variables.
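As a concrete illustration, the t statistic for each coefficient in an ordinary least squares fit is the coefficient estimate divided by its standard error. The following is a minimal sketch in Python with NumPy; the function name `coef_t_stats` and the toy data are illustrative assumptions, not part of any particular statistics package:

```python
import numpy as np

def coef_t_stats(X, y):
    """OLS fit with an intercept; returns t = beta_hat / SE for each slope coefficient."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])           # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least-squares coefficients
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - Xd.shape[1])      # residual variance estimate
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)         # coefficient covariance matrix
    se = np.sqrt(np.diag(cov))                      # standard errors
    return (beta / se)[1:]                          # t statistics, intercept omitted

# Toy data: y depends strongly on the first predictor and not at all on the second
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = 3.0 * X[:, 0] + rng.standard_normal(200)
t = coef_t_stats(X, y)
```

A common rule of thumb in large samples treats an absolute t value above roughly 2 as significant at about the 5% level, which is the kind of threshold a stepwise procedure applies at each step.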
Types of Stepwise Regression
Stepwise regression aims to identify a group of independent variables that significantly influence the dependent variable. This is done through a series of statistical tests, for example, F-tests and t-tests, applied repeatedly by computer. Iteration is the technique of arriving at conclusions or judgments by going through multiple rounds or cycles of analysis. Automating these tests with statistical software has the advantage of saving time and limiting errors.
Stepwise regression can be accomplished by testing one independent variable at a time and including it in the regression model if it is statistically significant, or by testing all potential independent variables and discarding those that are not statistically significant. A third approach combines both methods, so there are three potential approaches to stepwise regression.
Forward selection
A forward-selection rule begins with no explanatory variables in the model. It then adds variables one at a time, each time choosing whichever candidate is the most statistically significant. Each candidate is tested as it is considered for the model: the most statistically significant variable is kept, while candidates that fail the significance test are left out. This repeats until no statistically significant candidates remain.
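The forward-selection loop can be sketched directly. In this illustrative Python sketch (the helper `fit_t`, the function name `forward_select`, and the entry threshold `t_enter` are assumptions for the example, not a standard API), each round fits the current model plus one candidate and keeps the candidate with the largest absolute t statistic, stopping when no candidate clears the threshold:

```python
import numpy as np

def fit_t(X, y):
    """OLS with intercept; returns the t statistic of every coefficient (intercept first)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    return beta / se

def forward_select(X, y, t_enter=2.0):
    """Add the most significant remaining candidate until none clears t_enter."""
    chosen, remaining = [], list(range(X.shape[1]))
    while remaining:
        # |t| each candidate would earn as the newest column of the current model
        scores = {j: abs(fit_t(X[:, chosen + [j]], y)[-1]) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < t_enter:      # no remaining candidate is significant; stop
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data: only columns 0 and 2 actually drive y
rng = np.random.default_rng(1)
X = rng.standard_normal((300, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(300)
selected = forward_select(X, y)   # the true predictors are picked up
```

Note that the order of selection is informative: the strongest predictor enters first, which is one reason analysts watch the sequence of additions as well as the final variable list.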
Backward elimination
A backward-elimination rule begins with all possible explanatory variables in the model. Then, one by one, the least statistically significant variables are discarded; the discarding process ends when every variable left in the equation is statistically significant. Backward elimination is difficult when there is a large number of candidate variables, and it is impossible when the number of candidate variables exceeds the number of observations. In other words, it begins with the full set of independent variables and removes them one at a time, testing whether each remaining variable is statistically significant.
Bidirectional elimination
Forward selection and backward elimination are combined in a bi-directional stepwise approach. The technique, like forward selection, starts with no variables and adds variables based on a pre-specified criterion. The catch is that the technique evaluates the statistical repercussions of deleting previously included variables at each step. As an example, a variable could be inserted in Step 3, dropped in Step 6, and then added again in Step 8. It is a hybrid of the first two strategies for determining which variables should be included or eliminated.
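The hybrid can be sketched as a forward step followed by a backward re-check on every pass (again, `fit_t`, `stepwise_bidirectional`, and the thresholds `t_enter`/`t_remove` are illustrative assumptions; a guard against immediately re-dropping the variable just added keeps this toy version from cycling):

```python
import numpy as np

def fit_t(X, y):
    """OLS with intercept; returns the t statistic of every coefficient (intercept first)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (len(y) - Xd.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xd.T @ Xd)))
    return beta / se

def stepwise_bidirectional(X, y, t_enter=2.0, t_remove=2.0):
    """Forward step, then a backward re-check of everything already included."""
    chosen, remaining = [], list(range(X.shape[1]))
    changed = True
    while changed:
        changed = False
        if remaining:   # forward step: try the strongest remaining candidate
            scores = {j: abs(fit_t(X[:, chosen + [j]], y)[-1]) for j in remaining}
            best = max(scores, key=scores.get)
            if scores[best] >= t_enter:
                chosen.append(best)
                remaining.remove(best)
                changed = True
        if len(chosen) > 1:  # backward step: drop a variable that lost significance
            t = np.abs(fit_t(X[:, chosen], y)[1:])
            worst = int(np.argmin(t))
            # never drop the variable just added, to avoid add/remove cycling
            if t[worst] < t_remove and worst != len(chosen) - 1:
                remaining.append(chosen.pop(worst))
                changed = True
    return chosen

# Toy data: only columns 0 and 2 actually drive y
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 5))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.standard_normal(300)
sel = stepwise_bidirectional(X, y)   # columns 0 and 2 end up in the model
```

Because a variable removed in the backward step goes back into the candidate pool, it can re-enter later, exactly the Step 3 / Step 6 / Step 8 behavior described above.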
Example
A stepwise regression procedure was used to build a regression model for understanding and identifying customer behavior: specifically, the factors that influence the propensity to leave the service provided by a cable television company. The candidate variables examined were watch time, overall usage, monthly bill, household income, years of service, family size, and alternative programming options. These variables were entered into a stepwise regression using the backward elimination method: the model began with all of the variables, which were then removed one by one as they proved statistically insignificant.
A significant relationship was found between the propensity to leave the cable service and the following variables. The model demonstrated that monthly bill size, household income, and family size were the most important predictive variables, while years of service, watch time, and overall usage were less significant factors. The most significant interaction was between household income and the monthly bill: a higher bill combined with lower income increased the likelihood of discontinuing the service.
Stepwise Regression – Advantages
Advantages include:
- Large data pools – The ability to manage a vast number of potential predictor variables while fine-tuning the model to select the best predictor variables from among the available alternatives.
- Speed – It’s faster than other automatic model-selection methods.
- Ranking – Observing the order in which variables are removed or introduced might provide useful information about the predictor variables’ quality.
Stepwise Regression – Limitations
Today, regression analysis, both linear and multivariate, is frequently employed in economics and investing. The goal is frequently to identify patterns that existed in the past and may reoccur in the future. For example, a basic linear regression may examine price-to-earnings ratios and stock returns over time. The objective is to see if stocks with low P/E ratios (independent variable) give higher returns (dependent variable). The problem with this strategy is that market conditions frequently change. Therefore, relationships that were valid in the past may no longer be valid in the present or future.
Meanwhile, the stepwise regression approach has many detractors. According to statisticians, the approach has various problems, including inaccurate results, an inherent bias in the process, and the need for large processing capacity to build sophisticated regression models through iteration.
Up Next: What Is Quality of Earnings?
Quality of earnings refers to the portion of income realized from a company’s core operations that generates sustainable free cash flow. The quality of a corporation’s earnings is revealed by excluding items that may distort true bottom-line performance, such as anomalies, accounting deception, or one-time events. When these are removed, the earnings gained from higher sales or lower costs are plainly visible. Even circumstances outside of the organization can influence an assessment of the quality of earnings. For example, during periods of high inflation, the earnings quality of many companies declines, because inflated prices overstate their sales figures.
Earnings that are calculated conservatively are often seen as more dependable than earnings calculated using aggressive accounting techniques. Fringe accounting methods that conceal low sales or increased business risk can undermine earnings quality. Fortunately, broadly acknowledged accounting rules exist in the form of generally accepted accounting principles (GAAP); the more closely a corporation adheres to those guidelines, the higher the quality of its earnings will generally be. Several big financial scandals, such as Enron and WorldCom, were extreme cases of low earnings quality that misled investors.