In regression analysis, you'd like your regression model to have significant variables and to produce a high R-squared value. This combination of low P values and a high R-squared indicates that changes in the predictors are related to changes in the response variable and that your model explains a lot of the response variability.
This combination seems to go together naturally. But what if your regression model has significant variables but explains little of the variability? It has low P values and a low R-squared.
At first glance, this combination doesn’t make sense. Are the significant predictors still meaningful?
It’s difficult to understand this situation using numbers alone. Research shows that graphs are essential to correctly interpreting regression analysis results. Comprehension is easier when you can see what is happening!
Further, if you enter the same value for Input into both equations, you’ll calculate nearly equivalent predicted values for Output. For instance, an Input of 10 yields a predicted Output of 66.2 for one model and 64.8 for the other model.
The variability of the data around the two regression lines is drastically different. R-squared and S (the standard error of the regression) numerically describe this variability.
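For reference, here is how these two statistics are usually defined for a least-squares fit. The symbols are assumptions introduced for this sketch and do not come from the original output: n is the number of observations, k is the number of estimated coefficients (2 for one predictor plus an intercept), SSE is the sum of squared residuals, and SST is the total sum of squares around the mean of the response.

```latex
\[
R^2 = 1 - \frac{SSE}{SST}, \qquad
S = \sqrt{\frac{SSE}{n - k}}
\]
\[
SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
SST = \sum_{i=1}^{n} (y_i - \bar{y})^2
\]
```

More scatter around the line means a larger SSE, which lowers R-squared and raises S at the same time.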
The low R-squared graph shows that even noisy, high-variability data can have a significant trend. The trend indicates that the predictor variable still provides information about the response even though the data points fall farther from the regression line.
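If you want to see this for yourself, a small simulation can help. The sketch below is not the data behind the article's graphs. It assumes a hypothetical true line (Output = 45 + 2 × Input), 500 evenly spaced Input values, and two noise levels chosen purely for illustration. The function name `fit_line` is also made up for this example, and the P value uses a normal approximation to the t distribution, which is only reasonable for large samples like this one.

```python
import math
import random
from statistics import NormalDist


def fit_line(x, y):
    """Least-squares fit of y on a single predictor x, with summary statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))          # standard error of the regression
    t_stat = slope / (s / math.sqrt(sxx)) # slope divided by its standard error
    # ASSUMPTION: normal approximation to the t distribution (fine for large n).
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {"slope": slope, "intercept": intercept,
            "r2": 1 - sse / sst, "s": s, "p_value": p_value}


rng = random.Random(42)
n = 500
x = [20 * i / (n - 1) for i in range(n)]          # Input values from 0 to 20
base_noise = [rng.gauss(0, 1) for _ in range(n)]  # same random draws for both data sets


def simulate(noise_sd):
    """Hypothetical true line plus scaled noise (illustrative values only)."""
    return [45 + 2 * xi + noise_sd * e for xi, e in zip(x, base_noise)]


low_noise = fit_line(x, simulate(5))    # tight scatter around the line
high_noise = fit_line(x, simulate(40))  # wide scatter around the same line

for label, fit in (("low noise", low_noise), ("high noise", high_noise)):
    print(f"{label}: slope={fit['slope']:.2f}  R-squared={fit['r2']:.2f}  "
          f"S={fit['s']:.1f}  p={fit['p_value']:.4g}")
```

Both fits should report a slope near 2 with a small P value, while R-squared drops sharply and S grows for the noisy data. The trend is the same; only the scatter around it changes.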
To assess the precision, we’ll examine the prediction interval. A prediction interval is a range that is likely to contain the response value of a single new observation given specified settings of the predictors in your model. Narrower intervals indicate more precise predictions.
The model with the high variability data produces a prediction interval that extends from about -500 to 630, over 1100 units! Meanwhile, the low variability model has a prediction interval from -30 to 160, about 200 units. Clearly, the predictions are much more precise from the high R-squared model, even though the fitted values are nearly the same!
The difference in precision should make sense after seeing the variability present in the actual data. When the data points are spread out further, the predictions must reflect that added uncertainty.
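To make the link between scatter and precision concrete, here is a rough sketch of how a prediction interval for a single new observation can be computed in simple regression. It reuses an assumed true line (Output = 45 + 2 × Input) and made-up noise levels, so the numbers will not match the intervals quoted above. The function name `prediction_interval` is hypothetical, and the multiplier comes from the normal distribution rather than the t distribution, an approximation that is only reasonable for large samples.

```python
import math
import random
from statistics import NormalDist


def prediction_interval(x, y, x0, level=0.95):
    """Approximate prediction interval for one new observation at x0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # standard error of the regression
    fitted = intercept + slope * x0
    # ASSUMPTION: normal quantile in place of the t quantile (large n only).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # The leading 1 accounts for the scatter of a single new observation.
    half_width = z * s * math.sqrt(1 + 1 / n + (x0 - mean_x) ** 2 / sxx)
    return fitted - half_width, fitted, fitted + half_width


rng = random.Random(42)
n = 500
x = [20 * i / (n - 1) for i in range(n)]          # Input values from 0 to 20
base_noise = [rng.gauss(0, 1) for _ in range(n)]  # same random draws for both data sets


def simulate(noise_sd):
    """Hypothetical true line plus scaled noise (illustrative values only)."""
    return [45 + 2 * xi + noise_sd * e for xi, e in zip(x, base_noise)]


low = prediction_interval(x, simulate(5), x0=10.0)    # low-variability data
high = prediction_interval(x, simulate(40), x0=10.0)  # high-variability data

for label, (lower, fitted, upper) in (("low noise", low), ("high noise", high)):
    print(f"{label}: fitted={fitted:.1f}  interval=({lower:.1f}, {upper:.1f})  "
          f"width={upper - lower:.1f}")
```

The two fitted values should land close together, but the interval for the noisy data comes out several times wider, because the interval width scales directly with S.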
The coefficients estimate the trends while R-squared represents the scatter around the regression line.
The interpretations of the significant variables are the same for both high and low R-squared models.
Low R-squared values are problematic when you need precise predictions.
So, what’s to be done if you have significant predictors but a low R-squared value? I can hear some of you saying, "add more variables to the model!"
In some cases, it’s possible that additional predictors can increase the true explanatory power of the model. However, in other cases, the data contain an inherently higher amount of unexplainable variability. For example, many psychology studies have R-squared values less than 50% because people are fairly unpredictable.
The good news is that even when R-squared is low, low P values still indicate a real relationship between the significant predictors and the response variable.