What should the correlation threshold value be, based on which we determine the highly collinear variables?
Answers
Answer:
Correlation is a statistical technique that shows whether, and how strongly, pairs of variables are related to each other. It is a scaled version of covariance, and its value ranges from -1 to +1. The main difference between covariance and correlation is that covariance has no fixed range: it depends on the units of the variables and can take any value from -∞ to +∞, whereas correlation always lies between -1 and +1. If the correlation is -1, the two variables are perfectly negatively correlated; if the correlation between two independent variables is 0, there is no linear relationship between them; and if it is +1, the two variables are perfectly positively correlated. So now let us see how to calculate the correlation.
Correlation is nothing but the ratio of the covariance to the product of the standard deviations of the two variables. Dividing the covariance by this product gives the correlation, which lies in the range -1 to +1. Now let us first understand why we need correlation at all.
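The calculation described above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the sample data are made up for the example, and the sample (n - 1) form of covariance is an assumption, since the answer does not say which form is meant.

```python
import math

def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator, an assumption)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    std_x = math.sqrt(covariance(x, x))  # covariance of a variable with itself is its variance
    std_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (std_x * std_y)

# Made-up illustrative data, not taken from any real dataset
x = [1, 2, 3, 4, 5]
y_scaled = [10, 20, 30, 40, 50]   # moves exactly with x
y_reversed = [5, 4, 3, 2, 1]      # moves exactly against x

print(covariance(x, y_scaled))    # depends on the units of y
print(correlation(x, y_scaled))   # close to +1
print(correlation(x, y_reversed)) # close to -1
```

Rescaling `y_scaled` changes the covariance but leaves the correlation unchanged, which is why correlation is easier to compare across pairs of variables.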
We use correlation to avoid feeding redundant information into the model, so let us look at its importance from the model's perspective. Suppose, for example, I have 10 independent variables and I want to build a model on top of them. I will keep only the variables that matter for the model. If two variables are highly dependent on each other, they carry the same information, so we keep only one of them. Correlation is the metric we use to evaluate the level of dependency between two independent variables, and it is especially important in the case of transaction data. Correlation and covariance are therefore commonly used in banking, finance, and insurance companies. So now let us look at the visualization to understand the correlation better.
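The idea of keeping only one variable from each highly dependent pair could be sketched as below. This is a minimal illustration, not a definitive method: `pearson` and `drop_collinear` are hypothetical helper names, the column data is made up, and the threshold of 0.8 is only a placeholder assumption, since the answer above does not state a specific cutoff.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def drop_collinear(columns, threshold=0.8):
    """Keep a column only if its absolute correlation with every
    already-kept column is below the threshold (placeholder value)."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up illustrative columns
data = {
    "income":   [30, 45, 60, 75, 90],
    "spending": [28, 44, 61, 73, 92],  # carries almost the same information as income
    "age":      [52, 23, 41, 35, 29],
}

selected = drop_collinear(data, threshold=0.8)
print(selected)
```

Which of two correlated variables survives depends on the column order here; in practice one might prefer to keep the variable that is easier to interpret or more strongly related to the target.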