Explain linear correlation and non-linear correlation with an example.
Answers
Explanation:
In other words, when all the points on the scatter diagram tend to lie near a line that looks like a straight line, the correlation is said to be linear. This is shown in the figure on the left below. Correlation is said to be non-linear (curvilinear) if the ratio of change between the two variables is not constant, i.e. the points tend to lie near a curve rather than a straight line.
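The distinction can be sketched numerically. The following snippet (not part of the original answer; it assumes NumPy is available) compares Pearson's r for points on a straight line against points on a curve:

```python
import numpy as np

x = np.linspace(0, 10, 50)

# Linear correlation: the points lie exactly on a straight line, so r = 1.
y_line = 2 * x + 1
r_linear = np.corrcoef(x, y_line)[0, 1]

# Non-linear (curvilinear) correlation: the points lie on a curve, the
# ratio of change is not constant, and Pearson's r no longer reaches 1
# even though the relationship is perfectly deterministic.
y_curve = x ** 2
r_nonlinear = np.corrcoef(x, y_curve)[0, 1]
```

For the straight line, r is exactly 1; for the curve, r stays high (the curve is monotone) but falls short of 1, reflecting that the relationship is not linear.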
Answer:
We know that the purpose of any supervised machine learning model is to establish a function of the predictors that best explains the response variable. For this function to be stable and to be a good and reliable estimate of the target variable, it is very important that these predictors are not correlated with each other. The first step to ensure this is Correlation Analysis.
Correlation analysis is one of those important checks that need to be performed during various stages of a project, such as data analysis, before and after feature transformations, feature engineering, and feature selection.
Understanding Correlation
Correlation is a statistical measure that indicates the extent to which two or more variables move together. A positive correlation indicates that the variables increase or decrease together. A negative correlation indicates that if one variable increases, the other decreases, and vice versa.
Covariance is another measure that describes the degree to which two variables tend to deviate from their means in similar ways. But covariance is not unit-less, which makes it difficult to interpret the strength of the relation between the variables. Hence, it is normalized by the standard deviations of the variables to make it a dimensionless, unit-less measure called the correlation coefficient.
The correlation coefficient indicates the strength of the linear relationship that might be existing between two variables.
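The normalization described above can be verified directly. This sketch (an illustration with NumPy, using made-up data) computes the covariance, divides by the two standard deviations, and checks that the result matches NumPy's built-in correlation coefficient:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3 * x + rng.normal(size=100)   # y moves with x, plus noise

# Covariance carries the units of x times the units of y.
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Dividing by both standard deviations removes the units,
# giving the dimensionless correlation coefficient.
r = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))
```

Here r agrees with `np.corrcoef(x, y)[0, 1]`, since `corrcoef` applies exactly this normalization internally.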
“Correlation doesn’t imply causation”
For example, the sales of ice creams and sunglasses increase in summer. They tend to have a high positive correlation, but this doesn’t mean that buying ice cream makes people want to wear sunglasses. We have to read between the lines to understand that there might be another variable like ‘temperature’ that might be influencing both the variables similarly.
Also, when the correlation coefficient of the two variables is zero, it only indicates the absence of a ‘linear’ relationship between them and doesn’t imply that the variables are independent.
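A quick way to see this is the classic parabola example, sketched below (illustrative only, assuming NumPy): y is fully determined by x, yet their linear correlation is zero.

```python
import numpy as np

x = np.linspace(-3, 3, 61)   # symmetric about zero
y = x ** 2                   # y depends entirely on x

r = np.corrcoef(x, y)[0, 1]
# r is (numerically) zero: no *linear* relationship exists,
# even though x and y are clearly not independent.
```

This is why a zero correlation coefficient rules out only a linear relationship, not dependence in general.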
How are correlation and collinearity different?
Collinearity is a linear association between two predictors. Multicollinearity is a situation where two or more predictors are highly linearly related. In general, an absolute pairwise correlation coefficient above 0.7 between predictors indicates the presence of multicollinearity.
‘Predictors’ is the point of focus here. Correlation between a ‘predictor and response’ is a good indication of better predictability. But, correlation ‘among the predictors’ is a problem to be rectified to be able to come up with a reliable model.
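A common first check is to scan the predictor correlation matrix for pairs exceeding the 0.7 threshold. The snippet below is a sketch with hypothetical predictors (`x1`, `x2`, `x3` are invented for illustration, where `x2` is a noisy copy of `x1`):

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)              # unrelated predictor

X = np.column_stack([x1, x2, x3])
corr = np.corrcoef(X, rowvar=False)

# Flag predictor pairs whose absolute correlation exceeds 0.7.
i, j = np.triu_indices_from(corr, k=1)
flagged = [(int(a), int(b)) for a, b, c in zip(i, j, corr[i, j])
           if abs(c) > 0.7]
```

Only the (x1, x2) pair is flagged; x3 is uncorrelated with both and raises no concern.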
Diagnostics of multicollinearity
1. Prominent changes in the estimated regression coefficients by adding or deleting a predictor
2. The variance inflation factor (VIF) provides a formal detection threshold for multicollinearity. A VIF of 5 or 10 and above (depending on the business problem) indicates a multicollinearity problem.
If there is no collinearity, the VIF equals 1. A VIF of 10 means that the variance of the coefficient of that predictor is 10 times larger than it would be if there were no collinearity.

VIF = 1/(1 − R²), where R² is the coefficient of determination obtained by regressing the predictor in question on all of the other predictors, i.e. the extent to which that predictor can be explained by the rest.
3. The correlation matrix of the predictors, as mentioned above, may indicate the presence of multicollinearity. Although correlation measures a bivariate linear relationship whereas multicollinearity is a multivariate phenomenon, the correlation matrix is often a good first indicator of multicollinearity and of the need for further investigation.
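The VIF diagnostic above can be computed from scratch. This sketch (illustrative, with invented predictors; `x2` is deliberately built as a near-copy of `x1`) regresses each predictor on the others and applies VIF = 1/(1 − R²):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=300)
x2 = x1 + 0.2 * rng.normal(size=300)   # highly collinear with x1
x3 = rng.normal(size=300)              # independent predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1/(1 - R^2), with R^2 from regressing column j on the rest."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add an intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)     # ordinary least squares
    resid = y - A @ beta
    r2 = 1 - resid.var() / y.var()                   # coefficient of determination
    return 1 / (1 - r2)
```

Here `vif(X, 0)` and `vif(X, 1)` come out well above the 5–10 threshold, while `vif(X, 2)` stays near 1, matching the interpretation in point 2 above.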