Problem 2:
The dataset Education - Post 12th Standard.csv contains information on various colleges. You are expected to do a Principal Component Analysis for this case study according to the instructions given. The data dictionary of the 'Education - Post 12th Standard.csv' can be found in the following file: Data Dictionary.xlsx.
Perform Exploratory Data Analysis [both univariate and multivariate analysis to be performed]. What insight do you draw from the EDA?
Is scaling necessary for PCA in this case? Give justification and perform scaling.
Comment on the comparison between the covariance and the correlation matrices from this data [on scaled data].
Check the dataset for outliers before and after scaling. What insight do you derive here? [Please do not treat Outliers unless specifically asked to do so]
Extract the eigenvalues and eigenvectors. [Using Sklearn PCA, print both]
Perform PCA and export the data of the Principal Component (eigenvectors) into a data frame with the original features.
Write down the explicit form of the first PC (in terms of the eigenvectors. Use values with two places of decimals only). [hint: write the linear equation of PC in terms of eigenvectors and corresponding features]
Consider the cumulative values of the eigenvalues. How does it help you to decide on the optimum number of principal components? What do the eigenvectors indicate?
Explain the business implication of using the Principal Component Analysis for this case study. How may PCs help in the further analysis? [Hint: Write Interpretations of the Principal Components Obtained]
Answers
Answer:
Sol. The main purpose of univariate analysis is to describe the data, summarize it, and find patterns; unlike regression, it does not deal with causes or relationships between variables.
1. We start by loading the dataset and checking its shape and the data type of each variable. The shape tells us how many rows and columns the data has, and the data type tells us whether a variable is an object (text) or a numeric type.
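A minimal sketch of this first step in pandas. The real CSV is not reproduced here, so a tiny in-memory table stands in for it; the column names and values are invented for illustration and are not taken from the data dictionary.

```python
import io
import pandas as pd

# Hypothetical stand-in for 'Education - Post 12th Standard.csv';
# the column names and values below are invented for illustration.
csv_text = """Names,Apps,Accept,Grad.Rate
College A,1660,1232,60
College B,2186,1924,56
College C,1428,1097,54
College D,417,349,59
"""

# With the real file this would be:
# df = pd.read_csv("Education - Post 12th Standard.csv")
df = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = df.shape   # (number of rows, number of columns)
print(f"rows={n_rows}, columns={n_cols}")
print(df.dtypes)            # text columns show as object/string, numeric ones as int/float

# Univariate summary (count, mean, std, quartiles) of the numeric columns only.
numeric_df = df.select_dtypes(include="number")
print(numeric_df.describe().T)
```

Only the numeric columns go into the later PCA steps, which is why they are separated out here.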
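The main purpose of the analysis is to condense the information in the dataset into a smaller set of components and to decide the optimum number of principal components. The cumulative sum of the eigenvalues shows how much of the total variance the first few components explain, so we keep the smallest number of components that covers most of it. The eigenvectors indicate the directions along which the data varies the most, and each one gives the weights of the original features in a component. The business implication of using Principal Component Analysis for this case study is that the many variables describing each college can be replaced by a few uncorrelated components that are easier to interpret.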
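The steps above can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not the result for the college dataset: the feature names are hypothetical and the 90% cut-off for cumulative variance is an assumed threshold.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 100 rows, 4 hypothetical features on very different scales.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
X = np.column_stack([
    1000 * latent[:, 0] + rng.normal(scale=50, size=100),
    500 * latent[:, 0] + rng.normal(scale=50, size=100),
    10 * latent[:, 1] + rng.normal(size=100),
    rng.normal(size=100),
])
feature_names = ["feat_a", "feat_b", "feat_c", "feat_d"]  # hypothetical names

# Scale to zero mean and unit variance so no feature dominates by its units.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA()  # keep all components
pca.fit(X_scaled)

eigenvalues = pca.explained_variance_   # variance captured by each PC
eigenvectors = pca.components_          # one row per PC, one column per feature
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Eigenvectors as a data frame labelled with the original features.
loadings = pd.DataFrame(
    eigenvectors,
    columns=feature_names,
    index=[f"PC{i + 1}" for i in range(len(eigenvalues))],
)

# Smallest number of PCs whose cumulative explained variance reaches the threshold.
threshold = 0.90  # assumed cut-off, not taken from the original answer
n_components = int(np.searchsorted(cumulative, threshold) + 1)

# Explicit linear form of the first PC, weights rounded to two decimals.
pc1_equation = " ".join(
    f"{w:+.2f}*{name}" for w, name in zip(eigenvectors[0], feature_names)
)

print(loadings.round(2))
print("cumulative explained variance:", cumulative.round(3))
print("components kept:", n_components)
print("PC1 =", pc1_equation)
```

The printed PC1 line is the same kind of linear equation the question asks for: each scaled feature multiplied by its eigenvector weight.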
PCs may help in further analysis because the component scores can be used in place of the original variables of the dataset. The dataset should also be checked for outliers before and after scaling.
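One way to compare outliers before and after scaling is to count the points outside the 1.5 × IQR fences in each column. The sketch below uses synthetic data and a hypothetical helper, `iqr_outlier_count`; it assumes z-score scaling, which only shifts and rescales each column, so the same points should be flagged in both cases.

```python
import numpy as np

def iqr_outlier_count(col):
    """Count values outside the 1.5 * IQR fences (hypothetical helper)."""
    q1, q3 = np.percentile(col, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(np.sum((col < lower) | (col > upper)))

# Synthetic stand-in data: one right-skewed column and one roughly normal column.
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.lognormal(mean=7, sigma=1, size=200),
    rng.normal(loc=50, scale=10, size=200),
])

# Z-score scaling: subtract the column mean and divide by the column standard deviation.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

before = [iqr_outlier_count(X[:, j]) for j in range(X.shape[1])]
after = [iqr_outlier_count(X_scaled[:, j]) for j in range(X_scaled.shape[1])]

print("outliers per column before scaling:", before)
print("outliers per column after scaling: ", after)
```

Scaling changes the units on the axis of a boxplot but not which observations fall outside the whiskers, so the outliers remain in the data unless they are treated separately.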