Which of the following statements about the KNN algorithm is true?
a. KNN is expected to perform substantially worse than QDA on training data if the Bayes decision boundary is highly non-linear.
b. The KNN decision boundary is highly flexible with K=1.
c. The KNN decision boundary is highly flexible when K is large (such as K>100).
d. KNN is very useful in answering inference problems.
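The flexibility claim in options b and c can be checked with a minimal pure-Python KNN sketch; the toy points and labels below are made up for illustration:

```python
from collections import Counter

def knn_predict(train, labels, x, k):
    # sort training points by squared distance to x, then majority-vote the k nearest
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

train = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6), (3, 3)]
labels = ["a", "a", "a", "b", "b", "b", "a"]

# K=1: each training point is its own nearest neighbour, so the jagged,
# highly flexible boundary reproduces the training labels exactly
err_k1 = sum(knn_predict(train, labels, p, 1) != y for p, y in zip(train, labels))

# K=7 (all points): every prediction is the overall majority class -> rigid boundary
err_k7 = sum(knn_predict(train, labels, p, 7) != y for p, y in zip(train, labels))
print(err_k1, err_k7)  # K=1 fits the training data perfectly; K=7 cannot
```

With K equal to the training-set size the classifier predicts the majority class everywhere, which is the opposite of "highly flexible".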
Which of the following actions is reasonable with a data set containing both numerical and categorical variables?
a. Perform PCA with all predictor variables.
b. Perform PCA only with the numerical predictors.
c. Do not use PCA at all.
d. Do PCA with numerical predictor variables and PCR with categorical variables.
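Option b can be sketched with NumPy: run PCA (via SVD) on the standardized numerical columns only, and let the categorical column rejoin afterwards as dummy variables. The data below are hypothetical:

```python
import numpy as np

# hypothetical mixed data: two numeric columns plus one categorical column
age = np.array([23.0, 45, 31, 52, 38])
income = np.array([31000.0, 72000, 45000, 88000, 56000])
city = np.array(["A", "B", "A", "C", "B"])    # categorical: excluded from PCA

num = np.column_stack([age, income])
num = (num - num.mean(axis=0)) / num.std(axis=0)   # standardize the numeric block
_, s, Vt = np.linalg.svd(num, full_matrices=False)
pc1 = num @ Vt[0]                     # first principal-component scores

# the categorical column rejoins afterwards, e.g. as dummy variables
design = np.column_stack([pc1, city == "A", city == "B"])
print(design.shape)
```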
Which of the following uses a kernel?
a. Maximal margin classifier.
b. Support vector classifier.
c. Support vector machines.
d. All of the above
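The kernels the options refer to can be written as plain functions. The RBF kernel below is one typically paired with support vector machines, while the support vector classifier corresponds to the linear inner product:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    # radial basis kernel: K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def linear_kernel(x, z):
    # plain inner product: the boundary the support vector classifier uses
    return sum(a * b for a, b in zip(x, z))

print(rbf_kernel((0, 0), (0, 0)), rbf_kernel((0, 0), (3, 4)))
print(linear_kernel((1, 2), (3, 4)))
```

Replacing inner products with a non-linear kernel such as the RBF is exactly what turns a support vector classifier into a support vector machine.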
What is the main difference between bagging and random forest?
a. In random forest there is pruning of the trees
b. In bagging there is pruning of the trees.
c. In random forest normalization of the predictors is required.
d. In random forest randomization of the predictors is performed.
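Option d refers to the fact that a random forest considers only a random subset of mtry predictors as candidates at each split; bagging is the special case where all predictors are considered. A sketch of that single step (the function name is ours):

```python
import random

def candidate_predictors(p, mtry, rng):
    # at EACH split a random forest draws mtry of the p predictors as split
    # candidates; bagging is the special case mtry == p (no randomization)
    return sorted(rng.sample(range(p), mtry))

rng = random.Random(0)
print(candidate_predictors(9, 3, rng))   # random forest: 3 of 9 predictors
print(candidate_predictors(9, 9, rng))   # bagging: every predictor considered
```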
PCA is used for
a. Dimension reduction
b. Classification
c. Optimization of model
d. All of the above
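A dimension-reduction sketch with NumPy, using toy data in which the third column is nearly redundant, so two principal components capture almost all the variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)   # third column nearly redundant

Xc = X - X.mean(axis=0)                 # centre before PCA
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                  # keep only the first 2 principal components
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(scores.shape, round(float(explained), 4))
```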
Let us assume that you fit the following regression model: Y = a + bX + cZ + e, where Y is income, X takes the value 1 if the person is female and 0 otherwise, Z is years of education, and e is an irreducible error term. Which of the following statements is correct?
a. Plot of the fitted regression model will give two parallel lines.
b. The estimate of “a” gives the only intercept of the model, when it is plotted.
c. We cannot make a meaningful two-dimensional plot of the fitted model.
d. The parameters, “b” and “c” determine the slopes of the regression lines, when plotted.
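With toy noise-free data generated from the stated model, the fit recovers a single slope c and two intercepts, a and a + b, i.e. two parallel lines in the (Z, Y) plane:

```python
import numpy as np

# toy data generated from Y = a + b*X + c*Z with a=10, b=-2, c=3 and no noise
Z = np.array([0.0, 1, 2, 3, 0, 1, 2, 3])
X = np.array([0.0, 0, 0, 0, 1, 1, 1, 1])   # 1 = female, 0 = otherwise
Y = 10 - 2 * X + 3 * Z

design = np.column_stack([np.ones_like(Z), X, Z])
a, b, c = np.linalg.lstsq(design, Y, rcond=None)[0]
# males:   Y = a + c*Z          (intercept a)
# females: Y = (a + b) + c*Z    (intercept a + b, SAME slope c -> parallel lines)
print(round(a, 6), round(b, 6), round(c, 6))
```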
Which of the following is a measure of collinearity?
a. Cook's distance
b. Deviance statistic
c. Variance inflation factor (VIF)
d. Leverage statistics
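VIF for predictor j is 1/(1 − R²_j), where R²_j comes from regressing X_j on the remaining predictors; a NumPy sketch on synthetic data with one nearly collinear pair:

```python
import numpy as np

def vif(X, j):
    # VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the others
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)               # independent predictor
X = np.column_stack([x1, x2, x3])
print(round(vif(X, 0), 1), round(vif(X, 2), 2))   # huge VIF vs ~1
```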
What will happen if we increase the cost in the support vector machine model?
a. Lower training misclassification.
b. Higher training misclassification.
c. Higher test misclassification error.
d. Increase Gamma.
In the K-means clustering algorithm, K stands for
a. Number of neighbors that are investigated.
b. Number of iterations.
c. Number of initial partitions.
d. Number of clusters.
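A bare-bones K-means loop makes the meaning of K concrete: exactly K centroids are maintained through the assign/update iterations (toy 2-D points, two obvious clusters):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # K is the number of clusters: we keep exactly k centroids throughout
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                            + (p[1] - centroids[i][1]) ** 2)
            clusters[j].append(p)
        # update step: move each centroid to its cluster mean
        centroids = [(sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
                     if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents = kmeans(pts, k=2)
print(sorted(cents))   # one centroid per well-separated blob
```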
In a logistic regression, you got a coefficient estimate of 0.025 associated with the only independent variable, X. Which of the following interpretations is correct?
a. For one unit change in X, the response changes by 0.025 units.
b. For one unit change in X, the response changes by 0.025 standard deviations of X.
c. For a 1% change in X, the odds of success change by 2.5% (approx.).
d. For a 1 unit change in X, the odds of success change by 2.5% (approx.).
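The odds interpretation (option d) follows from exp(0.025) ≈ 1.0253; the intercept alpha below is hypothetical and cancels out of the odds ratio:

```python
import math

beta = 0.025   # estimated coefficient on X

def odds(x, alpha=-1.0):
    # hypothetical intercept alpha; in logistic regression odds = exp(alpha + beta*x)
    return math.exp(alpha + beta * x)

ratio = odds(1) / odds(0)   # multiplicative change in odds per one-unit change in X
print(round(ratio, 6), round(100 * (ratio - 1), 2))   # roughly a 2.5% increase
```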
Which of the following statements about the “glm” function in R is correct?
a. It cannot fit a simple linear regression model.
b. It uses the ordinary least squares method for model fitting.
c. This function is recommended for multi-class prediction problems.
d. It uses maximum likelihood method for model fitting and inference.
If the Bayes decision boundary is non-linear, which of the following methods is expected to produce better test data classification accuracy, compared to the other alternative(s), below?
a. Linear discriminant analysis (LDA)
b. Quadratic discriminant analysis (QDA)
c. Logistic regression model
d. K-means clustering
Which of the following statements is correct?
a. We need to normalize all the features before fitting a "gam" model.
b. The “glm” function in R is not useful for fitting a cubic spline model.
c. The “gam” function (in "gam" library) cannot fit a logistic regression model.
d. The “glm” function is not useful for fitting a smoothing spline.
Find the odd man out.
a. Ridge regression
b. Lasso
c. Partial least square
d. Principal component analysis
Which of the following statements about ridge regression is correct?
a. It is useful under collinearity problem.
b. It is useful when the irreducible error terms have unequal variances.
c. It is the same as lasso when its tuning (or shrinkage) parameter is set to 0.
d. It produces unbiased estimate of the model parameters.
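Option a can be illustrated with the ridge closed form β̂ = (XᵀX + λI)⁻¹Xᵀy: under near-perfect collinearity the λ = 0 solution (ordinary least squares) is wildly unstable, while a modest λ shrinks and stabilizes the estimates (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = x1 + 1e-6 * rng.normal(size=50)    # almost perfectly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=50)

def ridge(X, y, lam):
    # closed form: beta = (X'X + lam*I)^{-1} X'y; lam = 0 gives ordinary least squares
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)     # unstable: huge offsetting coefficients
b_ridge = ridge(X, y, 1.0)   # shrunk, stable, splits the signal between x1 and x2
print(np.round(b_ridge, 3), float(np.abs(b_ols).max()) > float(np.abs(b_ridge).max()))
```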
In which of the following cases is standardization (to mean 0 and unit SD) of the features recommended?
a. Linear regression analysis.
b. Smoothing spline.
c. Lasso for variable selection.
d. Random forest model.
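Standardization to mean 0 and unit SD is what the lasso option calls for, so that the penalty treats all coefficients on a common scale regardless of each feature's units (toy two-column data with wildly different scales):

```python
import numpy as np

def standardize(X):
    # centre to mean 0 and scale to unit standard deviation, column-wise;
    # without this, the lasso penalty |beta_j| depends on each feature's units
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / sd

rng = np.random.default_rng(3)
X = np.column_stack([rng.normal(size=30), 1000 * rng.normal(size=30)])
Z = standardize(X)
print(np.round(Z.mean(axis=0), 12), np.round(Z.std(axis=0), 12))
```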
Answers
Answer:
b. The KNN decision boundary is highly flexible with K=1 (a large K smooths the boundary and makes it less flexible).
b. Perform PCA only with the numerical predictors.
c. Support vector machines (the maximal margin and support vector classifiers rely only on a linear inner product).