
50 Multiple-Choice Questions (MCQs) on Regression Analysis

🔵 1. What is the primary purpose of regression analysis?

A) To test for randomness
B) To measure the relationship between two variables
C) To predict the value of a dependent variable based on one or more independent variables
D) To group variables into clusters
Answer: C
Rationale: Regression predicts outcomes and examines relationships between dependent and independent variables.


🔵 2. In a simple linear regression, how many independent variables are there?

A) None
B) One
C) Two
D) Multiple
Answer: B
Rationale: Simple regression involves one independent and one dependent variable.


🔵 3. What does the regression coefficient represent in linear regression?

A) The strength of correlation
B) The average change in the dependent variable due to one-unit change in the independent variable
C) Random error
D) Sample mean
Answer: B
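Rationale: The coefficient is the slope of the fitted line: the expected change in the dependent variable for a one-unit increase in the independent variable, holding any other predictors constant.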


🔵 4. What does a high R² value indicate in a regression model?

A) Weak correlation
B) Strong variability in data
C) A large amount of variance in the dependent variable is explained by the model
D) No relationship
Answer: C
Rationale: R² measures the proportion of variance in the dependent variable explained by the model.
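As a minimal sketch of how this proportion can be computed, assuming a set of actual and predicted values is already available (the function name `r_squared` and the sample numbers are illustrative, not part of the quiz):

```python
def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    # Variation left unexplained by the model
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the dependent variable around its mean
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Illustrative data: predictions come from the line y = 2.2 + 0.6x for x = 1..5
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(actual, predicted)
print(round(r2, 3))  # about 0.6, i.e. roughly 60% of the variance is explained
```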


🔵 5. Which assumption is NOT part of classical linear regression?

A) Homoscedasticity
B) Linearity
C) Multicollinearity
D) Normality of residuals
Answer: C
Rationale: Classical linear regression assumes the absence of perfect multicollinearity, so multicollinearity is a problem to be avoided, not an assumption of the model.


🔵 6. Which of the following is a measure of goodness of fit in regression?

A) Mean
B) Median
C) R²
D) Standard error
Answer: C
Rationale: R² assesses how well the model explains the dependent variable.


🔵 7. What happens when multicollinearity exists in a regression model?

A) Coefficients become unstable
B) R² becomes 0
C) Residuals increase
D) Linearity assumption fails
Answer: A
Rationale: Correlated predictors make it hard to separate their individual effects, so coefficient estimates become unstable and their standard errors grow.


🔵 8. What type of regression is used when the dependent variable is binary?

A) Simple linear regression
B) Logistic regression
C) Ridge regression
D) Polynomial regression
Answer: B
Rationale: Logistic regression models outcomes with two categories.


🔵 9. What is the main difference between multiple and simple linear regression?

A) Number of dependent variables
B) Type of data
C) Number of independent variables
D) Use of residuals
Answer: C
Rationale: Multiple regression includes more than one independent variable.


🔵 10. Which method is used to estimate the coefficients in linear regression?

A) Maximum likelihood
B) Method of moments
C) Ordinary Least Squares (OLS)
D) Principal component
Answer: C
Rationale: OLS minimizes squared differences between observed and predicted values.
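For the one-predictor case, the OLS estimates have a simple closed form. The sketch below assumes a single independent variable; the helper name `fit_simple_ols` and the data are made up for illustration:

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_simple_ols(x, y)
print(round(slope, 3), round(intercept, 3))  # slope about 0.6, intercept about 2.2
```

Here the slope says that y rises by about 0.6 on average for each one-unit increase in x, and the intercept is the predicted y when x is zero.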


🔵 11. In regression analysis, the residual is the difference between:

A) Independent and dependent variable
B) Predicted and actual value
C) Two independent variables
D) Two predicted values
Answer: B
Rationale: Residuals are the prediction errors (actual – predicted).
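A short illustration of the "actual minus predicted" convention, using made-up numbers (the function name `residuals` is hypothetical):

```python
def residuals(actual, predicted):
    """Residual for each observation: actual value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


actual = [3.0, 5.0, 7.5]
predicted = [2.5, 5.0, 8.0]
errors = residuals(actual, predicted)
print(errors)  # [0.5, 0.0, -0.5]
```

A positive residual means the model under-predicted that observation; a negative one means it over-predicted.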


🔵 12. What is heteroscedasticity?

A) Equal variance of errors
B) Unequal variance of errors
C) Correlated errors
D) Normal errors
Answer: B
Rationale: Heteroscedasticity refers to non-constant variance in residuals.


🔵 13. What is the intercept in a regression model?

A) The slope
B) The error term
C) The predicted mean of independent variable
D) Value of dependent variable when all predictors are zero
Answer: D
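Rationale: The intercept is the predicted value of the dependent variable when every predictor equals zero, i.e. the point where the regression line crosses the y-axis.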


🔵 14. Which test is used to check the overall significance of a regression model?

A) Chi-square
B) t-test
C) F-test
D) Z-test
Answer: C
Rationale: F-test evaluates whether the model explains a significant portion of variance.
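One common way to write the overall F statistic is in terms of R², the number of observations n, and the number of predictors k. The sketch below only computes the statistic; turning it into a p-value would need an F distribution with (k, n - k - 1) degrees of freedom, which is left out here. The function name and inputs are illustrative:

```python
def overall_f_statistic(r2, n_obs, n_predictors):
    """F = (R² / k) / ((1 - R²) / (n - k - 1)) for a model with an intercept."""
    explained = r2 / n_predictors
    unexplained = (1 - r2) / (n_obs - n_predictors - 1)
    return explained / unexplained


# Illustrative values: R² = 0.6 from 5 observations and 1 predictor
f_stat = overall_f_statistic(r2=0.6, n_obs=5, n_predictors=1)
print(round(f_stat, 3))  # about 4.5
```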


🔵 15. What does a regression line minimize?

A) Sum of actual values
B) Sum of squared residuals
C) Mean error
D) Variance of predictors
Answer: B
Rationale: The OLS method minimizes squared differences between observed and predicted values.


🔵 16. A model with an R² of 0.90 means:

A) 90% of the variation in dependent variable is explained by independent variable(s)
B) The error is 90%
C) It is a nonlinear model
D) It has 90% residual
Answer: A
Rationale: R² = 0.90 means the model explains 90% of the variance in the dependent variable, which indicates high explanatory power.


🔵 17. What happens when a model is overfitted?

A) It performs well on new data
B) It generalizes better
C) It captures noise rather than pattern
D) It has low variance
Answer: C
Rationale: Overfitting leads to poor performance on unseen data due to excessive complexity.


🔵 18. Which regression technique penalizes large coefficients?

A) OLS
B) Lasso regression
C) Simple regression
D) Logistic regression
Answer: B
Rationale: Lasso adds an L1 penalty on the absolute size of the coefficients, which shrinks large coefficients (some to exactly zero) and helps prevent overfitting.


🔵 19. What is the dependent variable also known as?

A) Predictor
B) Explanatory
C) Output
D) Input
Answer: C
Rationale: It is the outcome or variable being predicted.


🔵 20. What is multivariate regression?

A) Regression with one independent variable
B) Regression with two dependent variables
C) Regression with multiple independent and dependent variables
D) Regression with no variables
Answer: C
Rationale: Multivariate regression includes multiple predictors and multiple outcomes.


🔵 21. Which plot is most commonly used to check residual patterns in regression analysis?

A) Line plot
B) Scatter plot
C) Histogram
D) Residual vs fitted plot
Answer: D
Rationale: The residual vs fitted plot helps assess the assumptions of linearity and homoscedasticity.


🔵 22. Which variable is manipulated to see the effect on the dependent variable in regression?

A) Dependent variable
B) Constant
C) Independent variable
D) Residual
Answer: C
Rationale: The independent variable is used to predict or explain changes in the dependent variable.


🔵 23. What is the result of adding more relevant predictors to a regression model?

A) Decreased R²
B) Increased error
C) Improved explanatory power
D) Increased residuals
Answer: C
Rationale: Adding meaningful predictors often increases the model's ability to explain variance.


🔵 24. When does multicollinearity become a problem in regression?

A) When independent variables are unrelated
B) When independent variables are highly correlated
C) When dependent variables are equal
D) When residuals are constant
Answer: B
Rationale: Multicollinearity occurs when predictors are correlated, affecting model stability.


🔵 25. What statistical method is used to identify the most influential predictors?

A) Variance analysis
B) t-test
C) Stepwise regression
D) R² comparison
Answer: C
Rationale: Stepwise regression helps in model selection by adding or removing predictors.


🔵 26. Which type of regression is used when the relationship between variables is not linear?

A) Simple linear regression
B) Logistic regression
C) Non-linear regression
D) Binary regression
Answer: C
Rationale: Non-linear regression models curved relationships between variables.


🔵 27. What is the standard error of estimate used for?

A) To test normality
B) To test heteroscedasticity
C) To measure the accuracy of predictions
D) To determine outliers
Answer: C
Rationale: It indicates how much the observed values deviate from the predicted values.
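As a rough sketch, the standard error of estimate can be computed as the square root of the residual sum of squares divided by its degrees of freedom (n - k - 1 for k predictors plus an intercept). The function name and data below are illustrative assumptions:

```python
import math


def standard_error_of_estimate(actual, predicted, n_predictors=1):
    """sqrt(SSE / (n - k - 1)): typical size of a prediction error."""
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    dof = len(actual) - n_predictors - 1
    return math.sqrt(sse / dof)


# Illustrative data: predictions come from the line y = 2.2 + 0.6x for x = 1..5
actual = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
see = standard_error_of_estimate(actual, predicted)
print(round(see, 3))  # about 0.894, in the same units as the dependent variable
```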


🔵 28. What does adjusted R² account for that R² does not?

A) Number of observations
B) Number of predictors
C) Outliers
D) Residual normality
Answer: B
Rationale: Adjusted R² adjusts R² for the number of predictors in the model.
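The usual formula is adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. A small sketch with made-up inputs shows how the same R² is penalized more heavily as predictors are added:

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Same R² and sample size, different numbers of predictors (illustrative values)
adj_5 = adjusted_r_squared(r2=0.90, n_obs=50, n_predictors=5)
adj_20 = adjusted_r_squared(r2=0.90, n_obs=50, n_predictors=20)
print(round(adj_5, 3), round(adj_20, 3))  # about 0.889 and 0.831
```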


🔵 29. Which regression technique is most appropriate for highly collinear variables?

A) OLS regression
B) Logistic regression
C) Ridge regression
D) Linear regression
Answer: C
Rationale: Ridge regression adds an L2 penalty on the size of the coefficients, which shrinks them and stabilizes the estimates when predictors are highly collinear.
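To make the penalty concrete, here is a minimal sketch for two nearly collinear predictors. It assumes centered data and no intercept, and solves (XᵀX + λI)β = Xᵀy by hand for the 2×2 case; the function name, the data, and the choice λ = 1 are all illustrative:

```python
def ridge_two_predictors(x1, x2, y, lam):
    """Solve (X'X + lam*I) beta = X'y for two centered predictors, no intercept."""
    a = sum(v * v for v in x1) + lam          # X'X[0][0] plus penalty
    b = sum(u * v for u, v in zip(x1, x2))    # X'X[0][1] = X'X[1][0]
    d = sum(v * v for v in x2) + lam          # X'X[1][1] plus penalty
    p = sum(u * v for u, v in zip(x1, y))     # X'y[0]
    q = sum(u * v for u, v in zip(x2, y))     # X'y[1]
    det = a * d - b * b
    beta1 = (d * p - b * q) / det
    beta2 = (a * q - b * p) / det
    return beta1, beta2


# Made-up, roughly centered data in which x2 is almost a copy of x1
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y = [-4.2, -1.8, 0.1, 2.1, 3.8]

ols_beta = ridge_two_predictors(x1, x2, y, lam=0.0)    # lam = 0 reduces to OLS
ridge_beta = ridge_two_predictors(x1, x2, y, lam=1.0)  # penalized estimate
print(ols_beta, ridge_beta)
```

With λ = 0 the function reduces to ordinary least squares; with λ > 0 the overall size of the coefficient vector shrinks, which is what makes the estimates less sensitive to the near-duplicate predictor.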


🔵 30. In regression analysis, which type of variable cannot be used as the dependent variable in linear regression?

A) Continuous
B) Interval
C) Binary
D) Ratio
Answer: C
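Rationale: Binary outcomes are normally modeled with logistic regression; a linear model can produce predicted values below 0 or above 1 and violates the constant-variance assumption.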


🔵 31. What is the purpose of dummy variables in regression?

A) Represent continuous variables
B) Reduce multicollinearity
C) Encode categorical variables
D) Test normality
Answer: C
Rationale: Dummy variables convert categorical data into numerical form.
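A small sketch of dummy coding in plain Python (the function name `dummy_encode` and the color data are hypothetical). One level is dropped to serve as the reference category, which avoids perfect multicollinearity with the intercept:

```python
def dummy_encode(values, drop_first=True):
    """Turn a categorical column into 0/1 indicator columns."""
    levels = sorted(set(values))
    if drop_first:
        # The dropped level becomes the reference category
        levels = levels[1:]
    return [{f"is_{lvl}": int(v == lvl) for lvl in levels} for v in values]


colors = ["red", "green", "blue", "green"]
encoded = dummy_encode(colors)
for row in encoded:
    print(row)
```

Here "blue" is the reference level, so a row with both indicators equal to 0 represents blue.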


🔵 32. What does a significant p-value (typically < 0.05) indicate in regression testing?

A) Weak relationship
B) Significant relationship between predictor and outcome
C) No effect
D) Multicollinearity
Answer: B
Rationale: A small p-value suggests the independent variable significantly predicts the dependent variable.


🔵 33. What does a negative regression coefficient indicate?

A) No relationship
B) As the independent variable increases, the dependent variable also increases
C) As the independent variable increases, the dependent variable decreases
D) Residuals are positive
Answer: C
Rationale: A negative coefficient shows an inverse relationship.


🔵 34. What does the term “extrapolation” mean in regression?

A) Using data from within the sample
B) Predicting values beyond the observed range
C) Increasing sample size
D) Using only dependent variables
Answer: B
Rationale: Extrapolation involves predicting outcomes for values outside the range of the data.


🔵 35. What is the impact of outliers on regression models?

A) No impact
B) Improve model accuracy
C) Distort coefficient estimates
D) Reduce R²
Answer: C
Rationale: Outliers can greatly affect the slope and fit of the regression line.


🔵 36. Which is the best indicator of multicollinearity?

A) R²
B) t-statistic
C) Variance Inflation Factor (VIF)
D) Residual plot
Answer: C
Rationale: VIF quantifies the level of multicollinearity among predictors.
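In general, the VIF for predictor j is 1 / (1 - R²ⱼ), where R²ⱼ comes from regressing that predictor on all the others. With only two predictors, R²ⱼ is just their squared correlation, which the sketch below uses (function names and data are illustrative):

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r²) when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
vif = vif_two_predictors(x1, x2)
print(round(vif, 2))  # about 2.78, since r = 0.8 here
```

A VIF of 1 means no correlation with the other predictors; the value grows without bound as the correlation approaches ±1.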


🔵 37. Which tool is used to compare regression models?

A) ANOVA
B) Correlation matrix
C) AIC (Akaike Information Criterion)
D) Bar chart
Answer: C
Rationale: AIC is used for model comparison; lower values indicate a better model.
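For least-squares models with normally distributed errors, AIC is often written, up to an additive constant, as n·ln(RSS/n) + 2k, where k is the number of estimated parameters. The sketch below assumes that form; conventions differ on whether the error variance counts toward k, which does not change the comparison as long as the same convention is applied to every model. All numbers are made up:

```python
import math


def aic_least_squares(rss, n_obs, n_params):
    """AIC (up to a constant) for a Gaussian least-squares fit."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


# Two hypothetical models fitted to the same 5 observations
aic_small = aic_least_squares(rss=2.4, n_obs=5, n_params=2)
aic_large = aic_least_squares(rss=2.3, n_obs=5, n_params=4)
print(aic_small < aic_large)  # True: the small fit gain does not justify 2 extra parameters
```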


🔵 38. If the regression coefficient is zero, it means:

A) Strong relationship
B) Moderate relationship
C) No linear relationship
D) Positive residuals
Answer: C
Rationale: A zero coefficient means the fitted line is flat: changes in the independent variable are not associated with any linear change in the dependent variable.


🔵 39. What is the purpose of using residual plots?

A) To show multicollinearity
B) To check randomness of errors
C) To measure central tendency
D) To show predictor strength
Answer: B
Rationale: Residual plots help diagnose issues like non-linearity or heteroscedasticity.


🔵 40. If two predictors are highly correlated, what is the likely effect on their coefficients?

A) Smaller variance
B) Larger standard error
C) More reliable estimates
D) Lower R²
Answer: B
Rationale: High correlation increases standard error and instability of coefficients.


🔵 41. Which method is used when the dependent variable is count data?

A) Linear regression
B) Poisson regression
C) Logistic regression
D) Lasso regression
Answer: B
Rationale: Poisson regression is appropriate for modeling count data.


🔵 42. What does the term “regression to the mean” mean?

A) All variables regress
B) Outliers tend to move closer to the average on retesting
C) Coefficients return to zero
D) Predictors move away from mean
Answer: B
Rationale: It refers to the tendency of extreme observations to return closer to average on subsequent measurements.


🔵 43. What does the slope of the regression line indicate?

A) Direction and rate of change of the relationship
B) Mean of the variable
C) Intercept
D) Residual distribution
Answer: A
Rationale: The slope indicates the rate of change and direction of the relationship.


🔵 44. Which of the following can improve model accuracy?

A) Adding irrelevant predictors
B) Removing noise variables
C) Increasing residuals
D) Increasing sample bias
Answer: B
Rationale: Removing non-informative variables improves model clarity and performance.


🔵 45. Which value of R² indicates a perfect fit?

A) 0
B) 0.5
C) 1
D) -1
Answer: C
Rationale: An R² of 1 indicates that the model perfectly explains the variation in the dependent variable.


🔵 46. Which regression method is best for predicting probabilities?

A) Linear
B) Polynomial
C) Logistic
D) Poisson
Answer: C
Rationale: Logistic regression models the probability of an outcome.
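The probability comes from passing a linear combination of the predictors through the logistic (sigmoid) function. A minimal one-predictor sketch, with hypothetical coefficient values rather than fitted ones:

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(x, intercept, slope):
    """P(outcome = 1 | x) under a one-predictor logistic model."""
    return sigmoid(intercept + slope * x)


# Hypothetical coefficients: intercept = -3.0, slope = 1.5
p_low = predict_probability(0.0, intercept=-3.0, slope=1.5)
p_mid = predict_probability(2.0, intercept=-3.0, slope=1.5)
p_high = predict_probability(4.0, intercept=-3.0, slope=1.5)
print(p_low, p_mid, p_high)
```

Whatever the input, the output stays strictly between 0 and 1, which is why this form suits probabilities better than a straight line does.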


🔵 47. What happens when we include too many predictors?

A) Model gets simpler
B) Overfitting may occur
C) Underfitting happens
D) Less accuracy
Answer: B
Rationale: Too many predictors increase complexity and reduce generalizability.


🔵 48. When is ridge regression preferred over linear regression?

A) When predictors are not correlated
B) When sample size is large
C) When multicollinearity exists
D) When data is non-linear
Answer: C
Rationale: Ridge regression handles correlated predictors better than OLS.


🔵 49. What does “underfitting” mean in a regression model?

A) The model is too complex
B) Model fails to capture the pattern
C) Model captures noise
D) Model has no residuals
Answer: B
Rationale: Underfitting means the model is too simple to capture trends in data.


🔵 50. The relationship in multiple linear regression is:

A) One-to-one
B) One-to-many
C) Many-to-one
D) Many-to-many
Answer: C
Rationale: Multiple predictors (independent variables) explain variation in a single dependent variable.




