From the very beginning, I was asked to share my screen and open Excel.
1) Write the numbers 1 to 5 and calculate the standard deviation without using built-in functions. What is standard deviation?
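For reference, here is a minimal Python sketch of the same manual calculation (the interview used Excel; the variable names are my own). It shows both the population version (divide by n) and the sample version (divide by n - 1), since the question did not say which one was expected.

```python
import math

values = [1, 2, 3, 4, 5]

# Mean: sum of the values divided by how many there are.
mean = sum(values) / len(values)

# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in values)

# Population standard deviation divides by n (like Excel's STDEV.P).
population_std = math.sqrt(squared_deviations / len(values))

# Sample standard deviation divides by n - 1 (like Excel's STDEV.S).
sample_std = math.sqrt(squared_deviations / (len(values) - 1))

print(round(population_std, 4), round(sample_std, 4))
```

In words: standard deviation measures how far the values typically sit from their mean, as the square root of the average squared deviation.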
2) Explain Type 1 and Type 2 errors. If a patient has cancer but is predicted as not having cancer, which type of error is this?
3) Using the first three columns of the given data, derive the last column (Total Salary) using SQL queries as well as pandas.
data = """EID Month Salary Total Salary
1 1 10000 10000
1 2 12000 22000
1 3 14000 36000
2 1 16000 16000
2 2 18000 34000
2 3 20000 54000"""
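Total Salary here is a running total of Salary per EID, ordered by Month. Below is a sketch of one possible SQL answer, run through Python's built-in sqlite3 module so it can be tried locally. The table name `salaries` and the column alias `TotalSalary` are my own assumptions, and the window function needs SQLite 3.25 or newer.

```python
import sqlite3

data = """EID Month Salary Total Salary
1 1 10000 10000
1 2 12000 22000
1 3 14000 36000
2 1 16000 16000
2 2 18000 34000
2 3 20000 54000"""

# Skip the header line and keep only the first three columns as input.
rows = [tuple(int(v) for v in line.split()[:3]) for line in data.splitlines()[1:]]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (EID INTEGER, Month INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO salaries VALUES (?, ?, ?)", rows)

# Running total of Salary within each EID, ordered by Month.
query = """
SELECT EID, Month, Salary,
       SUM(Salary) OVER (PARTITION BY EID ORDER BY Month) AS TotalSalary
FROM salaries
ORDER BY EID, Month
"""
result = conn.execute(query).fetchall()
conn.close()

for row in result:
    print(row)
```

For the pandas half, assuming the first three columns are loaded into a DataFrame `df` sorted by EID and Month, the same column can be derived with `df.groupby("EID")["Salary"].cumsum()`.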
4) Central Limit Theorem
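A quick simulation can help when explaining this one. The sketch below is my own illustration, not something asked in the interview: it draws repeated samples from a skewed exponential distribution (mean 1, standard deviation 1) and checks that the sample means cluster around 1 with a spread close to 1/sqrt(n). The sample size and number of samples are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_size = 50
num_samples = 2000

# Exponential(1) is strongly right-skewed, with mean 1 and standard deviation 1.
sample_means = []
for _ in range(num_samples):
    sample = [rng.expovariate(1.0) for _ in range(sample_size)]
    sample_means.append(sum(sample) / sample_size)

mean_of_means = statistics.mean(sample_means)
std_of_means = statistics.stdev(sample_means)

# The CLT predicts a spread of sigma / sqrt(n) for the sample means.
expected_std = 1.0 / sample_size ** 0.5

print(mean_of_means, std_of_means, expected_std)
```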
5) The 4 basic assumptions of Linear Regression
6) Suppose there are 10k observations, of which 500 are labelled 1 (loan defaulters) and 9,500 are non-defaulters in the target variable. With a 70:30 train-test split, the model ends up training mostly on non-defaulters. How do you mitigate this? How do you handle it before the train-test split? (Hint: stratification; don't say SMOTE/RandomOverSampler/class_weight.)
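To make the hint concrete, here is a minimal pure-Python sketch of a stratified 70:30 split on the numbers from the question. The function name `stratified_split` and its parameters are hypothetical, written only for illustration: each class is shuffled and split separately, so both sets keep the same 5% defaulter rate.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# 500 defaulters (1) and 9500 non-defaulters (0), as in the question.
labels = [1] * 500 + [0] * 9500
train_idx, test_idx = stratified_split(labels)

train_defaulters = sum(labels[i] for i in train_idx)
test_defaulters = sum(labels[i] for i in test_idx)
print(train_defaulters, len(train_idx), test_defaulters, len(test_idx))
```

In practice, scikit-learn's `train_test_split` does the same thing when the target is passed through its `stratify` parameter.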
7) What is the correct evaluation metric for the above?
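One way to see why accuracy alone is misleading here: the sketch below scores a trivial model that predicts "non-defaulter" for everyone. The test-set sizes (150 defaulters, 2,850 non-defaulters) assume a stratified 30% split of the data in the previous question, and `classification_metrics` is a hypothetical helper written for this illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when there are no predicted or actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Assumed test set: 150 defaulters and 2850 non-defaulters.
y_true = [1] * 150 + [0] * 2850

# Trivial baseline: predict non-defaulter for every row.
y_pred = [0] * len(y_true)

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

The baseline scores 95% accuracy while catching no defaulters at all (recall 0), which is why recall, precision, or F1 on the defaulter class are the usual candidates for this kind of data.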
8) Feature scaling: which models need feature scaling and which models do not? Linear Regression or Decision Tree?
9) What is stationarity in time series forecasting?