# Health Insurance Costs of Customers for a Real Health Insurance Company - Statistics Assignment Help

There are three questions to be answered. All calculations need to be in the corresponding sheet.

Task 1: Hypothesis testing. Answer the questions below:
a) Is the difference significant?
b) Is this drug helping patients or making their condition worse?
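
As an illustration of how the two Task 1 questions could be approached, here is a minimal Python sketch of a two-sided permutation test on the difference in group means. The assignment does not state which test to use, how the data is laid out, or what the outcome measures, so the choice of test, the `treated` and `control` samples, the 5% significance level, and the convention that a lower score is better are all assumptions made for this example.

```python
import random


def permutation_test(treated, control, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    n_treated, n_control = len(treated), len(control)
    observed = sum(treated) / n_treated - sum(control) / n_control
    pooled = list(treated) + list(control)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_treated]) / n_treated
                - sum(pooled[n_treated:]) / n_control)
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical symptom scores (lower is assumed to be better).
# These numbers are made up for the sketch; they are not the assignment data.
treated = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 6.2, 5.3]
control = [6.8, 7.1, 6.5, 7.4, 6.9, 7.0, 6.6, 7.3]

observed_diff, p_value = permutation_test(treated, control)
significant = p_value < 0.05                     # question a), assumed 5% level
drug_helps = significant and observed_diff < 0   # question b), lower = better

print(f"difference in means: {observed_diff:.4f}")
print(f"p-value: {p_value:.4f}, significant: {significant}")
print(f"drug appears to help: {drug_helps}")
```

The sign of the observed difference answers question b) only once the direction of "better" is known for the real measurement. If the sheet contains paired before/after readings for the same patients, a paired test would be the more appropriate choice.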

Task 2: Linear regression. This dataset is used to predict the health insurance costs of customers for a real health insurance company. The target variable is "charges". For the "sex" and "smoker" features, replace the classes with 0 or 1; for instance, "female" could be 0 and "male" could be 1. Which class gets 0 and which gets 1 is up to you. Use these encoded features to build the model. Use multiple linear regression to predict charges. Set aside 20% of the data as the "test" set. Calculate the RMSE for the test set and the training set.
a) What is the RMSE for each set?
Find out which feature can be dropped, using either the correlation between features or the t-statistic or p-value. Build a new model with one fewer feature.
b) Which feature is better to drop?
Compare the first model and the second model.
c) Which model is better?
d) Why?
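
The Task 2 workflow (encode the binary features, hold out 20%, fit, compute RMSE, refit with one fewer feature) could be sketched in Python roughly as follows. This is only an outline under stated assumptions: the rows are synthetic stand-ins generated in the script, the `age` column and all coefficients are hypothetical, the least-squares fit uses hand-written normal equations instead of a statistics library, and dropping `sex` is an arbitrary illustration, not the answer to question b). The t-statistic or correlation check that the task asks for is not shown.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X^T X) b = X^T y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def rmse(X, y, beta):
    """Root mean squared error of the linear predictions X @ beta."""
    errors = [sum(b * v for b, v in zip(beta, r)) - t for r, t in zip(X, y)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def design(rows, features):
    """Build [intercept, features...] rows, encoding binary classes as 0/1."""
    encode = {"female": 0.0, "male": 1.0, "no": 0.0, "yes": 1.0}
    return [[1.0] + [float(encode.get(r[f], r[f])) for f in features]
            for r in rows]


# Synthetic stand-in data (NOT the assignment dataset); "age" is hypothetical.
rng = random.Random(42)
rows = []
for _ in range(200):
    age = rng.randint(18, 64)
    sex = rng.choice(["female", "male"])
    smoker = "yes" if rng.random() < 0.2 else "no"
    charges = (2000 + 250 * age + (20000 if smoker == "yes" else 0)
               + rng.gauss(0, 1500))
    rows.append({"age": age, "sex": sex, "smoker": smoker, "charges": charges})

rng.shuffle(rows)
n_test = len(rows) // 5                      # 20% held out as the test set
test_rows, train_rows = rows[:n_test], rows[n_test:]


def evaluate(features):
    """Fit on the training rows and return (train RMSE, test RMSE)."""
    X_train, X_test = design(train_rows, features), design(test_rows, features)
    y_train = [r["charges"] for r in train_rows]
    y_test = [r["charges"] for r in test_rows]
    beta = fit_ols(X_train, y_train)
    return rmse(X_train, y_train, beta), rmse(X_test, y_test, beta)


full_train, full_test = evaluate(["age", "sex", "smoker"])
reduced_train, reduced_test = evaluate(["age", "smoker"])   # one fewer feature

print(f"full model:    train RMSE {full_train:.1f}, test RMSE {full_test:.1f}")
print(f"reduced model: train RMSE {reduced_train:.1f}, test RMSE {reduced_test:.1f}")
```

Because the reduced model is nested in the full one, its training RMSE can never be lower, so the comparison in questions c) and d) is more informative on the test RMSE.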

Task 3: There are two columns coming from a classification problem. The first column is the predicted value and the second is the real value. A value of 1 stands for positive/yes and 0 means negative/no. Build a confusion matrix by counting how many instances of TN, TP, FP, and FN there are.
a) Find TP, TN, FP, and FN in order
b) Calculate accuracy, precision, recall, and f1-score
c) Is the dataset balanced or imbalanced?
d) Which performance metrics will you choose if we don’t have any information about what the dataset is about?
Let's assume 0 stands for non-rainy days and 1 stands for rainy days. A large ice cream store chain is using this model to close some of its branches if rain is predicted. The night before, based on the model's prediction, employees receive an email telling them whether they need to show up at work tomorrow. The company is already well known and doesn't need to acquire more customers. Its focus for now is to save costs by not opening a store when there are no customers due to rain.
e) With this background, which performance metrics will you choose?
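
The counting and metric formulas in Task 3 can be illustrated with a short Python sketch. The `predicted` and `actual` lists below are hypothetical values invented for the example, not the two columns from the assignment sheet, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def confusion_counts(predicted, actual):
    """Count TP, TN, FP, FN for binary labels where 1 is the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from the four confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Hypothetical columns: first the predicted value, then the real value.
predicted = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
actual = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(predicted, actual)
accuracy, precision, recall, f1 = metrics(tp, tn, fp, fn)

# Share of real positives, a quick check on class balance for question c).
positive_share = sum(actual) / len(actual)

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
print(f"share of positives: {positive_share:.2f}")
```

For these made-up lists the counts are TP=3, TN=5, FP=1, FN=1, so accuracy is 0.80 and precision, recall, and F1 are all 0.75.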

• Posted on : January 10th, 2020

