MATH2349: Data Preprocessing - World Health Organisation - Species and Surveys - R Studio Assignment Help


R Studio Assignment Help:

You will use the WHO data set for Tasks 1-5. Read the WHO data using an appropriate function, then complete Tasks 1-5.

Use appropriate “tidyr” functions to reshape the WHO data set into the form given below:
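As a rough illustration of this reshaping step, the sketch below gathers several count columns into one key column and one value column. The toy data frame, its column names (such as new_sp_m014) and the output names "code" and "value" are assumptions made for the example; the real WHO file may differ.

```r
library(tidyr)

# Toy stand-in for the WHO data; every column name here is an assumption.
who_wide <- data.frame(
  country = c("Afghanistan", "Brazil"),
  iso2 = c("AF", "BR"),
  iso3 = c("AFG", "BRA"),
  year = c(2000, 2000),
  new_sp_m014 = c(52, 317),
  new_sp_f014 = c(93, 247),
  stringsAsFactors = FALSE
)

# Keep the identifier columns and stack every other column into
# a key column ("code") and a value column ("value").
who_long <- pivot_longer(
  who_wide,
  cols = -c(country, iso2, iso3, year),
  names_to = "code",
  values_to = "value"
)

print(who_long)
```

The older gather() function from "tidyr" does the same job; pivot_longer() is used here only because it is the currently recommended interface.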

The WHO data set is not in a tidy format yet. The “code” column still contains information on four different variables (see the variable description section for details). Separate the “code” column into four new variables using appropriate “tidyr” functions. The final format of the WHO data set for this task should be in the form given below:
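One possible way to split the “code” column is in two passes, sketched below. It assumes codes shaped like new_sp_m014 (prefix, key, then sex and age glued together); the intermediate name "sexage" and the column name "var" are placeholders chosen for the example, not names taken from the assignment.

```r
library(tidyr)

# Toy long-format data; the code format is an assumption.
who_long <- data.frame(
  country = "Afghanistan",
  year = 2000,
  code = c("new_sp_m014", "new_ep_f65"),
  value = c(52, 10),
  stringsAsFactors = FALSE
)

# Pass 1: split on underscores into the prefix, the key, and a combined sex/age field.
who_split <- separate(who_long, code, into = c("new", "var", "sexage"), sep = "_")

# Pass 2: split the combined field after its first character (sex, then age).
who_split <- separate(who_split, sexage, into = c("sex", "age"), sep = 1)

print(who_split)
```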

The WHO data set is not in a tidy format yet. The “rel”, “ep”, “sn”, and “sp” keys need to be in their own columns as we will treat each of these as a separate variable. In this step, move the “rel”, “ep”, “sn”, and “sp” keys into their own columns. The final format of the WHO data set for this task should be in the form given below:
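A minimal sketch of moving the keys into their own columns follows. It assumes the keys sit in a column called "var" and the counts in a column called "value"; both names are placeholders for whatever the previous step produced.

```r
library(tidyr)

# Toy data with one row per key; column names are assumptions.
who_split <- data.frame(
  country = "Afghanistan",
  year = 2000,
  new = "new",
  var = c("rel", "ep", "sn", "sp"),
  sex = "m",
  age = "014",
  value = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Each distinct key in "var" becomes a column filled from "value".
who_keys <- pivot_wider(who_split, names_from = var, values_from = value)

print(who_keys)
```

The older spread() function, called as spread(who_split, key = var, value = value), should give an equivalent result.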

There is one more step to tidy the WHO data set. We have two categorical variables, “sex” and “age”. Use “mutate()” to convert sex and age to factors. For the “age” variable, you also need to create labels and order the levels. The labels are: <15, 15-24, 25-34, 35-44, 45-54, 55-64, 65>=. The final tidy version of the WHO data set would look like this:
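The factor conversion could look roughly like the sketch below. The raw age codes ("014", "1524", and so on) and the sex codes "m" and "f" are assumptions about how the values appear after the earlier steps; only the labels come from the task description.

```r
library(dplyr)

# Toy data; the raw sex and age codes are assumptions.
who_tidy <- data.frame(
  sex = c("m", "f", "m"),
  age = c("2534", "014", "65"),
  stringsAsFactors = FALSE
)

age_levels <- c("014", "1524", "2534", "3544", "4554", "5564", "65")
age_labels <- c("<15", "15-24", "25-34", "35-44", "45-54", "55-64", "65>=")

who_tidy <- mutate(
  who_tidy,
  # sex is categorical with no natural order
  sex = factor(sex, levels = c("m", "f")),
  # age is relabelled and ordered from youngest to oldest
  age = factor(age, levels = age_levels, labels = age_labels, ordered = TRUE)
)

str(who_tidy)
```

Listing the levels explicitly, rather than relying on alphabetical sorting, keeps the age groups in their natural order.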

Task 5: Filter & Select

Drop the redundant columns “iso2” and “new”, and filter any three countries from the tidy version of the WHO data set. Name this subset of the data frame as “WHO_subset”.
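A short sketch of this step, using a toy data frame in place of the tidy WHO data. The three countries, the input name "who_tidy" and the "sp" counts are arbitrary choices made for the example.

```r
library(dplyr)

# Toy stand-in for the tidy WHO data; values are made up.
who_tidy <- data.frame(
  country = c("Afghanistan", "Brazil", "China", "Denmark"),
  iso2 = c("AF", "BR", "CN", "DK"),
  iso3 = c("AFG", "BRA", "CHN", "DNK"),
  year = 2000,
  new = "new",
  sp = c(52, 317, 1131, 3),
  stringsAsFactors = FALSE
)

WHO_subset <- who_tidy %>%
  select(-iso2, -new) %>%                                   # drop the redundant columns
  filter(country %in% c("Afghanistan", "Brazil", "China"))  # keep three countries

print(WHO_subset)
```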

You will use the surveys and species data sets for Tasks 6-10. Read the species and surveys data sets using an appropriate function. Name these data frames “species” and “surveys”, respectively.

Combine “surveys” and “species” data frames using the key variable “species_id”. For this task, you need to add the species information (“genus”, “species”, “taxa”) to the “surveys” data. Rename the combined data frame as “surveys_combined”.
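One way to attach the species information is a left join, sketched below on toy data. Apart from “species_id”, “genus”, “species” and “taxa”, the column names and all values are assumptions for illustration.

```r
library(dplyr)

# Toy stand-ins for the two data sets; values are illustrative only.
surveys <- data.frame(
  record_id = 1:3,
  month = c(7, 7, 8),
  species_id = c("NL", "DM", "NL"),
  weight = c(NA, 40, 180),
  stringsAsFactors = FALSE
)

species <- data.frame(
  species_id = c("NL", "DM", "PF"),
  genus = c("Neotoma", "Dipodomys", "Perognathus"),
  species = c("albigula", "merriami", "flavus"),
  taxa = "Rodent",
  stringsAsFactors = FALSE
)

# Keep every survey record and add the matching species columns.
surveys_combined <- left_join(surveys, species, by = "species_id")

print(surveys_combined)
```

A left join keeps all rows of “surveys” even where no match exists in “species”; those rows simply get NA in the added columns.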

Using the “surveys_combined” data frame, calculate the average weight and hindfoot length of one of the species observed in each month (irrespective of the year). Make sure to exclude missing values while calculating the average.
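The monthly averages could be computed along these lines. The column names “month”, “weight” and “hindfoot_length”, the species code "DM" and the toy values are all assumptions; substitute the names used in the actual data.

```r
library(dplyr)

# Toy stand-in for surveys_combined; column names and values are assumptions.
surveys_combined <- data.frame(
  month = c(1, 1, 2, 2, 2),
  species_id = c("DM", "DM", "DM", "NL", "DM"),
  weight = c(40, NA, 44, 180, 46),
  hindfoot_length = c(36, 38, NA, 32, 35),
  stringsAsFactors = FALSE
)

monthly_means <- surveys_combined %>%
  filter(species_id == "DM") %>%   # pick one species
  group_by(month) %>%              # group by month only, ignoring the year
  summarise(
    mean_weight = mean(weight, na.rm = TRUE),
    mean_hindfoot_length = mean(hindfoot_length, na.rm = TRUE)
  )

print(monthly_means)
```

Setting na.rm = TRUE drops missing values separately for each variable, so a record with a missing weight still contributes its hindfoot length.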