Internal Code: 1IEBD
Statistical Methods for Data Science
This assessment involves writing a short report that summarises a statistical investigation you have conducted on data you have collected yourself. You will need to collect your own data using good practices. For hints on how to collect good data, see the section on Experimental Design in Week 1. Your data can come from an experiment or from observations that you have made directly, or it could be a dataset relevant to your place of work. The data should be appropriate for addressing the hypotheses that you propose.
You might like to look online at journal papers for examples of how to shape your report. Many of these papers will have involved extensive work to collect their data; we do not expect that of you. We expect that you can obtain your data in 8 hours or less. Teaching staff do not expect you to win a Nobel prize with this assessment, but they do hope you can demonstrate that:
1. you have grasped important concepts associated with this course such as experimental design, visualisation, data analysis, hypothesis testing and interpretation
2. you can communicate your investigation in a formal written manner.
The introduction sets the scene for the investigative efforts. It provides motivation for the work and relevant background information and references that will enable the reader to put in context the key objectives and findings in your report. Address the important issues that have motivated your investigation. At the end of the introduction clearly state the objectives of the paper and associated hypotheses. Do not put any results from your investigation in the introduction. Do not discuss the data and methods in this section. Do not discuss your conclusions or key findings in the introduction.
The methods section should summarise the statistical methods used to analyse the data and the software used to generate the results. To cite RStudio, type RStudio.Version() in the RStudio console. The methods should be appropriate to ensure the objectives of the paper are met. It is often helpful to list the key R functions and associated arguments that generated the results, e.g. “The lm command with default settings for the arguments was used to fit a simple linear regression model of y on x in RStudio”. It is important to provide sufficient detail that an independent person could repeat your methodology.
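As an illustration of the level of detail that makes an analysis repeatable, here is a minimal sketch in R. It assumes the built-in `cars` data set as a stand-in for the data you collect yourself, and the object name `fit` is chosen for illustration only.

```r
# Stand-in data: the built-in `cars` data set (columns `speed` and `dist`)
# takes the place of your own collected data in this sketch.
fit <- lm(dist ~ speed, data = cars)  # simple linear regression, default arguments
print(summary(fit))                   # coefficients, standard errors, p-values, R-squared

# Software details worth reporting in the methods section
print(R.version.string)               # version of R used for the analysis
print(citation())                     # suggested citation for R itself
# RStudio.Version() is available only inside the RStudio console,
# so it is not called in this script.
```

Reporting the function name, the model formula and any non-default arguments alongside the software version is usually enough for a reader to rerun the same analysis.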
The results and discussion section presents and discusses the results. The discussion centres on the outputs of the statistical tests and the supporting graphs you have provided, and draws parallels with similar or opposing investigations. Has anyone else found a similar result? Do your findings differ from those of others? It can be useful to include a clear graph and/or a table of summary statistics that summarises your data in the results section. With a good grasp of the data that were collected, the reader will more easily be able to understand the results and discussion.
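A table of summary statistics and a supporting graph can be produced along the following lines. This is a sketch only: it again assumes the built-in `cars` data set as a stand-in for your own data, and the name `summary_table` is hypothetical.

```r
# Stand-in data: built-in `cars` (50 observations of speed in mph
# and stopping distance in ft); substitute your own data frame.
summary_table <- data.frame(
  variable = names(cars),
  n        = sapply(cars, length),  # number of observations per variable
  mean     = sapply(cars, mean),
  sd       = sapply(cars, sd),
  min      = sapply(cars, min),
  max      = sapply(cars, max),
  row.names = NULL
)
print(summary_table)

# Scatter plot with labelled axes, including units
plot(dist ~ speed, data = cars,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)",
     main = "Stopping distance against speed")
```

Labelling the axes with units and giving the sample size in the table helps the reader judge the data before reading the test results.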