Assignment Task:

This assignment consists of two deliverables:

• A code implementation (40%). Place the Jupyter Notebook file and the relevant data set files in a folder named Task 3_YourName_StudentNumber, then zip the folder and upload it to Blackboard.

• A report (60%). The report must be uploaded as a separate file. 

Part I - PySpark source code 

Important Note: To make your results reproducible, your code must be self-contained. That is, it should not require any libraries beyond the PySpark environment we have used this semester, and the data files must be packaged with your code file.

In this component, we need to utilise Python 3 and PySpark to complete the following data analysis tasks: 

1. Exploratory data analysis

2. Recommendation engine

3. Classification 

You need to choose a dataset from Kaggle (https://www.kaggle.com/datasets) to complete these tasks. Remember to include the data set file in your source code submission.

Task I.1: Exploratory data analysis 

This subtask requires you to explore your dataset by 

• reporting its number of rows and columns,

• cleaning the data (handling missing values or duplicated records) if necessary,

• selecting 3 columns, and drawing 1 plot (e.g. bar chart, histogram, boxplot, etc.) for each to summarise it 
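
The exploration steps listed above can be sketched in plain Python as follows. This is only an illustration of the logic under stated assumptions: the inline CSV sample and its column names (`id`, `age`, `city`) are hypothetical, and in the notebook the same steps would normally go through PySpark DataFrame methods such as `count()`, `dropDuplicates()` and `na.drop()`. The final frequency table is the kind of summary a bar chart would display.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a Kaggle CSV file; all values are made up.
raw = """id,age,city
1,34,Perth
2,,Sydney
2,,Sydney
3,29,Perth
4,41,
5,52,Sydney
"""

rows = list(csv.DictReader(io.StringIO(raw)))
columns = list(rows[0].keys())

# Number of rows and columns in the raw data
shape = (len(rows), len(columns))

# Drop exact duplicate records, keeping the first occurrence
seen = set()
deduped = []
for row in rows:
    key = tuple(row[c] for c in columns)
    if key not in seen:
        seen.add(key)
        deduped.append(row)

# Drop records with a missing (empty) value in any column
cleaned = [row for row in deduped if all(row[c] != "" for c in columns)]

# Frequency table for one column: the data a bar chart would display
city_counts = Counter(row["city"] for row in cleaned)

print(shape, len(deduped), len(cleaned), dict(city_counts))
```

Dropping incomplete rows is only one possible cleaning choice; filling missing values instead may suit some data sets better, and the report should state which option was used.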

Task I.2: Recommendation engine 

This subtask requires you to implement a recommender system based on collaborative filtering using the Alternating Least Squares (ALS) algorithm. You need to include

• Model training and predictions

• Model evaluation using MSE 
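
To make the idea behind ALS and the MSE metric concrete, here is a minimal plain-Python sketch, assuming a rank-1 factorisation and a tiny made-up ratings list. It is not the required PySpark solution: in the notebook, training would typically use `pyspark.ml.recommendation.ALS` (with parameters such as `rank`, `maxIter` and `regParam`) and evaluation would use `RegressionEvaluator` with `metricName="mse"`. The function names `als_rank1` and `mse` are hypothetical helpers introduced for illustration.

```python
# Toy (user, item, rating) triples; the values are made up for illustration.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 1.0),
]
n_users, n_items = 3, 3


def mse(ratings, user_f, item_f):
    """Mean squared error between observed ratings and predictions u * v."""
    errors = [(r - user_f[u] * item_f[i]) ** 2 for u, i, r in ratings]
    return sum(errors) / len(errors)


def als_rank1(ratings, n_users, n_items, reg=0.1, iterations=10):
    """Rank-1 ALS: alternate closed-form least-squares updates."""
    user_f = [1.0] * n_users
    item_f = [1.0] * n_items
    for _ in range(iterations):
        # Hold item factors fixed and solve for each user factor
        for u in range(n_users):
            rated = [(i, r) for uu, i, r in ratings if uu == u]
            num = sum(r * item_f[i] for i, r in rated)
            den = reg + sum(item_f[i] ** 2 for i, _ in rated)
            user_f[u] = num / den
        # Hold user factors fixed and solve for each item factor
        for i in range(n_items):
            rated = [(u, r) for u, ii, r in ratings if ii == i]
            num = sum(r * user_f[u] for u, r in rated)
            den = reg + sum(user_f[u] ** 2 for u, _ in rated)
            item_f[i] = num / den
    return user_f, item_f


# Baseline: every factor equal to 1.0, so every prediction is 1.0
baseline_mse = mse(ratings, [1.0] * n_users, [1.0] * n_items)

user_f, item_f = als_rank1(ratings, n_users, n_items)
final_mse = mse(ratings, user_f, item_f)

print(baseline_mse, final_mse)
```

For brevity the sketch measures MSE on the same ratings it was trained on. A proper evaluation would hold out a test split and report MSE on that split only.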

Task I.3: Classification 

This subtask requires you to implement a classification system using logistic regression. You need to include

• Logistic Regression model training 

• Model evaluation 
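
The two steps above, training and evaluation, can be illustrated with a small plain-Python sketch, assuming a single numeric feature and made-up binary labels. It shows the mechanics only: in the notebook, the model would typically be `pyspark.ml.classification.LogisticRegression` (with parameters such as `maxIter` and `regParam`), evaluated with, for example, `BinaryClassificationEvaluator` or `MulticlassClassificationEvaluator`. The helper names `train_logistic`, `predict` and `accuracy` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for each training example
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


def accuracy(w, b, xs, ys):
    correct = sum(predict(w, b, x) == y for x, y in zip(xs, ys))
    return correct / len(xs)


# Made-up, linearly separable data: negative x is class 0, positive x is class 1
train_x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [-1.2, -0.3, 0.4, 1.7]
test_y = [0, 0, 1, 1]

w, b = train_logistic(train_x, train_y)
test_accuracy = accuracy(w, b, test_x, test_y)

print(w, b, test_accuracy)
```

Accuracy on a held-out set is one reasonable evaluation choice. For imbalanced classes, metrics such as precision, recall or area under the ROC curve are usually more informative.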

Part II - Report

You are required to write a report with the following content: 

• Provide a high-level survey on the advances of data science in the past 2 years. 

• Explain how Spark fits into the field of data science. Compare Spark with its competitors. 

• Explain your design and implementation of the machine learning parts in your code, including the following topics: 

  • Background of your selected data set
  • For each task, which learning algorithm is used and what are its key parameters and how you set them up
  • For each task, provide comments/evaluation for the model learnt 

Your report should use the following template: 

Table of Contents 

1.0 Advancement of Data Science 

2.0 Spark in Data Science  

3.0 Machine Learning Implementation  

3.1 Data set

3.2 Collaborative filtering 

  • Features of the model, key parameters and configuration
  • Evaluation

3.3 Logistic regression 

  • Features of the model, key parameters and configuration
  • Evaluation

 
