Assignment Task:

The objective of the Assignment
To apply the data mining skills taught in lectures and lab sessions to a previously unseen dataset using Weka, achieve knowledge discovery, and produce a written report in technical paper format.


Deliverables

A single zip file called FirstName_LastName_StudentNumber._ass1.zip, to be uploaded to Moodle, containing the following files:
This file edited to contain the results of your investigation. Each of the NUMBERED headings should be expanded to satisfy the requirements of the section. 
A set of supporting files including but not limited to the following, which should be clearly referenced from your documentation.

  • dataset.arff
  • trainingSet.arff
  • testingSet.arff
  • j48tree.arff
  • associationrules.arff
  • kmeans.arff
  • dbscan.arff

Choosing Your Dataset 
Your dataset should concern a real-world problem that lends itself to easy understanding by your classmates. 

  • It should ideally have >1000 tuples/rows/instances.
  • It should ideally have >=6 attributes 
  • It should have attributes which can serve as labels so that the accuracy of your data analysis can be determined. 
  • If you cannot find one dataset that is suitable for use with all techniques, you may choose two. Clearly indicate which dataset was used for which technique, and introduce each dataset.

* Please refer to the additional materials section in Moodle for links to datasets.
* Please post to the student discussion forum “Assignment 1 - Dataset Selection” clearly indicating which set you are using so that other students do not select the same dataset. 

Part 1 – Classification
1. Description of your dataset and findings – 10%
Title: Brief title to capture the data and objective of your assignment 
Objective: What you want to uncover by examining the data in this assignment. You can update this as you progress through your project revising it and making it more specific. 
Data description: A description of the data in detail under the following subheadings: 
The problem domain
The source of the data
The agencies working with the data
The intended use of the data
The attribute types of the data
Please include screenshots (with one or two sentences of summary) of the dataset and also of the data summaries and graphs that are available through Weka. 
Summary of Findings: This should feature here at the top of the document, but be written following the application of your data mining techniques. Should contain numerical values and discussion.
2. Preprocessing – 10%
In this section you should 
Identify the set of preprocessing techniques that can be applied to your data and clearly indicate which techniques are appropriate and which ones are not. 
Provide evidence through screenshot of the effects of preprocessing the data along with a short explanation. 
Generate a file called dataset.arff which is the outcome of the preprocessing. 
3. Divide your dataset into training and test set – 5%
Divide the dataset into training and testing data sets (9:1). Links to additional resources are in Moodle. The files generated as part of this process should be saved and submitted as the following

  • trainingSet.arff
  • testingSet.arff

Screenshots of these files should be included.
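
As a rough illustration of what a 9:1 split does to an ARFF file, the following Python sketch shuffles the instance rows and writes the same header into both parts. It is not part of Weka and is not the required method; the function name split_arff, the fixed seed, and the assumption of a dense ARFF file with one instance per line are all hypothetical choices made for this example. The split is random, not stratified by class.

```python
import random


def split_arff(arff_text, train_fraction=0.9, seed=42):
    """Split dense ARFF text into (train_text, test_text) with a shared header.

    Assumes one instance per line after the @data marker (hypothetical helper,
    not a Weka function).
    """
    lines = arff_text.splitlines()
    # The @data keyword is case-insensitive in ARFF.
    data_idx = next(i for i, ln in enumerate(lines)
                    if ln.strip().lower() == "@data")
    header = lines[:data_idx + 1]
    # Keep only instance rows: skip blank lines and % comment lines.
    rows = [ln for ln in lines[data_idx + 1:]
            if ln.strip() and not ln.lstrip().startswith("%")]
    # A seeded shuffle keeps the split reproducible between runs.
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    train_text = "\n".join(header + rows[:cut]) + "\n"
    test_text = "\n".join(header + rows[cut:]) + "\n"
    return train_text, test_text


if __name__ == "__main__":
    demo = ("@relation demo\n@attribute x numeric\n@attribute class {a,b}\n@data\n"
            + "".join(f"{i},{'a' if i % 2 else 'b'}\n" for i in range(10)))
    train_text, test_text = split_arff(demo)
    print(train_text)
    print(test_text)
```

Both outputs keep the full header, so each can be opened in Weka on its own. The splitting for the submission itself should still be done and documented in Weka, with screenshots.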

Experiments
For each of the following classification techniques
Train your model using trainingSet.arff 
Test your model using testingSet.arff
Write a few paragraphs analyzing the results. Be sure to vary parameters at least 3 times in each case. Support this analysis with screenshots of the following 
The model or a visualization of the model
The results of the model 
Any additional output of the model including but not limited to 
Rules 
Confidence Values
Confusion Matrices
Etc.
Simple references to the notes or URL links to online resources complete with a sentence or two of explanation. 
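
When analyzing a confusion matrix from the outputs listed above, it can help to see how one is built and how accuracy follows from it. The sketch below is a minimal Python illustration, not Weka's implementation; the function names confusion_matrix and accuracy and the sample labels are hypothetical. Rows are actual classes and columns are predicted classes, which is the layout assumed here.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix where rows are actual classes and columns are predicted."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up labels purely for illustration.
    actual = ["yes", "yes", "no", "no", "no"]
    predicted = ["yes", "no", "no", "no", "yes"]
    matrix = confusion_matrix(actual, predicted, ["yes", "no"])
    for row in matrix:
        print(row)
    print("accuracy:", accuracy(matrix))
```

Off-diagonal cells show which classes are confused with which, and that is usually the more interesting part to discuss when comparing parameter settings.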
3.1 Classification: J48 Tree – 10%
3.2 Classification: Association Rules – 10%

Part 2 - Clustering
1. Description of your dataset and findings – 10%
Title: Brief title to capture the data and objective of your assignment 
Data description: A description of the data in detail under the following subheadings: 
The problem domain
The source of the data
The agencies working with the data
The intended use of the data
The attribute types of data
Please include screenshots (with one or two sentences of summary) of the dataset and also of the data summaries that are available through Weka. 
Objective: What you want to uncover by examining the data in this assignment. You can update this as you progress through your project revising it and making it more specific. 
Summary of Findings: This should be written following the application of your data mining techniques. 
2. Preprocessing – 10%
In this section, you should 
Identify the set of preprocessing techniques that can be applied to your data and clearly indicate which techniques are appropriate and which ones are not. 
Provide evidence through a screenshot of the effects of preprocessing the data along with a short explanation. 
Generate a file called dataset.arff which is the outcome of the preprocessing. 
Experiments
For each of the following 2 clustering techniques
Use dataset.arff as input. If adaptations are necessary, clearly indicate them.
Write one or two paragraphs analyzing the results of the clustering.  Be sure to vary parameters at least 3 times in each case. Support this analysis with screenshots of the following 
The clusters and/or a visualization of the clusters
The results of the clusters 
Any additional output of the clustering process
Simple references to the notes or URL links to online resources complete with a sentence or two of explanation. 
Evaluate the clusters using the “classes to clusters evaluation”. A worked example may be found here 
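
The idea behind a classes-to-clusters style evaluation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Weka's procedure: each cluster is labelled with its majority class, and instances whose class differs from that label count as errors. Weka's own class-to-cluster mapping may differ from this majority rule, so its reported figures may not match. The function name classes_to_clusters and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def classes_to_clusters(cluster_ids, class_labels):
    """Label each cluster with its majority class; return (mapping, error_rate)."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not class_labels:
        raise ValueError("at least one instance is required")
    # Count how often each class occurs inside each cluster.
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        counts[cluster][label] += 1
    # Simplifying assumption: a cluster takes the label of its most common class.
    mapping = {c: ctr.most_common(1)[0][0] for c, ctr in counts.items()}
    wrong = sum(1 for c, y in zip(cluster_ids, class_labels) if mapping[c] != y)
    return mapping, wrong / len(class_labels)


if __name__ == "__main__":
    # Made-up cluster assignments and class labels for illustration.
    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["a", "a", "b", "b", "b", "b"]
    mapping, error_rate = classes_to_clusters(clusters, classes)
    print(mapping)
    print("incorrectly clustered:", error_rate)
```

The class attribute is used only after clustering, to judge how well the clusters line up with the known labels; it should not be an input to the clustering itself.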


  • Uploaded By : Mia
  • Posted on : March 18th, 2019
