Subject Code : COMP337
Assignment Task:

Learning outcome assessed:
A critical awareness of current problems and research issues in data mining.

Purpose of assessment:
This assignment assesses the understanding of the k-means clustering algorithm by implementing k-means for text clustering.

1. Objectives
This assignment requires you to implement the k-means clustering algorithm using the Python programming language.

2. Word Clustering using k-means
In this assignment, you are required to cluster words belonging to four categories: animals, countries, fruits and veggies. The words are arranged into four different files, one per category. The first entry in each line is a word, followed by 300 features (a word embedding) describing the meaning of that word.
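As a rough illustration of reading this format, the sketch below parses lines of the form "word f1 f2 ... f300". It assumes the fields are separated by whitespace, which the brief does not state; the function names and the `category` argument are hypothetical.

```python
import numpy as np

def parse_embedding_lines(lines, category):
    """Parse '<word> <f1> <f2> ...' lines into words, a feature matrix, and labels.

    Assumes whitespace-separated fields (an assumption, not stated in the brief).
    """
    words, vectors, labels = [], [], []
    for line in lines:
        parts = line.split()
        if not parts:  # skip blank lines
            continue
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
        labels.append(category)  # the file's category serves as the gold label
    return words, np.array(vectors, dtype=float), labels

def load_category_file(path, category):
    """Read one category file from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as fh:
        return parse_embedding_lines(fh, category)
```

Calling the loader once per file and concatenating the results would give a single feature matrix, with the category of each file kept as the reference label for evaluation.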

Questions
(1) Implement the k-means clustering algorithm with Euclidean distance to cluster the instances into k clusters.
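A minimal sketch of what such an implementation might look like is given below. The initialisation (k distinct data points chosen at random), the `seed` and `max_iter` parameters, and the handling of empty clusters are all assumptions; the brief does not prescribe them.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Lloyd-style k-means with Euclidean distance (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids as k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    assignments = np.full(len(X), -1)
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assignments = dists.argmin(axis=1)
        if np.array_equal(new_assignments, assignments):
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        for j in range(k):
            members = X[assignments == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return assignments, centroids
```

Because the result depends on the random initialisation, fixing the seed makes runs reproducible when comparing settings.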

(2) Vary the value of k from 1 to 10 and compute the precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and precision, recall, and F-score on the vertical axis of the same plot.
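The brief does not say which definition of precision, recall, and F-score to use for clusters. One common choice is the B-CUBED family, sketched below as an assumption: per-item precision and recall are averaged over all items, and the F-score is computed from those two averages. The function name is hypothetical.

```python
from collections import Counter

def bcubed_scores(assignments, labels):
    """B-CUBED precision, recall, and F-score (one possible definition)."""
    assignments = list(assignments)
    labels = list(labels)
    n = len(labels)
    cluster_sizes = Counter(assignments)           # items per cluster
    label_sizes = Counter(labels)                  # items per gold category
    joint = Counter(zip(assignments, labels))      # items per (cluster, category)
    # Per-item precision: share of the item's cluster that has its category.
    precision = sum(joint[(c, l)] / cluster_sizes[c]
                    for c, l in zip(assignments, labels)) / n
    # Per-item recall: share of the item's category found in its cluster.
    recall = sum(joint[(c, l)] / label_sizes[l]
                 for c, l in zip(assignments, labels)) / n
    total = precision + recall
    f_score = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f_score
```

Evaluating this for each k from 1 to 10 yields three values per k, which can then be drawn as three curves against k.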

(3) Now re-run the k-means clustering algorithm you implemented in part (1) but normalise each feature vector to unit ℓ2 length before computing Euclidean distances. Vary the value of k from 1 to 10 and compute the precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and precision, recall, and F-score on the vertical axis of the same plot.
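The normalisation step could be sketched as follows: each row is divided by its Euclidean norm. The small `eps` guard against all-zero vectors is an added assumption, not something the brief mentions.

```python
import numpy as np

def l2_normalise(X, eps=1e-12):
    """Scale each row of X to unit Euclidean length (zero rows stay zero)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)  # eps avoids division by zero
```

The normalised matrix can then be passed to the same clustering routine in place of the raw features.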

(4) Now re-run the k-means clustering algorithm you implemented in part (1) but this time use Manhattan distance over the unnormalised feature vectors. Vary the value of k from 1 to 10 and compute the precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and precision, recall, and F-score on the vertical axis of the same plot. (10 marks)
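Switching to Manhattan distance changes only the distance computation in the assignment step. A possible sketch, with a hypothetical function name:

```python
import numpy as np

def manhattan_distances(X, centroids):
    """Return an (n_points, n_centroids) matrix of L1 distances."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Sum of absolute coordinate differences for every point/centroid pair.
    return np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)
```

Each point is then assigned to the centroid in the column with the smallest distance. The brief does not say how centroids should be updated under this distance; keeping the mean is one option, while the component-wise median is the update that minimises the summed Manhattan distance within a cluster.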

(5) Now re-run the k-means clustering algorithm you implemented in part (1) but this time use Manhattan distance with ℓ2 normalised feature vectors. Vary the value of k from 1 to 10 and compute the precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and precision, recall, and F-score on the vertical axis of the same plot. (10 marks)

(6) Now re-run the k-means clustering algorithm you implemented in part (1) but this time use cosine similarity as the distance (similarity) measure. Vary the value of k from 1 to 10 and compute the precision, recall, and F-score for each set of clusters. Plot k on the horizontal axis and precision, recall, and F-score on the vertical axis of the same plot.
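Since cosine similarity grows as vectors become more alike, the assignment step would pick the centroid with the highest similarity rather than the smallest distance. A sketch under that reading, with a hypothetical function name and an assumed `eps` guard for zero vectors:

```python
import numpy as np

def cosine_similarities(X, centroids, eps=1e-12):
    """Return an (n_points, n_centroids) matrix of cosine similarities."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Normalise rows so that the dot product equals the cosine of the angle.
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)
    Cn = centroids / np.maximum(
        np.linalg.norm(centroids, axis=1, keepdims=True), eps)
    return Xn @ Cn.T
```

Taking `argmax` along each row gives the cluster assignment; equivalently, one could minimise 1 minus the similarity.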

(7) Comparing the different clusterings you obtained in (2)-(6), discuss which setting of k-means clustering is best for this dataset.



  • Uploaded By : Alex Cerry
  • Posted on : June 04th, 2019
