Subject Code : CO3090
Assignment Task:

Tasks: 
Word frequency analysis is the first step towards text mining and automatic web page classification. The aim of this coursework is to implement a multi-threaded word frequency counter. The program should count the number of occurrences of a specific keyword across all the web pages it visits. Starting from a seed page, your program should traverse deeper by following every hyperlink found on the pages. You will need to use multi-threading to improve the performance of the program.
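As an illustration of the counting step only, here is a minimal sketch. It assumes a case-insensitive, whole-word match; the brief does not define how a keyword occurrence is matched. The class name KeywordCountSketch and the method countOccurrences are hypothetical and are not part of the provided WebAnalyser.java.

```java
import java.util.Locale;

class KeywordCountSketch {

    // Counts whole-word, case-insensitive matches of keyword in text.
    // Assumption: a "word" is a run of letters or digits.
    static int countOccurrences(String text, String keyword) {
        String target = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        // Split on every run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String page = "Java threads: a thread runs; Threads share memory.";
        // "threads" is a different word from "thread", so only one match is counted.
        System.out.println(countOccurrences(page, "thread")); // prints 1
    }
}
```

A substring match (for example with String.indexOf) would also count "threads"; which behaviour is expected is a question for the marking scheme.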

Question 1
(1.1) Explain why a multi-threaded word frequency counter has better performance than the single-threaded version.
(1.2) To fetch all the pages reachable from a seed URL, the frequency counter needs to traverse the site by following the links present on the pages. In general, there are two crawling strategies (please refer to Appendix 1.2 for more details). Explain the strategy you used for your implementation and justify your choice.
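Appendix 1.2 is not reproduced here. Assuming the two strategies are breadth-first and depth-first traversal, the difference in visiting order can be sketched on an in-memory link graph. The class CrawlOrderSketch, its methods and the page names A to D are hypothetical; a real crawler would fetch pages and extract links instead of reading a map.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CrawlOrderSketch {

    // Breadth-first: a FIFO queue visits every page at depth d before any page at depth d + 1.
    static List<String> breadthFirst(Map<String, List<String>> links, String seed) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.addLast(seed);
        seen.add(seed);
        while (!frontier.isEmpty()) {
            String page = frontier.pollFirst();
            order.add(page);
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) { // add returns false for a page already queued or visited
                    frontier.addLast(next);
                }
            }
        }
        return order;
    }

    // Depth-first: recursion follows the first unseen link as far as it goes before backtracking.
    static List<String> depthFirst(Map<String, List<String>> links, String seed) {
        List<String> order = new ArrayList<>();
        visit(links, seed, new HashSet<>(), order);
        return order;
    }

    private static void visit(Map<String, List<String>> links, String page,
                              Set<String> seen, List<String> order) {
        if (!seen.add(page)) {
            return; // already visited
        }
        order.add(page);
        for (String next : links.getOrDefault(page, List.of())) {
            visit(links, next, seen, order);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "A", List.of("B", "C"),
                "B", List.of("D"),
                "C", List.of("D"));
        System.out.println(breadthFirst(links, "A")); // prints [A, B, C, D]
        System.out.println(depthFirst(links, "A"));   // prints [A, B, D, C]
    }
}
```

Both versions keep a set of seen pages so that a page linked from several places (D above) is visited only once.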
(1.3) Given the diagram below, how many threads will be started according to your implementation?

Question 2
You will need to modify WebAnalyser.java 
(2.1) Add appropriate thread-safe data structures to store the following information: 
(a) The URLs visited. 
(b) How many times a chosen keyword occurs on each page the program has visited.
(Note: You are allowed to use built-in Java thread-safe collection classes, e.g. Vector, ArrayBlockingQueue and ConcurrentHashMap. An extra bonus will be given to those who implement their own version of a thread-safe list, map or queue.)
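One possible shape for (a) and (b), using only built-in classes, is sketched below. It assumes a ConcurrentHashMap-backed set for the visited URLs and a ConcurrentHashMap from URL to keyword count. The class CrawlStateSketch and all of its methods are hypothetical; they are not the fields or methods of the provided WebAnalyser.java or the WebStatistics interface.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class CrawlStateSketch {
    // (a) URLs visited: a concurrent set view backed by ConcurrentHashMap.
    private final Set<String> visited = ConcurrentHashMap.newKeySet();
    // (b) Keyword occurrences per URL.
    private final Map<String, Integer> keywordCounts = new ConcurrentHashMap<>();

    // Returns true only for the first thread that claims the URL.
    boolean markVisited(String url) {
        return visited.add(url);
    }

    // merge is atomic on ConcurrentHashMap, so concurrent additions are not lost.
    void addOccurrences(String url, int n) {
        keywordCounts.merge(url, n, Integer::sum);
    }

    int occurrencesOn(String url) {
        return keywordCounts.getOrDefault(url, 0);
    }

    int visitedCount() {
        return visited.size();
    }

    // Demo helper: several threads update the same URL at once.
    static CrawlStateSketch hammer(int threads, int addsPerThread, String url) {
        CrawlStateSketch state = new CrawlStateSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                state.markVisited(url);
                for (int j = 0; j < addsPerThread; j++) {
                    state.addOccurrences(url, 1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return state;
    }

    public static void main(String[] args) {
        CrawlStateSketch state = hammer(4, 1000, "http://example.org/");
        System.out.println(state.visitedCount());                       // prints 1
        System.out.println(state.occurrencesOn("http://example.org/")); // prints 4000
    }
}
```

Using the return value of markVisited as the "claim" avoids a separate check-then-add step, which would let two threads crawl the same page.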
(2.2) Implement all abstract methods defined in the WebStatistics interface. 

Question 3
(3.1) Limit the maximum number of threads running in parallel to MAX_THREAD_NUM.
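One common way to cap concurrency is a fixed-size thread pool; a Semaphore around manually started threads is another option. The sketch below uses the pool approach and records the highest number of tasks seen running at once. The constant name MAX_THREAD_NUM comes from the brief, but its value here is assumed; the class ThreadLimitSketch and the method peakConcurrency are hypothetical, and the sleep stands in for fetching and scanning a page.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadLimitSketch {
    static final int MAX_THREAD_NUM = 3; // name from the brief; value assumed

    // Runs the given number of dummy tasks and returns the peak number running simultaneously.
    static int peakConcurrency(int maxThreads, int tasks) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // The pool never runs more than maxThreads tasks at once; the rest wait in its queue.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(20); // placeholder for real page processing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown(); // accept no new tasks, let queued ones finish
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // The printed peak is never greater than MAX_THREAD_NUM.
        System.out.println(peakConcurrency(MAX_THREAD_NUM, 9));
    }
}
```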
(3.2) The program prints the statistics when one of the following events occurs: 
• The number of web pages visited by all counter threads exceeds MAX_PAGES_NUM. 
• The total number of words on all pages exceeds MAX_WORDS_COUNT. 
• A specified time (TIME_OUT in milliseconds) has passed. 
• All WebAnalyser threads (except the main thread) have finished executing.
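The four events above can be expressed as a single check that a monitoring loop in the main thread calls periodically. The sketch below is one hedged reading: "exceeds" is taken as strictly greater than, and the timeout as elapsed time reaching TIME_OUT. The constant names come from the brief, but their values are assumed; the class StopConditionSketch, the enum Reason and the method stopReason are hypothetical.

```java
import java.util.Optional;

class StopConditionSketch {
    // Names from the brief; values assumed for illustration.
    static final int MAX_PAGES_NUM = 100;
    static final long MAX_WORDS_COUNT = 50_000L;
    static final long TIME_OUT = 10_000L; // milliseconds

    enum Reason { PAGE_LIMIT, WORD_LIMIT, TIMED_OUT, ALL_FINISHED }

    // Returns the first stop condition that holds, or empty if crawling should continue.
    static Optional<Reason> stopReason(int pagesVisited, long totalWords,
                                       long elapsedMillis, int activeWorkers) {
        if (pagesVisited > MAX_PAGES_NUM) {
            return Optional.of(Reason.PAGE_LIMIT);
        }
        if (totalWords > MAX_WORDS_COUNT) {
            return Optional.of(Reason.WORD_LIMIT);
        }
        if (elapsedMillis >= TIME_OUT) {
            return Optional.of(Reason.TIMED_OUT);
        }
        if (activeWorkers == 0) {
            return Optional.of(Reason.ALL_FINISHED);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(stopReason(10, 500L, 1_000L, 2));  // prints Optional.empty
        System.out.println(stopReason(101, 500L, 1_000L, 2)); // prints Optional[PAGE_LIMIT]
        System.out.println(stopReason(10, 500L, 1_000L, 0));  // prints Optional[ALL_FINISHED]
    }
}
```

In a real program the inputs would come from thread-safe counters (for example AtomicInteger and AtomicLong) updated by the worker threads, and the statistics would be printed once when the result is non-empty.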

  • Uploaded By : Mia
  • Posted on : February 25th, 2019
