School of Information Technologies


 

COMP5318 KNOWLEDGE DISCOVERY AND DATA MINING
Semester 1, 2013



News

4/3/2013
Welcome to COMP5318!



Course outline

This course offers comprehensive coverage of well-known Data Mining topics, including classification, clustering and association rules. A number of specific algorithms and techniques under each category will be discussed. Methods for feature selection, dimensionality reduction and performance evaluation will also be covered. Students will learn and work with appropriate software tools and packages in the laboratory, and will be exposed to relevant Data Mining research.

Teaching staff


Sanjay Chawla - course coordinator, lecturer and tutor
Email: chawla AT it.usyd.edu.au
Office: room 424, School of IT Building
Consultation time: Monday 5-6pm (before the lectures)

Wei Liu - lecturer
Email: wei.liu AT nicta.com.au


Didi Surian - tutor
Email: didi AT it.usyd.edu.au

Linsey Pang - tutor
Email: qlinsey AT it.usyd.edu.au


Fei Wang - tutor
Email: fwan7956 AT uni.sydney.edu.au


Georgina Wilcox - tutor
Email: georgina.wilcox AT sydney.edu.au

(Please replace AT with @; for example, ABC AT DEF.com = ABC@DEF.com)

Timetable

Activity                                Day      Time    Venue
Lectures                                Monday   6-8pm   Architecture LT 1
Laboratory/Tutorial (start in Week 2)   Monday   8-9pm   SIT labs 115, 116 and 117

Assessment overview
The assignment specifications will be available on the eLearning site. 

Assessment: Ass1: Test
Weight: 15%
Due: Week 6, in class, Monday 15 April 2013
Individual/Group: Individual
Late submission policy: Not possible to re-sit the test.

Assessment: Ass2: Data analysis
Weight: 20%
Due: 19 May, 11:59 PM
Individual/Group: Individual or in groups (max 4 per group)
Notes: Submission: electronically via eLearning. Instructions to hand in assignments. Room allocation for group presentation.
Late submission policy:
- A penalty of 1 mark for each day after the deadline
- The maximum delay is 7 days; after that, assignments will not be accepted

Assessment: Ass3: Research paper presentation
Weight: 15%
Due: Weeks 12 and 13, in class
Individual/Group: Group
Notes: Final schedule. Groups and Paper Assignment.
Late submission policy: No late presentations are allowed; a student who is unable to present on the specified date will receive 0 marks for this assessment.

Assessment: Written exam
Weight: 50%
Due: Examination period
Individual/Group: Individual


In order to pass the course, the School requires at least 40% in the written exam, at least 40% in the other assessment components together and an overall final mark of 50 or more. This means that students who score less than 40% in the exam will fail the course regardless of their marks during the semester.
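The pass rule above can be sketched as a short check. This is only an illustration, not an official calculator: the function name `passes_course` is hypothetical, and it assumes marks are already scaled to their weights (written exam out of 50; Ass1, Ass2 and Ass3 together out of 15 + 20 + 15 = 50) and that no rounding is applied.

```python
def passes_course(exam_mark, other_mark):
    """Return True if all three pass conditions hold.

    exam_mark  -- written exam mark, assumed to be out of 50
    other_mark -- Ass1 + Ass2 + Ass3 combined, assumed to be out of 50
    """
    exam_max = 50.0
    other_max = 50.0

    exam_ok = exam_mark >= 0.40 * exam_max        # at least 40% in the exam
    other_ok = other_mark >= 0.40 * other_max     # at least 40% in the rest
    overall_ok = exam_mark + other_mark >= 50.0   # overall mark of 50 or more

    return exam_ok and other_ok and overall_ok


# A student with full semester marks but 19/50 in the exam still fails.
print(passes_course(19, 50))
# Meeting both 40% thresholds is not enough if the total is below 50.
print(passes_course(20, 29))
print(passes_course(20, 30))
```

Under these assumptions, a student who just meets the 40% exam threshold (20 out of 50) needs at least 30 out of 50 from the other components to reach an overall mark of 50.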

Academic honesty: Please read the University Policy on Academic Honesty and submit the appropriate cover sheet with your signature with your assignments. The cover sheets are available from the link above.

Special considerations: If you have a condition requiring special consideration, you must: 1) submit a form within 1 week of the date the assessment was due, 2) include your e-mail address, phone number and the name of your tutor, and 3) e-mail your lecturer to say that you have submitted a special consideration form. For more information please read the Policy on special consideration due to illness or misadventure; you can also download the form from there.


Syllabus

 The teaching materials (lecture notes, lab notes and lab solutions) will be available on the eLearning site.

Week Date Topic
1 4 March Admin matters. Introduction to Data Mining (DM): challenges, origins, DM vs Machine Learning and Knowledge Discovery in Databases; DM tasks; similarity measures.
  Materials: Slides
2 11 March Introduction to Map-Reduce
  Materials: Slides, Tutorial
3 18 March Introduction to Clustering
  Materials: Slides, Tutorial, Sample Codes
4 25 March Clustering and Probability
  Materials: Slides, Tutorial, Tutorial Solutions
5 8 April Classification
  Materials: Slides, Exam Prep, Tutorial, Codes for Tutorial
6 15 April Mid Term Exam
  Materials: Solution
7 22 April Association rules
  Materials: Tutorial
8 29 April Classification based on Association Rules
  Materials: Slides, Tutorial
9 6 May Dimensionality Reduction
  Materials: Slides, Tutorial, Ionosphere dataset
10 13 May Dimensionality Reduction Continued + Discussion on Assignment
  Materials: Slides
11 20 May Recommendation Systems
  Materials: Slides
12 3 June Review Slides
  Materials: Top 20 Slides
13 11 June Review Sessions: Tuesday 11 June 2013, 6-8PM, Carslaw Lecture Theatre 173
- 18 June Final Exam: Tuesday 18 June 2013. Quad McRae Room S418
 



Resources

Textbook

Mining of Massive Data Sets
Anand Rajaraman, Jure Leskovec and Jeff Ullman
Cambridge University Press


Introduction to Data Mining
Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Pearson Education (Addison Wesley), 0-321-32136-7, 2006

Chapters 4, 6 and 8 are freely available here and from the publisher.


Recommended book

Data mining - practical machine learning tools and techniques with Java implementations, 3rd edition
Ian H. Witten, Eibe Frank and M. Hall
Morgan Kaufmann, 2011, ISBN: 978-0-12-374856-0

Machine Learning view of Data Mining. Very readable. The book of the WEKA software. You can also use the previous edition of the book (2nd edition).



Other recommended books

Data Mining: Introductory and Advanced Topics
Margaret Dunham, Prentice Hall, 0-13-088892-3, 2003

Good coverage of the topics included in the course. Very readable. Covers pseudocode and computational complexity.


Data Mining Concepts and Techniques
J. Han and M. Kamber
Morgan Kaufmann, 2006, ISBN 1-55860-901-6 

Database view of Data Mining.


Principles of Data Mining
D. Hand, H. Mannila and P. Smyth
MIT Press, 2001, ISBN: 0-262-08290-X

Statistical view of Data Mining. Advanced, requires good statistical knowledge.



Tan and Witten are placed in the library Reserve collection (2 Hour Loan collection) and are also available in the Co-op Bookshop.


Last modified: 12 May 2012