School of Information Technologies
COMP5318 KNOWLEDGE DISCOVERY AND DATA MINING
Semester 1, 2013
News
| 4/3/2013 | Welcome to COMP5318! |
Fei Wang - tutor
Email: fwan7956 AT uni.sydney.edu.au
Georgina Wilcox - tutor
Email: georgina.wilcox AT sydney.edu.au
(Please replace AT with @; for example, ABC AT DEF.com becomes ABC@DEF.com.)
Timetable
| Activity | Day | Time | Venue |
| Lectures | Monday | 6-8pm | Architecture LT 1 |
| Laboratory/Tutorial (start in Week 2) | Monday | 8-9pm | SIT labs 115, 116 and 117 |
Assessment overview
The assignment specifications will be available on eLearning.
| Assignment | % | Out | Due | Individual/Group | Notes | Late submission policy |
| Ass1: Test | 15 | | w6, in class, Monday 15 April 2013 | Individual | Held in the first hour of the lecture (6-7pm). Semi-open, like the exam. Students are allowed 1 sheet of their own notes (A4-size, double-sided, handwritten or typed). The test will cover the material on Clustering. | Not possible to re-sit the test. |
| Ass2: Data analysis | 20 | May 19th, 11:59 PM | Individual or in groups (max 4 per group) | Submission: electronically via eLearning. Instructions to hand in assignments. Room allocation for group presentation. | A penalty of 1 mark per day after the deadline. The maximum delay is 7 days; after that, assignments will not be accepted. | |
| Ass3: Research paper presentation final schedule | 15 | w12 and 13, in class | Group | Groups and Paper Assignment | No late presentations are allowed; a student who is unable to present on the specified date will receive 0 marks for this assessment. | |
| Written exam | 50 | examination period | Individual | The exam will be semi-open. You are allowed 1 sheet of your own notes (handwritten or typed, double-sided, A4-size) and a non-programmable calculator (you don't need a calculator). No other material is allowed (no books, no additional notes). The exam will be on all material except Clustering. |
Academic honesty: Please read the University Policy on Academic Honesty and submit the appropriate signed cover sheet with your assignments. The cover sheets are available from the link above.
| Week | Date | Topic |
| 1 | 4 March | Admin matters. Introduction to Data Mining (DM); challenges, origins, DM vs Machine Learning and Knowledge Discovery in Databases; DM tasks; similarity measures. Slides |
| 2 | 11th March | Introduction to Map-Reduce. Slides Tutorial |
| 3 | 18th March | Introduction to Clustering Slides Tutorial Sample Codes |
| 4 | 25th March | Clustering and Probability Slides Tutorial Tutorial Solutions |
| 5 | 8th April | Classification Slides Exam Prep Tutorial Codes for Tutorial |
| 6 | 15th April | Mid Term Exam Solution |
| 7 | 22nd April | Association rules Tutorial |
| 8 | 29th April | Classification based on Association Rules Slides Tutorial |
| 9 | 6th May | Slides Tutorial Ionosphere dataset |
| 10 | 13th May | Slides |
| 11 | 20th May | Slides |
Textbook
| Mining of Massive Datasets. Anand Rajaraman, Jure Leskovec and Jeff Ullman. Cambridge University Press. |
| Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Pearson Education (Addison Wesley), 0-321-32136-7, 2006. Chapters 4, 6 and 8 are freely available here and from the publisher. |
Recommended book
| Data mining - practical machine learning tools and techniques with Java implementations, 3rd edition. Ian H. Witten, Eibe Frank and M. Hall. Morgan Kaufmann, 2011, ISBN: 978-0-12-374856-0. Machine Learning view of Data Mining. Very readable. The book of the WEKA software. You can also use the previous edition of the book (2nd edition). |
Other recommended books
| Data Mining: Introductory and Advanced Topics. Margaret Dunham. Prentice Hall, 0-13-088892-3, 2003. Good coverage of the topics included in the course. Very readable. Pseudocode and computational complexity covered. |
| Data Mining: Concepts and Techniques. J. Han and M. Kamber. Morgan Kaufmann, 2006, ISBN 1-55860-901-6. Database view of Data Mining. |
| Principles of Data Mining. D. Hand, H. Mannila, P. Smyth. MIT Press, 2001, ISBN: 0-262-08290-X. Statistical view of Data Mining. Advanced, requires good statistical knowledge. |
Tan and Witten are held in the library Reserve collection (2 Hour Loan collection) and are also available from the Co-op Bookshop.
Last modified: 12 May 2012