Artificial Intelligence and Machine Learning
Lecturer: Steve Kroon (kroon@sun.ac.za)
Duration: 1st Semester, 2006
Class Times: Mondays, 12h00
Class Venue: M203, Mechanical and Industrial Engineering
Course evaluation
Visit the course's bulletin board
Incorrect conversions provided:The .int and .char files provided in the facerec.zip files do not encode correctly all the data in the .gif files in the facerec.zip file. For more information, see this forum post.
Error in Assignment 6: A new version of the assignment is now available, addressing a small error.
NEWS: For admin reasons, I must once again advertise to you that this course will not be subject to an exam as indicated in the yearbook, but will be treated as a continual assessment subject for the purposes of the administrative system. Details of evaluation have, of course, already been available on the website for some time. (See link above)
Final deadline for all assignments is 27 June.
The preliminary deadline for handing in assignment 1 to have it marked for feedback has been shifted from 31 March to 2 April, to accomodate the B. Ing. Wet. Engineering Test Week.
Material for lecture 1
Material for lecture 2
Material for lecture 3
Material for lecture 4
Material for lecture 5
Material for lecture 7
Material for lecture 8
Material for lecture 9
If you're interested in this field, and want this course to cover particular methods and techniques, please contact the lecturer at the e-mail address above. Otherwise, this course will cover traditional and cutting-edge techniques designed to allow machines to do work previously reserved for humans only.
The techniques we will cover will mostly be selected from the following list
(others will be considered by request): support vector machines, simple neural
networks, nearest neighbour techniques, the perceptron, classification and
regression trees, parametric models, hidden Markov models, Gaussian mixture
models, genetic algorithms, simulated annealing, clustering techniques for
unsupervised learning, self-organizing maps, on-line learning, reinforcement
learning, E-M algorithms, novelty detection techniques, bagging, boosting, and
an introduction to statistical data mining techniques. We will also consider
various methods for evaluating techniques and reporting results, and cover
responsible use of data, a vital aspect in this field (eg. training vs. test
data, cross-validation, and hold-out sets)
The course will be evaluated by a number of programming and data set investigation assignments and reports, as well as a presentation at some time during the semester.
Course material and lecture contents
Lecture 1
Introduction to the field; brief historical overview; types of problems: supervised, unsupervised, classification, regression, density estimation; main groups of techniques: metric, network-based, statistical, tree methods; issues in machine learning: model selection, parameter selection, complexity control, efficient implementation, handling noisy data, finding model accuracy, encoding prior knowledge, extracting learnt knowledge; benchmarks and typical applications.
Reading related to this lecture:
Mike Alder - Principles of Pattern Classification, Chapter 1 - read the Preface and everything until the end of Section 1.1 (on page 24).
Steve Kroon - extracts from Support Vector Machines, Generalization Bounds and Transduction.
Gunnar Ratsch - A Brief Introduction to Machine Learning - read the whole thing.
No assignment for this week.
Lecture 2
Instance-based learning: The nearest neighbour algorithm and variants. Similarity and dissimilarity measures. Test sets, training sets, cross-validation and model selection. Confusion matrices.
Reading related to this lecture:
Kardi Teknomo's web tutorial on k-Nearest Neighbours - work through the whole thing.
Kardi Teknomo's web tutorial on Similarity and Dissimilarity measures - work through as much as you can cope with. If you want to skip something, I'd suggest leaving out most of the difference measures for ordinal and quantitative variables (but do look at at least one for each type).
Brent Martin's 1995 Masters Thesis - Instance-Based Learning: Nearest Neighbour with Generalisation - read the beginning of Chapter 2, until the end of 2.1.
Brian Ripley's slides on "Why do Nearest Neighbour Algorithms Do So Well?" - browse this to see the kinds of variants people try.
Jerry Zhu's introduction to Machine Learning through k-Nearest Neighbours - slides 13-23.
Assignment 1: Download here - preliminary deadline Sunday 2 April, 2006.
Error in original version of assignment 1: Please note that the URL's given in Question 2 of the original Assignment 1 are incorrect. Both URL's should begin with: http://www.ics.uci.edu/~mlearn
rather than http://www.ics.uci.edu/ mlearn
The assignment document above has been corrected.
Lecture 3
Contingency tables, entropy, information gain, decision trees, the ID3 algorithm, pruning trees, handling real-valued data, regression trees.
Reading related to this lecture:
DecisionTrees.Net's Tutorial on decision trees - work through the whole thing.
Andrew Moore's slides on Information Gain - click here for a temporary local mirror of the slides. Do the whole thing.
Andrew Moore's slides on Decision Trees - click here for a temporary local mirror of the slides. Skip the part about Chi-squared pruning unless you are familiar with the statistics of Chi-squared tests.
Assignment 2: Download here. Also download the adult database for the assignment here.
Lecture 4
Re-sampling and the bootstrap, bagging, weighted re-sampling, boosting
Reading related to this lecture:
Bagging predictors - Leo Breiman presents his technique in a UCB Technical Report in 1996. The first 3 sections are the most important.
A Brief Introduction to Boosting - Robert Schapire, the person who found the first polynomial-time boosting algorithm in 1989, gives an overview of AdaBoost, which he co-designed in 1995, at IJCAI'99. Don't worry too much about the section called "Generalization Error".
Assignment 3: Download here.
Lecture 5
Neural networks: biological origins, the perceptron, networks of neurons, recurrent and feedforward networks, layered networks and multi-layer perceptrons. Training neural networks and the backpropagation algorithm.
Reading related to this lecture:
Neural networks in nature - notes by Dr. David Newman of the Queen's University Management School in Belfast for their Intelligent Systems course.
An introduction to Multi-layer perceptrons - from Antti Honkela's Master's thesis on Nonlinear Switching State-Space Models.
An overview of feedforward network learning and the backpropagation algorithm - more notes by Dr. David Newman of the Queen's University Management School in Belfast for their Intelligent Systems course.
The backpropagation algorithm - notes from the Willamette University Neural Networks course presented by Jenny Orr.
Leonardo Noriega's Multilayer Perceptron Tutorial - Leonardo's at the Staffordshire University School of Computing. This PDF file contains a suggested data structure for implementing neural networks in C, and a possible multi-layer perceptron implementation algorithm for you to use as a guideline.
Assignment 4: Download here.
Here are the training data and test data for the assignment.
3 April
University holidays - no lecture.
Lecture 6
Question and answer session - no new material.
17 April
Easter Monday - no lecture.
Lecture 7: 24 April
Grammar induction: the Trakhtenbrot-Barzdin Algorithm (1973), Lang's breadth-first state merging (1992), the blue-fringe strategy, evidence-driven state merging spawned by Abbadingo One (Price, Juille and Pollack, 1997).
Reading related to this lecture:
Intro to DFA's and DFA Learning for the Abbadingo One Competition - this will be especially useful for people who don't know what a DFA is.
Results of the Abbadingo One DFA Learning Competition and a New Evidence-Driven State Merging Algorithm - Read sections 7, 9 and 10 in Part II. This discusses the competition-winning algorithm in the 1997 Abbadingo One competition, which you will be required to code as part of your assignment.
Assignment 5: Download here.
1 May
Worker's day - no lecture.
Lecture 8: 8 May
Self Organizing Maps: dimension reduction, vector quantization, the Kohonen SOM algorithm
Reading related to this lecture:
Wikipedia's short description of SOMs - note that technically the SOM training phase is not vector quantization, although it is related.
Jaakko Hollmen on SOMs - extract from his Master's Thesis in Process Modelling (Computer Science) at the Helsinki University of Technology.
Tom Germano on SOMs.
WEBSOM - an application of SOMs to clustering newsgroup posts.
Assignment 6: Download here (download facerec.zip here).
Error in original version of assignment 6: Please note that the number of faces given in Question 4 of the original Assignment 6 are incorrect. There are 9, not 10, faces of each individual. You should only reserve 2 faces of each for testing.
The assignment document above has been corrected.
Incorrect conversions provided:The .int and .char files provided in the facerec.zip files do not encode correctly all the data in the .gif files in the facerec.zip file. For more information, see this forum post.
Lecture 9: 15 May
Support Vector Machines - topic presented by McElory Hoffman
Reading related to this lecture:
McElory's slides for his talk.
Getting to grips with Support Vector Machines: Theory - article by Steve Kroon from the SA Statistics Journal.
Getting to grips with Support Vector Machines: Applications - article by Steve Kroon from the SA Statistics Journal.
A practical guide to Support Vector Classification - an article by Chih-Wei Hsu, together with the authors of LIBSVM
Assignment 7: No assignment (But click here to download a local copy of libsvm - version 2.82. Authors: Chih-Chung Chang and Chih-Jen Lin).
Question and Answer Session: 12 June
Evaluation
- Assignment 1 will be marked if delivered by the preliminary deadline indicated on the website, to give an indication of the standard required.
- There will be N assignments, N probably being between 6 and 8.
- Any student who has not handed in N-2 assignments by the end-of-semester deadline will be awarded a grade of zero.
- Of the N assignments, the 3 of which the least people have handed in shall be marked. Your mark for the course shall be your best mark of these 3 assignments.
- If you have queries on the evaluation, contact me urgently. In general, if you want to see a change in the evaluation system, you must contact me before 15 March.
Class list
The assignments which qualified for marking this year are assignments 3, 5 and 6.
| Name | Surname | Student Number | Course | Mark |
| Atumbe | Baruani | 14662248 | M Sc | 30% |
| Stephan | Basson | 14052555 | B Ing Wet | 54% |
| Deon | Borman | 12400734 | B Sc Hons | 0 |
| Pascal | Brandt | 14025892 | B Sc Hons | 79% |
| Heinrich | Brink | 13887157 | B Sc Hons | 40% |
| Carl | Crous | 14059967 | B Sc Hons | 75% |
| John | Dalton | 13903276 | B Ing Wet | 0 |
| Marius | De Wit | 14046245 | B Ing Wet | 75% |
| Marianne | Du Preez | 14042703 | B Ing Wet | 93% |
| Francois | Du Toit | 14115875 | B Sc Hons (OA) | 86% |
| Ezard | Esau | 13868764 | B Ing Wet | 0 |
| Tayseer | Fathelrahman | 14634783 | M Sc | 0% |
| Jean | Fourie | 13949977 | M Sc | 37% |
| Abrie | Greeff | 13557343 | B Sc Hons | 69% |
| Victoria | Hasheela | 14953455 | B Sc Hons | 0% |
| Werner | Hattingh | 14182386 | B Ing Wet | 59% |
| Nicolene | Heunis | 14127091 | B Ing Wet | 56% |
| William | Howard | 14075555 | B Ing Wet | 43% |
| Basie | Kok | 14063611 | B Sc Hons (OA) | 63% |
| Andre | Kriek | 14052903 | B Sc Hons | 84% |
| Philip | Louw | 13825852 | B Ing Wet | 73% |
| Andre | Luus | 14029316 | B Sc Hons | 61% |
| Sebastian | Maul | 14734454 | B Sc Hons | 0 |
| Wu | Meng | 14983850 | B Sc Hons | 0 |
| Du Toit | Minnaar | 13812793 | B Comm Hons | 0 |
| Campbell | Morrison | 14067587 | B Ing Wet | 83% |
| David | Mwakyusa | 14702223 | B Sc Hons | 0 |
| Jaco | Myburg | 14043378 | B Ing Wet | 54% |
| Jeong-hun | Park | 14529130 | B Sc Hons | 0 |
| Wynand | Pieters | 14182718 | B Ing Wet | 79% |
| Dirk | Potgieter | 13953494 | B Sc Hons | 0 |
| Emile | Rossouw | 14127768 | B Ing Wet | 62% |
| Derick | Schmidt | 13816462 | B Ing Wet | 0 |
| Walter | Schulze | 14073021 | B Sc Hons | 64% |
| Eugene | Smal | 14056615 | B Sc Hons | 91% |
| Ettienne | Steenkamp | 13925881 | B Ing Wet | 0 |
| Riaan | Swart | 13846531 | B Ing Wet | 77% |
| Lin | Zicheng | 14936356 | B Sc Hons | 0 |