Understanding the Effect of Input Ordering on a Clustering Algorithm

Summary This exercise uses experimentation to demonstrate the value of presenting the input vectors of a training set in random order. The exercise uses an implementation of the COBWEB conceptual clustering algorithm that visualizes the taxonomy of clusters it creates. Each internal (summary) node is displayed as an object that reflects the average color and size of the objects in the subtree below it, together with their most frequently occurring shape. Students design a data set consisting of multiple objects described by (color, shape, size) tuples. They then experiment with various input orderings to learn how the ordering affects the resulting taxonomy and the number of high-level clusters that emerge. Students can click on any node in the tree to reveal the summary data represented by an internal node or the feature vector associated with a leaf node.
The exercise is part of the TAILS Conceptual Clustering module.
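As a rough illustration of how a summary node could be derived from the objects beneath it, the sketch below averages color and size and takes the most frequent shape. It is not the tool's actual code: the function name `summarize`, the RGB encoding of color, and the numeric size are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean

def summarize(objects):
    """Summarize (color, shape, size) tuples as one internal-node description.

    Assumption: color is an (R, G, B) tuple and size is a number.
    """
    colors, shapes, sizes = zip(*objects)
    # Average each color channel separately, rounding to a whole value.
    avg_color = tuple(round(mean(channel)) for channel in zip(*colors))
    # The most frequently occurring shape stands in for the whole subtree.
    top_shape = Counter(shapes).most_common(1)[0][0]
    avg_size = mean(sizes)
    return avg_color, top_shape, avg_size

# Hypothetical leaf objects: two red, one blue; two circles, one square.
leaves = [
    ((255, 0, 0), "circle", 10),
    ((0, 0, 255), "circle", 20),
    ((255, 0, 0), "square", 30),
]
summary = summarize(leaves)
print(summary)
```

A node built this way is a purplish circle of medium size, which is why a summary node may not look like any single object in its subtree.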
Topics Machine learning, clustering algorithms, conceptual clustering, COBWEB, input ordering
Audience Introduction to AI. Can be adapted to K-12 courses to demonstrate concepts in the Common Core.
Difficulty This is an introductory assignment. The assignment can be performed as a demonstration for the youngest students, or by individuals or pairs among the older students. A demonstration can be completed in thirty minutes. For a hands-on activity with students familiar with the structure of an experiment, the assignment requires a brief introduction and one hour for the students to carry out the experiments and record results and observations.
Strengths Students are introduced to the concept of a feature vector and can quickly observe how the order in which examples are presented to the learning algorithm affects what is learned. They also learn about posing a hypothesis, constructing an experiment to test it, and recording the results.
Weaknesses Students need access to a JavaScript-enabled browser.
Dependencies At a minimum, students need a rudimentary understanding of taxonomies and the ability to make the connection between a feature vector and an object. To gain the most from the assignment, students also need to know how an average is calculated and to be familiar with the concept of probability.
Variants The instructor can specify the types of objects and distribution of instances that the students will use or allow the students to experiment by creating their own sets of feature vectors and observing the outcome.
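An instructor who fixes the object types and their distribution might prepare the data set and a few contrasting input orderings ahead of time. The sketch below shows one possible way to do that. The helper names `make_dataset` and `orderings`, the color names, and the counts are hypothetical, and nothing here assumes a particular input format for the clustering application.

```python
import random

def make_dataset(counts):
    """Expand {(color, shape, size): n} into a list with n copies of each vector."""
    return [vector for vector, n in counts.items() for _ in range(n)]

def orderings(dataset, seed=0):
    """Return several orderings of the same feature vectors to compare."""
    rng = random.Random(seed)  # fixed seed so the class can repeat the run
    shuffled = dataset[:]
    rng.shuffle(shuffled)
    return {
        "grouped_by_shape": sorted(dataset, key=lambda v: v[1]),
        "sorted_by_size": sorted(dataset, key=lambda v: v[2]),
        "random": shuffled,
    }

# Hypothetical distribution: many small red circles, a few large blue squares.
dataset = make_dataset({
    ("red", "circle", 10): 4,
    ("blue", "square", 30): 2,
    ("green", "triangle", 20): 3,
})
runs = orderings(dataset)
for name, order in runs.items():
    print(name, order)
```

Each ordering contains exactly the same objects, so any difference in the resulting taxonomy can be attributed to presentation order alone.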
Materials Assignment - Understanding Input Ordering
Conceptual Clustering Application
Tutorial for using the conceptual clustering tool
Solution - Understanding Input Ordering assignment
Teaching Notes