What is the name for a group of objects that are more similar to each other than to objects in other groups?
Cluster
Each cluster is a collection of __________.
data objects
What is clustering also known as?
Segmentation
Objects in a group will be similar or _________ to one another and different from the objects in other groups.
homogeneous
What happens to intra-cluster distances when clustering groups?
They are minimized
What happens to inter-cluster distances when clustering groups?
They are maximized
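A rough sketch of what "intra-cluster" and "inter-cluster" distance can look like in code. The two helper functions, the choice of averaging over point pairs, and the sample points are illustrative assumptions; other definitions (for example, distance between centroids) are equally valid.

```python
import math
from itertools import combinations

def mean_intra_distance(cluster):
    """Average Euclidean distance between pairs of points in ONE cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_inter_distance(a, b):
    """Average Euclidean distance between points of TWO different clusters."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

# Hypothetical sample clusters: two tight, well-separated groups.
a = [(0, 0), (0, 1), (1, 0)]
b = [(10, 10), (10, 11), (11, 10)]

print(mean_intra_distance(a))      # small: points in a are close together
print(mean_inter_distance(a, b))   # large: a and b are far apart
```

A good clustering keeps the first number small and the second number large.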
Different methods to calculate distance.
Euclidean, Manhattan, Chebyshev
Manhattan distance formula
|x1-x2| +|y1-y2|
Euclidean distance formula
sqrt((x1-x2)^2 +(y1-y2)^2)
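The three distance measures named above can be sketched in a few lines of Python. The function names and the sample points are illustrative choices. The Chebyshev formula (largest absolute coordinate difference) is not written out on these cards and is included here as the standard definition.

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences: |x1-x2| + |y1-y2|."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Straight-line distance: sqrt((x1-x2)^2 + (y1-y2)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chebyshev(p, q):
    """Largest absolute coordinate difference: max(|x1-x2|, |y1-y2|)."""
    return max(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2), (4, 6)
print(manhattan(p, q))  # 7
print(euclidean(p, q))  # 5.0
print(chebyshev(p, q))  # 4
```

Because each function iterates over coordinate pairs, the same code works for points with more than two dimensions.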
What is the name of the method used to handle calculating distance with multiple data points?
K-means clustering method
What does k-means mean?
The data is split into k clusters, and each cluster is represented by the mean (centroid) of its points
K-Means Algorithm
1. Select K points as the initial centroids
2. Repeat
3. Form K clusters by assigning all points to the closest centroid
4. Recompute the centroid of each cluster
5. Until the centroids don't change
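The five steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation. The assumptions are that initial centroids are sampled at random from the data, distance is Euclidean, an empty cluster keeps its previous centroid, and `max_iter` caps the loop. The names `kmeans`, `seed`, and `max_iter` are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: select K points as the initial centroids (assumption: random sample).
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    # Step 2: repeat (with a safety cap on the number of iterations).
    for _ in range(max_iter):
        # Step 3: form K clusters by assigning each point to the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]  # assumption: keep old centroid if empty
            for i, cluster in enumerate(clusters)
        ]
        # Step 5: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

On well-separated data like this sample, the two groups of three points end up in separate clusters. On harder data the result can depend on the initial centroids.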
What is Manhattan distance?
A distance metric between two points in an N-dimensional vector space: the sum of the absolute differences of their coordinates
Which line represents Manhattan distance?
the blue line
What is Manhattan distance often used to measure?
integrated circuits where wires only run parallel to the X or Y axis
Manhattan distance is also called _______.
Minkowski's L1 distance
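The "L1" label comes from the Minkowski family of distances, where an order parameter controls the exponent. A short sketch, with `minkowski` and its parameter `r` as illustrative names: order 1 gives Manhattan distance and order 2 gives Euclidean distance.

```python
def minkowski(p, q, r):
    """Minkowski distance of order r between points p and q."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

p, q = (1, 2), (4, 6)
print(minkowski(p, q, 1))  # order 1 (L1): same value as Manhattan, 3 + 4
print(minkowski(p, q, 2))  # order 2 (L2): same value as Euclidean, sqrt(9 + 16)
```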
What is Euclidean distance?
The straight line distance between two points.
What theorem is the Euclidean distance formula derived from?
Pythagorean theorem
What does the hierarchical clustering method produce?
A set of nested clusters organized as a hierarchical tree
What can hierarchical clustering be visualized as?
Dendrogram
Dendrogram
A tree-like diagram that records the sequences of merges or splits
What are the strengths of hierarchical clustering?
No assumption about the number of clusters is needed (any number of clusters can be obtained by cutting the dendrogram at the proper level), and the clusters may correspond to meaningful taxonomies
What are the two main types of hierarchical clustering?
Agglomerative and Divisive
Agglomerative
(bottom-up method) Starts with the points as individual clusters and, at each step, merges the closest pair of clusters until only one cluster is left
Divisive
(top-down method) Starts with one all-inclusive cluster and, at each step, splits a cluster until each cluster contains a single point
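The agglomerative (bottom-up) procedure can be sketched as follows. The cards do not say how "closest pair of clusters" is measured, so this sketch assumes single linkage (the distance between the two nearest points of two clusters) with Euclidean distance. The function names and sample points are hypothetical.

```python
import math

def single_link(a, b):
    """Assumed linkage: distance between the closest points of two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points):
    # Start with every point as its own cluster.
    clusters = [[p] for p in points]
    merges = []  # the sequence of merges a dendrogram would record
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for left, right in agglomerative(points):
    print(left, "+", right)
```

The printed merge sequence is the information a dendrogram displays: nearby points merge first, and the outlier (20, 20) joins last.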
Examples of Clustering.
Document clustering, marketing, city-planning
What type of learning is clustering?
Unsupervised
Association Rule Mining
Given a set of transactions, find rules that will predict occurrence of an item based on the occurrences of other items in the transaction.
What is the goal of association rule mining?
Finding regularities in data
Example of association rule mining
Target product recommendation
What is the goal of market basket analysis?
To determine the strength of all the association rules among a set of items.
What question does market basket analysis answer?
Which items are likely to be purchased together?
Support
({X,Y} or X -> Y): how often X and Y occur together. # of records containing both X and Y divided by total # of records.
Confidence
(X -> Y): how often Y occurs in records that contain X. # of records containing both X and Y divided by # of records containing X.
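Support and confidence can be computed directly from their definitions. In this sketch the transactions are made-up sample data, and the function names `support` and `confidence` are illustrative.

```python
def support(transactions, items):
    """# of records containing all the items, divided by total # of records."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """# of records containing X and Y, divided by # of records containing X."""
    x, y = set(x), set(y)
    n_x = sum(x <= t for t in transactions)
    n_xy = sum((x | y) <= t for t in transactions)
    return n_xy / n_x

# Hypothetical market-basket data: each set is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(transactions, {"bread", "milk"}))       # 0.5 (2 of 4 records)
print(confidence(transactions, {"bread"}, {"milk"}))  # 2 of the 3 bread records contain milk
```

Note that confidence is directional: the confidence of bread -> milk divides by the number of bread records, while milk -> bread would divide by the number of milk records.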