38 Terms

Cluster Analysis (or Clustering)

The task of grouping a set of objects so that objects in the same group are more similar to each other than to objects in other groups

What is the name for groups that have objects that are more similar to each other than those in other groups?

Cluster

Each cluster is a collection of __________.

data objects

What is clustering also known as?

Segmentation

Objects in a group will be similar or _________ to one another and different from the objects in other groups.

homogeneous

What happens to intra-cluster distances when clustering groups?

They are minimized

What happens to inter-cluster distances when clustering groups?

They are maximized

Different methods to calculate distance.

Euclidean, Manhattan, Chebyshev

Manhattan distance formula

|x1-x2| +|y1-y2|

Euclidean distance formula

sqrt((x1-x2)^2 +(y1-y2)^2)
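
The three distance measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the sample points `p` and `q` are made up for the example, and the Chebyshev distance is taken to be the largest absolute coordinate difference.

```python
import math

def manhattan(p, q):
    # Sum of absolute coordinate differences: |x1-x2| + |y1-y2| + ...
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    # Straight-line distance: sqrt((x1-x2)^2 + (y1-y2)^2 + ...)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chebyshev(p, q):
    # Largest absolute coordinate difference
    return max(abs(a - b) for a, b in zip(p, q))

# Hypothetical sample points, chosen only for illustration
p, q = (1, 2), (4, 6)
print(manhattan(p, q))  # 7
print(euclidean(p, q))  # 5.0
print(chebyshev(p, q))  # 4
```

Because the functions iterate over coordinate pairs, the same code works for points with more than two dimensions.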

What method is used to group multiple data points based on their distances?

K-means clustering method

What does k-means mean?

K is the number of clusters; each cluster is represented by the mean (centroid) of the points assigned to it

K-Means Algorithm

1. Select K points as the initial centroids
2. Repeat
3.    Form K clusters by assigning each point to its closest centroid
4.    Recompute the centroid of each cluster
5. Until the centroids no longer change
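
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function name `kmeans`, the `max_iter` safety cap, the use of Euclidean distance, and the sample data are all choices made for this example, and the initial centroids are passed in explicitly so the run is reproducible.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Minimal k-means sketch; k is the number of initial centroids given."""
    k = len(centroids)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):  # assumed cap so the loop always terminates
        # Step 3: assign every point to its closest centroid (Euclidean)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 5: stop once the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visibly separate groups
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
print(clusters)
# [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
```

The result depends on the initial centroids: a poor starting choice can leave the algorithm in a worse grouping, which is why it is often run several times with different starting points.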

What is Manhattan distance?

A distance metric between two points in an N-dimensional vector space: the sum of the absolute differences of their coordinates

Which line represents Manhattan distance?

the blue line

Where is Manhattan distance often used to measure distance?

In integrated circuits, where wires only run parallel to the X or Y axis

Manhattan distance is also called _______.

Minkowski's L1 distance

What is Euclidean distance?

The straight line distance between two points.

What theorem is the Euclidean distance formula derived from?

Pythagorean theorem

What type of approach is the k-means clustering method?

Partitional clustering approach

What must be specified in k-means clustering?

Number of clusters (k)

What methods can be used to select k?

Subject-matter knowledge, convenience, constraints, or an arbitrary choice

Hierarchical clustering

Produces a set of nested clusters organized as a hierarchical tree

What can hierarchical clustering be visualized as?

Dendrogram

Dendrogram

A tree-like diagram that records the sequences of merges or splits

What are the strengths of hierarchical clustering?

No assumption about the number of clusters is needed (any number of clusters can be obtained by cutting the dendrogram at the proper level), and the clusters may correspond to meaningful taxonomies

What are the two main types of hierarchical clustering?

Agglomerative and Divisive

Agglomerative

(bottom-up method) Starts with the points as individual clusters and, at each step, merges the closest pair of clusters until only one cluster is left.
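
The bottom-up procedure can be sketched as follows. The card does not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between their two closest points) with Euclidean distance; the function names and sample points are hypothetical.

```python
import math

def single_linkage(a, b):
    # Assumed cluster distance: the closest pair of points across a and b
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points):
    """Return the sequence of merges, i.e. the information a dendrogram records."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the indices of the closest pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical sample points
for left, right in agglomerative([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]):
    print(left, "+", right)
```

With n points the loop performs n - 1 merges; stopping early, after fewer merges, corresponds to cutting the dendrogram at a chosen level.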

Divisive

(top-down method) Starts with one all-inclusive cluster and, at each step, splits a cluster until each cluster contains a single point.

Examples of Clustering.

Document clustering, marketing, city-planning

What type of learning is clustering?

Unsupervised

Association Rule Mining

Given a set of transactions, find rules that will predict occurrence of an item based on the occurrences of other items in the transaction.

What is the goal of association rule mining?

Finding regularities in data

Example of association rule mining

Target product recommendation

What is the goal of market basket analysis?

To determine the strength of all the association rules among a set of items.

What question does market basket analysis answer?

Which items are likely to be purchased together?

Support

({X, Y} or X -> Y): how often X and Y occur together. Number of records containing both X and Y divided by the total number of records.

Confidence

(X -> Y): how often Y occurs together with X. Number of records containing both X and Y divided by the number of records containing X.
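
Support and confidence, as defined above, can be computed directly from a list of transactions. The sketch below is illustrative only: the function names and the five sample baskets are invented for this example, and each transaction is assumed to be a Python set of item names.

```python
def support(transactions, items):
    # Fraction of all records that contain every item in `items`
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    # Among records containing X, the fraction that also contain Y
    x, y = set(x), set(y)
    containing_x = [t for t in transactions if x <= t]
    if not containing_x:
        return 0.0  # assumed convention when X never occurs
    return sum(y <= t for t in containing_x) / len(containing_x)

# Made-up market baskets, for illustration only
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
print(support(baskets, {"bread", "milk"}))       # 0.6  (3 of 5 baskets)
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.75 (3 of 4 bread baskets)
```

Note that support is symmetric in X and Y, while confidence is not: the confidence of bread -> milk can differ from that of milk -> bread.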