Cluster Analysis - Fallen Soldier

Description

Cluster analysis
Tyson Mcleod
Flashcards by Tyson Mcleod, updated more than 1 year ago
Created by Tyson Mcleod about 6 years ago

Resource summary

Question Answer
What is clustering? Finding groups of objects such that objects in one group will be similar to one another and different from objects in other groups
Partitional clustering A division of data objects into non-overlapping subsets (clusters) such that each object is in exactly one subset.
Hierarchical clustering A set of nested clusters organized as a hierarchical tree
Clustering algorithms (3 used in this course) 1. K-means (partitional) 2. Density-based clustering 3. Hierarchical clustering
Clustering distinctions 1. Exclusive versus non-exclusive: in non-exclusive clustering, points may belong to multiple clusters. 2. Fuzzy versus non-fuzzy: in fuzzy clustering, a point belongs to every cluster with a weight between 0 and 1, and the weights must sum to 1. 3. Partial versus complete: in partial clustering, only some of the data is clustered. 4. Heterogeneous versus homogeneous: heterogeneous clusters have widely different sizes, shapes and densities.
Centroid (typically) The mean of the points in the cluster
K-means complexity O(n*K*I*d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes
Well-separated cluster Every point in a cluster is closer to every other point in the cluster than to any point not in the cluster
Center-based cluster Every point in the cluster is close to the center of the cluster (often the centroid)
Contiguity-based cluster A point in a cluster is closer to one or more other points in the cluster than to any point not in the cluster
Density-based cluster A cluster is a dense region of points separated by low-density regions from other regions of high density
Shared property clusters (conceptual clusters) Clusters that share some common property or represent a particular concept
Pre-processing and post-processing (clustering) Pre-processing: normalize the data and eliminate outliers. Post-processing: 1. Eliminate small clusters that may represent outliers. 2. Split loose clusters, i.e. clusters with relatively high SSE. 3. Merge clusters that are close to each other and have relatively low SSE.
Strengths of hierarchical clustering * No particular number of clusters has to be assumed: any desired number can be obtained by cutting the dendrogram at the appropriate level. * The hierarchy may correspond to a meaningful taxonomy (e.g. the animal kingdom).
Two main types of hierarchical clustering Agglomerative (bottom-up merging) and divisive (top-down splitting)
Cluster similarity - MIN or Single Link Similarity of two clusters is based on the two most similar (closest) points in the different clusters
Cluster similarity - MAX or Complete Link Similarity of two clusters is based on the two most different (distant) points in the different clusters
Cluster similarity - Group average Proximity of two clusters is the average of pairwise proximity between points in two clusters
Cluster similarity - Ward's method Similarity of two clusters is based on the increase in squared error when two clusters are merged. Less susceptible to noise.
DBSCAN - density Density = the number of points within a specified radius Eps
DBSCAN - MinPts A point is a core point if it has at least a specified number of points (MinPts) within Eps, counting the point itself
DBSCAN - Border A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point.
DBSCAN - Noise A noise point is any point that is not a core point or a border point
DBSCAN - pros and cons Pros: finds clusters of arbitrary shape, is robust to noise, does not need the number of clusters k a priori. Cons: requires connected regions of sufficiently high density; data sets with varying densities are problematic.
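
Several of the cards above (centroid, SSE, K-means, DBSCAN point types) can be illustrated with a minimal Python sketch. It is not taken from the course material: the function names are hypothetical, the K-means loop is the basic assign-then-update scheme with caller-supplied initial centroids, and the DBSCAN part assumes the common convention that a core point has at least MinPts points within Eps, counting itself. The nested loops in kmeans are where the O(n*K*I*d) cost comes from.

```python
import math

def centroid(points):
    """Centroid as the attribute-wise mean of the points in a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centroids):
    """Put each point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, initial_centroids, max_iter=100):
    """Basic K-means: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # An empty cluster keeps its old centroid (a simplifying assumption).
        updated = [centroid(c) if c else centroids[k]
                   for k, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)

def sse(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(math.dist(p, c) ** 2
               for c, cluster in zip(centroids, clusters)
               for p in cluster)

def classify_dbscan_points(points, eps, min_pts):
    """Label every point 'core', 'border' or 'noise'."""
    # Each neighbourhood includes the point itself (distance 0 <= eps).
    neighbours = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps]
                  for p in points]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbours):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")   # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels

if __name__ == "__main__":
    data = [(1, 1), (1, 3), (9, 1), (9, 3)]
    cents, groups = kmeans(data, [(0, 0), (10, 0)])
    print(cents, sse(cents, groups))
    square = [(0, 0), (0, 1), (1, 0), (1, 1), (2.4, 1), (10, 10)]
    print(classify_dbscan_points(square, eps=1.5, min_pts=4))
```

A low SSE after convergence indicates tight clusters, which is the quantity the post-processing card refers to when deciding which clusters to split or merge.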