Cluster Analysis - Fallen Soldier

Description

Cluster analysis
Flashcards by Tyson Mcleod, updated more than 1 year ago
Created by Tyson Mcleod about 6 years ago

Resource Summary

Question Answer
What is clustering? Finding groups of objects such that objects in one group will be similar to one another and different from objects in other groups
Partitional clustering A division of data objects into non-overlapping subsets (clusters) such that each object is in exactly one subset.
Hierarchical clustering A set of nested clusters organized as a hierarchical tree
Clustering algorithms (3 used in this course) 1. K-means (partitional) 2. Density-based clustering 3. Hierarchical clustering
Clustering distinctions 1. Exclusive versus non-exclusive: in non-exclusive clustering, points may belong to multiple clusters. 2. Fuzzy versus non-fuzzy: in fuzzy clustering, a point belongs to every cluster with a weight between 0 and 1, and the weights must sum to 1. 3. Partial versus complete: in partial clustering, only some of the data is clustered. 4. Heterogeneous versus homogeneous: heterogeneous clusters have widely different sizes, shapes, and densities.
Centroid (typically) The mean of the points in the cluster
K-means complexity O(n * K * I * d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes
Well-separated cluster Every point in the cluster is closer to every other point in the cluster than to any point not in the cluster
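As a rough illustration of where the O(n * K * I * d) cost comes from, here is a minimal K-means sketch in plain Python. The function name `kmeans`, the random choice of initial centroids, the fixed iteration count, and the sample points are all assumptions made for this example, not part of the course material.

```python
import math
import random

def kmeans(points, k, iterations=10, seed=0):
    """Basic K-means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points picked at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):  # I iterations
        # Assignment step: n points x K centroids x d attributes per distance.
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

The assignment step dominates the cost: every iteration compares each of the n points against each of the K centroids, and each comparison touches d attributes.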
Center-based cluster Every point in the cluster is close to the center of the cluster (often the centroid)
Contiguity-based cluster A point in a cluster is closer to one or more other points in the cluster than to any point not in the cluster
Density-based cluster A cluster is a dense region of points separated by low-density regions from other regions of high density
Shared property clusters (conceptual clusters) Clusters that share some common property or represent a particular concept
Pre-processing and post-processing (clustering) Pre-processing: normalize the data and eliminate outliers. Post-processing: 1. Eliminate small clusters that may represent outliers. 2. Split loose clusters with high SSE. 3. Merge clusters that are close and have low SSE.
Strengths of hierarchical clustering * Any desired number of clusters can be obtained by cutting the dendrogram at the appropriate level. * The clusters may correspond to meaningful taxonomies (e.g., the animal kingdom).
Two main types of hierarchical clustering Agglomerative (bottom-up merging) and divisive (top-down splitting)
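The SSE (sum of squared errors) used in the post-processing rules can be sketched as the sum of squared distances from each point to its cluster centroid. The helper names `centroid` and `sse` and the two sample clusters below are hypothetical, chosen only to show a tight cluster next to a loose one.

```python
def centroid(cluster):
    """Mean of the points in the cluster, one coordinate at a time."""
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))

def sse(cluster):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(cluster)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in cluster)

# Hypothetical clusters: same size, very different spread.
tight = [(0.0, 0.0), (0.0, 2.0)]
loose = [(0.0, 0.0), (10.0, 0.0)]

tight_sse = sse(tight)  # 2.0
loose_sse = sse(loose)  # 50.0
```

A cluster like `loose` is a candidate for splitting, while nearby clusters with values like `tight_sse` are candidates for merging.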
Cluster similarity - MIN or Single Link Similarity of two clusters is based on the two most similar (closest) points in the different clusters
Cluster similarity - MAX or Complete Link Similarity of two clusters is based on the two most different (distant) points in the different clusters
Cluster similarity - Group average Proximity of two clusters is the average of pairwise proximity between points in two clusters
Cluster similarity - Ward's method Similarity of two clusters is based on the increase in squared error when two clusters are merged. Less susceptible to noise.
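The four cluster-similarity measures above can be compared side by side in a short sketch. It assumes proximity is measured as Euclidean distance (so smaller means more similar); the function names and the two sample clusters are hypothetical.

```python
import math
from itertools import product

def single_link(a, b):
    """MIN: distance between the two closest points in different clusters."""
    return min(math.dist(p, q) for p, q in product(a, b))

def complete_link(a, b):
    """MAX: distance between the two most distant points in different clusters."""
    return max(math.dist(p, q) for p, q in product(a, b))

def group_average(a, b):
    """Average of all pairwise distances between the two clusters."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = tuple(sum(x) / len(cluster) for x in zip(*cluster))
    return sum(sum((u - v) ** 2 for u, v in zip(p, c)) for p in cluster)

def ward_increase(a, b):
    """Ward's method: increase in squared error if a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

# Hypothetical clusters on a line, so the distances are easy to check by hand.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (6.0, 0.0)]

d_min = single_link(a, b)     # 3.0
d_max = complete_link(a, b)   # 6.0
d_avg = group_average(a, b)   # 4.5
d_ward = ward_increase(a, b)  # 20.25
```

In agglomerative clustering, the pair of clusters with the smallest value under the chosen measure is merged at each step.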
DBSCAN - density Density = the number of points within a specified radius Eps
DBSCAN - MinPts A point is a core point if it has at least a specified number of points (MinPts) within Eps, counting the point itself
DBSCAN - Border A border point has fewer than MinPts points within Eps, but is in the neighborhood of a core point.
DBSCAN - Noise A noise point is any point that is not a core point or a border point
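The core, border, and noise definitions can be sketched as a small classifier. This example assumes the common convention that a point is core when at least MinPts points, itself included, lie within Eps; replace `>=` with `>` to follow a strict "more than MinPts" reading. The function name `classify_points` and the sample data are hypothetical, and the sketch only labels points; it does not build the clusters themselves.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise' (DBSCAN-style)."""
    # Eps-neighborhood of each point, as indices; includes the point itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # Assumption: core means at least min_pts points within eps.
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not core, but near a core point
        else:
            labels.append("noise")   # neither core nor border
    return labels

# Hypothetical data: a small square, one point hanging off it, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (10.0, 10.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
# labels == ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point at (2.0, 1.0) has only two points within Eps, so it is not core, but it lies in the neighborhood of the core point (1.0, 1.0) and is therefore a border point.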
DBSCAN - pros and cons Pros: finds clusters of arbitrary shape, is robust to noise, and does not need k to be specified a priori. Cons: requires connected regions of sufficiently high density; data sets with varying densities are problematic.