Cluster Analysis - Fallen Soldier

Description

Cluster analysis
Flashcards by Tyson Mcleod, created about 6 years ago, updated more than 1 year ago

Resource Summary

Question Answer
What is clustering? Finding groups of objects such that objects in one group will be similar to one another and different from objects in other groups
Partitional clustering A division of the data objects into non-overlapping subsets (clusters) such that each object is in exactly one subset.
Hierarchical clustering A set of nested clusters organized as a hierarchical tree
Clustering algorithms (3 used in this course) K-means (partitional), density-based clustering, hierarchical clustering
Clustering distinctions 1. Exclusive versus non-exclusive: in non-exclusive clustering, points may belong to multiple clusters. 2. Fuzzy versus non-fuzzy: in fuzzy clustering, a point belongs to every cluster with a weight between 0 and 1, and the weights must sum to 1. 3. Partial versus complete: in partial clustering, only some of the data is clustered. 4. Heterogeneous versus homogeneous: heterogeneous clusters have widely different sizes, shapes and densities.
Centroid (typically) The mean of the points in the cluster
K-means complexity O(n*K*I*d), where n = number of points, K = number of clusters, I = number of iterations, d = number of attributes
Well-separated cluster Every point in a cluster is closer to every other point in the cluster than to any point not in the cluster
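As a study aid, here is a minimal K-means sketch in plain Python that makes the O(n*K*I*d) cost visible: each iteration compares all n points with all K centroids, and each comparison touches d attributes. The function names (`k_means`, `mean_point`, `sq_dist`), the random choice of initial centroids, and the `max_iter` and `seed` parameters are assumptions made for illustration, not part of the course material.

```python
import random

def mean_point(points):
    """Centroid: the per-attribute mean of a non-empty list of points."""
    d = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(d))

def sq_dist(a, b):
    """Squared Euclidean distance, O(d) for d attributes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points picked at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):  # I iterations
        # Assignment step: n * K distance computations per iteration.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

On this toy data the two tight groups end up in different clusters; on real data the result depends on the initial centroids, which is why the initialization here is flagged as an assumption.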
Center-based cluster Every point in the cluster is close to the center of the cluster (often the centroid)
Contiguity-based cluster A point in a cluster is closer to one or more other points in the cluster than to any point not in the cluster
Density-based cluster A cluster is a dense region of points separated by low-density regions from other regions of high density
Shared property clusters (conceptual clusters) Clusters that share some common property or represent a particular concept
Pre-processing and post-processing (clustering) Pre-processing: normalize the data and eliminate outliers. Post-processing: 1. Eliminate small clusters that may represent outliers. 2. Split loose clusters with relatively high SSE. 3. Merge clusters that are close together and have relatively low SSE.
Strengths of hierarchical clustering * Any desired number of clusters can be obtained by cutting the dendrogram at the appropriate level. * The clusters may correspond to meaningful taxonomies (e.g., the animal kingdom).
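The SSE (sum of squared errors) used in the post-processing rules is the sum of squared distances from each point to its cluster's centroid. The sketch below computes it for one cluster; the function name `cluster_sse` and the sample points are illustrative assumptions.

```python
def cluster_sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    d = len(points[0])
    centroid = [sum(p[j] for p in points) / len(points) for j in range(d)]
    return sum((p[j] - centroid[j]) ** 2 for p in points for j in range(d))

tight = [(0.0, 0.0), (2.0, 0.0)]    # centroid (1, 0)
loose = [(0.0, 0.0), (10.0, 0.0)]   # centroid (5, 0)
print(cluster_sse(tight))  # 2.0
print(cluster_sse(loose))  # 50.0
```

A loose cluster such as `loose` has a much higher SSE than a tight one, which is what makes it a candidate for splitting.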
Two main types of hierarchical clustering Agglomerative Divisive
Cluster similarity - MIN or Single Link Similarity of two clusters is based on the two most similar (closest) points in the different clusters
Cluster similarity - MAX or Complete Link Similarity of two clusters is based on the two most different (distant) points in the different clusters
Cluster similarity - Group average Proximity of two clusters is the average of pairwise proximity between points in two clusters
Cluster similarity - Ward's method Similarity of two clusters is based on the increase in squared error when two clusters are merged. Less susceptible to noise.
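The four cluster-proximity definitions above can be compared side by side on a small example. This is a sketch that assumes Euclidean distance as the proximity measure (so "most similar" means "closest"); the function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

def single_link(a, b):
    """MIN: distance between the two closest points of the two clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_link(a, b):
    """MAX: distance between the two most distant points of the two clusters."""
    return max(math.dist(p, q) for p in a for q in b)

def group_average(a, b):
    """Average of all pairwise distances between the two clusters."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def sse(points):
    """Sum of squared distances from each point to the centroid."""
    d = len(points[0])
    c = [sum(p[j] for p in points) / len(points) for j in range(d)]
    return sum((p[j] - c[j]) ** 2 for p in points for j in range(d))

def ward_increase(a, b):
    """Ward's method: increase in squared error caused by merging a and b."""
    return sse(a + b) - sse(a) - sse(b)

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(a, b), complete_link(a, b), group_average(a, b), ward_increase(a, b))
```

For these two clusters the pairwise distances are 3, 5, 2 and 4, so single link gives 2, complete link gives 5 and group average gives 3.5, while merging raises the total squared error by 12.25.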
DBSCAN - density Density = the number of points within a specified radius Eps
DBSCAN - MinPts A point is a core point if it has at least a specified number of points (MinPts) within Eps
DBSCAN - Border A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point.
DBSCAN - Noise A noise point is any point that is not a core point or a border point
DBSCAN - pros and cons Pros: finds clusters of arbitrary shape, is robust to noise, does not need k to be specified a priori. Cons: requires connected regions of sufficiently high density; data sets with varying densities are problematic
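The core, border and noise definitions used by DBSCAN can be sketched as a simple point classifier. This is not a full DBSCAN implementation (it does not link core points into clusters), and it assumes that a point counts itself among its neighbors and is a core point when at least MinPts points lie within Eps; the function name `classify_points` is illustrative.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border' or 'noise'."""
    # Eps-neighborhood of each point, as indices (the point itself included).
    neighbors = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps]
                 for p in points]
    # Assumption: core means at least min_pts points within eps.
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("border")  # not dense itself, but near a core point
        else:
            labels.append("noise")
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
labels = classify_points(points, eps=1.1, min_pts=3)
print(labels)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (2, 1) has only one other point within Eps, but that point is a core point, so it is a border point; (10, 10) has no core point nearby and is noise.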
