DATA PREPROCESSING

Description

Mind map on DATA PREPROCESSING, created by sudhibala93 on 18-02-2015.

Resource Summary

DATA PREPROCESSING
  1. Cleaning
    1. Missing Values

      Notes:

      • Use the attribute mean to fill in the missing value
      • Use the attribute mean of all samples belonging to the same class as the given tuple
      • Use the most probable value to fill in the missing value
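
The first two fill-in strategies in the notes above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names, the use of `None` to mark a missing value, and the sample income and risk-class data are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    overall_mean = mean(observed)
    return [overall_mean if v is None else v for v in values]


def fill_with_class_mean(values, labels):
    """Replace None entries with the mean of the observed values
    that share the same class label as the tuple being filled."""
    by_class = defaultdict(list)
    for v, c in zip(values, labels):
        if v is not None:
            by_class[c].append(v)
    # Assumption: every class has at least one observed value;
    # otherwise the lookup below raises KeyError.
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    return [class_means[c] if v is None else v for v, c in zip(values, labels)]


# Hypothetical attribute (income) and class label (credit risk).
incomes = [30.0, None, 50.0, 70.0, None, 90.0]
risk = ["low", "low", "low", "high", "high", "high"]

overall = fill_with_mean(incomes)
per_class = fill_with_class_mean(incomes, risk)
print(overall)    # [30.0, 60.0, 50.0, 70.0, 60.0, 90.0]
print(per_class)  # [30.0, 40.0, 50.0, 70.0, 80.0, 90.0]
```

The class-conditional version tends to give values closer to the tuple's peers, at the cost of needing a class label for every tuple.
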
      1. Data Cleaning

        Notes:

        • Fill in missing values & correct inconsistencies in the data
        1. Ignore the tuple

          Notes:

          • Usually done when the class label is missing; generally not very effective
          1. Fill in the missing value manually

            Notes:

            • Time-consuming and may not be feasible for large data sets
            1. Use a global constant to fill in the missing value

              Notes:

              • Replace all missing values with the same constant
            2. Noisy Data
              1. Binning

                Notes:

                • Smooths a sorted data value by consulting its neighborhood, that is, the values around it
                • Smoothing by bin means
                • Smoothing by bin medians
                • Smoothing by bin boundaries
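
The three smoothing variants named above can be sketched as below. This is an illustration under stated assumptions: the bins are equal-frequency (equal-depth) partitions of the sorted values, ties in boundary smoothing go to the lower boundary, and the function names and price list are made up for the example.

```python
from statistics import mean, median


def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max.
    Assumption: a tie goes to the lower boundary."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are already sorted
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)

print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
print(smooth_by_medians(bins))     # [[8, 8, 8], [21, 21, 21], [28, 28, 28]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```
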
                1. Clustering

                  Notes:

                  • Similar values are organized into groups; values that fall outside the groups may be detected as outliers
                  1. Combined computer & human inspection

                    Notes:

                    • Outliers may be identified through a combination of computer and human inspection
                    1. Regression

                      Notes:

                      • Data can be smoothed by fitting the data to a function
                      • Linear regression
                      • Multiple linear regression
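
As a rough sketch of smoothing by regression, the example below fits a straight line to one attribute as a function of another using ordinary least squares, then replaces each observed value with the fitted one. The function name `fit_line` and the sample points are assumptions for illustration; multiple linear regression extends the same idea to more than one predictor and is not shown.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx  # assumes the xs are not all identical
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical noisy observations of y against x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.5, 5.5, 8.5, 9.5]

slope, intercept = fit_line(xs, ys)

# Smoothed data: each y is replaced by the value on the fitted line.
smoothed = [slope * x + intercept for x in xs]
print(round(slope, 3), round(intercept, 3))
print([round(v, 2) for v in smoothed])
```
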
                    2. Inconsistent Data

                      Notes:

                      • Data inconsistencies may be corrected manually using external references
                      • Knowledge engineering tools may also be used to detect violations of known data constraints
                    3. Data Reduction

                      Notes:

                      • Can be applied to obtain a reduced representation of the data set that closely maintains the integrity of the original data
                      1. Dimensionality Reduction
                        1. Data Compression
                          1. Wavelet transforms
                            1. Principal components analysis
                            2. Numerosity Reduction
                              1. Data cube Aggregation
                                1. Strategies
                                  1. Data cube aggregation
                                    1. Dimension reduction

                                      Notes:

                                      • Stepwise forward selection
                                      • Stepwise backward elimination
                                      • Combination of forward selection & backward elimination
                                      • Decision tree induction
                                      1. Numerosity reduction
                                        1. Histograms
                                          1. Clustering
                                            1. Sampling

                                              Notes:

                                              • SRSWOR of size n (simple random sample without replacement)
                                              • SRSWR of size n (simple random sample with replacement)
                                              • Cluster sample
                                              • Stratified sample
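
The four sampling schemes listed above can be sketched with Python's standard `random` module. The helper names, the fixed seed, the proportional allocation used for the stratified sample, and the toy tuples are assumptions made for this illustration only.

```python
import random
from collections import defaultdict


def srswor(data, n, rng):
    """Simple random sample WITHOUT replacement: no tuple is drawn twice."""
    return rng.sample(data, n)


def srswr(data, n, rng):
    """Simple random sample WITH replacement: a tuple may be drawn again."""
    return rng.choices(data, k=n)


def cluster_sample(clusters, m, rng):
    """Pick m whole clusters at random and keep all of their tuples."""
    chosen = rng.sample(clusters, m)
    return [t for cluster in chosen for t in cluster]


def stratified_sample(data, key, fraction, rng):
    """Draw the same fraction from every stratum (at least one tuple each)."""
    strata = defaultdict(list)
    for t in data:
        strata[key(t)].append(t)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical tuples: (id, age group). Six "young", four "senior".
data = [(i, "young" if i < 6 else "senior") for i in range(10)]

without_repl = srswor(data, 4, rng)
with_repl = srswr(data, 4, rng)
clusters = [data[:5], data[5:]]  # assumed pre-existing grouping, e.g. by page
one_cluster = cluster_sample(clusters, 1, rng)
stratified = stratified_sample(data, key=lambda t: t[1], fraction=0.5, rng=rng)
```

With these inputs the stratified sample always contains three "young" and two "senior" tuples, so a small stratum is still represented even though the draw within each stratum is random.
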
                                            2. Discretization & hierarchy Generation
                                              1. For numeric data
                                                1. Binning
                                                  1. Histogram analysis
                                                    1. Cluster Analysis
                                                      1. Entropy based Discretization
                                                        1. Segmentation by natural partitioning
                                                        2. For categorical data
                                                          1. Specification of a portion of a hierarchy by explicit data grouping
                                                            1. Specification of a partial ordering of attributes explicitly at the schema level
                                                              1. Specification of a set of attributes, but not of their partial ordering
                                                        3. Discretization & Concept hierarchy Generation
                                                          1. For Categorical Data
                                                            1. For Numeric Data
                                                            2. Integration & Transformation
                                                              1. Data Integration

                                                                Notes:

                                                                • Can help improve the accuracy & speed of the subsequent mining process
                                                                • Reduces and avoids redundancies & inconsistencies
                                                                • Detection and resolution of data value conflicts
                                                                1. Data Transformation
                                                                  1. Smoothing

                                                                    Notes:

                                                                    • Removes noise from the data
                                                                    1. Attribute construction

                                                                      Notes:

                                                                      • New attributes are constructed from the given ones and added to help the mining process
                                                                      1. Aggregation

                                                                        Notes:

                                                                        • Aggregation operations are applied to the data
                                                                        1. Generalization

                                                                          Notes:

                                                                          • Low-level (primitive) data are replaced by higher-level concepts through the use of concept hierarchies
                                                                          1. Normalization

                                                                            Notes:

                                                                            • Attribute data are scaled to fall within a small specified range, such as -1.0 to 1.0 or 0.0 to 1.0
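
One common way to scale an attribute into such a range is min-max normalization, sketched below. The note above does not name a specific method, so the choice of min-max, the function name, and the sample values are assumptions for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)]
    onto [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("all values are identical; the range is zero")
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (v - old_min) * scale for v in values]


# Hypothetical attribute values.
salaries = [12000, 54000, 73600, 98000]

unit_range = min_max_normalize(salaries)               # 0.0 to 1.0
signed_range = min_max_normalize(salaries, -1.0, 1.0)  # -1.0 to 1.0
print([round(v, 3) for v in unit_range])
print([round(v, 3) for v in signed_range])
```

Min-max scaling preserves the relative spacing of the original values, but a single extreme value stretches the range and compresses everything else.
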
                                                                        2. Why Data Preprocessing?
                                                                          1. Ease of Mining Process
                                                                            1. To Improve the Quality of Data
