DATA PREPROCESSING

Description

Mind map about DATA PREPROCESSING, created by sudhibala93 on 18/02/2015.

Resource Summary

DATA PREPROCESSING
  1. Cleaning
    1. Missing Values

      Note:

      • Use the attribute mean to fill in the missing value
      • Use the attribute mean for all samples belonging to the same class as the given tuple
      • Use the most probable value to fill in the missing value
    2. Data Cleaning

      Note:

      • Fill in missing values and correct inconsistencies in the data
      1. Ignore the tuple

        Note:

        • Usually done when the class label is missing; not very effective in general
      2. Fill in the missing value manually

        Note:

        • Time-consuming, and may not be feasible for a large data set with many missing values
      3. Use a global constant to fill in the missing value

        Note:

        • Replace all missing values of the attribute with the same constant, such as "Unknown"
    3. Noisy Data
      1. Binning

        Note:

        • Smooths a sorted data value by consulting its neighborhood, i.e. the values around it
        • Smoothing by bin means, smoothing by bin medians, smoothing by bin boundaries
      2. Clustering

        Note:

        • Similar values are organized into groups (clusters); values that fall outside the clusters may be detected as outliers
      3. Combined Computer and Human Inspection

        Note:

        • Outliers may be identified through a combination of computer and human inspection
      4. Regression

        Note:

        • Data can be smoothed by fitting the data to a function
        • Linear regression
        • Multiple linear regression
    4. Inconsistent Data

      Note:

      • Data inconsistencies may be corrected manually using external references
      • Knowledge engineering tools may also be used to detect violations of known data constraints
  2. Data Reduction

    Note:

    • Data reduction obtains a reduced representation of the data set that is much smaller in volume, yet closely maintains the integrity of the original data
    1. Dimensionality Reduction
      1. Data Compression
        1. Wavelet transforms
        2. Principal components analysis
    2. Numerosity Reduction
      1. Data cube aggregation
    3. Strategies
      1. Data cube aggregation
      2. Dimension reduction

        Note:

        • Stepwise forward selection
        • Stepwise backward elimination
        • Combination of forward selection and backward elimination
        • Decision tree induction
      3. Numerosity reduction
        1. Histograms
        2. Clustering
        3. Sampling

          Note:

          • SRSWOR (simple random sample without replacement) of size n
          • SRSWR (simple random sample with replacement) of size n
          • Cluster sample
          • Stratified sample
      4. Discretization and concept hierarchy generation
  3. Discretization & Concept Hierarchy Generation
    1. For Numeric Data
      1. Binning
      2. Histogram analysis
      3. Cluster analysis
      4. Entropy-based discretization
      5. Segmentation by natural partitioning
    2. For Categorical Data
      1. Specification of a portion of a hierarchy by explicit data grouping
      2. Specification of a partial ordering of attributes explicitly at the schema level
      3. Specification of a set of attributes, but not of their partial ordering
  4. Integration & Transformation
    1. Data Integration

      Note:

      • Careful integration can help improve the accuracy and speed of the subsequent mining process
      • Reduce and avoid redundancies and inconsistencies
      • Detection and resolution of data value conflicts
    2. Data Transformation
      1. Smoothing

        Note:

        • Remove noise from the data
      2. Attribute construction

        Note:

        • New attributes are constructed and added to help the mining process
      3. Aggregation

        Note:

        • Summary or aggregation operations are applied to the data
      4. Generalization

        Note:

        • Primitive (low-level) data are replaced by higher-level concepts through the use of concept hierarchies
      5. Normalization

        Note:

        • Attribute data are scaled to fall within a small specified range, such as -1.0 to 1.0 or 0.0 to 1.0
  5. Why Data Preprocessing?
    1. Ease of Mining Process
    2. To Improve the Quality of Data
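
The Missing Values and Data Cleaning nodes list several ways to handle a missing value. A minimal Python sketch of four of them (ignore the tuple, global constant, attribute mean, class-conditional mean), assuming each record is a dict in which None marks a missing value; the records, attribute names, and helper names are hypothetical.

```python
from statistics import mean

# Hypothetical records: None marks a missing "income" value; "cls" is the class label.
records = [
    {"income": 30.0, "cls": "yes"},
    {"income": None, "cls": "yes"},
    {"income": 50.0, "cls": "no"},
    {"income": 70.0, "cls": "no"},
    {"income": 40.0, "cls": "yes"},
    {"income": None, "cls": "no"},
]


def ignore_tuples(rows, attr):
    """Ignore the tuple: drop every tuple whose value for `attr` is missing."""
    return [r for r in rows if r[attr] is not None]


def fill_constant(rows, attr, constant):
    """Replace every missing value of `attr` with the same global constant."""
    return [{**r, attr: constant if r[attr] is None else r[attr]} for r in rows]


def fill_attribute_mean(rows, attr):
    """Replace missing values with the mean of the observed values of `attr`."""
    return fill_constant(rows, attr, mean(r[attr] for r in ignore_tuples(rows, attr)))


def fill_class_mean(rows, attr, label):
    """Replace a missing value with the mean of `attr` over tuples of the same class.

    Assumes every class has at least one observed value for `attr`.
    """
    observed = ignore_tuples(rows, attr)
    class_means = {
        c: mean(r[attr] for r in observed if r[label] == c)
        for c in {r[label] for r in rows}
    }
    return [
        {**r, attr: class_means[r[label]] if r[attr] is None else r[attr]}
        for r in rows
    ]


print([r["income"] for r in fill_class_mean(records, "income", "cls")])
# prints [30.0, 35.0, 50.0, 70.0, 40.0, 60.0]
```

Every helper returns new dicts, so the original records keep their None values and the strategies can be compared side by side.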
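
The Binning node names three smoothing variants. A minimal Python sketch, assuming equal-depth (equal-frequency) bins of a fixed size; the price list is a small illustrative input and the function names are hypothetical.

```python
from statistics import mean, median


def equal_depth_bins(values, depth):
    """Sort the values and partition them into consecutive bins of `depth` values."""
    data = sorted(values)
    return [data[i:i + depth] for i in range(0, len(data), depth)]


def smooth_by_means(bins):
    # Every value in a bin is replaced by the bin mean.
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    # Every value in a bin is replaced by the bin median.
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    # Each value is replaced by the closer of the bin's min and max (ties go to the min).
    out = []
    for b in bins:
        lo, hi = min(b), max(b)
        out.append([lo if v - lo <= hi - v else hi for v in b])
    return out


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_depth_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```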
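
The Clustering node under Noisy Data says outliers may be detected by clustering. A minimal Python sketch for one numeric attribute, using plain 1-D k-means; treating every member of a cluster smaller than `min_size` as an outlier is an assumption made here for illustration, and the helper names are hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain 1-D k-means (Lloyd's algorithm); returns (centroids, clusters).

    Initial centroids are spread evenly over the sorted data (requires k >= 2).
    """
    data = sorted(values)
    n = len(data)
    centroids = [float(data[i * (n - 1) // (k - 1)]) for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid (ties go to the lower index).
        clusters = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Update step: move each centroid to its cluster mean; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


def cluster_outliers(values, k, min_size=2):
    """Flag the values that end up in clusters with fewer than `min_size` members."""
    _, clusters = kmeans_1d(values, k)
    return [v for c in clusters if len(c) < min_size for v in c]


print(cluster_outliers([1, 2, 2, 3, 10, 11, 12, 11, 50], 3))  # [50]
```

Flagged values are candidates for the combined computer and human inspection step, not automatic deletions.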
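
The Regression node smooths data by fitting it to a function. A minimal Python sketch of the linear-regression case only (one predictor, least squares); multiple linear regression is not shown, and the function names and sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (w0, w1) for the line y = w0 + w1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    w1 = sxy / sxx
    return y_bar - w1 * x_bar, w1


def smooth_by_regression(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    w0, w1 = fit_line(xs, ys)
    return [w0 + w1 * x for x in xs]


xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]  # noisy observations of a roughly linear trend
print(smooth_by_regression(xs, ys))  # approximately [1.4, 2.2, 3.0, 3.8, 4.6]
```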
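
The Data Compression node lists wavelet transforms. A minimal Python sketch using the Haar wavelet, the simplest case, in its unnormalized averaging-and-differencing form (the orthonormal variant divides by the square root of 2 instead of 2); zero-padding to a power-of-two length and keeping only the largest coefficients are the assumptions here, and the function names are hypothetical.

```python
def haar_transform(data):
    """Unnormalized Haar DWT by repeated pairwise averaging and differencing.

    The input is zero-padded to a power-of-two length. The result is
    [overall average, coarsest detail, ..., finest details].
    """
    size = 1
    while size < len(data):
        size *= 2
    current = [float(x) for x in data] + [0.0] * (size - len(data))
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def inverse_haar(coeffs):
    """Rebuild the (padded) data from its Haar coefficients."""
    current, pos = [coeffs[0]], 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [v for a, d in zip(current, level) for v in (a + d, a - d)]
    return current


def keep_largest(coeffs, keep):
    """Lossy compression: zero all but the `keep` largest-magnitude coefficients."""
    ranked = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(ranked[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


coeffs = haar_transform([2, 2, 0, 2, 3, 5, 4, 4])
print(coeffs)                                  # [2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]
print(inverse_haar(keep_largest(coeffs, 4)))   # an approximation built from 4 of 8 coefficients
```

Storing only the non-zero coefficients of the truncated vector is what makes the representation smaller; the inverse transform then yields an approximation of the original data.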
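
Data cube aggregation (and the Aggregation transformation) summarize data at a coarser level. A minimal Python sketch that rolls quarterly sales up to annual totals; the sales figures and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical quarterly sales figures: (year, quarter, amount).
quarterly_sales = [
    (2023, "Q1", 100), (2023, "Q2", 150), (2023, "Q3", 120), (2023, "Q4", 130),
    (2024, "Q1", 110), (2024, "Q2", 90),
]


def aggregate(rows, key, measure):
    """Roll rows up to a coarser level by summing `measure` within each `key` group."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += measure(row)
    return dict(totals)


annual_sales = aggregate(quarterly_sales, key=lambda r: r[0], measure=lambda r: r[2])
print(annual_sales)  # {2023: 500, 2024: 200}
```

Grouping by a different key (for example the quarter) gives another cuboid of the same data.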
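
The Dimension reduction node lists stepwise forward selection. A minimal Python sketch of the greedy procedure, assuming the caller supplies a `score` function that rates an attribute subset; the `USEFULNESS` table and `toy_score` below are hypothetical stand-ins for a real evaluation such as a significance test or a classifier's accuracy.

```python
def forward_selection(attributes, score, max_attrs=None):
    """Greedy stepwise forward selection.

    Starts from an empty set and, at each step, adds the remaining attribute
    that gives the best score; stops when no addition improves the score.
    """
    selected, remaining = [], list(attributes)
    best = score(selected)
    while remaining and (max_attrs is None or len(selected) < max_attrs):
        candidate = max(remaining, key=lambda a: score(selected + [a]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected


# Hypothetical per-attribute usefulness; a real score would evaluate the subset itself.
USEFULNESS = {"A1": 0.5, "A2": -0.1, "A3": -0.2, "A4": 0.3, "A5": -0.05, "A6": 0.2}


def toy_score(subset):
    return sum(USEFULNESS[a] for a in subset)


print(forward_selection(sorted(USEFULNESS), toy_score))  # ['A1', 'A4', 'A6']
```

Stepwise backward elimination is the mirror image: start from the full set and repeatedly remove the attribute whose removal hurts the score least.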
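
Histograms reduce numerosity by storing bucket ranges and counts instead of raw values. A minimal Python sketch with equal-width buckets; the bucketing rule and the function name are assumptions for illustration.

```python
def equal_width_histogram(values, buckets):
    """Replace raw values by `buckets` equal-width ranges and a count per range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        # The maximum would land at index `buckets`, so clamp it into the last bucket.
        i = min(int((v - lo) / width), buckets - 1) if width else 0
        counts[i] += 1
    edges = [lo + i * width for i in range(buckets + 1)]
    return edges, counts


print(equal_width_histogram([1, 2, 2, 3, 5, 6, 9, 10], 3))
# ([1.0, 4.0, 7.0, 10.0], [4, 2, 2])
```

The same bucket edges can also serve as interval boundaries for the Histogram analysis discretization listed under For Numeric Data.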
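
The Sampling node lists SRSWOR, SRSWR, cluster samples, and stratified samples. A minimal Python sketch using the standard `random` module with a fixed seed; the tuple ids, the 10-tuple "pages" used as clusters, the stratum function, and all helper names are hypothetical.

```python
import random


def srswor(data, n, rng):
    """Simple random sample without replacement: n distinct tuples."""
    return rng.sample(data, n)


def srswr(data, n, rng):
    """Simple random sample with replacement: a tuple may be drawn more than once."""
    return [rng.choice(data) for _ in range(n)]


def cluster_sample(clusters, s, rng):
    """Pick s whole clusters (e.g. disk pages) by SRSWOR and return all their tuples."""
    return [t for c in rng.sample(clusters, s) for t in c]


def stratified_sample(data, stratum_of, fraction, rng):
    """SRSWOR inside each stratum, taking about `fraction` of it (at least one tuple)."""
    strata = {}
    for t in data:
        strata.setdefault(stratum_of(t), []).append(t)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, max(1, round(fraction * len(members)))))
    return sample


rng = random.Random(42)  # fixed seed so the example is repeatable
tuples = list(range(100))  # hypothetical tuple ids
pages = [tuples[i:i + 10] for i in range(0, 100, 10)]  # hypothetical 10-tuple pages
print(srswor(tuples, 5, rng), srswr(tuples, 5, rng))
print(len(cluster_sample(pages, 2, rng)), len(stratified_sample(tuples, lambda t: t % 3, 0.1, rng)))
```

Stratification guarantees that every stratum (for example each customer age group) is represented, which a plain SRSWOR of the same size does not.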
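
Entropy-based discretization picks split points for a numeric attribute using the class labels. A minimal Python sketch that recursively chooses the boundary minimizing the weighted class entropy of the two resulting intervals; the stopping rule (a fixed recursion depth or a pure interval) is an assumption, and the function names and data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Class entropy: -sum(p_i * log2(p_i)) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_split(pairs):
    """Boundary between distinct values that minimizes the expected (weighted) entropy.

    Returns (boundary, expected_entropy), or None when all values are equal.
    """
    pairs = sorted(pairs)
    n, best = len(pairs), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only split between distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        info = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if best is None or info < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, info)
    return best


def entropy_discretize(pairs, depth):
    """Recursively split (value, class) pairs; returns the split points in order."""
    if depth == 0 or entropy([lab for _, lab in pairs]) == 0:
        return []  # depth limit reached or interval already pure
    split = best_split(pairs)
    if split is None:
        return []
    b = split[0]
    left = [p for p in pairs if p[0] <= b]
    right = [p for p in pairs if p[0] > b]
    return entropy_discretize(left, depth - 1) + [b] + entropy_discretize(right, depth - 1)


data = [(1, "a"), (2, "a"), (5, "b"), (6, "b"), (9, "c"), (10, "c")]
print(entropy_discretize(data, 2))  # [3.5, 7.5]
```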
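
For categorical data, a portion of a concept hierarchy can be specified by explicit data grouping, and Generalization then replaces values by their higher-level concepts. A minimal Python sketch, assuming a location hierarchy city < province_or_state < country given as plain dictionaries; the mappings and the `generalize` helper are hypothetical.

```python
# Hypothetical explicit groupings for a location hierarchy: city < province_or_state < country.
CITY_TO_REGION = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "Chicago": "Illinois",
}
REGION_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "Illinois": "USA",
}


def generalize(values, *levels):
    """Replace each value by its ancestor, climbing one mapping (level) at a time."""
    for mapping in levels:
        values = [mapping[v] for v in values]
    return list(values)


cities = ["Vancouver", "Toronto", "Chicago", "Victoria"]
print(generalize(cities, CITY_TO_REGION))
# ['British Columbia', 'Ontario', 'Illinois', 'British Columbia']
print(generalize(cities, CITY_TO_REGION, REGION_TO_COUNTRY))
# ['Canada', 'Canada', 'USA', 'Canada']
```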
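
The Data Integration node mentions reducing redundancies. One common technique, shown here as a minimal Python sketch, is correlation analysis: two numeric attributes with a correlation close to +1 or -1 after merging sources are likely redundant. The merged table, the 0.9 threshold, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r between two numeric attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined (division by zero) for a constant attribute


def redundant_pairs(table, threshold=0.9):
    """List attribute pairs whose |r| reaches `threshold` (a hypothetical cut-off)."""
    names = list(table)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(table[a], table[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical merged table: annual_revenue from one source is 12 x monthly_revenue
# from the other, so one of the two attributes is redundant.
merged = {
    "monthly_revenue": [10, 20, 30, 40],
    "annual_revenue": [120, 240, 360, 480],
    "staff": [5, 3, 6, 2],
}
print(redundant_pairs(merged))
```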
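
Attribute construction adds new attributes derived from existing ones. A minimal Python sketch that adds an `area` attribute computed from `height` and `width`; the rows and the `construct_attribute` helper are hypothetical.

```python
# Hypothetical tuples with height and width attributes.
rooms = [{"height": 2.5, "width": 4.0}, {"height": 3.0, "width": 3.5}]


def construct_attribute(rows, name, formula):
    """Return copies of the rows with a new attribute computed from existing ones."""
    return [{**r, name: formula(r)} for r in rows]


with_area = construct_attribute(rooms, "area", lambda r: r["height"] * r["width"])
print([r["area"] for r in with_area])  # [10.0, 10.5]
```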
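
The Normalization node scales attribute data into a small range such as 0.0 to 1.0 or -1.0 to 1.0. A minimal Python sketch of min-max normalization, which maps the observed minimum and maximum linearly onto the target range; the income values and the function name are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: linearly map [min, max] of the data onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map every value to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


incomes = [12000, 73600, 98000]  # hypothetical attribute values
print(min_max_normalize(incomes))             # first and last become 0.0 and 1.0
print(min_max_normalize(incomes, -1.0, 1.0))  # first and last become -1.0 and 1.0
```

Because the mapping depends on the observed minimum and maximum, a later value outside that range will fall outside the target range.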

Similar

GED en Español: Todo lo que necesitas saber
Diego Santos
AMÉRICA: PAÍSES~CAPITALES...
Ulises Yo
ESTADO DE FLUJOS DE EFECTIVO
Christian Muñoz
Test: The Passive voice
wendygil_22
EVENTOS EN JAVA
**CR 7**
Ecosystems
ricardico55555
constitucion de una empresa
isabel escobar
FARMACOCINETICA
sofia collazos
=ARTE=...
JL Cadenas
Test de Radicales 1 sencillo
MANUEL LUIS PÉREZ SALAZAR
Sistema óseo
Laura Mon