DATA PREPROCESSING

Description

Mind Map on DATA PREPROCESSING, created by sudhibala93 on 18/02/2015.

Resource summary

DATA PREPROCESSING
  1. Cleaning
    1. Missing Values

      Annotations:

      • Use the attribute mean to fill in the missing value
      • Use the attribute mean for all samples belonging to the same class as the given tuple
      • Use the most probable value to fill in the missing value
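
The first two strategies above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: missing entries are assumed to be encoded as None, and the function names and the income/risk sample data are made up for the example. The third strategy (most probable value) needs a predictive model such as regression or a decision tree and is not shown.

```python
from statistics import mean

def fill_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

def fill_with_class_mean(values, labels):
    """Replace None entries with the mean of the observed values of the same class.

    Raises KeyError if a class has no observed value at all.
    """
    by_class = {}
    for v, c in zip(values, labels):
        if v is not None:
            by_class.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    return [class_means[c] if v is None else v for v, c in zip(values, labels)]

# Hypothetical sample data: an income attribute and a class label per tuple.
incomes = [30, None, 50, 70, None, 90]
risk = ["low", "low", "low", "high", "high", "high"]

overall = fill_with_mean(incomes)              # gaps filled with the overall mean
per_class = fill_with_class_mean(incomes, risk)  # gaps filled with the class mean
```

The class-based variant usually gives a closer estimate when the attribute differs strongly between classes, as in the sample data.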
      1. Data Cleaning

        Annotations:

        • Fill in missing values & correct inconsistencies in the data
        1. Ignore the tuple

          Annotations:

          • Usually done when the class label is missing; not very effective unless the tuple has several missing attribute values
          1. Fill in the missing value manually

            Annotations:

            • Time-consuming & may not be feasible for a large data set with many missing values
            1. Use a global constant to fill in the missing value

              Annotations:

              • Replace all missing attribute values with the same constant (e.g. a label such as "Unknown")
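
Ignoring a tuple and filling with a global constant might look as follows. This is a sketch under assumptions: tuples are modeled as dictionaries, None marks a missing value, and the constant "Unknown", the field names and the sample records are hypothetical.

```python
UNKNOWN = "Unknown"  # hypothetical global constant used for every gap

# Hypothetical records; None marks a missing value.
records = [
    {"age": 25, "job": "clerk", "label": "yes"},
    {"age": None, "job": None, "label": "no"},
    {"age": 40, "job": "nurse", "label": None},
]

def drop_unlabeled(rows, label_key="label"):
    """Ignore (drop) every tuple whose class label is missing."""
    return [r for r in rows if r[label_key] is not None]

def fill_with_constant(rows, constant=UNKNOWN):
    """Return copies of the rows with every missing value replaced by one constant."""
    return [{k: (constant if v is None else v) for k, v in r.items()} for r in rows]

kept = drop_unlabeled(records)     # third record is ignored
filled = fill_with_constant(kept)  # remaining gaps become "Unknown"
```

A mining program may mistake the shared constant for a meaningful value, so the constant is best chosen to be clearly distinguishable from real data.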
            2. Noisy Data
              1. Binning

                Annotations:

                • Smooths a sorted data value by consulting its neighborhood, i.e. the values around it
                • Smoothing by bin means
                • Smoothing by bin medians
                • Smoothing by bin boundaries
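
The three smoothing variants can be sketched as below. The example assumes equal-depth (equal-frequency) bins, which is one common choice and is not stated in the notes; the function names and the price list are illustrative.

```python
from statistics import mean, median

def equal_depth_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` items."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]

def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace every value with the closer of the bin's min and max (ties go to the min)."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # illustrative data
bins = equal_depth_bins(prices, 3)
by_means = smooth_by_means(bins)            # [[9, 9, 9], [22, 22, 22], [29, 29, 29]]
by_medians = smooth_by_medians(bins)        # [[8, 8, 8], [21, 21, 21], [28, 28, 28]]
by_boundaries = smooth_by_boundaries(bins)  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```

Larger bins give stronger smoothing, at the cost of hiding more of the original variation.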
                1. Clustering

                  Annotations:

                  • Similar values are organized into groups (clusters); values that fall outside every cluster may be treated as outliers
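
As a rough illustration of outlier detection by clustering, the sketch below groups one-dimensional values with a simple gap rule and flags values that end up in very small groups. The gap rule, the thresholds `max_gap` and `min_size`, and the sample readings are all assumptions made for this example; real data would normally call for a proper clustering algorithm.

```python
def gap_clusters(values, max_gap):
    """Group sorted 1-D values; a new cluster starts when the gap exceeds max_gap."""
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters

def outlier_candidates(values, max_gap, min_size=2):
    """Values in clusters smaller than min_size are reported as possible outliers."""
    return [v for c in gap_clusters(values, max_gap) if len(c) < min_size for v in c]

readings = [10, 11, 12, 13, 50, 51, 52, 200]  # illustrative data
clusters = gap_clusters(readings, max_gap=5)
suspects = outlier_candidates(readings, max_gap=5)
```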
                  1. Combined computer & human inspection

                    Annotations:

                    • Outliers may be identified through a combination of computer and human inspection
                    1. Regression

                      Annotations:

                      • Data can be smoothed by fitting the data to a function
                      • Linear regression
                      • Multiple linear regression
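
Smoothing by linear regression can be sketched with an ordinary least-squares line fit. The code below handles a single predictor only; the function names and the sample points are illustrative assumptions. Multiple linear regression extends the same idea to several predictors and is usually done with a numerical library.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def smooth_by_regression(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]

# Illustrative noisy observations scattered around a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
smoothed = smooth_by_regression(xs, ys)
```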
                    2. Inconsistent Data

                      Annotations:

                      • Data inconsistencies may be corrected manually using external references
                      • Knowledge engineering tools may also be used to detect violations of known data constraints
                    3. Data Reduction

                      Annotations:

                      • Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume yet closely maintains the integrity of the original data
                      1. Dimensionality Reduction
                        1. Data Compression
                          1. wavelet transforms
                            1. Principal components analysis
                            2. Numerosity Reduction
                              1. Data cube Aggregation
                                1. Strategies
                                  1. Data cube aggregation
                                    1. Dimension reduction

                                      Annotations:

                                      • Stepwise forward selection
                                      • Stepwise backward elimination
                                      • Combination of forward selection & backward elimination
                                      • Decision tree induction
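
Stepwise forward selection can be sketched as a greedy loop: start with no attributes and repeatedly add the one that improves a quality score the most. The score function is supplied by the caller; the toy `usefulness` table, the size penalty and the `min_gain` threshold below are hypothetical stand-ins for a real evaluation measure such as classifier accuracy.

```python
def forward_selection(attributes, score, min_gain=1e-9):
    """Greedily add the attribute with the best score until no attribute helps."""
    selected = []
    best = score(selected)
    remaining = list(attributes)
    while remaining:
        # Evaluate every remaining attribute when added to the current subset.
        candidate, cand_score = max(
            ((a, score(selected + [a])) for a in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score - best <= min_gain:
            break  # no remaining attribute improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best = cand_score
    return selected

# Hypothetical per-attribute usefulness, with a small penalty per attribute used.
usefulness = {"age": 0.5, "income": 0.3, "zip": 0.0, "id": 0.0}

def toy_score(subset):
    return sum(usefulness[a] for a in subset) - 0.01 * len(subset)

chosen = forward_selection(list(usefulness), toy_score)
```

Backward elimination works the other way round: it starts from the full attribute set and repeatedly removes the attribute whose removal hurts the score the least.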
                                      1. Numerosity reduction
                                        1. Histograms
                                          1. Clustering
                                            1. Sampling

                                              Annotations:

                                              • SRSWOR of size n (simple random sample without replacement)
                                              • SRSWR of size n (simple random sample with replacement)
                                              • Cluster sample
                                              • Stratified sample
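
The four sampling schemes can be sketched with Python's standard random module. The helper names, the fixed seed, the sampling fraction and the customer data are assumptions made for illustration.

```python
import random

def srswor(data, n, rng):
    """Simple random sample WITHOUT replacement: no tuple is drawn twice."""
    return rng.sample(data, n)

def srswr(data, n, rng):
    """Simple random sample WITH replacement: a tuple may be drawn again."""
    return rng.choices(data, k=n)

def cluster_sample(clusters, m, rng):
    """Draw m whole clusters without replacement and keep all of their tuples."""
    chosen = rng.sample(clusters, m)
    return [t for cluster in chosen for t in cluster]

def stratified_sample(data, key, fraction, rng):
    """Draw the same fraction from every stratum so small groups stay represented."""
    strata = {}
    for t in data:
        strata.setdefault(key(t), []).append(t)
    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical customers: (id, age group); 20 young and 80 senior.
customers = [(i, "young" if i < 20 else "senior") for i in range(100)]
pages = [customers[i:i + 10] for i in range(0, 100, 10)]  # clusters of 10 tuples

s_wor = srswor(customers, 10, rng)
s_wr = srswr(customers, 10, rng)
s_cluster = cluster_sample(pages, 2, rng)
s_strat = stratified_sample(customers, key=lambda t: t[1], fraction=0.1, rng=rng)
```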
                                            2. Discretization & hierarchy Generation
                                              1. For numeric data
                                                1. Binning
                                                  1. Histogram Analysis
                                                    1. Cluster Analysis
                                                      1. Entropy based Discretization
                                                        1. Segmentation by Natural Partitioning
                                                        2. For categorical data
                                                          1. Portion of a hierarchy by explicit data grouping
                                                            1. Partial ordering of attributes explicitly at the schema level
                                                              1. Set of attributes, but not their partial ordering
                                                        3. Discretization & Concept hierarchy Generation
                                                          1. For Categorical Data
                                                            1. For Numeric Data
                                                            2. Integration & Transformation
                                                              1. Data Integration

                                                                Annotations:

                                                                • It can help improve accuracy & speed of the subsequent mining process
                                                                • Reduces or avoids redundancies & inconsistencies
                                                                • Detection and resolution of data value conflicts
                                                                1. Data Transformation
                                                                  1. Smoothing

                                                                    Annotations:

                                                                    • Removes noise from the data
                                                                    1. Attribute construction

                                                                      Annotations:

                                                                      • New attributes are constructed and added to help the mining process
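
A small sketch of attribute construction: a new attribute is derived from existing ones and added to each tuple. The attribute names (height, width, area) and the sample rows are hypothetical.

```python
def add_area(rows):
    """Return copies of the rows with a derived 'area' attribute added."""
    return [{**r, "area": r["height"] * r["width"]} for r in rows]

rooms = [{"height": 3.0, "width": 4.0}, {"height": 2.5, "width": 2.0}]  # illustrative data
with_area = add_area(rooms)
```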
                                                                      1. Aggregation

                                                                        Annotations:

                                                                        • Aggregation operations are applied to the data
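
As an illustration of aggregation, daily figures can be rolled up into monthly totals. The sketch assumes dates are ISO strings (YYYY-MM-DD), so the first seven characters identify the month; the sales figures are made up.

```python
from collections import defaultdict

def monthly_totals(rows):
    """Sum (date, amount) pairs per month; dates are ISO strings 'YYYY-MM-DD'."""
    totals = defaultdict(float)
    for date, amount in rows:
        totals[date[:7]] += amount  # 'YYYY-MM' is the month key
    return dict(totals)

# Illustrative daily sales.
daily_sales = [
    ("2015-01-05", 120.0),
    ("2015-01-20", 80.0),
    ("2015-02-03", 50.0),
    ("2015-02-14", 150.0),
]
per_month = monthly_totals(daily_sales)
```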
                                                                        1. Generalization

                                                                          Annotations:

                                                                          • Primitive (low-level) data are replaced by higher-level concepts through the use of concept hierarchies
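
Generalization can be sketched as a lookup in a concept hierarchy. The city-to-country mapping below is a hypothetical one-level hierarchy; values without an entry are left unchanged.

```python
# Hypothetical concept hierarchy: city -> country.
CITY_TO_COUNTRY = {"Chennai": "India", "Mumbai": "India", "Lyon": "France"}

def generalize(values, hierarchy):
    """Replace each value with its higher-level concept when one is defined."""
    return [hierarchy.get(v, v) for v in values]

cities = ["Chennai", "Lyon", "Mumbai", "Oslo"]  # illustrative data
countries = generalize(cities, CITY_TO_COUNTRY)
```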
                                                                          1. Normalization

                                                                            Annotations:

                                                                            • Attribute data are scaled to fall within a small specified range, such as -1.0 to 1.0 or 0.0 to 1.0
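
One common way to scale an attribute into such a range is min-max normalization, sketched below. The notes do not name a specific method, so the choice of min-max, the function name and the sample values are assumptions.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant attribute")
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]

ages = [10, 20, 30]  # illustrative data
unit_range = min_max(ages)             # scaled to 0.0 .. 1.0
signed_range = min_max(ages, -1.0, 1.0)  # scaled to -1.0 .. 1.0
```

Min-max scaling preserves the relative spacing of the original values, but a single extreme value can squeeze all the others into a narrow part of the range.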
                                                                        2. Why Data Preprocessing?
                                                                          1. Ease of Mining Process
                                                                            1. To Improve the Quality of Data