DATA PREPROCESSING

Description

Mind Map on DATA PREPROCESSING, created by sudhibala93 on 18/02/2015.

Resource summary

DATA PREPROCESSING
  1. Cleaning
    1. Missing Values

      Annotations:

      • Use the attribute mean to fill in the missing value
      • Use the attribute mean for all samples belonging to the same class as the given tuple
      • Use the most probable value to fill in the missing value
      1. Data Cleaning

        Annotations:

        • Fill in missing values & correct inconsistencies in the data
        1. Ignore the tuple

          Annotations:

          • Usually done when the class label is missing; not very effective
          1. Fill in the missing value manually

            Annotations:

            • Time-consuming & may not be feasible
            1. Use a global constant to fill in the missing value

              Annotations:

              • Replace all missing values with the same constant
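The fill-in strategies listed above (global constant, attribute mean, class-wise attribute mean) can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: it assumes missing values are encoded as `None`, and the function names and sample data are hypothetical.

```python
from statistics import mean

def fill_with_constant(values, constant):
    """Replace every missing value (None) with the same global constant."""
    return [constant if v is None else v for v in values]

def fill_with_mean(values):
    """Replace missing values with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def fill_with_class_mean(values, labels):
    """Replace missing values with the mean of the observed values
    that share the same class label as the given tuple."""
    by_class = {}
    for v, c in zip(values, labels):
        if v is not None:
            by_class.setdefault(c, []).append(v)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    # Assumption: every class has at least one observed value.
    return [class_means[c] if v is None else v for v, c in zip(values, labels)]

# Hypothetical sample data: an attribute with two missing entries.
incomes = [30, None, 50, 70, None, 90]
classes = ["low", "low", "low", "high", "high", "high"]

print(fill_with_constant(incomes, -1))
print(fill_with_mean(incomes))
print(fill_with_class_mean(incomes, classes))
```

The class-wise variant uses the class label as extra information, so the filled values differ between the "low" and "high" groups instead of collapsing to one overall mean.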
            2. Noisy Data
              1. Binning

                Annotations:

                • Smooths a sorted data value by consulting its neighborhood (the values around it)
                • Smoothing by bin means, smoothing by bin medians, smoothing by bin boundaries
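The three binning variants named above can be sketched as follows. This is an illustrative sketch under stated assumptions: the data are split into equal-frequency bins of a fixed size, ties in boundary smoothing go to the lower boundary, and all function names and sample values are hypothetical.

```python
from statistics import mean, median

def equal_frequency_bins(values, bin_size):
    """Sort the values and split them into consecutive bins of bin_size items."""
    ordered = sorted(values)
    return [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]

def smooth_by_means(bins):
    """Replace each value with the mean of its bin."""
    return [[mean(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace each value with the median of its bin."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace each value with the closer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        lo, hi = b[0], b[-1]  # bins are sorted, so these are the boundaries
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

# Hypothetical sample data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
bins = equal_frequency_bins(prices, 3)

print(bins)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```

Each variant only looks at the values inside the same bin, which is what makes binning a local smoothing method.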
                1. Clustering

                  Annotations:

                  • Similar values are organized into groups; outliers may be detected by clustering
                  1. Combined computer & human inspection

                    Annotations:

                    • Outliers may be identified through a combination of computer and human inspection
                    1. Regression

                      Annotations:

                      • Data can be smoothed by fitting the data to a function
                      • Linear regression
                      • Multiple linear regression
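Smoothing by regression can be illustrated with a simple least-squares line: fit one attribute against another, then replace the noisy values with the fitted ones. The sketch below covers only the single-predictor (linear regression) case; the function names and sample data are hypothetical, and multiple linear regression would need more than one predictor.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def smooth_by_regression(xs, ys):
    """Replace each y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]

# Hypothetical noisy measurements that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

print(fit_line(xs, ys))
print(smooth_by_regression(xs, ys))
```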
                    2. Inconsistent Data

                      Annotations:

                      • Data inconsistencies may be corrected manually using external references
                      • Knowledge engineering tools may also be used to detect violations of known data constraints
                    3. Data Reduction

                      Annotations:

                      • Obtains a reduced representation of the data set that is much smaller in volume yet closely maintains the integrity of the original data
                      1. Dimensionality Reduction
                        1. Data Compression
                          1. Wavelet transforms
                            1. Principal components analysis
                            2. Numerosity Reduction
                              1. Data cube Aggregation
                                1. Strategies
                                  1. Data cube aggregation
                                    1. Dimension reduction

                                      Annotations:

                                      • Stepwise forward selection, stepwise backward elimination, combination of forward selection & backward elimination, decision tree induction
                                      1. Numerosity reduction
                                        1. Histograms
                                          1. Clustering
                                            1. Sampling

                                              Annotations:

                                              • SRSWOR (simple random sample without replacement) of size n, SRSWR (simple random sample with replacement) of size n, cluster sample, stratified sample
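The four sampling schemes named above can be sketched with the standard `random` module. This is a minimal illustration: the function names, the `fraction` parameter, and the sample data are hypothetical, and the stratified version assumes every stratum should contribute at least one row.

```python
import random

def srswor(data, n, rng):
    """Simple random sample WITHOUT replacement: n distinct rows."""
    return rng.sample(data, n)

def srswr(data, n, rng):
    """Simple random sample WITH replacement: a row may be drawn repeatedly."""
    return rng.choices(data, k=n)

def cluster_sample(clusters, m, rng):
    """Pick m whole clusters at random and keep all of their rows."""
    chosen = rng.sample(clusters, m)
    return [row for cluster in chosen for row in cluster]

def stratified_sample(data, key, fraction, rng):
    """Draw the same fraction from every stratum (group defined by key)."""
    strata = {}
    for row in data:
        strata.setdefault(key(row), []).append(row)
    sample = []
    for rows in strata.values():
        k = max(1, round(len(rows) * fraction))  # at least one row per stratum
        sample.extend(rng.sample(rows, k))
    return sample

rng = random.Random(0)  # fixed seed so runs are repeatable

# Hypothetical data: (customer id, age group), with a small "senior" stratum.
customers = [(i, "young") for i in range(8)] + [(8, "senior"), (9, "senior")]

print(srswor(customers, 4, rng))
print(srswr(customers, 4, rng))
print(cluster_sample([customers[:5], customers[5:]], 1, rng))
print(stratified_sample(customers, key=lambda row: row[1], fraction=0.5, rng=rng))
```

Stratified sampling keeps the small "senior" group represented, which a simple random sample of the same size could miss entirely.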
                                            2. Discretization & hierarchy Generation
                                              1. For numeric data
                                                1. Binning
                                                  1. Histogram Analysis
                                                    1. Cluster Analysis
                                                      1. Entropy-based Discretization
                                                        1. Segmentation by Natural Partitioning
                                                        2. For categorical data
                                                          1. Portion of a hierarchy by explicit data grouping
                                                            1. Partial ordering of attributes explicitly at the schema level
                                                              1. Set of attributes, but not their partial ordering
                                                        3. Discretization & Concept hierarchy Generation
                                                          1. For Categorical Data
                                                            1. For Numeric Data
                                                            2. Integration & Transformation
                                                              1. Data Integration

                                                                Annotations:

                                                                • It can help improve accuracy & speed of the subsequent mining process
                                                                • Reduce and avoid Redundancies & inconsistencies
                                                                • Detection and resolution of data value conflicts
                                                                1. Data Transformation
                                                                  1. Smoothing

                                                                    Annotations:

                                                                    • Remove the noise from data
                                                                    1. Attribute construction

                                                                      Annotations:

                                                                      • New attributes are constructed and added to help the mining process
                                                                      1. Aggregation

                                                                        Annotations:

                                                                        • Aggregation operations are applied to the data
                                                                        1. Generalization

                                                                          Annotations:

                                                                          • Primitive (low-level) data are replaced by higher-level concepts through the use of concept hierarchies
                                                                          1. Normalization

                                                                            Annotations:

                                                                            • Attribute data are scaled to fall within a small specified range, such as -1.0 to 1.0 or 0.0 to 1.0
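One common way to scale an attribute into a specified range is min-max normalization, sketched below. The outline does not name a specific method, so treat this as one possible choice; the function name and sample values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max."""
    old_min, old_max = min(values), max(values)
    if old_max == old_min:
        raise ValueError("all values are identical; the range is zero")
    span = old_max - old_min
    return [
        (v - old_min) / span * (new_max - new_min) + new_min
        for v in values
    ]

# Hypothetical attribute values.
incomes = [200, 300, 400, 600, 1000]

print(min_max_normalize(incomes))             # scaled to 0.0 .. 1.0
print(min_max_normalize(incomes, -1.0, 1.0))  # scaled to -1.0 .. 1.0
```

Min-max scaling preserves the relative spacing of the original values, but a single extreme value stretches the range and compresses everything else.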
                                                                        2. Why Data Preprocessing?
                                                                          1. Ease of Mining Process
                                                                            1. To Improve the Quality of Data