Data Warehousing and Mining

Description

Revision mind map for Data Warehousing and Mining.
Mind Map by i7752068, updated more than 1 year ago
Created by i7752068 over 10 years ago

Resource summary

Data Warehousing and Mining
  1. Data Warehousing
    1. Increased corporate productivity.
      1. Competitive advantage.
        1. Potential for high ROI.
          1. Extremely high initial costs (£50k+)
            1. Long development time (approximately 3 years).
              1. High demand for memory.
                1. High maintenance costs.
                  1. Problems with source data (extraction, cleaning, loading).
                  2. Building a Data Warehouse Database (Dimensionality Modelling)
                    1. Fact Tables
                      1. Contains facts generated by events in the past.
                        1. Data in tables should be regarded as read only.
                          1. Tables are often very large.
                          2. Dimension Tables
                            1. Contains descriptive textual data.
                              1. Simple primary keys.
                                1. Gives a characteristic star schema or star join.
                              2. Star Schema
                                1. De-normalising reference data can speed up query performance.
                                  1. Main aim is to avoid data redundancy.
                                    1. This is achieved in part via the process of normalisation.
                                    2. OLTP System
                                      1. Automating business saves money.
                                        1. Data could be useful in the organisation's future operations.
                                          1. Information too detailed.
                                            1. May require information from more than one OLTP system.
                                              1. Difficult to extract information.
                                              2. Snowflake Schema
                                                1. Variant of Star Schema where dimension tables do not contain de-normalised data.
                                                  1. Dimension tables have other dimension tables linked to them via foreign keys.
                                                    1. More than one dimension table can share these "dimension of a dimension" tables.
                                                    2. Starflake Schema
                                                      1. Hybrid structure that contains a mixture of star and snowflake schemas.
                                                        1. Contains both normalised and de-normalised data.
                                                          1. Some dimension tables may be present in both normalised and de-normalised forms.
                                                          2. OLAP Analytical Operations
                                                            1. Consolidation
                                                              1. Involves the aggregation of data, such as "roll-ups", e.g. branches can be rolled up to cities, cities to countries, etc.
                                                              2. Drill-down
                                                                1. Reverse of consolidation.
                                                                  1. Involves displaying the detailed data that comprises the consolidated data.
                                                                  2. Slicing and Dicing (aka pivoting)
                                                                    1. Ability to view data from different viewpoints.
                                                                      1. One slice may display revenue by type of property within cities.
                                                                        1. Another slice may display revenue by branch office within city.
                                                                          1. Often performed along a time axis to find patterns and trends.
                                                                        2. Data Mining Operations and Techniques
                                                                          1. Predictive Modelling
                                                                            1. Resembles human learning: uses observations to form a model of the important characteristics of some phenomenon.
                                                                              1. Model developed using a two-phase supervised learning approach.
                                                                                1. The training phase uses a large sample of historical data called a training set to build a model of the important characteristics.
                                                                                  1. The testing phase tests the accuracy and performance of the model on new data.
                                                                                  2. Used in credit approval, customer retention management, direct marketing.
                                                                                  3. Database Segmentation
                                                                                    1. Partitions a database into an unknown number of segments or clusters of similar records.
                                                                                      1. Results can be displayed on a scatterplot.
                                                                                        1. Used in customer profiling and direct marketing.
                                                                                        2. Link Analysis
                                                                                          1. Aims to discover links (called associations) between individual records or groups of records in a database.
                                                                                          2. Anomaly Detection
                                                                                            1. Identifies outliers (expressions of deviation from previously known expectations and norms).
                                                                                              1. Used in detection of credit card and insurance fraud, quality control and defect tracing.
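
The star schema and OLAP operations summarised above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only, not a real warehouse design: the table names (dim_branch, dim_property_type, fact_rental), the column names, the function names and all figures are hypothetical and invented for the example. It builds one fact table joined to two de-normalised dimension tables, then runs a roll-up (branches to cities), a drill-down (one city back to its branches) and a slice (revenue by type of property within cities).

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory star schema with invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables: descriptive text, simple primary keys.
        CREATE TABLE dim_branch (
            branch_id   INTEGER PRIMARY KEY,
            branch_name TEXT,
            city        TEXT,   -- de-normalised: city stored on the branch row
            country     TEXT
        );
        CREATE TABLE dim_property_type (
            type_id   INTEGER PRIMARY KEY,
            type_name TEXT
        );
        -- Fact table: one row per past event, treated as read only once loaded.
        CREATE TABLE fact_rental (
            branch_id INTEGER REFERENCES dim_branch(branch_id),
            type_id   INTEGER REFERENCES dim_property_type(type_id),
            year      INTEGER,
            revenue   INTEGER
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_branch VALUES (?, ?, ?, ?)",
        [
            (1, "B001", "Glasgow", "UK"),
            (2, "B002", "Glasgow", "UK"),
            (3, "B003", "London", "UK"),
        ],
    )
    conn.executemany(
        "INSERT INTO dim_property_type VALUES (?, ?)",
        [(1, "Flat"), (2, "House")],
    )
    conn.executemany(
        "INSERT INTO fact_rental VALUES (?, ?, ?, ?)",
        [
            (1, 1, 2023, 100),
            (1, 2, 2023, 200),
            (2, 1, 2023, 150),
            (3, 1, 2023, 300),
            (3, 2, 2024, 400),
        ],
    )
    return conn


def roll_up_by_city(conn):
    """Consolidation: aggregate branch-level facts up to city level."""
    return conn.execute(
        """
        SELECT b.city, SUM(f.revenue)
        FROM fact_rental AS f
        JOIN dim_branch AS b ON b.branch_id = f.branch_id
        GROUP BY b.city
        ORDER BY b.city
        """
    ).fetchall()


def drill_down_to_branches(conn, city):
    """Drill-down: show the branch-level detail behind one city's total."""
    return conn.execute(
        """
        SELECT b.branch_name, SUM(f.revenue)
        FROM fact_rental AS f
        JOIN dim_branch AS b ON b.branch_id = f.branch_id
        WHERE b.city = ?
        GROUP BY b.branch_name
        ORDER BY b.branch_name
        """,
        (city,),
    ).fetchall()


def slice_by_property_type(conn):
    """Slice: revenue by type of property within each city."""
    return conn.execute(
        """
        SELECT b.city, t.type_name, SUM(f.revenue)
        FROM fact_rental AS f
        JOIN dim_branch AS b ON b.branch_id = f.branch_id
        JOIN dim_property_type AS t ON t.type_id = f.type_id
        GROUP BY b.city, t.type_name
        ORDER BY b.city, t.type_name
        """
    ).fetchall()


if __name__ == "__main__":
    warehouse = build_warehouse()
    print(roll_up_by_city(warehouse))
    print(drill_down_to_branches(warehouse, "Glasgow"))
    print(slice_by_property_type(warehouse))
```

Storing the city directly on each dim_branch row is the de-normalised (star) choice. A snowflake variant would move city and country into their own dimension table linked from dim_branch by a foreign key, at the cost of an extra join in each query.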