Created by Amelie Stute
almost 5 years ago
Question | Answer |
CAP-Theorem | For any system sharing data, it is impossible to guarantee all three of these properties simultaneously: - Consistency - Availability - Partition tolerance |
Availability (CAP-Theorem) | system can run even if parts have failed |
Partition tolerance (CAP-Theorem) | network can break into two or more parts, each with active components that cannot communicate with other parts; overall system can tolerate such situations |
Consistency (CAP-Theorem) | all copies have the same value |
Is eventual consistency ok? | If there is an ongoing partition and you want to stay available, eventual consistency is a reasonable compromise. If there is no partition, the CAP theorem does not justify eventual consistency! |
New Data related trends | - New Data sources - (Near) real time ETL - Self-service BI & Data discovery |
New Data Sources | - Physical objects merge with IT (IoT) - Cyber-physical systems (CPS) integrate smart objects and information systems to: o Record data using sensors o (Re-)actively interact with both the physical and the digital world |
(Near) real-time ETL | The value of data-driven information decreases over time. Motives: - Higher user expectations - Globalization (different time zones) - New types of data sources (e.g. stock market prices -> real-time info accessible) - Affordable technical realizations |
Self-service BI & data discovery | enable BI users to become more self-reliant and less dependent on the IT organization |
Information Needs Analysis (Components) | - Requested information - Information supply - Subjective information need - Objective information need |
Requested information | explicit Management request |
Information supply | all the information that is available and can be provided |
Objective Information need | information actually required to fulfill an organization’s objectives; deduced from the strategy, the decisions that need to be made, and external knowledge |
Subjective Information need | information perceived as required; implicit in the manager's mind |
Operational Objectives & Business Strategy | Operational Objectives enable management to do the right things right. “right things”: defined by business strategy “right”: derived from operational objectives |
DWH Definition Inmon | a collection of - subject-oriented (focus on analytical requirements) - integrated (complex effort of joining data together) - nonvolatile (durability of data is ensured; data modification is disallowed) - and time-variant (different values of the same information are kept together with their time, restoring the historical truth of the data) data to support management decisions |
Data mart | - Are specialized DWHs targeted toward a particular functional area/user group in an organization - Can be either derived from an enterprise DWH or collected directly from data sources - Are easier to build than an enterprise DWH |
Disadvantages of Data marts | - No reconcilability of data (disregards the “single point of truth” (SPOT) principle) - Extract proliferation (an ever-increasing number of extracts from the DWH is needed) - Change propagation (changes in one data mart may echo through all data marts; chances of error quickly grow) - Non-extensibility (danger of having to start from scratch after major organizational changes) |
Design principles | [Image] |
Design principle examples for each step | 1. Avoid variations in color that do not encode any meaning 2. Grid lines in graphs often represent distracting non-data pixels 3. Present only information that is relevant for the user 4. Use different degrees of visual emphasis |
Designing the multidimensional model (Process) | 1. Choose the Business process 2. Declare the grain 3. Identify the dimensions 4. Identify the facts |
Logical data modeling Schema / Generic DWH schema | - RO – RO comb – RO hier. - Dim – Dim comb – Dim hier. - Fact – Ratio – Ratio comb |
Advantages Snowflake-schema | + better storage and querying with sparse dimensions + reflects the way users think about data |
Disadvantages Snowflake-schema | - performance suffers, since more joins need to be performed when executing queries along hierarchy paths - benefit of normalization is insignificant - more complex structure than star |
Advantages Star-schema | + fewer joins need to be performed + benefit of denormalization is quite significant |
Disadvantages Star-schema | - less efficient querying with larger dimensions - does not reflect the way users think |
Ways to handle slowly changing dimensions | - Overwrite (+ easy; - not suitable for analytically relevant attributes) - Add new dimension row (+ "reliable workhorse"; - big tables get bigger) - Add new dimension attribute (+ "soft changes": new & old values both relevant; - only for a limited number of changes) |
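
The three ways to handle slowly changing dimensions can be sketched in a few lines of Python. This is only an illustration, not part of the original deck: the `CustomerRow` dimension, its column names, and the three helper functions are hypothetical, and the sketch assumes each customer has exactly one current row. The three options correspond to what are commonly called Type 1, Type 2, and Type 3 changes.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRow:
    """Hypothetical customer dimension row (all names are illustrative)."""
    surrogate_key: int
    customer_id: str                      # natural (business) key
    city: str
    previous_city: Optional[str] = None   # used only by add_attribute
    valid_from: date = date(1900, 1, 1)
    valid_to: Optional[date] = None       # None marks the current row


def overwrite(rows: List[CustomerRow], customer_id: str, new_city: str) -> List[CustomerRow]:
    """Overwrite: easy, but the old value (and thus history) is lost."""
    return [replace(r, city=new_city) if r.customer_id == customer_id else r
            for r in rows]


def add_row(rows: List[CustomerRow], customer_id: str, new_city: str,
            change_date: date) -> List[CustomerRow]:
    """Add new dimension row: close the current row, append a new version."""
    next_key = max(r.surrogate_key for r in rows) + 1
    result = []
    for r in rows:
        if r.customer_id == customer_id and r.valid_to is None:
            result.append(replace(r, valid_to=change_date))
            result.append(CustomerRow(next_key, customer_id, new_city,
                                      valid_from=change_date))
        else:
            result.append(r)
    return result


def add_attribute(rows: List[CustomerRow], customer_id: str, new_city: str) -> List[CustomerRow]:
    """Add new dimension attribute: keep exactly one old value next to the new one."""
    return [replace(r, previous_city=r.city, city=new_city)
            if r.customer_id == customer_id else r
            for r in rows]


dim = [CustomerRow(1, "C42", "Berlin")]
print(overwrite(dim, "C42", "Hamburg"))
print(add_row(dim, "C42", "Hamburg", date(2024, 1, 1)))
print(add_attribute(dim, "C42", "Hamburg"))
```

The trade-offs from the card show up directly: `overwrite` keeps the table size but forgets "Berlin", `add_row` grows the table with every change, and `add_attribute` can remember only as many old values as there are extra columns.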