Data lake (warehouse)
- Data mining from Big Data (4 V's)
- 4 V's
  - Volume
    - The simplest
  - Variety
    - This is what really makes data "big"
  - Velocity
    - Delays significantly decrease the value of information
  - Value
    - Finding patterns, clusters and correlations; predicting variables
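Two of the "Value" techniques listed above, finding a correlation and predicting a variable, can be sketched in plain Python. This is a minimal illustration, assuming a Pearson correlation and an ordinary least-squares line as the techniques. The data (weekly site visits vs. purchases) and the helper names `pearson` and `fit_line` are hypothetical.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical toy data: site visits per week vs. purchases per week
visits = [1, 2, 3, 4, 5]
purchases = [2, 4, 5, 4, 5]

r = pearson(visits, purchases)         # strength of the linear relationship
a, b = fit_line(visits, purchases)     # simple predictive model
predicted = a + b * 6                  # predicted purchases for a customer with 6 visits
print(f"r = {r:.3f}, predicted purchases = {predicted:.1f}")
```

A correlation only says that two variables move together. The fitted line turns that pattern into a prediction, which is where the "value" of the data is realised.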
- ETL (Extract, transform and load)
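The three ETL steps can be sketched as three small functions. This assumes a CSV export as the source and an in-memory SQLite table as the target. The sample data, the cleaning rules and the `sales` table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system
RAW_CSV = """customer_id,country,amount
1, de ,10.50
2,FR,
3,de,7.25
"""

def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalise country codes, parse numbers, drop incomplete rows."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip records with a missing amount
        clean.append((int(row["customer_id"]),
                      row["country"].strip().upper(),
                      float(row["amount"])))
    return clean

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT country, SUM(amount) FROM sales GROUP BY country").fetchall())
```

In a data lake the order is often flipped (ELT). Raw data is loaded first and transformed only when it is read.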
- Hadoop (open source): MapReduce brings the algorithm to the data
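The MapReduce model can be sketched with a word count simulated in a single Python process. This is not Hadoop's API. It only mimics the map, shuffle and reduce phases. In Hadoop the map function would be shipped to the nodes that already hold each data block ("bring the algorithm to the data"), whereas here every "split" is just a string in a list.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)

# Each string stands in for a data block stored on a different node
splits = ["big data big value", "data lake", "Big lake"]

mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```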
- To load or not to load data into a lake?
  - Streaming instead of batch updates
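The difference between streaming and batch updates can be sketched with a simple aggregate, assuming a mean over incoming order amounts. The streaming version updates its result incrementally with each event. The batch version recomputes from the full data set, so its result is only as fresh as its last run. The event values and the class and function names are hypothetical.

```python
class RunningMean:
    """Streaming aggregate: updated per event in O(1), without re-reading history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n
        self.mean += (value - self.mean) / self.count
        return self.mean

def batch_mean(values):
    """Batch aggregate: recomputed from the complete data set on each scheduled run."""
    return sum(values) / len(values)

events = [4.0, 8.0, 6.0, 2.0]  # hypothetical order amounts arriving one by one
stream = RunningMean()
for amount in events:
    current = stream.update(amount)  # an up-to-date value is available after every event

print(current, batch_mean(events))
```

Both give the same final answer. The streaming aggregate, however, had a current value at every step, which matters because delays reduce the value of information (Velocity).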
- Data governance
  - Or just liabilities?
  - Data lake -> the lake may leak
  - Data swamp -> a toxic swamp
- Information assets
  - All kinds of information collected by an organization
    - Structured
    - Unstructured
    - Semi-structured
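The three kinds of information can be illustrated by how each one is read in Python. The sample records are hypothetical. The regex is just one simple way to pull a fact out of free text.

```python
import csv
import io
import json
import re

# Structured: fixed schema, every record has the same columns (e.g. a table or CSV)
structured = "customer_id,name\n1,Anna\n2,Ben\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing keys, but fields may differ per record (e.g. JSON)
semi = '[{"customer_id": 1, "tags": ["vip"]}, {"customer_id": 2, "email": "ben@example.com"}]'
records = json.loads(semi)
fields_per_record = [sorted(r) for r in records]

# Unstructured: free text with no schema; structure has to be extracted (here with a regex)
unstructured = "Call from customer 2: Ben complained that delivery took 9 days."
days = int(re.search(r"(\d+) days", unstructured).group(1))

print(rows[0]["name"], fields_per_record, days)
```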
- Business objectives
  - Know your customer (KYC)
  - OLAP (online analytical processing)
  - OLTP (online transaction processing)
    - Collect and process data at the moment of its creation
  - OLA (online action): acting on user activity, patterns, clusters and value predictions
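The OLTP, OLAP and OLA items above can be chained in one small sketch, assuming a single in-memory SQLite database for brevity. Real systems usually keep the transactional database and the analytical store separate. The `orders` table, the threshold and the action names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")

def record_order(customer_id, amount):
    """OLTP: a short transaction that stores one order at the moment it is created."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders VALUES (?, ?)", (customer_id, amount))

def revenue_per_customer():
    """OLAP: an analytical query aggregating over many stored transactions."""
    return conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()

def next_action(total, threshold=100.0):
    """OLA (hypothetical rule): turn the analysis result into an action for the customer."""
    return "offer_loyalty_discount" if total >= threshold else "no_action"

for cid, amount in [(1, 60.0), (2, 20.0), (1, 50.0)]:
    record_order(cid, amount)

actions = {cid: next_action(total) for cid, total in revenue_per_customer()}
print(actions)
```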