FIRST CHAPTER

Description

FIRST CHAPTER BIG DATA
Richard Xavier
Note by Richard Xavier, updated more than 1 year ago
Created by Richard Xavier over 2 years ago

Resource summary

Page 1

FACTS - CLASS 2
- About 2.5 quintillion bytes of data are generated per day.
- 90% of the data on the planet was generated in the last two years.
- 80% of data is unstructured or in varied formats.
- The 4 V's of Big Data: Volume, Variety, Velocity, Veracity.
- Volume: the size of the data.
- Variety: the formats of the data.
- Velocity: the speed at which data is generated.
- Veracity: the reliability of the data.
- Big Data and Data Science are not the same thing.
- Big Data is the raw material; Data Science is a set of techniques for analyzing data.
- Applying Data Science to Big Data yields Big Data Analytics.
- Data Engineer: extracts and stores the data.
- Data Scientist: processes and analyzes the data.
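To get a feel for the Volume figure, the daily number can be converted into more familiar units. This is only a rough sketch: it assumes the 2.5 quintillion figure counts bytes (as the statistic is usually quoted) and uses decimal units (1 TB = 10^12 bytes).

```python
# Assumption: the "2.5 quintillion per day" figure counts bytes.
BYTES_PER_DAY = 25 * 10**17        # 2.5 quintillion = 2.5 * 10**18
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 seconds

# Decimal units: 1 EB = 10**18 bytes, 1 TB = 10**12 bytes.
exabytes_per_day = BYTES_PER_DAY / 10**18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**12

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under these assumptions the daily volume works out to 2.5 exabytes, or roughly 29 terabytes every second.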

Page 2

Storage System - Class 3
- When storing data, it is important to know how it will be accessed.
- Structured data is typically stored in a data warehouse.
- Unstructured data is typically stored in a data lake or data store.
- A relational database has its schema defined before data is stored.
- A data warehouse is built on a DBMS (Database Management System; SGBD in Portuguese).
- A NoSQL database does not require a fixed schema, so it can hold semi-structured or unstructured data.
- Data warehouse: stores a large amount of data from different sources; its goal is to feed business intelligence.
- Data warehouse benefits: better business analytics, faster queries, improved data quality, historical view.
- Data lake benefits: raw-format storage, real-time data import.
- Data store benefits: flexibility.
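The difference between "schema defined before data storage" and raw-format storage can be illustrated with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for a relational DBMS and a plain list of JSON strings as a stand-in for a data lake. The `sales` table and the sample records are made up for illustration.

```python
import json
import sqlite3

# Relational style: the table structure must exist before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, product TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.execute("INSERT INTO sales (product, amount) VALUES (?, ?)", ("book", 12.5))

# A row that does not fit the schema is rejected when it is written.
try:
    conn.execute("INSERT INTO sales (product, amount) VALUES (?, ?)", (None, 3.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Data lake style: records are kept raw, in whatever shape they arrive.
raw_records = [
    '{"product": "book", "amount": 12.5}',
    '{"user": "ana", "clicks": [1, 2, 3]}',
]

# Structure is applied only at read time, by the code that consumes the data.
amounts = [rec["amount"] for rec in map(json.loads, raw_records) if "amount" in rec]

total_sql = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(rejected, total_sql, amounts)
```

The relational side trades flexibility for guarantees: bad rows never get in, which is one reason warehouse queries and data quality tend to be better. The raw side accepts anything and leaves interpretation to the reader.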

Page 3

Parallel Storage and Processing - Class 4
- Cluster: a set of computers that work together as a single server.
- Parallel storage: distributes storage across multiple servers.
- HDFS (Hadoop Distributed File System): Hadoop's file system, which manages storage across the cluster.
- A data lake can be built on a cluster, allowing large volumes to be stored at low cost.
- JobTracker: manages the processing of a job (in classic Hadoop MapReduce).
- TaskTracker: carries out the work of the job's individual tasks.
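The ideas above (data split across servers, a coordinator handing out work, workers doing it) can be sketched in a few lines. This is a toy model, not Hadoop's API: threads stand in for cluster nodes, the round-robin placement is a simplification of how HDFS actually places blocks, and all function names (`split_into_blocks`, `place_blocks`, `count_words`, `run_job`) are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(text, block_size):
    """Cut the data into fixed-size blocks of words (HDFS uses byte blocks)."""
    words = text.split()
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def place_blocks(num_blocks, nodes, replicas=2):
    """Assign each block to `replicas` nodes, round-robin.

    Copies land on distinct nodes as long as replicas <= len(nodes).
    """
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replicas)]
        for b in range(num_blocks)
    }


def count_words(block):
    """The work one worker does on one block (the TaskTracker role)."""
    return Counter(block)


def run_job(blocks, max_workers=3):
    """Hand one task per block to the workers, then merge the partial results
    (the JobTracker role)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_words, blocks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


data = "big data big cluster data big"
blocks = split_into_blocks(data, block_size=2)
placement = place_blocks(len(blocks), nodes=["node1", "node2", "node3"])
totals = run_job(blocks)

print(placement)
print(dict(totals))
```

Keeping more than one copy of each block means the data survives the loss of a single node, which is what makes clusters of cheap machines practical for storing large volumes.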

Page 4

Cloud Computing - Class 5
- The delivery of computing services over the internet.

Page 5

Machine Learning - Class 6
- An area of artificial intelligence that uses data and algorithms to imitate the way humans learn.
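A minimal sketch of what "learning from data" means in practice: instead of hard-coding a rule, the program estimates it from examples. Here a straight line is fitted with ordinary least squares in plain Python. The study-hours data is made up, and `fit_line` is a hypothetical helper, not a library function.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training examples: hours studied -> test score.
hours = [1, 2, 3, 4]
scores = [52, 54, 56, 58]

slope, intercept = fit_line(hours, scores)

# The learned rule is then applied to an input it has not seen.
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

The rule (two points per extra hour, starting from 50) is never written in the code. It is recovered from the examples, which is the core idea behind learning from data.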
