FIRST CHAPTER

Description

FIRST CHAPTER BIG DATA
Note by Richard Xavier, updated more than 1 year ago
Created by Richard Xavier more than 2 years ago

Resource summary

Page 1

FACTS - CLASS 2
- About 2.5 quintillion bytes of data are generated per day.
- 90% of the data generated on the planet was generated in the last two years.
- 80% of data is unstructured or in different formats.
- The 4 V's of Big Data: Volume, Variety, Velocity, Veracity.
  - Volume: data size.
  - Variety: data format.
  - Velocity: speed of data generation.
  - Veracity: data reliability.
- Big Data and Data Science are not the same thing: Big Data is the raw material, while Data Science is a set of techniques for analyzing data.
- Applying Data Science to Big Data yields Big Data Analytics.
- Data Engineer: extracts and stores all the data.
- Data Scientist: processes and analyzes the data.
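To get a feel for the Volume and Velocity figures, the daily total can be converted into more familiar units. This is a small arithmetic sketch that assumes the 2.5 quintillion figure refers to bytes and uses decimal (SI) units; the variable names are illustrative only.

```python
# Assumption: "2.5 quintillion" means 2.5 * 10**18 bytes generated per day.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Decimal (SI) unit sizes.
TERABYTE = 1e12
EXABYTE = 1e18

exabytes_per_day = BYTES_PER_DAY / EXABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day:.1f} EB per day")
print(f"{terabytes_per_second:.1f} TB per second")
```

Under that assumption, the daily volume works out to 2.5 exabytes, or roughly 29 terabytes every second.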

Page 2

Storage System - Class 3
- When storing data, it is important to know how it will be accessed.
- Structured data is typically stored in a Data Warehouse.
- Unstructured data is typically stored in a Data Lake or a Data Store.
- A Relational Database has its schema defined before any data is stored.
- A Data Warehouse is built with a DBMS (Database Management System; SGBD in Portuguese).
- A NoSQL Database does not require a fixed schema, so it can hold semi-structured or unstructured data.
- Data Warehouse: stores a large amount of data from different sources. The goal is to feed business intelligence ("to feed" is "alimentar" in Portuguese).
- Data Warehouse benefits: better business analytics, faster queries, data quality improvements, historical view.
- Data Lake benefits: raw-format storage, real-time data import.
- Data Store benefits: flexibility.
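The difference between defining the schema before storage (relational database, Data Warehouse) and keeping data in raw format (Data Lake) can be sketched with Python's standard library. This is only a minimal illustration: the `sales` table, the field names, and the sample records are made up for the example, and an in-memory SQLite database stands in for a real warehouse.

```python
import json
import sqlite3

# Schema defined before storage: the table structure must exist
# before any row can be inserted.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, product TEXT NOT NULL, amount REAL NOT NULL)"
)
connection.executemany(
    "INSERT INTO sales (product, amount) VALUES (?, ?)",
    [("book", 12.5), ("pen", 1.5)],
)
total_structured = connection.execute(
    "SELECT SUM(amount) FROM sales"
).fetchone()[0]
connection.close()

# Raw-format storage: records are kept exactly as they arrive,
# and structure is only applied when the data is read.
raw_records = [
    '{"product": "book", "amount": 12.5}',
    '{"product": "pen", "amount": 1.5, "note": "gift"}',
    '{"event": "page_view", "url": "/home"}',  # different shape, still stored
]
parsed = [json.loads(line) for line in raw_records]
total_raw = sum(record["amount"] for record in parsed if "amount" in record)

print(total_structured, total_raw)
```

The structured path rejects anything that does not fit the table, which supports data quality and fast queries. The raw path accepts every record and leaves interpretation to the reader of the data.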

Page 3

Parallel Storage and Processing - Class 4
- Cluster: a set of computers that work together as if they were a single server.
- Parallel Storage: distributes storage across multiple servers.
- Hadoop HDFS (Hadoop Distributed File System): manages file storage across the cluster.
- You can build a Data Lake that runs on a cluster and lets you store large volumes of data at low cost.
- Job Tracker: manages the processing.
- Task Tracker: does the work in the process.
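The ideas of distributing storage across servers and splitting processing between a coordinator and workers can be illustrated with a toy word count. This sketch does not use any Hadoop API: the function names and node names are hypothetical, threads stand in for separate machines, and block replication (which a real distributed file system would add) is left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(text, block_size):
    """Split the words of a text into fixed-size blocks."""
    words = text.split()
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def place_blocks(blocks, node_names):
    """Parallel storage: assign blocks to nodes in round-robin order."""
    placement = {name: [] for name in node_names}
    for index, block in enumerate(blocks):
        placement[node_names[index % len(node_names)]].append(block)
    return placement


def count_words_on_node(node_blocks):
    """Worker role: count the words in the blocks held by one node."""
    counts = Counter()
    for block in node_blocks:
        counts.update(block)
    return counts


def run_job(placement):
    """Coordinator role: run one task per node, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(placement)) as pool:
        partials = list(pool.map(count_words_on_node, placement.values()))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


text = "big data needs big storage and big processing"
blocks = split_into_blocks(text, block_size=3)
placement = place_blocks(blocks, ["node-1", "node-2"])
word_counts = run_job(placement)
print(word_counts.most_common(1))
```

Here `run_job` plays a role loosely similar to the Job Tracker, and each call to `count_words_on_node` plays the role of a Task Tracker working on locally stored blocks.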

Page 4

Cloud Computing - Class 5
- Delivery of computing services over the internet.

Page 5

Machine Learning - Class 6
- An area that focuses on using data and algorithms to mimic the way humans learn.
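As a very small illustration of "learning from data", the sketch below fits a straight line to a handful of example points and then uses it to predict an unseen value. Ordinary least squares is chosen here only as one simple example of a learning algorithm; the data points and function names are invented for the illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the learned line to a new input."""
    return slope * x + intercept


# Hypothetical training data: the output is always twice the input.
training_inputs = [1, 2, 3, 4]
training_outputs = [2, 4, 6, 8]

slope, intercept = fit_line(training_inputs, training_outputs)
prediction = predict(5, slope, intercept)
print(slope, intercept, prediction)
```

The program is never told the rule "multiply by two"; it recovers the rule from the examples and applies it to an input it has not seen.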
