Each question in this test is timed.
Big Data is dedicated to
Enterprise Apps
analysis
Requirements of Big Data
Processing unstructured data
Processing only unstructured data
Big Data addresses
combining multiple unrelated datasets
Another DW to enrich data
Big Data types of data
machine-generated
Only human-generated
hidden data
Benefits of Big Data
Enterprise app
fault and fraud detection
slow processing
characteristics of data in Big Data
complexity, variety, volume, veracity, value
variety, volume, velocity, veracity, value
incompatibilities, value, velocity, volume, veracity, many data
characteristics only of data in Big Data
value, variety, veracity
velocity, volume, value
velocity, variety, volume
veracity
structured data, unstructured data
signal, noise
social network
Human-generated data examples
microblogging
web log
sensor data
value
more value more veracity, more value more time
more value less veracity, more value less time
more value more veracity, more value less time
benefits
operational optimization, noise information
scientific discoveries, actionable intelligence
accurate predictions, shipped cloud computing
Dataset
collections or groups of related data that share a different set of attributes
collections or groups of related data that share the same set of attributes
discipline of gaining an understanding of data
Data Analysis
process of examining data to find facts, relationships, patterns
analytics
process of gaining an understanding of data
discipline of gaining an understanding of data by analyzing it via a multitude of techniques
collections or groups of related data
in business-oriented environments, analytics
results can lower operational costs and facilitate strategic decision-making
help identify the cause of a phenomenon to improve the accuracy of predictions
help strengthen the focus on delivering high-quality services by driving down costs
in the scientific domain, analytics
in services-based environments, analytics
Business intelligence (BI)
process of gaining insights into the workings of an enterprise
KPI (key performance indicator)
utilize the consolidated data contained in data warehouses to run analytical queries
is a measure for gauging success within a particular context
KPI used to
achieve regulatory compliance
measures in Big Data
kilometer, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, yottabyte
kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, yottabyte
kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, youtube
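The progression of units above can be sketched numerically; this sketch assumes binary (1024-based) multiples, as is common in storage contexts:

```python
# Orders of magnitude for data volume, assuming 1024-based multiples.
units = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]
for i, unit in enumerate(units, start=1):
    print(f"1 {unit} = 1024**{i} = {1024**i} bytes")
```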
big data emerged from a combination of business needs and technology innovations
true
false
analytics & data science
For many businesses, digital mediums have replaced physical mediums as the de facto medium
Based on open-source software that requires little more than commodity hardware
machine learning algorithms, statistical techniques and data warehousing
digitalization
Leads to an opportunity to collect further secondary data
Collecting and storing more data to potentially find new insights and gain a competitive edge
collecting and processing large quantities of diverse data has become increasingly affordable
affordable technology & commodity hardware
the maturity of these fields of practice inspired and enabled much of the core functionality
use of commodity hardware makes the adoption of Big Data solutions accessible to businesses
some examples include on-demand TV and streaming video
social media
Has empowered customers to provide feedback in near-realtime via open and public mediums
Has resulted in massive data streams
are capable of providing highly scalable, on-demand IT resources that can be leased
hyper-connected communities & devices
As a result, businesses are storing increasing amounts of data on customer interactions
leverage the infrastructure, storage and processing capabilities provided by these environments
the broadening coverage of the internet and the proliferation of cellular and Wi-Fi networks.
cloud computing
Businesses are also increasingly interested in incorporating publicly available datasets
Either directly through online interaction or indirectly through the usage of connected devices
Can be leased dramatically reduces the required up-front investment of Big Data projects
Online Transaction Processing (OLTP)
software system that processes transaction-oriented data
Is a system used for processing data analysis queries
process of loading data from a source system into a target system
batch-processed
OLTP
OLAP
data fully normalized
CRUD operations with sub-second response times
Online Analytical Processing (OLAP)
Store operational data that is fully normalized
data mining and machine learning processes
Representing a common source of structured analytics input
Can serve as both a data source and a data sink that is capable of receiving data
Big data analysis results can also be fed back
are used in diagnostic, predictive and prescriptive analytics
Examples: ticket reservation systems, banking and POS transactions
data that is aggregated and denormalized
OLAP uses databases
that store historical data in multidimensional arrays and can answer complex queries
are comprised of simple insert, delete and update operations
that processes transaction-oriented data
An OLAP system is always
fed with data from multiple OLTP systems using regular batch processing jobs
data is fully normalized
comprised of simple CRUD operations
OLAP: denormalized data in the form of cubes
FALSE
TRUE
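The OLTP/OLAP contrast above can be illustrated with a small sketch using Python's built-in sqlite3; the table and data are hypothetical stand-ins:

```python
import sqlite3

# A hypothetical in-memory store standing in for both system types.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# OLTP-style: simple insert operations on individual rows (CRUD).
conn.execute("INSERT INTO sales VALUES ('north', 'widget', 10.0)")
conn.execute("INSERT INTO sales VALUES ('north', 'gadget', 20.0)")
conn.execute("INSERT INTO sales VALUES ('south', 'widget', 5.0)")
conn.commit()

# OLAP-style: an aggregate (analytical) query across many rows.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 30.0), ('south', 5.0)]
```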
Extract-transform-load (ETL)
allows the data to be queried during any data analysis tasks that are performed later
Is a process of loading data from a source system into a target system
queries can take several minutes or even longer, depending on the complexity of the query.
Extract-transform-load (ETL) source
database, flat file or an application
on-demand TV and streaming video
digitalization and social media
ETL
Represents the main operation through which data warehouses are fed data
Represents the main operation through which datasets are fed data
Represents the main operation through which databases are fed data
Extract load transform
Extract transform load
Extract transform leave
ETL data types
Unstructured, structured and semi-structured data
Only structured and semi-structured data
Only unstructured data
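A minimal, illustrative ETL sketch in Python (all names and records are hypothetical): extract from a flat-file-like source, transform and cleanse, then load into a target:

```python
# Hypothetical raw records from a flat-file source.
raw_lines = ["alice, 42", "bob,  17 ", "carol,oops", "dave,58"]

def extract(lines):
    # Extract: parse each raw line into fields.
    return [line.split(",") for line in lines]

def transform(records):
    # Transform: cleanse whitespace and drop rows that fail verification.
    cleaned = []
    for name, age in records:
        name, age = name.strip(), age.strip()
        if age.isdigit():
            cleaned.append((name, int(age)))
    return cleaned

def load(records, target):
    # Load: write the cleansed records into the target system.
    target.extend(records)

warehouse = []
load(transform(extract(raw_lines)), warehouse)
print(warehouse)  # [('alice', 42), ('bob', 17), ('dave', 58)]
```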
Do data warehouses (EDWH) contain historical data?
Data Warehouses EDWH
is a subset of the data that typically belongs to a department
is an open-source framework
usually interfaces with an OLAP system to support analytical queries
EDWH
this allows the data to be queried during any data analysis
Heavily used by BI to run various analytical queries
software system that processes transaction-oriented data
EDWH sources
social media, Facebook, Twitter
OLTP, ERP, CRM and SCM systems
OLAP, ERP, CRM and SCM systems
As the amount of data contained continues to increase, BI analysis can suffer.
Has established itself as a de facto industry platform for contemporary Big Data solutions
usually contain optimized databases, called analytical databases to handle reporting and data analysis
EDWH: the analytical database can't exist as a separate DBMS
Data mart
can have multiple EDWH
based on cleansed data, which is a prerequisite for accurate and error-free reports
Hadoop is an open-source framework for
large data storage
data processing
diagnostic, predictive and prescriptive
run on commodity hardware
denormalized data in the form of cubes
Hadoop
has established itself as a de facto industry platform for contemporary Big Data solutions
is a central, enterprise-wide repository
is always fed with data from multiple OLTP system using regular batch processing jobs
Hadoop can be used as an engine for
hadoop can process large amounts of structured, semi-structured and unstructured data
volume refers to
insert data
process data
velocity processing
Data volumes can include
Online transaction
batch
scientific and research data
velocity
multiple types of data that need to be supported by Big Data solutions
data translates into the amount of time it takes for the data to be processed
data is processed by Big Data solutions is substantial and usually ever growing
Depending on the data source, velocity may not always be high
variety
Quality or fidelity of data
usefulness of data for an enterprise
refers to the multiple formats and types of data that need to be supported by Big Data Solutions
The appropriate form of data storage
Refers to the quality or fidelity of data
Refers to the usefulness of data for an enterprise
Noise and signal refer to
volume
veracity
refers to quality or fidelity of data
refers to the multiple formats and types of data that need to be supported
refers to usefulness of data for an enterprise
the value is directly related to the veracity in that the higher the data fidelity, the more value it holds for the business.
types of data
structured data, unstructured data, semi-structured data
structured data, unstructured data, semi-structured data, metadata
ERP and CRM are examples of
unstructured data
structured data
semi-structured data
image, audio and video files are examples of
structured data
Unstructured data generally makes up 80%
unstructured data does generally require special or customized logic when it comes to pre-processing and storage
has a defined level of structure and consistency, and can be relational in nature
has a defined level of structure and consistency, but cannot be relational in nature
cannot be inherently processed or queried using SQL or traditional programming features
semi-structured data
CRM or ERP
XML, electronic data interchange (EDI), e-mails, spreadsheets, RSS feeds and sensor data
image or audio files
metadata
provides information about the analysis
provides information about a dataset's characteristics and structure
metadata is generally machine-generated and automatically appended to the data
metadata XML tags
provide the author and creation date of a document, or the file size and resolution of a digital photograph
audio binary file
video binary file
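As a small illustration of XML tags acting as metadata, the following sketch (the document and its attributes are hypothetical) reads an author and creation date from tag attributes using Python's standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: the attributes carry metadata
# (author, creation date) about the content held in the tags.
doc = """
<document author="J. Doe" created="2016-01-15">
  <title>Quarterly Report</title>
</document>
"""
root = ET.fromstring(doc)
print(root.attrib["author"], root.attrib["created"])  # metadata
print(root.find("title").text)                        # content
```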
semi-structured data and unstructured data have a greater noise-to-signal ratio than structured data
Can ETL perform data cleansing and verification?
data analysis
quantitative analysis
scientific analysis
qualitative analysis
data mining
quantifying the patterns and correlations found in the data
phenomenon in the data
outlier
qualitative analysis uses
numbers
words
graphical
involve analyzing a smaller sample in greater depth
quantitative analysis
analysis that targets large datasets
data mining (data discovery)
descriptions
patterns and correlations
identify patterns and trends
data mining forms the basis for predictive analytics and business intelligence (BI)
analysis tools can automate data analyses
descriptive, diagnostic, predictive, prescriptive
types of analytics
types of analysis
diagnostic
questions about events that have already occurred
diagnostic analytics
descriptive analytics
predictive analytics
reporting or dashboards. The reports are generally static; queries are executed in OLTP systems such as CRM and ERP
descriptive analytics
prescriptive analytics
determine the causes of a phenomenon that occurred in the past
interactive visualization to identify trends and patterns, and queries are executed in OLAP systems
attempt to determine the outcome of an event that might occur in the future
The focus is on which prescribed option to follow, and why and when it should be followed, to gain an advantage or mitigate risk
prescriptive analytics
incorporate internal data (historical etc..) and external data (social media, demographic data)
machine learning
is the process of teaching computers to learn from existing data and apply the acquired knowledge to formulate predictions about unknown data.
is the discipline of teaching computers to learn from existing data and apply the acquired knowledge to formulate predictions about unknown data.
is the framework that refers to teaching computers to learn from existing data and apply the acquired knowledge to formulate predictions about unknown data.
based on the input data and categories
supervised learning
unsupervised learning
the algorithm attempts to categorize data by grouping data with similar attributes together
machine learning makes predictions and identify hidden patterns
machine learning can use the output from data mining for further data classification.
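A toy sketch of unsupervised learning, grouping data with similar attributes together without pre-labelled categories (this is a simplified illustration, not a production clustering algorithm):

```python
# Group 1-D values into clusters: values within max_gap of their
# neighbour join the same group; larger jumps start a new group.
def cluster(values, max_gap=10):
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)   # similar attribute: join the group
        else:
            clusters.append([v])     # too different: start a new group
    return clusters

# Hypothetical sensor readings with no labels attached.
readings = [3, 7, 5, 52, 55, 60, 101]
print(cluster(readings))  # [[3, 5, 7], [52, 55, 60], [101]]
```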
traditional BI utilizes
descriptive and diagnostic
diagnostic and predictive
descriptive and prescriptive
ad-hoc reports and dashboards
traditional Big Data
traditional BI
ad hoc reporting
the focus is usually on a specific area of the business
the focus is a view of key business areas in real time or near-real time
big data
facilitates the development of an enterprise-wide understanding of the way a business works
focus on individual business processes
descriptive and diagnostic to facilitate the development of an enterprise-wide understanding of the way a business works
data visualization: analytical results are graphically communicated using elements like
charts, maps, data grids, infographics and alerts
ad-hoc reports, drill-down
dashboards
data visualization tools in Big Data use
in-disk analytical technologies
in-memory analytical technologies
aggregation
groups data across multiple categories to show subtotals and totals
global an sumarized view of data across multiple context
enables a detail view of the data of interest by focusing in on a data subset
visualization features: drill-down
visualization features: roll-up
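The roll-up (summarized view across a category) and drill-down (detailed view of a data subset) features can be sketched as follows, over hypothetical sales records:

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, amount).
sales = [
    ("north", "widget", 10.0),
    ("north", "gadget", 20.0),
    ("south", "widget", 5.0),
]

# Roll-up: aggregate across a category for a summarized view.
totals = defaultdict(float)
for region, product, amount in sales:
    totals[region] += amount
print(dict(totals))  # {'north': 30.0, 'south': 5.0}

# Drill-down: focus in on a subset of interest in greater detail.
north_detail = [(p, a) for r, p, a in sales if r == "north"]
print(north_detail)  # [('widget', 10.0), ('gadget', 20.0)]
```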
data visualization tools for Big Data solutions incorporate
diagnostic and descriptive
predictive and descriptive
predictive and prescriptive
advanced visualizations require an ETL