Module 2 - Big Data Analysis & Technology Concepts

Question

Big Data Analysis

Answer 1

differs from traditional data analysis primarily because of the volume, velocity and variety characteristics of the data it processes

Answer 2

When two variables are considered to be _____________ they are considered to be aligned based on a linear relationship

Answer 3

This means that when one variable changes, the other variable also changes proportionally and constantly

Answer 4

this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality

Answer 5

is needed to organize the tasks involved with retrieving, processing, producing and repurposing data

Answer 6

therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage

Answer 7

Although the data format may be the same, the data model may be different

Answer 8

depending on the business scope of the analysis project and nature of the business problems being addressed, the required datasets and their sources can be internal and/or external to the enterprise

Answer 9

Business Case Evaluation

Answer 10

Data Identification

Answer 11

A/B Testing

Answer 12

An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient

Answer 13

Data Acquisition & Filtering

Answer 14

Data Extraction

Answer 15

suggestions commonly pertain to recommending items, such as movies, books, web pages, people, etc.

Answer 16

the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic

Answer 17

Data Validation & Cleansing

Answer 18

Data Aggregation & Representation

Answer 19

The ______ itself is a visual, color-coded representation of data values

Answer 20

the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic

Answer 21

Data Analysis

Answer 22

Data Visualization

Answer 23

A _______ is generally expressed using a line chart, with time plotted on the x-axis and recorded data value plotted on the y-axis

Answer 24

Utilization of Analysis Results

Answer 25

requires that a business case be created, assessed and approved prior to proceeding with the actual hands-on analysis task

Answer 26

helps decision-makers understand the business resources that will need to be utilized and which business challenges the analysis will tackle

Answer 27

Unstructured text is generally much more difficult to analyze and search, compared to structured text

Answer 28

is an example of the application of the law of large numbers

Answer 29

the further identification of KPIs during this stage helps determine how closely the data analysis outcome needs to meet the identified goals and objectives

Answer 30

based on the business requirements documented, it can be determined whether the business problems being addressed are really Big Data problems

Answer 31

Applications for ___________ include fraud detection, medical diagnosis, network data analysis and sensor data analysis

Answer 32

can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior

Answer 33

in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety

Answer 34

Note also that another outcome of this stage is the determination of the underlying budget required to carry out the analysis project

Answer 35

The processing engine enables data to be queried and manipulated in other ways, but to implement this type of functionality requires custom programming

Answer 36

Workflow Engine

Answer 37

any required purchase of tools, hardware, training, etc. needs to be understood in advance, so that the anticipated investment can be weighed against the expected benefits of achieving the goals

Answer 38

initial iteration of the big data analysis lifecycle will require more up-front investment in Big Data technologies, products and training compared to later iterations where these earlier investments can be repeatedly leveraged

Answer 39

In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample

Answer 40

can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior

Answer 41

is dedicated to identifying the datasets (and their sources) required for the analysis project

Answer 42

identifying a wider variety of data sources may increase the probability of finding hidden patterns and correlations

Answer 43

Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________

Answer 44

Clustering

Answer 45

it can be beneficial to identify as many types of related data sources and insights as possible, especially when we don't know exactly what we're looking for

Answer 46

depending on the business scope of the analysis project and nature of the business problems being addressed, the required datasets and their sources can be internal and/or external to the enterprise

Answer 47

examples of appended metadata can include dataset size and structure, source information, date and time of creation or collection, language-specific information, etc.

Answer 48

is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data

Answer 49

a list of available datasets from sources, such as data marts and operational systems, is typically compiled and matched against a predefined dataset specification

Answer 50

A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers

Answer 51

Subsequent to __________ being made available to business users to support business decision-making (such as via dashboards), there may be further opportunities to utilize the __________

Answer 52

In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product

Answer 53

a list of possible third-party data providers (data markets and publicly available datasets) is generally compiled. Some forms of external data may be embedded within blogs or other types of content-based websites, in which case they may need to be harvested via automated tools

Answer 54

can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior

Answer 55

Reconciling these differences can require complex logic that is executed automatically without the need for human intervention

Answer 56

A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes

Answer 57

the data is gathered from all of the data sources that were identified during the previous stage and is then subjected to the automated filtering of corrupt data or data that has been deemed to have no value to the analysis objectives

Answer 58

depending on the type of data source, data may come as a dump of files (such as data purchased from a third-party data provider), or may require API integration (such as with Twitter)

Answer 59

A ________ engine enables data to be moved in or out of big data solution storage devices

Answer 60

A _________ comprises grouped read/writes, with a larger data footprint consisting of complex joins and high-latency responses

Answer 61

In many cases, especially where external, unstructured data is concerned, some or most of the acquired data may be irrelevant (noise) and can be discarded as part of the filtering process

Answer 62

data classified as "corrupt" can include records with missing or nonsensical values or invalid data types
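The description of corrupt records above can be sketched as a small filter. This is only an illustration: the `schema` mapping, the function names and the rule that a `None` value counts as missing are assumptions made for the example, not part of the course material.

```python
def is_corrupt(record, schema):
    """Return True if a record has a missing value or an invalid data type.

    `schema` is a hypothetical mapping of field name -> expected Python type.
    """
    for field, expected_type in schema.items():
        value = record.get(field)
        if value is None:  # missing value
            return True
        if not isinstance(value, expected_type):  # invalid data type
            return True
    return False


def split_records(records, schema):
    """Separate records into (valid, corrupt) lists without discarding either."""
    valid, corrupt = [], []
    for record in records:
        (corrupt if is_corrupt(record, schema) else valid).append(record)
    return valid, corrupt
```

Returning the corrupt records instead of dropping them keeps open the option of storing them, since data filtered out for one analysis may be valuable for another.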

Answer 63

it involves plotting entities as nodes and connections as edges between nodes

Answer 64

OLTP and operational systems (write-intensive) as well as operational BI and analytics (read-intensive), both fall within this category

Answer 65

data that is filtered out for one analysis may possibly be valuable for a different type of analysis

Answer 66

therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage

Answer 67

is an example of the application of the law of large numbers

Answer 68

Coordination Engine

Answer 69

both internal and external data needs to be persisted once it gets generated or enters the enterprise boundary

Answer 70

for batch analytics, this data is persisted to disk prior to analysis

Answer 71

extracting text for text analytics, which requires scans of whole documents, will not be necessary if the underlying Big Data solution can already read the document in its native format directly

Answer 72

is dedicated to determining how and where processed analysis data can be further leveraged

Answer 73

in the case of realtime analytics, the data is analyzed first and then persisted to disk

Answer 74

metadata can be added via automation to data from both internal and external data sources to improve the classification and querying

Answer 75

Also known as offline processing, ________ processing involves processing data in batches and usually imposes delays (resulting in high-latency responses)

Answer 76

is generally applied via the following two approaches: collaborative ____________ and content-based ____________

Answer 77

examples of appended metadata can include dataset size and structure, source information, date and time of creation or collection, language-specific information, etc.

Answer 78

it is vital that metadata be machine-readable and passed forward along subsequent analysis stages

Answer 79

The ability to analyze massive amounts of data and find useful insights carries little value if the only ones that can interpret the results are the analysts

Answer 80

both versions are subjected to an experiment simultaneously, and the observations are recorded to determine which version is more successful

Answer 81

this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality

Answer 82

metadata is added through an automated mechanism to data received from both internal and external data sources
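A minimal sketch of such an automated mechanism, assuming a dataset is a list of dictionaries. The function name `build_metadata` and the metadata keys are hypothetical, chosen only to mirror the kinds of metadata these cards mention (size, structure, source, collection time).

```python
from datetime import datetime, timezone


def build_metadata(records, source):
    """Build machine-readable metadata for a dataset (a list of dict records)."""
    # Structure: the sorted union of field names seen across all records.
    field_names = sorted({name for record in records for name in record})
    return {
        "size": len(records),
        "structure": field_names,
        "source": source,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
```

Because the result is a plain dictionary, it can be serialized and passed along to later analysis stages together with the data it describes.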

Answer 83

any required purchase of tools, hardware, training, etc. needs to be understood in advance, so that the anticipated investment can be weighed against the expected benefits of achieving the goals

Answer 84

helps decision-makers understand the business resources that will need to be utilized and which business challenges the analysis will tackle

Answer 85

Some of the data identified as input for the analysis may arrive in a format incompatible with the big data solution

Answer 86

the need to address disparate types of data is more likely with data from external sources

Answer 87

make it possible to develop highly reliable, highly available distributed big data solutions that can be deployed in a cluster

Answer 88

Data needs to be imported before it can be processed by the big data solution

Answer 89

is dedicated to extracting disparate data and transforming it into a format that the underlying big data solution can use for the purpose of the data analysis

Answer 90

the extent of extraction and transformation required depends on the types of analytics and capabilities of the big data solution

Answer 91

provenance can play an important role in determining the accuracy and quality of questionable data

Answer 92

is closely related to parallel data processing in how the same principle of "divide-and-conquer" is applied

Answer 93

extracting text for text analytics, which requires scans of whole documents, will not be necessary if the underlying Big Data solution can already read the document in its native format directly

Answer 94

further transformation is needed in order to separate the data into two separate fields as required by the big data solution

Answer 95

it can also be used to make predictions about the values of the dependent variable while it is still unknown
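As a rough illustration of predicting a dependent variable, the sketch below fits a straight line by ordinary least squares. The choice of simple linear regression with a single independent variable is an assumption for the example, and the names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("independent variable must not be constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the dependent variable for a new value of the independent variable."""
    return slope * x + intercept
```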

Answer 96

However, ___________ is always achieved through physically separate machines that are networked together as a cluster

Answer 97

Invalid data can skew and falsify analysis results

Answer 98

Unlike traditional enterprise data where the data structure is pre-defined and data is pre-validated, data input into big data analyses can be unstructured without any indication of validity

Answer 99

More than one independent variable can be tested at the same time

Answer 100

The _______ essentially acts as a resource arbitrator that manages and allocates available resources

Answer 101

its complexity can further make it difficult to arrive at a set of suitable validation constraints

Answer 102

is dedicated to establishing (often complex) validation rules and removing any known invalid data

Answer 103

is closely related to the concepts of classification and clustering, although its algorithms focus on finding abnormal values

Answer 104

for batch analytics, ______________ can be achieved via an offline ETL operation

Answer 105

Big data solutions often receive redundant data across different datasets

Answer 106

this redundancy can be exploited to explore interconnected datasets in order to assemble validation parameters and fill in missing valid data

Answer 107

A ___ represents a geographic measure by which different regions are color-coded according to a certain theme

Answer 108

A _________ is a file system that can store large files spread across a cluster

Answer 109

for batch analytics, ______________ can be achieved via an offline ETL operation

Answer 110

The presence of invalid data is resulting in spikes. Although the data appears abnormal, it may be indicative of a new pattern

Answer 111

A _______ is generally expressed using a line chart, with time plotted on the x-axis and recorded data value plotted on the y-axis

Answer 112

for realtime analytics, a more complex in-memory system is required to validate and cleanse the data at the source

Answer 113

provenance can play an important role in determining the accuracy and quality of questionable data

Answer 114

data that appears to be invalid may still be valuable in that it may possess hidden patterns and trends

Answer 115

No hypothesis or predetermined assumptions are generated

Answer 116

A _______ database is a non-relational database that is highly scalable, fault-tolerant and specifically designed to house unstructured data

Answer 117

Data may be spread across multiple datasets, requiring that datasets be joined together via common fields. In other cases, the same data fields may appear in multiple datasets

Answer 118

Either way, a method of data reconciliation is required or the dataset representing the correct value needs to be determined

Answer 119

Law of Diminishing Marginal Utility

Answer 120

A ______ can be in the form of a chart or a map

Answer 121

is dedicated to integrating multiple datasets together to arrive at a unified view

Answer 122

future data analysis requirements need to be considered during this stage to help foster data reusability

Answer 123

The ________ mechanism can also be used to support distributed locks, support distributed queues, establish a highly available registry for obtaining configuration information, and provide reliable asynchronous communication between processes that are running on different servers

Answer 124

essentially provides the ability to discover text rather than just search it

Answer 125

performing the stage of data aggregation & representation can become complicated because of differences in this

Answer 126

Reconciling these differences can require complex logic that is executed automatically without the need for human intervention

Answer 127

Within Big Data ________ can first be applied to discover if a relationship exists

Answer 128

both versions are subjected to an experiment simultaneously, and the observations are recorded to determine which version is more successful

Answer 129

Although the data format may be the same, the data model may be different

Answer 130

are an effective visual analysis technique for expressing patterns, data compositions via part-whole relations and geographic distributions of data

Answer 131

Data Acquisition & Filtering

Answer 132

Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Answer 133

A value that is labelled differently in two different datasets may mean the same thing

Answer 134

Instead of hard-coding the required learning rules, either supervised or unsupervised machine learning is applied to develop the computer's understanding of the __________

Answer 135

Network Analysis

Answer 136

In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product

Answer 137

The large volumes processed by Big Data solutions can make ____________ a time and effort-intensive operation

Answer 138

Whether _____________ is required or not, it is important to understand that the same data can be stored in many different forms. One form may be better suited for a particular type of analysis than another

Answer 139

require processing resources that they request from the resource manager

Answer 140

the data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions

Answer 141

can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database

Answer 142

A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes

Answer 143

The _____ essentially acts as a resource arbitrator that manages and allocates available resources

Answer 144

comprise random read/writes that involve fewer joins and require low-latency responses, with a smaller data footprint

Answer 145

is dedicated to carrying out the actual analysis task, which typically involves one or more types of analysis

Answer 146

this stage can be iterative in nature, especially if the _________________ is exploratory so that analysis is repeated until the appropriate pattern or correlation is uncovered

Answer 147

A _______ may internally use a processing engine to process multiple large datasets in parallel

Answer 148

the accuracy and applicability of the patterns and relationships that are found in a large dataset will be higher than that of a smaller dataset

Answer 149

the exploratory analysis approach is explained shortly, along with confirmatory analysis

Answer 150

depending on the type of analytics required, this stage can be as simple as querying a dataset to compute an aggregation for comparison

Answer 151

make it possible to develop highly reliable, highly available distributed big data solutions that can be deployed in a cluster

Answer 152

Correlation, regression, time series analysis, classification, clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis are considered forms of ________

Answer 153

it can be as challenging as combining data mining and complex statistical analysis techniques to discover patterns and anomalies, or to generate a statistical or mathematical model to depict the relationship between variables

Answer 154

The approach taken when carrying out this stage can be classified as confirmatory analysis or exploratory analysis (the latter is linked to data mining)

Answer 155

The results of completing the _______________ stage provide users with the ability to perform visual analysis, allowing for the discovery of answers to questions that users have not yet even formulated

Answer 156

A given ______ may support either data ingress or egress functions

Answer 157

_____________ is a deductive approach where the cause of the phenomenon being investigated is proposed beforehand

Answer 158

the data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions

Answer 159

can be used to determine the number of entities that fall within a certain radius of another entity
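A radius query of this kind can be sketched as follows. The example assumes entities are points on a flat plane with `(x, y)` coordinates and uses straight-line distance; real geospatial data would typically need a great-circle distance instead. The function name is hypothetical.

```python
import math


def count_within_radius(center, points, radius):
    """Count the points whose straight-line distance from `center` is <= radius."""
    cx, cy = center
    return sum(
        1 for (px, py) in points
        if math.hypot(px - cx, py - cy) <= radius
    )
```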

Answer 160

can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database

Answer 161

The proposed cause or assumption is called a ____________

Answer 162

is an item filtering technique based on the collaboration (merging) of users' past behavior
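One simple way to picture this technique is to find the user whose past behavior overlaps most with the target user's, then suggest what that user liked. The sketch below assumes behavior is recorded as sets of liked items and uses Jaccard similarity as the overlap measure; both choices and all names are assumptions for illustration, not the method prescribed by the course.

```python
def jaccard(a, b):
    """Similarity of two item sets: size of the overlap divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes):
    """Suggest items liked by the user most similar to `target`.

    `likes` is a hypothetical mapping of user name -> set of liked items.
    """
    target_items = likes[target]
    others = [user for user in likes if user != target]
    if not others:
        return []
    nearest = max(others, key=lambda user: jaccard(target_items, likes[user]))
    # Only suggest items the target user has not already liked.
    return sorted(likes[nearest] - target_items)
```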

Answer 163

This type of environment is provided by a platform that is comprised of a set of distributed storage and processing technologies

Answer 164

As the amount of digitized documents, e-mails, social media posts and log files increases, businesses have an increasing need to leverage any value that can be extracted from these forms of semi-structured and unstructured data

Answer 165

the data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions

Answer 166

Data samples are typically used

Answer 167

this information can then be integrated into the decision-making process

Answer 168

Unexpected findings or anomalies are usually ignored since a predetermined cause was assumed

Answer 169

_____________ is an inductive approach that is closely associated with data mining

Answer 170

No hypothesis or predetermined assumptions are generated

Answer 171

can be carried out with the support of correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics

Answer 172

is an item filtering technique based on the collaboration (merging) of users' past behavior

Answer 173

Instead, the data is explored through analysis to develop an understanding of the cause of the phenomenon

Answer 174

Although it may not provide definitive answers, this method provides a general direction that can facilitate the discovery of patterns or anomalies

Answer 175

represents a constant rate of change

Answer 176

Large amounts of data and visual analysis are typically used

Answer 177

The ability to analyze massive amounts of data and find useful insights carries little value if the only ones that can interpret the results are the analysts

Answer 178

is dedicated to using __________________ techniques and tools to graphically communicate the analysis results for effective interpretation by business users

Answer 179

is the process of finding data that is significantly different from or inconsistent with the rest of the data within a given dataset
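A minimal sketch of this idea, assuming a z-score rule: a value is flagged when it lies more than `threshold` standard deviations from the mean. The rule, the default threshold of 2.0 and the function name are assumptions for illustration; many other techniques exist.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]
```

Flagged values are returned rather than deleted, since data that looks abnormal may still indicate a new pattern.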

Answer 180

for batch analytics, ______________ can be achieved via an offline ETL operation

Answer 181

Business users need to be able to understand the results in order to obtain value from the analysis and subsequently have the ability to provide feedback

Answer 182

The results of completing the _______________ stage provide users with the ability to perform visual analysis, allowing for the discovery of answers to questions that users have not yet even formulated

Answer 183

Large amounts of data and visual analysis are typically used

Answer 184

is an item filtering technique focused on the similarity between users and items

Answer 185

The same results may be presented in a number of different ways, which can influence the interpretation of the results

Answer 186

Consequently, it is important to use the most suitable visualization technique by keeping the business domain in context

Answer 187

Another aspect to keep in mind is that providing a method of drilling down to comparatively simple statistics is crucial, in order for users to understand how the rolled-up or aggregated results were generated

Answer 188

The objective is to use graphic representations to develop a deeper understanding of the data being analyzed. Specifically, it helps identify and highlight hidden patterns, correlations and anomalies

Answer 189

Subsequent to __________ being made available to business users to support business decision-making (such as via dashboards), there may be further opportunities to utilize the __________

Answer 190

Natural Language Processing

Answer 191

A ___________ provides the ability to design and process a complex sequence of operations that can be triggered either at set time intervals or when data becomes available

Answer 192

includes both text and speech recognition

Answer 193

is dedicated to determining how and where processed analysis data can be further leveraged

Answer 194

Depending on the nature of the analysis problems being addressed, it is possible for the analysis results to produce "models" that encapsulate new insights and understandings about the nature of the patterns and relationships that exist within the data that was just analyzed

Answer 195

Data Transfer Engine

Answer 196

is generally applied via the following two approaches: collaborative ____________ and content-based ____________

Answer 197

A model may look like a mathematical equation or a set of rules

Answer 198

Models can be used to improve business process logic, application system logic and can form the basis of a new system or software program

Answer 199

A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers

Answer 200

new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems

Answer 201

An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient

Answer 202

the data analysis results may be automatically (or manually) fed directly into enterprise systems to enhance and optimize their behavior and performance

Answer 203

new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems

Answer 204

The identified patterns, correlations and anomalies discovered during the data analysis are used to refine business processes

Answer 205

models may also lead to opportunities to improve business process logic

Answer 206

is a computer's ability to comprehend human speech and text as naturally understood by humans

Answer 207

When two variables are considered to be _____________ they are considered to be aligned based on a linear relationship

Answer 208

Data analysis results can be used as input for existing _______ or may form the basis of new _______

Answer 209

this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality

Answer 210

Text Analytics

Answer 211

Recommender systems may also be based on a hybrid of both collaborative _______ and content-based _______ to fine-tune the accuracy and effectiveness of generated suggestions

Answer 212

Statistical Analysis

Answer 213

Visual Analysis

Answer 214

it can be as challenging as combining data mining and complex statistical analysis techniques to discover patterns and anomalies, or to generate a statistical or mathematical model to depict the relationship between variables

Answer 215

Note that distributed file systems and databases are both on-disk _________ mechanisms

Answer 216

Machine Learning

Answer 217

Semantic Analysis

Answer 218

A _______ database is a non-relational database that is highly scalable, fault-tolerant and specifically designed to house unstructured data

Answer 219

Each node in the _____ has its own dedicated resources such as memory and hard drive and runs its own operating system just like a desktop computer

Answer 220

A/B Testing

Answer 221

Correlation

Answer 222

Unstructured text is generally much more difficult to analyze and search, compared to structured text

Answer 223

Regression

Answer 224

Classification

Answer 225

Clustering

Answer 226

The use of ________ can reduce development time and enables the manipulation of large datasets without the need to write complex programming logic

Answer 227

it is vital that metadata be machine-readable and passed forward along subsequent analysis stages

Answer 228

Outlier Detection

Answer 229

Some proprietary ________ also provide specialized data analysis features, such as text analytics and machine log analysis processing

Answer 230

Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Answer 231

Time series analysis

Answer 232

examples of appended metadata can include dataset size and structure, source information, date and time of creation or collection, language-specific information, etc.

Answer 233

can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database

Answer 234

Network Analysis

Answer 235

Spatial Data Analysis

Answer 236

it can be based on either supervised or unsupervised learning

Answer 237

Invalid data can skew and falsify analysis results

Answer 238

suggest that there is no relationship at all between the two variables

Answer 239

Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning

Answer 240

Natural Language Processing

Answer 241

Text Analytics

Answer 242

Sentiment Analysis

Answer 243

suggest that there is no relationship at all between the two variables

Answer 244

Applications for ___________ include fraud detection, medical diagnosis, network data analysis and sensor data analysis

Answer 245

Unexpected findings or anomalies are usually ignored since a predetermined cause was assumed

Answer 246

uses statistical methods based on mathematical formulas as a means for analyzing data

Answer 247

this type of analysis is commonly used to describe datasets via summarization, such as providing the mean, median or mode of statistics associated with the dataset
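The summarization described above can be sketched with Python's standard `statistics` module. The wrapper function `summarize` and its dictionary keys are hypothetical names used only for this example.

```python
import statistics


def summarize(values):
    """Describe a dataset by its mean, median and mode."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }
```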

Answer 248

Spatial or geospatial data is commonly used to identify the geographic location of individual entities

Answer 249

The _____ essentially acts as a resource arbitrator that manages and allocates available resources

Answer 250

it can also be used to infer patterns and relationships within the dataset, such as regression and correlation

Answer 251

is generally applied via the following two approaches: collaborative ____________ and content-based ____________

Answer 252

is a supervised learning technique by which data is classified into relevant, previously learned categories

Answer 253

We may be further interested in discovering how closely Variables A and B are related, which means we may also want to analyze the extent to which Variable B increases in relation to Variable A's increase

Answer 254

also known as split or bucket testing, compares two versions of an element to determine which version is superior based on a pre-defined metric

Answer 255

the element can be a range of things

Answer 256

is expressed as a decimal number between -1 and +1, which is known as the _____________ coefficient

Answer 257

Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________

Answer 258

the current version of the element is called the control version, whereas the modified version is called the treatment

Answer 259

both versions are subjected to an experiment simultaneously, and the observations are recorded to determine which version is more successful
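
A rough sketch of comparing a control and a treatment version on one pre-defined metric. The visitor counts and the choice of conversion rate as the metric are assumptions for illustration; a real experiment would also check whether the difference is statistically significant.

```python
def conversion_rate(conversions, visitors):
    """Pre-defined metric (assumed here): share of visitors who converted."""
    return conversions / visitors

# Hypothetical observations recorded while both versions ran simultaneously.
control = {"visitors": 1000, "conversions": 50}
treatment = {"visitors": 1000, "conversions": 65}

control_rate = conversion_rate(control["conversions"], control["visitors"])
treatment_rate = conversion_rate(treatment["conversions"], treatment["visitors"])

# The version with the higher metric value is considered more successful.
winner = "treatment" if treatment_rate > control_rate else "control"
print(control_rate, treatment_rate, winner)
```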

Answer 260

However, ___________ is always achieved through physically separate machines that are networked together as a cluster

Answer 261

Instead of hard-coding the required learning rules, either supervised or unsupervised machine learning is applied to develop the computer's understanding of the __________

Answer 262

Although __________ can be implemented in almost any domain, it is most often used in marketing

Answer 263

Generally, the objective is to gauge human behavior with the goal of increasing sales

Answer 264

This is a traditional data analysis principle that claims that data held in a reasonably sized dataset provides the maximum value

Answer 265

Either way, a method of data reconciliation is required or the dataset representing the correct value needs to be determined

Answer 266

In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product

Answer 267

A ______ can be in the form of a chart or a map

Answer 268

for batch analytics, ______________ can be achieved via an offline ETL operation

Answer 269

Correlation, regression, time series analysis, classification, clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis are considered forms of ________

Answer 270

is an analysis technique used to determine whether two variables are related to each other

Answer 271

if they are found to be related, the next step is to determine what their relationship is

Answer 272

Query Engine

Answer 273

In general, the more learning data the computer has, the more correctly it can decipher human text and speech

Answer 274

We may be further interested in discovering how closely Variables A and B are related, which means we may also want to analyze the extent to which Variable B increases in relation to Variable A's increase

Answer 275

The use of ________ helps to develop an understanding of a dataset and find relationships that can assist in explaining a phenomenon

Answer 276

Network Analysis

Answer 277

Data may be spread across multiple datasets, requiring that datasets be joined together via common fields. In other cases, the same data fields may appear in multiple datasets

Answer 278

Is therefore commonly used for data mining where the identification of relationships between variables in a dataset leads to the discovery of patterns and anomalies

Answer 279

This can reveal the nature of the dataset or the cause of a phenomenon

Answer 280

Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud, such as on Amazon S3

Answer 281

Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety

Answer 282

When two variables are considered to be _____________ they are considered to be aligned based on a linear relationship

Answer 283

This means that when one variable changes, the other variable also changes proportionally and constantly

Answer 284

items can be ______ either based on a user's own behavior or by matching the behavior of multiple users

Answer 285

Note that a workflow engine may provide integration with a _______ to enable the automated import and export of data

Answer 286

is expressed as a decimal number between -1 and +1, which is known as the _____________ coefficient

Answer 287

The degree of relationship changes from being strong to weak when moving from -1 to 0 or +1 to 0
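
To make the -1 to +1 scale concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The three data series are hypothetical and were chosen so that one pair is perfectly positively related and the other perfectly negatively related.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data for illustration only.
temperature = [10, 15, 20, 25, 30]
ice_cream_sales = [20, 30, 40, 50, 60]   # rises linearly with temperature
heating_cost = [60, 50, 40, 30, 20]      # falls linearly with temperature

r_pos = pearson(temperature, ice_cream_sales)  # close to +1
r_neg = pearson(temperature, heating_cost)     # close to -1
print(r_pos, r_neg)
```

Values near 0 would instead indicate a weak or absent linear relationship.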

Answer 288

For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device

Answer 289

essentially provides the ability to discover text rather than just search it

Answer 290

suggest that there is a strong positive relationship between the two variables

Answer 291

When one variable increases, the other also increases and vice versa

Answer 292

typically involve large quantities of data with sequential reads/writes, and comprise groups of read or write queries

Answer 293

Data Extraction

Answer 294

suggest that there is no relationship at all between the two variables

Answer 295

when one increases, the other may stay the same, or increase or decrease arbitrarily

Answer 296

it can also be used to make predictions about the values of the dependent variable while it is still unknown

Answer 297

Generally, the objective is to gauge human behavior with the goal of increasing sales

Answer 298

suggest that there is a strong negative relationship between the two variables

Answer 299

when one variable increases, the other decreases and vice versa

Answer 300

can be carried out with the support of correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics

Answer 301

Therefore, each additional batch does not diminish in value; rather, it provides more value

Answer 302

The analysis technique of _________ explores how a dependent variable is related to an independent variable within a dataset

Answer 303

As a sample scenario, __________ could help determine the type of relationship that exists between temperature (independent variable) and crop yield (dependent variable)

Answer 304

In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample

Answer 305

Data Analysis

Answer 306

Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable

Answer 307

When the independent variable increases, for example, does the dependent variable also increase? If yes, is the increase in a linear or non-linear proportion?
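
As a sketch of how such a relationship could be quantified, the following fits a straight line by ordinary least squares and uses it to predict an unseen value. The temperature and crop-yield figures are hypothetical, and the linear form is an assumption; a non-linear relationship would need a different model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: independent variable, then dependent variable.
temperature = [10, 15, 20, 25, 30]
crop_yield = [2.0, 3.0, 4.0, 5.0, 6.0]

slope, intercept = fit_line(temperature, crop_yield)

# Predict the dependent variable for a temperature not in the sample.
predicted_yield = slope * 22 + intercept
print(slope, intercept, predicted_yield)
```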

Answer 308

Business Case Evaluation

Answer 309

A _________ comprises grouped read/writes, with a larger data footprint consisting of complex joins and high-latency responses

Answer 310

More than one independent variable can be tested at the same time

Answer 311

However, in such cases only one independent variable may change. The others are kept constant

Answer 312

The results of completing the _______________ stage provide users with the ability to perform visual analysis, allowing for the discovery of answers to questions that users have not yet even formulated

Answer 313

a list of possible third-party data providers (data markets and publicly available datasets) is generally compiled. Some forms of external data may be embedded within blogs or other types of content-based websites, in which case they may need to be harvested via automated tools

Answer 314

can help enable a better understanding of what a phenomenon is, and why it occurred

Answer 315

it can also be used to make predictions about the values of the dependent variable while it is still unknown

Answer 316

Users of Big Data solutions can make numerous data processing requests, each of which can have different processing workload requirements

Answer 317

The _____ essentially acts as a resource arbitrator that manages and allocates available resources

Answer 318

represents a constant rate of change

Answer 319

it can be as challenging as combining data mining and complex statistical analysis techniques to discover patterns and anomalies, or to generate a statistical or mathematical model to depict relationship between variables

Answer 320

the ______ states that the confidence with which predictions can be made increases as the size of the data that is being analyzed increases

Answer 321

Data samples are typically used

Answer 322

This type of environment is provided by a platform that is comprised of a set of distributed storage and processing technologies

Answer 323

Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Answer 324

A ________ is a method of storing and organizing data on a storage medium, such as hard drives, DVDs and flash drives

Answer 325

represents the variable rate of change

Answer 326

does not imply a causation. The change in the value of one variable may not be responsible for the change in the value of the second variable, although both may change at the same rate

Answer 327

assumes that both variables are independent

Answer 328

Within Big Data ________ can first be applied to discover if a relationship exists

Answer 329

However, ___________ is always achieved through physically separate machines that are networked together as a cluster

Answer 330

deal with already identified dependent and independent variables

Answer 331

implies that there is a degree of causation between the dependent and independent variables that may be direct or indirect

Answer 332

can then be applied to further explore the relationship and predict the values of the dependent variable, based on the known values of the independent variables

Answer 333

is an example of the application of the law of large numbers

Answer 334

is a form of data analysis that involves the graphic representation of data to enable or enhance its visual perception

Answer 335

based on the premise that humans can understand and draw conclusions from graphics more quickly than from text, _______ act as a discovery tool in the field of Big Data

Answer 336

As the amount of digitized documents, e-mails, social media posts and log files increases, businesses have an increasing need to leverage any value that can be extracted from these forms of semi-structured and unstructured data

Answer 337

Is therefore commonly used for data mining where the identification of relationships between variables in a dataset leads to the discovery of patterns and anomalies

Answer 338

The objective is to use graphic representations to develop a deeper understanding of the data being analyzed. Specifically, it helps identify and highlight hidden patterns, correlations and anomalies

Answer 339

is also directly related to exploratory data analysis, as it encourages the formulation of questions from different angles

Answer 340

Workflow Engine

Answer 341

require processing resources that they request from the resource manager

Answer 342

are an effective visual analysis technique for expressing patterns, data compositions via part-whole relations and geographic distributions of data

Answer 343

they also facilitate the identification of areas of interest and the discovery of extreme (high/low) values within a dataset

Answer 344

Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud, such as on Amazon S3

Answer 345

Data analysis results can be used as input for existing _______ or may form the basis of new _______

Answer 346

The ______ itself is a visual, color-coded representation of data values

Answer 347

Each value is given a color according to its type, or the range that it falls under
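
A minimal sketch of assigning colors by value range. The thresholds, color names and sample grid are assumptions made for illustration, not values from the source material.

```python
def color_for(value):
    """Map a value to a color according to the range it falls under."""
    if value < 10:
        return "blue"
    if value < 20:
        return "yellow"
    return "red"

# Hypothetical matrix of values, e.g. sales per region (rows) and quarter (columns).
sales = [
    [5, 12],
    [25, 8],
]

# Each cell of the matrix is replaced by its color code.
heat_map = [[color_for(value) for value in row] for row in sales]
print(heat_map)
```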

Answer 348

Solely analyzing operational (structured) data may cause businesses to miss out on cost-saving or business expansion opportunities, especially those that are customer-focused

Answer 349

Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning

Answer 350

A ______ can be in the form of a chart or a map

Answer 351

Instead of coloring the whole region, the map may be superimposed by a layer made up of collections of colored shapes representing various regions

Answer 352

suggestions commonly pertain to recommending items, such as movies, books, web pages, people, etc.

Answer 353

Sentiment Analysis

Answer 354

A _____ represents a matrix of values in which each cell is color-coded according to the value

Answer 355

in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety

Answer 356

items can be ______ either based on a user's own behavior or by matching the behavior of multiple users

Answer 357

Big data solutions often receive redundant data across different datasets

Answer 358

A ___ represents a geographic measure by which different regions are color-coded according to a certain theme

Answer 359

Although __________ can be implemented in almost any domain, it is most often used in marketing

Answer 360

NLP, text analytics and sentiment analysis can be used in support of __________

Answer 361

As a sample scenario, __________ could help determine the type of relationship that exists between temperature (independent variable) and crop yield (dependent variable)

Answer 362

Instead of coloring the whole region, the map may be superimposed by a layer made up of collections of colored shapes representing various regions

Answer 363

Data needs to be imported before it can be processed by the big data solution

Answer 364

The data collected for _______ is always time-dependent

Answer 365

Named Entities (person, group, place, company), Pattern-Based Entities (social insurance number, zip code), Concepts (an abstract representation of an entity), Facts (relationship between entities)

Answer 366

is the analysis of data that is recorded over periodic intervals of time

Answer 367

this type of analysis makes use of _________, which is a time-ordered collection of values recorded over regular time intervals

Answer 368

in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety

Answer 369

data that appears to be invalid may still be valuable in that it may possess hidden patterns and trends

Answer 370

helps to uncover patterns within data that are time-dependent. Once identified, the pattern can be extrapolated for future predictions.

Answer 371

are usually used for forecasting by identifying long-term trends, seasonal periodic patterns and irregular short-term variations in the dataset
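
One simple way to expose a long-term trend is to smooth out short-term variations with a moving average. The sketch below assumes a hypothetical series of monthly sales and a three-period window; it is one of many possible smoothing techniques.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values in time order."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical values recorded at regular (monthly) intervals.
monthly_sales = [10, 12, 14, 13, 15, 17, 16, 18]

# The smoothed series rises steadily, hiding the irregular month-to-month dips.
trend = moving_average(monthly_sales, 3)
print(trend)
```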

Answer 372

For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device

Answer 373

Big Data solutions can be partially or fully deployed in clouds in order to leverage the storage and computing resources that are available from the cloud provider

Answer 374

Unlike other types of analyses, _________ always includes time as a comparison variable

Answer 375

The data collected for _______ is always time-dependent

Answer 376

Data samples are typically used

Answer 377

is solely based on the similarity between users' behavior, and requires a large amount of user behavior data in order to accurately filter items

Answer 378

A _______ is generally expressed using a line chart, with time plotted on the x-axis and recorded data value plotted on the y-axis

Answer 379

is the specialized analysis of text through the application of data mining, machine learning and natural language processing techniques to extract value out of unstructured text

Answer 380

new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems

Answer 381

depending on the type of data source, data may come as a dump of files (such as data purchased from a third-party data provider), or may require API integration (such as with Twitter)

Answer 382

Within the context of visual analysis, a _______ is an interconnected collection of entities

Answer 383

new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems

Answer 384

this type of analysis makes use of _________, which is a time-ordered collection of values recorded over regular time intervals

Answer 385

Consequently, it is important to use the most suitable visualization technique by keeping the business domain in context

Answer 386

An ____ can be a person, a group or some other business domain object such as a product

Answer 387

may be connected with another directly or indirectly

Answer 388

is a form of data analysis that involves the graphic representation of data to enable or enhance its visual perception

Answer 389

Also known as online processing, ____________ processing follows an approach whereby data is processed interactively, without delay (resulting in low-latency responses)

Answer 390

Some connections may only be one-way, so that traversal in the reverse direction is not possible

Answer 391

is a technique that focuses on analyzing relationships between entities within the network

Answer 392

For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device

Answer 393

provides analysis features more sophisticated than those of heat maps

Answer 394

it involves plotting entities as nodes and connections as edges between nodes
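
A minimal sketch of entities as nodes and connections as one-way edges, with a breadth-first traversal to find everything connected to a node directly or indirectly. The names and edges are hypothetical.

```python
from collections import deque

# Hypothetical one-way connections: (source entity, target entity).
edges = [("Ann", "Bob"), ("Bob", "Cara"), ("Cara", "Dan")]

# Build an adjacency list; every entity becomes a node.
graph = {}
for source, target in edges:
    graph.setdefault(source, []).append(target)
    graph.setdefault(target, [])

def reachable(graph, start):
    """Return all nodes connected to `start` directly or indirectly."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}

print(reachable(graph, "Ann"))  # follows edges forward from Ann
print(reachable(graph, "Dan"))  # Dan has no outgoing edges
```

Because the edges are one-way, traversal from "Dan" back towards "Ann" is not possible.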

Answer 395

Specialized variations of __________ include route optimization, social network analysis and spread prediction

Answer 396

Unlike other types of analyses, _________ always includes time as a comparison variable

Answer 397

are based on predictive analytics techniques and therefore are associated with the same analysis techniques as predictive analytics. Additionally, _____ may utilize heat maps, network analysis and spatial data analysis to graphically show various outcomes

Answer 398

is focused on analyzing location-based data in order to find different geographic relationships and patterns between entities

Answer 399

Spatial or geospatial data is commonly used to identify the geographic location of individual entities

Answer 400

in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety

Answer 401

The ________ mechanism can also be used to support distributed locks, support distributed queues, establish a highly available registry for obtaining configuration information, and provide reliable asynchronous communication between processes that are running on different servers

Answer 402

is manipulated through a geographical information system (GIS) that plots spatial data on a map generally using its longitude and latitude coordinates

Answer 403

With the ever-increasing availability of location-based data, _________ can be analyzed to gain location insights

Answer 404

is dedicated to establishing (often complex) validation rules and removing any known invalid data

Answer 405

Correlation

Answer 406

Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning

Answer 407

Data used as input for _________ can either contain exact locations (longitude, latitude) or the information required to calculate locations (such as zip codes or IP addresses)

Answer 408

the accuracy and applicability of the patterns and relationships that are found in a large dataset will be higher than that of a smaller dataset

Answer 409

helps decision-makers understand the business resources that will need to be utilized and which business challenges the analysis will tackle

Answer 410

provides analysis features more sophisticated than those of heat maps

Answer 411

can be used to determine the number of entities that fall within a certain radius of another entity
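
A sketch of a radius query over latitude/longitude pairs, using the haversine great-circle distance. The coordinates, the 20 km radius and the mean Earth radius of 6371 km are assumptions for illustration; a GIS would normally provide this kind of query directly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def count_within_radius(center, points, radius_km):
    """Count the entities that fall within `radius_km` of `center`."""
    return sum(
        1
        for lat, lon in points
        if haversine_km(center[0], center[1], lat, lon) <= radius_km
    )

# Hypothetical entities given as (latitude, longitude).
store = (0.0, 0.0)
customers = [(0.0, 0.1), (0.0, 0.5), (1.0, 1.0)]

nearby = count_within_radius(store, customers, 20.0)
print(nearby)
```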

Answer 412

the need to address disparate types of data is more likely with data from external sources

Answer 413

is dedicated to establishing (often complex) validation rules and removing any known invalid data

Answer 414

Law of large numbers

Answer 415

Law of Diminishing Marginal Utility

Answer 416

A target user's past behavior (likes, rating, purchase history, etc.) is collaborated with the behavior of similar users

Answer 417

initial iteration of the big data analysis lifecycle will require more up-front investment in Big Data technologies, products and training compared to later iterations, where these earlier investments can be repeatedly leveraged

Answer 418

the ______ states that the confidence with which predictions can be made increases as the size of the data that is being analyzed increases

Answer 419

the accuracy and applicability of the patterns and relationships that are found in a large dataset will be higher than that of a smaller dataset

Answer 420

Data Extraction

Answer 421

is an analysis technique used to determine whether two variables are related to each other

Answer 422

this means that the greater the amount of data available for analysis, the better we become at making correct decisions

Answer 423

Within computing, a ______ is a tightly coupled collection of servers, or nodes. These servers usually have the same hardware specifications and are connected together via network to work as a single unit

Answer 424

Unlike traditional enterprise data where the data structure is pre-defined and data is pre-validated, data input into big data analyses can be unstructured without any indication of validity

Answer 425

Classification

Answer 426

In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample

Answer 427

This is a traditional data analysis principle that claims that data held in a reasonably sized dataset provides the maximum value

Answer 428

A _______ provides a logical view of the data stored on the storage medium as a tree structure of files and directories

Answer 429

this redundancy can be exploited to explore interconnected datasets in order to assemble validation parameters and fill in missing valid data

Answer 430

The ____ does not apply to Big Data

Answer 431

The greater volume and variety of data that Big Data solutions can process allows each additional batch of data to carry greater potential of unearthing new patterns and anomalies

Answer 432

for speech recognition, the system attempts to comprehend the speech and then performs an action, such as transcribing text

Answer 433

Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable

Answer 434

Therefore, each additional batch does not diminish in value; rather, it provides more value

Answer 435

Classification

Answer 436

This means that when one variable changes, the other variable also changes proportionally and constantly

Answer 437

is an example of the application of the law of large numbers

Answer 438

is a supervised learning technique by which data is classified into relevant, previously learned categories

Answer 439

the system is fed data that is already categorized or labeled, so that it can develop an understanding of the different categories

Answer 440

therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage

Answer 441

Therefore, each additional batch does not diminish in value; rather, it provides more value

Answer 442

the system is fed unknown (but similar) data for classification, based on the understanding it developed

Answer 443

a common application of this technique is for the filtering of e-mail spam. Note that ___________ can be performed for two or more categories

Answer 444

can help enable a better understanding of what a phenomenon is, and why it occurred

Answer 445

it involves plotting entities as nodes and connections as edges between nodes

Answer 446

in a simplified _____ process, the machine is fed labeled data during training that builds its understanding of the _______. The machine is then fed unlabeled data, which it classifies itself
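
The train-then-classify flow can be sketched with a nearest-centroid classifier, chosen here only because it is short; it is one of many supervised techniques. The feature vectors and the "ham"/"spam" labels are hypothetical.

```python
import math

def train_centroids(labeled_points):
    """Learn one centroid (average point) per label from labeled data."""
    grouped = {}
    for point, label in labeled_points:
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coord) / len(points) for coord in zip(*points))
        for label, points in grouped.items()
    }

def classify(point, centroids):
    """Assign an unlabeled point to the label with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Training phase: hypothetical labeled data (feature vector, category).
training_data = [
    ((1.0, 1.0), "ham"),
    ((1.5, 2.0), "ham"),
    ((8.0, 9.0), "spam"),
    ((9.0, 8.0), "spam"),
]
centroids = train_centroids(training_data)

# Classification phase: unknown but similar data.
prediction = classify((7.0, 9.0), centroids)
print(prediction)
```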

Answer 447

also known as split or bucket testing compares two versions of an element to determine which version is superior based on a pre-defined metric

Answer 448

A file is an atomic unit of storage used by the _________ to store data. Files are organized inside a directory

Answer 449

The objective is to use graphic representations to develop a deeper understanding of the data being analyzed. Specifically, it helps identify and highlight hidden patterns, correlations and anomalies

Answer 450

is an unsupervised learning technique by which data is divided into different groups so that the data in each group has similar properties

Answer 451

There is no prior learning of categories required; instead, categories are implicitly generated based on the data groupings
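
As an illustration of groups emerging from the data without labels, here is a small one-dimensional k-means sketch. The values, the number of groups and the starting centroids are assumptions; other algorithms group the data differently.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then re-centre and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)), key=lambda i: abs(value - centroids[i])
            )
            clusters[nearest].append(value)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical unlabeled data, e.g. customer purchase counts.
purchases = [1, 2, 3, 10, 11, 12]

clusters, centers = kmeans_1d(purchases, centroids=[1, 12])
print(clusters, centers)
```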

Answer 452

Big Data solutions can be partially or fully deployed in clouds in order to leverage the storage and computing resources that are available from the cloud provider

Answer 453

Applications include document classification and search, as well as building a 360-degree view of a customer by extracting information from a CRM system

Answer 454

How the data is grouped depends on the type of algorithm used. Each algorithm uses a different technique to identify ______

Answer 455

is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data

Answer 456

Is solely dedicated to individual user preferences and does not require data about other users

Answer 457

A ______ can be in the form of a chart or a map

Answer 458

can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior

Answer 459

Within computing, a ______ is a tightly coupled collection of servers, or nodes. These servers usually have the same hardware specifications and are connected together via network to work as a single unit

Answer 460

items can be ______ either based on a user's own behavior or by matching the behavior of multiple users

Answer 461

the ______ states that the confidence with which predictions can be made increases as the size of the data that is being analyzed increases

Answer 462

is the process of finding data that is significantly different from or inconsistent with the rest of the data within a given dataset
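
A minimal sketch of one common statistical approach: flag values whose distance from the mean exceeds a chosen number of standard deviations. The sensor readings and the threshold of 2.0 are assumptions for illustration.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    # If every value is identical, stdev is 0 and nothing is flagged.
    return [v for v in values if stdev and abs(v - mean) / stdev > threshold]

# Hypothetical sensor readings with one value unlike the rest.
sensor_readings = [10, 11, 9, 10, 12, 10, 11, 50]

outliers = find_outliers(sensor_readings)
print(outliers)
```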

Answer 463

The machine learning technique is used to identify anomalies, abnormalities and deviations that can be advantageous (such as opportunities) or disadvantageous (such as risks)

Answer 464

The data collected for _______ is always time-dependent

Answer 465

for realtime analytics, a more complex in-memory system is required to validate and cleanse the data at the source

Answer 466

is closely related to the concepts of classification and clustering, although its algorithms focus on finding abnormal values

Answer 467

it can be based on either supervised or unsupervised learning

Answer 468

involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task

Answer 469

it can also be used to infer patterns and relationships within the dataset, such as regression and correlation

Answer 470

Applications for ___________ include fraud detection, medical diagnosis, network data analysis and sensor data analysis

Answer 471

suggest that there is a strong positive relationship between the two variables

Answer 472

To the client, a file appears local and can be accessed via multiple locations

Answer 473

is the automated process of finding relevant items from a pool of items

Answer 474

items can be ______ either based on a user's own behavior or by matching the behavior of multiple users

Answer 475

The ____ does not apply to Big Data

Answer 476

Although __________ can be implemented in almost any domain, it is most often used in marketing

Answer 477

is generally applied via the following two approaches: collaborative ____________ and content-based ____________

Answer 478

A common medium by which ________ is implemented is via the use of a recommender system

Answer 479

A given ______ may support either data ingress or egress functions

Answer 480

A ________ generally provides only one of the listed functions

Answer 481

is an item filtering technique based on the collaboration (merging) of users' past behavior

Answer 482

A target user's past behavior (likes, rating, purchase history, etc.) is collaborated with the behavior of similar users

Answer 483

We may be further interested in discovering how closely Variables A and B are related, which means we may also want to analyze the extent to which Variable B increases in relation to Variable A's increase

Answer 484

Specialized variations of __________ include route optimization, social network analysis and spread prediction

Answer 485

Based on the similarity of the users' behavior, items are filtered for the target user
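
A rough sketch of this idea: find the user whose past likes overlap most with the target user's, then suggest what that user liked and the target has not seen. The user names, items and the use of Jaccard similarity are assumptions for illustration.

```python
def jaccard(a, b):
    """Similarity of two sets: size of the overlap divided by size of the union."""
    return len(a & b) / len(a | b)

def recommend(target, likes):
    """Filter items for `target` using the most similar other user."""
    others = {user: items for user, items in likes.items() if user != target}
    most_similar = max(others, key=lambda user: jaccard(likes[target], others[user]))
    return others[most_similar] - likes[target]

# Hypothetical past behavior: items each user has liked.
likes = {
    "target": {"A", "B", "C"},
    "user1": {"A", "B", "D"},
    "user2": {"E", "F"},
}

suggestions = recommend("target", likes)
print(suggestions)
```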

Answer 486

is solely based on the similarity between users' behavior, and requires a large amount of user behavior data in order to accurately filter items

Answer 487

A _____ represents a matrix of values in which each cell is color-coded according to the value

Answer 488

The presence of invalid data is resulting in spikes. Although the data appears abnormal, it may be indicative of a new pattern

Answer 489

is an example of the application of the law of large numbers

Answer 490

A ______ can be in the form of a chart or a map

Answer 491

the system is fed unknown (but similar) data for classification, based on the understanding it developed

Answer 492

_____________ is an inductive approach that is closely associated to data mining

Answer 493

is an item filtering technique focused on the similarity between users and items

Answer 494

A user profile is created based on the user's past behavior (likes, rating, purchase history, etc.)

Answer 495

Analytics Engine

Answer 496

can be used to determine the number of entities that fall within a certain radius of another entity

Answer 497

The similarities identified between the user profile and the attributes of various items lead to items being filtered for the user
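
A minimal sketch of matching a user profile against item attributes. The liked items, catalog titles and attribute tags are hypothetical, and scoring by summed attribute counts is just one simple choice.

```python
from collections import Counter

# Hypothetical attributes of items the user liked in the past.
liked_items = [{"sci-fi", "space"}, {"sci-fi", "robots"}]

# User profile: how often each attribute appears in the user's history.
profile = Counter(attr for item in liked_items for attr in item)

# Hypothetical catalog of candidate items and their attributes.
catalog = {
    "Movie X": {"sci-fi", "robots"},
    "Movie Y": {"romance", "drama"},
    "Movie Z": {"space", "documentary"},
}

# Score each item by how strongly its attributes match the profile.
scores = {
    title: sum(profile[attr] for attr in attrs) for title, attrs in catalog.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

No data about other users is needed; only the target user's own history is used.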

Answer 498

Is solely dedicated to individual user preferences and does not require data about other users
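
A small sketch of filtering driven only by one user's own preferences, assuming items are described by sets of attribute tags and that Jaccard similarity is used to compare the user profile with each item. The item names, tags and threshold are hypothetical.

```python
# Hypothetical item attributes and one user's liked items
items = {
    "book_1": {"sci-fi", "space"},
    "book_2": {"romance", "history"},
    "book_3": {"sci-fi", "robots"},
    "book_4": {"space", "robots", "sci-fi"},
}
liked = ["book_1"]

def build_profile(liked, items):
    """User profile: the union of attributes of everything the user liked."""
    profile = set()
    for name in liked:
        profile |= items[name]
    return profile

def jaccard(a, b):
    """Shared attributes divided by all attributes across both sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def filter_items(liked, items, threshold=0.3):
    """Items not yet liked, ranked by similarity to the profile."""
    profile = build_profile(liked, items)
    scored = [(jaccard(profile, attrs), name)
              for name, attrs in items.items() if name not in liked]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

filtered = filter_items(liked, items)
```

No other user's data appears anywhere in the sketch, which is the distinguishing property described above.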

Answer 499

Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable
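
A minimal sketch of this idea for a single independent variable, assuming an ordinary least-squares straight-line fit. The function name and data points are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

independent = [1, 2, 3, 4]   # illustrative values
dependent = [3, 5, 7, 9]
slope, intercept = fit_line(independent, dependent)
```

The slope estimates how much the dependent variable changes for each unit change in the independent variable.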

Answer 500

However, in such cases only one independent variable may change. The others are kept constant

Answer 501

A recommender system predicts user preferences and generates suggestions for the user accordingly

Answer 502

suggestions commonly pertain to recommending items, such as movies, books, web pages, people, etc.

Answer 503

represents a constant rate of change

Answer 504

can be carried out via the use of correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics

Answer 505

A recommender system typically uses either collaborative _____ or content-based _________ to generate suggestions

Answer 506

Recommender systems may also be based on a hybrid of both collaborative _______ and content-based _______ to fine-tune the accuracy and effectiveness of generated suggestions

Answer 507

Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________

Answer 508

Data analysis results can be used as input for existing _______ or may form the basis of new _______

Answer 509

In order for the machines to extract valuable information, text and speech data needs to be understood by the machines in the same way as humans do. _____ represents practices for extracting meaningful information from textual and speech data

Answer 510

require processing resources that they request from the resource manager

Answer 511

The same results may be presented in a number of different ways, which can influence the interpretation of the results

Answer 512

Instead, the data is explored through analysis to develop an understanding of the cause of the phenomenon

Answer 513

is a computer's ability to comprehend human speech and text as naturally understood by humans

Answer 514

this allows computers to perform a variety of useful tasks, such as full-text searches

Answer 515

A _______ database generally provides an API-based query interface, rather than a SQL interface

Answer 516

Each node in the _____ has its own dedicated resources such as memory and hard drive and runs its own operating system just like a desktop computer

Answer 517

Instead of hard-coding the required learning rules, either supervised or unsupervised machine learning is applied to develop the computer's understanding of the __________

Answer 518

In general, the more learning data the computer has, the more correctly it can decipher human text and speech

Answer 519

A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers

Answer 520

functionally can be further grouped into the following categories: event, file, relational

Answer 521

includes both text and speech recognition

Answer 522

for speech recognition, the system attempts to comprehend the speech and then performs an action, such as transcribing text

Answer 523

A user profile is created based on the user's past behavior (likes, ratings, purchase history, etc.)

Answer 524

The processing engine enables data to be queried and manipulated in other ways, but to implement this type of functionality requires custom programming

Answer 525

Unstructured text is generally much more difficult to analyze and search, compared to structured text

Answer 526

is the specialized analysis of text through the application of data mining, machine learning and natural language processing techniques to extract value out of unstructured text

Answer 527

As the amount of digitized documents, e-mails, social media posts and log files increases, businesses have an increasing need to leverage any value that can be extracted from these forms of semi-structured and unstructured data

Answer 528

Useful insights from text-based data can be gained by helping businesses develop an understanding of the information that is contained within a large body of text

Answer 529

essentially provides the ability to discover text rather than just search it

Answer 530

The basic tenet of ___________ is to turn unstructured text into data that can be searched and analyzed

Answer 531

Analysts working with big data solutions are not expected to know how to program processing engines

Answer 532

comprise random read/writes that involve fewer joins and require low-latency responses, with a smaller data footprint

Answer 533

Solely analyzing operational (structured) data may cause businesses to miss out on cost-saving or business expansion opportunities, especially those that are customer-focused

Answer 534

Applications include document classification and search, as well as building a 360-degree view of a customer by extracting information from a CRM system

Answer 535

However, in such cases only one independent variable may change. The others are kept constant

Answer 536

is a form of data analysis that involves the graphic representation of data to enable or enhance its visual perception

Answer 537

generally involves two steps: parsing text within documents to extract entities and facts, and categorizing documents using these extracted entities and facts

Answer 538

the extracted information can be used to perform a context-specific search on entities, based on the type of relationship that exists between the entities

Answer 539

identifying a wider variety of data sources may increase the probability of finding hidden patterns and correlations

Answer 540

Similarly, processed data may need to be exported to other systems before it can be used outside of the big data solution

Answer 541

Named Entities (person, group, place, company), Pattern-Based Entities (social insurance number, zip code), Concepts (an abstract representation of an entity), Facts (relationship between entities)

Answer 542

Data Extraction

Answer 543

is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data

Answer 544

The ________ mechanism is responsible for processing data (usually retrieved from storage devices) based on pre-defined logic, in order to produce a result

Answer 545

is a specialized form of text analysis that focuses on determining the bias or emotions of individuals

Answer 546

this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language

Answer 547

The proposed cause or assumption is called a ____________

Answer 548

In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product

Answer 549

not only provides information about how individuals feel, but also the intensity of their feeling

Answer 550

this information can then be integrated into the decision-making process

Answer 551

Machine Learning

Answer 552

Instead, the data is explored through analysis to develop an understanding of the cause of the phenomenon

Answer 553

Common applications for __________ include early identification of customer satisfaction or dissatisfaction, gauging product success or failure and spotting new trends

Answer 554

Utilization of Analysis Results

Answer 555

Generally, the objective is to gauge human behavior with the goal of increasing sales

Answer 556

are usually divided into two types: Batch and Transactional

Answer 557

Correlation and regression are examples of ________. A/B testing can make use of ____________ techniques for results comparison.

Answer 558

Unstructured text is generally much more difficult to analyze and search, compared to structured text

Answer 559

Storage Device

Answer 560

Clustering

Answer 561

NLP, text analytics and sentiment analysis can be used in support of __________

Answer 562

Machine Learning

Answer 563

To the client, a file appears local and can be accessed via multiple locations

Answer 564

For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device

Answer 565

can be carried out via the use of correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics

Answer 566

this stage can be iterative in nature, especially if the _________________ is exploratory so that analysis is repeated until the appropriate pattern or correlation is uncovered

Answer 567

An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient

Answer 568

metadata is added through an automated mechanism to data received from both internal and external data sources

Answer 569

A/B testing, heat maps and spatial data analysis are considered forms of ____________

Answer 570

There are specialized variations of __________ include route optimization, social network analysis and spread prediction

Answer 571

The output of one workflow can become the input of another workflow

Answer 572

Strategic BI and analytics fall in this category, since they are highly read-intensive tasks involving large volumes of data

Answer 573

Correlation, regression, time series analysis, network analysis and spatial data analysis are considered forms of _________

Answer 574

The workflow logic processed by a _____________ mechanism can involve the participation of other big data mechanisms

Answer 575

The ________ mechanism is responsible for processing data (usually retrieved from storage devices) based on pre-defined logic, in order to produce a result

Answer 576

in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety

Answer 577

Correlation, regression, time series analysis, classification, clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis are considered forms of ________

Answer 578

is an unsupervised learning technique by which data is divided into different groups so that the data in each group has similar properties
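
A minimal sketch of dividing data into groups with similar properties, assuming one-dimensional values and a k-means style procedure with caller-supplied starting centroids. The technique choice, function name and data are assumptions made for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Assign each value to its nearest centroid, then move each centroid to its group mean."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            groups[nearest].append(v)
        # Keep the old centroid when a group ends up empty
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups, centroids

# Hypothetical unlabeled data with two visibly separate groups
groups, centers = kmeans_1d([1, 2, 3, 20, 21, 22], centroids=[0, 10])
```

No categories are supplied in advance; the two groups emerge from the data alone.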

Answer 579

Named Entities (person, group, place, company), Pattern-Based Entities (social insurance number, zip code), Concepts (an abstract representation of an entity), Facts (relationship between entities)

Answer 580

provenance can play an important role in determining the accuracy and quality of questionable data

Answer 581

are based on predictive analytics techniques and therefore are associated with the same analysis techniques as predictive analytics. Additionally, _____ may utilize heat maps, network analysis and spatial data analysis to graphically show various outcomes

Answer 582

Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable

Answer 583

This can reveal the nature of the dataset or the cause of a phenomenon

Answer 584

Time series analysis

Answer 585

Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________

Answer 586

A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes

Answer 587

uses statistical methods based on mathematical formulas as a means for analyzing data

Answer 588

A user profile is created based on the user's past behavior (likes, ratings, purchase history, etc.)

Answer 589

Clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________

Answer 590

Data analysis results can be used as input for existing _______ or may form the basis of new _______

Answer 591

A _____ represents a matrix of values in which each cell is color-coded according to the value

Answer 592

Models can be used to improve business process logic, application system logic and can form the basis of a new system or software program

Answer 593

Within computing, a ______ is a tightly coupled collection of servers, or nodes. These servers usually have the same hardware specifications and are connected together via network to work as a single unit

Answer 594

Each node in the _____ has its own dedicated resources such as memory and hard drive and runs its own operating system just like a desktop computer

Answer 595

These engines may provide the agent-based processing of in-flight data, which enables various data cleansing and transformation activities to be performed in real time

Answer 596

Unexpected findings or anomalies are usually ignored since a predetermined cause was assumed

Answer 597

In the diagram, a _____ is used to execute a task based on distributed / parallel data processing frameworks

Answer 598

A/B Testing

Answer 599

Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety

Answer 600

Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3

Answer 601

A ________ is a method of storing and organizing data on a storage medium, such as hard drives, DVDs, and flash drives

Answer 602

A file is an atomic unit of storage used by the _________ to store data. Files are organized inside a directory

Answer 603

This can reveal the nature of the dataset or the cause of a phenomenon

Answer 604

The proposed cause or assumption is called a ____________

Answer 605

A _______ provides a logical view of the data stored on the storage medium as a tree structure of files and directories

Answer 606

Operating systems employ ______ for data storage. Each operating system provides support for one or more ________, like NTFS for Windows and ext for Linux

Answer 607

this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language

Answer 608

Within Big Data ________ can first be applied to discover if a relationship exists

Answer 609

A _________ is a file system that can store large files spread across a cluster

Answer 610

To the client, a file appears local and can be accessed via multiple locations

Answer 611

is the process of finding data that is significantly different from or inconsistent with the rest of the data within a given dataset
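
A minimal sketch of flagging values that differ significantly from the rest of a dataset, assuming a z-score rule with a threshold of 2 population standard deviations. Both the rule and the threshold are illustrative choices; other methods exist.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Values whose distance from the mean exceeds threshold standard deviations."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - center) / spread > threshold]

readings = [10, 11, 9, 10, 12, 10, 50]  # hypothetical data with one spike
flagged = find_outliers(readings)
```

A flagged value is not necessarily invalid; it may point to a new pattern worth investigating.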

Answer 612

Natural Language Processing

Answer 613

Examples include the Google File System and Hadoop ________

Answer 614

requires that a business case be created, assessed and approved prior to proceeding with the actual hands-on analysis task

Answer 615

Machine Learning

Answer 616

Data Aggregation & Representation

Answer 617

A _______ database is a non-relational database that is highly scalable, fault-tolerant and specifically designed to house unstructured data

Answer 618

A _______ database generally provides an API-based query interface, rather than a SQL interface

Answer 619

The use of ________ helps to develop an understanding of a dataset and find relationships that can assist in explaining a phenomenon

Answer 620

when one increases, the other may stay the same, or increase or decrease arbitrarily

Answer 621

However, some _______ databases may also provide a SQL-like query interface

Answer 622

this allows computers to perform a variety of useful tasks, such as full-text searches

Answer 623

depending on the type of analytics required, this stage can be as simple as querying a dataset to compute an aggregation for comparison

Answer 624

processing engine, storage device, resource manager

Answer 625

involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task

Answer 626

the premise is to reduce the execution time by dividing a single larger task into multiple smaller tasks
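
A structural sketch of dividing one larger task into smaller tasks that run concurrently on a single machine, assuming Python's standard `concurrent.futures` thread pool. The helper names and chunk size are illustrative. In standard CPython, threads generally do not speed up CPU-bound work, so this shows the shape of the approach rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(data, size):
    """Split data into consecutive slices of at most size elements."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, chunk_size=250, workers=4):
    """Sum each chunk as a separate sub-task, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(sum, chunked(data, chunk_size)))
    return sum(partials)

total = parallel_sum(list(range(1000)))
```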

Answer 627

A _________ in Big Data is defined as the amount and nature of data that is processed within a certain amount of time

Answer 628

For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device

Answer 629

Although __________ can be achieved through multiple networked machines, it is more typically achieved within the confines of a single machine (multiple processors or cores)

Answer 630

the system is fed data that is already categorized or labeled, so that it can develop an understanding of the different categories

Answer 631

Law of Diminishing Marginal Utility

Answer 632

provides analysis features more sophisticated than those of heat maps

Answer 633

is closely related to parallel data processing in how the same principle of "divide-and-conquer" is applied

Answer 634

However, ___________ is always achieved through physically separate machines that are networked together as a cluster

Answer 635

essentially provides the ability to discover text rather than just search it

Answer 636

This allows large amounts of data to be imported or exported within a short period of time

Answer 637

A _________ in Big Data is defined as the amount and nature of data that is processed within a certain amount of time

Answer 638

are usually divided into two types: Batch and Transactional

Answer 639

includes both text and speech recognition

Answer 640

A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers

Answer 641

Also known as offline processing, ________ processing involves processing data in batches and usually imposes delays (resulting in high-latency responses)

Answer 642

typically involve large quantities of data with sequential read/writes, and comprise a group of read or write queries

Answer 643

also known as split or bucket testing, compares two versions of an element to determine which version is superior based on a pre-defined metric
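
A minimal sketch of comparing two versions against a pre-defined metric, assuming the metric is conversion rate (conversions divided by visitors). The function name and counts are hypothetical, and the sketch deliberately omits any statistical significance check.

```python
def ab_winner(visitors_a, conversions_a, visitors_b, conversions_b):
    """Return the label of the version with the higher conversion rate, plus both rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    if rate_a == rate_b:
        return "tie", rate_a, rate_b
    return ("A" if rate_a > rate_b else "B"), rate_a, rate_b

# Hypothetical experiment: same traffic, different number of conversions
winner, rate_a, rate_b = ab_winner(1000, 50, 1000, 65)
```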

Answer 644

An ____ can be a person, a group or some other business domain object such as a product

Answer 645

Queries can be complex and involve multiple joins

Answer 646

Strategic BI and analytics fall in this category, since they are highly read-intensive tasks involving large volumes of data

Answer 647

Sentiment Analysis

Answer 648

NLP, text analytics and sentiment analysis can be used in support of __________

Answer 649

A _________ comprises grouped read/writes, with a larger data footprint consisting of complex joins and high-latency responses

Answer 650

Spatial Data Analysis

Answer 651

Correlation, regression, time series analysis, network analysis and spatial data analysis are considered forms of _________

Answer 652

Users of Big Data solutions can make numerous data processing requests, each of which can have different processing workload requirements

Answer 653

Also known as online processing, ____________ processing follows an approach whereby data is processed interactively, without delay (resulting in low-latency responses)

Answer 654

involves small amounts of data with random read/writes

Answer 655

Data Validation & Cleansing

Answer 656

A/B Testing

Answer 657

OLTP and operational systems (write-intensive) as well as operational BI and analytics (read-intensive), both fall within this category

Answer 658

Although these workloads contain a mix of read/write queries, they are generally more write-intensive than read-intensive

Answer 659

can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database

Answer 660

comprise random read/writes that involve fewer joins and require low-latency responses, with a smaller data footprint

Answer 661

is a specialized form of distributed computing that introduces utilization models for remotely provisioning scalable and measured IT resources

Answer 662

Big Data solutions can be partially or fully deployed in clouds in order to leverage the storage and computing resources that are available from the cloud provider

Answer 663

Data samples are typically used

Answer 664

It can also represent hierarchical values by using color-coded nested rectangles

Answer 665

the clustered processing resources required by Big Data solutions can benefit from the highly scalable and elastic IT resources available on cloud-based environments

Answer 666

Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Answer 667

is generally applied via the following two approaches: collaborative ____________ and content-based ____________

Answer 668

In short, _______ provides the three ingredients required for a big data solution: input data, computing and storage

Answer 669

IT already possesses the required cloud computing skills

Answer 670

the input data already exists in the cloud

Answer 671

Correlation, regression, time series analysis, network analysis and spatial data analysis are considered forms of _________

Answer 672

In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample

Answer 673

Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3

Answer 674

Workflow Engine

Answer 675

not only provides information about how individuals feel, but also the intensity of their feeling

Answer 676

This can reveal the nature of the dataset or the cause of a phenomenon

Answer 677

Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety

Answer 678

This type of environment is provided by a platform that is comprised of a set of distributed storage and processing technologies

Answer 679

it can be beneficial to identify as many types of related data sources and insights as possible, especially when we don't know exactly what we're looking for

Answer 680

Is solely dedicated to individual user preferences and does not require data about other users

Answer 681

represents the primary, common components of big data solutions, regardless of the open source or vendor products used for implementation

Answer 682

Storage Device

Answer 683

Instead of coloring the whole region, the map may be superimposed by a layer made up of collections of colored shapes representing various regions

Answer 684

Query Engine

Answer 685

Processing Engine

Answer 686

Resource Manager

Answer 687

Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning

Answer 688

is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data

Answer 689

Data Transfer Engine

Answer 690

Analytics Engine

Answer 691

is closely related to parallel data processing in how the same principle of "divide-and-conquer" is applied

Answer 692

The data collected for _______ is always time-dependent

Answer 693

Workflow Engine

Answer 694

Coordination Engine

Answer 695

Recommender systems may also be based on a hybrid of both collaborative _______ and content-based _______ to fine-tune the accuracy and effectiveness of generated suggestions

Answer 696

There is no prior learning of categories required; instead, categories are implicitly generated based on the data groupings

Answer 697

processing engine, storage device, resource manager

Answer 698

storage device, analytics engine, coordination engine

Answer 699

processing engine, query engine, data transfer engine

Answer 700

resource manager, analytics engine, workflow engine

Answer 701

___________ mechanisms provide the underlying data storage environment for persisting the datasets that are processed by big data solutions

Answer 702

A ________ is a method of storing and organizing data on a storage medium, such as hard drives, DVDs, and flash drives

Answer 703

A _______ can exist as a distributed file system or a database

Answer 704

The ability to analyze massive amounts of data and find useful insights carries little value if the only ones that can interpret the results are the analysts

Answer 705

Distributed file systems can be used for persisting immutable data that is intended for streaming access or batch processing

Answer 706

Databases, such as NoSQL repositories, can be used for structured and unstructured storage and read/write data access

Answer 707

Note that distributed file systems and databases are both on-disk _________ mechanisms

Answer 708

Natural Language Processing

Answer 709

The ________ mechanism is responsible for processing data (usually retrieved from storage devices) based on pre-defined logic, in order to produce a result

Answer 710

Any data processing that is requested by the big data solution is fulfilled by the __________

Answer 711

Whether _____________ is required or not, it is important to understand that the same data can be stored in many different forms. One form may be better suited for a particular type of analysis than another

Answer 712

Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Answer 713

A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes

Answer 714

require processing resources that they request from the resource manager

Answer 715

Classification

Answer 716

are usually used for forecasting by identifying long-term trends, seasonal periodic patterns and irregular short-term variations in the dataset
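
A minimal sketch of exposing a longer-term trend in time-ordered data, assuming a simple moving average is used to smooth out short-term variations. The function name, window size and values are illustrative.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive observations."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

# Hypothetical observations recorded at equal time intervals
observations = [10, 12, 14, 16, 18]
trend = moving_average(observations, window=3)
```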

Answer 717

Provides support for batch data where processing tasks can take anywhere from minutes to hours to complete. This type of processing engine is considered to have high latency

Answer 718

The identified patterns, correlations and anomalies discovered during the data analysis are used to refine business processes

Answer 719

When one variable increases, the other also increases and vice versa

Answer 720

Operating systems employ ______ for data storage. Each operating system provides support for one or more ________, like NTFS for Windows and ext for Linux

Answer 721

Provides support for real-time data with sub-second response times. This type of processing engine is considered to have low latency

Answer 722

To the client, a file appears local and can be accessed via multiple locations

Answer 723

Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3

Answer 724

depending on the type of data source, data may come as a dump of files (such as data purchased from a third-party data provider), or may require API integration (such as with Twitter)

Answer 725

Users of Big Data solutions can make numerous data processing requests, each of which can have different processing workload requirements

Answer 726

Data that is held in storage can be processed in a variety of ways by a given Big Data solution, and all data processing requests require the allocation of processing resources

Answer 727

it involves plotting entities as nodes and connections as edges between nodes
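
A minimal sketch of representing entities as nodes and connections as edges, assuming an undirected graph stored as an adjacency list and a breadth-first search to find a route with the fewest hops. The node labels and edges are hypothetical.

```python
from collections import deque

# Hypothetical connections between entities
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C"), ("C", "F")]

def build_graph(edges):
    """Adjacency list in which every edge can be traversed in both directions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search; returns a path with the fewest edges, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

route = shortest_path(build_graph(edges), "A", "F")
```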

Answer 728

the system is fed data that is already categorized or labeled, so that it can develop an understanding of the different categories

Answer 729

A _______ acts as a scheduler that schedules and prioritizes processing requests according to individual processing workload requirements
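
A toy sketch of scheduling and prioritizing processing requests, assuming each request carries a numeric priority (lower runs first) and a number of required slots. The class, its methods and the request names are hypothetical and do not mirror any particular product.

```python
import heapq
import itertools

class ResourceManager:
    def __init__(self, total_slots):
        self.free_slots = total_slots
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves submission order

    def submit(self, name, priority, slots):
        """Queue a processing request; a lower priority number is served first."""
        heapq.heappush(self._heap, (priority, next(self._order), name, slots))

    def schedule(self):
        """Start requests in priority order while slots remain; defer the rest."""
        started, deferred = [], []
        while self._heap:
            priority, order, name, slots = heapq.heappop(self._heap)
            if slots <= self.free_slots:
                self.free_slots -= slots
                started.append(name)
            else:
                deferred.append((priority, order, name, slots))
        for entry in deferred:
            heapq.heappush(self._heap, entry)
        return started

manager = ResourceManager(total_slots=4)
manager.submit("batch_report", priority=2, slots=3)
manager.submit("realtime_alert", priority=0, slots=2)
manager.submit("adhoc_query", priority=1, slots=2)
running = manager.schedule()
```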

Answer 730

The _____ essentially acts as a resource arbitrator that manages and allocates available resources

Answer 731

A value that is labelled differently in two different datasets may mean the same thing

Answer 732

The proposed cause or assumption is called a ____________

Answer 733

Data needs to be imported before it can be processed by the big data solution

Answer 734

Similarly, processed data may need to be exported to other systems before it can be used outside of the big data solution

Answer 735

this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language

Answer 736

Text Analytics

Answer 737

A ________ engine enables data to be moved in or out of big data solution storage devices

Answer 738

Unlike other data processing systems where input data conforms to a schema and is mostly structured, data sources for a big data solution tend to include a mix of structured and unstructured data

Answer 739

is dedicated to determining how and where processed analysis data can be further leveraged

Answer 740

A given ______ may support either data ingress or egress functions

Answer 741

functionally can be further grouped into the following categories: event, file, relational

Answer 742

the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic

Answer 743

can help enable a better understanding of what a phenomenon is, and why it occurred

Answer 744

therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage

Answer 745

A ________ generally provides only one of the listed functions

Answer 746

It is common for multiple different ________ to be part of a big data solution to facilitate a range of import and export requirements for different types of data

Answer 747

A _______ provides a logical view of the data stored on the storage medium as a tree structure of files and directories

Answer 748

are based on predictive analytics techniques and therefore are associated with the same analysis techniques as predictive analytics. Additionally, _____ may utilize heat maps, network analysis and spatial data analysis to graphically show various outcomes

Answer 749

Event-based __________ generally use a publish-subscribe model based on the use of a queue to ensure high reliability and availability
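
A toy, in-memory sketch of a publish-subscribe model built on queues, assuming one queue per subscriber per topic. The `Broker` class and its methods are hypothetical; real systems add persistence and replication to achieve the reliability and availability mentioned above.

```python
from collections import defaultdict, deque

class Broker:
    def __init__(self):
        # topic -> {subscriber name: queue of undelivered events}
        self.queues = defaultdict(dict)

    def subscribe(self, topic, subscriber):
        self.queues[topic][subscriber] = deque()

    def publish(self, topic, event):
        """Copy the event into the queue of every subscriber of the topic."""
        for queue in self.queues[topic].values():
            queue.append(event)

    def poll(self, topic, subscriber):
        """Return the oldest undelivered event for this subscriber, or None."""
        queue = self.queues[topic][subscriber]
        return queue.popleft() if queue else None

broker = Broker()
broker.subscribe("clicks", "importer_1")
broker.subscribe("clicks", "importer_2")
broker.publish("clicks", {"page": "/home"})
first = broker.poll("clicks", "importer_1")
second = broker.poll("clicks", "importer_2")
```

Because events wait in a queue, a subscriber that is temporarily slow can still receive them later.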

Answer 750

A file is an atomic unit of storage used by the _________ to store data. Files are organized inside a directory

Answer 751

it can be based on either supervised or unsupervised learning

Answer 752

The data collected for _______ is always time-dependent

Answer 753

These engines may provide the agent-based processing of in-flight data, which enables various data cleansing and transformation activities to be performed in real time

Answer 754

enable the substitution of data that is distributed across a range of sources residing in multiple systems outside of the big data solution

Answer 755

is manipulated through a geographical information system (GIS) that plots spatial data on a map generally using its longitude and latitude coordinates

Answer 756

A _______ may internally use a processing engine to process multiple large datasets in parallel

Answer 757

This allows large amounts of data to be imported or exported within a short period of time

Answer 758

Note that a workflow engine may provide integration with a _______ to enable the automated import and export of data

Answer 759

is a computer's ability to comprehend human speech and text as naturally understood by humans

Answer 760

The processing engine enables data to be queried and manipulated in other ways, but to implement this type of functionality requires custom programming

Answer 761

Analysts working with big data solutions are not expected to know how to program processing engines

Answer 762

this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language

Answer 763

the extracted information can be used to perform a context-specific search on entities, based on the type of relationship that exists between the entities

Answer 764

The _______ mechanism abstracts the processing engine from end-users by providing a front-end user-interface that can be used to query underlying data, along with features for creating query execution plans

Answer 765

Languages that are more familiar and easier to work with (such as SQL) can be used by non-technical users to perform ETL tasks and run ad hoc queries for data analysis activities

Answer 766

this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality

Answer 767

Either way, a method of data reconciliation is required or the dataset representing the correct value needs to be determined

Answer 768

Common processing functions performed by a ______ include sum, average, group by, join and sort
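The functions listed above can be seen together in one short SQL query. The sketch below uses Python's built-in `sqlite3` module purely as a small stand-in; the `customers` and `orders` tables and their rows are made-up sample data, and a big data solution would run an equivalent query against distributed storage.

```python
import sqlite3

# In-memory database with two tiny, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0), (3, 20.0);
""")

# join + group by + sum + average + sort in a single statement.
rows = conn.execute("""
SELECT c.region, SUM(o.amount) AS total, AVG(o.amount) AS average
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.region
ORDER BY total DESC
""").fetchall()
conn.close()

for region, total, average in rows:
    print(region, total, average)
```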

Answer 769

Under the hood, the ________ seamlessly transforms user queries into the relevant low-level code that can be used by the processing engine

Answer 770

The use of ________ can reduce development time and enable the manipulation of large datasets without the need to write complex programming logic

Answer 771

based on the business requirements documented, it can be determined whether the business problems being addressed are really Big Data problems

Answer 772

The ________ mechanism is able to process advanced statistical and machine learning algorithms in support of analytics processing requirements, including the identification of patterns and correlations

Answer 773

It generally uses the processing engine mechanism to run algorithms on large datasets.

Answer 774

A _______ database generally provides an API-based query interface, rather than the SQL Interface

Answer 775

A ________ generally provides only one of the listed functions

Answer 776

An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient

Answer 777

Some proprietary ________ also provide specialized data analysis features, such as text analytics and machine log analysis processing

Answer 778

How the data is grouped depends on the type of algorithm used. Each algorithm uses a different technique to identify ______

Answer 779

This is a traditional data analysis principle that claims that data held in a reasonably sized dataset provides the maximum value

Answer 780

A ___________ provides the ability to design and process a complex sequence of operations that can be triggered either at set time intervals or when data becomes available

Answer 781

The workflow logic processed by a _____________ mechanism can involve the participation of other big data mechanisms

Answer 782

Strategic BI and analytics fall in this category, since they are highly read-intensive tasks involving large volumes of data

Answer 783

Law of large numbers
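As a rough illustration of the law of large numbers, the sketch below simulates fair coin flips and tracks how far the observed share of heads is from the true value of 0.5 as the sample grows. The sample sizes, the seed and the `share_of_heads` helper are arbitrary choices made for this example.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def share_of_heads(n):
    """Flip a fair coin n times and return the fraction that came up heads."""
    heads = sum(1 for _ in range(n) if rng.random() < 0.5)
    return heads / n


sample_sizes = [10, 1_000, 100_000]
errors = {n: abs(share_of_heads(n) - 0.5) for n in sample_sizes}

for n, err in errors.items():
    print(f"n={n:>7}  distance from 0.5: {err:.4f}")
```

With larger samples the estimate tends to sit closer to 0.5, although any single small run can still land near it by chance.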

Answer 784

For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device
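The flow in the example above (collect, apply ETL, persist) can be sketched as three chained steps. Everything here is a hypothetical stand-in: plain dictionaries play the role of the relational sources and the NoSQL store, and `collect`, `transform`, `persist` and `run_workflow` are names invented for this illustration rather than any product's API.

```python
def collect(sources):
    """Stand-in for the data transfer engine: gather rows from every source."""
    return [row for rows in sources.values() for row in rows]


def transform(rows):
    """Stand-in for ETL on the processing engine: drop rows without a name, normalize the rest."""
    return [
        {"id": r["id"], "name": r["name"].strip().lower()}
        for r in rows
        if r.get("name")
    ]


def persist(rows, store):
    """Stand-in for a NoSQL storage device: key-value upsert keyed by id."""
    for r in rows:
        store[r["id"]] = r
    return store


def run_workflow(sources, store):
    # Each step's output is the next step's input, like boxes in a flowchart.
    return persist(transform(collect(sources)), store)


sources = {
    "crm": [{"id": 1, "name": " Ada "}, {"id": 2, "name": None}],
    "billing": [{"id": 3, "name": "GRACE"}],
}
store = run_workflow(sources, {})
```

A scheduler could call `run_workflow` at regular intervals, which corresponds to the time-based trigger mentioned on the related cards.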

Answer 785

The defined workflows are analogous to a flowchart with control logic (such as decisions, forks, joins) and generally rely on a batch-style processing engine for execution

Answer 786

The output of one workflow can become the input of another workflow

Answer 787

it can be based on either supervised or unsupervised learning

Answer 788

A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers

Answer 789

make it possible to develop highly reliable, highly available distributed big data solutions that can be deployed in a cluster

Answer 790

A model looks like a mathematical equation or a set of rules

Answer 791

data that appears to be invalid may still be valuable in that it may possess hidden patterns and trends

Answer 792

the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic

Answer 793

The ________ mechanism can also be used to support distributed locks, support distributed queues, establish a highly available registry for obtaining configuration information, and provide reliable asynchronous communication between processes that are running on different servers

Answer 794

in the case of realtime analytics, the data is analyzed first and then persisted to disk

Answer 795

Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety

	Created by Juan Taborda almost 8 years ago
