Big Data Architect: Computer Science quiz on Module 10 - Fundamental Big Data Architect, created by Juan Taborda on 31/03/2017.

Module 10 - Fundamental Big Data Architect

Question 1 of 200

1

Big Data Architect

Select one or more of the following:

  • The term ___________ refers to a technology environment comprised of Big Data mechanisms and technology artifacts that serves as a platform for developing Big Data solutions

  • The ___________ concentrate on the design of the underlying technology architecture of common Big Data platforms

  • Other memory-resident datasets can also be incorporated for performing analytics that provide context-aware results

  • In the context of software systems, a _________ represents the fundamental design of a software system

Explanation

Question 2 of 200

1

Design patterns

Select one or more of the following:

  • are primarily applied through the implementation of Big Data mechanisms

  • A _________ can be considered a proven design solution to a common design problem

  • Relational Source

  • File-Based Source

Explanation

Question 3 of 200

1

Design patterns

Select one or more of the following:

  • Streaming Source

  • Data Size Reduction

  • Alternatively, the member patterns that comprise a __________ can represent a set of related features provided by a particular environment. In this case, a coexistent application of patterns establishes a "solution environment" that may be realized by a combination of tools and technologies

  • The _________ compound pattern represents a fundamental solution environment comprised of a processing ______ with data ingress, storage, processing and egress capabilities

Explanation

Question 4 of 200

1

Design patterns

Select one or more of the following:

  • Dataset Decomposition

  • Streaming Storage

  • High Volume Binary Storage

  • High Volume Hierarchical Storage

Explanation

Question 5 of 200

1

Design patterns

Select one or more of the following:

  • Relational Sink

  • File-Based Sink

  • Processing Abstraction

  • Automatic Data Replication and Reconstruction

Explanation

Question 6 of 200

1

Design patterns

Select one or more of the following:

  • Automatic Data Sharding

  • Large-Scale Batch Processing

  • To provide efficient storage of such data, the ____________ pattern can be applied to stipulate the use of a storage device in the form of a key-value NoSQL database that services insert, select and delete operations

  • Processing Abstraction

Explanation
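
The key-value option above describes a storage device that services insert, select and delete operations. A minimal in-memory sketch of that interface might look like the following; the class name `KeyValueStore` and its method names are hypothetical, chosen only for illustration, and do not come from any particular NoSQL product.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value storage device."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def insert(self, key: str, value: bytes) -> None:
        # The value is opaque to the store; it is kept exactly as given.
        self._data[key] = value

    def select(self, key: str) -> Optional[bytes]:
        # Lookup is by key only; a missing key yields None.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.insert("image-001", b"\x89PNG...")
found = store.select("image-001")
removed = store.delete("image-001")
missing = store.select("image-001")
```

The store never inspects the value, which is why this style of storage suits opaque binary data; there is deliberately no update or query-by-content operation.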

Question 7 of 200

1

Design patterns

Select one or more of the following:

  • Complex Logic Decomposition

  • Automated Dataset Execution

  • High Volume Tabular Storage

  • High Volume Linked Storage

Explanation

Question 8 of 200

1

Design patterns

Select one or more of the following:

  • represent proven solutions to common problems

  • Big Data ________ are (partially or entirely) applied by implementing different combinations of Big Data mechanisms

  • security layer

  • Note that the analysis layer may also encompass some level of data visualization features

Explanation

Question 9 of 200

1

Compound Pattern

Select one or more of the following:

  • is a coarse-grained pattern comprised of a set of finer-grained patterns. Singled out in this catalog are some of the more common and important combinations of the patterns, each of which is classified as a __________

  • Each _________ is represented by a hierarchy comprised of core (required) member patterns and extension (optional) patterns

  • However, with the passage of time, as more Big Data solutions are built and their complexity increases, additional Big Data mechanisms are introduced

  • Random Access Storage

Explanation

Question 10 of 200

1

Compound Pattern

Select one or more of the following:

  • Core patterns are connected via solid lines, and extension patterns are connected via dashed lines

  • document the effects of applying multiple patterns together

  • A _________ can represent a set of patterns that are applied together in order to establish a specific set of design characteristics. This would be referred to as joint application

  • Alternatively, the member patterns that comprise a __________ can represent a set of related features provided by a particular environment. In this case, a coexistent application of patterns establishes a "solution environment" that may be realized by a combination of tools and technologies

Explanation
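
The options in the two questions above describe a compound pattern as a hierarchy of core (required) and extension (optional) member patterns. One way to picture that structure is the small data model below. It is only a sketch: the class `CompoundPattern`, its methods and the member names "Member A/B/C" are placeholders, not entries from the actual pattern catalog.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CompoundPattern:
    """Hypothetical model: a coarse-grained pattern made of member patterns."""
    name: str
    core: List[str] = field(default_factory=list)        # required members
    extensions: List[str] = field(default_factory=list)  # optional members

    def missing_core(self, applied: Set[str]) -> List[str]:
        # Core members that have not been applied yet.
        return [p for p in self.core if p not in applied]

    def is_satisfied(self, applied: Set[str]) -> bool:
        # Only core members are required; extensions never block this check.
        return not self.missing_core(applied)


# Placeholder member names, not an actual catalog entry.
example = CompoundPattern(
    name="Example Compound",
    core=["Member A", "Member B"],
    extensions=["Member C"],
)
partial = example.is_satisfied({"Member A"})
complete = example.is_satisfied({"Member A", "Member B"})
```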

Question 11 of 200

1

Compound Pattern

Select one or more of the following:

  • Nested ___________ also exist. The Random Access Storage and Streaming Access Storage _________ are part of the Poly Storage pattern

  • Big Data Pipeline

  • Intermediate Results Storage (optional)

  • Poly Sink

Explanation

Question 12 of 200

1

Compound Pattern

Select one or more of the following:

  • Poly Source

  • Poly Storage

  • The architecture of a set of Big Data Mechanisms assembled into a solution

  • Big Data Processing Environment

Explanation

Question 13 of 200

1

Compound Pattern

Select one of the following:

  • are comprised of specific combinations of core (required) and extension (optional) member patterns

  • It helps define policies for: acquiring data from internal and external sources, which fields need to be anonymized/removed/encrypted, what constitutes personally identifiable information, how processed data should be persisted, the publication of the analytics' results and how long the data should be stored

  • When an event data transfer engine is used, the ingested data can normally be filtered out in-flight via the removal of unwanted or corrupt data

  • The extent to which an enterprise can benefit from a Big Data Solution is limited when it is deployed in isolation from the rest of the traditional enterprise systems

Explanation

Question 14 of 200

1

Big Data technology artifacts

Select one of the following:

  • Each pattern is associated with one or more mechanisms that represent common _______

  • visualization layer

  • Correct access levels need to be consistently configured across all required resources, such as a storage device or a processing engine

  • It is important to assess the interoperability and extensibility of each ________ so that upgrading a single ____________ does not impact the functionality of other ______

Explanation

Question 15 of 200

1

Pattern-Mechanism Associations

Select one or more of the following:

  • A mechanism may either be implemented when the pattern is applied, or it may be directly affected by the application of the pattern

  • Not all of ________ may be required to apply the pattern. Sometimes one or more associated mechanisms may act as an alternative to others for how the pattern is applied

  • For regular exports, the file data transfer engine can be configured via a workflow engine to run at regular intervals

  • Correspondingly, an ____________ (apart from other specifications) generally includes the specifications of multiple component architectures

Explanation

Question 16 of 200

1

Pattern-Mechanism Associations

Select one or more of the following:

  • The application of a pattern may not be limited to the use of its associated mechanisms. Other required components or artifacts are explained as part of the pattern descriptions

  • Note also that mechanisms are not associated with compound patterns

  • However, when performing deep analytics, such as in the case of predictive and prescriptive analytics, an analytics engine also exists

  • Due to its enterprise-wide focus, the _____________ can also be considered as a reference point for understanding what constitutes the enterprise infrastructure

Explanation

Question 17 of 200

1

Mechanisms

Select one of the following:

  • represent technology artifacts that can be combined to form Big Data architectures

  • A ________ provides the ability to compress and decompress data in a Big Data platform

  • As a result, the corresponding architecture of the __________ also gets complicated

  • The ___________ provides an opportunity for developing an understanding of the analysis results in a graphical manner

Explanation

Question 18 of 200

1

Big Data Mechanisms

Select one or more of the following:

  • Serialization Engine

  • Compression Engine

  • Visualization Engine

  • Relational Sink

Explanation

Question 19 of 200

1

Big Data Mechanisms

Select one or more of the following:

  • Security Engine

  • Cluster Manager

  • Data Governance Manager

  • Productivity Portal

Explanation

Question 20 of 200

1

Serialization

Select one or more of the following:

  • is the process of transforming objects or data entities into bytes for persistence (in memory or on disk) or transportation from one machine to another over a network

  • In Big Data platforms, ________ is required for establishing communication between machines by exchanging messages between them, and for persisting data

  • To make data processing easier by not having to deal with the intricacies of processing engines, the ________ pattern can be applied, which uses a query engine to abstract away the underlying processing engine

  • The ________ bytes can either be encoded using a binary format or a plain-text format

Explanation
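
The options above describe serialization as turning a data entity into bytes, encoded either in a binary or a plain-text format, with deserialization as the reverse. A minimal sketch using only the Python standard library is shown below; the `record` and its field names are invented for illustration, and the fixed binary layout is an assumption of this example rather than any standard format.

```python
import json
import struct

record = {"sensor_id": 7, "reading": 21.5}

# Plain-text encoding: JSON text encoded as UTF-8 bytes.
text_bytes = json.dumps(record).encode("utf-8")

# Binary encoding: a fixed layout of one unsigned 32-bit int and one
# 64-bit float, in network (big-endian) byte order.
LAYOUT = "!Id"
binary_bytes = struct.pack(LAYOUT, record["sensor_id"], record["reading"])

# Deserialization reverses each transformation.
from_text = json.loads(text_bytes.decode("utf-8"))
sensor_id, reading = struct.unpack(LAYOUT, binary_bytes)
from_binary = {"sensor_id": sensor_id, "reading": reading}
```

The text form is self-describing and readable by many consumers, while the binary form is more compact but only makes sense to a reader that already knows the layout. That is the speed versus interoperability trade-off a serialization engine has to balance.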

Question 21 of 200

1

Deserialization

Select one of the following:

  • The opposite transformation process from bytes to objects or data entities is called _________

  • The _______ pattern is normally applied together with the Large-Scale Batch Processing pattern as part of a complete solution

  • The _________ pattern is primarily implemented by using an event data transfer engine that is built on a publish-subscribe model and further uses a queue to ensure availability and reliability

  • A documentation of the _________ further helps to ascertain which maturity level of the analytics the enterprise is currently at

Explanation

Question 22 of 200

1

Serialization Engine

Select one or more of the following:

  • The ________ provides the ability to serialize and deserialize data in a Big Data platform

  • Different ____________ may provide different levels of speed, extensibility and interoperability

  • For example, a data processing job needs to be partitioned first to act on sub-groups of data. After processing, the results from each partition need to be consolidated together

  • A file data transfer engine is generally used to implement this design pattern that can further be encapsulated via the productivity portal

Explanation

Question 23 of 200

1

Serialization Engine

Select one or more of the following:

  • Ideally, a __________ should serialize/deserialize data at a fast speed, be amenable to future changes and work with a variety of data producers and consumers

  • These goals are achieved in part by serializing and deserializing data into and out of non-proprietary formats, such as XML, JSON and BSON

  • Generally, these multiple processing runs are connected together using the provided functionality within the processing engine or through further application of the Automated Dataset Execution pattern

  • The ___________ pattern is associated with the data transfer engine (relational), storage device, workflow engine and productivity portal mechanisms

Explanation

Question 24 of 200

1

Compression

Select one of the following:

  • is the process of compacting data in order to reduce its size, whereas decompression is the process of uncompacting data in order to bring the data back to its original size

  • The _________ compound pattern represents a part of a Big Data solution environment capable of storing high-volume and high-variety data and make it available for indexed, ___________

  • The ______ consists of the storage device(s) that store the acquired data and generally consist of a distributed file system and at least one NoSQL database

  • A ____________ represents the design of how a single software program, representing a module, is structured within a modular/distributed software environment

Explanation
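
The first option above defines compression as compacting data to reduce its size and decompression as bringing it back to its original size. A short sketch with Python's standard `zlib` module illustrates the round trip; the sample log line is made up, and the ratio obtained depends entirely on how repetitive the input is.

```python
import zlib

# Repetitive data, such as log lines, tends to compress well.
original = b"2017-03-31 INFO request served\n" * 1000

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Fraction of the original size that the compressed form occupies.
ratio = len(compressed) / len(original)
```

Because `restored` is byte-for-byte identical to `original`, this is lossless compression: storage is saved at the cost of CPU time spent compressing and decompressing.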

Question 25 of 200

1

Compression Engine

Select one or more of the following:

  • A ________ provides the ability to compress and decompress data in a Big Data platform

  • In Big Data environments, there is a requirement to acquire and store as much data as possible in order to derive the largest potential value from analysis.

  • governance layer

  • Intermediate Results Storage (optional)

Explanation

Question 26 of 200

1

Compression Engine

Select one or more of the following:

  • However, if data is stored in an uncompressed form, the available storage space may not be efficiently utilized

  • As a result, data compression can be used to effectively increase the storage capacity of disk/memory space. In turn, this helps to reduce storage cost

  • Due to the aforementioned reasons, a Big Data solution is generally integrated with the rest of the enterprise IT systems in order to provide maximum value

  • By using a point and click interface to work with the Big Data platform, the _______ makes it easier and quicker to populate the Big Data platform with the required data, manage and process that data and export the processed results

Explanation

Question 27 of 200

1

Security Engine

Select one or more of the following:

  • A Big Data platform is a combination of multiple resources, which are provided by different mechanisms, each with its own unique security configuration. This requires developing and maintaining separate security policies and API-based integration

  • In a clustered environment, this can become cumbersome and difficult to maintain, especially where data is accessed across the enterprise with varying levels of authorization

  • is the process of transforming objects or data entities into bytes for persistence (in memory or on disk) or transportation from one machine to another over a network

  • However, a __________ goes beyond data curation and includes data processing, analysis and visualization technologies as well

Explanation

Question 28 of 200

1

Security Engine

Select one or more of the following:

  • Correct access levels need to be consistently configured across all required resources, such as a storage device or a processing engine

  • Instead of configuring access for each resource individually, a ___________ can be used

  • A ______ acts as a single point of contact for securing a Big Data platform, providing authentication, authorization and access auditing features

  • It acts as a perimeter guard for the cluster with centralized security policy declaration and management, enabling role-based security

Explanation
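
The options above describe a single point of contact that provides authorization and access auditing with centrally declared, role-based policies. The sketch below models just that idea. The class `SecurityEngine` and all method, user, role and resource names are hypothetical, and authentication is left out to keep the example short.

```python
from typing import Dict, List, Set, Tuple


class SecurityEngine:
    """Hypothetical single point of contact for access decisions."""

    def __init__(self) -> None:
        self._user_roles: Dict[str, Set[str]] = {}
        # (role, resource) -> allowed actions
        self._policies: Dict[Tuple[str, str], Set[str]] = {}
        self.audit_log: List[Tuple[str, str, str, bool]] = []

    def assign_role(self, user: str, role: str) -> None:
        self._user_roles.setdefault(user, set()).add(role)

    def allow(self, role: str, resource: str, action: str) -> None:
        self._policies.setdefault((role, resource), set()).add(action)

    def check(self, user: str, resource: str, action: str) -> bool:
        roles = self._user_roles.get(user, set())
        granted = any(
            action in self._policies.get((role, resource), set())
            for role in roles
        )
        # Every decision is recorded, whether granted or denied.
        self.audit_log.append((user, resource, action, granted))
        return granted


engine = SecurityEngine()
engine.assign_role("alice", "analyst")
engine.allow("analyst", "storage_device", "read")
can_read = engine.check("alice", "storage_device", "read")
can_write = engine.check("alice", "storage_device", "write")
```

Policies are attached to roles rather than to individual users or resources, so adding a user means assigning a role instead of reconfiguring each storage device or processing engine.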

Question 29 of 200

1

Security Engine

Select one or more of the following:

  • For enhanced data access security, a ________ may provide fine-grained control over how data is accessed from a range of storage devices and assist with addressing regulatory compliance concerns

  • A _______ may also integrate with enterprise identity and access management (IAM) systems to enable single sign-on (SSO)

  • Furthermore, _________ provide data confidentiality by enabling data encryption for at-rest (data stored in a storage device) and in-motion (data in transit over a network) data

  • It helps define policies for: acquiring data from internal and external sources, which fields need to be anonymized/removed/encrypted, what constitutes personally identifiable information, how processed data should be persisted, the publication of the analytics' results and how long the data should be stored

Explanation

Question 30 of 200

1

Cluster Manager

Select one or more of the following:

  • A Big Data platform generally exists as a cluster-based environment ranging from a few to a large number of nodes

  • Due to the multi-node nature of such an environment, the provisioning, configuration, day-to-day management and health monitoring of a cluster can be a daunting task

  • Cloud-based Big Data Storage (optional)

  • High Volume Binary Storage

Explanation

Question 31 of 200

1

Cluster Manager

Select one or more of the following:

  • A ________ provides centralized management of a cluster, enabling streamlined deployment of core services over the cluster and their subsequent monitoring

  • In the context of a Big Data platform, a service, such as MapReduce (a processing engine) or HDFS (a distributed file system), refers to a background process that executes a Big Data mechanism

  • Instead of individually installing, managing and monitoring services on each node, a ________ provides a dashboard from where these tasks can be centrally performed via simple mouse clicks instead of the authoring and running of command line scripts

  • A _______ provides a centralized view to monitor cluster health, services status and resource utilization. It also provides for the configuration of various node-level and cluster-level alerts

Explanation
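
The last option above mentions a centralized view of cluster health, service status and resource utilization, together with configurable node-level alerts. A rough sketch of such a check follows. The `NodeStatus` type, the `collect_alerts` function, the 90% disk threshold and the node data are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeStatus:
    disk_used_pct: float
    services_up: Dict[str, bool]  # service name -> running?


def collect_alerts(nodes: Dict[str, NodeStatus],
                   disk_threshold_pct: float = 90.0) -> List[str]:
    """Return node-level alerts for stopped services and full disks."""
    alerts: List[str] = []
    for name, status in sorted(nodes.items()):
        for service, running in sorted(status.services_up.items()):
            if not running:
                alerts.append(f"{name}: service {service} is not running")
        if status.disk_used_pct >= disk_threshold_pct:
            alerts.append(f"{name}: disk usage {status.disk_used_pct:.0f}%")
    return alerts


cluster = {
    "node-1": NodeStatus(42.0, {"hdfs": True, "mapreduce": True}),
    "node-2": NodeStatus(95.0, {"hdfs": False, "mapreduce": True}),
}
alerts = collect_alerts(cluster)
```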

Question 32 of 200

1

Cluster Manager

Select one or more of the following:

  • The ______ supports the deployment of new services and the addition of nodes to a cluster

  • The _______ further helps reduce cluster administration overhead and makes diagnosis more efficient

  • Note that the ____________ pattern also covers the acquisition of semi-structured data, such as XML or JSON-formatted data

  • Based on the type and location of the data sources, this layer may consist of more than one data transfer engine mechanism

Explanation

Question 33 of 200

1

Cluster Manager

Select one or more of the following:

  • For instance, the _______ makes it quicker and easier to find out why a particular service responsible for a specific storage device is not running

  • The net effect is streamlined _____________ that minimizes cluster downtime and enables reliable and timely Big Data analysis

  • Furthermore, the _________ can integrate with other infrastructure management tools to provide a unified view and the ability to perform performance tuning

  • This pattern is associated with the storage device (column-family) and serialization engine mechanisms

Explanation

Question 34 of 200

1

Data Governance Manager

Select one or more of the following:

  • controls the management of the data lifecycle to ensure that quality data is available in a controlled, secure and timely fashion

  • helps ensure regulatory compliance, risk management and the establishment of data lineage

  • The _________ provides an understanding of the range of datasets used by different Big Data solutions

  • One of the main differentiating characteristics of Big Data environments when compared with traditional data processing environments is the sheer amount of data that needs to be processed

Explanation

Question 35 of 200

1

Data Governance Manager

Select one or more of the following:

  • In a Big Data environment, the variety characteristic coupled with unknown access scenarios can make __________ a challenging task

  • A _________ is a tool with features for performing a range of common ____________ tasks in a centralized manner

  • A _________ can provide information on where the dataset resides, who the data owner/steward is, what the format of the data is, when the dataset was acquired, the source of the dataset, expiry date (if any), schema information via metadata search, a lineage viewer for establishing provenance

  • A _________ supports data lifecycle management through: the authoring of data retention and eviction policies, the establishment of security policies that specify the condition under which encryption is applied to a dataset or specific fields of a dataset, the creation of policies that establish disaster recovery management procedures

Explanation
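
The options above mention dataset metadata (owner, acquisition date, expiry date) and the authoring of data retention and eviction policies. The sketch below shows how such a policy could be evaluated against a small catalog. The `DatasetRecord` fields, the `retention_days` policy attribute and the sample datasets are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class DatasetRecord:
    name: str
    owner: str
    acquired_on: date
    retention_days: int  # hypothetical policy field

    def expires_on(self) -> date:
        return self.acquired_on + timedelta(days=self.retention_days)


def due_for_eviction(catalog: List[DatasetRecord], today: date) -> List[str]:
    """Names of datasets whose retention period has ended."""
    return [d.name for d in catalog if d.expires_on() <= today]


catalog = [
    DatasetRecord("clickstream", "web-team", date(2017, 1, 1), 30),
    DatasetRecord("invoices", "finance", date(2017, 1, 1), 3650),
]
evict = due_for_eviction(catalog, today=date(2017, 3, 31))
```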

Question 36 of 200

1

Data Governance Manager

Select one or more of the following:

  • Furthermore, a ___________ can provide information on the level of trust and sensitivity of data

  • This information includes whether or not the data can be stored in a cloud environment, as well as any geographical limitations for data persistence

  • To ensure enhanced data confidentiality and privacy within a cluster, an advanced _________ may further enable fine-grained control over data storage by specifying which nodes can store which types of datasets

  • Automatic Data Replication and Reconstruction (core)

Explanation

Question 37 of 200

1

Visualization Engine

Select one or more of the following:

  • To make sense of large amounts of data and to perform exploratory data analysis in support of finding meaningful insights, it is important to correctly interpret the results obtained from data analysis

  • This interpretation is dependent upon the Big Data platform's ability to present data in visual form

  • In the context of a Big Data platform, a service, such as MapReduce (a processing engine) or HDFS (a distributed file system), refers to a background process that executes a Big Data mechanism

  • A Big Data platform is a combination of multiple resources, which are provided by different mechanisms, each with its own unique security configuration. This requires developing and maintaining separate security policies and API-based integration

Explanation

Question 38 of 200

1

Visualization Engine

Select one or more of the following:

  • A ________ graphically plots large amounts of data using traditional visualization techniques, including the bar chart, line graph and pie chart, alongside contemporary, Big Data-friendly visualization techniques, such as heat maps, word clouds, maps and spark charts

  • Additionally, a __________ may allow the creation of dashboards with filtering, aggregation, drill-down and what-if analysis features, along with the exportation of data for specific views

  • A ________ greatly enhances the productivity of data scientists and business analysts

  • A ________ provides a foundation for creating self-service visualizations for business intelligence (BI) and analytics

Explanation

Question 39 of 200

1

Productivity Portal

Select one or more of the following:

  • A Big Data platform provides a range of features, including data import, storage, processing and analysis, as well as workflow creation, via various mechanisms

  • Interacting with each of these mechanisms using their default interfaces can be difficult and time-consuming due to the mechanisms' non-uniform nature. Further tools may need to be installed in order to make this interaction easier and to make sense of the processing results

  • The term ___________ refers to a technology environment comprised of Big Data mechanisms and technology artifacts that serves as a platform for developing Big Data solutions

  • Multiple __________ are generally deployed in an enterprise for fulfilling different business requirements

Explanation

Question 40 of 200

1

Productivity Portal

Select one or more of the following:

  • As a result, it takes longer to get from data import to data visualization, which further impacts the productivity and the overall value attributed to Big Data exploration and insight discovery (the value characteristic)

  • A ________ provides a centralized graphical user interface (GUI) for performing key activities that are a part of working with Big Data, including importing and exporting data, manipulating data storage, running data processing jobs, creating and running workflows, querying data, viewing schemas and performing searches

  • By understanding the makeup of each _______ the corresponding data ingestion requirements are better understood

  • The ________ pattern can be applied to automatically export data from the Big Data platform as a delimited or a hierarchical file

Explanation

Question 41 of 200

1

Productivity Portal

Select one or more of the following:

  • A ________ establishes a unified interface for configuring and managing the underlying mechanisms of the Big Data solution environment, such as establishing settings for the security engine

  • Additionally, a __________ may encapsulate a visualization engine in order to provide more meaningful, graphical views of data

  • By using a point and click interface to work with the Big Data platform, the _______ makes it easier and quicker to populate the Big Data platform with the required data, manage and process that data and export the processed results

  • Big Data ________ are (partially or entirely) applied by implementing different combinations of Big Data mechanisms

Explanation

Question 42 of 200

1

Shared-Everything Architecture

Select one or more of the following:

  • is a machine-level architecture where multiple processors (CPUs) share memory and disk storage

  • can be implemented in two different ways: symmetric multiprocessing and distributed shared memory

  • This is due to the fact that the same _________ can be physically implemented in a number of ways. The physical implementation is dependent upon the underlying technology that is used

  • A ________ represents the high-level components of a system, their functionality and how they are connected with one another

Explanation

Question 43 of 200

1

Shared-Everything Architecture

Select one or more of the following:

  • It should be noted that SMP and DSM apply to a single machine

  • is suitable for transactional workloads where data being processed is small and can be stored on a single machine

  • As all the resources (processor, memory and disk) exist within the boundaries of a single machine, data exchange only occurs within those boundaries

  • Therefore, transactional data can be processed quickly without any latency using a simple programming framework

Explanation

Question 44 of 200

1

Shared-Everything Architecture

Select one or more of the following:

  • Since all of the resources are tightly coupled together in a ___________, scalability becomes an issue

  • A storage area network (SAN) or a network-attached storage (NAS) solution can possibly be attached to a high-end multiprocessor machine to process large amounts of data

  • However, the network then becomes a bottleneck, with the data transfer taking longer than the actual data processing because the data needs to be transferred across the network

  • A ________ can be augmented with a SAN, which greatly increases storage capacity. However, the data now needs to be transferred across the network for processing, which adds to the data processing latency

Explanation

Question 45 of 200

1

Shared-Everything Architecture

Select one or more of the following:

  • With a ___________, in order to cope with greater resource demands for CPU and/or disk space, the only option is to scale up by replacing existing machines with higher-end (expensive) machines. Scaling up allows more processing and offers greater storage

  • However, any type of architecture that relies upon vertical scaling has an upper limit due to technology constraints such as maximum number of processors or memory limitations

  • Once the limit is reached, the only option is to scale out. Scaling out is a Big Data processing requirement that is not supported by _________

  • In the case that stored value conforms to a structure, such as a log file, the field names, along with field type information, are also recorded

Explanation

Question 46 of 200

1

Symmetric Multiprocessing (SMP)

Select one of the following:

  • memory is pooled and shared between all processors. ______ is also known as uniform memory access (UMA)

  • In most cases, the ingested data is first stored on the distributed file system in a compressed form (apart from removal of unwanted and corrupt data)

  • the acquisition of new hardware resources

  • Although expensive, _________ databases provide atomicity, consistency, isolation and durability (ACID) compliance while supporting the querying of data using Structured Query Language (SQL)

Explanation

Question 47 of 200

1

Distributed Shared Memory (DSM)

Select one of the following:

  • multiple pools of memory exist. Thus memory is not shared between processors. ____ is also known as non-uniform memory access (NUMA)

  • As a result, data compression can be used to effectively increase the storage capacity of disk/memory space. In turn, this helps to reduce storage cost

  • Each pattern is associated with one or more mechanisms that represent common _______

  • For ________, apart from documenting the attributes and types of each entity, the possible connections (the edges) between entities are also recorded

Explanation

Question 48 of 200

1

Shared-Nothing Architecture

Select one or more of the following:

  • A ________ is a type of distributed architecture that consists of fully independent machines. The machines each have their own processors, memory, disk and operating system and are networked together as a single system

  • The ________ is self-sufficient and is free of any shared resources. For this reason, it is a highly scalable architecture that provides scale-out support, meaning extra machines can be added as required.

  • To ensure enhanced data confidentiality and privacy within a cluster, an advanced _________ may further enable fine-grained control over data storage by specifying which nodes can store which types of datasets

  • A ____________ is somewhat similar to a component architecture. However, in practice a single mechanism may itself be comprised of more than one component

Explanation

Question 49 of 200

1

Shared-Nothing Architecture

Select one or more of the following:

  • Although highly scalable, this architecture approach requires the use of complex distributed programming frameworks

  • For example, a data processing job needs to be partitioned first to act on sub-groups of data. After processing, the results from each partition need to be consolidated together

  • This design pattern is generally applied together with the Stream Access Storage pattern

  • The ______ consists of the storage device(s) that store the acquired data and generally consist of a distributed file system and at least one NoSQL database

Explanation
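
The second option above notes that a data processing job must first be partitioned to act on sub-groups of data, after which the per-partition results are consolidated. That flow can be sketched in a single process as follows; the function names are illustrative, and in a real shared-nothing cluster each `process` call would run on a separate machine.

```python
from typing import List


def partition(data: List[int], parts: int) -> List[List[int]]:
    # Round-robin split so each partition gets a similar share.
    return [data[i::parts] for i in range(parts)]


def process(chunk: List[int]) -> int:
    # Stand-in for work done independently on one machine.
    return sum(chunk)


def consolidate(partials: List[int]) -> int:
    # Combine the per-partition results into the final answer.
    return sum(partials)


data = list(range(1, 101))
chunks = partition(data, parts=4)
partials = [process(c) for c in chunks]
total = consolidate(partials)
```

Summation is used here because it can be computed per partition and then combined; operations without that property (a median, for instance) need a more involved consolidation step, which is part of why distributed programming frameworks are complex.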

Question 50 of 200

1

Shared-Nothing Architecture

Select one or more of the following:

  • Usual data processing techniques employed within a ___________ include data sharding and replication where large datasets are divided and replicated across multiple machines

  • This works well for Big Data where a single dataset may be divided across several machines due to its volume

  • In this way, with Big Data processing, data and processing resources can be co-located, thereby reducing data transfer frequency and volume

  • The __________ achieves this functionality via the data _________ manager

Explanation
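
The first option above refers to data sharding and replication, where a large dataset is divided and copied across multiple machines. One simple placement scheme is sketched below as an illustration. Hash-based sharding with replicas on consecutive nodes is an assumption of this example, not something the quiz prescribes, and the node names and key are made up.

```python
import hashlib
from typing import Dict, List


def shard_for(key: str, shard_count: int) -> int:
    # A stable hash keeps the same key on the same shard across runs.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def placement(shard: int, nodes: List[str], replicas: int) -> List[str]:
    # Each shard is stored on `replicas` consecutive nodes (wrapping around).
    return [nodes[(shard + i) % len(nodes)] for i in range(replicas)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
layout: Dict[int, List[str]] = {
    shard: placement(shard, nodes, replicas=2) for shard in range(len(nodes))
}
home = layout[shard_for("customer-42", shard_count=len(nodes))]
```

With two replicas per shard, any single node can fail without losing data, and a processing task for a given key can be scheduled on either node in `home` so that data and processing stay co-located.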

Question 51 of 200

1

Massively Parallel Processing (MPP)

Select one or more of the following:

  • is an architecture that can be applied to distributed query processing in a shared-nothing architecture

  • is mainly employed by high-end databases and database appliances like IBM Netezza and Teradata

  • An __________ describes the design used to integrate two or more applications, and further encompasses the technology architectures of the integrated applications.

  • the architecture of a single Big Data mechanism

Explanation

Question 52 of 200

1

Massively Parallel Processing (MPP)

Select one or more of the following:

  • Databases based on ______ architecture generally use high-end hardware and a proprietary interconnect to link machines in order to enable the throughput required for high-speed analytics

  • Although expensive, _________ databases provide atomicity, consistency, isolation and durability (ACID) compliance while supporting the querying of data using Structured Query Language (SQL)

  • By understanding the makeup of each _______ the corresponding data ingestion requirements are better understood

  • Alternatively, the member patterns that comprise a __________ can represent a set of related features provided by a particular environment. In this case, a coexistent application of patterns establishes a "solution environment" that may be realized by a combination of tools and technologies

Explanation

Question 53 of 200

1

Massively parallel processing

Select one or more of the following:

  • databases generally require data to exist in a structured format at the time of loading the data into the database. In other words, a schema needs to exist

  • This prior knowledge about the structure of the data makes ___________ databases very fast at querying large datasets

  • The structured format requirement introduces the need for an extra ETL step to be performed before unstructured data can be loaded into the __________ database

  • It is advisable to develop an inventory of all _________ to avoid duplication of datasets. This also helps with identifying relevant datasets when performing exploratory analysis

Explanation
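
The schema-at-load-time requirement can be sketched as follows. The schema, the raw line format and the helper names (`etl_parse`, `load`) are hypothetical; the point is only that a row is rejected unless it already matches the declared structure, so raw input needs an ETL step first.

```python
# Hypothetical table schema: column name -> required Python type.
SCHEMA = {"order_id": int, "amount": float, "country": str}

def etl_parse(raw_line):
    """ETL step: turn a raw 'key=value;key=value' line into typed fields."""
    fields = dict(part.split("=", 1) for part in raw_line.strip().split(";"))
    return {name: cast(fields[name]) for name, cast in SCHEMA.items()}

def load(table, row):
    """Append a row only if it matches the schema exactly."""
    if set(row) != set(SCHEMA):
        raise ValueError("row does not match schema columns")
    for name, expected in SCHEMA.items():
        if not isinstance(row[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")
    table.append(row)

table = []
load(table, etl_parse("order_id=7;amount=19.5;country=CO"))
```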

Question 54 of 200

1

MapReduce

Select one or more of the following:

  • Similar to MPP, __________ (a batch processing engine) is a distributed data processing framework that requires a shared-nothing architecture

  • makes use of commodity hardware where machines are generally networked using Local Area Network (LAN) technology

  • the _______ further helps reduce cluster administration overhead and makes diagnosis more efficient

  • A productivity portal normally encapsulates a relational data transfer engine for point-and-click import

Explanation

Question 55 of 200

1

MapReduce

Select one or more of the following:

  • ___________-based processing platforms, such as Hadoop, do not require knowledge of data structure at load time.

  • Therefore ______ is ideal for processing semi-structured and unstructured data in support of executing analytical queries

  • However, without any knowledge of the data structure, data processing is slower with ________ as compared to MPP due to the inability to optimize query execution

  • Both MPP databases and __________ make use of the shared-nothing architecture and are based on the divide-and-conquer principle

Explanation
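
A single-process sketch of the divide-and-conquer idea behind MapReduce is shown below, using the classic word count. It only imitates the map, shuffle and reduce steps in memory; a real engine such as Hadoop distributes these steps across machines. Note that the input lines carry no schema: structure is imposed only when the map function reads them.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of raw text."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

counts = word_count(["big data big platform", "data pipeline"])
```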

Question 56 of 200

1

MapReduce

Select one or more of the following:

  • Both MPP systems and __________ can be used for Big Data processing. However, from a scalability point of view, MPP systems, when compared with ________, provide limited support for scaling out, as they are generally appliance-based

  • MPP systems are also a costlier option than ___________, which leverages inexpensive commodity hardware

  • _________, a framework for processing data, requires interaction via a general-purpose programming language, such as Java

  • A ________ provides raw storage where the value (the stored data) can be of any type, such as a file or an image, and is accessible via a key

Explanation

Question 57 of 200

1

Technology Infrastructure

Select one or more of the following:

  • In the context of software systems, a _________ represents the underlying environment that enables the design and execution of a software system

  • A ____________ defines the overall processing and storage capabilities of an IT enterprise. As well, a ____________ sets the constraints within which the technology architecture needs to be designed

  • Efficient processing of large amounts of data demands an offline processing strategy, as dictated by the ___________ design pattern

  • As a result, the corresponding architecture of the __________ also gets complicated

Explanation

Question 58 of 200

1

Technology Architecture

Select one or more of the following:

  • In the context of software systems, a _________ represents the fundamental design of a software system

  • A __________ can be defined for varying levels of software artifacts ranging from a single software library to the set of software systems across the entire IT enterprise

  • With a reasonable amount of data acquisition, IT spending only increases slightly with the passage of time. As the amount of acquired data increases exponentially, there is a tendency for IT spending to increase exponentially as well

  • In a Big Data solution environment, quite often data needs to be imported from relational databases into the Big Data platform for various data analysis tasks

Explanation

Question 59 of 200

1

Traditional Architecture Types

Select one or more of the following:

  • component architecture

  • application architecture

  • Poly Source

  • storage layer

Explanation

Question 60 of 200

1

Traditional Architecture Types

Select one or more of the following:

  • integration architecture

  • enterprise technology architecture

  • High Volume Binary Storage

  • High Volume Hierarchical Storage

Explanation

Question 61 of 200

1

component architecture

Select one or more of the following:

  • A ____________ represents the design of a single software program (a module) as it is structured within a modular/distributed software environment

  • Feature-wise, a module is inherently different from a software program, as the former only provides a specific set of functionality for performing a subset of operations when compared against the complete software program

  • The modules are dependent on other modules for providing the full set of functionality and hence are designed to be composable

  • Although expensive, _________ databases provide atomicity, consistency, isolation and durability (ACID) compliance while supporting the querying of data using Structured Query Language (SQL)

Explanation

Question 62 of 200

1

application architecture

Select one or more of the following:

  • An __________ represents the design and structure of a complete software system that can be deployed on its own

  • In a modular/distributed software environment, an _____________ generally consists of a number of modules and some storage

  • Correspondingly, an ____________ (apart from other specifications) generally includes the specifications of multiple component architectures

  • is the process of transforming objects or data entities into bytes for persistence (in memory or on disk) or transportation from one machine to another over a network

Explanation

Question 63 of 200

1

integration architecture

Select one or more of the following:

  • An __________ describes the design used to integrate two or more applications, and further encompasses the technology architectures of the integrated applications.

  • This generally involves connectors, middleware and any custom developed components

  • A documented _____________ provides a point of reference for ensuring continued integration in the face of changes to the integrated applications' architecture

  • Furthermore, the capabilities of this layer indicate the kinds of Big Data solutions that can be built

Explanation

Question 64 of 200

1

enterprise technology

Select one of the following:

  • An _________ architecture represents an ____________ landscape including its respective architectures

  • In the case of continuously arriving data, data is first accumulated to create a batch of data and only then processed

  • In a Big Data environment, large volume not only refers to tall datasets (a large number of rows) but also to wide datasets (a large number of columns)

  • Furthermore, the capabilities of this layer indicate the kinds of Big Data solutions that can be built

Explanation

Question 65 of 200

1

enterprise technology architecture

Select one or more of the following:

  • In contrast to the other three technology architectures, which can be documented before their development, the ______________ is generally documented once the other architectures are in place

  • The scope of the _____________ encompasses component, application and integration architectures

  • Due to its enterprise-wide focus, the _____________ can also be considered as a reference point for understanding what constitutes the enterprise infrastructure

  • Therefore ______ is ideal for processing semi-structured and unstructured data in support of executing analytical queries

Explanation

Question 66 of 200

1

Big Data Mechanisms Architecture

Select one or more of the following:

  • the architecture of a single Big Data mechanism

  • The ___________ refers to the technology architecture of an individual ___________ that provides a specific functionality, such as a data transfer engine or a query engine

  • As a result, it takes longer to get from data import to data visualization, which further impacts the productivity and the overall value attributed to Big Data exploration and insight discovery (the value characteristic)

  • However, if internal data is also integrated, the same solution will provide business-specific results

Explanation

Question 67 of 200

1

Big Data Mechanisms Architecture

Select one or more of the following:

  • A ____________ is somewhat similar to a component architecture. However, in practice a single mechanism may itself be comprised of more than one component

  • Unlike traditional component architecture, the architectures of many ___________ are available due to the many active, open source projects that create and sustain them

  • A _____________ is generally a complete software package that can exist on its own but only truly realizes its full potential when it is combined with other __________

  • For example, a storage device and a processing engine can exist on their own. However, the real value of these two is accomplished when the processing engine retrieves data from the storage device and processes the data to obtain meaningful results

Explanation

Question 68 of 200

1

Big Data Mechanisms Architecture

Select one or more of the following:

  • It is important to assess the interoperability and extensibility of each ________ so that upgrading a single ____________ does not impact the functionality of other ______

  • For example, a resource manager should be compatible with different types of processing engines (batch and realtime) and should provide extension points for integrating future, processing-specific processing engines

  • Similarly, when upgraded, the resource manager should provide backward compatibility with older processing engines to ensure disruption-free operation

  • A ________ provides centralized management of a cluster, enabling streamlined deployment of core services over the cluster and their subsequent monitoring

Explanation

Question 69 of 200

1

Big Data Solution Architecture

Select one or more of the following:

  • The architecture of a set of Big Data Mechanisms assembled into a solution

  • The ___________ represents a solution environment built to address a specific Big Data problem, such as realtime sensor data analysis or a recommendation system

  • The __________ is associated with the storage device mechanism

  • An _________ architecture represents an ____________ landscape including its respective architectures

Explanation

Question 70 of 200

1

Big Data Solution Architecture

Select one or more of the following:

  • Such a solution environment represents a set of multiple Big Data mechanisms that collectively provide the required business functionality

  • In a Big Data environment, the term __________ is similar to the term application architecture. However, in the domain of Big Data, it is the collective application of Big Data mechanisms that results in the creation of a __________. This is different from the concept of a traditional, packaged software application

  • A ___________ is generally a Big Data pipeline comprising multiple stages where complex processing is broken down into modular steps called tasks

  • Each task in a Big Data pipeline can make use of a processing engine mechanism, such as MapReduce or Spark, or a query engine mechanism, such as Hive or Pig, to perform operations on data

Explanation
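
The idea of a pipeline whose complex processing is broken into modular tasks can be sketched in a few lines. The three task functions here are invented for illustration; in a real solution each task would typically delegate to a processing or query engine rather than run as plain Python.

```python
def run_pipeline(data, tasks):
    """Feed the output of each task into the next one, in order."""
    for task in tasks:
        data = task(data)
    return data

# Hypothetical tasks, each a small, independently testable step.
def cleanse(records):
    return [r.strip() for r in records if r.strip()]

def parse(records):
    return [float(r) for r in records]

def aggregate(records):
    return [sum(records) / len(records)]

result = run_pipeline([" 10 ", "", "20", "30 "], [cleanse, parse, aggregate])
```

Keeping each stage as a separate task makes it possible to swap the engine behind one step without rewriting the others.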

Question 71 of 200

1

Big Data Solution Architecture

Select one or more of the following:

  • Complex ___________ may involve more than one data pipeline, for example, one for realtime data processing and the other for batch data processing

  • Multiple __________ are generally deployed in an enterprise for fulfilling different business requirements

  • Documentation of the corresponding architectures provides an understanding of the utilization levels of common Big Data mechanisms

  • This helps with establishing the scalability requirements of each Big Data mechanism and determining any potential performance bottlenecks

Explanation

Question 72 of 200

1

Big Data Integration Architecture

Select one or more of the following:

  • The architecture that consists of integrating a Big Data solution with the traditional enterprise systems

  • The extent to which an enterprise can benefit from a Big Data solution is limited when it is deployed in isolation from the rest of the traditional enterprise systems

  • The ________ pattern is associated with the query engine, processing engine, storage device, resource manager and coordination engine mechanisms

  • With a ___________ architecture, in order to cope with greater resource demands for CPU and/or disk space, the only option is to scale up by replacing existing machines with higher-end (expensive) machines. Scaling up allows more processing and offers greater storage

Explanation

Question 73 of 200

1

Big Data Integration Architecture

Select one or more of the following:

  • A Big Data solution that is integrated with other parts of an enterprise ecosystem provides maximum value because the data it contains or the analytics results it generates can be used by other traditional enterprise systems, such as the enterprise data warehouse or an ERP system

  • Similarly, a Big Data Solution that only makes use of external datasets may produce generic or out-of-context results that are of little value to the business

  • However, if internal data is also integrated, the same solution will provide business-specific results

  • Due to the aforementioned reasons, a Big Data solution is generally integrated with the rest of the enterprise IT systems in order to provide maximum value

Explanation

Question 74 of 200

1

Big Data Integration Architecture

Select one or more of the following:

  • The resulting architecture is known as the __________, which includes the architecture of the Big Data solution, any connected enterprise systems and integration components

  • With respect to a Big Data solution, there are generally two integration points: one for importing the raw data that needs to be processed and the other for exporting the results or, in some cases, exporting the ingested cleansed data

  • One prominent area within the field of pattern identification is the analysis of connected entities. Due to the large volume of data in Big Data environments, efficient and timely analysis of such data requires specialized storage

  • Big Data solutions demand opposing access requirements when it comes to raw versus processed data

Explanation

Question 75 of 200

1

Big Data Integration Architecture

Select one or more of the following:

  • For this, multiple data transfer engines or connectors, such as ODBC, are employed

  • Instead of using multiple point-to-point connections (connectors) between each Big Data solution and the traditional system, a single data bus can be used that provides a standardized integration approach across multiple Big Data solutions

  • The _________ compound pattern represents a part of a Big Data solution environment capable of storing high-volume and high-variety data and making it available for indexed, ___________

  • Although highly scalable, this architecture approach requires the use of complex distributed programming frameworks

Explanation
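
One way to see why a single data bus can be preferable to point-to-point connectors is to count the integration links. The sketch assumes the worst case in which every Big Data solution must exchange data with every traditional system; that assumption, and the function names, are illustrative only.

```python
def point_to_point_links(solutions, systems):
    """One dedicated connector per (solution, system) pair."""
    return solutions * systems

def data_bus_links(solutions, systems):
    """Each solution and each system connects once to the shared bus."""
    return solutions + systems

# Hypothetical enterprise: 4 Big Data solutions, 3 traditional systems.
p2p = point_to_point_links(4, 3)
bus = data_bus_links(4, 3)
```

Under this assumption the connector count grows multiplicatively for point-to-point integration and only additively with a bus.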

Question 76 of 200

1

Big Data Platform Architecture

Select one or more of the following:

  • The architecture of the entire ____________ that enables the execution of multiple Big Data solutions

  • The __________ is the underlying technology architecture that supports the execution of multiple Big Data solutions

  • Furthermore, the storage device automatically detects when a replica becomes unavailable and recreates the lost replica from one of the available replicas

  • can be implemented in two different ways: symmetric multiprocessing and distributed shared memory

Explanation

Question 77 of 200

1

Big Data Platform Architecture

Select one or more of the following:

  • This type of architecture documents the underlying Big Data mechanisms that have been assembled in different combinations to construct multiple Big Data solutions

  • This generally represents a layered architecture where each top layer makes use of the successive bottom layer

  • The typical makeup of a __________ includes the storage layer, processing layer, analysis layer and visualization layer

  • At the start of a Big Data initiative, the _________ may only consist of a rudimentary set of Big Data mechanisms for supporting a simple Big Data solution

Explanation

Question 78 of 200

1

Big Data Platform Architecture

Select one or more of the following:

  • However, with the passage of time, as more Big Data solutions are built and their complexity increases, additional Big Data mechanisms are introduced

  • As a result, the corresponding architecture of the __________ also gets complicated

  • Documentation of the _________ further helps to ascertain which analytics maturity level the enterprise is currently at

  • For example, the existence of an analytics engine in addition to a query engine indicates that the enterprise employs some level of predictive analytics

Explanation

Question 79 of 200

1

Big Data Platform Architecture

Select one or more of the following:

  • An enterprise generally starts off or is already at the descriptive or diagnostic analytics maturity level and aims to move towards the predictive or prescriptive analytics maturity level

  • A _________ can also be considered a superset of the traditional data architecture, as the former includes the development of data architecture for both raw and processed data

  • Dataset Location: the location from where the data will be available, which can be internal or external to the enterprise, including the cloud

  • Furthermore, the storage device automatically detects when a replica becomes unavailable and recreates the lost replica from one of the available replicas

Explanation

Question 80 of 200

1

Big Data Platform Architecture

Select one or more of the following:

  • It further includes decisions about which storage technologies to use, what the format and structure of processed data should be, and the dictionary of datasets available for developing Big Data solutions

  • However, a __________ goes beyond data curation and includes data processing, analysis and visualization technologies as well

  • The _______ pattern is normally applied together with the Large-Scale Batch Processing pattern as part of a complete solution

  • For _________, the description and type of the entity being stored is documented, such as product image and png

Explanation

Question 81 of 200

1

Logical Architecture

Select one or more of the following:

  • A ________ represents the high-level components of a system, their functionality and how they are connected with one another

  • The term logical emphasizes the fact that the description of the architecture does not bear any resemblance to the physical implementation of the system

  • This is due to the fact that the same _________ can be physically implemented in a number of ways. The physical implementation is dependent upon the underlying technology that is used

  • For health monitoring purposes, the cluster manager gathers metrics from various components running within different layers, such as the storage, processing and analysis layers, and displays their current status using a dashboard

Explanation

Question 82 of 200

1

Big Data Analytics Logical Architecture

Select one or more of the following:

  • A __________ defines the logical components required for the implementation of a Big Data analytics solution

  • It is a specialized form of a Big Data platform architecture that defines the Big Data mechanisms at each of its different layers, the responsibility of each of these layers and the generic flow of data between these layers

  • Data is ingested generally via file and/or relational data transfer engines, saved to the disk-based storage device and then processed using a ___________ engine

  • Dataset Decomposition

Explanation

Question 83 of 200

1

Big Data Analytics Logical Architecture

Select one or more of the following:

  • Devising such an architecture provides an easy-to-understand reference point for both the Big Data architects and the Big Data engineers

  • data sources layer

  • data acquisition layer

  • storage layer

Explanation

Question 84 of 200

1

Big Data Analytics Logical Architecture

Select one or more of the following:

  • processing layer

  • analysis layer

  • visualization layer

  • utilization layer

Explanation

Question 85 of 200

1

Big Data Analytics Logical Architecture

Select one or more of the following:

  • management layer

  • security layer

  • governance layer

  • Productivity Portal

Explanation

Question 86 of 200

1

data sources layer

Select one or more of the following:

  • The ____ comprises all the _______ that had been identified during the Data Identification stage of the Big Data analysis lifecycle

  • However, rather than consisting of ________ for a single Big Data solution, the _________ encompasses ___________ across all Big Data solutions

  • Dataset Type: the underlying format of the data produced by the source (structured, unstructured or semi-structured)

  • As all the resources (processor, memory and disk) exist within the boundaries of a single machine, data exchange only occurs within those boundaries

Explanation

Question 87 of 200

1

data sources layer

Select one or more of the following:

  • It is important to understand that this layer does not form part of the physical Big Data analytics architecture, as the data is produced by a source, such as an API, a database or web location, that is part of a separate system

  • The _________ provides an understanding of the range of datasets used by different Big Data solutions

  • It is advisable to develop an inventory of all _________ to avoid duplication of datasets. This also helps with identifying relevant datasets when performing exploratory analysis

  • By understanding the makeup of each _______ the corresponding data ingestion requirements are better understood

Explanation

Question 88 of 200

1

data sources layer

Select one or more of the following:

  • Processing Big Data datasets involves the use of processing engines that need programmatic skills in order to work with them

  • This prior knowledge about the structure of the data makes ___________ databases very fast at querying large datasets

  • Access Type: does the data have open or restricted access?

  • Access Method: is data available via a simple connection or does it need scraping from a web resource?

Explanation

Question 89 of 200

1

data sources layer

Select one or more of the following:

  • Access Cost: is the data available freely or is there a cost associated with this acquisition, such as from a data market?

  • Data Production Speed: the rate at which the data source generates the data

  • Dataset Location: the location from where the data will be available, which can be internal or external to the enterprise, including the cloud

  • By using a point-and-click interface to work with the Big Data platform, the _______ makes it easier and quicker to populate the Big Data platform with the required data, manage and process that data and export the processed results

Explanation
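
The data source attributes listed in these questions (dataset type, access type, access method, access cost, production speed and location) map naturally onto one inventory record per source. The class, field values and duplicate check below are a hypothetical sketch of such an inventory, not a prescribed format.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class DataSourceEntry:
    name: str
    dataset_type: str      # structured, semi-structured or unstructured
    access_type: str       # open or restricted
    access_method: str     # e.g. direct connection or web scraping
    access_cost: str       # free or paid (such as a data market)
    production_speed: str  # rate at which the source generates data
    location: str          # internal, external or cloud location

def duplicated_locations(inventory):
    """Return locations registered more than once (possible duplicate datasets)."""
    counts = Counter(entry.location for entry in inventory)
    return sorted(loc for loc, n in counts.items() if n > 1)

# Made-up entries; the second and third point at the same place.
inventory = [
    DataSourceEntry("crm_orders", "structured", "restricted", "connection",
                    "free", "hourly", "internal:crm-db"),
    DataSourceEntry("tweets", "semi-structured", "open", "api",
                    "paid", "continuous", "external:social-api"),
    DataSourceEntry("social_feed", "semi-structured", "open", "api",
                    "paid", "continuous", "external:social-api"),
]
duplicates = duplicated_locations(inventory)
```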

Question 90 of 200

1

data acquisition layer

Select one or more of the following:

  • The _______ provides functionality for acquiring data from the sources in the data sources layer

  • Based on the type and location of the data sources, this layer may consist of more than one data transfer engine mechanism

  • In the context of software systems, a _________ represents the fundamental design of a software system

  • However, a __________ goes beyond data curation and includes data processing, analysis and visualization technologies as well

Explanation

Question 91 of 200

1

data acquisition layer

Select one or more of the following:

  • For internal structured data sources, a relational data transfer engine can be used

  • For semi-structured and unstructured data sources, whether internal or external, an event or file data transfer engine can be used

  • In the case of realtime processing of data or stream analysis, an event data transfer engine is generally used

  • When an event data transfer engine is used, the ingested data can normally be filtered in-flight via the removal of unwanted or corrupt data

Explanation
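
In-flight filtering with an event data transfer engine can be pictured as a generator that inspects each event as it passes through and drops corrupt or unwanted ones before they reach storage. The JSON event format and the rule that "heartbeat" events are unwanted are assumptions made for this sketch.

```python
import json

def filter_in_flight(events):
    """Yield only well-formed, wanted events from a stream of raw strings."""
    for raw in events:
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # corrupt event: drop it
        if not isinstance(event, dict) or event.get("type") == "heartbeat":
            continue  # unwanted event: drop it
        yield event

stream = [
    '{"type": "reading", "value": 21.5}',
    '{"type": "heartbeat"}',
    '{broken',
    '{"type": "reading", "value": 22.0}',
]
kept = list(filter_in_flight(stream))
```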

Question 92 of 200

1

data acquisition layer

Select one or more of the following:

  • Similarly, in the case of a relational data transfer engine, corrupt or unwanted data can be filtered out at the source by the specification of constrained selection criteria

  • However, in the case of a file transfer engine, the file needs to be ingested before it can be examined for the filtration process

  • The ________ provides an easy-to-use interface for analyzing data in the storage layer and consists of the query and analytics engines

  • Although expensive, _________ databases provide atomicity, consistency, isolation and durability (ACID) compliance while supporting the querying of data using Structured Query Language (SQL)

Explanation
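
Filtering at the source with a relational data transfer engine amounts to putting the selection criteria into the query itself, so unwanted rows never leave the database. The sketch uses Python's built-in `sqlite3` module with an invented `orders` table as a stand-in for the source system.

```python
import sqlite3

# In-memory database standing in for a hypothetical relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 19.5, "paid"), (2, None, "paid"), (3, 5.0, "cancelled"), (4, 42.0, "paid")],
)

# Constrained selection criteria: skip rows with a missing amount (corrupt)
# and rows that are not paid (unwanted) before anything is transferred.
rows = conn.execute(
    "SELECT id, amount FROM orders "
    "WHERE amount IS NOT NULL AND status = 'paid' ORDER BY id"
).fetchall()
conn.close()
```

A file, by contrast, has to be ingested in full before an equivalent check can run, which is the distinction drawn in the options.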

Question 93 of 200

1

data acquisition layer

Select one or more of the following:

  • This layer also includes mechanisms for automatically appending metadata to the ingested data for assuring quality and maintaining provenance and compression of data

  • The Data Acquisition and Filtering stage of the Big Data analysis lifecycle is supported by the ________

  • Under certain circumstances, acquiring data may require API integration, which further warrants the development of custom (code) libraries or service development, which reside in this layer

  • the distributed file system automatically splits a large dataset into multiple smaller datasets that are then spread across the cluster

Explanation
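
Appending metadata at ingestion time can be as simple as wrapping each payload in a record that notes where it came from, when it arrived and a checksum for later quality checks. The field names below are illustrative assumptions rather than a standard.

```python
import hashlib
from datetime import datetime, timezone

def with_metadata(payload, source):
    """Wrap raw bytes with provenance and quality metadata."""
    return {
        "payload": payload,
        "source": source,                                       # provenance
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # arrival time
        "sha256": hashlib.sha256(payload).hexdigest(),          # integrity check
        "size_bytes": len(payload),
    }

record = with_metadata(b"temperature=21.5", source="sensor-gateway")
```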

Question 94 of 200

1

storage layer

Select one or more of the following:

  • The ______ consists of the storage device(s) that store the acquired data and generally consists of a distributed file system and at least one NoSQL database

  • Note that in the case of realtime data processing, the ________ also consists of in-memory storage technologies that enable fast analysis of high velocity data as it arrives

  • Furthermore, in a production environment, the complete cycle needs to be repeated over and over again

  • Therefore ______ is ideal for processing semi-structured and unstructured data in support of executing analytical queries

Explanation

Question 95 of 200

1

storage layer

Select one or more of the following:

  • In most cases, the ingested data is first stored on the distributed file system in a compressed form (apart from removal of unwanted and corrupt data)

  • This is because a distributed file system provides the most inexpensive form of storing large volumes of data

  • From the distributed file system, data can be pre-processed and put into a more structured form using an appropriate NoSQL storage device

  • A structured (but not necessarily relational) form is required because the exploratory analysis of data and the derivation and application of statistical and machine learning models require data whose attributes can be accessed in a standardized manner

Explanation

Question 96 of 200

1

storage layer

Select one or more of the following:

  • Although the conversion to a structured form may not seem obvious in the case of applying semantic analysis techniques, even techniques such as text analytics first convert a document into a structured form before performing clustering, classification or searching

  • Data that has undergone transformation, validation and cleansing operations is generally stored in one of the NoSQL databases, namely key-value, column-family, document and graph NoSQL databases

  • Big Data ________ are (partially or entirely) applied by implementing different combinations of Big Data mechanisms

  • This interpretation is dependent upon the Big Data platform's ability to present data in visual form

Explanation

Question 97 of 200

1

key-value database

Select one or more of the following:

  • A ________ provides raw storage where the value (the stored data) can be of any type, such as a file or an image, and is accessible via a key

  • For _________, the description and type of the entity being stored is documented, such as product image and png

  • The modules are dependent on other modules for providing the full set of functionality and hence are designed to be composable

  • the _______ pattern is associated with the storage device (distributed file system) and processing engine (batch) mechanisms.

Explanation
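
A toy in-memory version of this storage model is sketched below: the store treats every value as opaque bytes and can only fetch it by key. The class and key names are hypothetical and stand in for a real key-value product.

```python
class KeyValueStore:
    """Minimal key-value store: values are opaque bytes, looked up by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

store = KeyValueStore()
# The store does not interpret the value; a PNG header and log text look alike to it.
store.put("product:17:image", b"\x89PNG\r\n\x1a\n")
store.put("log:2017-03-31", b"GET /index.html 200")
image_bytes = store.get("product:17:image")
```

Because the store cannot look inside the value, the description and type of each stored entity have to be documented separately, as the second option notes.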

Question 98 of 200

1

document database

Select one or more of the following:

  • A ________ is capable of storing each record in a hierarchical form that can be accessed via a key, imitating a physical document that can have multiple sections

  • For ________, the hierarchical structure of the different documents being stored, along with their types, is documented

  • Data Production Speed: the rate at which the data source generates the data

  • In most cases, the ingested data is first stored on the distributed file system in a compressed form (apart from removal of unwanted and corrupt data)

Explanation
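
The hierarchical record described above can be pictured as a nested structure stored under one key. The order document and the dotted-path helper `get_path` are invented for illustration; real document databases offer their own query syntax.

```python
# One hypothetical document, stored under a single key.
documents = {
    "order:1001": {
        "customer": {"name": "Ana", "city": "Bogota"},
        "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
    }
}

def get_path(doc, path):
    """Walk a dotted path such as 'items.1.sku' through nested dicts and lists."""
    for part in path.split("."):
        doc = doc[int(part)] if isinstance(doc, list) else doc[part]
    return doc

city = get_path(documents["order:1001"], "customer.city")
second_sku = get_path(documents["order:1001"], "items.1.sku")
```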

Question 99 of 200

1

column-family database

Select one or more of the following:

  • A ________ is like a relational database that stores data in rows and columns. However, rather than storing a value per column, multiple key-value pairs can be stored inside a single column

  • For ________, the field names of each entity and any sub-fields within each field, along with their data types, are recorded. Also, based on the analysis requirements, it is important to decide between storing data as wide-rows or as tall-columns

  • The ___________ pattern is associated with the data transfer engine (relational), storage device, workflow engine and productivity portal mechanisms

  • In an actual implementation, the __________ consists of Open Source libraries, third-party business intelligence (BI) or analytics software

Explanation
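
A column-family layout can be approximated with nested dictionaries: each row key maps to column families, and each family holds its own set of key-value pairs. The row, family and column names here are made up, and the structure is only a mental model of how such databases organize data.

```python
# row key -> column family -> {column name: value}
table = {
    "user:42": {
        "profile": {"name": "Ana", "country": "CO"},
        "logins": {"2017-03-30": "web", "2017-03-31": "mobile"},
    }
}

def read_column(table, row_key, family, column, default=None):
    """Fetch one column value, tolerating missing rows, families or columns."""
    return table.get(row_key, {}).get(family, {}).get(column, default)

country = read_column(table, "user:42", "profile", "country")
last_login = read_column(table, "user:42", "logins", "2017-03-31")
```

The `logins` family grows one column per date, which is an example of the wide-row design choice mentioned in the second option.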

Question 100 of 200

1

graph database

Select one or more of the following:

  • A ________ stores data in the form of connected entities where each record is called a node or a vertex and the connection between the entities is called the edge, which can be one-way or two-way

  • For ________, apart from documenting the attributes and types of each entity, the possible connections (the edges) between entities are also recorded

  • the _______ pattern is associated with the storage device (distributed file system) and processing engine (batch) mechanisms.

  • The ________ pattern is associated with the query engine, processing engine, storage device, resource manager and coordination engine mechanisms

Explanation
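
The node and edge vocabulary can be made concrete with a small adjacency structure in which an edge is either one-way or two-way. This is a sketch with invented entities, not the storage format of any particular graph database.

```python
class Graph:
    """Nodes carry attributes; edges are stored as adjacency sets."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs
        self.edges.setdefault(node_id, set())

    def add_edge(self, src, dst, two_way=False):
        self.edges[src].add(dst)
        if two_way:
            self.edges[dst].add(src)

    def neighbors(self, node_id):
        return sorted(self.edges[node_id])

g = Graph()
g.add_node("ana", kind="person")
g.add_node("luis", kind="person")
g.add_node("post1", kind="post")
g.add_edge("ana", "luis", two_way=True)  # friendship goes both ways
g.add_edge("ana", "post1")               # authorship is one-way
```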

Question 101 of 200

1

storage layer

Select one or more of the following:

  • Before data can be stored in a NoSQL database, a data modelling exercise is generally undertaken

  • However, unlike the relational data modeling activity where entity names, attributes and relationships are documented, the nature of the NoSQL data modelling activity is different due to its non-relational nature and further depends on the type of the NoSQL database

  • In a NoSQL database, the emphasis is more on the structure of the individual aggregate, which is a self-contained record that has no relationships with other records

  • In a Big Data environment, large volume not only refers to tall datasets (a large number of rows) but also to wide datasets (a large number of columns)

Explanation

Question 102 of 200

1

storage layer

Select one or more of the following:

  • In the case that the stored value conforms to a structure, such as a log file, the field names, along with field type information, are also recorded

  • Apart from different storage devices, this layer also houses serialization and compression engines for storing data in an appropriate format and reducing storage footprint, respectively

  • This generally represents a layered architecture where each top layer makes use of the successive bottom layer

  • Not all of ________ may be required to apply the pattern. Sometimes one or more associated mechanisms may act as an alternative to others for how the pattern is applied

Explanation

Question 103 of 200

1

processing layer

Select one or more of the following:

  • The ________ provides a range of processing capabilities that play a pivotal role in generating value from a variety of voluminous data arriving at a high velocity in a meaningful time period

  • Apart from resource manager and coordination engines, although this layer can contain both the batch and the realtime processing engine mechanisms, based on the type of analytics performed, only one processing engine, such as the batch processing engine, may actually be present

  • The opposite transformation process from bytes to objects or data entities is called _________

  • controls the management of the data lifecycle to ensure that quality data is available in a controlled, secure and timely fashion

Explanation

Question 104 of 200

1

processing layer

Select one or more of the following:

  • Furthermore, the capabilities of this layer indicate the kinds of Big Data solutions that can be built

  • Data Ingress/Egress - a data transfer engine may utilize a processing engine for transferring data

  • This pattern requires the use of a storage device implemented via a document NoSQL database servicing insert, select, update and delete operations. The document NoSQL database generally automatically encodes the data using a binary or a plain-text hierarchical format, such as JSON, before storage

  • However, unlike the relational data modeling activity where entity names, attributes and relationships are documented, the nature of the NoSQL data modeling activity is different due to its non-relational nature and further depends on the type of the NoSQL database

Explanation

Question 105 of 200

1

processing layer

Select one or more of the following:

  • Data Wrangling - data pre-processing activities, including data validation, cleansing and joining

  • Data Analysis - analytical activities, including querying, exploratory data analysis and model generation

  • This layer can further be divided into the batch and realtime ________

  • The ________ compound pattern represents a part of a Big Data solution environment capable of egressing high-volume, high-velocity and high-variety data out of the Big Data solution environment

Explanation

Question 106 of 200

1

Batch Processing

Select one or more of the following:

  • involves a _______ engine that processes large amounts of data stored on a disk-based storage device in batches

  • This is the most common form of data processing employed in a Big Data environment for data wrangling operations, exploring data and developing and executing statistical and machine learning models

  • This layer represents the functionality as required by the Utilization of Analysis Results stage of the Big Data analysis lifecycle

  • The _______ further helps reduce cluster administration overhead and makes diagnosis more efficient

Explanation

Question 107 of 200

1

Batch Processing

Select one or more of the following:

  • Due to its nature of processing, the processing results are not available instantaneously

  • Data is ingested generally via file and/or relational data transfer engines, saved to the disk-based storage device and then processed using a ___________ engine

  • Although the __________ is mainly concerned with policies that guide data management activities, it may further provide functionality for managing other aspects of the Big Data platform

  • Note that in the case of realtime data processing, the ________ also consists of in-memory storage technologies that enable fast analysis of high velocity data as it arrives

Explanation

Question 108 of 200

1

Realtime Processing

Select one or more of the following:

  • involves a _______ engine that processes continuously arriving data (streams) or data arriving at intervals (events) as it arrives

  • Instead of persisting the data to a disk-based storage device, _________ persists the data to a memory-based storage device

  • As a result, the corresponding architecture of the __________ also gets complicated

  • The architecture of a set of Big Data Mechanisms assembled into a solution

Explanation

Question 109 of 200

1

Realtime Processing

Select one or more of the following:

  • Although providing instantaneous results, setting up such a capability is not only complex but also expensive due to the reliance on memory-based storage (memory is more expensive than disk)

  • Data is ingested via an event data transfer, saved to a memory-based storage device and then processed using a ___________ engine

  • Note that although in-memory storage is initially used, data is also saved to disk-based storage for deep analysis or future use

  • For providing maximum value, a __________ layer should provide low latency, high throughput, high availability and high fault tolerance

Explanation

Question 110 of 200

1

Realtime Processing

Select one or more of the following:

  • Event Stream Processing (ESP)

  • Complex Event Processing (CEP)

  • During ________, multiple streams or events that generally originate from disparate sources and are spread out over different time intervals are analyzed simultaneously for finding correlations, patterns, anomalous behavior and error conditions

  • A __________ defines the logical components required for the implementation of a Big Data analytics solution

Explanation

Question 111 of 200

1

Realtime Processing

Select one or more of the following:

  • generally refers to processing event-based data

  • However, the execution of data queries, which requires an instant response, on already persisted data acquired via batch import also falls in the domain of _________

  • Not all of ________ may be required to apply the pattern. Sometimes one or more associated mechanisms may act as an alternative to others for how the pattern is applied

  • The application of the __________ pattern further increases the reach of the Big Data solution environment to non-IT users, such as data analysts and data scientists

Explanation

Question 112 of 200

1

Event Stream Processing

Select one or more of the following:

  • During ________, the incoming stream of data or events, which generally originates from a single source and is ordered by time, is continuously analyzed via the application of algorithms or query execution

  • In simple use cases, ____________ involves data cleansing, transformation and the generation of some statistics, such as sum, mean, min or max, which are then fed to an operational dashboard

  • A ________ is a type of distributed architecture that consists of fully independent machines. The machines each have their own processors, memory, disk and operating system and are networked together as a single system

  • To counter this issue, the dataset is horizontally broken into smaller parts as prescribed by the ___________ pattern

Explanation
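
The simple use case described above, where statistics such as sum, mean, min and max are computed over a time-ordered stream, can be sketched with a generator that updates its results as each event arrives. This is an illustration only; the function name `running_stats` is hypothetical, and a real event stream processing engine would also handle windowing, cleansing and delivery to a dashboard.

```python
def running_stats(stream):
    """Yield updated sum, mean, min and max after every incoming value."""
    count = 0
    total = 0.0
    lowest = None
    highest = None
    for value in stream:
        count += 1
        total += value
        lowest = value if lowest is None else min(lowest, value)
        highest = value if highest is None else max(highest, value)
        # Each yielded dict is what a dashboard would display at this moment.
        yield {"sum": total, "mean": total / count, "min": lowest, "max": highest}


snapshots = list(running_stats([2, 4, 6]))
```

Because only a few counters are kept in memory, the statistics are available the moment each value arrives, without storing the whole stream.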

Question 113 of 200

1

Event Stream Processing

Select one or more of the following:

  • In complex use cases, statistical or machine learning algorithms with fast execution times can be executed to detect a pattern or an anomaly or to predict the future state

  • Other memory-resident datasets can also be incorporated for performing analytics that provide context-aware results

  • Although the processing results can be directly utilized (a dashboard or an application), they can act as a trigger for another application that performs a preconfigured action, such as making computational adjustments, or further analyses

  • In general, ___________ focuses more on speed than complexity. The operation needs to be executed in a comparatively simple manner in order to aid faster execution. Also, it is easier to set up than CEP but provides less value

Explanation

Question 114 of 200

1

Complex Event Processing

Select one or more of the following:

  • During ________, multiple streams or events that generally originate from disparate sources and are spread out over different time intervals are analyzed simultaneously for finding correlations, patterns, anomalous behavior and error conditions

  • Like ESP, the objective is to aid in making realtime decisions either automatically or through human intervention the moment data is received

  • In an actual implementation, the __________ consists of Open Source libraries, third-party business intelligence (BI) or analytics software

  • Storing very large datasets where they are accessed by a number of users simultaneously can seriously affect the data access performance of the underlying database

Explanation

Question 115 of 200

1

Complex Event Processing

Select one or more of the following:

  • When compared with ESP, __________ provides more value but is harder to set up, as it involves connecting with multiple data sources and executing complex logic

  • Complex correlation and pattern identification algorithms are applied, and business logic and KPIs are also taken into account for discovering cross-cutting ________ patterns

  • can be considered a superset of ESP. Oftentimes, both approaches can be deployed together such that the synthetic events generated as the output of ESP can become input for __________

  • provides rich analytics. However, due to its complex nature, time-to-insight may be adversely affected

Explanation

Question 116 of 200

1

analysis layer

Select one or more of the following:

  • The ________ provides an easy-to-interact interface for analyzing data in the storage layer and consists of the query and analytics engines

  • Depending on the type of analytics being performed, this layer may only consist of a query engine, such as in the case of descriptive and diagnostic analytics

  • Due to the contemporary nature of these processing engines and the specialized processing frameworks they follow, programmers may not be conversant with the APIs of each processing engine

  • With a reasonable amount of data acquisition, IT spending only increases slightly with the passage of time. As the amount of acquired data increases exponentially, there is a tendency for IT spending to increase exponentially as well

Explanation

Question 117 of 200

1

analysis layer

Select one or more of the following:

  • However, when performing deep analytics, such as in the case of predictive and prescriptive analytics, an analytics engine also exists

  • This is the layer that converts large amounts of data into information that can be acted upon

  • This layer abstracts the processing layer with a view of making data analysis easier and further increasing the reach of the Big Data platform to data scientists and data analysts

  • Activities supported by this layer include: data cleansing, data mining, exploratory data analysis, preparing data for statistical/machine learning model development, model development, model evaluation and model execution

Explanation

Question 118 of 200

1

analysis layer

Select one or more of the following:

  • The functionality provided by this layer corresponds to the data analysis stage of the Big Data analysis lifecycle

  • In an actual implementation, the __________ consists of Open Source libraries, third-party business intelligence (BI) or analytics software

  • A mechanism may either be implemented when the pattern is applied, or it may be directly affected by the application of the pattern

  • provides rich analytics. However, due to its complex nature, time-to-insight may be adversely affected

Explanation

Question 119 of 200

1

analysis layer

Select one or more of the following:

  • In the case of Open Source libraries, interaction is mostly command-line-based, with a basic graphical user interface (GUI) in some cases

  • However, third party software provides a GUI with point-and-click functionality for statistical/machine learning model development and other general data querying

  • The ________ compound pattern represents a part of a Big Data solution environment capable of egressing high-volume, high-velocity and high-variety data out of the Big Data solution environment

  • makes use of commodity hardware where machines are generally networked using Local Area Network (LAN) technology

Explanation

Question 120 of 200

1

visualization layer

Select one or more of the following:

  • The __________ hosts the visualization engine and provides functionality as required by the Data Visualization stage of the Big Data analysis lifecycle

  • Note that the analysis layer may also encompass some level of data visualization features

  • This generally represents a layered architecture where each top layer makes use of the successive bottom layer

  • The ________ pattern is associated with the compression engine, storage device, data transfer engine and processing engine mechanisms

Explanation

Question 121 of 200

1

visualization layer

Select one or more of the following:

  • However, the nature of such visualizations is different and more analysis-specific

  • Visualizations are generally utilized by data scientists and analysts to help them understand the data and the output of various analysis techniques

  • The visualization features provided by the __________ are mainly geared towards business users so that they can easily interpret the insights obtained from the data analysis exercise

  • However, based on the physical implementation, the third-party tools used at the analysis layer may provide the ability to create visualizations and publish them for enterprise-wide use so that different information workers and business users can turn the published information into knowledge for making informed decisions

Explanation

Question 122 of 200

1

visualization layer

Select one or more of the following:

  • This layer is also fundamental to the concept of self service BI, where business users can access enterprise data directly without first requesting it from the IT team, can perform the required analyses and can create the required reports and dashboards themselves

  • To ensure longevity of the Big Data analytics platform, the compatibility of the visualization engine, with respect to the types of data sources it can connect to, needs to be assessed, as the analysis results are normally persisted to a storage device, such as a NoSQL database

  • The ___________ provides an opportunity for developing an understanding of the analysis results in a graphical manner

  • is the process of transforming objects or data entities into bytes for persistence (in memory or on disk) or transportation from one machine to another over a network

Explanation

Question 123 of 200

1

utilization layer

Select one or more of the following:

  • However, if the maximum benefit is to be gained from these results, they need to be incorporated into the enterprise in some shape and form

  • The __________ provides the analysis results so that an enterprise can take advantage of an opportunity or mitigate a risk in a proactive manner

  • The query engine provides an easy-to-interact-with interface where the user specifies a script that is automatically converted to low-level API calls for the required processing engine

  • The resulting architecture is known as the __________, which includes the architecture of the Big Data solution, any connected enterprise systems and integration components

Explanation

Question 124 of 200

1

utilization layer

Select one or more of the following:

  • This layer represents the functionality as required by the Utilization of Analysis Results stage of the Big Data analysis lifecycle

  • The functionality provided by the ____________ will vary based on the utilization pattern of the analysis results

  • This includes exporting results to dashboard and alerting applications (online portal), operational systems (CRM, SCM, ERP and e-commerce systems) and automated business processes (Business Process Execution Language-based processes)

  • At other times, data products are provided that enable the generation of computational results, such as outlier detection, recommendations, predictions or scores that can be used to optimize business operations

Explanation

Question 125 of 200

1

utilization layer

Select one or more of the following:

  • In almost all cases, one or more data transfer engines are present that enable the export of the analysis results from the storage device(s) to downstream systems or applications

  • To automate the entire export process, a workflow engine that resides in the management layer is used in combination with a data transfer engine

  • This enables automatic access to the analysis results stored in the disk-based or memory-based storage devices

  • Although the conversion to a structured form may not seem obvious in the case of applying semantic analysis techniques, even techniques such as text analytics first convert a document into a structured form before performing clustering, classification or searching

Explanation

Question 126 of 200

1

management layer

Select one or more of the following:

  • the _________ is tasked with the automated and continuous monitoring as well as maintenance of a Big Data platform for ensuring its operational integrity

  • The functionality supported by this layer relates to the operational requirements of a Big Data platform, including cluster setup, cluster expansion, system and software upgrades across the cluster and fault diagnosis and health monitoring of the cluster

  • A __________ defines the logical components required for the implementation of a Big Data analytics solution

  • the __________ provides functionality that ensures that the storage and access to data within the Big Data platform are managed throughout the lifespan of the data

Explanation

Question 127 of 200

1

management layer

Select one or more of the following:

  • The ________ achieves the provisioning of the required functionality by hosting a cluster manager

  • For health monitoring purposes, the cluster manager gathers metrics from various components running within different layers, such as the storage, processing and analysis layers, and displays their current status using a dashboard

  • For cluster maintenance, such as adding a new node to the cluster, taking a node offline or installing a new service, and disaster ___________ tasks, a graphical user interface (GUI) is used

  • Due to the requirement of integrating with multiple types of components, an interoperable and extensible cluster manager should be chosen

Explanation

Question 128 of 200

1

management layer

Select one or more of the following:

  • This ensures continuous monitoring and _________ of the Big Data platform, as new components are added across multiple layers in response to the ever-changing analytics requirements of an enterprise

  • Apart from operational ________, this layer also provides data processing and data __________ functionality through workflow engine and productivity portal mechanisms

  • The _______ pattern is normally applied together with the Large-Scale Batch Processing pattern as part of a complete solution

  • The ________ provides a range of processing capabilities that play a pivotal role in generating value from a variety of voluminous data arriving at a high velocity in a meaningful time period

Explanation

Question 129 of 200

1

security layer

Select one or more of the following:

  • The __________ is responsible for securing various components operating within other layers of the Big Data platform

  • This layer provides functionality for authentication, authorization and confidentiality via the encryption of at-rest and in-motion data

  • Storage of unstructured data generally involves access scenarios where partial updates of data are not performed and specific data items (records) are always accessed in their entirety, such as an image or user session data

  • Instead of persisting the data to a disk-based storage device, _________ persists the data to a memory-based storage device

Explanation

Question 130 of 200

1

security layer

Select one or more of the following:

  • As illustrated in the upcoming diagram, the ____________ houses the security engine. The security features provided by this layer are primarily used to secure the data acquisition layer, storage layer, processing layer and analysis layer

  • This layer provides functionality for authoring, applying and managing security policies as well as monitoring resource access via auditing

  • Databases based on ______ architecture generally use high-end hardware and a proprietary interconnect to link machines in order to enable the throughput required for high-speed analytics

  • A _________ is a tool with features for performing a range of common ___________ tasks in a centralized manner

Explanation

Question 131 of 200

1

security layer

Select one or more of the following:

  • One of the main objectives of this layer is to ensure that only the intended user with the correct access level can access the requested resources, such as the storage device or the processing engine

  • The _________ intercepts access requests, which are made from enterprise applications and systems using different security schemes, for resources within the Big Data platform

  • By acting as an intermediary, the __________ provides seamless access to the Big Data platform in a secure manner without the need for custom integration

  • A documentation of the _________ further helps to ascertain which maturity level of the analytics the enterprise is currently at

Explanation

Question 132 of 200

1

governance layer

Select one or more of the following:

  • the __________ provides functionality that ensures that the storage and access to data within the Big Data platform are managed throughout the lifespan of the data

  • the __________ achieves this functionality via the data _________ manager

  • The _________ compound pattern represents a fundamental solution environment comprised of a processing ______ with data ingress, storage, processing and egress capabilities

  • A ________ stores data in the form of connected entities where each record is called a node or a vertex and the connection between the entities is called the edge, which can be one-way or two-way

Explanation

Question 133 of 200

1

governance layer

Select one or more of the following:

  • It helps define policies for: acquiring data from internal and external sources, which fields need to be anonymized/removed/encrypted, what constitutes personally identifiable information, how processed data should be persisted, the publication of the analytics' results and how long the data should be stored

  • Although the __________ is mainly concerned with policies that guide data management activities, it may further provide functionality for managing other aspects of the Big Data platform

  • what kind of encryption should be used for data at-rest and in-motion

  • the integration of new components or tools within the Big Data platform

Explanation

Question 134 of 200

1

governance layer

Select one or more of the following:

  • the acquisition of new hardware resources

  • the evolution of the Big Data platform

  • A ________ stores data in the form of connected entities where each record is called a node or a vertex and the connection between the entities is called the edge, which can be one-way or two-way

  • This prior knowledge about the structure of the data makes ___________ databases very fast at querying large datasets

Explanation

Question 135 of 200

1

Big Data platform

Select one of the following:

  • A __________ is a set of technologies that collectively provide Big Data storage and processing capabilities

  • Instead of directly using the data transfer engine, it can be indirectly invoked via a productivity portal which normally denotes ad-hoc usage

  • A ___________ is a data-driven workflow consisting of tasks where each task involves an input, operation and output

  • To achieve low latency data access, a memory-based storage device can be used instead; however, this increases the cost of setting up the Big Data platform

Explanation

Question 136 of 200

1

Data Pipeline

Select one or more of the following:

  • A ___________ is a data-driven workflow consisting of tasks where each task involves an input, operation and output

  • Each _________ consists of multiple tasks joined together in a sequential manner such that the output of the previous task becomes the input of the next task. Such a combination of tasks denotes a single stage

  • A documentation of the _________ further helps to ascertain which maturity level of the analytics the enterprise is currently at

  • In such a situation, the _________ pattern can be applied, which requires dividing the ________ into multiple simple steps. This is executed over multiple processing runs

Explanation
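
The chaining described above, where the output of one task becomes the input of the next and the whole chain forms a single stage, can be sketched with plain function composition. This is a minimal illustration; `run_stage` and the three task functions are hypothetical names, and a real pipeline would be driven by a workflow engine rather than a single function call.

```python
from functools import reduce


def run_stage(tasks, data):
    """Run tasks in order, feeding each task's output into the next task."""
    return reduce(lambda current, task: task(current), tasks, data)


def cleanse(records):
    # Input: raw strings. Operation: trim and drop blanks. Output: clean strings.
    return [r.strip() for r in records if r.strip()]


def to_numbers(records):
    # Input: clean strings. Operation: parse. Output: floats.
    return [float(r) for r in records]


def mean(numbers):
    # Input: floats. Operation: average. Output: a single statistic.
    return sum(numbers) / len(numbers)


result = run_stage([cleanse, to_numbers, mean], [" 1 ", "", "3"])
```

A multi-stage pipeline could be modeled the same way by treating each stage as one task in an outer call to `run_stage`.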

Question 137 of 200

1

Big Data Pipeline

Select one or more of the following:

  • The _________ compound pattern represents a fundamental solution environment comprised of a processing ______ with data ingress, storage, processing and egress capabilities

  • A __________ can be very simple, consisting of a single stage, or very complex, consisting of multiple stages

  • Data flowing in at high speeds needs to be captured instantly so that it can be processed without any delay for obtaining maximum value

  • This is because a distributed file system provides the most inexpensive form of storing large volumes of data

Explanation

Question 138 of 200

1

Big Data Pipeline

Select one or more of the following:

  • The entire set of activities, from data ingestion to data egress, can be thought of as a _______ involving a range of operations from data cleansing to the computation of a statistic

  • Depending on the required functionality, a _______ represents a partial or a complete Big Data solution in support of Big Data analysis

  • Poly Source

  • Poly Storage

Explanation

Question 139 of 200

1

Big Data Pipeline

Select one or more of the following:

  • Big Data Processing Environment

  • Poly Sink

  • Automated Dataset Execution

  • Serialization Engine

Explanation

Question 140 of 200

1

Poly Source

Select one or more of the following:

  • The __________ compound pattern represents a part of a Big Data solution environment capable of ingesting high-volume and high-velocity data from a range of structured, unstructured and semi-structured data sources

  • Relational Source (core)

  • An __________ represents the design and structure of a complete software system that can be deployed on its own

  • This generally represents a layered architecture where each top layer makes use of the successive bottom layer

Explanation

Question 141 of 200

1

Poly Source

Select one or more of the following:

  • File-Based Source (core)

  • Streaming Source (core)

  • Fan-in ingress (optional)

  • Fan-out ingress (optional)

Explanation

Question 142 of 200

1

Relational Source

Select one or more of the following:

  • In a Big Data solution environment, quite often data needs to be imported from relational databases into the Big Data platform for various data analysis tasks

  • this can be enabled through the application of the ___________ design pattern, which involves the use of a relational data transfer engine

  • such storage devices implement functionality that automatically creates replicas of a dataset and copies them on multiple machines

  • Although expensive, _________ databases provide atomicity, consistency, isolation and durability (ACID) compliance while supporting the querying of data using Structured Query Language (SQL)

Explanation

Question 143 of 200

1

Relational Source

Select one or more of the following:

  • a relational data transfer engine is used to extract data from the relational database based on an SQL Query that internally uses connectors for connecting to different relational databases

  • The ___________ design pattern is generally applied when data needs to be extracted from internal OLTP systems, operational systems, such as CRM, ERP and SCM systems, or data warehouses

  • The ___________ pattern is associated with the data transfer engine (relational), storage device, workflow engine and productivity portal mechanisms

  • A productivity portal normally encapsulates a relational data transfer engine for point-and-click import

Explanation

Question 144 of 200

1

File-based Source

Select one or more of the following:

  • Finding hidden insights generally involves analyzing unstructured data, such as textual data, from internal as well as external data sources

  • Acquisition of large amounts of unstructured data from a variety of data sources can be automated through the application of the _____________ design pattern

  • is the process of compacting data in order to reduce its size, whereas decompression is the process of uncompacting data in order to bring the data back to its original size

  • the __________ provides functionality that ensures that the storage and access to data within the Big Data platform are managed throughout the lifespan of the data

Explanation

Question 145 of 200

1

File-based Source

Select one or more of the following:

  • A file data transfer engine is generally used to implement this design pattern that can further be encapsulated via the productivity portal

  • Apart from textual files, images, audio and video files can also be imported through the application of this design pattern

  • Note that the ____________ pattern also covers the acquisition of semi-structured data, such as XML or JSON-formatted data

  • The ___________ pattern is associated with the data transfer engine (file), storage device, workflow engine and productivity portal mechanisms

Explanation

Question 146 of 200

1

Streaming Source

Select one or more of the following:

  • Data flowing in at high speeds needs to be captured instantly so that it can be processed without any delay for obtaining maximum value

  • The _________ pattern is primarily implemented by using an event data transfer engine that is built on a publish-subscribe model and further uses a queue to ensure availability and reliability

  • A _________ can also be considered a superset of the traditional data architecture, as the former includes the development of data architecture for both raw and processed data

  • However, based on the physical implementation, the third-party tools used at the analysis layer may provide the ability to create visualizations and publish them for enterprise-wide use so that different information workers and business users can turn the published information into knowledge for making informed decisions

Explanation
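
The publish-subscribe model with queues mentioned above can be sketched in a few lines. This is a toy, single-process illustration under stated assumptions: the class `EventBroker` and its methods are hypothetical, and a real event data transfer engine would add persistence, partitioning and delivery guarantees to provide the availability and reliability the pattern calls for.

```python
from collections import defaultdict, deque


class EventBroker:
    """Toy broker: each subscriber to a topic gets its own queue of events."""

    def __init__(self):
        self._queues = defaultdict(list)   # topic -> list of subscriber queues

    def subscribe(self, topic):
        # The returned queue buffers events until the subscriber reads them.
        queue = deque()
        self._queues[topic].append(queue)
        return queue

    def publish(self, topic, event):
        # Every subscriber of the topic receives its own copy of the event.
        for queue in self._queues[topic]:
            queue.append(event)


broker = EventBroker()
dashboard = broker.subscribe("clicks")
archive = broker.subscribe("clicks")
broker.publish("clicks", {"user": "u1", "page": "/home"})
broker.publish("clicks", {"user": "u2", "page": "/cart"})
```

The queue decouples producers from consumers: a slow subscriber can fall behind without blocking the publisher or the other subscribers.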

Question 147 of 200

1

Streaming Source

Select one or more of the following:

  • The _________ pattern covers both human and machine-generated data and deals exclusively with unstructured and semi-structured data

  • The Realtime Access Storage design pattern is often applied in combination with the __________ design pattern when high velocity data needs to be analyzed in realtime

  • Based on the supported feature set, an event data transfer engine may provide some level of in-flight data cleansing and simple statistic computation, such as count, min, max functionality

  • The _________ pattern is associated with the data transfer engine (event), storage device (in-memory) and productivity portal mechanisms

Explanation
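
The publish-subscribe model, the queue, and the in-flight statistics (count, min, max) mentioned in this question can be sketched roughly as follows. This is only a minimal illustration, assuming a hypothetical in-process `EventTransferEngine` class; `queue.Queue` stands in for a real message broker, and the topic name and readings are made up.

```python
from collections import defaultdict
from queue import Queue


class EventTransferEngine:
    """Hypothetical, in-process stand-in for an event data transfer engine."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of subscriber queues
        # Simple statistics computed in flight, over all published events.
        self.stats = {"count": 0, "min": None, "max": None}

    def subscribe(self, topic):
        """Register a subscriber and return the queue it will read from."""
        q = Queue()
        self._subscribers[topic].append(q)
        return q

    def publish(self, topic, value):
        """Update the in-flight statistics, then queue the event for each subscriber."""
        s = self.stats
        s["count"] += 1
        s["min"] = value if s["min"] is None else min(s["min"], value)
        s["max"] = value if s["max"] is None else max(s["max"], value)
        for q in self._subscribers[topic]:
            q.put(value)


engine = EventTransferEngine()
inbox = engine.subscribe("sensor")
for reading in [21.5, 19.0, 23.2]:
    engine.publish("sensor", reading)

# The subscriber drains its queue in arrival order.
received = [inbox.get() for _ in range(inbox.qsize())]
```

The queue decouples the producer from the consumer, which is the property the question attributes to availability and reliability; a real engine would typically also persist or replicate the queue.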

Question 148 of 200

1

Poly Storage

Select one or more of the following:

  • The _________ compound pattern represents a part of a Big Data solution environment capable of storing high-volume, high-velocity and high-variety data for both streaming and random access

  • Random Access Storage (core)

  • Furthermore, the storage device automatically detects when a replica becomes unavailable and recreates the lost replica from one of the available replicas

  • Usual data processing techniques employed within a ___________ include data sharding and replication where large datasets are divided and replicated across multiple machines

Explanation

Question 149 of 200

1

Poly Storage

Select one or more of the following:

  • Streaming Access Storage (core)

  • Realtime Access Storage (core)

  • Automatic Data Replication and Reconstruction (core)

  • Data Size Reduction (optional)

Explanation

Question 150 of 200

1

Poly Storage

Select one or more of the following:

  • Cloud-based Big Data Storage (optional)

  • Confidential Data Storage (optional)

  • High Volume Binary Storage

  • Visualization Engine

Explanation

Question 151 of 200

1

Automatic Data Replication and Reconstruction

Select one or more of the following:

  • A Big Data platform generally consists of a cluster environment built using commodity hardware, which increases the chances of hardware failure

  • An entire dataset can be lost if the machine that saves the dataset becomes unavailable due to a hardware failure

  • With a ___________ architecture, in order to cope with greater resource demands for CPU and/or disk space, the only option is to scale up by replacing existing machines with higher-end (expensive) machines. Scaling up allows more processing and offers greater storage

  • For health monitoring purposes, the cluster manager gathers metrics from various components running within different layers, such as the storage, processing and analysis layers, and displays their current status using a dashboard

Explanation

Question 152 of 200

1

Automatic Data Replication and Reconstruction

Select one or more of the following:

  • To make sure that data is not lost and clients can still have access if there are hardware failures, the __________ pattern can be applied, which requires the use of either a distributed file system or a NoSQL database

  • Such storage devices implement functionality that automatically creates replicas of a dataset and copies them on multiple machines

  • Based on the supported feature set, an event data transfer engine may provide some level of in-flight data cleansing and simple statistic computation, such as count, min, max functionality

  • Note that the ____________ pattern also covers the acquisition of semi-structured data, such as XML or JSON-formatted data

Explanation

Question 153 of 200

1

Automatic Data Replication and Reconstruction

Select one or more of the following:

  • Furthermore, the storage device automatically detects when a replica becomes unavailable and recreates the lost replica from one of the available replicas

  • The __________ pattern is also applied whenever the Dataset Decomposition and Automatic Data Sharding patterns are applied

  • The __________ is associated with the storage device mechanism

  • Dataset Type: the underlying format of the data produced by the source (structured, unstructured or semi-structured)

Explanation
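
The behaviour described in this question (replicas copied to several machines, and a lost replica re-created from a surviving one) can be sketched as below. This is an illustrative toy under stated assumptions: `ReplicatedStore`, the machine names and the replication factor of 3 are hypothetical, and a real distributed file system or NoSQL database handles placement and failure detection itself.

```python
class ReplicatedStore:
    """Hypothetical storage device that keeps several replicas per dataset."""

    def __init__(self, machines, replication_factor=3):
        self.machines = {name: {} for name in machines}  # machine -> {key: data}
        self.replication_factor = replication_factor

    def put(self, key, data):
        """Copy the dataset onto the first `replication_factor` machines."""
        for name in list(self.machines)[: self.replication_factor]:
            self.machines[name][key] = data

    def fail(self, name):
        """Simulate a hardware failure, then re-create the replicas that were lost."""
        lost = self.machines.pop(name)
        for key in lost:
            holders = [m for m, d in self.machines.items() if key in d]
            spares = [m for m in self.machines if m not in holders]
            if holders and spares and len(holders) < self.replication_factor:
                # Rebuild from a surviving replica, not from the failed machine.
                self.machines[spares[0]][key] = self.machines[holders[0]][key]

    def get(self, key):
        """Read from any machine that still holds a replica."""
        for data in self.machines.values():
            if key in data:
                return data[key]
        raise KeyError(key)


store = ReplicatedStore(["m1", "m2", "m3", "m4"], replication_factor=3)
store.put("clicks", b"raw click log")
store.fail("m1")  # one replica is lost and rebuilt on the spare machine
replica_count = sum("clicks" in d for d in store.machines.values())
```

After the simulated failure the dataset is still readable through `store.get("clicks")`, and the replica count is back at the configured factor.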

Question 154 of 200

1

Data Size Reduction

Select one or more of the following:

  • In a Big Data solution environment where large amounts of data get accumulated in a short amount of time, storing data in its raw form can quickly consume available storage and may require continuous addition of storage devices to keep increasing storage capacity

  • Also the requirements of keeping all data online and maintaining redundant storage for fault-tolerance entail more storage space

  • Although the __________ is mainly concerned with policies that guide data management activities, it may further provide functionality for managing other aspects of the Big Data platform

  • To make sense of large amounts of data and to perform exploratory data analysis in support of finding meaningful insights, it is important to correctly interpret the results obtained from data analysis

Explanation

Question 155 of 200

1

Data Size Reduction

Select one or more of the following:

  • The _________ pattern is applied in these situations to reduce the storage footprint of data, make data transfer faster and decrease data storage cost

  • The application of the __________ pattern mainly requires the use of a compression engine

  • Although reducing storage footprint, the application of this pattern can increase the overall processing time, as data first needs decompressing. Hence, an efficient compression engine needs to be employed

  • The ________ pattern is associated with the compression engine, storage device, data transfer engine and processing engine mechanisms

Explanation
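
The trade-off stated in this question (smaller storage footprint in exchange for a decompression step before processing) can be seen in a few lines. This is a minimal sketch that uses Python's standard `zlib` module as a stand-in for a compression engine; the sample record and the compression level are arbitrary assumptions.

```python
import zlib

# Repetitive raw data, as is typical for logs, compresses well.
raw = b"2017-03-31,sensor-7,OK\n" * 10_000

compressed = zlib.compress(raw, level=6)  # what would be written to storage
restored = zlib.decompress(compressed)    # extra step needed before processing

ratio = len(compressed) / len(raw)        # fraction of the original size
```

The `decompress` call is the processing overhead the question refers to, which is why the choice of compression engine matters.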

Question 156 of 200

1

Data Size Reduction

Select one or more of the following:

  • With a reasonable amount of data acquisition, IT spending only increases slightly with the passage of time. As the amount of acquired data increases exponentially, there is a tendency for IT spending to increase exponentially as well

  • However, the storage capacity does not need to be increased proportionally if a data compression engine is introduced. As a result, IT spending only increases slightly

  • This prior knowledge about the structure of the data makes ___________ databases very fast at querying large datasets

  • The complete data processing cycle in Big Data environments consists of a number of activities, from data ingress to the computation of results and data egress

Explanation

Question 157 of 200

1

Random Access Storage

Select one or more of the following:

  • The _________ compound pattern represents a part of a Big Data solution environment capable of storing high-volume and high-variety data and making it available for indexed, ___________

  • Big Data solutions demand opposing access requirements when it comes to raw versus processed data

  • Complex ___________ may involve more than one data pipeline, for example, one for realtime data processing and the other for batch data processing

  • The _______ further helps reduce cluster administration overhead and makes diagnosis more efficient

Explanation

Question 158 of 200

1

Random Access Storage

Select one or more of the following:

  • Although raw data is normally accessed in sequential manner, processed data requires non-sequential access such that specific records, identified via a key or a field, can be accessed individually

  • To enable random write and read of data, the __________ pattern can be applied, which involves the use of a storage device in the form of a NoSQL database

  • High Volume Binary Storage

  • High Volume Tabular Storage

Explanation

Question 159 of 200

1

Random Access Storage

Select one or more of the following:

  • High Volume Linked Storage

  • High Volume Hierarchical Storage

  • Automatic Data Sharding

  • governance layer

Explanation

Question 160 of 200

1

High Volume Binary Storage

Select one or more of the following:

  • Storage of unstructured data generally involves access scenarios where partial updates of data are not performed and specific data items (records) are always accessed in their entirety, such as an image or user session data

  • Such data can be treated as a BLOB that is only accessible via a unique key

  • By acting as an intermediary, the ________ provides seamless access to the Big Data platform in a secure manner without the need for custom integration

  • Note that some level of post-processing may be required to put the file in the required format before it can be copied over the target location

Explanation

Question 161 of 200

1

High Volume Binary Storage

Select one or more of the following:

  • To provide efficient storage of such data, the ____________ pattern can be applied to stipulate the use of a storage device in the form of a key-value NoSQL database that services insert, select and delete operations

  • To achieve low latency data access, a memory-based storage device can be used instead; however, this increases the cost of setting up the Big Data platform

  • The ___________ pattern is associated with the storage device (key-value) and serialization engine mechanisms

  • The __________ is the underlying technology architecture that supports the execution of multiple Big Data solutions

Explanation
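
The insert, select and delete operations on BLOBs addressed by a unique key, together with the serialization step, might look like this in miniature. It is only a sketch: `KeyValueStore` is a hypothetical in-memory class, and JSON is assumed as the serialization format purely for illustration.

```python
import json


class KeyValueStore:
    """Hypothetical key-value storage device: values are opaque BLOBs."""

    def __init__(self):
        self._blobs = {}

    def insert(self, key, value):
        # Serialization step: turn the object into bytes before storing it.
        self._blobs[key] = json.dumps(value).encode("utf-8")

    def select(self, key):
        # The whole BLOB is read and deserialized; there is no partial read.
        return json.loads(self._blobs[key].decode("utf-8"))

    def delete(self, key):
        del self._blobs[key]


kv = KeyValueStore()
kv.insert("session:42", {"user": "jt", "cart": ["book", "pen"]})
session = kv.select("session:42")
kv.delete("session:42")
```

Note that there is no update operation: changing a value means inserting a complete new BLOB under the same key, which matches the access scenario described in the question.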

Question 162 of 200

1

High Volume Tabular Storage

Select one or more of the following:

  • In Big Data environments, large volume not only refers to tall datasets (a large number of rows) but also to wide datasets (a large number of columns)

  • In some cases, each column may itself contain a number of other columns

  • In a clustered environment, this can become cumbersome and difficult to maintain, especially where data is accessed across the enterprise with varying levels of authorization

  • The _________ pattern is associated with the data transfer engine (event), storage device (in-memory) and productivity portal mechanisms

Explanation

Question 163 of 200

1

High Volume Tabular Storage

Select one or more of the following:

  • A relational database cannot be used in such circumstances due to a limit on columns and the inability to store more than one value in a column

  • The ___________ pattern can be applied to store such data, which stipulates the use of a storage device implemented via a column-family NoSQL database servicing insert, select, update and delete operations

  • The functionality supported by this layer relates to the operational requirements of a Big Data platform, including cluster setup, cluster expansion, system and software upgrades across the cluster and fault diagnosis and health monitoring of the cluster

  • The term logical emphasizes the fact that the description of the architecture does not bear any resemblance to the physical implementation of the system

Explanation

Question 164 of 200

1

High Volume Tabular Storage

Select one or more of the following:

  • The use of a column-family database enables storing data in more traditional, table-like storage, where each record may further consist of logical groups of fields that are generally accessed together

  • This pattern is associated with the storage device (column-family) and serialization engine mechanisms

  • a file data transfer engine is used directly or indirectly through the productivity portal for ad-hoc exports

  • Therefore ______ is ideal for processing semi-structured and unstructured data in support of executing analytical queries

Explanation

Question 165 of 200

1

High Volume Linked Storage

Select one or more of the following:

  • One prominent area within the field of pattern identification is the analysis of connected entities. Due to the large volume of data in Big Data environments, efficient and timely analysis of such data requires specialized storage

  • The _________ pattern can be applied to store data consisting of linked entities. This pattern is typically implemented via the use of a storage device based on a graph NoSQL database that enables defining relationships between entities

  • However, a __________ goes beyond data curation and includes data processing, analysis and visualization technologies as well

  • The _________ pattern is associated with the processing engine, storage device, workflow engine, resource manager and coordination engine mechanisms

Explanation

Question 166 of 200

1

High Volume Linked Storage

Select one or more of the following:

  • The use of graph NoSQL databases enables finding clusters of connected entities among a very large set of entities, investigating if entities are connected together or calculating distances between entities

  • The ________ pattern is associated with the storage device (graph) and serialization engine mechanisms

  • The ________ pattern is associated with the query engine, processing engine, storage device, resource manager and coordination engine mechanisms

  • Due to the nature of the deployed processing engine, it may not be possible to execute the entire logic as a single processing run. Even if it were possible to do so, the testing, debugging and maintenance of the logic may become difficult

Explanation

Question 167 of 200

1

High Volume Hierarchical Storage

Select one or more of the following:

  • Semi-structured data conforming to a nested schema often requires storage in a way such that the schema structure is maintained and sub-sections of a particular data item (record) can be individually accessed and updated

  • The ________ pattern can be applied in circumstances where data represents a document-like structure that is self-describing and access to individual elements of data is required

  • A _______ may also integrate with enterprise identity and access management (IAM) systems to enable single sign-on (SSO)

  • A __________ can be defined for varying levels of software artifacts ranging from a single software library to the set of software systems across the entire IT enterprise

Explanation

Question 168 of 200

1

High Volume Hierarchical Storage

Select one or more of the following:

  • This pattern requires the use of a storage device implemented via a document NoSQL database servicing insert, select, update and delete operations. The document NoSQL database generally automatically encodes the data using a binary or a plain-text hierarchical format, such as JSON, before storage

  • The ________ pattern is associated with the storage device (document) mechanisms

  • This includes exporting results to dashboard and alerting applications (online portal), operational systems (CRM, SCM, ERP and e-commerce systems) and automated business processes (Business Process Execution Language-based processes)

  • The _______ pattern is normally applied together with the Automatic Data Replication and Reconstruction pattern so that shards are not lost in the case of a hardware failure and so that the database remains available

Explanation

Question 169 of 200

1

Automatic Data Sharding

Select one or more of the following:

  • Storing very large datasets that are accessed by a number of users simultaneously can seriously affect the data access performance of the underlying database

  • To counter this issue, the dataset is horizontally broken into smaller parts as prescribed by the ___________ pattern

  • The term logical emphasizes the fact that the description of the architecture does not bear any resemblance to the physical implementation of the system

  • the distributed file system automatically splits a large dataset into multiple smaller datasets that are then spread across the cluster

Explanation

Question 170 of 200

1

Automatic Data Sharding

Select one or more of the following:

  • This pattern is enabled via a NoSQL database that automatically creates shards based on a configurable field in the dataset and stores the shards across different machines in a cluster

  • As the dataset is distributed across multiple shards, the query completion time may be affected if the query requires collating data from more than one shard

  • Such a solution environment represents a set of multiple Big Data mechanisms that collectively provide the required business functionality

  • The functionality provided by this layer corresponds to the data analysis stage of the Big Data analysis lifecycle

Explanation
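
Sharding on a configurable field, and the extra cost of a query that must collate data from more than one shard, can be sketched as follows. The shard count, the `customer_id` field and the CRC32-based placement are assumptions chosen for the example; a NoSQL database applies its own partitioning scheme.

```python
import zlib

NUM_SHARDS = 3  # assumption: one shard per machine in a three-node cluster


def shard_for(record, shard_key="customer_id"):
    """Pick a shard from a stable hash of the configurable shard key."""
    return zlib.crc32(str(record[shard_key]).encode("utf-8")) % NUM_SHARDS


shards = {i: [] for i in range(NUM_SHARDS)}
records = [
    {"customer_id": cid, "amount": amt}
    for cid, amt in [(1, 10.0), (2, 5.0), (3, 7.5), (1, 2.5), (4, 1.0)]
]
for r in records:
    shards[shard_for(r)].append(r)

# A lookup by shard key only has to touch a single shard.
target = shard_for({"customer_id": 1})
one_customer = [r for r in shards[target] if r["customer_id"] == 1]

# A query without the shard key has to collate results from every shard.
total = sum(r["amount"] for shard in shards.values() for r in shard)
```

The second query is the case the question warns about: its completion time depends on the slowest shard because every shard has to be consulted.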

Question 171 of 200

1

Automatic Data Sharding

Select one or more of the following:

  • The _______ pattern is normally applied together with the Automatic Data Replication and Reconstruction pattern so that shards are not lost in the case of a hardware failure and so that the database remains available

  • The _______ pattern is associated with the storage device mechanism

  • Based on the supported feature set, an event data transfer engine may provide some level of in-flight data cleansing and simple statistic computation, such as count, min, max functionality

  • To make sure that data is not lost and clients can still have access if there are hardware failures, the __________ pattern can be applied, which requires the use of either a distributed file system or a NoSQL database

Explanation

Question 172 of 200

1

Streaming Access Storage

Select one or more of the following:

  • The _______ compound pattern represents a part of a Big Data solution environment capable of storing high-volume and high-variety data and making it available for stream access

  • A large proportion of data processing tasks in Big Data involves acquiring and processing data in batches. When processing data in batches, sequential access to data is critical to timely processing. Therefore a storage device does not need to provide random access to the data, but rather ___________

  • This helps with establishing the scalability requirements of each Big Data mechanism and determining any potential performance bottlenecks

  • Once the limit is reached, the only option is to scale out. Scaling out is a Big Data processing requirement that is not supported by _________

Explanation

Question 173 of 200

1

Streaming Access Storage

Select one or more of the following:

  • Streaming Storage

  • Dataset Decomposition

  • File-Based Sink

  • Cloud-based Big Data Storage (optional)

Explanation

Question 174 of 200

1

Streaming Storage

Select one or more of the following:

  • The _______ pattern can be applied in a scenario whereby data needs to be retrieved in a streaming or sequential manner

  • The application of this design pattern requires the use of a storage device that provides non-random write and read capabilities and is generally implemented via a distributed file system

  • However, with the passage of time, as more Big Data solutions are built and their complexity increases, additional Big Data mechanisms are introduced

  • Although raw data is normally accessed in sequential manner, processed data requires non-sequential access such that specific records, identified via a key or a field, can be accessed individually

Explanation

Question 175 of 200

1

Streaming Storage

Select one or more of the following:

  • The _______ pattern is normally applied together with the Large-Scale Batch Processing pattern as part of a complete solution

  • The _______ pattern is associated with the storage device (distributed file system) and processing engine (batch) mechanisms

  • Based on the type and location of the data sources, this layer may consist of more than one data transfer engine mechanism

  • This layer abstracts the processing layer with a view of making data analysis easier and further increasing the reach of the Big Data platform to data scientists and data analysts

Explanation

Question 176 of 200

1

Dataset Decomposition

Select one or more of the following:

  • Storing large datasets as a single file does not lend itself to the distributed processing technologies deployed within the Big Data solution environment

  • Distributed processing technologies work on the principle of divide-and-conquer, requiring a dataset to be available as parts across the cluster

  • The _______ compound pattern represents a part of a Big Data solution environment capable of storing high-volume and high-variety data and making it available for stream access

  • Although reducing storage footprint, the application of this pattern can increase the overall processing time, as data first needs decompressing. Hence, an efficient compression engine needs to be employed

Explanation

Question 177 of 200

1

Dataset Decomposition

Select one or more of the following:

  • This can be achieved through the application of the __________ pattern, which requires the use of a distributed file system storage device

  • the distributed file system automatically splits a large dataset into multiple smaller datasets that are then spread across the cluster

  • The _________ pattern is associated with the storage device and processing engine mechanisms

  • This pattern is associated with the storage device (column-family) and serialization engine mechanisms

Explanation
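
How a distributed file system might split one large dataset into smaller parts and spread them across a cluster can be illustrated with a toy function. The block size, the node names and the round-robin placement are assumptions made for this sketch and do not describe any particular file system.

```python
def decompose(data, block_size, nodes):
    """Split `data` into fixed-size blocks and assign them to nodes round-robin."""
    placement = {node: [] for node in nodes}
    for offset in range(0, len(data), block_size):
        node = nodes[(offset // block_size) % len(nodes)]
        placement[node].append(data[offset:offset + block_size])
    return placement


dataset = b"x" * 1000
placement = decompose(dataset, block_size=256, nodes=["node-a", "node-b", "node-c"])
block_count = sum(len(blocks) for blocks in placement.values())
```

With the parts available on different nodes, a processing engine can work on them in parallel, which is the divide-and-conquer principle the pattern supports.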

Question 178 of 200

1

Big Data Processing Environment

Select one or more of the following:

  • The ___________ compound pattern represents a part of a Big Data solution environment capable of handling the range of distinct requirements of large-scale Big Data dataset processing

  • Large-Scale Batch Processing (core)

  • In some cases, each column may itself contain a number of other columns

  • Additionally, a __________ may encapsulate a visualization engine in order to provide more meaningful, graphical views of data

Explanation

Question 179 of 200

1

Big Data Processing Environment

Select one or more of the following:

  • Large-Scale Graph Processing (core)

  • High Velocity Realtime Processing (core)

  • Data Size Reduction

  • Automatic Data Sharding

Explanation

Question 180 of 200

1

Big Data Processing Environment

Select one or more of the following:

  • Intermediate Results Storage (optional)

  • Processing Abstraction (optional)

  • Dataset Decomposition

  • data sources layer

Explanation

Question 181 of 200

1

Big Data Processing Environment

Select one or more of the following:

  • Automated Processing Metadata Insertion (optional)

  • Complex Logic Decomposition (optional)

  • Cloud-based Big Data Processing (optional)

  • High Volume Tabular Storage

Explanation

Question 182 of 200

1

Large-Scale Batch Processing

Select one or more of the following:

  • One of the main differentiating characteristics of Big Data environments when compared with traditional data processing environments is the sheer amount of data that needs to be processed

  • Efficient processing of large amounts of data demands an offline processing strategy, as dictated by the ___________ design pattern

  • the ______ supports the deployment of new services and the addition of nodes to a cluster

  • Note that in the case of realtime data processing, the ________ also consists of in-memory storage technologies that enable fast analysis of high velocity data as it arrives

Explanation

Question 183 of 200

1

Large-Scale Batch Processing

Select one or more of the following:

  • The application of the __________ pattern enforces processing of the entire dataset as a single processing run, which requires that the batch of data is amassed first in a storage device and only then processed using a batch processing engine, such as MapReduce

  • Although computed results are not immediately available, the application of this pattern enables a simple data processing solution, providing maximum throughput

  • A __________ defines the logical components required for the implementation of a Big Data analytics solution

  • A _______ may also integrate with enterprise identity and access management (IAM) systems to enable single sign-on (SSO)

Explanation

Question 184 of 200

1

Large-Scale Batch Processing

Select one or more of the following:

  • In the case of continuously arriving data, data is first accumulated to create a batch of data and only then processed

  • This design pattern is generally applied together with the Stream Access Storage pattern

  • The _________ pattern is associated with the processing engine (batch), data transfer engine (relational/file), storage device (disk-based), resource manager and coordination engine mechanisms

  • the set of operations that need to be performed on the data is specified as a flowchart that is then automatically executed by the workflow engine at set intervals

Explanation

Question 185 of 200

1

Complex Logic Decomposition

Select one or more of the following:

  • Computing results for certain data processing jobs involves executing _________, such as finding the customer with the maximum spend amount based on transaction data for a large number of customers

  • Due to the nature of the deployed processing engine, it may not be possible to execute the entire logic as a single processing run. Even if it were possible to do so, the testing, debugging and maintenance of the logic may become difficult

  • However, the nature of such visualizations is different and more analysis-specific

  • The _______ compound pattern represents a part of a Big Data solution environment capable of storing high-volume and high-variety data and making it available for stream access

Explanation

Question 186 of 200

1

Complex Logic Decomposition

Select one or more of the following:

  • In such a situation, the _________ pattern can be applied, which requires dividing the ________ into multiple simple steps. This is executed over multiple processing runs

  • Generally, these multiple processing runs are connected together using the provided functionality within the processing engine or through further application of the Automated Dataset Execution pattern

  • The _________ pattern is associated with the processing engine, storage device, workflow engine, resource manager and coordination engine mechanisms

  • makes use of commodity hardware where machines are generally networked using Local Area Network (LAN) technology

Explanation
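
The example used for this pattern, finding the customer with the maximum spend, can be broken into two simple processing runs. The sketch below is illustrative only: the function names and the sample transactions are hypothetical, and plain Python functions stand in for runs on a processing engine.

```python
from collections import defaultdict

# Made-up transaction data: (customer, amount).
transactions = [("alice", 30.0), ("bob", 45.0), ("alice", 25.0), ("carol", 50.0)]


def run_1_total_per_customer(rows):
    """First processing run: aggregate the spend per customer."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    return dict(totals)


def run_2_top_customer(totals):
    """Second processing run: pick the maximum from the intermediate result."""
    return max(totals.items(), key=lambda item: item[1])


# In a real environment the intermediate result would be stored between runs.
intermediate = run_1_total_per_customer(transactions)
top_customer = run_2_top_customer(intermediate)
```

Each run can be tested and debugged on its own, which is the maintainability benefit the question points to.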

Question 187 of 200

1

Processing Abstraction

Select one or more of the following:

  • Processing Big Data datasets involves the use of processing engines that need programmatic skills in order to work with them

  • Due to the contemporary nature of these processing engines and the specialized processing frameworks they follow, programmers may not be conversant with the APIs of each processing engine

  • A ________ stores data in the form of connected entities where each record is called a node or a vertex and the connection between the entities is called the edge, which can be one-way or two-way

  • Due to the requirement of integrating with multiple types of components, an interoperable and extensible cluster manager should be chosen

Explanation

Question 188 of 200

1

Processing Abstraction

Select one or more of the following:

  • To make data processing easier by not having to deal with the intricacies of processing engines, the ________ pattern can be applied, which uses a query engine to abstract away the underlying processing engine

  • The query engine provides an easy-to-interact-with interface where the user specifies a script that is automatically converted to low-level API calls for the required processing engine

  • This works well for Big Data where a single dataset may be divided across several machines due to its volume

  • a file data transfer engine is used directly or indirectly through the productivity portal for ad-hoc exports

Explanation
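
The idea of a query engine that turns a short script into low-level processing engine calls can be shown with a deliberately tiny example. Everything here is an assumption made for illustration: the two-word script syntax, `run_query`, and the `engine_map` / `engine_reduce` functions that play the role of a processing engine API.

```python
from functools import reduce


# Low-level "processing engine" API (hypothetical): explicit map and reduce steps.
def engine_map(rows, fn):
    return [fn(row) for row in rows]


def engine_reduce(values, fn, initial):
    return reduce(fn, values, initial)


def run_query(script, rows):
    """Tiny query engine: translate 'SUM <field>' or 'COUNT <field>' into engine calls."""
    op, field = script.split()
    values = engine_map(rows, lambda row: row[field])
    if op.upper() == "SUM":
        return engine_reduce(values, lambda a, b: a + b, 0)
    if op.upper() == "COUNT":
        return engine_reduce(values, lambda a, _: a + 1, 0)
    raise ValueError(f"unsupported operation: {op}")


rows = [{"amount": 10}, {"amount": 5}, {"amount": 7}]
total = run_query("SUM amount", rows)
count = run_query("COUNT amount", rows)
```

The user only writes the script; the map and reduce calls stay hidden, which is what lets data analysts work without knowing the engine's API.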

Question 189 of 200

1

Processing Abstraction

Select one or more of the following:

  • The application of the __________ pattern further increases the reach of the Big Data solution environment to non-IT users, such as data analysts and data scientists

  • The ________ pattern is associated with the query engine, processing engine, storage device, resource manager and coordination engine mechanisms

  • For internal structured data sources, a relational data transfer engine can be used

  • The _______ pattern is associated with the storage device mechanism

Explanation

Question 190 of 200

1

Poly Sink

Select one or more of the following:

  • The ________ compound pattern represents a part of a Big Data solution environment capable of egressing high-volume, high-velocity and high-variety data out of the Big Data solution environment

  • Relational Sink

  • Depending on the required functionality, a _______ represents a partial or a complete Big Data solution in support of Big Data analysis

  • The architecture of the entire Big Data platform that enables the execution of multiple Big Data solutions

Explanation

Question 191 of 200

1

Poly Sink

Select one or more of the following:

  • File-based Sink

  • Streaming Egress

  • Poly Source

  • Streaming Storage

Explanation

Question 192 of 200

1

Relational Sink

Select one or more of the following:

  • The majority of enterprise IT systems use relational databases as their storage backends

  • However, the method of incorporating data analysis results from a Big Data solution into such systems by first exporting results as a delimited file and then importing into the relational database takes time, is error-prone and is not a scalable solution

  • An enterprise generally starts off or is already at the descriptive or diagnostic analytics maturity level and aims to move towards the predictive or prescriptive analytics maturity level

  • The ________ compound pattern represents a part of a Big Data solution environment capable of egressing high-volume, high-velocity and high-variety data out of the Big Data solution environment

Explanation

Question 193 of 200

1

Relational Sink

Select one or more of the following:

  • the ________ pattern can be applied for directly exporting processed data to a relational database, which requires the use of a relational data transfer engine

  • Instead of directly using the data transfer engine, it can be indirectly invoked via a productivity portal which normally denotes ad-hoc usage

  • The _________ compound pattern represents a fundamental solution environment comprised of a processing ______ with data ingress, storage, processing and egress capabilities

  • Storing very large datasets that are accessed by a number of users simultaneously can seriously affect the data access performance of the underlying database

Explanation

Question 194 of 200

1

Relational Sink

Select one or more of the following:

  • A workflow engine can be used to automate the whole process and to perform data export at regular intervals

  • The _________ pattern is associated with the data transfer engine (relational), storage device, processing engine, productivity portal and workflow engine mechanisms

  • The resulting architecture is known as the __________, which includes the architecture of the Big Data solution, any connected enterprise systems and integration components

  • a file data transfer engine is used directly or indirectly through the productivity portal for ad-hoc exports

Explanation

Question 195 of 200

1

File-based Sink

Select one or more of the following:

  • On some occasions, data analysis results from a Big Data solution need to be incorporated into enterprise IT systems that use proprietary storage technologies, such as an embedded database or ________ storage, rather than using a relational database and provide a ________ import method

  • Like the Relational Sink pattern, manual export from the Big Data platform and import into such systems is not a viable solution

  • The typical makeup of a __________ includes the storage layer, processing layer, analysis layer and visualization layer

  • _________, a framework for processing data, requires interaction via a general purpose programming language, such as Java

Explanation

Question 196 of 200

1

File-based Sink

Select one or more of the following:

  • The ________ pattern can be applied to automatically export data from the Big Data platform as a delimited or a hierarchical file

  • A file data transfer engine is used directly or indirectly through the productivity portal for ad-hoc exports

  • ___________-based processing platforms, such as Hadoop, do not require knowledge of data structure at load time.

  • As a result, data compression can be used to effectively increase the storage capacity of disk/memory space. In turn, this helps to reduce storage cost

Explanation
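
To make the two export formats mentioned above concrete, the following sketch writes the same records once as a delimited file and once as a hierarchical file. It is only an illustration: the helper names `to_delimited` and `to_hierarchical`, the sample records and the choice of CSV and JSON as the concrete formats are assumptions, and the output is kept in memory instead of being copied to a target location.

```python
import csv
import io
import json


def to_delimited(records, fieldnames, delimiter=","):
    """Serialize records as delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_hierarchical(records, root="results"):
    """Serialize records as a hierarchical (JSON) document under one root key."""
    return json.dumps({root: records}, indent=2)


# Assumed analysis results produced by the Big Data platform.
records = [
    {"region": "north", "total": 120.5},
    {"region": "south", "total": 98.0},
]

delimited_text = to_delimited(records, ["region", "total"])
hierarchical_text = to_hierarchical(records)
```

A real file data transfer engine would also write the result to the target system's import location; any post-processing needed to match the required format would happen before that copy.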

Question 197 of 200

1

File-based Sink

Select one or more of the following:

  • For regular exports, the file data transfer engine can be configured via a workflow engine to run at regular intervals

  • Note that some level of post-processing may be required to put the file in the required format before it can be copied over to the target location

  • The _________ pattern is associated with the data transfer engine (file), storage device, processing engine, productivity portal and workflow engine mechanisms

  • Although providing instantaneous results, setting up such a capability is not only complex but also expensive due to the reliance on memory-based storage (memory is more expensive than disk)

Explanation

Question 198 of 200

1

Automated Dataset Execution

Select one or more of the following:

  • The complete data processing cycle in Big Data environments consists of a number of activities, from data ingress to the computation of results and data egress

  • Furthermore, in a production environment, the complete cycle needs to be repeated over and over again

  • Instead of persisting the data to a disk-based storage device, _________ persists the data to a memory-based storage device

  • Based on the type and location of the data sources, this layer may consist of more than one data transfer engine mechanism

Explanation

Question 199 of 200

1

Automated Dataset Execution

Select one or more of the following:

  • Performing data processing activities manually is time-consuming and an inefficient use of development resources

  • To enable the automatic execution of data processing activities, the _________ pattern can be applied by implementing a workflow engine

  • However, if internal data is also integrated, the same solution will provide business-specific results

  • As a result, data compression can be used to effectively increase the storage capacity of disk/memory space. In turn, this helps to reduce storage cost

Explanation

Question 200 of 200

1

Automated Dataset Execution

Select one or more of the following:

  • The set of operations that need to be performed on the data is specified as a flowchart that is then automatically executed by the workflow engine at set intervals

  • This pattern can also be applied together with the Complex Logic Decomposition pattern to automate the execution of multiple processing jobs

  • The ________ pattern is associated with the workflow engine, data transfer engine, storage device, processing engine, query engine and productivity portal mechanisms

  • Note that some level of post-processing may be required to put the file in the required format before it can be copied over to the target location

Explanation
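
The Automated Dataset Execution idea above (a set of operations specified once and then executed automatically, cycle after cycle) can be sketched as a tiny workflow runner. This is an illustration under stated assumptions, not a real workflow engine: the step functions `ingest`, `process` and `egress`, the `run_workflow` and `run_repeatedly` helpers and the sample data are all hypothetical, and the "set intervals" are simulated by a fixed number of cycles instead of a timer or scheduler.

```python
def ingest(state):
    """Data ingress: load the raw dataset (hard-coded sample data here)."""
    state["raw"] = [3, 1, 2]
    return state


def process(state):
    """Processing: compute a result from the raw data."""
    state["result"] = sorted(state["raw"])
    return state


def egress(state):
    """Data egress: hand the result over for export."""
    state["exported"] = list(state["result"])
    return state


def run_workflow(steps, state=None):
    """Execute the steps in order, passing shared state from one to the next."""
    state = {} if state is None else state
    executed = []
    for step in steps:
        state = step(state)
        executed.append(step.__name__)
    return state, executed


def run_repeatedly(steps, cycles):
    """Repeat the whole cycle; a real engine would trigger this on a schedule."""
    return [run_workflow(steps)[0]["exported"] for _ in range(cycles)]


# The "flowchart" reduced to an ordered list of steps.
WORKFLOW = [ingest, process, egress]

final_state, executed = run_workflow(WORKFLOW)
```

Keeping the workflow as plain data (an ordered list of steps) is what lets the same definition be rerun unchanged for every cycle, which is the manual effort the pattern is meant to remove.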