Test by kmellzie, created more than 1 year ago

This is a preparation Test for Cloudera Hadoop Developer Certification.

Created by kmellzie almost 9 years ago

Cloudera Hadoop Developer

Question 1 of 9

Combiners increase the efficiency of a MapReduce program because

Select one of the following possible answers:

  • They provide a mechanism for different mappers to communicate with each other, thereby reducing synchronization overhead.

  • They provide an optimization and reduce the total number of computations that are needed to execute an algorithm by a factor of N, where N is the number of reducers.

  • They aggregate intermediate map output locally on each individual machine and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.

  • They aggregate intermediate map output from a small number of nearby (i.e. rack-local) machines and therefore reduce the amount of data that needs to be shuffled across the network to the reducers.

Explanation

Question 2 of 9

In a large MapReduce job with M mappers and R reducers, how many distinct copy operations will there be in the sort/shuffle phase?

Select one of the following possible answers:

  • M

  • R

  • M + R

  • M x R

  • M ** R

Explanation
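
One way to reason about this question: during the shuffle, each reducer fetches its own partition from the output of every mapper, so each (mapper, reducer) pair is one potential copy. The sketch below simply enumerates those pairs. The function name `shuffle_copies` is hypothetical, and the model is simplified: it ignores empty partitions, retries, and speculative execution.

```python
from itertools import product


def shuffle_copies(num_mappers, num_reducers):
    """Return every (mapper, reducer) pair; each pair is one potential copy."""
    return list(product(range(num_mappers), range(num_reducers)))


# Hypothetical job with 4 mappers and 3 reducers.
copies = shuffle_copies(4, 3)
print(len(copies))  # 12
```

Under this model the count grows as the product of the two numbers, not their sum.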

Question 3 of 9

What happens in a MapReduce job when you set the number of reducers to one?

Select one of the following possible answers:

  • A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.

  • A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.

  • Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.

  • Setting the number of reducers to one is invalid, and an exception is thrown.

Explanation
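
A toy model of what a single reducer implies: keys are assigned to reduce tasks by hashing the key modulo the number of reducers, so with one reducer every key lands in partition 0 and the job produces one reducer output file. This sketch uses Python's `zlib.crc32` as a stand-in for Java's `hashCode()`, and `partition_for` and `output_files` are hypothetical helper names. The `part-r-NNNNN` naming follows the pattern used for reducer output in the newer MapReduce API.

```python
import zlib


def partition_for(key, num_reducers):
    """Pick a reduce task for a key: stable hash modulo number of reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def output_files(num_reducers):
    """One output file per reduce task."""
    return [f"part-r-{i:05d}" for i in range(num_reducers)]


keys = ["apple", "banana", "cherry", "date"]
partitions = {partition_for(k, 1) for k in keys}
print(partitions)       # {0}
print(output_files(1))  # ['part-r-00000']
```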

Question 4 of 9

In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?

Select one of the following possible answers:

  • Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster

  • Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run

  • Because combiners perform local aggregation of word counts, and then transfer data to reducers without writing the intermediate data on disk

  • Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers

Explanation
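
To make the effect of a combiner concrete, here is a minimal simulation of word count in plain Python (not Hadoop code). Each input line stands in for the input of one mapper; `map_words`, `combine`, and `reduce_counts` are hypothetical names. It counts how many key-value pairs would leave the map side with and without local aggregation. Note that Hadoop treats the combiner as an optimization and may run it zero, one, or several times, so the real saving can vary.

```python
from collections import Counter


def map_words(line):
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner: sum the counts per word on the map side."""
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def reduce_counts(all_pairs):
    """Reducer: sum the counts per word across all mappers."""
    totals = Counter()
    for pairs in all_pairs:
        for word, n in pairs:
            totals[word] += n
    return totals


# One line per (simulated) mapper.
mapper_inputs = [
    "the quick brown fox the lazy dog the end",
    "the fox and the dog",
]

without_combiner = [map_words(line) for line in mapper_inputs]
with_combiner = [combine(pairs) for pairs in without_combiner]

shuffled_without = sum(len(pairs) for pairs in without_combiner)
shuffled_with = sum(len(pairs) for pairs in with_combiner)
print(shuffled_without, shuffled_with)  # 14 11

# The final word counts are identical either way.
print(reduce_counts(without_combiner) == reduce_counts(with_combiner))  # True
```

The saving grows with the amount of repetition inside each mapper's input; on this tiny sample it is only 3 pairs.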

Question 5 of 9

Which two of the following are valid statements? (Select two)

Select one or more of the following possible answers:

  • HDFS is optimized for storing a large number of files smaller than the HDFS block size

  • HDFS has the characteristic of supporting a "write once, read many" data access model

  • HDFS is a distributed file system that replaces ext3 or ext4 on Linux nodes in a Hadoop cluster

  • HDFS is a distributed file system that runs on top of native OS filesystems and is well suited to storage of very large data sets

Explanation

Question 6 of 9

You need to create a GUI application to help your company's salespeople add and edit customer information. Would HDFS be appropriate for this customer information file?

Select one of the following possible answers:

  • Yes, because HDFS is optimized for random access writes

  • Yes, because HDFS is optimized for fast retrieval of relatively small amounts of data

  • No, because HDFS can only be accessed by MapReduce applications

  • No, because HDFS is optimized for write-once, streaming access for relatively large files

Explanation

Question 7 of 9

Which of the following describes how a client reads a file from HDFS?

Select one of the following possible answers:

  • The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s)

  • The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode

  • The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode

  • The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client

Explanation
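
The HDFS read path separates metadata from data: the NameNode only answers "where are the blocks?", and the file bytes travel directly between the client and the DataNodes. The toy model below illustrates that split. It is a sketch, not the HDFS client API; the classes, the `read_file` helper, the block IDs, and the path `/demo.txt` are all made up for illustration.

```python
class DataNode:
    """Stores block contents; knows nothing about file names."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Stores metadata only: path -> ordered list of (block id, holders)."""

    def __init__(self):
        self.block_map = {}

    def get_block_locations(self, path):
        return self.block_map[path]


def read_file(path, namenode, datanodes):
    """Client side: ask the NameNode for locations, then read from DataNodes."""
    data = b""
    for block_id, holders in namenode.get_block_locations(path):
        # Read from the first listed replica; no bytes pass through the NameNode.
        data += datanodes[holders[0]].read_block(block_id)
    return data


datanodes = {name: DataNode(name) for name in ("dn1", "dn2")}
datanodes["dn1"].blocks["blk_1"] = b"hello "
datanodes["dn2"].blocks["blk_2"] = b"world"

namenode = NameNode()
namenode.block_map["/demo.txt"] = [("blk_1", ["dn1"]), ("blk_2", ["dn2"])]

print(read_file("/demo.txt", namenode, datanodes))  # b'hello world'
```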

Question 8 of 9

Which of the following statements best describes how a large (100 GB) file is stored in HDFS?

Select one of the following possible answers:

  • The file is divided into variable-size blocks, which are stored on multiple DataNodes. Each block is replicated three times by default

  • The file is replicated three times by default. Each copy of the file is stored on a separate DataNode

  • The master copy of the file is stored on a single DataNode. The replica copies are divided into fixed-size blocks, which are stored on multiple DataNodes

  • The file is divided into fixed-size blocks, which are stored on multiple DataNodes. Each block is replicated three times by default. Multiple blocks from the same file might reside on the same DataNode

  • The file is divided into fixed-size blocks, which are stored on multiple DataNodes. Each block is replicated three times by default. HDFS guarantees that different blocks from the same file are never on the same DataNode

Explanation
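
Some quick arithmetic helps with this question. The sketch below assumes a 128 MB block size (the default differs between Hadoop versions, so treat it as an assumption) and a hypothetical cluster of 10 DataNodes; the replication factor of 3 comes from the answer options. `block_count` is a hypothetical helper.

```python
GB = 1024 ** 3
MB = 1024 ** 2


def block_count(file_size_bytes, block_size_bytes):
    """Number of fixed-size blocks needed; the last block may be partial."""
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


file_size = 100 * GB
block_size = 128 * MB  # assumption: configurable, varies by Hadoop version
replication = 3        # default stated in the answer options
num_datanodes = 10     # assumption: hypothetical cluster size

blocks = block_count(file_size, block_size)
replicas = blocks * replication
print(blocks, replicas)  # 800 2400

# Pigeonhole: with far more block replicas than DataNodes, at least one
# DataNode must hold this many replicas belonging to the same file.
min_on_busiest_node = (replicas + num_datanodes - 1) // num_datanodes
print(min_on_busiest_node)  # 240
```

With hundreds of blocks and only a handful of DataNodes, a guarantee that no two blocks of one file ever share a DataNode could not hold.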

Question 9 of 9

Your cluster has 10 DataNodes, each with a single 1 TB hard drive. You utilize all your disk capacity for HDFS, reserving none for MapReduce. You implement default replication settings.

What is the storage capacity of your Hadoop cluster (assuming no compression)?

Select one of the following possible answers:

  • About 3 TB

  • About 5 TB

  • About 10 TB

  • About 11 TB

Explanation
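
The arithmetic behind this question can be sketched as raw disk capacity divided by the replication factor, since every block is stored that many times. The function name `usable_capacity_tb` is hypothetical, and the estimate ignores filesystem overhead and any space reserved for non-HDFS use.

```python
def usable_capacity_tb(num_datanodes, disk_tb_per_node, replication=3):
    """Approximate usable HDFS capacity: raw capacity / replication factor."""
    raw_tb = num_datanodes * disk_tb_per_node
    return raw_tb / replication


# 10 DataNodes with 1 TB each, default replication factor of 3.
print(round(usable_capacity_tb(10, 1), 2))  # 3.33
```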