Quiz by Eladio Rocha, created 18 days ago

This quiz covers the core concepts of data lakes and their implementation in big data systems, with a focus on AWS tools like S3, Glue, Athena, and Redshift Spectrum. It explores how to structure, query, and optimize data in distributed file storage systems, emphasizing practical design considerations and performance optimization strategies.

Data Lakes and Big Data Systems

Question 1 of 9

What is a "data lake"?

Select one of the following:

  • A formal database for structured data.

  • A distributed file storage system containing raw, unstructured data.

  • A highly redundant server-based database system.

  • A collection of relational database schemas.

Explanation

Question 2 of 9

Which cloud service is commonly used to implement a data lake?

Select one of the following:

  • Amazon RDS

  • Amazon S3

  • Amazon DynamoDB

  • Amazon EC2

Explanation

Question 3 of 9

What is the primary purpose of AWS Glue in the context of a data lake?

Select one of the following:

  • To store data redundantly across regions.

  • To provide a SQL interface for querying raw data.

  • To crawl unstructured data and define schemas.

  • To optimize database queries for performance.

Explanation
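To make the idea of crawling raw files and defining a schema concrete, here is a minimal sketch in plain Python. It is a toy illustration, not how AWS Glue is implemented: the function names, the sample data, and the three type names are assumptions chosen for the example.

```python
import csv
import io

def infer_type(value):
    """Guess the narrowest type name that can hold one raw string value."""
    for caster, name in ((int, "bigint"), (float, "double")):
        try:
            caster(value)
            return name
        except (ValueError, TypeError):
            continue
    return "string"

# Widening order: a column that mixes types falls back to the widest one seen.
_RANK = {"bigint": 0, "double": 1, "string": 2}

def infer_schema(raw_csv):
    """Scan every row of a headered CSV and return {column: inferred type}."""
    schema = {}
    for row in csv.DictReader(io.StringIO(raw_csv)):
        for column, value in row.items():
            guess = infer_type(value)
            current = schema.get(column, "bigint")
            schema[column] = max(current, guess, key=_RANK.__getitem__)
    return schema

# Hypothetical raw file contents, as they might sit in a data lake.
sample = "user_id,latency_ms,path\n1,12.5,/home\n2,8,/cart\n"
schema = infer_schema(sample)
print(schema)
```

The sketch scans the data first and derives the schema afterwards, which is the reverse of a formal database where the schema is declared before any data is loaded.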

Question 4 of 9

What tool allows SQL queries directly on data stored in Amazon S3?

Select one of the following:

  • Amazon DynamoDB

  • Amazon Athena

  • Amazon Elasticsearch Service

  • AWS Lambda

Explanation
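As a rough sketch of what running SQL over files in S3 can look like, the helper below builds the request parameters that boto3's Athena client accepts in `start_query_execution`. The database, table, column names, and bucket are hypothetical, and the block only constructs the request; it does not contact AWS.

```python
def build_athena_request(database, table, day, output_location):
    """Build keyword arguments for boto3's Athena start_query_execution call.

    Values are interpolated directly into the SQL text, so this sketch
    assumes trusted inputs only.
    """
    query = (
        f"SELECT status, COUNT(*) AS hits "
        f"FROM {table} "
        f"WHERE dt = '{day}' "
        f"GROUP BY status"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location chosen by the caller.
        "ResultConfiguration": {"OutputLocation": output_location},
    }

# All names below are made up for illustration.
request = build_athena_request(
    database="weblogs_db",
    table="access_logs",
    day="2024-01-02",
    output_location="s3://example-results-bucket/athena/",
)
print(request["QueryString"])

# With boto3 installed and AWS credentials configured, the request could be
# submitted as:
#   boto3.client("athena").start_query_execution(**request)
```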

Question 5 of 9

How does Redshift Spectrum enhance the capabilities of Amazon Redshift?

Select one of the following:

  • By integrating with AWS Glue to create schemas.

  • By querying data stored directly in Amazon S3.

  • By offering serverless SQL querying capabilities.

  • By storing all data in highly redundant clusters.

Explanation

Question 6 of 9

Why is partitioning data important in a data lake?

Select one of the following:

  • To replicate data across regions for redundancy.

  • To organize raw files into predefined schemas.

  • To improve query performance by narrowing data access.

  • To ensure compatibility with AWS Glue.

Explanation
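The effect of partitioning on data access can be sketched with a small, self-contained example. It assumes Hive-style `name=value` path segments and a hypothetical list of object keys; real query engines prune partitions through their own catalogs, so this only illustrates the principle.

```python
# Hypothetical object keys laid out with one partition folder per day.
keys = [
    "logs/dt=2024-01-01/part-0000.json",
    "logs/dt=2024-01-01/part-0001.json",
    "logs/dt=2024-01-02/part-0000.json",
    "logs/dt=2024-01-03/part-0000.json",
]

def keys_to_scan(all_keys, table_prefix, partition_filter):
    """Return only the keys whose partition values match every filter entry."""
    wanted = {f"{name}={value}" for name, value in partition_filter.items()}
    selected = []
    for key in all_keys:
        if not key.startswith(table_prefix):
            continue
        # Path segments between the table prefix and the file name are partitions.
        segments = set(key[len(table_prefix):].split("/")[:-1])
        if wanted <= segments:
            selected.append(key)
    return selected

pruned = keys_to_scan(keys, "logs/", {"dt": "2024-01-02"})
print(f"scanning {len(pruned)} of {len(keys)} files")
```

A query filtered on one day reads a single file here instead of all four, and the saving grows with the number of partitions.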

Question 7 of 9

What is a typical partitioning strategy for storing log data?

Select one of the following:

  • Partitioning by file size.

  • Partitioning by data source.

  • Partitioning by date.

  • Partitioning by user ID.

Explanation
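For log data, one possible way to derive a storage path from an event timestamp is sketched below. The `year=/month=/day=` layout, the prefix, and the file name are assumptions for illustration; other layouts (for example a single `dt=` segment) are equally common.

```python
from datetime import datetime, timezone

def partitioned_key(prefix, event_time, filename):
    """Build an object key with year/month/day partition segments in UTC."""
    utc = event_time.astimezone(timezone.utc)
    return f"{prefix}/year={utc:%Y}/month={utc:%m}/day={utc:%d}/{filename}"

# Hypothetical log event written late in the day on 7 March 2024 (UTC).
key = partitioned_key(
    "logs",
    datetime(2024, 3, 7, 23, 15, tzinfo=timezone.utc),
    "app.log",
)
print(key)
```

Normalizing to UTC before formatting keeps events from different time zones in consistent daily folders.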

Question 8 of 9

How should you approach data lake architecture from a system design perspective?

Select one of the following:

  • Design the data lake structure based on how end-users will query the data.

  • Store all data in a single bucket without structure to maximize flexibility.

  • Focus exclusively on schema design before considering query patterns.

  • Prioritize database migration over partitioning strategies.

Explanation

Question 9 of 9

What is one advantage of using off-the-shelf tools like AWS Glue and Amazon Athena?

Select one of the following:

  • They allow complete control over low-level data management.

  • They eliminate the need to think about data structure.

  • They enable scalable and reliable big data solutions with minimal custom design.

  • They prevent redundancy in cloud storage systems.

Explanation