Latest Databricks-Certified-Professional-Data-Engineer Free Dumps - Databricks Certified Professional Data Engineer

Question 1

A Delta Lake table representing metadata about content posts from users has the following schema:
* user_id LONG
* post_text STRING
* post_id STRING
* longitude FLOAT
* latitude FLOAT
* post_time TIMESTAMP
* date DATE
Based on the above schema, which column is a good candidate for partitioning the Delta Table?

A. date

B. user_id

C. post_id

D. post_time

Answer: A
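
One way to see why date is the usual choice is to compare cardinality: a partition column should have relatively few distinct values, so that each partition holds a sizeable amount of data. The sketch below is plain Python over made-up sample rows. The rows and the helper name distinct_counts are illustrative assumptions, not part of the question.

```python
from datetime import datetime, timedelta

def distinct_counts(rows, columns):
    """Count the distinct values observed in each listed column."""
    return {col: len({row[col] for row in rows}) for col in columns}

# Made-up sample: 1,000 posts, one every 864 seconds, covering exactly 10 days.
start = datetime(2024, 1, 1)
rows = []
for i in range(1000):
    post_time = start + timedelta(seconds=864 * i)
    rows.append({
        "user_id": i % 250,        # 250 hypothetical users
        "post_id": f"post-{i}",    # unique per post
        "post_time": post_time,    # unique per post
        "date": post_time.date(),  # one value per calendar day
    })

counts = distinct_counts(rows, ["date", "user_id", "post_id", "post_time"])
print(counts)
```

In this sample, date has 10 distinct values while post_id and post_time have 1,000 each, so partitioning on either of the latter would produce one tiny partition per row.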

Question 2

A Databricks SQL dashboard has been configured to monitor the total number of records present in a collection of Delta Lake tables using the following query pattern:
SELECT COUNT(*) FROM table
Which of the following describes how results are generated each time the dashboard is updated?

A. The total count of records is calculated from the Hive metastore

B. The total count of rows is calculated by scanning all data files

C. The total count of records is calculated from the parquet file metadata

D. The total count of rows will be returned from cached results unless REFRESH is run

E. The total count of records is calculated from the Delta transaction logs

Answer: E
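
The idea behind answer E can be sketched in plain Python: each add action in a Delta commit file carries a stats JSON string that includes numRecords, so a row count can be derived without opening any data file. The log lines below are made-up sample data and count_from_log is a hypothetical helper. The sketch ignores checkpoints and files written without statistics, so treat it as an illustration rather than how Databricks implements the query.

```python
import json

def count_from_log(log_lines):
    """Replay add/remove actions and sum numRecords over the files still active."""
    active = {}  # file path -> numRecords
    for line in log_lines:
        action = json.loads(line)
        if "add" in action:
            add = action["add"]
            stats = json.loads(add["stats"])  # stats is stored as a JSON string
            active[add["path"]] = stats["numRecords"]
        elif "remove" in action:
            active.pop(action["remove"]["path"], None)
    return sum(active.values())

def add_line(path, num_records):
    """Build a made-up add action in the shape assumed by this sketch."""
    return json.dumps({"add": {"path": path,
                               "stats": json.dumps({"numRecords": num_records})}})

sample_log = [
    add_line("part-0000.parquet", 120),
    add_line("part-0001.parquet", 80),
    json.dumps({"remove": {"path": "part-0000.parquet"}}),
    add_line("part-0002.parquet", 50),
]

total = count_from_log(sample_log)
print(total)
```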

Question 3

An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:
df = spark.read.format("parquet").load(f"/mnt/source/{date}")
Which code block should be used to create the date Python variable used in the above code block?

A. date = spark.conf.get("date")

B. date = dbutils.notebooks.getParam("date")

C. dbutils.widgets.text("date", "null")
date = dbutils.widgets.get("date")

D. import sys
date = sys.argv[1]

E. input_dict = input()
date = input_dict["date"]

Answer: C
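
The widget pattern in answer C can be tried outside a notebook with a stand-in for dbutils. In a Databricks notebook dbutils is provided by the runtime. The FakeWidgets and FakeDbutils classes below are hypothetical stubs written only so the sketch runs locally, and the date value is made up. The path uses {date} so that the f-string interpolates the variable.

```python
class FakeWidgets:
    """Hypothetical local stand-in for dbutils.widgets (not the real implementation)."""

    def __init__(self, job_params=None):
        self._params = dict(job_params or {})
        self._defaults = {}

    def text(self, name, default_value, label=None):
        # Declare a text widget and remember its default value.
        self._defaults[name] = default_value

    def get(self, name):
        # A parameter passed in by the job takes precedence over the default.
        if name in self._params:
            return self._params[name]
        return self._defaults[name]


class FakeDbutils:
    """Hypothetical container exposing a widgets attribute like dbutils does."""

    def __init__(self, job_params=None):
        self.widgets = FakeWidgets(job_params)


# Simulate a job run that passes date as a parameter (value is made up).
dbutils = FakeDbutils({"date": "2024-01-15"})

dbutils.widgets.text("date", "null")
date = dbutils.widgets.get("date")

source_path = f"/mnt/source/{date}"
print(source_path)
```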

Question 4

A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor.
When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?

A. Overall cluster CPU utilization is around 25%

B. Total Disk Space remains constant

C. Network I/O never spikes

D. The Five Minute Load Average remains consistent/flat

E. Bytes Received never exceeds 80 million bytes per second

Answer: A
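
The arithmetic behind answer A: with one driver and three executors of the same VM type, the driver is one of four equal nodes, so a workload that keeps only the driver busy leaves cluster-wide CPU utilization near one quarter. A small Python check, assuming for illustration that the busy node runs at 100% and the idle nodes at 0%:

```python
def cluster_cpu_utilization(node_utilizations):
    """Average CPU utilization (percent) across equally sized nodes."""
    return sum(node_utilizations) / len(node_utilizations)

driver = 100.0               # assumed: driver fully busy
executors = [0.0, 0.0, 0.0]  # assumed: all three executors idle

overall = cluster_cpu_utilization([driver] + executors)
print(overall)
```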

Question 5

Which of the following is true of Delta Lake and the Lakehouse?

A. Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.

B. Delta Lake automatically collects statistics on the first 32 columns of each table which are leveraged in data skipping based on query filters.

C. Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.

D. Z-order can only be applied to numeric values stored in Delta Lake tables

E. Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.

Answer: B
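
Data skipping, mentioned in answer B, can be sketched as follows: per-file minimum and maximum values let a reader discard files whose range cannot contain the filter value. The file names, the statistics, and the files_to_read helper are hypothetical. Delta Lake keeps comparable statistics in its transaction log, but this is a toy model under those assumptions, not its implementation.

```python
# Made-up per-file statistics for a user_id column.
file_stats = {
    "part-0000.parquet": {"min": 1, "max": 100},
    "part-0001.parquet": {"min": 101, "max": 200},
    "part-0002.parquet": {"min": 201, "max": 300},
}

def files_to_read(stats, value):
    """Keep only the files whose [min, max] range could contain the value."""
    return [path for path, s in stats.items() if s["min"] <= value <= s["max"]]

# Equivalent in spirit to a filter such as: WHERE user_id = 150
candidates = files_to_read(file_stats, 150)
print(candidates)
```

Only one of the three files has to be opened here. A value outside every range, such as 999, yields no files at all.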

Question 6

A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.
Which consideration will impact the decisions made by the engineer while migrating this workload?

A. All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.

B. Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake's upsert functionality.

C. Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly parallel writes.

D. Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.

Answer: A

Question 7

A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on task A.
If tasks A and B complete successfully but task C fails during a scheduled run, which statement describes the resulting state?

A. All logic expressed in the notebook associated with task A will have been successfully completed; tasks B and C will not commit any changes because of stage failure.

B. All logic expressed in the notebook associated with tasks A and B will have been successfully completed; some operations in task C may have completed successfully.

C. All logic expressed in the notebook associated with tasks A and B will have been successfully completed; any changes made in task C will be rolled back due to task failure.

D. Unless all tasks complete successfully, no changes will be committed to the Lakehouse; because task C failed, all commits will be rolled back automatically.

E. Because all tasks are managed as a dependency graph, no changes will be committed to the Lakehouse until all tasks have successfully been completed.

Answer: B

Question 8

A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings.
The source data contains 100 unique fields in a highly nested JSON structure.
The silver_device_recordings table will be used downstream to power several production monitoring dashboards and a production model. At present, 45 of the 100 fields are being used in at least one of these applications.
The data engineer is trying to determine the best approach for dealing with schema declaration given the highly-nested structure of the data and the numerous fields.
Which of the following accurately presents information about Delta Lake and Databricks that may impact their decision-making process?

A. The Tungsten encoding used by Databricks is optimized for storing string data; newly-added native support for querying JSON strings means that string types are always most efficient.

B. Schema inference and evolution on Databricks ensure that inferred types will always accurately match the data types used by downstream systems.

C. Because Delta Lake uses Parquet for data storage, data types can be easily evolved by just modifying file footer information in place.

D. Because Databricks will infer schema using types that allow all observed data to be processed, setting types manually provides greater assurance of data quality enforcement.

E. Human labor in writing code is the largest cost associated with data engineering workloads; as such, automating table declaration logic should be a priority in all migration workloads.

Answer: D

Question 9

The data science team has created and logged a production model using MLflow. The model accepts a list of column names and returns a new column of type DOUBLE.
The following code correctly imports the production model, loads the customer table containing the customer_id key column into a DataFrame, and defines the feature columns needed for the model.

Which code block will output a DataFrame with the schema "customer_id LONG, predictions DOUBLE"?

A. df.select("customer_id", model(*columns).alias("predictions"))

B. df.apply(model, columns).select("customer_id, predictions")

C. df.map(lambda x: model(x[columns])).select("customer_id, predictions")

D. model.predict(df, columns)

Answer: D

Question 10

The data architect has mandated that all tables in the Lakehouse should be configured as external (also known as "unmanaged") Delta Lake tables.
Which approach will ensure that this requirement is met?

A. When tables are created, make sure that the EXTERNAL keyword is used in the CREATE TABLE statement.

B. When a database is being created, make sure that the LOCATION keyword is used.

C. When the workspace is being configured, make sure that external cloud object storage has been mounted.

D. When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.

E. When data is saved to a table, make sure that a full file path is specified alongside the Delta format.

Answer: A
