최신 Associate-Developer-Apache-Spark-3.5 무료덤프 - Databricks Certified Associate Developer for Apache Spark 3.5

문제1

A Spark application is experiencing performance issues in client mode because the driver is resource- constrained.
How should this issue be resolved?

A. Switch the deployment mode to local mode

B. Add more executor instances to the cluster

C. Increase the driver memory on the client machine

D. Switch the deployment mode to cluster mode

정답: D

설명: (DumpTOP 회원만 볼 수 있음)

문제2

A data analyst builds a Spark application to analyze finance data and performs the following operations:filter, select,groupBy, andcoalesce.
Which operation results in a shuffle?

A. coalesce

B. select

C. filter

D. groupBy

정답: D

설명: (DumpTOP 회원만 볼 수 있음)

문제3

A data engineer is asked to build an ingestion pipeline for a set of Parquet files delivered by an upstream team on a nightly basis. The data is stored in a directory structure with a base path of "/path/events/data". The upstream team drops daily data into the underlying subdirectories following the convention year/month/day.
A few examples of the directory structure are:

Which of the following code snippets will read all the data within the directory structure?

A. df = spark.read.option("inferSchema", "true").parquet("/path/events/data/")

B. df = spark.read.parquet("/path/events/data/*")

C. df = spark.read.option("recursiveFileLookup", "true").parquet("/path/events/data/")

D. df = spark.read.parquet("/path/events/data/")

정답: C

설명: (DumpTOP 회원만 볼 수 있음)

문제4

How can a Spark developer ensure optimal resource utilization when running Spark jobs in Local Mode for testing?
Options:

A. Increase the number of local threads based on the number of CPU cores.

B. Set the spark.executor.memory property to a large value.

C. Use the spark.dynamicAllocation.enabled property to scale resources dynamically.

D. Configure the application to run in cluster mode instead of local mode.

정답: A

설명: (DumpTOP 회원만 볼 수 있음)

문제5

A developer is working with a pandas DataFrame containing user behavior data from a web application.
Which approach should be used for executing agroupByoperation in parallel across all workers in Apache Spark 3.5?
A)
Use the applylnPandas API
B)

C)

D)

A. Use theapplyInPandasAPI:
df.groupby("user_id").applyInPandas(mean_func, schema="user_id long, value double").show()

B. Use themapInPandasAPI:
df.mapInPandas(mean_func, schema="user_id long, value double").show()

C. Use a Pandas UDF:
@pandas_udf("double")
def mean_func(value: pd.Series) -> float:
return value.mean()
df.groupby("user_id").agg(mean_func(df["value"])).show()

D. Use a regular Spark UDF:
from pyspark.sql.functions import mean
df.groupBy("user_id").agg(mean("value")).show()

정답: A

설명: (DumpTOP 회원만 볼 수 있음)

문제6

A data engineer is building an Apache Spark™ Structured Streaming application to process a stream of JSON events in real time. The engineer wants the application to be fault-tolerant and resume processing from the last successfully processed record in case of a failure. To achieve this, the data engineer decides to implement checkpoints.
Which code snippet should the data engineer use?

A. query = streaming_df.writeStream \
.format("console") \
.outputMode("append") \
.option("checkpointLocation", "/path/to/checkpoint") \
.start()

B. query = streaming_df.writeStream \
.format("console") \
.option("checkpoint", "/path/to/checkpoint") \
.outputMode("append") \
.start()

C. query = streaming_df.writeStream \
.format("console") \
.outputMode("complete") \
.start()

D. query = streaming_df.writeStream \
.format("console") \
.outputMode("append") \
.start()

정답: A

설명: (DumpTOP 회원만 볼 수 있음)

문제7

What is a feature of Spark Connect?

A. It supports only PySpark applications

B. It supports DataStreamReader, DataStreamWriter, StreamingQuery, and Streaming APIs

C. It has built-in authentication

D. Supports DataFrame, Functions, Column, SparkContext PySpark APIs

정답: B

설명: (DumpTOP 회원만 볼 수 있음)

문제8

A data scientist is analyzing a large dataset and has written a PySpark script that includes several transformations and actions on a DataFrame. The script ends with acollect()action to retrieve the results.
How does Apache Spark™'s execution hierarchy process the operations when the data scientist runs this script?

A. Thecollect()action triggers a job, which is divided into stages at shuffle boundaries, and each stage is split into tasks that operate on individual data partitions.

B. The script is first divided into multiple applications, then each application is split into jobs, stages, and finally tasks.

C. Spark creates a single task for each transformation and action in the script, and these tasks are grouped into stages and jobs based on their dependencies.

D. The entire script is treated as a single job, which is then divided into multiple stages, and each stage is further divided into tasks based on data partitions.

정답: A

설명: (DumpTOP 회원만 볼 수 있음)

최신 Associate-Developer-Apache-Spark-3.5 무료덤프 - Databricks Certified Associate Developer for Apache Spark 3.5 - Python

우리와 연락하기

유용한 링크

최신 업데이트