Latest Associate-Developer-Apache-Spark-3.5 Free Dumps - Databricks Certified Associate Developer for Apache Spark 3.5 - Python

What is the relationship between jobs, stages, and tasks during execution in Apache Spark?
Options:

Correct answer: C
Explanation: (visible to DumpTOP members only)
What is the difference between df.cache() and df.persist() in a Spark DataFrame?

Correct answer: B
Explanation: (visible to DumpTOP members only)
Given:
spark.sparkContext.setLogLevel("<LOG_LEVEL>")
Which set contains the suitable configuration settings for Spark driver LOG_LEVELs?

Correct answer: D
Explanation: (visible to DumpTOP members only)
Which command overwrites an existing JSON file when writing a DataFrame?

Correct answer: B
Explanation: (visible to DumpTOP members only)
Given the code:

df = spark.read.csv("large_dataset.csv")
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"), lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")
reduced_df.count()
reduced_df.show()

At which point will Spark actually begin processing the data?

Correct answer: C
Explanation: (visible to DumpTOP members only)
A data engineer is running a Spark job to process a dataset of 1 TB stored in distributed storage. The cluster has 10 nodes, each with 16 CPUs. Spark UI shows:
Low number of Active Tasks
Many tasks complete in milliseconds
Fewer tasks than available CPUs
Which approach should be used to adjust the partitioning for optimal resource allocation?

Correct answer: C
Explanation: (visible to DumpTOP members only)
An MLOps engineer is building a Pandas UDF that applies a language model that translates English strings into Spanish. The initial code is loading the model on every call to the UDF, which is hurting the performance of the data pipeline.
The initial code is:

def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())
How can the MLOps engineer change this code to reduce how many times the language model is loaded?

Correct answer: D
Explanation: (visible to DumpTOP members only)
How can a Spark developer ensure optimal resource utilization when running Spark jobs in Local Mode for testing?
Options:

Correct answer: A
Explanation: (visible to DumpTOP members only)
