Latest Associate-Developer-Apache-Spark Free Dumps - Databricks Certified Associate Developer for Apache Spark 3.0

Question 1

The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

A. 1. write
2. mode
3. "overwrite"
4. "compression"
5. save
(Correct)

B. 1. write
2. mode
3. "overwrite"
4. compression
5. parquet

C. 1. store
2. with
3. "replacement"
4. "compression"
5. path

D. 1. save
2. mode
3. "ignore"
4. "compression"
5. path

E. 1. save
2. mode
3. "replace"
4. "compression"
5. path

Answer: A
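
As a rough sketch of how the chain reads once the blanks are filled with write, mode, "overwrite", "compression", and save (option A), the helper below wraps the writer call. The function name write_parquet_overwrite and its codec parameter are hypothetical; only the chained method names come from the question. Spark's documentation notes that brotli requires the codec to be installed, so treat this as an illustration rather than a job that is guaranteed to run on every cluster.

```python
def write_parquet_overwrite(df, store_dir, codec="brotli"):
    """Write df as parquet to store_dir, replacing any existing output.

    df is assumed to be a pyspark.sql.DataFrame; the helper name and the
    codec parameter are illustrative and not part of the exam question.
    """
    (
        df.write                        # DataFrameWriter for this DataFrame
        .format("parquet")              # output format
        .mode("overwrite")              # replace previously existing files
        .option("compression", codec)   # option keys are passed as strings
        .save(store_dir)                # save() takes the target path
    )
```

The other options break at different points: store and save are not DataFrame attributes, "replace" and "replacement" are not save modes, and an unquoted compression is an undefined Python name.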


Question 2

Which of the following code blocks selects all rows from DataFrame transactionsDf in which the value of column productId is zero or smaller, or equal to 3?

A. transactionsDf.filter((col("productId")==3) | (col("productId")<1))

B. transactionsDf.filter(col("productId")==3 | col("productId")<1)

C. transactionsDf.where("productId"=3).or("productId"<1))

D. transactionsDf.filter((col("productId")==3) or (col("productId")<1))

E. transactionsDf.filter(productId==3 or productId<1)

Answer: A
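
The difference between options A, B, and D comes down to Python operator rules rather than Spark itself. The sketch below uses a toy Expr class as a stand-in for a column expression; it is hypothetical and is not pyspark.sql.Column, but it mimics the relevant behavior: comparisons and | build new expressions, while converting an expression to bool raises ValueError.

```python
def _show(value):
    """Render an operand: expression text for Expr, repr() for literals."""
    return value.text if isinstance(value, Expr) else repr(value)


class Expr:
    """Toy stand-in for a column expression (hypothetical, not PySpark)."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return Expr(f"({self.text} = {_show(other)})")

    def __lt__(self, other):
        return Expr(f"({self.text} < {_show(other)})")

    def __or__(self, other):
        return Expr(f"({self.text} OR {_show(other)})")

    def __ror__(self, other):
        return Expr(f"({_show(other)} OR {self.text})")

    def __bool__(self):
        # "and", "or", and chained comparisons all end up calling bool().
        raise ValueError("cannot convert an expression to bool")


def describe(build):
    """Return the built expression text, or the error if building fails."""
    try:
        return build().text
    except ValueError as exc:
        return f"error: {exc}"


product_id = Expr("productId")

# Option A: parentheses force each comparison to be built before "|".
print(describe(lambda: (product_id == 3) | (product_id < 1)))

# Option B: "|" binds tighter than "==" and "<", so Python reads this as the
# chained comparison product_id == (3 | product_id) < 1, which needs bool().
print(describe(lambda: product_id == 3 | product_id < 1))

# Option D: "or" always calls bool() on its left operand.
print(describe(lambda: (product_id == 3) or (product_id < 1)))
```

Only the first call produces an expression; the other two report the bool conversion error.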


Question 3

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows derived from rows of DataFrame itemsDf in which the column attributes contains the element cozy.
A sample of DataFrame itemsDf is below.
Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

A. 1. where
2. "array_contains(attributes, 'cozy')"
3. select
4. itemId
5. explode
6. attributes

B. 1. filter
2. array_contains("cozy")
3. select
4. "itemId"
5. explode
6. "attributes"

C. 1. filter
2. "array_contains(attributes, 'cozy')"
3. select
4. "itemId"
5. map
6. "attributes"

D. 1. filter
2. "array_contains(attributes, cozy)"
3. select
4. "itemId"
5. explode
6. "attributes"

E. 1. filter
2. "array_contains(attributes, 'cozy')"
3. select
4. "itemId"
5. explode
6. "attributes"

Answer: E


Question 4

Which of the following statements about the differences between actions and transformations is correct?

A. Actions generate RDDs, while transformations do not.

B. Actions can trigger Adaptive Query Execution, while transformations cannot.

C. Actions do not send results to the driver, while transformations do.

D. Actions are evaluated lazily, while transformations are not evaluated lazily.

E. Actions can be queued for delayed execution, while transformations can only be processed immediately.

Answer: B
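
The lazy/eager distinction behind these options can be illustrated with a toy model in plain Python. LazyPipeline below is hypothetical and is not Spark's API: its map and filter methods only record work (like transformations), while collect runs the recorded steps and hands the result back to the caller (like an action).

```python
class LazyPipeline:
    """Toy model of lazy evaluation (hypothetical; this is not Spark's API)."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = tuple(steps)

    def map(self, fn):
        # "Transformation": records the step and returns a new pipeline.
        return LazyPipeline(self._data, self._steps + (("map", fn),))

    def filter(self, pred):
        # "Transformation": also recorded only, nothing is computed here.
        return LazyPipeline(self._data, self._steps + (("filter", pred),))

    def collect(self):
        # "Action": runs every recorded step and returns the rows.
        rows = list(self._data)
        for kind, fn in self._steps:
            if kind == "map":
                rows = [fn(row) for row in rows]
            else:
                rows = [row for row in rows if fn(row)]
        return rows


evaluated = []  # log of inputs that double() has actually processed


def double(x):
    evaluated.append(x)
    return 2 * x


pipeline = LazyPipeline([1, 2, 3]).map(double).filter(lambda x: x > 2)
print(evaluated)           # [] because no step has run yet
print(pipeline.collect())  # [4, 6]
print(evaluated)           # [1, 2, 3] because collect() triggered the work
```

In this model, as in Spark, it is the transformations that are deferred and the action that triggers execution, which is the reverse of what option D claims.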


Question 5

Which of the following describes Spark's standalone deployment mode?

A. Standalone mode is how Spark runs on YARN and Mesos clusters.

B. Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.

C. Standalone mode means that the cluster does not contain the driver.

D. Standalone mode uses a single JVM to run Spark driver and executor processes.

E. Standalone mode uses only a single executor per worker per application.

Answer: E


Question 6

Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?

A. transactionsDf.count("productId")

B. transactionsDf.groupBy("productId").agg(col("value").count())

C. transactionsDf.groupBy("productId").select(count("value"))

D. transactionsDf.groupBy("productId").count()

E. transactionsDf.count("productId").distinct()

Answer: D


Question 7

Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?
Sample of itemsDf:
+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+

A. itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType([StringType()])),
    StructField("supplier", StringType())])
itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)

B. itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", StringType()),
    StructField("supplier", StringType())])
itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

C. itemsDfSchema = StructType([
    StructField("itemId", IntegerType),
    StructField("attributes", ArrayType(StringType)),
    StructField("supplier", StringType)])
itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

D. itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType())])
itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)

E. itemsDf = spark.read.schema('itemId integer, attributes <string>, supplier string').parquet(filePath)

Answer: D
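
Option E is close to a valid alternative: DataFrameReader.schema also accepts a DDL-formatted string, but the array column has to be written as array<string> rather than <string>. A minimal sketch of that variant follows; the helper name read_items and the constant ITEMS_DDL are hypothetical, and spark is assumed to be an active SparkSession.

```python
# DDL-style schema string; ARRAY<STRING> names the container and element type.
ITEMS_DDL = "itemId INT, attributes ARRAY<STRING>, supplier STRING"


def read_items(spark, file_path):
    """Read the parquet file at file_path, enforcing the DDL schema above.

    spark is assumed to be an active SparkSession; the helper name and the
    ITEMS_DDL constant are illustrative and not part of the exam question.
    """
    return spark.read.schema(ITEMS_DDL).parquet(file_path)
```

Calling read_items(spark, filePath) would be expected to yield the same column types as the StructType definition with IntegerType(), ArrayType(StringType()), and StringType().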


Question 8

Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?

A. transactionsDf.withColumn("storeId", col("storeId").cast("string"))

B. transactionsDf.withColumn("storeId", col("storeId").convert("string"))

C. transactionsDf.withColumn("storeId", convert("storeId", "string"))

D. transactionsDf.withColumn("storeId", col("storeId", "string"))

E. transactionsDf.withColumn("storeId", convert("storeId").as("string"))

Answer: A


Question 9

Which of the following code blocks applies the boolean-returning Python function evaluateTestSuccess to column storeId of DataFrame transactionsDf as a user-defined function?

A. from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.IntegerType())
transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))

B. from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

C. evaluateTestSuccessUDF = udf(evaluateTestSuccess)
transactionsDf.withColumn("result", evaluateTestSuccessUDF(col("storeId")))

D. from pyspark.sql import types as T
evaluateTestSuccessUDF = udf(evaluateTestSuccess, T.BooleanType())
transactionsDf.withColumn("result", evaluateTestSuccess(col("storeId")))

E. evaluateTestSuccessUDF = udf(evaluateTestSuccess)
transactionsDf.withColumn("result", evaluateTestSuccessUDF(storeId))

Answer: B


Question 10

The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20 05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.
Schema:
root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)
Code block:
schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)

A. Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.

B. Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.

C. The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.

D. The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.

E. Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.

Answer: E

