Latest Associate-Developer-Apache-Spark Free Dump - Databricks Certified Associate Developer for Apache Spark 3.0

Question 1

Which of the following DataFrame methods is classified as a transformation?

A. DataFrame.show()

B. DataFrame.first()

C. DataFrame.count()

D. DataFrame.foreach()

E. DataFrame.select()

Answer: E

Question 2

Which of the following describes a valid concern about partitioning?

A. Short partition processing times are indicative of low skew.

B. No data is exchanged between executors when coalesce() is run.

C. The coalesce() method should be used to increase the number of partitions.

D. A shuffle operation returns 200 partitions if not explicitly set.

E. Decreasing the number of partitions reduces the overall runtime of narrow transformations if there are more executors available than partitions.

Answer: D

Question 3

The code block displayed below contains an error. The code block should save DataFrame transactionsDf at path path as a parquet file, appending to any existing parquet file. Find the error.
Code block:
transactionsDf.format("parquet").option("mode", "append").save(path)

A. The mode option should be omitted so that the command uses the default mode.

B. The code block is missing a bucketBy command that takes care of partitions.

C. Given that the DataFrame should be saved as a parquet file, path is being passed to the wrong method.

D. The code block is missing a reference to the DataFrameWriter.

E. save() is evaluated lazily and needs to be followed by an action.
Answer: D

Question 4

Which of the following describes properties of a shuffle?

A. Shuffles involve only single partitions.

B. Shuffles belong to a class known as "full transformations".

C. In a shuffle, Spark writes data to disk.

D. A shuffle is one of many actions in Spark.

E. Operations involving shuffles are never evaluated lazily.

Answer: C

Question 5

The code block shown below should return all rows of DataFrame itemsDf that have at least 3 items in column itemNameElements. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Example of DataFrame itemsDf:
+------+----------------------------------+-------------------+------------------------------------------+
|itemId|itemName                          |supplier           |itemNameElements                          |
+------+----------------------------------+-------------------+------------------------------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|[Thick, Coat, for, Walking, in, the, Snow]|
|2     |Elegant Outdoors Summer Dress     |YetiX              |[Elegant, Outdoors, Summer, Dress]        |
|3     |Outdoors Backpack                 |Sports Company Inc.|[Outdoors, Backpack]                      |
+------+----------------------------------+-------------------+------------------------------------------+
Code block:
itemsDf.__1__(__2__(__3__)__4__)

A. 1. filter
2. count
3. itemNameElements
4. >=3

B. 1. select
2. size
3. "itemNameElements"
4. >3

C. 1. select
2. count
3. col("itemNameElements")
4. >3

D. 1. filter
2. size
3. "itemNameElements"
4. >=3

E. 1. select
2. count
3. "itemNameElements"
4. >3

Answer: D

Question 6

The code block shown below should read all files with the file ending .png in directory path into Spark.
Choose the answer that correctly fills the blanks in the code block to accomplish this.
spark.__1__.__2__(__3__).option(__4__, "*.png").__5__(path)

A. 1. read
2. format
3. binaryFile
4. pathGlobFilter
5. load

B. 1. read()
2. format
3. "binaryFile"
4. "recursiveFileLookup"
5. load

C. 1. open
2. format
3. "image"
4. "fileType"
5. open

D. 1. open
2. as
3. "binaryFile"
4. "pathGlobFilter"
5. load

E. 1. read
2. format
3. "binaryFile"
4. "pathGlobFilter"
5. load

Answer: E

Question 7

The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+----------------+
|transactionId|predError|value|storeId|productId|   f| transactionDate|
+-------------+---------+-----+-------+---------+----+----------------+
|            1|        3|    4|     25|        1|null|2020-04-26 15:35|
|            2|        6|    7|      2|        2|null|2020-04-13 22:01|
|            3|        3| null|     25|        3|null|2020-04-02 10:53|
+-------------+---------+-----+-------+---------+----+----------------+
Code block:
transactionsDf = transactionsDf.drop("transactionDate")
transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")

A. Column transactionDate should be wrapped in a col() operator.

B. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().

C. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment.

D. The string indicating the date format should be adjusted. The withColumnReplaced operator should be used instead of the drop and assign pattern in the code block to replace column transactionDate with the new column transactionTimestamp.

E. Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column transactionDate should be wrapped in a col() operator.

Answer: C

Question 8

A. itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(col("attributes_exploded").contain

B. itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(attributes_exploded.contains("i"))

C. itemsDf.explode(attributes).alias("attributes_exploded").filter(col("attributes_exploded").contains("i"))

D. itemsDf.select(col("attributes").explode().alias("attributes_exploded")).filter(col("attributes_exploded").co

E. itemsDf.select(explode("attributes")).filter("attributes_exploded".contains("i"))

Answer: A

Question 9

The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__)

A. 1. select
2. "transactionId, predError, value, f"

B. 1. filter
2. "transactionId", "predError", "value", "f"

C. 1. select
2. ["transactionId", "predError", "value", "f"]

D. 1. where
2. col("transactionId"), col("predError"), col("value"), col("f")

E. 1. select
2. col(["transactionId", "predError", "value", "f"])

Answer: C

Question 10

Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?
Schema of first partition:
root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)
Schema of second partition:
root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- rollId: integer (nullable = true)
 |-- f: integer (nullable = true)
 |-- tax_id: integer (nullable = false)

A.
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1
df

B. spark.read.parquet(filePath, mergeSchema='y')

C.
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how="outer")
    nx = nx+1
df

D. spark.read.parquet(filePath)

E. spark.read.option("mergeSchema", "true").parquet(filePath)

Answer: E
