Latest DEA-C02 Free Dumps - Snowflake SnowPro Advanced: Data Engineer (DEA-C02)

Question 1

You are designing a data warehouse for an e-commerce company. One of the requirements is to provide fast analytics on order fulfillment times by region. You have two tables. 'ORDERS' contains order information, including the order ID, 'ORDER_DATE', 'REGION_ID', and 'FULFILLMENT_DATE'. 'REGIONS' contains region information, including 'REGION_ID' and the region name. Due to the large size of the 'ORDERS' table and the complexity of calculating fulfillment times, you decide to use materialized views.
Which of the following combinations of materialized view definition and Snowflake features would BEST optimize query performance and minimize data staleness for this scenario? Choose two options.

A. Create a materialized view that joins 'ORDERS' and 'REGIONS', calculates 'FULFILLMENT_TIME', and groups by 'REGION_NAME'. Do not specify a clustering key.

B. Use Snowflake's search optimization service on the 'ORDERS' table instead of creating a materialized view.

C. Create a materialized view that joins 'ORDERS' and 'REGIONS', calculates 'FULFILLMENT_TIME' grouped by 'REGION_NAME', and clusters by 'REGION_NAME'. Configure incremental data refreshes.

D. Partition the 'ORDERS' table by 'ORDER_DATE' and create a materialized view that calculates 'FULFILLMENT_TIME' grouped by 'REGION_NAME', clustering by 'ORDER_DATE'.

E. Create a materialized view that joins 'ORDERS' and 'REGIONS', calculates the difference between 'FULFILLMENT_DATE' and 'ORDER_DATE' as the fulfillment time, and groups by 'REGION_NAME'. Cluster the view by 'REGION_NAME'.

Answer: C, E


Question 2

You have implemented external tokenization for a sensitive data column in Snowflake using a UDF that calls an external API. After some time, you discover that the external tokenization service is experiencing intermittent outages, causing queries using the tokenized column to fail. What is the BEST approach to mitigate this issue and maintain data availability while minimizing the risk of exposing the raw data?

A. Modify the tokenization UDF to cache tokenization mappings locally within the Snowflake environment. When the external service is unavailable, the UDF can use the cached values.

B. Replicate the tokenized table to another Snowflake region and switch to the replica during outages of the primary region. The tokenization service is guaranteed to be available in at least one region.

C. Implement a try-catch block within the UDF. In the catch block, return a pre-defined, non-sensitive default value instead of attempting to call the external tokenization service. You can't return the raw value.

D. Implement a try-catch block within the UDF. In the catch block, return a pre-defined static token value (same value always) instead of attempting to call the external tokenization service. You can't return the raw value.

E. Implement a masking policy on the column that returns the raw data when the tokenization UDF is unavailable, detected by catching exceptions within the policy logic.

Answer: C

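
The fallback pattern described in option C can be sketched in plain Python. This is only an illustration of the control flow, not Snowflake's external tokenization API: `TokenizationUnavailable`, `call_tokenization_service`, and `DEFAULT_PLACEHOLDER` are hypothetical names introduced for the example, and the "service" is a local stand-in with no network access.

```python
class TokenizationUnavailable(Exception):
    """Hypothetical error raised when the external service cannot be reached."""


DEFAULT_PLACEHOLDER = "UNAVAILABLE"  # assumed non-sensitive default value


def call_tokenization_service(raw_value, service_up=True):
    """Stand-in for the external API call (hypothetical, no network access)."""
    if not service_up:
        raise TokenizationUnavailable("tokenization service is down")
    # Fake token for demonstration only; a real service would not do this.
    return "tok_" + raw_value[::-1]


def tokenize_with_fallback(raw_value, service_up=True):
    """Return a token, or a fixed placeholder when the service fails.

    The raw value is never returned from the except branch.
    """
    try:
        return call_tokenization_service(raw_value, service_up)
    except TokenizationUnavailable:
        return DEFAULT_PLACEHOLDER
```

Queries keep succeeding during an outage because the failure is absorbed inside the function, and the caller only ever sees a token or the placeholder.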

Question 3

A data engineer is investigating high credit consumption on a Snowflake warehouse due to frequent re-clustering operations on a large table named 'WEB_EVENTS'. This table is clustered on 'EVENT_TIMESTAMP' and 'USER_ID'. The engineer suspects that the high frequency of data ingestion, especially of out-of-order 'EVENT_TIMESTAMP' values, contributes to the poor clustering. Choose the options that can optimize clustering and reduce credit consumption, assuming you have limited control over the ingestion process and data quality.

A. Implement a pre-processing stage to sort the incoming data by 'EVENT_TIMESTAMP' before loading it into the 'WEB_EVENTS' table, using a temporary table and then inserting into the final table.

B. Implement a maintenance task to periodically re-cluster the table less frequently, but at more strategically chosen times (e.g., during off-peak hours).

C. Partition the table based on 'EVENT_TIMESTAMP' instead of clustering.

D. Drop the clustering key altogether to avoid re-clustering costs.

E. Increase the warehouse size to accelerate the re-clustering process.

Answer: A, B


Question 4

You're designing a Snowpark Scala stored procedure that must execute a series of complex data quality checks on a Snowflake table. These checks involve multiple steps, including validating data types, checking for null values, and verifying data consistency against external reference data. You want to ensure that the stored procedure is resilient to errors, provides detailed logging, and can be easily monitored. Which of the following approaches would be the MOST robust and scalable for handling errors and logging within this Snowpark Scala stored procedure?

A. Use Scala's 'Option' type to handle potential null values and exceptions. Return a string message indicating success or failure for each check. Log these messages using 'System.out.println'.

B. Rely on Snowflake's built-in error handling and logging mechanisms. If an error occurs, the stored procedure will automatically fail, and the error details can be retrieved from Snowflake's query history.

C. Wrap each data quality check in a try-catch block and use 'println' statements to log error messages to the Snowflake console.

D. Implement a custom logging framework within the Scala stored procedure that writes detailed logs to a dedicated Snowflake table. Use try-catch blocks to handle exceptions and log error details, including timestamps, error codes, and relevant data values. Use Snowflake's 'SYSTEM$LAST_QUERY_ID()' function to track query lineage.

E. Use Scala's 'Try' monad to handle exceptions, mapping successes to informational messages and failures to error messages. Log these messages using Snowflake's event tables.

Answer: D

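
The question concerns Scala, but the shape of option D (wrap each check, record a structured log entry, keep going) is language-neutral. The following Python sketch shows that shape only. The list `log_records` stands in for rows written to a dedicated log table, and the two checks are hypothetical examples, not part of the question.

```python
from datetime import datetime, timezone


def run_checks(rows, checks):
    """Run each named check and collect one structured log record per check.

    `log_records` stands in for rows written to a dedicated log table.
    """
    log_records = []
    for name, check in checks:
        record = {
            "check": name,
            "logged_at": datetime.now(timezone.utc).isoformat(),
        }
        try:
            check(rows)
            record["status"] = "PASSED"
            record["detail"] = ""
        except Exception as exc:  # keep going so later checks still run
            record["status"] = "FAILED"
            record["detail"] = f"{type(exc).__name__}: {exc}"
        log_records.append(record)
    return log_records


def no_null_ids(rows):
    """Hypothetical check: every row must carry a non-null 'id'."""
    missing = [i for i, row in enumerate(rows) if row.get("id") is None]
    if missing:
        raise ValueError(f"null id at row positions {missing}")


def amounts_are_numeric(rows):
    """Hypothetical check: 'amount' must be an int or float."""
    for i, row in enumerate(rows):
        if not isinstance(row.get("amount"), (int, float)):
            raise TypeError(f"non-numeric amount at row position {i}")
```

Because a failed check is logged rather than re-raised, one bad check does not hide the results of the checks that follow it.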

Question 5

You are designing a data loading process for a high-volume streaming data source. The data arrives as Avro files in an AWS S3 bucket. You need to load this data into a Snowflake table with minimal latency and operational overhead. Which of the following combinations of Snowflake features and configurations would be MOST suitable for this scenario? (Select TWO)

A. Create a custom Spark application that reads Avro files from S3, transforms the data, and then writes it to Snowflake using the Snowflake Spark connector.

B. Implement Snowpipe with auto-ingest configured to listen for S3 event notifications whenever a new Avro file is added to the bucket.

C. Configure an external table pointing to the S3 bucket and query the Avro files directly from Snowflake.

D. Use a Kafka connector to stream data directly from the Kafka topic to Snowflake.

E. Use the 'COPY INTO' command with a scheduled task that runs every 5 minutes to load new files from the S3 bucket.

Answer: B, D


Question 6

You have configured a Kafka Connector to load JSON data into a Snowflake table named 'ORDERS'. The JSON data contains nested structures. However, Snowflake is only receiving the top-level fields, and the nested fields are being ignored. Which configuration option within the Kafka Connector needs to be adjusted to correctly flatten and load the nested JSON data into Snowflake?

A. Set the 'value.converter.schemas.enable' property to 'true'.

B. Apply the 'org.apache.kafka.connect.transforms.Flatten' transformation to the 'transforms' configuration.

C. Enable the 'snowflake.ingest.stage' property and set it to a Snowflake internal stage.

D. Configure the 'snowflake.data.field.name' property to specify the column in the Snowflake table where the entire JSON should be loaded as a VARIANT.

E. Use the 'transforms' configuration with the 'org.apache.kafka.connect.transforms.ExtractField$Value' transformation to extract specific fields.

Answer: B

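
To make "flattening" concrete, here is a small plain-Python sketch of the effect such a transformation has on a nested record: nested keys are joined into a single level with a delimiter. It is an approximation for illustration, not the Kafka Connect implementation. The `flatten` helper and the sample record are assumptions of this example, and lists are left untouched here.

```python
def flatten(record, delimiter=".", prefix=""):
    """Collapse nested dicts into one level, joining key names with `delimiter`."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined name along.
            flat.update(flatten(value, delimiter, name))
        else:
            flat[name] = value
    return flat
```

For example, a record with `customer` containing `address` containing `city` ends up with a single top-level field named `customer.address.city`.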

Question 7

You have a Snowflake table 'orders_raw' with a VARIANT column named 'order_details' that contains an array of order items represented as JSON objects. Each object has 'item_id', 'quantity', and 'price'. You need to calculate the total revenue for each order. Which SQL statement efficiently flattens the array and calculates the total revenue using LATERAL FLATTEN and appropriate casting?

A. Option E

B. Option D

C. Option C

D. Option B

E. Option A

Answer: A

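
The computation the question describes (expand each order's item array, cast quantity and price to numbers, sum quantity times price per order) can be mirrored in plain Python. This is a sketch of the logic only and does not reproduce any of the answer options. The `order_id` key is an assumption of this example, since the question does not name the order identifier, and giving an order with no items a total of 0 is a choice made here.

```python
from decimal import Decimal


def total_revenue_per_order(orders):
    """Sum quantity * price over each order's item array, keyed by order_id."""
    totals = {}
    for order in orders:
        total = Decimal("0")
        # Treat a missing or null array as empty.
        for item in order.get("order_details") or []:
            # Explicit casts, analogous to casting VARIANT fields in SQL.
            quantity = Decimal(str(item["quantity"]))
            price = Decimal(str(item["price"]))
            total += quantity * price
        totals[order["order_id"]] = total
    return totals
```

Decimal arithmetic is used instead of floats so that currency totals do not pick up binary rounding noise.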

Question 8

Your company utilizes Snowflake Streams and Tasks for continuous data ingestion and transformation. A critical task, 'TRANSFORM_DATA', consumes data from a stream 'RAW_DATA_STREAM' on table 'RAW_DATA' and loads it into a reporting table 'REPORTING_TABLE'. You observe that 'TRANSFORM_DATA' is failing intermittently with a 'Stream is stale' error. What steps can you take to diagnose and resolve this issue? Choose all that apply.

A. Use the 'AT' or 'BEFORE' clause when querying the stream to explicitly specify a point in time to consume data from.

B. Increase the Time Travel retention parameter at the database level to ensure Time Travel data is available for a longer period.

C. Ensure that the 'TRANSFORM_DATA' task is consuming the stream data frequently enough to prevent the stream from becoming stale.

D. Drop and recreate the stream and task to reset their states.

E. Modify the task definition to use the 'WHEN' condition to prevent execution when the stream is empty.

Answer: B, C

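
Options B and C pull on the same relationship from two sides: how long ago the stream was last consumed versus how long change data is retained. The sketch below is a deliberately simplified model of that relationship, not Snowflake's actual bookkeeping. The function name and parameters are assumptions of this example.

```python
from datetime import datetime, timedelta


def is_stale(last_consumed_at, now, retention_days):
    """Simplified model: stale once the last consumption is older than retention."""
    return now - last_consumed_at > timedelta(days=retention_days)
```

In this model, consuming the stream more often moves `last_consumed_at` forward, and a longer retention raises `retention_days`. Either change keeps the comparison false.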

Question 9

You are the provider of a data product on the Snowflake Marketplace. You need to grant trial access to a potential consumer. You want to provide limited access for 7 days to specific tables in your database. Which of the following steps are REQUIRED to accomplish this? (Select all that apply)

A. Create a new share specifically for the trial consumer, granting USAGE privilege on the database and SELECT privilege on the specific tables.

B. Contact Snowflake support to enable trial access for the consumer's account.

C. Monitor the consumer's query history to ensure they are only accessing the allowed tables.

D. Grant OWNERSHIP on the specific tables to the consumer's account temporarily.

E. Create a new role, grant USAGE privilege on the database and SELECT privilege on the specific tables to this role, and then grant this role to the trial consumer.

Answer: A


Question 10

You have a Snowflake Task that is designed to transform and load data into a target table. The task relies on a Stream to detect changes in a source table. However, you notice that the task is intermittently failing with a 'Stream STALE' error, even though the data in the source table is continuously updated. What are the most likely root causes and the best combination of solutions to prevent this issue? (Select TWO)

A. The source table is being modified with DDL operations (e.g., ALTER TABLE ADD COLUMN), which are not supported by Streams. Use Table History to track schema changes and manually adjust the Stream's query if needed. Use 'COPY GRANTS' during the DDL.

B. DML operations (e.g., UPDATE, DELETE) being performed on the source table are affecting rows older than the Stream's retention period. Reduce the stream's 'DATA_RETENTION_TIME_IN_DAYS' to match the oldest DML operation on the source table.

C. The Stream is not configured with 'SHOW_INITIAL_ROWS = TRUE', causing initial changes to be missed and eventually leading to staleness. Recreate the stream with this parameter set to TRUE.

D. The Stream has reached its maximum age (default 14 days) and expired. There is no way to recover data from an expired Stream. You need to recreate the Stream and reload the source table.

E. The Task is not running frequently enough, causing the Stream to accumulate too many changes before being consumed, exceeding its retention period. Increase the task's execution frequency or increase the stream's 'DATA_RETENTION_TIME_IN_DAYS'.

Answer: A, E


Question 11

You are working with a Snowflake table 'customer_data' which contains customer information stored in a VARIANT column named 'raw_info'. The 'raw_info' JSON structure includes nested addresses and preferences. Your task is to extract the city from the first address in the 'addresses' array, and the customer's preferred communication method from the 'preferences' object. Some customers might not have addresses or preferences defined. Select the two SQL snippets that correctly and efficiently extract this data, handling missing fields gracefully and providing appropriate type casting. The address array is in the format 'addresses: [ { 'city': '...', 'state': ' '},

A. Option E

B. Option D

C. Option C

D. Option B

E. Option A

Answer: A, B

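
"Handling missing fields gracefully" means that a customer without addresses or preferences yields null values rather than an error. The plain-Python sketch below shows that behaviour on a dictionary shaped like 'raw_info'. It does not reproduce any of the answer options, and the `communication_method` key is a hypothetical name, since the question does not spell out the key inside 'preferences'.

```python
def extract_city_and_method(raw_info):
    """Return (city of first address, preferred method), using None when absent."""
    # A missing key, a null value, and an empty array are all treated alike.
    addresses = raw_info.get("addresses") or []
    first_address = addresses[0] if addresses else {}
    city = first_address.get("city")

    preferences = raw_info.get("preferences") or {}
    method = preferences.get("communication_method")  # hypothetical key name
    return city, method
```

Each lookup falls back to an empty container, so no step can raise on a customer record that lacks either section.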

Question 12

A data engineer is using Snowpark Python to build a data pipeline. They need to define a UDF that uses a pre-trained machine learning model stored as a file in a Snowflake stage. The UDF should receive batches of data for scoring. Which of the following is the MOST efficient way to implement this, minimizing data transfer and execution time?

A. Use 'session.read.parquet' to load the model file directly into a Snowpark DataFrame and then use 'DataFrame.foreach' to process each row.

B. Create a UDF with '@udf(packages=['snowflake-snowpark-python', 'scikit-learn'], input_types=[ArrayType(StringType())], return_type=FloatType(), replace=True, is_permanent=True)' and load the model within the UDF's initialization using 'session.file.get'.

C. Load the model from the stage into a DataFrame, then use 'df.mapPartitions' to apply the model to each partition.

D. Create a UDF that reads the model from the stage for each row that is passed to it using 'session.file.get' inside the UDF's execution logic.

E. Use '@vectorized' decorator from Snowpark to process each batch of data passed to the UDF and load the model inside it. Specify the appropriate data types in the decorator.

Answer: B, E

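
The efficiency argument in this question comes down to loading the model once and scoring whole batches, as opposed to re-reading the model file for every row (option D). The sketch below shows that idea in plain Python with no Snowpark imports. `load_model`, `score_batch`, the `model.pkl` path, and the length-based "model" are all hypothetical stand-ins for illustration.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_model(path):
    """Stand-in for reading a model file; cached so it runs once per path."""
    # Fake "model" for demonstration: scores a string by its length.
    return lambda text: float(len(text))


def score_batch(batch, model_path="model.pkl"):
    """Score a whole batch with the cached model instead of reloading per row."""
    model = load_model(model_path)
    return [model(text) for text in batch]
```

Calling `score_batch` repeatedly reuses the cached model, which can be confirmed with `load_model.cache_info()`.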
