Latest Databricks-Machine-Learning-Associate Free Dumps - Databricks Certified Machine Learning Associate

Question 1

A data scientist is developing a single-node machine learning model. They have a large number of model configurations to test as a part of their experiment. As a result, the model tuning process takes too long to complete. Which of the following approaches can be used to speed up the model tuning process?

A. Implement MLflow Experiment Tracking

B. Enable autoscaling clusters

C. Scale up with Spark ML

D. Parallelize with Hyperopt

Answer: D
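Each model configuration in this scenario is independent of the others, which is what makes parallel tuning possible. The sketch below is not Hyperopt; it uses Python's standard-library `ThreadPoolExecutor` as a stand-in to show the idea of evaluating independent configurations concurrently. The `evaluate` function, the `lr` and `depth` parameters, and the loss formula are hypothetical placeholders for real single-node model training.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def evaluate(config):
    """Hypothetical stand-in for training one single-node model; returns a loss."""
    time.sleep(0.05)  # simulate training time
    return (config["lr"] - 0.1) ** 2 + 0.01 * config["depth"]

# Eight independent configurations (a small hypothetical grid).
configs = [{"lr": lr, "depth": d} for lr in (0.01, 0.1, 0.5, 1.0) for d in (2, 4)]

def tune_sequential(configs):
    # One configuration at a time: total time grows with len(configs).
    return [evaluate(c) for c in configs]

def tune_parallel(configs, workers=8):
    # Independent configurations are evaluated concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, configs))

start = time.perf_counter()
seq_losses = tune_sequential(configs)
seq_time = time.perf_counter() - start

start = time.perf_counter()
par_losses = tune_parallel(configs)
par_time = time.perf_counter() - start

best_index = min(range(len(configs)), key=lambda i: par_losses[i])
best_config = configs[best_index]
print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s, best: {best_config}")
```

Both runs return the same losses; only the wall-clock time differs. On Databricks, Hyperopt plays this role by distributing trials across the cluster instead of across local threads.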

Question 2

The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.
Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?

A. Spark ML cannot distribute linear regression training

B. Logistic regression

C. Least-squares method

D. Iterative optimization

E. Singular value decomposition

Answer: D
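To illustrate what iterative optimization means here, the sketch below fits a one-variable linear regression with plain gradient descent. This is a simplified stand-in: Spark ML's `LinearRegression` uses a quasi-Newton method (L-BFGS) for its iterative solver rather than plain gradient descent. The data, learning rate, and step count are hypothetical. The point that carries over is that each gradient is a sum over rows, so partitions of a large dataset can compute partial sums independently.

```python
def fit_linear_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Each gradient is a sum over rows; in a distributed setting every
        # partition could compute its partial sum before they are combined.
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_linear_gd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```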

Question 3

A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?

A. Change the number of compute nodes to be double or more than double the number of evaluations.

B. Change the iterative optimization algorithm used to facilitate the tuning process.

C. Change the number of compute nodes and the number of evaluations to be much larger but equal.

D. Change the number of compute nodes to be half or less than half of the number of evaluations.

Answer: D
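The trade-off in this question is between parallelism and adaptivity: an iterative tuner can only use results that have already finished. The sketch below is a toy adaptive search written for illustration, not Hyperopt's TPE algorithm; `objective`, `propose`, and `tune` are hypothetical names. It counts how many of the eight proposals could draw on any earlier result at each level of parallelism.

```python
import random

def objective(x):
    """Hypothetical loss with its minimum at x = 3."""
    return (x - 3.0) ** 2

def propose(history, rng):
    """Toy adaptive rule: sample uniformly until results exist, then near the best."""
    if not history:
        return rng.uniform(0.0, 10.0)
    best_x, _ = min(history, key=lambda pair: pair[1])
    return min(10.0, max(0.0, rng.gauss(best_x, 1.0)))

def tune(max_evals, parallelism, seed=0):
    rng = random.Random(seed)
    history = []   # (x, loss) pairs for finished evaluations
    informed = 0   # proposals that could see at least one finished result
    while len(history) < max_evals:
        batch_size = min(parallelism, max_evals - len(history))
        # Every proposal in a batch is made before any result of that batch is known.
        batch = [propose(history, rng) for _ in range(batch_size)]
        if history:
            informed += batch_size
        history.extend((x, objective(x)) for x in batch)
    return history, informed

for parallelism in (8, 4, 1):
    _, informed = tune(max_evals=8, parallelism=parallelism)
    print(f"parallelism={parallelism}: {informed} of 8 proposals used earlier results")
```

With eight evaluations on eight nodes, no proposal sees any feedback, so the search behaves like random sampling. With four nodes, the second batch of four can build on the first.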

Question 4

A data scientist wants to explore the Spark DataFrame spark_df. The exploration should include visual histograms displaying the distribution of numeric features.
Which of the following lines of code can the data scientist run to accomplish the task?

A. This task cannot be accomplished in a single line of code.

B. dbutils.data.summarize(spark_df)

C. spark_df.describe()

D. dbutils.data(spark_df).summarize()

E. spark_df.summary()

Answer: B

Question 5

A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.
Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

A. loyalty_tier

B. customer_id, loyalty_tier

C. customer_id

D. spend

E. units

Answer: A
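The schema is not reproduced above, so the sketch below assumes that loyalty_tier is categorical and that spend and units are numeric. Under that assumption, the most common value (the mode) suits the categorical column. The numeric columns are filled with the median here, which is one common choice rather than anything stated in the question. The primary key customer_id is left alone, since an identifier cannot be meaningfully imputed. All row values are hypothetical.

```python
from statistics import median, mode

# Hypothetical rows; None marks a missing value.
rows = [
    {"customer_id": 1, "loyalty_tier": "gold", "spend": 120.0, "units": 3},
    {"customer_id": 2, "loyalty_tier": None, "spend": None, "units": 1},
    {"customer_id": 3, "loyalty_tier": "silver", "spend": 80.0, "units": None},
    {"customer_id": 4, "loyalty_tier": "gold", "spend": 100.0, "units": 2},
]

def impute(rows, column, strategy):
    """Return new rows with missing values in `column` filled by strategy(observed)."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = strategy(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

imputed = impute(rows, "loyalty_tier", mode)   # categorical: most common value
imputed = impute(imputed, "spend", median)     # numeric: median (assumed choice)
imputed = impute(imputed, "units", median)     # numeric: median (assumed choice)

for row in imputed:
    print(row)
```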

Question 6

A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.
They have written the following incomplete code block:

Which piece of code can be used to fill in the above blank to complete the task?

A. mapInPandas

B. groupedApplyInPandas

C. applyInPandas

D. predict

Answer: C

Question 7

A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model.
Which of the following possible explanations for this difference is invalid?

A. The data scientist failed to take the log of the predictions in the first model prior to computing the RMSE

B. The data scientist failed to exponentiate the predictions in the second model prior to computing the RMSE

C. The RMSE is an invalid evaluation metric for regression problems

D. The first model is much more accurate than the second model

E. The second model is much more accurate than the first model

Answer: C
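The scale mismatch described in option B can be shown with a few numbers. The sketch below assumes a hypothetical model trained on log(price) whose predictions are about 2% too high, then computes RMSE against the actual prices with and without exponentiating first. The prices and the 2% error are made up for illustration.

```python
import math

def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(actuals))

prices = [100.0, 150.0, 200.0]

# Hypothetical output of a model trained on log(price): each prediction is
# the log of a value 2% above the true price.
log_predictions = [math.log(p * 1.02) for p in prices]

# Comparing log-scale predictions directly to prices mixes two scales.
wrong_rmse = rmse(log_predictions, prices)

# Exponentiating first puts predictions back on the price scale.
right_rmse = rmse([math.exp(lp) for lp in log_predictions], prices)

print(f"without exp: {wrong_rmse:.2f}, with exp: {right_rmse:.2f}")
```

The first figure is large only because of the scale mismatch, not because the model is inaccurate.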

Question 8

A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model.
The Spark DataFrame train_df has the following schema:

The machine learning engineer shares the following code block:

Which of the following changes does the machine learning engineer need to make to complete the task?

A. They need to split the features column out into one column for each feature

B. They do not need to make any changes

C. They need to call the transform method on train_df

D. They need to convert the features column to be a vector

E. They need to utilize a Pipeline to fit the model

Answer: D
