Latest Databricks-Machine-Learning-Associate free dumps - Databricks Certified Machine Learning Associate
A data scientist is developing a single-node machine learning model. They have a large number of model configurations to test as a part of their experiment. As a result, the model tuning process takes too long to complete. Which of the following approaches can be used to speed up the model tuning process?
Answer: D
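One common way to shorten tuning for a single-node model is to evaluate independent configurations concurrently. The sketch below is a standard-library illustration of that idea only; it is not one of the exam's answer options or a Databricks API. `evaluate_config` is a hypothetical stand-in for training and scoring a model, and on Databricks this role is typically played by a distributed tuning library such as Hyperopt with `SparkTrials`.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate_config(config):
    """Hypothetical stand-in for training and scoring one model configuration."""
    depth, rate = config
    # Toy loss with a known minimum at depth=4, rate=0.1 (an assumption for the demo).
    return (depth - 4) ** 2 + (rate - 0.1) ** 2

def tune_in_parallel(configs, max_workers=4):
    """Score every configuration concurrently and return the best (config, loss)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(evaluate_config, configs))
    return min(zip(configs, losses), key=lambda pair: pair[1])

# Grid of hypothetical hyperparameter combinations.
configs = list(product([2, 4, 8], [0.01, 0.1, 1.0]))
best_config, best_loss = tune_in_parallel(configs)
print(best_config, best_loss)
```

Because each configuration is scored independently, the work can be spread over as many workers as are available without changing the result.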
The implementation of linear regression in Spark ML first attempts to solve the linear regression problem using matrix decomposition, but this method does not scale well to large datasets with a large number of variables.
Which of the following approaches does Spark ML use to distribute the training of a linear regression model for large data?
Answer: D
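When a direct matrix solve is impractical, iterative optimization distributes naturally: each data partition computes a partial gradient, and the partial results are summed before the parameters are updated. The following is a minimal pure-Python sketch of that pattern, not Spark ML's actual implementation; the partitions, learning rate, step count, and function names are assumptions chosen for illustration.

```python
def partial_gradient(partition, w, b):
    """Sum of squared-error gradients over one partition of (x, y) rows."""
    grad_w = sum(2 * (w * x + b - y) * x for x, y in partition)
    grad_b = sum(2 * (w * x + b - y) for x, y in partition)
    return grad_w, grad_b

def fit_linear(partitions, lr=0.05, steps=2000):
    """Gradient descent where each step aggregates per-partition gradients."""
    n = sum(len(p) for p in partitions)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # On a cluster, each partial gradient would be computed on a different worker.
        grads = [partial_gradient(p, w, b) for p in partitions]
        grad_w = sum(g[0] for g in grads) / n
        grad_b = sum(g[1] for g in grads) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 3x + 1, split into two hypothetical partitions.
partitions = [[(0.0, 1.0), (1.0, 4.0)], [(2.0, 7.0), (3.0, 10.0)]]
w, b = fit_linear(partitions)
print(round(w, 3), round(b, 3))
```

Only the small gradient tuples travel between workers and the driver in this scheme, which is why it scales to datasets that do not fit on one machine.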
A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?
Answer: B
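The trade-off in this scenario can be made concrete by counting how many evaluations an iterative tuner can base on earlier results. This is a rough sketch under a simplifying assumption: the tuner proposes evaluations in batches of size `parallelism` and only learns from batches that have finished. The function name is hypothetical.

```python
import math

def informed_evaluations(total_evals, parallelism):
    """Return (sequential rounds, evaluations proposed after seeing earlier results)."""
    rounds = math.ceil(total_evals / parallelism)
    # The first batch is proposed with no history, so it cannot be guided.
    informed = max(total_evals - parallelism, 0)
    return rounds, informed

for parallelism in (8, 4, 2, 1):
    print(parallelism, informed_evaluations(8, parallelism))
```

With eight evaluations on eight nodes, every trial is chosen without any history, which is consistent with seeing no trend of improvement. Lowering the parallelism relative to the number of evaluations, or raising the number of evaluations, gives the algorithm results to learn from.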
A data scientist wants to explore the Spark DataFrame spark_df. The exploration should include visual histograms showing the distribution of the numeric features.
Which of the following lines of code can the data scientist run to accomplish the task?
Answer: B
A data scientist is working with a feature set with the following schema:

The customer_id column is the primary key in the feature set. Each of the columns in the feature set has missing values. They want to replace the missing values by imputing a common value for each feature.
Which of the following lists all of the columns in the feature set that need to be imputed using the most common value of the column?

Answer: A
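As a general rule, the most common value (the mode) is the natural fill for categorical or boolean features, while numeric features are usually imputed with the mean or median. The schema is not reproduced above, so the column names and types below are hypothetical; this is only a sketch of the rule using the standard library.

```python
from statistics import median, mode

# Hypothetical rows; None marks a missing value.
rows = [
    {"customer_id": 1, "spend": 10.0, "loyalty_tier": "gold"},
    {"customer_id": 2, "spend": None, "loyalty_tier": "silver"},
    {"customer_id": 3, "spend": 30.0, "loyalty_tier": None},
    {"customer_id": 4, "spend": 50.0, "loyalty_tier": "gold"},
]

def impute(rows, column, strategy):
    """Fill missing values in one column with its median or its mode."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed) if strategy == "median" else mode(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

rows = impute(rows, "spend", "median")       # numeric feature
rows = impute(rows, "loyalty_tier", "mode")  # categorical feature
print(rows[1]["spend"], rows[2]["loyalty_tier"])
```

The primary key is left alone in this sketch, since an identifier carries no distribution worth imputing from.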
A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.
They have written the following incomplete code block:

Which piece of code can be used to fill in the above blank to complete the task?
Answer: C
A data scientist has created two linear regression models. The first model uses price as a label variable and the second model uses log(price) as a label variable. When evaluating the RMSE of each model by comparing the label predictions to the actual price values, the data scientist notices that the RMSE for the second model is much larger than the RMSE of the first model.
Which of the following possible explanations for this difference is invalid?
Answer: C
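RMSE is only comparable across the two models when predictions and actual values are on the same scale. The sketch below uses made-up numbers to show what happens when log-scale predictions are compared directly to actual prices, and how exponentiating them first changes the result. All values and names are illustrative assumptions, not data from the question.

```python
import math

def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared) / len(squared))

actual_prices = [100.0, 200.0, 400.0]
# Hypothetical output of a model trained on log(price): slightly off in log space.
log_predictions = [math.log(110.0), math.log(190.0), math.log(420.0)]

# Comparing log-scale predictions to price-scale actuals mixes two scales.
mismatched = rmse(log_predictions, actual_prices)
# Exponentiating first puts both sides back on the price scale.
corrected = rmse([math.exp(p) for p in log_predictions], actual_prices)
print(round(mismatched, 1), round(corrected, 1))
```

In this toy case the mismatched RMSE is far larger than the corrected one even though the underlying predictions are identical, so a large RMSE for the log(price) model can come from the evaluation step alone.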
A machine learning engineer would like to develop a linear regression model with Spark ML to predict the price of a hotel room. They are using the Spark DataFrame train_df to train the model.
The Spark DataFrame train_df has the following schema:

The machine learning engineer shares the following code block:

Which of the following changes does the machine learning engineer need to make to complete the task?
Answer: D