
Automatic machine learning

auto_ml.Rmd
Using H2O AutoML
Automatic machine learning (AutoML) is the process of automatically searching, screening, and evaluating many models for a specific dataset. AutoML can be particularly useful as an exploratory approach to identify the model families and parameterizations that are most likely to succeed. You can use H2O's AutoML algorithm via the 'h2o' engine in auto_ml(). agua provides several helper functions to quickly wrangle and visualize AutoML's results.
Let’s run an AutoML search on the concrete data.
library(tidymodels)
library(agua)
library(ggplot2)
theme_set(theme_bw())
h2o_start()

data(concrete)
set.seed(4595)
concrete_split <- initial_split(concrete, strata = compressive_strength)
concrete_train <- training(concrete_split)
concrete_test <- testing(concrete_split)

# run for a maximum of 120 seconds
auto_spec <-
  auto_ml() %>%
  set_engine("h2o", max_runtime_secs = 120) %>%
  set_mode("regression")

normalized_rec <-
  recipe(compressive_strength ~ ., data = concrete_train) %>%
  step_normalize(all_predictors())

auto_wflow <-
  workflow() %>%
  add_model(auto_spec) %>%
  add_recipe(normalized_rec)

auto_fit <- fit(auto_wflow, data = concrete_train)

extract_fit_parsnip(auto_fit)
#> ══════════════════════ H2O AutoML Summary: 83 models ═════════════════════
#>
#> ═══════════════════════════════ Leaderboard ══════════════════════════════
#>                                                   model_id  rmse   mse   mae
#> 1 StackedEnsemble_BestOfFamily_4_AutoML_1_20221012_180257   4.51  20.4  3.00
#> 2    StackedEnsemble_AllModels_2_AutoML_1_20221012_180257   4.62  21.4  3.04
#> 3    StackedEnsemble_AllModels_1_AutoML_1_20221012_180257   4.67  21.8  3.08
#> 4 StackedEnsemble_BestOfFamily_3_AutoML_1_20221012_180257   4.68  21.9  3.08
#> 5 StackedEnsemble_BestOfFamily_2_AutoML_1_20221012_180257   4.71  22.2  3.16
#> 6                           GBM_5_AutoML_1_20221012_180257  4.75  22.6  3.14
#>   rmsle mean_residual_deviance
#> 1 0.141                   20.4
#> 2 0.140                   21.4
#> 3 0.142                   21.8
#> 4 0.142                   21.9
#> 5 0.146                   22.2
#> 6 0.147                   22.6
In 120 seconds, AutoML fitted 83 models. The parsnip fit object returned by extract_fit_parsnip(auto_fit) shows the number of candidate models, the best-performing algorithm and its corresponding model id, and a preview of the leaderboard with cross-validation performance. The model_id column in the leaderboard is a unique model identifier on the h2o server. This is useful when you need to predict with or extract a specific model, e.g., with predict(auto_fit, id = id) and extract_fit_engine(auto_fit, id = id). By default, these functions operate on the best-performing leader model.
# predict with the best model
predict(auto_fit, new_data = concrete_test)
#> # A tibble: 260 × 1
#>    .pred
#>    <dbl>
#>  1  40.0
#>  2  43.0
#>  3  38.2
#>  4  55.7
#>  5  41.4
#>  6  28.1
#>  7  53.2
#>  8  34.5
#>  9  51.1
#> 10  37.9
#> # … with 250 more rows
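To work with a specific candidate instead of the leader, pass its model id. A minimal sketch, using one of the ids from the leaderboard above:

# pick a model id from the leaderboard
gbm_id <- "GBM_5_AutoML_1_20221012_180257"

# predictions from that specific model rather than the leader
predict(auto_fit, new_data = concrete_test, id = gbm_id)

# the underlying h2o model object for the same candidate
extract_fit_engine(auto_fit, id = gbm_id)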
Typically, we use AutoML to get a quick sense of the range of our success metric and of which algorithms are likely to succeed. agua provides tools to summarize these results.
rank_results() returns the leaderboard in a tidy format with rankings within each metric. A low rank means good performance on that metric. Here, the top 5 models with the smallest MAE are four stacked ensembles and one XGBoost model.
rank_results(auto_fit) %>%
  filter(.metric == "mae") %>%
  arrange(rank)
#> # A tibble: 83 × 5
#>    id                                          algor…¹ .metric  mean  rank
#>    <chr>                                       <chr>   <chr>   <dbl> <dbl>
#>  1 StackedEnsemble_BestOfFamily_4_AutoML_1_20… stacki… mae      3.00     1
#>  2 StackedEnsemble_AllModels_2_AutoML_1_20221… stacki… mae      3.04     2
#>  3 StackedEnsemble_BestOfFamily_3_AutoML_1_20… stacki… mae      3.08     3
#>  4 StackedEnsemble_AllModels_1_AutoML_1_20221… stacki… mae      3.09     4
#>  5 XGBoost_grid_1_AutoML_1_20221012_180257_mo… xgboost mae      3.13     5
#>  6 XGBoost_grid_1_AutoML_1_20221012_180257_mo… xgboost mae      3.13     6
#>  7 GBM_5_AutoML_1_20221012_180257              gradie… mae      3.15     7
#>  8 StackedEnsemble_BestOfFamily_2_AutoML_1_20… stacki… mae      3.17     8
#>  9 XGBoost_grid_1_AutoML_1_20221012_180257_mo… xgboost mae      3.18     9
#> 10 GBM_grid_1_AutoML_1_20221012_180257_model_… gradie… mae      3.18    10
#> # … with 73 more rows, and abbreviated variable name ¹algorithm
collect_metrics() returns either average statistics of performance metrics per model (summarized) or the raw value for each resample (unsummarized). cv_id identifies the resample that h2o used internally for optimization.
collect_metrics(auto_fit, summarize = FALSE)
#> # A tibble: 2,945 × 5
#>    id                                        algor…¹ .metric cv_id .esti…²
#>    <chr>                                     <chr>   <chr>   <chr>   <dbl>
#>  1 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mae     cv_1…    2.81
#>  2 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mae     cv_2…    2.92
#>  3 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mae     cv_3…    2.83
#>  4 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mae     cv_4…    3.41
#>  5 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mae     cv_5…    3.02
#>  6 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mean_r… cv_1…   17.7
#>  7 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mean_r… cv_2…   20.5
#>  8 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mean_r… cv_3…   16.9
#>  9 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mean_r… cv_4…   27.6
#> 10 StackedEnsemble_BestOfFamily_4_AutoML_1_… stacki… mean_r… cv_5…   19.1
#> # … with 2,935 more rows, and abbreviated variable names ¹algorithm,
#> #   ².estimate
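For the per-model averages, drop summarize = FALSE; a sketch, assuming agua follows the usual tune convention that summarization is the default:

# average each metric across resamples, one row per model and metric
collect_metrics(auto_fit)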
tidy() returns a tibble with performance metrics and the individual model objects. This is helpful if you want to perform operations (e.g., predict) across all candidates.
tidy(auto_fit) %>%
  mutate(
    .predictions = map(.model, predict, new_data = head(concrete_test))
  )
#> # A tibble: 83 × 5
#>    id                                   algor…¹ .metric  .model   .predi…²
#>    <chr>                                <chr>   <list>   <list>   <list>
#>  1 StackedEnsemble_BestOfFamily_4_Auto… stacki… <tibble> <fit[+]> <tibble>
#>  …
#> 10 GBM_3_AutoML_1_20221012_180257       gradie… <tibble> <fit[+]> <tibble>
#> # … with 73 more rows, and abbreviated variable names ¹algorithm,
#> #   ².predictions
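You can also use this tibble to pull out a single candidate's fit. A sketch, assuming the .model list-column holds one fitted model per row, as the output above suggests (the id is taken from that output):

# extract the fitted model for one candidate
tidy(auto_fit) %>%
  filter(id == "GBM_3_AutoML_1_20221012_180257") %>%
  pull(.model) %>%
  purrr::pluck(1)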
member_weights() computes member importance for all stacked ensemble models. Aside from base models such as GLMs, GBMs, and neural networks, h2o tries to fit two kinds of stacked ensembles: one combines all of the base models ("all") and the other includes only the best model of each kind ("bestofFamily"), at a given point in the run. Regardless of how an ensemble is formed, we can calculate variable importance in the ensemble as an importance score for every member model, i.e., the relative contribution of each base model in the meta-learner. This is typically the coefficient magnitude in a second-level GLM. This way, in addition to inspecting model performance on its own, we can find promising candidates if stacking is needed. Here, we show the scaled contribution of different algorithms in the stacked ensembles.
auto_fit %>%
  extract_fit_parsnip() %>%
  member_weights() %>%
  unnest(importance) %>%
  filter(type == "scaled_importance") %>%
  ggplot() +
  geom_boxplot(aes(value, algorithm)) +
  scale_x_sqrt() +
  labs(y = NULL, x = "scaled importance",
       title = "Member importance in stacked ensembles")
You can also autoplot() an AutoML object, which essentially wraps the functions above to plot performance assessment and ranking. The lower the average ranking, the more likely it is that the model type suits the data.
autoplot(auto_fit, type = "rank", metric = c("mae", "rmse")) +
  theme(legend.position = "none")
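To compare the raw cross-validation metric values rather than the rankings, the same function can be called with a different plot type; a sketch, assuming your version of agua's autoplot() supports type = "metric":

# plot metric values instead of rankings
autoplot(auto_fit, type = "metric", metric = c("mae", "rmse")) +
  theme(legend.position = "none")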
After this initial assessment, we might want to allow AutoML more time to search for candidates. Recall that we set the engine argument max_runtime_secs to 120 seconds earlier; we can increase it, or adjust max_models, to control the total runtime. H2O also provides an option to build upon an existing AutoML leaderboard and add more candidates; this can be done via refit(). The model to be re-fitted needs to have the engine argument save_data = TRUE. If you also want to add stacked ensembles, set keep_cross_validation_predictions = TRUE as well.
# not run
auto_spec_refit <-
  auto_ml() %>%
  set_engine("h2o",
             max_runtime_secs = 300,
             save_data = TRUE,
             keep_cross_validation_predictions = TRUE) %>%
  set_mode("regression")

auto_wflow_refit <-
  workflow() %>%
  add_model(auto_spec_refit) %>%
  add_recipe(normalized_rec)

first_auto <- fit(auto_wflow_refit, data = concrete_train)

# continue the search from the existing leaderboard
second_auto <- refit(first_auto)
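When you are finished, you can shut down the background h2o server that h2o_start() launched earlier:

# stop the h2o server
h2o_end()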
