Modeling the Extinction of Species with SVM-Kernel | R-bloggers

Modeling the Extinction of Species with SVM-Kernel
Posted on October 31, 2022 by Selcuk Disci in R bloggers | 0 Comments
[This article was first published on DataGeeek and kindly contributed to R-bloggers.]
In the last article, we analyzed carbon emissions and the factors behind them. This time, I want to look into another important environmental issue: animal biodiversity. By animals, I mean mammals, birds, fish, reptiles, and amphibians.
The metric we are interested in is the Living Planet Index, which measures the change in the size of 31,831 populations across 5,230 species relative to 1970. The explanatory variables are annual carbon emissions per capita (co2), annual gross domestic product per capita (gdp), and region (region).
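To make "relative to 1970" concrete, here is a minimal sketch that expresses a series as a percentage of its 1970 value. The counts are made up, and the published index aggregates trends across many populations, which this sketch does not attempt to reproduce.

```r
# Hypothetical abundance counts for a single population (made-up numbers)
abundance <- data.frame(
  year  = c(1970, 1980, 1990, 2000),
  count = c(200, 170, 120, 90)
)

# Express each year as a percentage of the 1970 baseline
base_count <- abundance$count[abundance$year == 1970]
abundance$index <- 100 * abundance$count / base_count

abundance
```

A value of 60 in this toy series would mean the population is at 60% of its 1970 size, which is how the 100% reference line in the regional plot should be read.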
First, we will compare the Living Planet Index (lpi) by region. To do that, we build our datasets; because the comparison is by region, we sum the co2 and gdp variables within each region and year.
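As a small illustration of that regional summary, the sketch below recodes and sums a few made-up rows. The `toy` data frame and its values are hypothetical; only the recoding rules and the grouping mirror the article's code.

```r
library(dplyr)
library(stringr)

# Made-up rows; the values are illustrative only
toy <- tibble(
  region = c("South Asia", "East Asia & Pacific", "Sub-Saharan Africa",
             "Middle East & North Africa"),
  year   = c(2000, 2000, 2000, 2000),
  co2    = c(1.0, 4.0, 0.8, 5.2),
  gdp    = c(3000, 9000, 2500, 11000)
)

toy_summary <- toy %>%
  mutate(
    # harmonise the region labels, then merge them into broader regions
    region = str_replace(region, "&", "and"),
    region = case_when(
      region %in% c("South Asia", "East Asia and Pacific") ~ "Asia and Pacific",
      region %in% c("Middle East and North Africa", "Sub-Saharan Africa") ~ "Africa",
      TRUE ~ region
    )
  ) %>%
  # one row per region and year, with co2 and gdp summed separately
  group_by(region, year) %>%
  summarise(co2 = sum(co2), gdp = sum(gdp), .groups = "drop")

toy_summary
```

The four input rows collapse into two, one for "Africa" and one for "Asia and Pacific", which is the shape needed to join against region-level LPI data.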
```r
library(tidyverse)
library(tidymodels)
library(DALEX)
library(DALEXtra)
library(janitor)
library(plotly)
library(bbplot)
library(scales)
library(countrycode)
library(glue)

#NOTE: the data-import steps at the start of this pipeline are missing from the source
#(the object name is taken from the later left_join call)
df_co_gdp <- 
  ... %>% 
  select(
    year,
    region,
    co2 = annual_co2_emissions_per_capita,
    gdp = gdp_per_capita_ppp_constant_2017_international
  ) %>% 
  na.omit() %>% 
  #adjusting the region values for merging with the LPI data frame
  mutate(
    region = str_replace(region, "&", "and"),
    region = case_when(
      region == "South Asia" | region == "East Asia and Pacific" ~ "Asia and Pacific",
      region == "Middle East and North Africa" | region == "Sub-Saharan Africa" ~ "Africa",
      TRUE ~ region
    )
  ) %>% 
  #summing the co2 and gdp values by the region, separately
  group_by(region, year) %>% 
  summarise(co2 = sum(co2), gdp = sum(gdp))

#NOTE: the LPI data import at the start of this pipeline is missing from the source
df_tidy <- 
  ... %>% 
  left_join(df_co_gdp) %>% 
  na.omit()

#Comparing the regions by plotting
p <- 
  df_tidy %>% 
  ggplot(aes(x = year, 
             y = lpi, 
             color = region, 
             group = region,
             #text variable for hoverinfo
             text = glue("{number(round(lpi), suffix='%')}\n{year}"))) +
  geom_line() +
  #the reference line
  geom_line(aes(y = 100, color = "red")) +
  geom_point() +
  #region texts
  geom_text(data = df_tidy %>% 
              group_by(region) %>% 
              filter(year == round(median(year))),
            aes(x = year, y = lpi, label = region),
            show.legend = FALSE,
            nudge_y = 4.2) +
  #the reference line text
  geom_text(aes(x = 1994, 
                y = 105, 
                color = "red",
                label = "The reference line(1970 = 100%)")) +
  scale_y_continuous(breaks = pretty_breaks(),
                     labels = label_percent(scale = 1)) +
  labs(title = "Living Planet Index") +
  bbc_style() +
  theme(legend.position = "none", #removes the legend keys
        plot.title = element_text(hjust = 0.5))

#plotly for interactive plotting
ggplotly(p, tooltip = c("text")) %>% 
  #removes the reference line info
  style(hoverinfo = "none", traces = 6)
```
When we look at the regions, it seems they are all below their 1970 values. We will now examine the underlying reasons for this. To do that, we will first model the data with a kernel-based support vector machine (SVM).
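For reference, a kernel SVM regression predicts with a weighted sum of kernel evaluations against the training points. The form below is the standard textbook one. The radial basis kernel is shown only as a common example, written in kernlab's parametrization; which kernel this model actually uses is an assumption on our part.

```latex
% SVM regression prediction: training points x_i, dual coefficients
% \alpha_i and \alpha_i^{*}, intercept b
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b

% Example kernel (an assumption, not confirmed by the source):
% radial basis function with width parameter \sigma
k(x, x') = \exp\!\left(-\sigma \lVert x - x' \rVert^{2}\right)
```

Because the kernel depends on distances between observations, predictors measured on very different scales would dominate it, which is why normalizing the numeric predictors matters for this kind of model.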
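```r
#Preprocessing the data
#NOTE: the start of this pipeline (the recipe() call) is missing from the source
df_rec <- 
  ... %>% 
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>% 
  #all numeric predictors should be on the same scale
  step_normalize(all_numeric_predictors())

#Modeling with svm with kernel-based algorithms
#NOTE: the model function that sets the kernel is missing from the source
df_spec <- 
  ... %>% 
  set_engine("kernlab") %>% 
  set_mode("regression")

df_wf <- 
  workflow() %>% 
  add_recipe(df_rec) %>% 
  add_model(df_spec)

#cross-validation for resamples
set.seed(12345)
#NOTE: everything from this assignment up to the tail of a later call is missing
#from the source, including the step that creates df_train
df_folds <- 
  ...
    ... %>% select(-lpi), 
    y = df_train$lpi,
    verbose = FALSE
  )

set.seed(1983)
#calculates the variable-importance measure
#NOTE: the rest of this statement is missing from the source
vip_svm
```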
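The surviving code ends just as it starts to compute a variable-importance measure. Assuming that measure is permutation-based (the usual approach in DALEX), the idea can be sketched in base R: shuffle one predictor at a time and see how much the model's error grows. The data, the `rmse` helper, and the linear stand-in model below are all hypothetical; this is not the SVM workflow from the article.

```r
set.seed(1983)

# Made-up data: y depends strongly on x1 and weakly on x2
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 3 * df$x1 + 0.3 * df$x2 + rnorm(n, sd = 0.5)

# Stand-in model (a linear model, not the SVM from the article)
fit <- lm(y ~ x1 + x2, data = df)

# Root mean squared error of a model on a data frame with a y column
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

baseline <- rmse(fit, df)

# Shuffle one predictor at a time and record the increase in RMSE
perm_importance <- sapply(c("x1", "x2"), function(var) {
  shuffled <- df
  shuffled[[var]] <- sample(shuffled[[var]])
  rmse(fit, shuffled) - baseline
})

perm_importance
```

A larger increase in error means the model relies more on that variable, so in this toy setup x1 should come out well ahead of x2. In practice the shuffling is usually repeated several times and averaged to reduce noise.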
