Amid a monsoon, another TokyoR meetup! Since the pandemic started, all of TokyoR’s meetups have moved online, and the transition has been seamless thanks to the efforts of the organizing team. This was the 101st TokyoR Meetup!
It was my first TokyoR in quite a long time, so it was nice to be back! On top of short summaries of all the talks, I will also provide some helpful links and resources of my own to supplement their content.
As with every TokyoR meetup, we began with a set of beginner-focused talks:
The first talk covered data cleaning techniques using the Palmer penguins data set.
This data set consists of data on 3 species of penguin with details about their body mass, flipper length, bill length, etc. The package includes two data sets: a raw version and a cleaned version.
The goal of this talk was to start from the raw data and get close to the cleaned data set.
As a first step, we explored the data set with a function that provides a summary view of the data.frame.
From here we were able to identify various problems with the data set and come up with a plan to clean it. Using packages such as the {tidyverse} group, {janitor} and {lubridate}, the speaker explained each step of the long piped chain of cleaning operations.
The speaker said he’ll continue this series: he plans to give another talk on EDA and visualization, followed by talks on various statistical analyses of this data set.
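To give a feel for what such a piped cleaning chain looks like, here is a minimal sketch of my own. It is not the code from the talk: the tiny `raw` data frame is made up for illustration (its column names only imitate the style of the raw penguins data), and the specific cleaning steps are assumptions about what a typical pipeline might contain.

```r
library(dplyr)
library(janitor)
library(lubridate)

# Hypothetical two-row stand-in for a messy raw data set.
raw <- data.frame(
  "Species" = c("Adelie Penguin (Pygoscelis adeliae)",
                "Gentoo penguin (Pygoscelis papua)"),
  "Culmen Length (mm)" = c(39.1, 46.1),
  "Date Egg" = c("2007-11-11", "2008-11-02"),
  "Sex" = c("MALE", NA),
  check.names = FALSE
)

cleaned <- raw %>%
  clean_names() %>%                        # e.g. "Culmen Length (mm)" becomes culmen_length_mm
  mutate(
    species = sub(" .*", "", species),     # keep only the first word of the species label
    year    = year(ymd(date_egg)),         # parse the date string and extract the year
    sex     = factor(tolower(sex))         # lower-case labels, stored as a factor
  ) %>%
  select(species, culmen_length_mm, sex, year)

print(cleaned)
```

Each step does one small thing, which is what makes a long chain like this easy to explain line by line.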
Next, a TokyoR organizer gave a very thorough overview of regression analysis using R. From fitting a model with base R, to calculating the various statistical outputs (residuals, standard errors, F-statistic, etc.) manually step by step, with concise explanations of all of the formulas behind them, this was a helpful intro for anybody trying to understand linear regression using R.
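As a supplement, here is a small sketch of my own (not the speaker's code) that recomputes a few of those outputs by hand and can be compared against base R's `lm()`. The built-in `mtcars` data and the model `mpg ~ wt` are just assumptions picked for illustration.

```r
# Fit the model with base R for reference.
fit <- lm(mpg ~ wt, data = mtcars)

# Design matrix (intercept + wt) and response.
X <- model.matrix(fit)
y <- mtcars$mpg
n <- nrow(X)
p <- ncol(X)

# Ordinary least squares: solve (X'X) beta = X'y.
xtx  <- t(X) %*% X
beta <- solve(xtx, t(X) %*% y)

# Residuals and the residual variance estimate.
res    <- as.numeric(y - X %*% beta)
rss    <- sum(res^2)
sigma2 <- rss / (n - p)

# Standard errors from the diagonal of sigma^2 * (X'X)^-1.
se <- sqrt(diag(sigma2 * solve(xtx)))

# F-statistic comparing the model against an intercept-only model.
tss    <- sum((y - mean(y))^2)
f_stat <- ((tss - rss) / (p - 1)) / (rss / (n - p))

print(cbind(estimate = as.numeric(beta), std_error = se))
print(f_stat)
```

Comparing these numbers with `summary(fit)` is a nice way to convince yourself of what each line of the regression output means.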
One of the organizers of TokyoR gave a short intro to using Snowflake with R. Snowflake is a cloud database platform, one of many that have grown out of the emergence of cloud data warehouses following a long period in which database software was basically dominated by the likes of Oracle and MySQL.
However, there is no R package (…yet?) that directly connects with Snowflake, so one needs to set up an ODBC driver and use the {DBI} package.
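For reference, a connection through {DBI} and {odbc} might look roughly like the sketch below. This is my own illustration, not the speaker's setup: the driver name, the environment variable names, and the exact set of connection parameters are all assumptions that depend on how the Snowflake ODBC driver is installed and configured on your machine.

```r
# Wrapped in a function so nothing connects until you call it.
connect_snowflake <- function() {
  DBI::dbConnect(
    odbc::odbc(),
    Driver    = "SnowflakeDSIIDriver",               # assumed driver name; check your ODBC configuration
    Server    = Sys.getenv("SNOWFLAKE_SERVER"),      # hypothetical env var holding the account host
    UID       = Sys.getenv("SNOWFLAKE_USER"),        # hypothetical env var
    PWD       = Sys.getenv("SNOWFLAKE_PASSWORD"),    # hypothetical env var
    Database  = Sys.getenv("SNOWFLAKE_DATABASE"),    # hypothetical env var
    Warehouse = Sys.getenv("SNOWFLAKE_WAREHOUSE")    # hypothetical env var
  )
}

# Usage, once the driver and credentials are in place:
# con <- connect_snowflake()
# DBI::dbGetQuery(con, "SELECT CURRENT_VERSION()")
# DBI::dbDisconnect(con)
```

Keeping credentials in environment variables rather than in the script is a design choice of this sketch, not something the talk prescribed.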
The speaker of this LT (lightning talk) likes Jackson Pollock’s artwork, and he talked about using R to do fractal analysis of Pollock’s world famous drip paintings.
He gave us an intro to fractal analysis, covering the fractal dimension and the mathematical theory behind it. One way to calculate the fractal dimension of an object is the box-counting algorithm. In R, the {VoxR} package provides a function for this.
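To make the box-counting idea concrete, here is a hand-rolled base R sketch of my own. It does not use {VoxR} and is not the speaker's code; the function name `box_count_dimension` and the test shapes are made up for illustration. The idea: cover the points with grids of shrinking box size, count how many boxes contain at least one point, and take the slope of log(count) against log(1 / box size) as the dimension estimate.

```r
# Estimate the box-counting dimension of a 2D point set (illustrative helper).
box_count_dimension <- function(x, y, sizes) {
  counts <- vapply(sizes, function(s) {
    # Assign every point to a grid cell of side length s and count distinct cells.
    cells <- cbind(floor(x / s), floor(y / s))
    nrow(unique(cells))
  }, integer(1))
  # Slope of log(count) against log(1 / size) is the dimension estimate.
  fit <- lm(log(counts) ~ log(1 / sizes))
  coef(fit)[[2]]
}

sizes <- 1 / 2^(1:5)   # box sides 1/2, 1/4, ..., 1/32

# A diagonal line segment inside the unit square.
t <- (seq_len(4096) - 0.5) / 4096
line_dim <- box_count_dimension(t, t, sizes)

# A densely filled unit square.
grid <- (seq_len(64) - 0.5) / 64
sq <- expand.grid(x = grid, y = grid)
square_dim <- box_count_dimension(sq$x, sq$y, sizes)

print(c(line = line_dim, square = square_dim))
```

A line comes out with a dimension of about 1 and a filled square about 2; a fractal pattern such as a drip painting would be expected to land somewhere in between.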
In his second presentation of the day, the speaker from the penguins data cleaning talk covered graphs, specifically some lesser-known visualization functions available in R.
Why does R have these functions? … For compatibility with S.
For those that might not get the joke/adage/whatever, these visualizations continued to exist in R, in part, due to its origins in its predecessor language, S.
First is the stem-and-leaf plot, which is similar to the histogram that most people should be familiar with. Unlike the histogram, however, the stem-and-leaf plot tries to retain as much of the original data as possible and orders the values from least to greatest in both the “stem” and “leaf” parts of the plot. R users can create this plot with a function available in base R’s graphics functionality.
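A quick example of my own: base R's `stem()` prints the plot as text to the console. The `mtcars` data set is just an assumption used for illustration here.

```r
# Stem-and-leaf plot of miles per gallon; the output is plain text in the console.
stem(mtcars$mpg)

# The scale argument stretches the plot so that stems are split more finely.
stem(mtcars$mpg, scale = 2)
```

Because the output is plain text, you can still read off approximate individual values, which a histogram would hide.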
Next are the Chernoff face graphs. This is a type of visualization invented by Herman Chernoff to display multivariate data in the shape of a human face. Individual data points are differentiated by the shape, size, placement, and orientation of the individual parts of the face.
R users can create this type of visualization via the {aplpack} package. The speaker provided an example using the Palmer penguins data set from his previous presentation.
Nowadays, to view differences in multivariate data in a similar way, people can make radar plots or parallel coordinate plots.
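Here is a small sketch of my own using `faces()` from {aplpack}. To avoid an extra data dependency I swapped in the built-in `mtcars` data instead of the penguins data from the talk, and the choice of rows and columns is an arbitrary assumption.

```r
# Nine cars, seven numeric variables; each row is drawn as one face.
face_data <- mtcars[1:9, 1:7]

# Only draw if {aplpack} is installed.
if (requireNamespace("aplpack", quietly = TRUE)) {
  aplpack::faces(face_data, face.type = 1, main = "Chernoff faces for mtcars")
}
```

By default the function also prints which variable was mapped to which facial feature, which you will need in order to interpret the faces.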
Finally, he talked about sunflower plots. Sunflower plots are a variant of the traditional scatter plot that tries to reduce overplotting by adding petals in areas of the plot where multiple data points have similar values. Base R has a function available for easy access to this type of visualization.
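For a quick try, base R's `sunflowerplot()` works like a regular scatter plot call. This is my own example, and the built-in `iris` data is an assumption chosen because many of its measurements coincide exactly.

```r
# Petal measurements in iris contain many duplicated (x, y) pairs.
sf <- sunflowerplot(
  iris$Petal.Length, iris$Petal.Width,
  xlab = "Petal length (cm)",
  ylab = "Petal width (cm)",
  main = "Sunflower plot of iris petals"
)

# The invisible return value records how many observations sit at each location.
head(data.frame(x = sf$x, y = sf$y, number = sf$number))
```

Each petal stands for one observation at that location, so heavily overplotted spots become visible at a glance.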
The next TokyoR meetup is scheduled for sometime near the end of October. Please follow the official TokyoR Twitter account to keep tabs on any new updates, or visit the TokyoR website for details on past and future meetups. For the time being, meetups will continue to be conducted online. Talks in English are also welcome, so come join us!