rtoot: Collecting and Analyzing Mastodon Data | R-bloggers

[This article was first published on schochastics, and kindly contributed to R-bloggers.]

It has been a wild few days on Twitter after Elon Musk took over. The future of the platform is unclear and many users are looking for alternatives, a popular one being Mastodon. I also decided to give it a try and signed up. I quite quickly became interested in its API and realized that there was only a seemingly unmaintained R package on GitHub. So I decided to write a new one. Fast forward a week(!!!!) and the package was accepted by CRAN. In this post I will introduce some of the functionality of the package and a roadmap for the future. (The name of the package derives from “toot”, the equivalent of a “tweet”.)

Before doing anything you should set up credentials. Once they are set up, you will not need to bother with that anymore (hopefully). There is a vignette in the package which explains the process. In brief, Mastodon has three types of API calls: anonymous, public, and user-based. For anonymous calls you do not need any token. A public token can be obtained without an account and gives a few more API call options. A user-based token grants access to all endpoints but requires an account. The package provides a function that guides you through the process of setting up a token.
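As a minimal sketch of the setup step, assuming the installed version of rtoot exports auth_setup() with `instance` and `type` arguments (please check the package documentation for your version), requesting a public token could look roughly like this. The call asks for input, so it is guarded and only runs in an interactive session.

```r
# Token type to request: "public" needs no account, "user" requires one.
token_type <- "public"

# Assumption: rtoot is installed and auth_setup() accepts `instance` and `type`.
# The setup prompts for input, so only run it in an interactive session.
if (interactive() && requireNamespace("rtoot", quietly = TRUE)) {
  rtoot::auth_setup(instance = "mastodon.social", type = token_type)
}
```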

In contrast to Twitter, Mastodon is not a single instance, but a federation of different servers. You sign up at a specific server (say “mastodon.social”) but can still communicate with users on other servers (say “fosstodon.org”). The existence of different instances makes API calls more complex. For example, some calls can only be made within your own instance, while others can access all instances but require you to specify the instance as a parameter. The package can also retrieve a list of active instances, sorted by number of users. General information about an instance can be obtained with

str(get_instance_general(instance = "mastodon.social"))
## List of 16
## $ uri : chr "mastodon.social"
## $ title : chr "Mastodon"
## $ short_description: chr "The original server operated by the Mastodon gGmbH non-profit"
## $ description : chr ""
## $ email : chr "[email protected]"
## $ version : chr "4.0.0rc1"
## $ urls :List of 1
## ..$ streaming_api: chr "wss://mastodon.social"
## $ stats :List of 3
## ..$ user_count : int 831723
## ..$ status_count: int 41091494
## ..$ domain_count: int 30169
## $ thumbnail : chr "https://files.mastodon.social/site_uploads/files/000/000/001/@1x/57c12f441d083cde.png"
## $ languages :List of 1
## ..$ : chr "en"
## $ registrations : logi FALSE
## $ approval_required: logi FALSE
## $ invites_enabled : logi TRUE
## $ configuration :List of 4
## ..$ accounts :List of 1
## .. ..$ max_featured_tags: int 10
## ..$ statuses :List of 3
## .. ..$ max_characters : int 500
## .. ..$ max_media_attachments : int 4
## .. ..$ characters_reserved_per_url: int 23
## ..$ media_attachments:List of 6
## .. ..$ supported_mime_types :List of 28
## .. .. ..$ : chr "image/jpeg"
## .. .. ..$ : chr "image/png"
## .. .. ..$ : chr "image/gif"
## .. .. ..$ : chr "image/heic"
## .. .. ..$ : chr "image/heif"
## .. .. ..$ : chr "image/webp"
## .. .. ..$ : chr "image/avif"
## .. .. ..$ : chr "video/webm"
## .. .. ..$ : chr "video/mp4"
## .. .. ..$ : chr "video/quicktime"
## .. .. ..$ : chr "video/ogg"
## .. .. ..$ : chr "audio/wave"
## .. .. ..$ : chr "audio/wav"
## .. .. ..$ : chr "audio/x-wav"
## .. .. ..$ : chr "audio/x-pn-wave"
## .. .. ..$ : chr "audio/vnd.wave"
## .. .. ..$ : chr "audio/ogg"
## .. .. ..$ : chr "audio/vorbis"
## .. .. ..$ : chr "audio/mpeg"
## .. .. ..$ : chr "audio/mp3"
## .. .. ..$ : chr "audio/webm"
## .. .. ..$ : chr "audio/flac"
## .. .. ..$ : chr "audio/aac"
## .. .. ..$ : chr "audio/m4a"
## .. .. ..$ : chr "audio/x-m4a"
## .. .. ..$ : chr "audio/mp4"
## .. .. ..$ : chr "audio/3gpp"
## .. .. ..$ : chr "video/x-ms-asf"
## .. ..$ image_size_limit : int 10485760
## .. ..$ image_matrix_limit : int 16777216
## .. ..$ video_size_limit : int 41943040
## .. ..$ video_frame_rate_limit: int 60
## .. ..$ video_matrix_limit : int 2304000
## ..$ polls :List of 4
## .. ..$ max_options : int 4
## .. ..$ max_characters_per_option: int 50
## .. ..$ min_expiration : int 300
## .. ..$ max_expiration : int 2629746
## $ contact_account :List of 22
## ..$ id : chr "1"
## ..$ username : chr "Gargron"
## ..$ acct : chr "Gargron"
## ..$ display_name : chr "Eugen ????"
## ..$ locked : logi FALSE
## ..$ bot : logi FALSE
## ..$ discoverable : logi TRUE
## ..$ group : logi FALSE
## ..$ created_at : chr "2016-03-16T00:00:00.000Z"
## ..$ note : chr " Founder, CEO and lead developer

get_instance_activity() shows the activity for the last three months and get_instance_trends() the trending hashtags of the week.

get_instance_activity(instance = "fosstodon.org")
## # A tibble: 12 × 4
##    week                statuses logins registrations
##  1 2022-11-10 21:47:00    13647   7623           691
##  2 2022-11-03 21:47:00    23227  11913          3401
##  3 2022-10-27 21:47:00        0      0             0
##  4 2022-10-20 21:47:00        0      0             0
##  5 2022-10-13 21:47:00        0      0             0
##  6 2022-10-06 21:47:00        0      0             0
##  7 2022-09-29 21:47:00        0      0             0
##  8 2022-09-22 21:47:00        0      0             0
##  9 2022-09-15 21:47:00        0      0             0
## 10 2022-09-08 21:47:00        0      0             0
## 11 2022-09-01 21:47:00        0      0             0
## 12 2022-08-25 21:47:00        0      0             0

get_instance_trends(instance = "fosstodon.org")
## # A tibble: 70 × 5
##    name             url                                 day        accou…¹  uses
##  1 followbackfriday https://fosstodon.org/tags/followb… 2022-11-11     175   246
##  2 followbackfriday https://fosstodon.org/tags/followb… 2022-11-10       3     3
##  3 followbackfriday https://fosstodon.org/tags/followb… 2022-11-09       2     2
##  4 followbackfriday https://fosstodon.org/tags/followb… 2022-11-08       1     1
##  5 followbackfriday https://fosstodon.org/tags/followb… 2022-11-07       0     0
##  6 followbackfriday https://fosstodon.org/tags/followb… 2022-11-06       0     0
##  7 followbackfriday https://fosstodon.org/tags/followb… 2022-11-05       0     0
##  8 followfriday     https://fosstodon.org/tags/followf… 2022-11-11     246   352
##  9 followfriday     https://fosstodon.org/tags/followf… 2022-11-10      26    30
## 10 followfriday     https://fosstodon.org/tags/followf… 2022-11-09      12    31
## # … with 60 more rows, and abbreviated variable name ¹accounts
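Many of the weeks in the activity output are reported as all zeros, so it can help to drop them before computing anything. The following is only a sketch in base R: the data frame is typed in by hand from the first three rows of the output above rather than fetched from the API, and the derived column name `statuses_per_login` is my own choice.

```r
# First three rows of the activity output, entered by hand for illustration.
activity <- data.frame(
  week = as.POSIXct(
    c("2022-11-10 21:47:00", "2022-11-03 21:47:00", "2022-10-27 21:47:00"),
    tz = "UTC"
  ),
  statuses = c(13647L, 23227L, 0L),
  logins = c(7623L, 11913L, 0L),
  registrations = c(691L, 3401L, 0L)
)

# Keep only weeks with recorded activity.
active <- activity[activity$statuses > 0, ]

# Average number of statuses per login in each remaining week.
active$statuses_per_login <- active$statuses / active$logins

# Total registrations over the active weeks.
total_registrations <- sum(active$registrations)

print(active)
print(total_registrations)
```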

To get the most recent toots of a specific instance use

get_timeline_public(instance = "mastodon.social")
## id uri created_at content visib…¹ sensi…² spoil…³ reblo…⁴ favou…⁵ repli…⁶
## 1 10931614… http… 2022-11-09 22:12:13 " Th… public FALSE "" 0 0 0
## # … with 19 more variables: url, in_reply_to_id, in_reply_to_account_id,
## # language, text, application, poll, card, account, reblog,
## # media_attachments, mentions, tags, emojis, favourited, reblogged, muted,
## # bookmarked, pinned, and abbreviated variable names ¹visibility, ²sensitive,
## # ³spoiler_text, ⁴reblogs_count, ⁵favourites_count, ⁶replies_count
## # ℹ Use `colnames()` to see all variable names
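Mastodon returns the `content` of a status as HTML, so it usually needs some cleaning before any text analysis. The helper below is a rough sketch of one way to do that with base R regular expressions. The name `strip_html` is hypothetical and not part of the package, and the approach is deliberately simple: it does not decode HTML entities such as `&amp;`.

```r
# Hypothetical helper: turn the HTML of a toot into plain text.
strip_html <- function(x) {
  x <- gsub("<br\\s*/?>", "\n", x, perl = TRUE)      # line breaks
  x <- gsub("</p>\\s*<p>", "\n\n", x, perl = TRUE)   # paragraph boundaries
  x <- gsub("<[^>]+>", "", x)                        # all remaining tags
  trimws(x)
}

example <- "<p>Hello <a href=\"https://fosstodon.org/tags/rstats\">#rstats</a></p>"
strip_html(example)
```

For anything beyond a quick look at the text, a proper HTML parser is the safer choice.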

To get the most recent toots containing a specific hashtag use

get_timeline_hashtag(hashtag = "rstats", instance = "fosstodon.org")
## # A tibble: 20 × 29
## id uri created_at content visib…¹ sensi…² spoil…³ reblo…⁴
##
## 1 1093260576… http… 2022-11-11 16:12:55 " I … public FALSE "" 3
## 5 1093259083… http… 2022-11-11 15:35:34 " Pe… public FALSE "" 0
## 6 1093259018… http… 2022-11-11 15:34:06 " I'… public FALSE "" 1
## 7 1093258952… http… 2022-11-11 15:32:55 " Wh… public FALSE "" 0
## 8 1093258902… http… 2022-11-11 15:31:37 " Cu… public FALSE "" 4
## 9 1093258386… http… 2022-11-11 15:18:31 " Is… public FALSE "" 0
## 10 1093258337… http… 2022-11-11 15:17:16 " Th… public FALSE "" 4
## 12 1093258124… http… 2022-11-11 15:11:51 " It… public TRUE "" 0
## 13 1093257660… http… 2022-11-11 15:00:02 " If… public FALSE "" 1
## 14 1093257302… http… 2022-11-11 14:50:48 " Cr… public FALSE "" 0
## 15 1093257130… http… 2022-11-11 14:46:34 " 2/… public FALSE "" 4
## 16 1093257094… http… 2022-11-11 14:45:39 " 1/… public FALSE "" 25
## 17 1093257067… http… 2022-11-11 14:20:41 " Fo… public TRUE "Decis… 0
## 18 1093256660… http… 2022-11-11 14:34:34 " Tr… public FALSE "" 2
## 19 1093256557… http… 2022-11-11 14:31:59 " He… public FALSE "" 1
## 20 1093256340… http… 2022-11-11 14:26:28 " I … public FALSE "" 0
## # … with 21 more variables: favourites_count, replies_count,
## # url, in_reply_to_id, in_reply_to_account_id,
## # language, text, application, poll,
## # card, account, reblog,
## # media_attachments, mentions, tags,
## # emojis, favourited, reblogged, muted,
## # bookmarked, pinned, and abbreviated variable names …

A separate function lets you retrieve the most recent toots from your own timeline.
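Since the timeline functions return one row per toot, ranking toots by engagement is a matter of ordering the result. Here is a small sketch; `top_reblogged` is a hypothetical helper, and the example data frame is typed in by hand from four rows of the hashtag output above (only `created_at` and `reblogs_count`), not fetched from the API.

```r
# Hypothetical helper: the n most reblogged toots of a timeline data frame.
top_reblogged <- function(timeline, n = 3) {
  ranked <- timeline[order(timeline$reblogs_count, decreasing = TRUE), ]
  head(ranked, n)
}

# Four rows copied by hand from the hashtag output above.
toots <- data.frame(
  created_at = c(
    "2022-11-11 16:12:55", "2022-11-11 15:31:37",
    "2022-11-11 14:45:39", "2022-11-11 15:35:34"
  ),
  reblogs_count = c(3L, 4L, 25L, 0L)
)

top_reblogged(toots, n = 2)
```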

The package exposes several account-level endpoints. Most require the account id instead of the username as an input. There is, to our knowledge, no straightforward way of obtaining the account id. With the package you can get the id via search_accounts().

search_accounts("schochastics")
## # A tibble: 2 × 21
## id usern…¹ acct displ…² locked bot disco…³ group created_at
##
## 1 10930243… schoch… scho… David … FALSE FALSE FALSE FALSE 2022-11-07 00:00:00
## 2 10926171… schoch… scho… David … FALSE FALSE FALSE FALSE 2022-10-30 00:00:00
## # … with 12 more variables: note

(Future versions will allow the username and user id to be used interchangeably.)

Using the id, you can get an account's followers, the users it follows, and its statuses.
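Putting the two steps together, looking up the id and then querying an account-level endpoint could look roughly like the sketch below. It assumes that rtoot is installed, that a token has been set up, and that the package exports get_account_followers() with an `id` argument. The wrapper `followers_of` is hypothetical and only there for illustration.

```r
# Assumptions: rtoot is installed, a token has been set up, and the package
# exports search_accounts() and get_account_followers() as used here.
followers_of <- function(username) {
  hits <- rtoot::search_accounts(username)
  if (nrow(hits) == 0) {
    stop("No account found for '", username, "'")
  }
  # Take the first match; check `acct` yourself if several accounts share a name.
  rtoot::get_account_followers(id = hits$id[1])
}

# Not run: requires network access and a valid token.
# followers <- followers_of("schochastics")
```

Note that the search above returned two accounts with the same username on different instances, so blindly taking the first row is only acceptable for a quick look.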
