The Data Daily

GitHub – The Perks of Collaboration and Version Control | R-bloggers

GitHub – The Perks of Collaboration and Version Control
Posted on September 19, 2022 by R-post on Cosima Meyer in R bloggers
[This article was first published on R-post on Cosima Meyer, and kindly contributed to R-bloggers.]
Let’s talk about version control and collaboration today and one of its powerful tools: git ✨
What is Git?
Using Git can be a lifesaver (and it has often been one for me in the past). It’s basically a mini time machine: it gives you version control over your work as it progresses. Unlike Dropbox or similar tools, it does not save the current state of your work automatically. You have to do it actively with commits and pushes. A typical workflow looks like this:
Image showing a git workflow from the working directory to the remote repo: working directory → staging area → local repo → remote repo. It also shows common git commands (git add code.R, git commit -m "Update", git push, git pull, git checkout, git merge).
RStudio has a nice GUI that allows you to do everything without writing code. If you want to remember a few commands anyway, the most useful ones are git add, git commit, git push, git pull, and git status (to check if you have uncommitted files).
Here’s what the typical workflow can look like in action:
GIF showing the commands git add, git commit, git push, git pull in a sequential order
You start with your local repository on your own machine, work on your code and make some changes. Now the git workflow starts:
git add: Once you have made some changes, this command adds them to the staging area. This is an essential step before committing, because it tells git which files you want to include in your next commit.
git commit: This “commits” your staged changes and puts them under version control in git. I talked to many people and couldn’t find a best practice for how often you should commit. I like to think of a commit as a status report or a (small) milestone to which you may want to return. So I try to commit once a (thematic) step is reached.
git push: This command pushes one (or more) commits to the remote repository.
git pull: This is usually one of the first commands I execute. It pulls changes from others and makes sure that you’re working on the most current version.
git status: This command lets you check whether you still have uncommitted changes in your files.
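To try the commands above without touching a real project, here is a minimal sketch that runs the whole cycle in a throwaway directory. It rests on a few assumptions: a local bare repository stands in for the remote (such as GitHub), the name and email are made up, and `git init -b` needs a reasonably recent git version.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; it is deleted again when the script exits.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT

# Assumption: a local bare repository plays the role of the remote repo.
git init -q --bare -b main "$sandbox/remote.git"
git init -q -b main "$sandbox/project"
cd "$sandbox/project"

# Hypothetical identity, set for this repository only.
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$sandbox/remote.git"

# Work on your code, then check what git sees.
echo 'x <- 1' > code.R
git status --short            # code.R is listed as untracked

# Working directory -> staging area -> local repo -> remote repo.
git add code.R
git commit -q -m "Update"
git push -q -u origin main

# Pull before you continue working (nothing new here, so this is a no-op).
git pull -q

# Collect the end state: no uncommitted changes, one commit on the remote.
uncommitted=$(git status --porcelain)
commits_on_remote=$(git --git-dir="$sandbox/remote.git" rev-list --count main)
```

With a real GitHub repository, the only part that changes is the remote URL. The add, commit, push and pull steps stay the same.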
But there are many more commands out there! When I get lost, I usually end up looking things up at Atlassian.
You have probably also heard of branches and merges in Git — this is an excellent way to collaborate with others. The GIF shows how you start working from the main branch (this is where all the changes should eventually end up and where your final product lives). Each dot shows a new commit that is pushed:
GIF showing how a feature branch evolves from a main branch and is then guided back (merged) into the main branch
Once you want to make changes (like integrating a new function in your package), you start a new feature branch. The feature branch eventually goes back into the main branch (this is what we call “merging”). The cool thing is that you can work somewhat independently from your colleagues or collaborators on individual tasks, because they can start their own feature branches. Merging feature branches back (in the best case) requires a code review. You can also do this on GitHub, and I’m a big fan of it because it makes you a better programmer step by step and allows sharing knowledge.
I learned it the hard way, but it’s best if feature branches don’t get too long and complicated, because they quickly become hard to review.
If you want to visualize it yourself, here’s a slide deck that explains the workflows and more.
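The branch-and-merge idea can be sketched on the command line as well. This is a minimal example under stated assumptions: the branch name, file names and identity are hypothetical, and the merge is done locally. On GitHub you would typically open a pull request instead, so that the code review happens before the merge.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; it is deleted again when the script exits.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT
git init -q -b main "$sandbox/project"
cd "$sandbox/project"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# First commit on the main branch.
echo 'x <- 1' > code.R
git add code.R
git commit -q -m "Initial commit"

# Start a feature branch and commit the new work there.
git checkout -q -b feature/new-function    # hypothetical branch name
echo 'add_one <- function(x) x + 1' > add_one.R
git add add_one.R
git commit -q -m "Add add_one()"

# Switch back to main and merge the feature branch into it.
git checkout -q main
git merge -q --no-ff -m "Merge feature/new-function" feature/new-function

# Collect the end state: we are on main and the feature file has arrived.
current_branch=$(git rev-parse --abbrev-ref HEAD)
merged_file_present=$(test -f add_one.R && echo yes || echo no)
commits_on_main=$(git rev-list --count HEAD)
```

The `--no-ff` flag is a choice made for this sketch: it forces a merge commit, so the history keeps a visible record of where the feature branch came back into main.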
How do you use GitHub and RStudio?
If you connect your local repository with a remote repository (for instance on GitHub), you can also store it in the cloud and access it from anywhere. Setting up this connection is easy. Just follow these steps:
Visualization showing a typical workflow when using GitHub in RStudio with a new project: 1) Create a new repository on GitHub, 2) Open .Rproj in RStudio, 3) Connect with GitHub. Now it’s time to pull, commit and push.
You can see one detailed use case in the GIF below. It shows how I typically set up a project with GitHub when working in academia.
GIF showing how I typically set up a project:
Create a new GitHub repository
Create an .Rproj
Link it with your GitHub repository
You’re all set – the version control is up and running
Populate your project with code (see below)
Commit and push your changes
I create a GitHub repository first. Depending on data privacy and other things, I go for either public or private, but I always add a README. READMEs are great because they allow you to write a short description of your repository in markdown.
Then I go back to my RStudio desktop version and select “File” > “New project”. To enable version control, select “Version control” here and then copy-paste the link from your GitHub repository.
Screenshot showing a green “Code” button on GitHub that reveals an HTTPS-based URL that you can copy
A new project opens and your version control is up and running.
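Behind the scenes, creating a project from version control in RStudio clones the repository. Here is a minimal command-line sketch of that step. As an assumption, a local bare repository seeded with a README stands in for the GitHub repository, and the project name and identity are hypothetical. With GitHub you would pass the HTTPS URL from the “Code” button to git clone instead.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox; it is deleted again when the script exits.
sandbox=$(mktemp -d)
trap 'rm -rf "$sandbox"' EXIT

# Stand-in for the GitHub repository: a bare repo seeded with a README.
git init -q --bare -b main "$sandbox/remote.git"
git init -q -b main "$sandbox/seed"
cd "$sandbox/seed"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"
echo '# my-project' > README.md
git add README.md
git commit -q -m "Add README"
git push -q "$sandbox/remote.git" main

# The actual step: clone the repository into a new project folder.
cd "$sandbox"
git clone -q "$sandbox/remote.git" my-project   # hypothetical project name
cd my-project

# The clone is already linked to its remote under the name "origin".
git remote -v
origin_url=$(git remote get-url origin)
readme_present=$(test -f README.md && echo yes || echo no)
```

Because the clone already knows its remote, pull and push work right away without any further setup.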
I realized that I usually start with a similar setup when working on an academic project, so I wrote a few code snippets that populate my .Rproj with files and folders. They are described in more detail in a blog post that I wrote in 2020.
# Set up the folder structure folder_names
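The idea of populating a fresh project with folders can also be sketched in a few lines of shell. This is not the setup from the 2020 blog post: the folder names below are hypothetical placeholders, and the .gitkeep files are an assumption of this sketch, added because git does not track empty directories.

```shell
#!/bin/sh
set -eu

# Throwaway project directory; it is deleted again when the script exits.
project=$(mktemp -d)
trap 'rm -rf "$project"' EXIT

# Hypothetical folder names; adjust them to your own project layout.
for folder in data code figures; do
  mkdir -p "$project/$folder"
  # Placeholder file so that the otherwise empty folder can be committed.
  touch "$project/$folder/.gitkeep"
done

# Count the top-level entries that were created.
created=$(ls "$project" | wc -l | tr -d ' ')
```

In a real project you would run the loop inside the cloned repository and then add, commit and push the new folders.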
