Using Git/GitHub in RStudio ❤

A Beginner’s Guide to Version Control

𝒮hilaan 𝒜lzahawi ∈ 𝒮tanford 𝒞enter for 𝒪pen and ℛeproducible 𝒮cience

Today’s Plan

An image combining the logo of GitHub on the left and the logo of RStudio on the right, with a large blue plus sign in the middle. In terms of the GitHub logo, 'GitHub' is written in bold black font, accompanied by an image of a creature that appears to be the combination of an octopus and a cat. In terms of the RStudio logo, 'R' is written in blue font against a white background and placed in a circle. This circle is placed in a hexagon, in which 'Studio' is written in white font against a blue background and placed to the right of the 'R' circle.

  • Version Control: What?
  • Version Control: Why? Why not?
  • Version Control: How?
    • RStudio integration
    • No need for the command line!



Version Control: What?

  • Record the changes made over time to a file or a set of files in a folder
    • Your set of files can include, e.g., data, code, figures, tables, and reports
    • The folder that contains your files is called a repository
  • Allows tracking project history, reviewing changes, and reverting to earlier versions
  • Different version control systems exist

The Turing Way project illustration by Scriberia. On the left, we see a stack of documents named 'draft doc', 'final doc', 'doc_07', 'final_final', and a hand is pulling out a document named 'final_final_FINAL'. This appears to represent a manual approach to version control. On the right, we instead see an automated approach to version control. A hand is dialing a large blue button that ranges from V1 to V6. The dial is currently pointed towards V3, and in the background we see the third document light up.

Version Control: What?

Git + GitHub for R users

  • I focus on Git, a version control system, combined with GitHub, an online platform for hosting Git repositories
  • Git is one of the most popular version control systems, and GitHub makes it more user-friendly
  • We can use Git/GitHub from within RStudio
    • No command line experience necessary!

A cartoon showing two paths side-by-side. On the left is a scary spooky forest, with spiderwebs and gnarled trees, with a scared looking cute fuzzy monster running out of it. On the right is a bright, colorful path with flowers, rainbow and sunshine. A monster facing away from us in a backpack and walking stick is looking toward the right path. Although the illustration was initially created to signify the difference between manually setting working directories in R versus using a project-based approach combined with the here R package, to me, the left side represents using Git through the terminal while the right side represents using Git in RStudio.  Artwork by @allison_horst.

Version Control: Why?

  • Back up your work to a remote location
  • Improve reproducibility and transparency (benefit others and your future self)
  • Catch and fix mistakes
  • Time travel!

A cartoon Delorean, with several fuzzy monsters dressed in lab coats pouring date-times into the flux capacitor, with one holding a lubridate cheatsheet. One fuzzy monster is flying on a hoverboard, dressed like Marty McFly from Back to the Future. I include this image because Marty McFly and GitHub users have at least one thing in common: They can travel through time. Artwork by @allison_horst.

Version Control: Why not?

What are some challenges associated with Git/GitHub? Why would you not use version control?

  • Steep learning curve (but: pays off in the long run!)
  • Collaboration
    • Do your collaborators (know how to) use GitHub?
  • Not always well suited for large data
  • What else…?

Version Control: How?

Image of a web comic that shows one person sitting at a desk scrolling through their documents folder. The folder looks disorganized, and its files have uninformative names, like 'Untitled 241.doc' and 'Untitled 40 MOM ADDRESS.jpg'. Another person is standing behind the seated person, and says 'OH MY GOD.' The comic has a caption, reading 'PROTIP: NEVER LOOK IN SOMEONE ELSE'S DOCUMENTS FOLDER.'

GitHub in R: Set-up

  • Install or upgrade R, RStudio, and Git
  • Connect RStudio and GitHub



GitHub in R: Workflow

  • Create a GitHub repository
  • Clone a GitHub remote to your local computer using RStudio
  • Stage and commit changes to your local repository
  • Push your local changes to the GitHub remote

Version Control: Set-up

Install or upgrade R, RStudio, and Git

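# In the R console: check your R version
R.version.string # Older than 4.0.0? Update!
# In the Terminal (not the R console): check that Git is installed
which git # prints the path to Git (on Windows: where git)
git --version # prints the installed Git version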
  • No file path/version? Install Git
  • Configure Git (i.e., introduce yourself! 👋) from within R
# Step 1: Install the usethis package
install.packages("usethis") 
# Step 2: Load the usethis package
library(usethis) 
# Step 3: Set your Git user name and email (must be the email associated with your GitHub account!)
  ## Option 1, code only: use_git_config()
use_git_config(user.name = "My Preferred Name", user.email = "same_as_my_github@email.com")
  ## Option 2, point and click: edit_git_config()
edit_git_config() #opens gitconfig file; now, add name and email
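
To confirm that the configuration took effect, the settings can be read back from the R console. The following is a minimal sketch that assumes Git is installed; `read_git_config()` is a hypothetical helper written for this example, not a usethis function.

```r
# Hypothetical helper (not part of usethis): read one global Git setting.
# Returns NA if Git is not found or the setting has not been configured yet.
read_git_config <- function(key) {
  if (!nzchar(Sys.which("git"))) return(NA_character_)
  value <- suppressWarnings(
    system2("git", c("config", "--global", "--get", key),
            stdout = TRUE, stderr = FALSE)
  )
  if (length(value) == 0) NA_character_ else value[[1]]
}

# Both should show the name and email you just set
read_git_config("user.name")
read_git_config("user.email")
```

If you prefer a fuller report, `usethis::git_sitrep()` prints a summary of your Git and GitHub setup.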

Version Control: Set-up

Connect RStudio and GitHub

  • Create a personal access token (PAT) on GitHub
usethis::create_github_token() #Opens a browser window to create PAT
  • Click Generate Token, and copy your PAT (you won’t be able to see it again!)
  • Store your PAT to connect RStudio and GitHub
# Step 1: install the gitcreds package
install.packages("gitcreds")
# Step 2: load the gitcreds package
library(gitcreds)
# Step 3: set your git credentials
gitcreds_set() # paste your PAT when prompted
# Step 4: check that everything is set up correctly
gitcreds_get()
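
If no credentials are stored, `gitcreds_get()` stops with an error. A small wrapper can turn that check into a simple yes/no answer. This is a sketch under assumptions: `pat_is_stored()` is a hypothetical helper name, and it only checks that some credential for github.com can be found, not that the token is still valid.

```r
# Hypothetical helper: TRUE if a GitHub credential can be retrieved,
# FALSE if gitcreds is not installed or no credential is found.
pat_is_stored <- function() {
  if (!requireNamespace("gitcreds", quietly = TRUE)) return(FALSE)
  tryCatch({
    gitcreds::gitcreds_get()   # errors when nothing is stored
    TRUE
  }, error = function(e) FALSE)
}

pat_is_stored()
```

If the result is `FALSE`, run `gitcreds_set()` again and paste your PAT when prompted.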

Version Control: Workflow

  • Create a (remote) GitHub repository
  • Clone the repository to your computer via RStudio
  • Make local changes
    • Run analyses, create tables and figures, write a report…
  • Commit your local changes (i.e., take a snapshot)
    • First, you stage files, telling Git which changes should be included in the next commit
    • Second, you commit the staged files, including a commit message
  • Push your local changes to the GitHub remote
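
For readers curious about what staging and committing do behind the buttons, here is a minimal sketch that runs the same steps from R in a throwaway folder. It assumes Git is installed. The `git()` helper, the folder, the file name, and the demo identity are all made up for illustration. Pushing is left out because it needs a GitHub remote. In RStudio, the Git pane does the equivalent for you.

```r
# Hypothetical helper: run one Git command inside `repo` and return its output
git <- function(repo, ...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}

if (nzchar(Sys.which("git"))) {
  repo <- tempfile("demo-repo-")   # throwaway folder, safe to delete
  dir.create(repo)
  git(repo, "init")

  # Make a local change
  writeLines("First draft of my analysis notes", file.path(repo, "notes.txt"))

  # Stage: tell Git which changes belong in the next commit
  git(repo, "add", "notes.txt")

  # Commit: take the snapshot, with a commit message
  # (the identity is set inline only so the demo does not depend on your config)
  git(repo, "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
      "commit", "-m", shQuote("Add first draft of analysis notes"))

  # Inspect the history: one line per commit
  history <- git(repo, "log", "--oneline")
  print(history)
}
```

The history should contain a single entry ending in the commit message, which is the snapshot you can later return to.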

Every time you finish a valuable chunk of work

“The fundamental unit of work in Git is a commit. A commit takes a snapshot of your code at a specified point in time. Using a Git commit is like using anchors and other protection when climbing. If you’re crossing a dangerous rock face, you want to make sure you’ve used protection to catch you if you fall. Commits play a similar role: if you make a mistake, you can’t fall past the previous commit.

Like rock climbing protection, you want to be judicious in your use of commits. Committing too frequently will slow your progress; use more commits when you’re in uncertain or dangerous territory. Commits are also helpful to others, because they show your journey, not just the destination.” (Hadley Wickham & Jenny Bryan in R Packages Ch. 18: Git and GitHub)

Version Control: Workflow

Create a GitHub repository and clone it to your computer via RStudio

  • Log in to github.com
  • Create a new repository (click the large green New button)
  • Open your new repository and click the large green Code button
  • Copy the HTTPS clone URL to your clipboard
  • Start a new Project in RStudio
    • File > New Project > Version Control > Git
    • Paste the URL of your new GitHub repo into Repository URL
    • Click Create Project!
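
The same clone step can also be done in code. The sketch below is one possible alternative to the menus, not part of the point-and-click workflow. The `owner` and `repo` values are placeholders, and the clone only runs in an interactive session with usethis installed.

```r
# Placeholders (assumptions): replace with your own GitHub user name and repo
owner <- "my-user-name"
repo  <- "my-first-repo"

# The HTTPS clone URL shown under the green Code button has this shape
clone_url <- sprintf("https://github.com/%s/%s.git", owner, repo)
clone_url

# Clone the repository and open it as an RStudio Project.
# Guarded so it only runs at the console and when usethis is available.
if (interactive() && requireNamespace("usethis", quietly = TRUE)) {
  usethis::create_from_github(
    repo_spec = paste(owner, repo, sep = "/"),
    fork      = FALSE   # clone your own repo directly, without forking
  )
}
```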

Version Control: Workflow

Commit local changes and push them to GitHub remote

  • Open your RStudio project and make local changes
  • Find the Git tab in the upper right pane
  • Check the Staged box for the files you want to commit
    • Click on Diff to see what’s changed in the file since your last commit
  • Click Commit, type an informative commit message, and commit!
  • Click ⬆️ Push to push your local changes to your GitHub remote
  • Confirm that the local changes are now in your GitHub remote

Version Control: Workflow

Make remote changes and pull them to your local repository

  • Navigate to your GitHub remote and make changes (e.g., update README.md)
  • Commit your changes to the GitHub remote
  • In your RStudio local project, click the blue ⬇️ Pull button
  • Confirm that your remote changes are now in your local repository

Best practices

An image that looks like a screenshot of someone's Git commits, showing a progression of Git commit messages over time. The earliest commit messages are informative, like 'created main loop & timing control'. The latest commit message is simply 'haaaaaaands'. The caption states: 'As a project drags on, my git commit messages get less and less informative.'
  • Each commit should be minimal (changes related to a single problem)
  • Each commit should be complete
  • Each commit message should be concise, yet informative (describe the why)
  • Don’t push your work to your GitHub remote before it ‘works’

Keep in mind:

  • Your future self is your most important collaborator!
  • But… Some version control is better than no version control! It’s okay to not be perfect. 🧘🏽

Finally: Time travel 🔄

  • Revert an entire file to its previous commit
    • Right click on the file in the Git pane and select ↩ Revert…
  • Revert part of a file to its previous commit
    • Open the Diff window (new text = light green, removed text = light red)
    • Click Discard chunk to undo a block of changes
    • Click Discard line to undo changes to individual lines
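
To see what reverting a whole file amounts to, here is a minimal sketch in a throwaway folder. It assumes Git is installed and that reverting corresponds roughly to discarding uncommitted changes with `git checkout -- <file>`. The `git()` helper, file name, and demo identity are made up for illustration.

```r
# Hypothetical helper: run one Git command inside `repo` and return its output
git <- function(repo, ...) {
  system2("git", c("-C", shQuote(repo), ...), stdout = TRUE, stderr = TRUE)
}

if (nzchar(Sys.which("git"))) {
  repo <- tempfile("revert-demo-")   # throwaway folder, safe to delete
  dir.create(repo)
  notes <- file.path(repo, "notes.txt")

  # Commit a first version of the file
  git(repo, "init")
  writeLines("Version 1: the analysis that works", notes)
  git(repo, "add", "notes.txt")
  git(repo, "-c", "user.name=Demo", "-c", "user.email=demo@example.com",
      "commit", "-m", shQuote("Add working analysis notes"))

  # Make an uncommitted change that we later regret
  writeLines("Version 2: an experiment gone wrong", notes)

  # Discard the uncommitted change: the file returns to its last commit
  git(repo, "checkout", "--", "notes.txt")

  restored <- readLines(notes)
  print(restored)
}
```

After the last step the file again holds the committed text. Discarded changes cannot be recovered, so commit anything you might want back.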

Other tips and tricks

Ignoring files

  • Are there files you don’t want to push to the GitHub remote (e.g., because they’re too large)?
  • Add these files to your .gitignore file
    • Option 1: In the Git pane, right click on the file, and click 🚫 Ignore…
    • Option 2: In the Files pane, open your .gitignore and add the filename to the list
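
A .gitignore file is plain text with one file name or pattern per line, so it can also be edited from R. The sketch below uses a temporary folder in place of a real project, and the two patterns are hypothetical examples.

```r
# Throwaway folder standing in for your RStudio project
project <- tempfile("ignore-demo-")
dir.create(project)
ignore_file <- file.path(project, ".gitignore")

# Hypothetical patterns: one large data file, and every .rds file
patterns <- c("data/raw_survey.csv", "*.rds")

# Append the patterns, one per line (creates .gitignore if it is missing)
con <- file(ignore_file, open = "a")
writeLines(patterns, con)
close(con)

readLines(ignore_file)
```

Inside a real project, `usethis::use_git_ignore(patterns)` is a convenient alternative that adds the same kind of entries.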



Things I didn’t cover today…

  • Experiment using branches
    • Maintain (risky) work in progress / different versions of your project
    • Satisfied? Merge your branch into the main branch to bring the changes over
  • Collaborating with others using GitHub
    • And merge conflicts… 👻

Thanks!



Questions and/or feedback? Reach out to me!

A digital cartoon with two illustrations: the top shows the R-logo with a scary face, and a small scared little fuzzy monster holding up a white flag in surrender while under a dark stormcloud. The text above says 'at first I was like…' The lower cartoon is a friendly, smiling R-logo jumping up to give a happy fuzzy monster a high-five under a smiling sun and next to colorful flowers. The text above the bottom illustration reads 'but now it’s like…' Artwork by Allison Horst