4.8 Additional resources

rio alternatives. While rio is a great Swiss Army knife of file handling, there may be times when you want a bit more control over how your data is pulled into or saved out of R. In addition, there have been times when I’ve had a challenging data file that rio choked on but another package could handle. Some other functions and packages you may want to explore:

  • Base R’s read.csv() and read.table() to import text files (use ?read.csv and ?read.table to get more information). In versions of R before 4.0.0, stringsAsFactors = FALSE is needed with these if you want to keep your character strings as character strings; from R 4.0.0 on, that is the default. write.csv() will save to CSV.

  • rio uses Hadley Wickham’s readxl package for reading Excel files. Another alternative for Excel is openxlsx, which can write to an Excel file as well as read one. Look at the openxlsx package vignettes for information about formatting your spreadsheets as you export.

  • Wickham’s readr package is also worth a look as part of the “tidyverse.” readr includes functions to read CSV, tab-separated, fixed-width, Web logs, and several other types of files. readr prints out the type of data it has determined for each column – integer, character, double (non-whole numbers), etc. It creates tibbles.
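As a minimal sketch of the base R and readr options in the list above, the following writes a small data frame to a temporary CSV file and reads it back. The data frame, its column names, and the temporary file path are made up for illustration, and the readr step runs only if that package happens to be installed.

```r
# Build a small example data frame (made-up data for illustration).
scores <- data.frame(
  name  = c("Ana", "Ben", "Cho"),
  score = c(91.5, 78, 84.25)
)

# Save to a temporary CSV; row.names = FALSE leaves out the row-number column.
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read it back with base R. stringsAsFactors = FALSE is the default in
# R 4.0.0 and later; setting it explicitly gives the same result on older versions.
scores_back <- read.csv(csv_path, stringsAsFactors = FALSE)
str(scores_back)

# If readr is installed, read the same file as a tibble.
# read_csv() reports the column types it guessed.
if (requireNamespace("readr", quietly = TRUE)) {
  scores_tbl <- readr::read_csv(csv_path)
  print(class(scores_tbl))
}
```

Comparing str(scores_back) with the readr output is a quick way to see how each function decided on column types for the same file.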

Import directly from a Google spreadsheet. The googlesheets package lets you import data from a Google Sheet, even if it’s private, by authenticating your Google account. The package is available on CRAN; install it with install.packages("googlesheets"). After loading it with library("googlesheets"), read the excellent introductory vignette. At the time of this writing, the intro vignette was available within R at vignette("basic-usage", package="googlesheets"). If you don’t see it, try help(package="googlesheets") and click on the “User guides, package vignettes and other documentation” link for available vignettes, or look at the package information on GitHub at https://github.com/jennybc/googlesheets.

‘Scrape’ data from Web pages with the rvest package and SelectorGadget browser extension or JavaScript bookmarklet. SelectorGadget helps you discover the CSS selectors that identify the data you want to copy from an HTML page; rvest then uses those selectors in R to find and save that data. This is not a technique for raw beginners, but once you’ve got some R experience under your belt, you may want to come back and revisit this. I have some instructions and a video on how to do this at http://bit.ly/Rscraping. RStudio has a webinar available on demand as well, at https://www.rstudio.com/resources/webinars/extracting-data-from-the-web-part-2/ .
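To give a feel for the selector-then-extract workflow, here is a small sketch that parses an inline HTML snippet instead of a live Web page. The HTML, the "city" class name, and the li.city selector are invented for this example; on a real page, the selector would be whatever SelectorGadget reports. It assumes rvest 1.0.0 or later, which provides html_elements() and html_text2().

```r
library(rvest)

# A tiny made-up page, parsed from a string rather than fetched from a URL.
page <- read_html('<html><body>
  <ul>
    <li class="city">Boston</li>
    <li class="city">Chicago</li>
  </ul>
  <p class="note">Not a city</p>
</body></html>')

# Select every <li> element with class "city" using a CSS selector...
city_nodes <- html_elements(page, "li.city")

# ...and pull out the text of each matching element as a character vector.
cities <- html_text2(city_nodes)
print(cities)
```

For a real site, read_html() would be given the page’s URL, and the rest of the steps stay the same.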

Alternatives to base R’s save and read functions. If you are working with large data sets, speed may become important to you when saving and loading files. The data.table package has a speedy fread() function, but beware that resulting objects are data.tables and not plain data frames; some behaviors are different. If you want a conventional data frame, you can get one with the as.data.frame(mydatatable) syntax. The data.table package’s fwrite() function is aimed at writing to a CSV file considerably faster than base R’s write.csv().
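The sketch below illustrates the fwrite() and fread() round trip and the conversion back to a conventional data frame. The trips data and file path are made up for the example. The data.table = FALSE argument to fread() is an extra option not mentioned above; it returns a plain data frame directly.

```r
library(data.table)

# Made-up example data.
trips <- data.frame(
  id   = 1:3,
  city = c("Boston", "Chicago", "Denver")
)

# Write to a temporary CSV file with fwrite().
csv_path <- tempfile(fileext = ".csv")
fwrite(trips, csv_path)

# fread() returns a data.table, which is also a data frame but behaves
# differently in some situations.
trips_dt <- fread(csv_path)
print(class(trips_dt))

# Convert to a conventional data frame...
trips_df <- as.data.frame(trips_dt)
print(class(trips_df))

# ...or ask fread() for a plain data frame in the first place.
trips_df2 <- fread(csv_path, data.table = FALSE)

# One example of differing behavior: selecting a single column by number
# keeps a data.table as a table but drops a data frame to a vector.
print(is.data.table(trips_dt[, 2]))
print(is.vector(trips_df[, 2]))
```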

Two other packages might be of interest for storing and retrieving data. The feather package saves in a binary format that can be read into either R or Python. The fst package’s read_fst() and write_fst() functions (named read.fst() and write.fst() in early releases) offer fast saving and loading of R data frame objects, plus the option of file compression.
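As a rough sketch of the fst workflow, the following saves a data frame with compression and reads it back, first in full and then as a partial read. It uses the underscore function names from current releases of fst; the measurements data, the file path, and the compression setting of 50 (on the package’s 0 to 100 scale) are arbitrary choices for illustration.

```r
library(fst)

# Made-up example data.
measurements <- data.frame(
  id    = 1:1000,
  value = seq(0.5, 500, by = 0.5)
)

# Save to a temporary .fst file; compress ranges from 0 (none) to 100 (maximum).
fst_path <- tempfile(fileext = ".fst")
write_fst(measurements, fst_path, compress = 50)

# Read the whole file back as a data frame.
measurements_back <- read_fst(fst_path)

# Read only one column and only the first ten rows.
first_ten <- read_fst(fst_path, columns = "value", from = 1, to = 10)
print(first_ten)
```

Reading just the columns and rows you need is one reason a format like this can be attractive for large files.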