To install R and RStudio on your own laptop:
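Once R and RStudio are installed, install the packages used in this lab by running the following command in the R console:
install.packages(c("ggplot2","reshape2","plyr","dplyr","stringr"), repos="https://cloud.r-project.org")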
For Mac users, depending on your version of macOS and the version of the
packages, you may get errors when R can’t install binary packages and needs to
compile these packages from the original source code. Doing so requires that you
have the Apple developer tools installed. You can trigger the installation of
those tools by opening the Terminal application, invoking the command
xcode-select --install
and following the prompts to install the command line developer tools.
Be sure to resolve any installation issues by consulting me or one of the ASIs.
Recall that you can run your entire R program by using the “Source” button in the top right of the editor window (it works similarly to the “green arrow” in Thonny).
We will be reimplementing our Zipf’s law lab in R. The program will generate the same two outputs as before:
In this implementation we will not use any loops; instead all of our computations will be vectorized.
Write an R function count_words that takes a vector of words as a parameter
and returns a data frame of the words and their counts (with column labels
“Word” and “Count”). There are many ways to go about this, using both built-in R
functions and the plyr package; the most concise is the plyr count function.
Note that in some setups of R/RStudio, the count function in plyr gets
overridden (masked) by another function with the same name but different
functionality in a different package; for example, dplyr also defines a count
function, and loading dplyr after plyr masks plyr's version. To prevent that
problem, use the fully qualified name, i.e., plyr::count.
Here is an example call to count_words and its output:
> count_words(c("a", "a", "the", "a", "in", "the"))
  Word Count
1    a     3
2   in     1
3  the     2
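As one illustration of the built-in-function route, here is a minimal sketch that uses base R's table() instead of plyr and contains no loops. The name count_words_base is hypothetical, and the sketch assumes alphabetical ordering of the words is acceptable (table() returns its names sorted), which matches the example output above.

```r
# Hypothetical base-R sketch (not the required solution): count words without loops or plyr.
count_words_base <- function(words) {
  # table() tabulates each distinct word in one vectorized call; names come back sorted
  counts <- table(words)
  data.frame(
    Word = names(counts),
    Count = as.vector(counts),  # strip the table class, keeping the integer counts
    stringsAsFactors = FALSE
  )
}

count_words_base(c("a", "a", "the", "a", "in", "the"))
```

Here the result has Word equal to "a", "in", "the" and Count equal to 3, 1, 2. Building the data frame directly from names() and as.vector() avoids any later renaming, at the cost of being a little more verbose than plyr::count.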
Depending on your approach, you may need to change the column names.
You can do so by assigning a vector of new names to colnames(), e.g.,
> frame <- data.frame(a=c(1, 2), b=c(2, 3))
> colnames(frame)
[1] "a" "b"
> colnames(frame) <- c("col1", "col2")
> frame
  col1 col2
1    1    2
2    2    3
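The same renaming step applies when a counting function returns different column names. For instance, converting a base R table to a data frame yields a column named after the tabulated variable plus a "Freq" column. A minimal sketch, assuming the input vector is called words (an illustrative name):

```r
words <- c("a", "a", "the", "a", "in", "the")

# as.data.frame() on a 1-D table gives one column for the values
# (named after the tabulated variable, "words" here) and a "Freq" column
counts <- as.data.frame(table(words), stringsAsFactors = FALSE)
print(colnames(counts))

# Rename the columns to the labels the assignment asks for
colnames(counts) <- c("Word", "Count")
counts
```

Passing stringsAsFactors = FALSE keeps the Word column as character rather than factor, which is usually easier to compare against expected values.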
Submit your function in an R file named prelab11.R to Gradescope. You can submit multiple times; only the most recent submission is graded. Note that the tests performed by Gradescope are limited: passing all of the visible tests does not guarantee that your submission correctly satisfies all of the requirements of the assignment.
Gradescope does not integrate as tightly with R as it does with Python, so you won’t see the same list of passing and failing tests. Instead, pay attention to the Autograder Output window, as shown below:
> library("testthat"); test_file("prelab11.R");
✔ |  OK F W S | Context
✔ |   4       | prelab11 [0.1 s]
══ Results ═════════════════════════════════════════════════════════════════════
Duration: 0.2 s
OK: 4
Failed: 0
Warnings: 0
Skipped: 0
You are looking for all ✔ marks and for Failed: 0, i.e., zero failing tests.