quanteda.tidy

About

quanteda.tidy extends the quanteda package with functionality from the “tidyverse”, especially dplyr.
Note that this is not the same as tidytext, which reshapes tokens into data.frames. Instead, the tidy functions here operate only on document variables (docvars): they extend the dplyr verbs so that they work on quanteda objects as if those objects were tibbles or data.frames.
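To make the distinction concrete, here is a minimal sketch assuming the built-in data_corpus_inaugural corpus and its Year docvar (both appear in the example further down). The object name recent is only illustrative. The condition is evaluated against the docvars, and the result should still be a quanteda corpus rather than a data.frame of tokens.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# filter() evaluates its condition against the docvars (here: Year)
recent <- data_corpus_inaugural %>%
  filter(Year > 2000)

is.corpus(recent)       # the result is still a quanteda corpus
ndoc(recent)            # number of documents that matched the condition
head(docvars(recent))   # docvars of the retained documents
```

Because the object stays a corpus, it can be passed straight on to quanteda functions such as tokens().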

Installation

You can install the stable version of quanteda.tidy from CRAN:

install.packages("quanteda.tidy")

Or install the development version from GitHub:

pak::pkg_install("quanteda/quanteda.tidy")

Overview of Functions

The functions in quanteda.tidy are organized into four categories, following the dplyr documentation:

Category              Function                             Description
Rows                  filter()                             Subset documents based on docvar conditions
Rows                  slice(), slice_head(), slice_tail()  Subset documents by position
Rows                  slice_sample()                       Randomly sample documents
Rows                  slice_min(), slice_max()             Select documents with min/max docvar values
Rows                  arrange(), distinct()                Reorder documents; keep unique documents
Columns               select()                             Keep or drop docvars by name
Columns               rename(), rename_with()              Rename docvars
Columns               relocate()                           Change docvar column order
Columns               mutate(), transmute()                Create or modify docvars
Columns               pull()                               Extract a single docvar as a vector
Columns               glimpse()                            Get a quick overview of the corpus
Groups of rows        add_count()                          Add count by group as a docvar
Groups of rows        add_tally()                          Add total count as a docvar
Pairs of data frames  left_join()                          Join corpus with external data frame
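As a rough sketch of how verbs from the Rows and Columns categories chain together, the following assumes the Year, President, and Party docvars of data_corpus_inaugural and that "Democratic" is one of the Party values. The names dems, slim, years, and Surname are hypothetical and chosen only for illustration.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# Rows: keep documents whose Party docvar matches, then order them by Year
dems <- data_corpus_inaugural %>%
  filter(Party == "Democratic") %>%
  arrange(Year)

# Columns: keep two docvars and rename one of them
slim <- dems %>%
  select(Year, President) %>%
  rename(Surname = President)

# pull() extracts a single docvar as a plain vector
years <- slim %>%
  pull(Year)

names(docvars(slim))   # the remaining docvar names
head(years)            # the earliest years in the subset
```

Each step before pull() returns a corpus, so the texts stay attached to their documents while only the docvars are reshaped.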

Example

Adding a document variable for full president name:

library("quanteda.tidy", warn.conflicts = FALSE)
## Loading required package: quanteda
## Package version: 4.3.1
## Unicode version: 14.0
## ICU version: 71.1
## Parallel computing: disabled
## See https://quanteda.io for tutorials and examples.

data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President, sep = ", ")) %>%
  summary(n = 5)
## Corpus consisting of 60 documents, showing 5 documents:
## 
##             Text Types Tokens Sentences Year  President FirstName
##  1789-Washington   625   1537        23 1789 Washington    George
##  1793-Washington    96    147         4 1793 Washington    George
##       1797-Adams   826   2577        37 1797      Adams      John
##   1801-Jefferson   717   1923        41 1801  Jefferson    Thomas
##   1805-Jefferson   804   2380        45 1805  Jefferson    Thomas
##                  Party           fullname
##                   none George, Washington
##                   none George, Washington
##             Federalist        John, Adams
##  Democratic-Republican  Thomas, Jefferson
##  Democratic-Republican  Thomas, Jefferson
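A small variation on the example above, offered as a sketch rather than as the package's recommended usage: relying on paste()'s default separator (a single space) gives the name in "First Last" form, and pull() then extracts the new docvar as a vector. The object name fullnames is hypothetical.

```r
library("quanteda.tidy", warn.conflicts = FALSE)

# paste() with its default separator joins the two docvars with a space
fullnames <- data_corpus_inaugural %>%
  mutate(fullname = paste(FirstName, President)) %>%
  pull(fullname)

head(fullnames, 3)   # names for the first three documents
```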