The Executive Communications Dataset (ECD) is a dataset of executive
communications across 41 different countries. The ecdata package is a
minimal package to download data from the ECD repositories. It includes
caching and data dictionaries.
load_ecd
The default function for loading the ECD is load_ecd. This function
downloads data from our repositories and loads it into memory. You can
load the full ECD by calling load_ecd(full_ecd = TRUE). This can take a
while because you are downloading a 1.9 GB Parquet file.
If you want a specific country or countries, you can pass a character
vector to the country argument.
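As a minimal sketch of the country argument, the calls below use country names and abbreviations taken from the ecd_country_dictionary listing shown later in this document. The object names are illustrative, and the idea that an abbreviation can stand in for a full name is an assumption based on that listing rather than something verified here.

```r
library(ecdata)

# One country, using the name as it appears in the dataset
argentina <- load_ecd(country = "Argentina")

# Several countries at once, passed as a character vector
several <- load_ecd(country = c("Argentina", "Australia", "Bolivia"))

# Assumption: abbreviations listed in the dictionary (e.g. "ARG", "AU")
# are accepted in place of full names
abbreviated <- load_ecd(country = c("ARG", "AU"))
```

Each call triggers a download, so it is worth starting with a single country to get a feel for file sizes.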
The country argument tolerates some typos, common abbreviations, and
common country names. If you want to load data based on the language of
the statement, you can provide a character string or character vector of
languages to the language argument.
english = load_ecd(language = 'English')
polyglot = load_ecd(language = c('French', 'Italian', 'Korean'))
For a full list of accepted country names and abbreviations, you can
consult ecd_country_dictionary.
ecd_country_dictionary |>
head()
#>   name_in_dataset  file_name language abbr_three_letter abbr_two_letter
#> 1       Argentina  argentina  Spanish               ARG              AR
#> 2       Australia  australia  English               AUS              AU
#> 3         Austria    austria  English               AUT              AT
#> 4      Azerbaijan azerbaijan  English               AZE              AZ
#> 5      Azerbaijan azerbaijan  English               AZE              AZ
#> 6         Bolivia    bolivia  Spanish               BOL              BO
#>   other_valid_inputs common_abr
#> 1               <NA>       <NA>
#> 2               <NA>       <NA>
#> 3               <NA>       <NA>
#> 4               <NA>       <NA>
#> 5               <NA>       <NA>
#> 6               <NA>       <NA>
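The dictionary can also be filtered like any other data frame before deciding what to load. The sketch below assumes ecd_country_dictionary is a data frame with the language, name_in_dataset, and abbr_three_letter columns shown in the output above; the object names are illustrative.

```r
library(ecdata)

# Keep only dictionary rows whose language is Spanish
spanish_rows <- subset(ecd_country_dictionary, language == "Spanish")

# The dictionary can list a country more than once, so deduplicate
spanish_countries <- unique(spanish_rows$name_in_dataset)
spanish_countries

# Look up which country a three-letter abbreviation refers to
bol_rows <- subset(ecd_country_dictionary, abbr_three_letter == "BOL")
unique(bol_rows$name_in_dataset)
```

The resulting character vector can then be passed to the country argument of load_ecd.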
Note that download and load times vary considerably because file sizes differ between countries.
lazy_load_ecd
We also have a “lazy” option, which downloads the files and then uses
arrow::open_dataset to open the dataset without reading it into memory.
To bring the dataset into memory you simply need to call.
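Since the lazy loader opens the files with arrow::open_dataset, the standard way to materialize an Arrow dataset is dplyr::collect(). The following is a minimal sketch under two assumptions not stated above: that lazy_load_ecd accepts the same country argument as load_ecd, and that it returns the Arrow dataset object directly.

```r
library(ecdata)
library(dplyr)

# Assumption: lazy_load_ecd() takes a country argument like load_ecd()
lazy_argentina <- lazy_load_ecd(country = "Argentina")

# Inspect the columns without reading the rows into memory
# (assumes the returned object is an Arrow Dataset)
lazy_argentina$schema

# Pull the data into memory as a regular data frame
argentina <- lazy_argentina |>
  collect()
```

Any dplyr filtering or column selection placed before collect() is evaluated by Arrow, so only the rows and columns you ask for are read into memory.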
Lazy loading has some speed benefits when data wrangling. One thing to be aware of is that if you have lazy loaded a dataset previously, a later lazy load may bring in additional files. To prevent this behavior run
Then restart your R session.