Introduction

The Executive Communications Dataset (ECD) is a collection of executive communications from 41 different countries. The ecdata package is a minimal package for downloading data from the ECD repositories. It includes caching and data dictionaries.

`load_ecd`

The default function for loading the ECD is `load_ecd`. This function downloads data from our repositories and loads it into memory. You can load the full ECD by setting `load_ecd(full_ecd = TRUE)`. This can take a while because you are downloading a 1.9GB parquet file.


full_ecd = load_ecd(full_ecd = TRUE)

If you want a specific country or set of countries, pass a character vector to the `country` argument.


load_ecd(country = 'Greece')
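
Because `country` takes a character vector, several countries can be requested in one call. The following is a minimal sketch that assumes Greece and Nigeria are both available in the repositories and that an internet connection is present; the object name `two_countries` is only illustrative.

```r
library(ecdata)

# Request two countries at once by passing a character vector.
# Each country's file is downloaded, so this may take a moment.
two_countries <- load_ecd(country = c("Greece", "Nigeria"))
```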

The `country` argument tolerates some typos, common abbreviations, and common country names. If you want to load data based on the language of the statement, you can pass a character string or a character vector of languages to the `language` argument.


english = load_ecd(language = 'English')

polyglot = load_ecd(language = c('French', 'Italian', 'Korean'))

For a full list of accepted country names and abbreviations, you can consult `ecd_country_dictionary`:


ecd_country_dictionary |>
  head()
#>   name_in_dataset  file_name language abbr_three_letter abbr_two_letter
#> 1       Argentina  argentina  Spanish               ARG              AR
#> 2       Australia  australia  English               AUS              AU
#> 3         Austria    austria  English               AUT              AT
#> 4      Azerbaijan azerbaijan  English               AZE              AZ
#> 5      Azerbaijan azerbaijan  English               AZE              AZ
#> 6         Bolivia    bolivia  Spanish               BOL              BO
#>   other_valid_inputs common_abr
#> 1               <NA>       <NA>
#> 2               <NA>       <NA>
#> 3               <NA>       <NA>
#> 4               <NA>       <NA>
#> 5               <NA>       <NA>
#> 6               <NA>       <NA>
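
The dictionary can also be filtered to check which inputs are accepted before downloading anything. The sketch below relies only on the column names visible in the printed output above; the object name `spanish_entries` is illustrative, and the dplyr package is assumed to be installed.

```r
library(ecdata)

# Keep the entries whose statements are in Spanish, and show the
# dataset name alongside its three- and two-letter abbreviations.
spanish_entries <- ecd_country_dictionary |>
  dplyr::filter(language == "Spanish") |>
  dplyr::distinct(name_in_dataset, abbr_three_letter, abbr_two_letter)

spanish_entries
```

Using `dplyr::distinct()` collapses repeated rows, such as the duplicated Azerbaijan entry shown above.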

Note that the time needed to download and load a file varies considerably because the files differ in size.

`lazy_load_ecd`

We also have a “lazy” option, which downloads the files and then uses `arrow::open_dataset` to open the dataset without reading it into memory.


nigeria = lazy_load_ecd(country = 'Nigeria')

To bring the dataset into memory, call `dplyr::collect()`:


nigeria |>
  dplyr::collect()
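
Since the lazily loaded object is opened with `arrow::open_dataset`, it can be inspected and reduced before anything is collected. This is a minimal sketch, assuming the object returned by `lazy_load_ecd` behaves like a standard arrow Dataset; the name `nigeria_preview` is illustrative.

```r
library(ecdata)

nigeria <- lazy_load_ecd(country = "Nigeria")

# Inspect column names and types without reading any rows into memory.
nigeria$schema

# Pull a small preview of ten rows into an in-memory data frame.
nigeria_preview <- nigeria |>
  head(10) |>
  dplyr::collect()
```

Collecting only a preview keeps memory use small while you decide which columns and rows you actually need.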

Lazy loading has some speed benefits when data wrangling. One thing to be aware of is that if you have previously lazy loaded a dataset, a new call may bring in additional files from the earlier download. To prevent this behavior, run


clear_cache()

Then restart your R session.