db2pq is an R package for moving data from PostgreSQL
into Apache Parquet files. It is designed for both general PostgreSQL
sources and the WRDS PostgreSQL service.
last_modified metadata embedded in
Parquet files.# install.packages("pak")
pak::pak("iangow/db2pqr")You can pass a WRDS username directly for a first call:
library(db2pq)
wrds_update_pq("dsi", "crsp", wrds_id = "your_wrds_id")For repeated use, configure the WRDS username and PostgreSQL password
outside the call. The authentication
article documents the WRDS_ID, .pgpass,
and wrds::wrds_set_credentials() paths. The remaining WRDS
examples assume that setup is in place.
wrds_update_pq("dsi", "crsp", force = TRUE)wrds_update_pq("dsi", "crsp", use_sas = TRUE)SSH setup is only needed for this SAS metadata path. See the WRDS SSH setup article for the key-based setup used by that option.
wrds_schema_to_pq("crsp")wrds_sql_to_pq(
"SELECT permno, date, ret FROM crsp.dsf WHERE date >= '2024-01-01'",
table_name = "dsf_recent",
schema = "crsp"
)db_to_pq(
table_name = "company",
schema = "comp",
keep = c("gvkey", "conm"),
rename = c(conm = "company_name")
)pq_last_modified(schema = "crsp")The stable default transfer path uses DBI/RPostgres. The optional
ADBC path can be selected with transfer_method = "adbc"
when adbi and a PostgreSQL ADBC driver are installed:
adbc_diagnostics()
wrds_update_pq("dsi", "crsp", transfer_method = "adbc")If ADBC reports an SSL/libpq error, use
transfer_method = "dbi" or install a current SSL-capable
adbcpostgresql build.
Files are organized as:
<DATA_DIR>/<schema>/<table>.parquet
For example:
~/pq_data/crsp/dsi.parquet
The DATA_DIR environment variable sets the root
directory. It can also be passed directly as data_dir to
any function.
When archive = TRUE, replaced files are moved to:
<DATA_DIR>/<schema>/archive/<table>_<timestamp>.parquet
MIT License. See LICENSE.md.