One of pixieweb’s strengths is its ability to connect to any PX-Web instance with the same interface. This vignette shows how to compare data across national statistics agencies.
The honest truth about cross-country comparison: The pixieweb functions work identically across APIs, but the data is not harmonised. Table IDs, variable names, and code systems differ between countries. The workflow is: find a comparable table in each country (the hard part), then use identical pixieweb code to fetch and combine the results.
Prerequisite: This vignette assumes you are comfortable with the basics from
vignette("a-quickstart").
pixieweb ships with a catalogue of known PX-Web instances. You can also connect to any PX-Web API by providing a full URL.
library(pixieweb)
scb    <- px_api("scb", lang = "en")    # Sweden (v2)
ssb    <- px_api("ssb", lang = "en")    # Norway (v2)
statfi <- px_api("statfi", lang = "en") # Finland (v1)
Each API object stores the base URL, language, API version, and configuration (cell limits, rate limits).
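To make the cell limit concrete, the sketch below estimates how many cells a query would return (one cell per combination of selected values, so the product of the selection sizes) and compares that with a limit. The `mock_api` list only imitates the `api$config$max_cells` field; its value, the `estimate_cells()` helper, and the selection codes are hypothetical and not part of pixieweb.

```r
# Hypothetical stand-in for an API object; a real one comes from px_api().
# The field name follows api$config$max_cells; the value is made up.
mock_api <- list(
  config = list(max_cells = 150000)
)

# Hypothetical helper: a query returns one cell per combination of selected
# values, so the cell count is the product of the selection sizes.
estimate_cells <- function(selection) {
  prod(lengths(selection))
}

# Made-up selection: 25 region codes, 24 years, 2 measures.
selection <- list(
  Region       = sprintf("%02d", 0:24),
  Tid          = as.character(2000:2023),
  ContentsCode = c("A", "B")
)

n_cells <- estimate_cells(selection)
n_cells                               # 25 * 24 * 2 = 1200
n_cells <= mock_api$config$max_cells  # TRUE: fits within the assumed limit
```

A check like this is mainly useful when building requests by hand; whether a given query needs splitting depends on the limit the real API object reports.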
PX-Web has two API versions:
- v1: Legacy, POST-only data queries, no search endpoint. Table discovery requires walking a folder hierarchy.
- v2: Modern, GET+POST data queries, full-text search, codelists endpoint, saved queries.
pixieweb handles both versions transparently. The user-facing functions have the same signatures — only the internal request building differs.
Some selection helpers are v2-only: px_bottom(),
px_from(), px_to(), and
px_range() will raise an informative error if used against
a v1 API.
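When the same code loops over several APIs, one way to avoid that error is to branch on the API version before choosing a selection helper. The sketch below uses plain lists in place of real API objects and assumes `api$version` holds a string such as "v1" or "v2"; the exact stored format is an assumption, and `supports_v2_helpers()` is a hypothetical helper, not part of pixieweb.

```r
# Hypothetical stand-ins for API objects; real ones come from px_api().
# ASSUMPTION: the version is stored as a string such as "v1" or "v2".
apis <- list(
  sweden  = list(version = "v2"),
  finland = list(version = "v1")
)

# Hypothetical helper: TRUE when the v2-only selection helpers
# (px_bottom(), px_from(), px_to(), px_range()) should be usable.
supports_v2_helpers <- function(api) {
  identical(api$version, "v2")
}

# Decide per API whether v2-only helpers may appear in the query.
strategy <- vapply(
  apis,
  function(api) {
    if (supports_v2_helpers(api)) "v2-only helpers allowed" else "avoid v2-only helpers"
  },
  character(1)
)
strategy
```

Inspect `api$version` on a real object first and adjust the comparison if the value is stored differently.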
Suppose you want to compare population data across Sweden and Norway. The table IDs and variable codes will differ, but the workflow is identical:
library(dplyr)
library(purrr)
# Find population tables in each country
scb_tables <- get_tables(scb, query = "population")
ssb_tables <- get_tables(ssb, query = "population")
# Explore a table from each
scb_tables |> table_describe(max_n = 3)
ssb_tables |> table_describe(max_n = 3)
Note that table IDs are completely different between countries, and
variable names may also differ (“Region” in SCB vs other names
elsewhere). Always run variable_describe() on each table
before building your query:
# Fetch data using prepare_query() for quick exploration
scb_q <- prepare_query(scb, "TAB638",
  Region = "00",             # "Riket" (whole country)
  Tid = px_top(5),
  ContentsCode = "BE0101N1"  # Population
)
# Norwegian table IDs are different — explore to find the right one
ssb_vars <- get_variables(ssb, "05803")
ssb_vars |> variable_describe()
Since get_data() returns standard tibbles with a
table_id column, you can bind results from different
APIs:
results <- list(
  sweden = get_data(scb, query = scb_q),
  norway = get_data(ssb, "05803",
    ContentsCode = "Personer",
    Tid = px_top(5)
  )
)
# .id = "country" adds a column tracking which list element each row
# came from — essential for traceability after binding
bind_rows(results, .id = "country")
# NOTE: column names may differ between countries. If so, you may need
# to rename() before bind_rows() to align them.
- lang = "en" gives the most consistent labels across countries, but codes and table IDs are language-independent.
- Run get_variables() |> variable_describe() on each table before writing queries.
- Use api$config$max_cells to check the cell limit. prepare_query() respects the limit automatically.
- px_from(), px_range() etc. raise an informative error if used against a v1 API. Check api$version and the catalogue’s versions column.
- vignette("introduction-to-pixieweb") covers codelists, wide output, and query composition.
- See vignette("a-quickstart") for the single-API basics.
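The renaming step noted in the bind_rows() example can be sketched with toy data. The tibbles below stand in for get_data() results from two APIs; their column names and values are invented for illustration, and real tables will differ. Only the pattern of rename() followed by bind_rows() is the point.

```r
library(dplyr)

# Toy stand-ins for get_data() results; all names and values are invented.
sweden <- tibble(
  table_id = "T1",
  Tid      = c("2022", "2023"),
  value    = c(100, 110)
)
norway <- tibble(
  table_id = "T2",
  Year     = c("2022", "2023"),  # same concept, different column name
  value    = c(50, 55)
)

# Align the differing column name, then bind with a country identifier.
combined <- bind_rows(
  list(
    sweden = sweden |> rename(year = Tid),
    norway = norway |> rename(year = Year)
  ),
  .id = "country"
)
combined
```

Without the rename() calls, bind_rows() would keep both `Tid` and `Year` as separate columns and fill the gaps with NA, which is easy to miss in a long result.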