New to pixieweb? Start with vignette("a-quickstart") for a hands-on walkthrough. This vignette covers the design and advanced features.
PX-Web is the statistical database platform used by national statistics agencies across the Nordic countries and beyond. Each agency runs its own instance (Statistics Sweden at scb.se, Statistics Norway at ssb.no, etc.), but they all share the same underlying API.
pixieweb provides a consistent, pipe-friendly R interface to all these APIs. It follows the same design principles as rKolada: tibbles everywhere, search-then-fetch, and progressive disclosure.
PX-Web tables are multi-dimensional data cubes. Unlike Kolada, where the dimensions are always KPI, municipality, and period, each PX-Web table defines its own set of dimensions. pixieweb calls these variables.
| pixieweb entity | What it represents | rKolada analog |
|---|---|---|
| api | A PX-Web instance (SCB, SSB…) | (implicit — single) |
| table | A statistical table | kpi |
| variable | A dimension within a table | (municipality/year) |
| codelist | An aggregation/value set | kpi_groups |
| data | Downloaded values | values |
Tables are the central entity. get_tables() sends a
server-side search query. The result is a tibble with rich metadata:
```r
tables <- get_tables(scb, query = "income") |>
  table_search("taxable")
tables |> table_describe(max_n = 3)
```
The table tibble includes subject path, time period range, time unit, and data source, all of which are searchable by table_search().
| Function | Purpose |
|---|---|
| table_search() | Filter by regex (client-side) |
| table_describe() | Print human-readable summaries |
| table_minimize() | Remove constant columns |
| table_extract_ids() | Extract ID vector for piping |
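To make the client-side behaviour concrete, here is a base R sketch of a regex filter across a table listing and of dropping constant columns. The toy data frame, its columns, and the helpers search_tables() and minimize_tables() are hypothetical: they only mimic the documented purpose of table_search() and table_minimize(), not their implementation.

```r
# Toy table listing; ids, titles and columns are invented for illustration
tables <- data.frame(
  id        = c("T1", "T2", "T3"),
  title     = c("Taxable income by region", "Disposable income", "Population by sex"),
  time_unit = c("Annual", "Annual", "Annual"),
  source    = c("Income statistics", "Income statistics", "Population register"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows where any character column matches the regex
search_tables <- function(df, pattern) {
  chr_cols <- vapply(df, is.character, logical(1))
  hits <- Reduce(`|`, lapply(df[chr_cols], grepl, pattern = pattern, ignore.case = TRUE))
  df[hits, , drop = FALSE]
}

# Hypothetical helper: drop columns that hold a single distinct value
minimize_tables <- function(df) {
  keep <- vapply(df, function(col) length(unique(col)) > 1, logical(1))
  df[, keep, drop = FALSE]
}

income <- search_tables(tables, "income")
income$id                        # → "T1" "T2"
names(minimize_tables(income))   # → "id" "title"
```

Searching every text column at once is what makes fields such as the data source useful as search targets.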
Each table has its own set of variables (dimensions). The key
discovery step is get_variables():
Important variable properties:

- elimination: can this variable be left out of your get_data() call? If TRUE, omitting it means the API returns a pre-computed total (e.g., omitting “Sex” gives the total for all sexes). If FALSE, the variable is mandatory: you must include it.
- time: is this the time dimension?
- values: the available codes and their human-readable labels.
- codelists: alternative groupings (e.g. municipalities → counties).
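Elimination can be pictured with a toy cube. For an additive measure such as a population count, the total the API returns when Sex is omitted equals the sum over both sexes. For non-additive measures (medians, rates) the server's total cannot be rebuilt this way. The numbers below are made up; the variable codes only mimic SCB-style names.

```r
# Made-up population counts for one region: two sexes (Kon) by two years (Tid)
cube <- expand.grid(Kon = c("1", "2"), Tid = c("2022", "2023"), stringsAsFactors = FALSE)
cube$value <- c(100, 110, 102, 111)

# "Eliminating" Kon: aggregate it away, leaving one value per year
eliminated <- aggregate(value ~ Tid, data = cube, FUN = sum)
eliminated$value   # → 210 213
```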
If you know exactly what you want:
```r
pop <- get_data(scb, "TAB638",
  Region       = c("0180", "1480"),  # two municipalities (Stockholm, Gothenburg)
  Kon          = c("1", "2"),        # sex codes
  ContentsCode = "*",                # every content variable
  Tid          = px_top(5)           # five time periods
)
```
Variables you omit are eliminated (aggregated) if the API allows it; mandatory variables must always be included.
| Helper | Meaning | Note |
|---|---|---|
| c("0180") | Specific values | Item selection |
| "*" | All values | Wildcard |
| px_top(5) | First N values | Most recent |
| px_bottom(3) | Last N values | v2 only |
| px_from("2020") | From value onward | v2 only |
| px_to("2023") | Up to value | v2 only |
| px_range(a, b) | Inclusive range | v2 only |
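The value-based helpers can be mimicked on a plain vector of time codes. The sel_* functions below are hypothetical stand-ins: the real px_* helpers are passed to get_data() rather than applied locally. The sketch assumes bounds are inclusive and that positions follow the order in which the table lists its values.

```r
# Position of a code in the value list; fail loudly if it is missing
pos <- function(values, code) {
  i <- match(code, values)
  if (is.na(i)) stop("code not found: ", code)
  i
}

# Hypothetical stand-ins for px_from(), px_to() and px_range()
sel_from  <- function(values, from) values[pos(values, from):length(values)]
sel_to    <- function(values, to) values[seq_len(pos(values, to))]
sel_range <- function(values, a, b) values[pos(values, a):pos(values, b)]

years <- as.character(2015:2024)
sel_from(years, "2022")            # → "2022" "2023" "2024"
sel_to(years, "2016")              # → "2015" "2016"
sel_range(years, "2018", "2020")   # → "2018" "2019" "2020"
```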
prepare_query() shortcut
For interactive exploration, prepare_query() inspects the table metadata and builds a query with sensible defaults:
Default strategy:

- ContentsCode: all values ("*")
- Time variable: latest 10 periods (px_top(10))
- Eliminable variables: omitted (the API aggregates them)
- Small mandatory variables (≤ 22 values): all ("*")
- Large mandatory variables: first value (px_top(1))
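The rules above can be read as a decision cascade over variable metadata. This base R sketch applies them in the listed order to an invented metadata table; default_selection() and the string labels such as "top(10)" are hypothetical and not what prepare_query() actually returns.

```r
# Toy variable metadata; codes, flags and value counts are invented
vars <- data.frame(
  code        = c("ContentsCode", "Tid", "Kon", "Civilstand", "Region"),
  time        = c(FALSE, TRUE, FALSE, FALSE, FALSE),
  elimination = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  n_values    = c(2, 50, 2, 4, 290),
  stringsAsFactors = FALSE
)

# Apply the default rules in order; eliminable variables become NA and are dropped
default_selection <- function(vars, max_small = 22) {
  sel <- ifelse(vars$code == "ContentsCode", "*",
         ifelse(vars$time, "top(10)",
         ifelse(vars$elimination, NA_character_,
         ifelse(vars$n_values <= max_small, "*", "top(1)"))))
  names(sel) <- vars$code
  sel[!is.na(sel)]
}

default_selection(vars)
# ContentsCode = "*", Tid = "top(10)", Civilstand = "*", Region = "top(1)"; Kon omitted
```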
Override specific variables while letting defaults handle the rest:
With maximize_selection = TRUE, the function expands
unspecified variables to include as many values as possible while
staying under the API’s cell limit.
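The number of cells in a request is the product of the number of selected values per variable. A greedy sketch of the expansion idea follows: variables you specified keep their counts, and each unspecified variable in turn gets as many values as still fit. expand_under_limit() and the limit of 1000 are hypothetical; the real limit is set by each server, and the package's actual strategy may differ.

```r
# Hypothetical greedy expansion under a cell limit.
# fixed: values already chosen per variable; free: values available per unspecified variable
expand_under_limit <- function(fixed, free, limit) {
  cells <- prod(fixed)
  if (cells > limit) stop("fixed selection already exceeds the limit")
  chosen <- numeric(0)
  for (v in names(free)) {
    # Largest count for v that keeps the running product within the limit;
    # since cells <= limit holds throughout, this is always at least 1
    n <- min(free[[v]], floor(limit / cells))
    chosen[v] <- n
    cells <- cells * n
  }
  list(counts = chosen, cells = cells)
}

res <- expand_under_limit(
  fixed = c(ContentsCode = 1, Tid = 10),
  free  = c(Kon = 2, Region = 290),
  limit = 1000
)
res$counts   # → Kon 2, Region 50
res$cells    # → 1000
```

A greedy pass like this favours variables visited first; other orderings would trade breadth differently.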
Then fetch:
The sections below cover features you may not need on your first query, but that become essential for complex tables or cross-country work.
Codelists provide alternative groupings of variable values. They are useful when you want data at a different aggregation level than the table’s default. For example, a “Region” variable with 290 municipalities might have a codelist that groups them into 21 counties:
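As an illustration of what a municipality-to-county grouping does, the sketch below sums made-up municipal values by county. It relies on Swedish municipality codes beginning with their two-digit county code (0180 Stockholm is in county 01, 1480 Gothenburg in county 14). The values are invented, and with a real codelist the server typically returns the aggregated figures itself.

```r
# Made-up values for four Swedish municipalities
muni <- data.frame(
  Region = c("0114", "0180", "1280", "1480"),
  value  = c(5, 90, 35, 60),
  stringsAsFactors = FALSE
)

# Group municipalities into counties via the first two digits of the code
muni$County <- substr(muni$Region, 1, 2)
counties <- aggregate(value ~ County, data = muni, FUN = sum)
counties$value   # → 95 35 60 (counties 01, 12, 14)
```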
When a table has multiple content variables (e.g. both Population and
Deaths), use .output = "wide" to pivot them into separate
columns. This is useful when you want to compute with multiple
measures (e.g. death rate = Deaths / Population):
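In base R terms, the wide layout corresponds to a long-to-wide reshape keyed on the content variable. The sketch uses made-up numbers and stats::reshape() rather than pixieweb's .output = "wide", so column names and order may differ from what get_data() returns.

```r
# Made-up long-format result with two content variables
long <- data.frame(
  Region       = rep(c("0180", "1480"), each = 2),
  ContentsCode = rep(c("Population", "Deaths"), times = 2),
  value        = c(1000, 8, 500, 5),
  stringsAsFactors = FALSE
)

# Pivot so each content variable becomes its own column
wide <- reshape(long, idvar = "Region", timevar = "ContentsCode", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))

# With both measures side by side, a rate is plain column arithmetic
wide$death_rate <- wide$Deaths / wide$Population
wide$death_rate   # → 0.008 0.010
```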
For full control over the HTTP request (useful for debugging, or when you need to inspect or modify the exact query before sending it), use the low-level query composers:
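For orientation, a PX-Web v1 data request is a POST whose JSON body lists one selection per variable (a code, a filter such as item, all or top, and values) plus a response format. The base R function below writes such a body by hand. px_v1_body() is a hypothetical illustration of the request shape, not pixieweb's query composer, and v2 instances use a different request format.

```r
# Quote a character vector as JSON strings (no escaping; fine for plain codes)
json_str <- function(x) paste0('"', x, '"')

# Hypothetical helper: build a PX-Web v1 query body.
# selections: named list, each element list(filter = ..., values = ...)
px_v1_body <- function(selections, format = "json") {
  parts <- vapply(names(selections), function(code) {
    s <- selections[[code]]
    sprintf('{"code":%s,"selection":{"filter":%s,"values":[%s]}}',
            json_str(code), json_str(s$filter),
            paste(json_str(s$values), collapse = ","))
  }, character(1))
  sprintf('{"query":[%s],"response":{"format":%s}}',
          paste(parts, collapse = ","), json_str(format))
}

body <- px_v1_body(list(
  Region = list(filter = "item", values = c("0180", "1480")),
  Tid    = list(filter = "top",  values = "5")
))
cat(body)
```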
PX-Web v2 supports server-side stored queries. These are useful for recurring reports: save a query once, then retrieve it by ID later: