pixieweb makes it easy to download open statistical data from PX-Web APIs — the platform used by Statistics Sweden (SCB), Statistics Norway (SSB), Statistics Finland, and many others. This vignette walks you from zero to a tidy tibble in five steps.
px_api() accepts a short alias ("scb",
"ssb", "statfi") or a full URL. Use
px_api_catalogue() to list known instances.
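A minimal sketch of this step, assuming pixieweb is installed and the machine has network access. The object name `scb` is only a convention; it matches the name used in the `get_data()` call later in this vignette.

```r
library(pixieweb)

# Connect to Statistics Sweden using its short alias
scb <- px_api("scb")

# List the PX-Web instances that pixieweb knows about
px_api_catalogue()
```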
PX-Web organises data into tables. Each table holds
a data cube with one or more dimensions (called
variables). Use get_tables() to
search:
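A search might look like the sketch below. The argument name `query` is an assumption, not something this text confirms; check `?get_tables` for the actual signature.

```r
library(pixieweb)

scb <- px_api("scb")

# Search the API for tables matching a free-text term
# (`query` is an assumed argument name)
tables <- get_tables(scb, query = "population")
tables
```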
The result is a tibble. You can narrow it further on the client side
with table_search(), and inspect tables with
table_describe():
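As a hedged illustration, the two helpers could be chained as follows. The calling conventions shown here (the tibble as first argument, a search term as second argument to `table_search()`, and the `query` argument of `get_tables()`) are assumptions; consult the help pages for the real signatures.

```r
library(pixieweb)

scb <- px_api("scb")
tables <- get_tables(scb, query = "population")  # `query` is an assumed argument name

# Narrow the result on the client side (assumed: tibble first, search term second)
hits <- table_search(tables, "municipality")

# Print a readable description of the remaining tables
table_describe(hits)
```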
table_describe() now shows the subject path, time period
range, and data source alongside the title — making it much easier to
pick the right table.
Once you have a table ID, inspect what variables (dimensions) it has:
Each variable has a set of available values (codes). Look at a specific variable’s values:
Now you know which variables the table has and what values are
available. Pass your selections to get_data():

- ContentsCode = "*" means “all measures in this table”.
- Leaving out a variable eliminates it where the table allows (for example, omitting Kon
  gives totals for both sexes). Not all variables allow this; see
  vignette("introduction-to-pixieweb") for mandatory vs
  eliminable variables.

pop <- get_data(scb, "TAB638",
  Region = c("0180", "1480"),
  ContentsCode = "*",
  Tid = px_top(5)
)
pop

Selection helpers like px_top(), px_from(),
and px_range() let you select values without knowing exact
codes. Use them when you want “the latest N periods” or “everything from
2020 onward” rather than typing out specific year codes.
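The sketch below shows how the three helpers might be used for the time variable. Only `px_top(5)` appears in the original example; the argument forms for `px_from()` and `px_range()` (period codes passed as strings) are assumptions.

```r
library(pixieweb)

scb <- px_api("scb")

# The latest five periods (as in the example above)
latest <- get_data(scb, "TAB638",
  Region = c("0180", "1480"),
  ContentsCode = "*",
  Tid = px_top(5)
)

# Everything from 2020 onward (assumed argument form)
since_2020 <- get_data(scb, "TAB638",
  Region = c("0180", "1480"),
  ContentsCode = "*",
  Tid = px_from("2020")
)

# A closed interval of periods (assumed argument form)
interval <- get_data(scb, "TAB638",
  Region = c("0180", "1480"),
  ContentsCode = "*",
  Tid = px_range("2015", "2020")
)
```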
prepare_query()

You can skip this section if you prefer the direct approach above.
prepare_query() inspects the table and fills in sensible
defaults — handy when you don’t want to specify every variable:
It prints a summary of what was chosen and why. When you’re happy,
pass the query to get_data():
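A possible shape for this two-step workflow is sketched here. It assumes that `prepare_query()` takes the API object and table ID like `get_data()` does, and that `get_data()` accepts the prepared query through a `query` argument; both are assumptions to verify against the package documentation.

```r
library(pixieweb)

scb <- px_api("scb")

# Let pixieweb choose defaults for everything except Region
# (assumed signature: API object, table ID, then optional selections)
q <- prepare_query(scb, "TAB638", Region = c("0180", "1480"))

# Fetch the data described by the prepared query
# (`query` is an assumed argument name)
pop <- get_data(scb, query = q)
```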
Set maximize_selection = TRUE to automatically include
as many variables as the API’s cell limit allows:
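For instance, assuming `maximize_selection` is an argument of `prepare_query()` and that the function takes the API object and table ID first, the call might look like this:

```r
library(pixieweb)

scb <- px_api("scb")

# Ask for as broad a selection as the cell limit permits
# (assumed: maximize_selection belongs to prepare_query())
q_max <- prepare_query(scb, "TAB638", maximize_selection = TRUE)
q_max
```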
The result is a standard tibble. Use your favourite tidyverse tools:
library(ggplot2)

pop |>
  ggplot(aes(x = Tid, y = value, colour = Region_text)) +
  # One line per region
  geom_line(aes(group = Region_text)) +
  # Separate panel for each measure (Population, Deaths, etc.)
  facet_wrap(~ ContentsCode_text, scales = "free_y") +
  # Rotate x-axis labels to avoid overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) +
  labs(
    title = "Population over time",
    caption = px_cite(pop) # Auto-generated data citation
  )

Notice the _text suffix: get_data() returns
both raw code columns (Region = "0180") and human-readable
label columns (Region_text = "Stockholm"). Use
_text columns for display and plotting; use the raw codes
for filtering and joining.
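To make the division of labour concrete, here is a small offline sketch in base R. The data frame is a hand-made stand-in for a `get_data()` result: the numbers, the second region label, and the `region_group` lookup are invented purely for illustration.

```r
# Stand-in for a get_data() result (all values invented for illustration)
pop_demo <- data.frame(
  Region      = c("0180", "0180", "1480", "1480"),
  Region_text = c("Stockholm", "Stockholm", "Gothenburg", "Gothenburg"),
  Tid         = c("2022", "2023", "2022", "2023"),
  value       = c(100, 110, 60, 65),
  stringsAsFactors = FALSE
)

# Filter on the raw code: stable even if labels change between languages
sthlm <- pop_demo[pop_demo$Region == "0180", ]

# Join on the raw code against a hypothetical lookup table
region_info <- data.frame(
  Region       = c("0180", "1480"),
  region_group = c("A", "B"),
  stringsAsFactors = FALSE
)
joined <- merge(pop_demo, region_info, by = "Region")

# Use the label column when presenting results
unique(joined$Region_text)
```

Codes make safer keys than labels because labels can vary with the API language, while codes identify the same value everywhere.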
Other useful helpers:
- data_minimize(): remove columns where all values are identical
- data_legend(): generate a caption string from variable metadata
- px_cite(): create a citation for the downloaded data

- vignette("introduction-to-pixieweb") covers the data model, codelists, saved queries, and query composition.
- vignette("multi-api") shows how to compare data across national statistics agencies.