widr

widr provides direct API access to the World Inequality Database (WID) from R. It offers validated variable codes, structured downloads as standard data frames, and helpers for currency conversion, inequality measurement, and plotting. widr is an independent implementation and is not affiliated with the World Inequality Lab (WIL) or the Paris School of Economics. Data are sourced from WID and maintained by WIL.

Installation

install.packages("widr")

# Development version
remotes::install_github("cherylisabella/widr")

Variable codes

WID variables follow a four-part grammar:

<type:1> <concept:5-6> [<age:3>] [<pop:1>]

Component	Width	Example	Meaning
`type`	1 letter	`s`	share
`concept`	5-6 letters	`ptinc`	pre-tax national income
`age`	3 digits	`992`	adults 20+
`pop`	1 letter	`j`	equal-split between spouses

sptinc992j denotes the share of pre-tax national income for equal-split adults aged 20+.
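As a rough illustration of the grammar above, the following base R sketch splits a code into its four parts with a regular expression. The helper name parse_wid_code() is hypothetical and is not part of widr; it only checks the shape of a code, not whether the components exist in the WID catalogue. Note that when the age is omitted, the pattern cannot distinguish a 6-letter concept from a 5-letter concept followed by a pop letter.

```r
# Hypothetical helper (not a widr function): split a WID variable code into
# type, concept, age and pop according to the documented widths.
parse_wid_code <- function(code) {
  pattern <- "^([a-z])([a-z]{5,6})([0-9]{3})?([a-z])?$"
  m <- regmatches(code, regexec(pattern, code))[[1]]
  if (length(m) == 0) {
    stop("not a well-formed WID variable code: ", code)
  }
  parts <- m[-1]                        # drop the full match, keep the groups
  parts[parts == ""] <- NA_character_   # optional parts that were absent
  stats::setNames(as.list(parts), c("type", "concept", "age", "pop"))
}

parsed <- parse_wid_code("sptinc992j")
str(parsed)
```

For real work, wid_decode() and wid_is_valid() are the appropriate tools, since they consult the bundled reference tables.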

The full catalogue is available from the World Inequality Database; widr bundles it as six searchable reference tables.

wid_search("national income")                           # keyword search across concepts
wid_decode("sptinc992j")                                # parse into components
wid_encode("s", "ptinc", age = "992", pop = "j")       # build from components
wid_is_valid(series_type = "s", concept = "ptinc")      # non-throwing validation

The six reference tables (wid_series_types, wid_concepts, wid_ages, wid_pop_types, wid_percentiles, wid_countries) are lazy-loaded and compiled from the codes dictionary by an independent script.

Downloading data

download_wid() returns a wid_df, a classed data.frame fully compatible with dplyr, ggplot2, and base R. Supply at least one of indicators or areas; every other parameter defaults to "all", except ages (default "992") and pop (default "j").

library(widr)

# Top 1% pre-tax income share, United States, 2000-2022
top1 <- download_wid(
  indicators = "sptinc992j",
  areas      = "US",
  perc       = "p99p100",
  years      = 2000:2022
)

top1
#> <wid_df>  23 rows | 1 countries | 1 variables
#>   country   variable percentile year  value age pop
#> 1      US sptinc992j  p99p100   2000  0.168 992   j
#> ...

Data are retrieved from the WID webservice at https://rfap9nitz6.execute-api.eu-west-1.amazonaws.com/prod.

Multiple countries and percentiles

shares <- download_wid(
  indicators = "sptinc992j",
  areas      = c("US", "FR", "DE", "CN"),
  perc       = c("p90p100", "p99p100"),
  years      = 1980:2022
)

Excluding interpolated points

Many series are linearly interpolated between survey years. Pass include_extrapolations = FALSE to keep only directly observed data points:

download_wid("sptinc992j", areas = "MZ", include_extrapolations = FALSE)

Source metadata

metadata = TRUE attaches source and methodological documentation as an attribute; the shape of the data frame is unchanged:

result <- download_wid("sptinc992j", areas = "US", metadata = TRUE)
attr(result, "wid_meta")
#>     variable country      source method quality    imputation
#> 1 sptinc992j      US Tax records    DFL    high adjusted surveys

Key parameters

Parameter	Default	Description
`indicators`	`"all"`	Variable codes
`areas`	`"all"`	Two-letter (ISO 3166-1 alpha-2) country codes or WID region codes
`years`	`"all"`	Integer vector or `"all"`
`perc`	`"all"`	Percentile codes, e.g. `"p99p100"`
`ages`	`"992"`	Three-digit age code
`pop`	`"j"`	Population unit
`metadata`	`FALSE`	Attach source info as `attr(., "wid_meta")`
`include_extrapolations`	`TRUE`	Include interpolated points
`cache`	`TRUE`	Cache responses to disc
`verbose`	`FALSE`	Print progress messages

Tidyverse integration

wid_df is a plain data.frame subclass; dplyr verbs and ggplot2 work without any unwrapping:

library(dplyr)
library(ggplot2)

top1 |>
  wid_tidy(country_names = FALSE) |>
  filter(year >= 1990) |>
  ggplot(aes(year, value)) +
  geom_line(colour = "#58a6ff", linewidth = 0.9) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Top 1% pre-tax income share - United States",
       x = NULL, y = NULL) +
  theme_minimal()

wid_tidy() coerces year to integer and value to double, and optionally appends indicator, series_type, type_label, and country_name columns.

Reusable query objects

wid_query() builds a query, wid_filter() updates it, and wid_fetch() executes it. This is useful when iterating over parameter combinations or embedding queries in analysis pipelines:

q <- wid_query(indicators = "sptinc992j", areas = c("US", "FR"), cache = FALSE)
q <- wid_filter(q, years = 2010:2022)
wid_fetch(q)
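The build / update / execute split can be pictured with a small base R sketch. The names new_query() and update_query() and the class "toy_query" are hypothetical stand-ins; widr's actual query objects may be structured differently. The point is only that a base query can be reused to derive several variants without mutating the original.

```r
# Hypothetical stand-ins for a query builder (not widr's implementation).
new_query <- function(...) {
  structure(list(...), class = "toy_query")
}

update_query <- function(q, ...) {
  stopifnot(inherits(q, "toy_query"))
  # modifyList() overrides existing fields and adds new ones
  structure(utils::modifyList(unclass(q), list(...)), class = "toy_query")
}

base_q  <- new_query(indicators = "sptinc992j", areas = c("US", "FR"))
windows <- list(1990:1999, 2000:2009)

# One derived query per year window; base_q itself is left untouched
variants <- lapply(windows, function(w) update_query(base_q, years = w))
```

With widr itself, the equivalent loop would presumably call wid_filter() on a wid_query() object and pass each result to wid_fetch().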

Caching

All responses are cached to disc by default, keyed to the exact query parameters and persisting across sessions:

wid_cache_list()    # list cached queries
wid_cache_clear()   # remove all
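To make "keyed to the exact query parameters" concrete, here is a minimal base R sketch of a parameter-keyed disc cache. Everything in it (cache_key(), cached_fetch(), the file naming scheme, the fake fetch function) is an assumption made for illustration and says nothing about how widr stores its cache.

```r
# Hypothetical sketch of a parameter-keyed cache (not widr's implementation).
cache_key <- function(params) {
  params <- params[order(names(params))]   # argument order must not matter
  values <- vapply(params, function(p) paste(p, collapse = ","), character(1))
  paste(names(params), values, sep = "=", collapse = "&")
}

cached_fetch <- function(params, fetch, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # Sanitised key as file name; a real cache would more likely use a hash
  file_name <- paste0(gsub("[^A-Za-z0-9]+", "_", cache_key(params)), ".rds")
  path <- file.path(dir, file_name)
  if (file.exists(path)) {
    return(readRDS(path))                  # cache hit: no call to fetch()
  }
  result <- fetch(params)
  saveRDS(result, path)
  result
}

# Stand-in for a network call that counts how often it is invoked
n_calls <- 0
fake_fetch <- function(params) {
  n_calls <<- n_calls + 1
  data.frame(country = params$areas, value = 1)
}

cache_dir <- file.path(tempdir(), "toy-wid-cache")
unlink(cache_dir, recursive = TRUE)        # start the demo from an empty cache

a <- cached_fetch(list(areas = "US", years = 2000:2002), fake_fetch, cache_dir)
b <- cached_fetch(list(years = 2000:2002, areas = "US"), fake_fetch, cache_dir)
```

The second call lists the same parameters in a different order and is served from the cached file, so fake_fetch() runs only once.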

Currency conversion

Monetary series (types a, m, t) are in local currency at the prior year’s prices. wid_convert() fetches the appropriate WID exchange-rate series and divides the values by it in one step. Dimensionless series (types s, g, etc.) pass through unchanged with a message.

# Bottom 50% average income, four countries - convert to 2022 USD PPP
download_wid("aptinc992j", areas = c("US", "FR", "CN", "IN"), perc = "p0p50") |>
  wid_convert(target = "ppp", base_year = "2022")

Supported targets: "lcu" (no conversion), "usd", "eur", "gbp", "ppp", "yppp".
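The "fetch a rate, then divide" step can be sketched in base R as follows. The country codes, values, and rates are invented for illustration, and convert_values() is a hypothetical helper; wid_convert() obtains its rates from WID and may handle details (such as missing rates) differently.

```r
# Toy values in local currency; all numbers are invented for illustration.
incomes <- data.frame(
  country = c("AA", "AA", "BB"),
  year    = c(2021L, 2022L, 2022L),
  value   = c(1000, 1100, 600)
)

# Hypothetical base-year conversion factors:
# local currency units per one unit of the target currency.
rates <- data.frame(country = c("AA", "BB"), rate = c(2, 0.5))

convert_values <- function(df, rates) {
  out <- merge(df, rates, by = "country", all.x = TRUE)
  if (anyNA(out$rate)) {
    stop("missing conversion rate for at least one country")
  }
  out$value <- out$value / out$rate   # local currency -> target currency
  out$rate  <- NULL
  out[order(out$country, out$year), ]
}

converted <- convert_values(incomes, rates)
converted
```

Using a single base-year rate per country, as here, keeps the series in constant prices; only the unit changes.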

Inequality measures

These operate on data already in memory; no additional API calls are needed.

Gini coefficient

Requires a share (s) series with contiguous pXpY codes covering the full distribution:

dist <- download_wid("sptinc992j", areas = c("US", "FR"), perc = "all",
                     years = 1990:2022)
wid_gini(dist)
#>   country year  gini
#> 1      FR 1990 0.411
#> 2      US 1990 0.453
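For intuition about why contiguous pXpY brackets covering the full distribution are needed, the sketch below computes a Gini coefficient from bracket shares using the trapezoidal approximation of the Lorenz curve. This is one common approach and not necessarily the method wid_gini() uses; gini_from_brackets() is a hypothetical helper and the shares are made up.

```r
# Hypothetical helper (not a widr function): Gini from contiguous bracket
# shares via the trapezoid rule on the Lorenz curve.
gini_from_brackets <- function(perc, share) {
  m <- regmatches(perc, regexec("^p([0-9.]+)p([0-9.]+)$", perc))
  lower <- vapply(m, function(x) as.numeric(x[2]), numeric(1)) / 100
  upper <- vapply(m, function(x) as.numeric(x[3]), numeric(1)) / 100

  ord   <- order(lower)
  lower <- lower[ord]
  upper <- upper[ord]
  share <- share[ord]

  # Brackets must start at 0, end at 1, and leave no gaps or overlaps
  stopifnot(
    isTRUE(all.equal(lower[1], 0)),
    isTRUE(all.equal(upper[length(upper)], 1)),
    isTRUE(all.equal(lower[-1], upper[-length(upper)]))
  )

  p      <- c(0, upper)           # cumulative population share
  lorenz <- c(0, cumsum(share))   # cumulative income share
  1 - sum(diff(p) * (head(lorenz, -1) + tail(lorenz, -1)))
}

# Bottom half holds 20% of income, top half 80% (invented numbers)
g <- gini_from_brackets(c("p50p100", "p0p50"), c(0.8, 0.2))
g   # approximately 0.3
```

With only a few wide brackets the trapezoid rule understates inequality within each bracket, so finer percentile grids give a closer approximation.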

Percentile ratio

Requires a threshold (t) series:

thresh <- download_wid("tptinc992j", areas = "US", perc = "all")
wid_percentile_ratio(thresh)                                          # P90/P10
wid_percentile_ratio(thresh, numerator = "p90", denominator = "p50") # P90/P50
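A percentile ratio is simply one threshold divided by another within each year. The base R sketch below shows the idea on invented data; percentile_ratio() is a hypothetical helper, and the column names and percentile labels are assumptions that may not match what wid_percentile_ratio() expects internally.

```r
# Invented threshold values for two years (illustration only)
thresholds <- data.frame(
  year       = c(2000, 2000, 2000, 2001, 2001, 2001),
  percentile = rep(c("p10", "p50", "p90"), times = 2),
  value      = c(10, 30, 90, 10, 40, 100)
)

# Hypothetical helper (not a widr function)
percentile_ratio <- function(df, numerator = "p90", denominator = "p10") {
  num <- df[df$percentile == numerator,   c("year", "value")]
  den <- df[df$percentile == denominator, c("year", "value")]
  out <- merge(num, den, by = "year", suffixes = c("_num", "_den"))
  data.frame(year = out$year, ratio = out$value_num / out$value_den)
}

p90_p10 <- percentile_ratio(thresholds)
p90_p50 <- percentile_ratio(thresholds, denominator = "p50")
```

Here p90_p10 holds ratios of 9 and 10 for the two years, and p90_p50 holds 3 and 2.5.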

Plotting

All plot functions return ggplot objects and accept additional layers:

# Time series - one line per country; facet = TRUE for separate panels
wid_plot_timeseries(shares,
  country_labels = c(US = "United States", FR = "France",
                     DE = "Germany",       CN = "China"))

# Cross-country bar chart for a single year
wid_plot_compare(shares, year = 2020)

# Lorenz curve
wid_plot_lorenz(dist, country = "US")

Example

library(widr); library(dplyr); library(ggplot2)

download_wid(
  indicators = "aptinc992j",
  areas      = c("US", "FR", "CN", "IN"),
  perc       = "p0p50",
  years      = 1990:2022
) |>
  wid_convert(target = "ppp", base_year = "2022") |>
  wid_tidy(country_names = TRUE) |>
  ggplot(aes(year, value, colour = country_name)) +
  geom_line(linewidth = 0.8) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title    = "Bottom 50% average pre-tax income",
       subtitle = "2022 USD PPP · equal-split adults 20+",
       x = NULL, y = NULL, colour = NULL)

Quick reference

Function	Purpose
`download_wid()`	Download data; returns a `wid_df`
`wid_decode()` / `wid_encode()`	Parse or build variable codes
`wid_validate()` / `wid_is_valid()`	Validate code components
`wid_search()`	Keyword search across reference tables
`wid_tidy()`	Decode columns, coerce types
`wid_convert()`	Currency conversion
`wid_metadata()`	Retrieve source information
`wid_gini()`	Gini coefficient
`wid_top_share()`	Top fractile income / wealth share
`wid_percentile_ratio()`	Percentile ratio (e.g. P90/P10)
`wid_plot_timeseries()`	Time-series line chart
`wid_plot_compare()`	Cross-country bar / point chart
`wid_plot_lorenz()`	Lorenz curve
`wid_query()` / `wid_filter()` / `wid_fetch()`	Reusable query objects
`wid_set_key()`	Set API key
`wid_cache_list()` / `wid_cache_clear()`	Cache management

Full code dictionary: vignette("code-dictionary") · wid.world/codes-dictionary