widr

widr provides direct API access to the World Inequality Database (WID) from R. It offers validated variable codes, structured downloads as standard data frames, and helpers for currency conversion, inequality measurement, and plotting. widr is an independent implementation, unaffiliated with the World Inequality Lab (WIL) or the Paris School of Economics. Data are sourced from WID and maintained by WIL.

Installation

install.packages("widr")

# Development version
remotes::install_github("cherylisabella/widr")

Variable codes

WID variables follow a four-part grammar:

<type:1> <concept:5-6> [<age:3>] [<pop:1>]
Component  Width        Example  Meaning
type       1 letter     s        share
concept    5-6 letters  ptinc    pre-tax national income
age        3 digits     992      adults 20+
pop        1 letter     j        equal-split between spouses

sptinc992j denotes the share of pre-tax national income for equal-split adults aged 20+.
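The grammar can be illustrated with a few lines of base R. This is a minimal sketch, not widr's implementation: parse_wid_code() is a hypothetical helper, and its regular expression assumes lowercase codes with exactly the widths listed above. For real work, wid_decode() is the function to use.

```r
# Hypothetical helper (not part of widr): split a WID variable code into
# its components using the widths given in the grammar.
parse_wid_code <- function(code) {
  # type: 1 letter; concept: 5-6 letters; age: 3 digits (optional);
  # pop: 1 letter (optional)
  pattern <- "^([a-z])([a-z]{5,6})([0-9]{3})?([a-z])?$"
  m <- regmatches(code, regexec(pattern, code))[[1]]
  if (length(m) == 0) stop("not a well-formed WID code: ", code)
  parts <- m[-1]                            # drop the full match
  parts[!nzchar(parts)] <- NA_character_    # unmatched optional groups become NA
  setNames(as.list(parts), c("type", "concept", "age", "pop"))
}

str(parse_wid_code("sptinc992j"))
```

A pure pattern match cannot tell a six-letter concept from a five-letter concept followed by a population letter when the age is omitted, which is one reason to validate against the bundled reference tables rather than rely on widths alone.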

The full catalogue is available from the World Inequality Database; widr bundles it as six searchable reference tables.

wid_search("national income")                           # keyword search across concepts
wid_decode("sptinc992j")                                # parse into components
wid_encode("s", "ptinc", age = "992", pop = "j")       # build from components
wid_is_valid(series_type = "s", concept = "ptinc")      # non-throwing validation

The six reference tables (wid_series_types, wid_concepts, wid_ages, wid_pop_types, wid_percentiles, wid_countries) are lazy-loaded and compiled from the codes dictionary by an independent script.

Downloading data

download_wid() returns a wid_df, a classed data.frame fully compatible with dplyr, ggplot2, and base R. Supply at least one of indicators or areas; the remaining selection parameters default to "all", except ages ("992") and pop ("j").

library(widr)

# Top 1% pre-tax income share, United States, 2000-2022
top1 <- download_wid(
  indicators = "sptinc992j",
  areas      = "US",
  perc       = "p99p100",
  years      = 2000:2022
)

top1
#> <wid_df>  23 rows | 1 countries | 1 variables
#>   country   variable percentile year  value age pop
#> 1      US sptinc992j  p99p100   2000  0.168 992   j
#> ...

Data are retrieved from the WID webservice at https://rfap9nitz6.execute-api.eu-west-1.amazonaws.com/prod.

Multiple countries and percentiles

shares <- download_wid(
  indicators = "sptinc992j",
  areas      = c("US", "FR", "DE", "CN"),
  perc       = c("p90p100", "p99p100"),
  years      = 1980:2022
)

Excluding interpolated points

Many series are linearly interpolated between survey years. Pass include_extrapolations = FALSE to retain only directly observed data points:

download_wid("sptinc992j", areas = "MZ", include_extrapolations = FALSE)

Source metadata

metadata = TRUE attaches source and methodological documentation as an attribute; the shape of the data frame is unchanged:

result <- download_wid("sptinc992j", areas = "US", metadata = TRUE)
attr(result, "wid_meta")
#>     variable country      source method quality    imputation
#> 1 sptinc992j      US Tax records    DFL    high adjusted surveys
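The attribute mechanism itself is plain base R. The sketch below uses a toy data frame with made-up values in place of a real download, and only illustrates that attaching a table under the "wid_meta" attribute leaves rows and columns untouched. It makes no claim about how widr builds the metadata.

```r
# Toy data frame standing in for a download result (values are made up).
result <- data.frame(country = "US", year = 2000:2002, value = c(0.1, 0.2, 0.3))
dims_before <- dim(result)

# Attach a made-up metadata table under the attribute name used above.
attr(result, "wid_meta") <- data.frame(variable = "sptinc992j", country = "US")

identical(dim(result), dims_before)   # TRUE: rows and columns are unaffected
names(attributes(result))             # includes "wid_meta"
```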

Key parameters

Parameter               Default  Description
indicators              "all"    Variable codes
areas                   "all"    ISO-2 country / region codes
years                   "all"    Integer vector or "all"
perc                    "all"    Percentile codes, e.g. "p99p100"
ages                    "992"    Three-digit age code
pop                     "j"      Population unit
metadata                FALSE    Attach source info as attr(., "wid_meta")
include_extrapolations  TRUE     Include interpolated points
cache                   TRUE     Cache responses to disc
verbose                 FALSE    Print progress messages

Tidyverse integration

wid_df is a plain data.frame subclass; dplyr verbs and ggplot2 work without any unwrapping:

library(dplyr)
library(ggplot2)

top1 |>
  wid_tidy(country_names = FALSE) |>
  filter(year >= 1990) |>
  ggplot(aes(year, value)) +
  geom_line(colour = "#58a6ff", linewidth = 0.9) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(title = "Top 1% pre-tax income share - United States",
       x = NULL, y = NULL) +
  theme_minimal()

wid_tidy() coerces year to integer and value to double, and optionally appends indicator, series_type, type_label, and country_name columns.
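What "a data.frame subclass" and the type coercions mean in practice can be shown without widr. This is a sketch under stated assumptions: the data frame is a toy stand-in with made-up values, the class vector is set by hand, and the coercions are written out manually rather than taken from wid_tidy().

```r
# Toy stand-in for a wid_df (made-up values). Keeping "data.frame" in the
# class vector lets data-frame methods dispatch as usual.
df <- data.frame(country = c("US", "US"), year = c("2000", "2001"),
                 value = c("0.1", "0.2"), stringsAsFactors = FALSE)
class(df) <- c("wid_df", "data.frame")

# The coercions described above, written by hand.
df$year  <- as.integer(df$year)
df$value <- as.double(df$value)

inherits(df, "data.frame")   # TRUE
subset(df, year >= 2001)     # base R subsetting works on the subclass
```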

Reusable query objects

wid_query() builds a query; wid_filter() updates it; wid_fetch() executes it. This is useful when iterating over parameter combinations or embedding queries in analysis pipelines:

q <- wid_query(indicators = "sptinc992j", areas = c("US", "FR"), cache = FALSE)
q <- wid_filter(q, years = 2010:2022)
wid_fetch(q)
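The build / update / execute split pays off when one base query is varied several times. The sketch below imitates the pattern with plain lists so that it runs offline; new_query() and update_query() are hypothetical stand-ins, not widr functions, and nothing is fetched.

```r
# Hypothetical stand-ins for the build and update steps (not widr functions).
new_query <- function(...) structure(list(...), class = "toy_query")
update_query <- function(q, ...) {
  structure(modifyList(unclass(q), list(...)), class = "toy_query")
}

base_q <- new_query(indicators = "sptinc992j", areas = c("US", "FR"))

# Derive one query per period without modifying the base query.
periods <- list(early = 1980:1999, late = 2000:2022)
queries <- lapply(periods, function(y) update_query(base_q, years = y))

names(queries)              # "early" "late"
range(queries$late$years)   # 2000 2022
is.null(base_q$years)       # TRUE: the base query is unchanged
```

With widr, the same loop would presumably call wid_filter() inside lapply() and pass each result to wid_fetch().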

Caching

All responses are cached to disc by default, keyed to the exact query parameters and persisting across sessions:

wid_cache_list()    # list cached queries
wid_cache_clear()   # remove all
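"Keyed to the exact query parameters" means that two calls share a cache entry only if their parameters match. widr's actual key scheme is not documented here; the sketch below shows one possible approach, with a hypothetical cache_key() helper that hashes a canonical string of the parameters.

```r
# Hypothetical helper (not widr's implementation): derive a stable key
# from a named list of query parameters.
cache_key <- function(params) {
  params <- params[order(names(params))]   # argument order should not matter
  canon <- paste(names(params),
                 vapply(params, function(p) paste(p, collapse = ","), character(1)),
                 sep = "=", collapse = "&")
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(canon, tmp)
  unname(tools::md5sum(tmp))               # MD5 of the canonical string
}

k1 <- cache_key(list(indicators = "sptinc992j", areas = "US", years = 2000:2022))
k2 <- cache_key(list(areas = "US", years = 2000:2022, indicators = "sptinc992j"))
k3 <- cache_key(list(indicators = "sptinc992j", areas = "FR", years = 2000:2022))

identical(k1, k2)   # TRUE: same parameters, different order
identical(k1, k3)   # FALSE: different area
```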

Currency conversion

Monetary series (types a, m, t) are in local currency at the prior year’s prices. wid_convert() fetches the appropriate WID exchange-rate series and divides in one step. Dimensionless series (types s, g, etc.) pass through unchanged with a message.

# Bottom 50% average income, four countries - convert to 2022 USD PPP
download_wid("aptinc992j", areas = c("US", "FR", "CN", "IN"), perc = "p0p50") |>
  wid_convert(target = "ppp", base_year = "2022")

Supported targets: "lcu" (no conversion), "usd", "eur", "gbp", "ppp", "yppp".
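The "divide by an exchange rate" step amounts to a join followed by a division. The sketch below uses fictional country codes and made-up rates to show the arithmetic only; the column name lcu_per_usd is an assumption for illustration, and wid_convert() fetches the real series itself.

```r
# Toy values and rates (fictional countries, made-up numbers; not WID data).
incomes <- data.frame(country = c("AA", "BB"), year = 2022,
                      value = c(1000, 50000))
rates <- data.frame(country = c("AA", "BB"),
                    lcu_per_usd = c(2, 100))   # local currency units per USD

# Match each row to its country's rate, then divide.
converted <- merge(incomes, rates, by = "country")
converted$value_usd <- converted$value / converted$lcu_per_usd
converted[, c("country", "year", "value_usd")]
```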

Inequality measures

These operate on data already in memory; no additional API calls are needed.

Gini coefficient

Requires a share (s) series with contiguous pXpY codes covering the full distribution:

dist <- download_wid("sptinc992j", areas = c("US", "FR"), perc = "all",
                     years = 1990:2022)
wid_gini(dist)
#>   country year  gini
#> 1      FR 1990 0.411
#> 2      US 1990 0.453
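The requirement for contiguous brackets becomes clear from how a Gini coefficient is computed from shares: the brackets trace out a Lorenz curve, and the Gini is one minus twice the area under it. The sketch below applies the trapezoid rule to a toy distribution. gini_from_shares() is a hypothetical helper, not the wid_gini() implementation, and the shares are made up.

```r
# Hypothetical helper: Gini from contiguous bracket shares (trapezoid rule).
gini_from_shares <- function(perc, share) {
  # Extract bracket bounds from codes such as "p0p50" or "p99.9p100".
  bounds <- do.call(rbind, lapply(regmatches(perc, gregexpr("[0-9.]+", perc)),
                                  as.numeric)) / 100
  ord   <- order(bounds[, 1])
  lower <- bounds[ord, 1]
  upper <- bounds[ord, 2]
  share <- share[ord]
  stopifnot(isTRUE(all.equal(lower[1], 0)),
            isTRUE(all.equal(upper[length(upper)], 1)),
            isTRUE(all.equal(lower[-1], upper[-length(upper)])),  # contiguous
            isTRUE(all.equal(sum(share), 1)))
  lorenz <- cumsum(share)                   # Lorenz curve at each upper bound
  prev   <- c(0, lorenz[-length(lorenz)])   # Lorenz curve at each lower bound
  1 - sum((upper - lower) * (lorenz + prev))
}

# Toy distribution (made-up shares).
gini_from_shares(c("p0p50", "p50p90", "p90p100"), c(0.2, 0.4, 0.4))   # 0.42
```

Because the trapezoid rule treats income as equally distributed within each bracket, the result is a lower bound on the true Gini; finer brackets tighten it.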

Top fractile share

wid_top_share(dist, top = 0.01)   # top 1%
wid_top_share(dist, top = 0.10)   # top 10%
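A top share is the sum of the shares of all brackets above a cutoff. The sketch below shows this on made-up data. It assumes non-overlapping brackets and a cutoff that falls exactly on a bracket boundary; top_share_from_brackets() is a hypothetical helper and may differ from what wid_top_share() does internally.

```r
# Hypothetical helper: sum the shares of all brackets inside the top fraction.
top_share_from_brackets <- function(perc, share, top) {
  # Lower bound of each bracket, in percent (e.g. 90 for "p90p99").
  lower <- vapply(regmatches(perc, gregexpr("[0-9.]+", perc)),
                  function(b) as.numeric(b[1]), numeric(1))
  cutoff <- 100 * (1 - top)
  # The cutoff must coincide with a bracket boundary; no interpolation here.
  stopifnot(any(abs(lower - cutoff) < 1e-8))
  sum(share[lower >= cutoff - 1e-8])
}

perc  <- c("p0p50", "p50p90", "p90p99", "p99p100")
share <- c(0.20, 0.40, 0.25, 0.15)   # made-up shares summing to 1

top_share_from_brackets(perc, share, top = 0.10)   # 0.4
top_share_from_brackets(perc, share, top = 0.01)   # 0.15
```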

Percentile ratio

Requires a threshold (t) series:

thresh <- download_wid("tptinc992j", areas = "US", perc = "all")
wid_percentile_ratio(thresh)                                          # P90/P10
wid_percentile_ratio(thresh, numerator = "p90", denominator = "p50") # P90/P50
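The underlying arithmetic is a lookup of two thresholds followed by a division. The sketch below uses made-up thresholds labelled with a single percentile each, which is a simplifying assumption about the data layout; ratio_of() is a hypothetical helper, not part of widr.

```r
# Toy thresholds (made up), one row per percentile.
thresh_toy <- data.frame(percentile = c("p10", "p50", "p90"),
                         value      = c(10000, 30000, 80000))

# Hypothetical helper: ratio of two thresholds looked up by label.
ratio_of <- function(df, numerator = "p90", denominator = "p10") {
  num <- df$value[df$percentile == numerator]
  den <- df$value[df$percentile == denominator]
  stopifnot(length(num) == 1, length(den) == 1, den != 0)
  num / den
}

ratio_of(thresh_toy)                         # 8
ratio_of(thresh_toy, denominator = "p50")    # 80000 / 30000
```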

Plotting

All plot functions return ggplot objects and accept additional layers:

# Time series - one line per country; facet = TRUE for separate panels
wid_plot_timeseries(shares,
  country_labels = c(US = "United States", FR = "France",
                     DE = "Germany",       CN = "China"))

# Cross-country bar chart for a single year
wid_plot_compare(shares, year = 2020)

# Lorenz curve
wid_plot_lorenz(dist, country = "US")

Example

library(widr); library(dplyr); library(ggplot2)

download_wid(
  indicators = "aptinc992j",
  areas      = c("US", "FR", "CN", "IN"),
  perc       = "p0p50",
  years      = 1990:2022
) |>
  wid_convert(target = "ppp", base_year = "2022") |>
  wid_tidy(country_names = TRUE) |>
  ggplot(aes(year, value, colour = country_name)) +
  geom_line(linewidth = 0.8) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title    = "Bottom 50% average pre-tax income",
       subtitle = "2022 USD PPP · equal-split adults 20+",
       x = NULL, y = NULL, colour = NULL)

Quick reference

Function                                   Purpose
download_wid()                             Download data; returns a wid_df
wid_decode() / wid_encode()                Parse or build variable codes
wid_validate() / wid_is_valid()            Validate code components
wid_search()                               Keyword search across reference tables
wid_tidy()                                 Decode columns, coerce types
wid_convert()                              Currency conversion
wid_metadata()                             Retrieve source information
wid_gini()                                 Gini coefficient
wid_top_share()                            Top fractile income / wealth share
wid_percentile_ratio()                     Percentile ratio (e.g. P90/P10)
wid_plot_timeseries()                      Time-series line chart
wid_plot_compare()                         Cross-country bar / point chart
wid_plot_lorenz()                          Lorenz curve
wid_query() / wid_filter() / wid_fetch()   Reusable query objects
wid_set_key()                              Set API key
wid_cache_list() / wid_cache_clear()       Cache management

Full code dictionary: vignette("code-dictionary") · wid.world/codes-dictionary