This package is motivated by a fundamental principle: data acquisition should be treated as code, not as an external preparatory step. This is increasingly important in the age of AI-assisted research.
As AI tools accelerate analytical workflows while enabling plausible fabrication of statistics and citations, anchoring research to authoritative, version-controlled data sources becomes essential infrastructure for scientific credibility. The unicefData package operationalizes this principle by making data provenance an integral component of your analytical script.
When you specify:
df <- unicefData(
indicator = "CME_MRY0T4",
countries = c("ALB", "USA", "BRA"),
year = "2015:2023"
)

You are not downloading a spreadsheet from a portal and applying undocumented filters. Instead, you are executing an explicit, reproducible specification of provenance: a command that documents what data were requested, from which source, and under what constraints. This ensures:
The unicefData package adopts three design principles from wbopendata (Azevedo, 2026), a similar package for World Bank data:
Together, these principles treat data acquisition as infrastructure for reproducibility, not mere convenience.
The United Nations Children’s Fund (UNICEF) maintains one of the world’s most comprehensive databases on child welfare, covering health, nutrition, education, protection, HIV/AIDS, and water, sanitation and hygiene (WASH). The UNICEF Data Warehouse uses the Statistical Data and Metadata eXchange (SDMX) standard, an ISO standard (ISO 17369) for exchanging statistical information.
The warehouse currently maintains 733+ indicators organized across thematic dataflows:
| Dataflow | Domain | Indicators |
|---|---|---|
| CME | Child Mortality Estimates | 39 |
| NUTRITION | Stunting, wasting, underweight | 112 |
| IMMUNISATION | Immunization coverage | 18 |
| WASH_HOUSEHOLDS | Water and sanitation | 57 |
| EDUCATION | Education access and quality | 38 |
| HIV_AIDS | HIV-related indicators | 38 |
| MNCH | Maternal, newborn, child health | varies |
| PT | Child protection | varies |
| CHLD_PVTY | Child poverty | varies |
| GENDER | Gender equality | varies |
SDMX (Statistical Data and Metadata eXchange) is an international standard for structuring and exchanging statistical data. Within SDMX:
Each indicator has a unique code (e.g., CME_MRY0T4) and belongs to a dataflow, which groups related indicators.

While powerful, direct SDMX API interaction requires knowledge of dataflow structures, dimension codes, and RESTful query syntax. The unicefData package removes these barriers.
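To give a sense of what the package abstracts away, here is a minimal sketch of assembling an SDMX REST data query by hand. The helper `build_sdmx_url()` is hypothetical and not part of unicefData. The endpoint path, the dataflow version, and the order of dimensions in the key are assumptions made for illustration, so check them against the warehouse documentation before relying on them.

```r
# Hypothetical helper (not part of unicefData): builds an SDMX REST data URL.
# Assumptions: the endpoint path, the "1.0" version, and a key that starts
# with the country dimension followed by the indicator dimension.
build_sdmx_url <- function(dataflow, indicator, countries,
                           start_year, end_year,
                           agency = "UNICEF", version = "1.0",
                           base = "https://sdmx.data.unicef.org/ws/public/sdmxapi/rest") {
  # SDMX keys separate dimensions with "." and alternative values with "+".
  # A real key needs one position per dimension of the dataflow; the
  # remaining dimensions are left out of this sketch.
  key <- paste(paste(countries, collapse = "+"), indicator, sep = ".")
  flow_ref <- paste(agency, dataflow, version, sep = ",")
  paste0(base, "/data/", flow_ref, "/", key,
         "?startPeriod=", start_year, "&endPeriod=", end_year)
}

url <- build_sdmx_url("CME", "CME_MRY0T4", c("ALB", "USA", "BRA"), 2015, 2023)
print(url)
```

Even in this simplified form, the caller has to know the dataflow that owns the indicator and the layout of the key, which is the knowledge the package is meant to spare its users.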
The unicefData package provides a simple R interface to
the UNICEF Data Warehouse. It is part of a trilingual ecosystem with
identical implementations in R, Python (unicef_api), and
Stata (unicefdata), sharing the same function names and
parameter structures for cross-team collaboration.
Key features:
Install from GitHub:
Before downloading data, explore what is available:
# Browse indicator categories (thematic dataflows)
list_categories()
# Search for indicators by keyword
search_indicators("mortality")
# List all indicators in the Child Mortality Estimates dataflow
list_indicators("CME")
# Get detailed information about a specific indicator
get_indicator_info("CME_MRY0T4")

These discovery commands mirror the paper’s Examples 1–4 and the Stata equivalents:
. unicefdata, categories
. unicefdata, search(mortality)
. unicefdata, indicators(CME)
. unicefdata, info(CME_MRY0T4)
Fetch under-5 mortality rate for three countries over a year range:
# Example 5 (paper): Basic data retrieval
df <- unicefData(
indicator = "CME_MRY0T4",
countries = c("BRA", "IND", "CHN"),
year = "2015:2023"
)
head(df)

The equivalent Stata command is:
. unicefdata, indicator(CME_MRY0T4) countries(BRA IND CHN) year(2015:2023) clear
Fetch the latest available value, or the most recent values, for each country:
# Example 7 (paper): Get the latest available value per country
df_latest <- unicefData(
indicator = "CME_MRY0T4",
countries = c("BGD", "IND", "PAK"),
latest = TRUE
)
# Get the 3 most recent values per country
df_mrv <- unicefData(
indicator = "CME_MRY0T4",
countries = c("BGD", "IND", "PAK"),
mrv = 3
)

The year parameter supports multiple formats:
# Single year
df <- unicefData(indicator = "CME_MRY0T4", year = 2020)
# Year range
df <- unicefData(indicator = "CME_MRY0T4", year = "2015:2023")
# Non-contiguous years
df <- unicefData(indicator = "CME_MRY0T4", year = "2015,2018,2020")
# Circa mode: find closest available year
df <- unicefData(indicator = "CME_MRY0T4", year = 2015, circa = TRUE)

UNICEF data supports rich disaggregation along multiple dimensions. Not all dimensions are available for all indicators; availability depends on the dataflow (see the disaggregation matrix in the package paper).
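Returning to the year formats listed earlier, the sketch below shows one way such specifications could be expanded into explicit years, and how a circa-style lookup might choose the closest available year. Both helpers, `parse_year_spec()` and `closest_year()`, are hypothetical illustrations and not the package's implementation. In particular, how the package breaks ties between equally distant years is not stated here, so the tie rule below is an assumption.

```r
# Hypothetical helper: expand a year specification into an integer vector.
# Accepts a single year (2020), a range ("2015:2023"), or a
# comma-separated list ("2015,2018,2020").
parse_year_spec <- function(year) {
  spec <- as.character(year)
  if (grepl(":", spec, fixed = TRUE)) {
    bounds <- as.integer(strsplit(spec, ":", fixed = TRUE)[[1]])
    return(seq(bounds[1], bounds[2]))
  }
  as.integer(strsplit(spec, ",", fixed = TRUE)[[1]])
}

# Hypothetical helper: pick the available year closest to a target year.
# Assumption: on a tie, the first of the tied years in `available` is returned.
closest_year <- function(target, available) {
  available[which.min(abs(available - target))]
}

print(parse_year_spec(2020))
print(parse_year_spec("2015:2023"))
print(parse_year_spec("2015,2018,2020"))
print(closest_year(2015, c(2010, 2014, 2019)))
```

Writing the specification as a string in the script, rather than filtering years after a manual download, keeps the selection rule visible to anyone rerunning the analysis.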
Useful for time-series analysis:
Fetch and merge multiple indicators automatically:
# Example 10 (paper): Multiple indicators
df <- unicefData(
indicator = c("CME_MRM0", "CME_MRY0T4"),
countries = c("KEN", "TZA", "UGA"),
year = 2020
)
# Wide indicators format: one column per indicator
df_wide <- unicefData(
indicator = c("CME_MRY0T4", "CME_MRY0", "IM_DTP3", "IM_MCV1"),
countries = c("AFG", "ETH", "PAK", "NGA"),
latest = TRUE,
format = "wide_indicators"
)

Add regional and income group classifications:
Post-processing utilities for downloaded data:
The package caches metadata and API responses for performance. To clear and refresh all caches:
Inspect the structure of any dataflow:
The unicefData ecosystem provides identical
functionality across R, Python, and Stata. The same analytical workflow
translates directly:
| Operation | R | Python | Stata |
|---|---|---|---|
| Search | search_indicators("mortality") | search_indicators("mortality") | unicefdata, search(mortality) |
| Fetch | unicefData(indicator="CME_MRY0T4") | unicefData(indicator="CME_MRY0T4") | unicefdata, indicator(CME_MRY0T4) clear |
| Latest | unicefData(..., latest=TRUE) | unicefData(..., latest=True) | unicefdata, ... latest clear |
| Wide | unicefData(..., format="wide") | unicefData(..., format="wide") | unicefdata, ... wide clear |
| Cache | clear_unicef_cache() | clear_cache() | unicefdata, clearcache |
| Sync | sync_metadata() | sync_metadata() | unicefdata_sync, all |
This parity enables cross-team collaboration: an analyst can prototype in R and a colleague can reproduce the workflow in Stata or Python with minimal translation.
The unicefData package embodies three design principles that make reproducibility the default rather than the exception:
When you write:
You are not performing manual steps that will be forgotten or become undocumented. Every data selection decision—indicator, countries, years, disaggregations—is explicitly specified in your script. This ensures that:
The package prioritizes stable syntax and predictable behavior. This matters because:
Rather than exposing HTTP requests and JSON parsing, the interface uses concepts familiar to development researchers: indicators, countries, years. This constrains input to meaningful values and reduces opportunities for error.
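As an illustration of how a domain-level interface can constrain input, the sketch below checks a request before anything is sent over the network. The function `validate_request()` is hypothetical and is not how the package validates its arguments. The indicator pattern (upper-case letters, digits, and underscores) is an assumption inferred from codes such as CME_MRY0T4, and the country check only tests for the three-letter form used in the examples.

```r
# Hypothetical validator (not the package's implementation).
# Returns a character vector of problems; an empty vector means none found.
validate_request <- function(indicator, countries) {
  problems <- character(0)

  # Assumed pattern: starts with an upper-case letter, then upper-case
  # letters, digits, or underscores.
  bad_ind <- indicator[!grepl("^[A-Z][A-Z0-9_]*$", indicator)]
  if (length(bad_ind) > 0) {
    problems <- c(problems, paste("Unexpected indicator code:", bad_ind))
  }

  # Country codes in the examples are three upper-case letters.
  bad_iso <- countries[!grepl("^[A-Z]{3}$", countries)]
  if (length(bad_iso) > 0) {
    problems <- c(problems, paste("Not a three-letter country code:", bad_iso))
  }

  problems
}

print(validate_request("CME_MRY0T4", c("BRA", "IND", "CHN")))
print(validate_request("CME_MRY0T4", c("BRA", "India")))
```

A check like this fails early with a readable message, instead of surfacing later as an empty result or an HTTP error.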
In an era where AI tools accelerate analytical workflows, these principles become more important, not less. As generative tools lower the cost of producing plausible analyses and narratives, anchoring empirical work to authoritative and verifiable data sources is essential infrastructure for scientific credibility. The unicefData package provides this foundation by making data provenance explicit and executable.
This package was developed at the UNICEF Data and Analytics Section. The author gratefully acknowledges the collaboration of Lucas Rodrigues, Yang Liu, and Karen Avanesian, whose technical contributions and feedback were instrumental in the development of this R package.
Special thanks to Yves Jaques, Alberto Sibileau, and Daniele Olivotti for designing and maintaining the UNICEF SDMX data warehouse infrastructure that makes this package possible.
The author also acknowledges the UNICEF database managers and technical teams who ensure data quality, as well as the country office staff and National Statistical Offices whose data collection efforts make this work possible.
Development of this package was supported by UNICEF institutional funding for data infrastructure and statistical capacity building. The author also acknowledges UNICEF colleagues who provided testing and feedback during development, as well as the broader open-source R community.
Development was assisted by AI coding tools (GitHub Copilot, Claude). All code has been reviewed, tested, and validated by the package maintainers.
This package is provided for research and analytical purposes.
The unicefData package provides programmatic access to
UNICEF’s public data warehouse. While the author is affiliated with
UNICEF, this package is not an official UNICEF product and the
statements in this documentation are the views of the author and do not
necessarily reflect the policies or views of UNICEF.
Data accessed through this package comes from the UNICEF Data Warehouse. Users should verify critical data points against official UNICEF publications at data.unicef.org.
This software is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or UNICEF be liable for any claim, damages or other liability arising from the use of this software.
The designations employed and the presentation of material in this package do not imply the expression of any opinion whatsoever on the part of UNICEF concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.
Important Note on Data Vintages
Official statistics are subject to revisions as new information becomes available and estimation methodologies improve. UNICEF indicators are regularly updated based on new surveys, censuses, and improved modeling techniques. Historical values may be revised retroactively to reflect better information or methodological improvements.
For reproducible research and proper data attribution, users should record the indicator code (e.g., CME_MRY0T4), the package version, and the date the data were accessed.

Example citation for data used in research:
Under-5 mortality data (indicator: CME_MRY0T4) accessed from UNICEF Data Warehouse via unicefData R package (v2.1.0) on 2026-02-09. Data available at: https://sdmx.data.unicef.org/
This practice ensures that others can verify your results and understand any differences that may arise from data updates. For official UNICEF statistics in publications, always cross-reference with the current version at data.unicef.org.
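One way to make this habit routine is to generate the citation text in the same script that downloads the data. The sketch below uses a hypothetical helper, `make_data_citation()`, which is not part of the package and simply fills in the template shown in the example citation above.

```r
# Hypothetical helper (not part of unicefData): compose a data citation
# string that follows the example citation format.
make_data_citation <- function(label, indicator, pkg_version, access_date) {
  sprintf(
    paste0("%s (indicator: %s) accessed from UNICEF Data Warehouse via ",
           "unicefData R package (v%s) on %s. ",
           "Data available at: https://sdmx.data.unicef.org/"),
    label, indicator, pkg_version,
    format(as.Date(access_date), "%Y-%m-%d")
  )
}

citation_text <- make_data_citation(
  label = "Under-5 mortality data",
  indicator = "CME_MRY0T4",
  pkg_version = "2.1.0",
  access_date = "2026-02-09"
)
cat(citation_text, "\n")
```

In a working script, the version and date would normally be captured at run time, for example with `as.character(utils::packageVersion("unicefData"))` and `Sys.Date()`, so that the recorded vintage always matches the download.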
If you use this package in your research, please cite:
Azevedo, J.P. (2026). unicefData: Trilingual R, Python, and Stata Interface
to UNICEF SDMX Data Warehouse. R package version 2.1.0.
https://github.com/unicef-drp/unicefData
For data citations, please refer to the specific UNICEF datasets accessed through the warehouse and cite them according to UNICEF’s data citation guidelines.
This package is released under the MIT License. See the LICENSE file for full details.