wbids Data Model

Overview

The diagram below shows the data model of the wbids package. We designed it primarily with R users in mind, particularly those who rely heavily on the popular data manipulation packages dplyr and data.table. For these users, consistent and descriptive primary key column names across tables (e.g., geography_id, series_id) simplify writing joins, reduce the risk of column name conflicts, and avoid ambiguity. We therefore deliberately deviate from the common data modeling practice of omitting the entity prefix from the columns of an entity table (e.g., geography_ in the geographies table).

```mermaid
erDiagram
    GEOGRAPHIES ||--o{ DEBT_STATISTICS : has
    GEOGRAPHIES {
        string geography_id PK
        string geography_name
        string geography_iso2code
        string geography_type
        string capital_city
        string region_id
        string region_iso2code
        string region_name
        string admin_region_id
        string admin_region_iso2code
        string admin_region_name
        string lending_type_id
        string lending_type_iso2code
        string lending_type_name
    }
    SERIES ||--o{ DEBT_STATISTICS : has
    SERIES ||--o{ SERIES_TOPICS : has
    SERIES {
        string series_id PK
        string series_name
        int source_id
        string source_note
        string source_organization
    }
    COUNTERPARTS ||--o{ DEBT_STATISTICS : has
    COUNTERPARTS {
        string counterpart_id PK
        string counterpart_name
        string counterpart_iso2code
        string counterpart_iso3code
        string counterpart_type
    }
    SERIES_TOPICS {
        string series_id FK
        int topic_id
        string topic_name
    }
    DEBT_STATISTICS {
        string series_id FK
        string geography_id FK
        string counterpart_id FK
        int year
        float value
    }
```
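
To illustrate why the prefixed key names matter, here is a minimal sketch of joining the fact table to its dimension tables with dplyr. The tables are built by hand to mirror the diagram; the rows and the numbers in `value` are illustrative placeholders, not real IDS figures, and nothing here calls the wbids package itself.

```r
library(dplyr)

# Toy tables mirroring the data model. All rows are illustrative placeholders.
geographies <- data.frame(
  geography_id = c("ZMB", "GHA"),
  geography_name = c("Zambia", "Ghana"),
  geography_type = c("Country", "Country")
)

series <- data.frame(
  series_id = "DT.DOD.DPPG.CD",
  series_name = "External debt stocks, public and publicly guaranteed (PPG) (DOD, current US$)"
)

counterparts <- data.frame(
  counterpart_id = "730",
  counterpart_name = "China",
  counterpart_type = "Country"
)

debt_statistics <- data.frame(
  series_id = "DT.DOD.DPPG.CD",
  geography_id = c("ZMB", "GHA"),
  counterpart_id = "730",
  year = 2020L,
  value = c(100, 200) # placeholder values
)

# Each key carries the same descriptive name in every table, so the joins
# need no renaming and produce no ambiguous .x/.y suffixes.
debt_labelled <- debt_statistics %>%
  left_join(geographies, by = "geography_id") %>%
  left_join(series, by = "series_id") %>%
  left_join(counterparts, by = "counterpart_id")
```

Had the key columns simply been called id and name in each table, every join would need an explicit mapping and the name columns would collide.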

Table Details

Geographies

| Column name | Description | Example value |
| --- | --- | --- |
| geography_id | ISO 3166-1 alpha-3 code of the geography | ZMB |
| geography_name | Standardized name of the geography | Zambia |
| geography_iso2code | ISO 3166-1 alpha-2 code of the geography | ZM |
| geography_type | Type of geography (e.g., country, region) | Country |
| capital_city | Capital city of the geography | Lusaka |
| region_id | Unique identifier for the region | SSF |
| region_iso2code | ISO 3166-1 alpha-2 code of the region | ZG |
| region_name | Name of the region | Sub-Saharan Africa |
| admin_region_id | Unique identifier for the administrative region | SSA |
| admin_region_iso2code | ISO 3166-1 alpha-2 code of the administrative region | ZF |
| admin_region_name | Name of the administrative region | Sub-Saharan Africa (excluding high income) |
| lending_type_id | Unique identifier for the lending type | IDX |
| lending_type_iso2code | ISO code of the lending type | XI |
| lending_type_name | Name of the lending type | IDA |

Counterparts

| Column name | Description | Example value |
| --- | --- | --- |
| counterpart_id | Unique identifier for the counterpart | 730 |
| counterpart_name | Standardized name of the counterpart | China |
| counterpart_iso2code | ISO 3166-1 alpha-2 code of the counterpart | CN |
| counterpart_iso3code | ISO 3166-1 alpha-3 code of the counterpart | CHN |
| counterpart_type | Type of counterpart (e.g., institution, country, region) | Country |

Series

| Column name | Description | Example value |
| --- | --- | --- |
| series_id | Unique identifier for the data series | DT.DOD.DPPG.CD |
| series_name | Name of the series | External debt stocks, public and publicly guaranteed (PPG) (DOD, current US$) |
| source_id | Unique identifier for the data source | 2 |
| source_note | Note about the data source | Public and publicly guaranteed debt comprises long-term external obligations of public debtors, including the national government, Public Corporations, State Owned Enterprises, Development Banks and Other Mixed Enterprises, political subdivisions (or an agency of either), autonomous public bodies, and external obligations of private debtors that are guaranteed for repayment by a public entity. Data are in current U.S. dollars. |
| source_organization | Organization responsible for the data series | World Bank, International Debt Statistics. |

Debt Statistics

| Column name | Description | Example value |
| --- | --- | --- |
| series_id | Identifier for the series | DT.DOD.DPPG.CD |
| geography_id | Identifier for the geography | ZMB |
| counterpart_id | Identifier for the counterpart | 061 |
| year | Year of the data point | 2020 |
| value | Value of the data point | 4298957000 |

Assignment of Geography and Counterpart Types

The original World Bank IDS data includes a ‘country’ field, containing both countries and regions, and a ‘counterpart-area’ field, which may include countries, regions, and institutions. In our data model, these fields are renamed to ‘geography’ and ‘counterpart’ to clarify the types of entities in each column.

We also introduce corresponding type columns, geography_type and counterpart_type, that specify whether a geography is a country (e.g., “Aruba”) or a region (e.g., “Africa Eastern and Southern”), and whether a counterpart is a country, a region, or a special category (e.g., “Global IFIs”, “Global MDBs”). Every counterpart that is a country or region is also represented in the geography table, ensuring consistency across both tables.
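
The type columns can be used to filter out aggregates and to check the consistency rule described above. The following is a minimal sketch on hand-built toy tables. It assumes that counterpart_iso3code corresponds to geography_id, and the ID "X01" and the type label "Other" for the special category are hypothetical placeholders rather than values taken from the package.

```r
library(dplyr)

# Toy rows for illustration only.
geographies <- data.frame(
  geography_id = c("ABW", "CHN", "AFE"),
  geography_name = c("Aruba", "China", "Africa Eastern and Southern"),
  geography_type = c("Country", "Country", "Region")
)

counterparts <- data.frame(
  counterpart_id = c("730", "X01"),             # "X01" is a hypothetical ID
  counterpart_name = c("China", "Global IFIs"),
  counterpart_iso3code = c("CHN", NA),
  counterpart_type = c("Country", "Other")      # "Other" is an assumed label
)

# Keep countries only, e.g. to exclude regional aggregates from an analysis.
countries <- geographies %>%
  filter(geography_type == "Country")

# Counterparts typed as country or region that have no matching geography.
# Assumption: counterpart_iso3code corresponds to geography_id.
missing_from_geographies <- counterparts %>%
  filter(counterpart_type %in% c("Country", "Region")) %>%
  anti_join(geographies, by = c("counterpart_iso3code" = "geography_id"))
```

If the consistency rule holds, missing_from_geographies has zero rows.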

Harmonization of Geography and Counterpart Names

In some cases, the IDS data provides different names for geographies that appear in both the ‘counterpart-area’ and the ‘country’ data. We use the geography name whenever one is available and drop the differently worded counterpart name. For instance, if the original data contains “Cote D`Ivoire, Republic Of” in the counterpart table, but the country name is “Cote d’Ivoire”, then we overwrite the former with the latter.
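
The overwrite rule can be sketched as a join followed by a coalesce. This is an illustration of the idea on toy data, not the package's actual implementation. It assumes that counterparts are matched to geographies via counterpart_iso3code, and the counterpart IDs "C01" and "C02" are hypothetical.

```r
library(dplyr)

geographies <- data.frame(
  geography_id = "CIV",
  geography_name = "Cote d’Ivoire"
)

counterparts <- data.frame(
  counterpart_id = c("C01", "C02"), # hypothetical IDs
  counterpart_name = c("Cote D`Ivoire, Republic Of", "Global IFIs"),
  counterpart_iso3code = c("CIV", NA)
)

# Prefer the geography name where a match exists; otherwise keep the
# original counterpart name (e.g. for special categories without a geography).
harmonized <- counterparts %>%
  left_join(geographies, by = c("counterpart_iso3code" = "geography_id")) %>%
  mutate(counterpart_name = coalesce(geography_name, counterpart_name)) %>%
  select(-geography_name)
```

The first row now carries the geography wording, while the unmatched second row keeps its original counterpart name.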