| Title: | Normalize and Match City Names to NUTS Regions |
| Version: | 0.2.3 |
| Date: | 2026-02-08 |
| Description: | Normalizes city names for Germany (DE) and Switzerland (CH) and matches them to NUTS 3 regions using provided crosswalks. Features include comprehensive normalization rules, cascading matching logic (Exact NUTS -> Exact LAU -> Fuzzy), and single-source data synthesis. The package implements the NUTS classification as described in the NUTS methodology (Eurostat (2021) https://ec.europa.eu/eurostat/web/nuts). |
| License: | MIT + file LICENSE |
| Encoding: | UTF-8 |
| LazyData: | true |
| Imports: | dplyr, stringr, stringdist, data.table, tidyr, rlang |
| Suggests: | testthat, readxl |
| RoxygenNote: | 7.3.2 |
| NeedsCompilation: | no |
| Packaged: | 2026-02-08 23:47:07 UTC; giulianetinginfrati |
| Author: | Giulian Etingin-Frati [aut, cre] |
| Maintainer: | Giulian Etingin-Frati <etingin-frati@kof.ethz.ch> |
| Depends: | R (≥ 3.5.0) |
| Repository: | CRAN |
| Date/Publication: | 2026-02-11 19:50:02 UTC |
Generate Fake City Data
Description
Generates a vector of fake city names for testing, including common variations and noise.
Usage
generate_fake_cities(n = 10, country = "DE")
Arguments
n |
Integer, number of cities to generate. |
country |
Character string, country code: either "DE" or "CH". |
Value
Character vector of city names.
Examples
# Generate 5 fake German cities
generate_fake_cities(5, country = "DE")
# Generate 3 fake Swiss cities
generate_fake_cities(3, country = "CH")
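As a rough illustration of what "common variations and noise" could mean, the following base R sketch perturbs a small list of city names. The function noisy_cities(), its base list, and the four kinds of noise are hypothetical and are not taken from the package; the actual generator may work differently.

```r
# Hypothetical sketch: draw city names and apply simple kinds of noise.
# Nothing here is part of the package API.
noisy_cities <- function(n, base = c("Berlin", "Hamburg", "Potsdam")) {
  picks <- sample(base, n, replace = TRUE)
  kinds <- sample(c("as_is", "upper", "padded", "typo"), n, replace = TRUE)
  vapply(seq_len(n), function(i) {
    s <- picks[i]
    switch(kinds[i],
      as_is  = s,                        # leave the name untouched
      upper  = toupper(s),               # all-caps variant
      padded = paste0("  ", s, " "),     # stray whitespace
      typo   = {                         # drop one random character
        pos <- sample(nchar(s), 1)
        paste0(substr(s, 1, pos - 1), substr(s, pos + 1, nchar(s)))
      }
    )
  }, character(1))
}

set.seed(42)
noisy_cities(5)
```

Input of this kind is useful for checking that normalization and matching tolerate case differences, whitespace, and small typos.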
Local Administrative Units (LAU) Crosswalks
Description
Datasets mapping city names to LAU codes and NUTS 3 regions for various countries. They serve as the crosswalks against which normalized city names are matched to their respective statistical regions.
Usage
lau_at
lau_be
lau_bg
lau_ch
lau_cy
lau_cz
lau_de
lau_dk
lau_ee
lau_el
lau_es
lau_fi
lau_fr
lau_hr
lau_hu
lau_ie
lau_it
lau_li
lau_lt
lau_lu
lau_lv
lau_mk
lau_mt
lau_nl
lau_no
lau_pl
lau_pt
lau_ro
lau_se
lau_si
lau_sk
lau_tr
Format
Data frames with varying columns depending on the country, typically including:
- lau_id
Local Administrative Unit code
- lau_name
Name of the Local Administrative Unit
- nuts_3_id
NUTS 3 region code
- population
Population (if available)
Each dataset is an object of class data.frame. Dimensions, in the order listed under Usage:
lau_at: 2093 rows and 5 columns.
lau_be: 571 rows and 5 columns.
lau_bg: 265 rows and 5 columns.
lau_ch: 2135 rows and 5 columns.
lau_cy: 617 rows and 5 columns.
lau_cz: 6258 rows and 5 columns.
lau_de: 10972 rows and 5 columns.
lau_dk: 99 rows and 5 columns.
lau_ee: 79 rows and 5 columns.
lau_el: 6142 rows and 5 columns.
lau_es: 8132 rows and 5 columns.
lau_fi: 309 rows and 5 columns.
lau_fr: 32774 rows and 5 columns.
lau_hr: 556 rows and 5 columns.
lau_hu: 3155 rows and 5 columns.
lau_ie: 166 rows and 5 columns.
lau_it: 7900 rows and 5 columns.
lau_li: 11 rows and 5 columns.
lau_lt: 60 rows and 5 columns.
lau_lu: 100 rows and 5 columns.
lau_lv: 43 rows and 5 columns.
lau_mk: 80 rows and 5 columns.
lau_mt: 68 rows and 5 columns.
lau_nl: 342 rows and 5 columns.
lau_no: 378 rows and 5 columns.
lau_pl: 2477 rows and 5 columns.
lau_pt: 3092 rows and 5 columns.
lau_ro: 3181 rows and 5 columns.
lau_se: 290 rows and 5 columns.
lau_si: 211 rows and 5 columns.
lau_sk: 2927 rows and 5 columns.
lau_tr: 972 rows and 5 columns.
Source
Eurostat and national statistical institutes.
Match City Names to NUTS Regions
Description
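Matches a vector of city names to NUTS 3 regions for any supported country, using cascading logic: an exact match against NUTS names first, then an exact match against LAU names, then (optionally) fuzzy matching.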
Usage
match_city(x, country = "DE", fuzzy = TRUE, threshold = 0.95)
Arguments
x |
Character vector of city names. |
country |
Character string giving the two-letter country code (e.g. "DE", "FR"). |
fuzzy |
Logical, whether to perform fuzzy matching. |
threshold |
Numeric, similarity threshold for fuzzy matching (0-1). |
Value
A data frame with columns: original, city_clean, nuts_3_id, lau_name, match_type, similarity.
Examples
# Match German cities
cities <- c("Berlin", "Munich", "Hamburg")
match_city(cities, country = "DE")
# Match with exact matching only (no fuzzy)
match_city(c("Frankfurt am Main"), country = "DE", fuzzy = FALSE)
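The cascade (Exact NUTS -> Exact LAU -> Fuzzy) can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: cascade_match(), the toy reference tables, and the match_type labels are hypothetical, and similarity is computed here as a normalized Levenshtein similarity via utils::adist(), which may differ from the metric the package uses.

```r
# Toy reference tables; names and codes are for illustration only.
nuts_ref <- data.frame(
  nuts_3_id   = c("DE300", "DE600"),
  nuts_3_name = c("berlin", "hamburg"),
  stringsAsFactors = FALSE
)
lau_ref <- data.frame(
  lau_name  = c("potsdam", "muenchen"),
  nuts_3_id = c("DE404", "DE212"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: x is assumed to be already normalized and free of NA.
cascade_match <- function(x, nuts_ref, lau_ref, fuzzy = TRUE, threshold = 0.95) {
  out <- data.frame(
    city_clean = x,
    nuts_3_id  = NA_character_,
    match_type = NA_character_,
    similarity = NA_real_,
    stringsAsFactors = FALSE
  )

  # Step 1: exact match on NUTS 3 region names.
  i <- match(x, nuts_ref$nuts_3_name)
  hit <- !is.na(i)
  out$nuts_3_id[hit]  <- nuts_ref$nuts_3_id[i[hit]]
  out$match_type[hit] <- "exact_nuts"
  out$similarity[hit] <- 1

  # Step 2: exact match on LAU names, only for names still unmatched.
  j <- match(x, lau_ref$lau_name)
  hit <- is.na(out$nuts_3_id) & !is.na(j)
  out$nuts_3_id[hit]  <- lau_ref$nuts_3_id[j[hit]]
  out$match_type[hit] <- "exact_lau"
  out$similarity[hit] <- 1

  # Step 3: fuzzy match of the remainder against LAU names.
  todo <- which(is.na(out$nuts_3_id))
  if (fuzzy && length(todo) > 0) {
    d   <- adist(x[todo], lau_ref$lau_name)          # edit distances
    len <- outer(nchar(x[todo]), nchar(lau_ref$lau_name), pmax)
    sim <- 1 - d / pmax(len, 1)                      # scale to 0-1
    best     <- max.col(sim, ties.method = "first")  # best candidate per row
    best_sim <- sim[cbind(seq_along(todo), best)]
    ok <- best_sim >= threshold
    out$nuts_3_id[todo[ok]]  <- lau_ref$nuts_3_id[best[ok]]
    out$match_type[todo[ok]] <- "fuzzy"
    out$similarity[todo[ok]] <- best_sim[ok]
  }
  out
}

cascade_match(c("berlin", "potsdam", "muenchn", "xyz"),
              nuts_ref, lau_ref, threshold = 0.85)
```

In this sketch, lowering threshold accepts more distant candidates, while fuzzy = FALSE stops after the two exact steps and leaves the rest unmatched (NA).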
Normalize City Names
Description
Normalizes city names for EEA countries using comprehensive rules tailored to each language/region.
Usage
normalize_city(x, country = "DE")
Arguments
x |
Character vector of city names. |
country |
Character string of the ISO 2-character country code (e.g. "DE", "FR", "PL"). |
Value
Character vector of normalized names.
Examples
# Normalize German city names
normalize_city(c("M\u00FCnchen", "K\u00F6ln", "Frankfurt a.M."), country = "DE")
# Normalize Swiss city names
normalize_city(c("Z\u00FCrich", "Gen\u00E8ve", "Basel-Stadt"), country = "CH")
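To give a sense of what rule-based normalization can involve, here is a minimal base R sketch for German input. The function normalize_sketch() and every rule in it (transliterating umlauts to "ae"/"oe"/"ue", expanding a trailing "a.M." to "am main", stripping punctuation) are assumptions for illustration; the rules actually applied by normalize_city() are not documented in this manual and may differ.

```r
# Hypothetical sketch of rule-based normalization for German city names.
normalize_sketch <- function(x) {
  # Transliterate umlauts and sharp s (assumed convention: ue, oe, ae, ss).
  from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc", "\u00df")
  to   <- c("ae", "oe", "ue", "ae", "oe", "ue", "ss")
  for (k in seq_along(from)) {
    x <- gsub(from[k], to[k], x, fixed = TRUE)
  }
  x <- tolower(x)
  # Expand a trailing "a.M." / "a. M." / "a M" abbreviation to "am main".
  x <- gsub("\\ba\\.? ?m\\.?$", "am main", x, perl = TRUE)
  # Replace remaining punctuation with spaces, then squeeze and trim.
  x <- gsub("[^a-z0-9 ]+", " ", x)
  x <- gsub(" +", " ", x)
  trimws(x)
}

normalize_sketch(c("M\u00FCnchen", "K\u00F6ln", "Frankfurt a.M."))
```

Applying the same normalization to both the input names and the reference names is what makes the later exact-match steps meaningful.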
NUTS 3 Region Metadata
Description
Metadata for NUTS 3 regions for various countries, used for hierarchical matching.
Usage
nuts_at
nuts_be
nuts_bg
nuts_ch
nuts_cy
nuts_cz
nuts_de
nuts_dk
nuts_ee
nuts_el
nuts_es
nuts_fi
nuts_fr
nuts_hr
nuts_hu
nuts_ie
nuts_it
nuts_li
nuts_lt
nuts_lu
nuts_lv
nuts_mk
nuts_mt
nuts_nl
nuts_no
nuts_pl
nuts_pt
nuts_ro
nuts_se
nuts_si
nuts_sk
nuts_tr
Format
Data frames with columns:
- nuts_3_id
NUTS 3 region code
- nuts_3_name
Name of the NUTS 3 region
Each dataset is an object of class data.frame. Dimensions, in the order listed under Usage:
nuts_at: 35 rows and 4 columns.
nuts_be: 43 rows and 4 columns.
nuts_bg: 28 rows and 4 columns.
nuts_ch: 26 rows and 4 columns.
nuts_cy: 1 row and 4 columns.
nuts_cz: 14 rows and 4 columns.
nuts_de: 401 rows and 4 columns.
nuts_dk: 11 rows and 4 columns.
nuts_ee: 5 rows and 4 columns.
nuts_el: 53 rows and 4 columns.
nuts_es: 59 rows and 4 columns.
nuts_fi: 19 rows and 4 columns.
nuts_fr: 96 rows and 4 columns.
nuts_hr: 21 rows and 4 columns.
nuts_hu: 20 rows and 4 columns.
nuts_ie: 8 rows and 4 columns.
nuts_it: 107 rows and 4 columns.
nuts_li: 1 row and 4 columns.
nuts_lt: 10 rows and 4 columns.
nuts_lu: 1 row and 4 columns.
nuts_lv: 5 rows and 4 columns.
nuts_mk: 8 rows and 4 columns.
nuts_mt: 2 rows and 4 columns.
nuts_nl: 40 rows and 4 columns.
nuts_no: 17 rows and 4 columns.
nuts_pl: 73 rows and 4 columns.
nuts_pt: 26 rows and 4 columns.
nuts_ro: 42 rows and 4 columns.
nuts_se: 21 rows and 4 columns.
nuts_si: 12 rows and 4 columns.
nuts_sk: 8 rows and 4 columns.
nuts_tr: 81 rows and 4 columns.
Source
Eurostat
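The LAU crosswalks and the NUTS 3 metadata share the nuts_3_id column, so they can be combined with a plain join. The sketch below uses small toy data frames rather than the packaged datasets (lau_toy and nuts_toy are hypothetical, with only the columns documented above), and derives higher-level codes from the fact that each NUTS level adds one character to the code.

```r
# Toy stand-ins for a LAU crosswalk and NUTS 3 metadata (illustrative values).
lau_toy <- data.frame(
  lau_name  = c("Muenchen", "Berlin", "Potsdam"),
  nuts_3_id = c("DE212", "DE300", "DE404"),
  stringsAsFactors = FALSE
)
nuts_toy <- data.frame(
  nuts_3_id   = c("DE212", "DE300", "DE404"),
  nuts_3_name = c("Muenchen, Kreisfreie Stadt", "Berlin", "Potsdam, Kreisfreie Stadt"),
  stringsAsFactors = FALSE
)

# Attach region names to each LAU; merge() sorts the result by the key.
joined <- merge(lau_toy, nuts_toy, by = "nuts_3_id")

# NUTS codes are hierarchical: truncating a NUTS 3 code yields its parents.
joined$nuts_2_id <- substr(joined$nuts_3_id, 1, 4)
joined$nuts_1_id <- substr(joined$nuts_3_id, 1, 3)
joined
```

With the packaged data, the same pattern would presumably apply to a pair such as lau_de and nuts_de, assuming both carry nuts_3_id as documented.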