Raw UKB phenotype data contains encoded column names and values that need to be converted before analysis.
| Source | Column names | Column values |
|---|---|---|
| extract_pheno() | participant.p31 | Raw integer codes; needs decode_values() |
| extract_batch() | p31, p53_i0 | Usually already decoded; decode_values() typically not needed |

Both outputs need decode_names() to convert field ID column names to human-readable snake_case.

Call order matters: when using extract_pheno() output, always run decode_values() before decode_names(), because value decoding relies on the numeric field ID still being present in the column name.
```r
library(ukbflow)

df <- extract_pheno(c(31, 54, 20116, 21022))
df <- decode_values(df)  # 0/1 → "Female"/"Male", etc.
df <- decode_names(df)   # participant.p31 → sex
```

decode_values() converts raw integer codes to
human-readable labels for categorical fields that have UKB encoding
mappings. Continuous, date, text, and already-decoded fields are left
unchanged.
It requires two metadata files from the UKB Showcase. Download them once with fetch_metadata(), then point decode_values() to the same directory; its default matches that of fetch_metadata().
| Column | Raw value | Decoded value |
|---|---|---|
| p31 | 0 / 1 | "Female" / "Male" |
| p54 | 11012 | "Leeds" |
| p20116_i0 | 0 / 1 / 2 | "Never" / "Previous" / "Current" |

Codes absent from the encoding table (including UKB missing codes -1, -3, -7) are returned as NA.
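The lookup behaviour described above can be illustrated in base R. This is a minimal sketch, not the ukbflow implementation: decode_with_table() and the encoding data frame are hypothetical names, and the code/label pairs are the ones listed for p20116_i0 in the table above.

```r
# Hypothetical encoding table (code/label pairs taken from the table above).
encoding <- data.frame(
  code  = c(0, 1, 2),
  label = c("Never", "Previous", "Current"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look each raw code up in the encoding table.
# match() returns NA for codes that are not in the table, so those
# values (here -3) come back as NA, as do values that were already NA.
decode_with_table <- function(x, encoding) {
  encoding$label[match(x, encoding$code)]
}

raw <- c(0, 2, -3, 1, NA)
decoded <- decode_with_table(raw, encoding)
print(decoded)
```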
decode_names() renames columns from field ID format to
snake_case labels using the approved UKB field dictionary available to
your project.
| Raw name | Decoded name |
|---|---|
| participant.eid | eid |
| participant.p31 | sex |
| participant.p21022 | age_at_recruitment |
| participant.p53_i0 | date_of_attending_assessment_centre_i0 |
| p31 | sex |
| p53_i0 | date_of_attending_assessment_centre_i0 |

Both extract_pheno() format (participant.p31) and extract_batch() format (p31) are handled automatically.
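The renaming described above can be sketched in base R. This is a rough illustration under stated assumptions, not the ukbflow implementation: rename_sketch() and field_titles are hypothetical names, the dictionary holds only the three field titles implied by the table above, and the real function may treat suffixes and unknown fields differently.

```r
# Hypothetical field dictionary: field ID -> field title.
field_titles <- c(
  "31"    = "Sex",
  "53"    = "Date of attending assessment centre",
  "21022" = "Age at recruitment"
)

# Hypothetical helper illustrating the renaming steps.
rename_sketch <- function(nms, field_titles) {
  # Drop the "participant." prefix so both input formats look alike.
  bare <- sub("^participant\\.", "", nms)

  # Split names such as "p53_i0" into a field ID ("53") and a suffix ("_i0").
  pattern <- "^p([0-9]+)(.*)$"
  idx     <- grep(pattern, bare)
  id      <- sub(pattern, "\\1", bare[idx])
  suffix  <- sub(pattern, "\\2", bare[idx])

  # Look up the title, then convert it to snake_case.
  title <- unname(field_titles[id])
  snake <- gsub("[^a-z0-9]+", "_", tolower(title))
  snake <- gsub("^_+|_+$", "", snake)

  # Only rename columns whose field ID is in the dictionary.
  known <- !is.na(title)
  out <- bare
  out[idx[known]] <- paste0(snake[known], suffix[known])
  out
}

raw_names <- c("participant.eid", "participant.p31", "participant.p21022",
               "participant.p53_i0", "p31", "p53_i0")
new_names <- rename_sketch(raw_names, field_titles)
print(new_names)
```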
Some UKB field titles are verbose. Names exceeding max_nchar characters are flagged with a warning (default: 60). Lower the threshold to flag more names:

```r
df <- decode_names(df, max_nchar = 30)
#> ! 1 column name longer than 30 characters - consider renaming manually:
#> • date_of_attending_assessment_centre_i0
```

Rename such columns manually to something concise.
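A manual rename can be done in base R without any ukbflow helper. The sketch below uses a toy data frame in place of real decoded output; the column values are made up, and the replacement name assessment_date_i0 is an arbitrary choice.

```r
# Toy data frame standing in for decoded output (values are made up).
df <- data.frame(
  eid = c(1, 2),
  date_of_attending_assessment_centre_i0 = as.Date(c("2008-01-15", "2009-06-30"))
)

# List the column names longer than 30 characters.
long_names <- names(df)[nchar(names(df)) > 30]
print(long_names)

# Rename by exact match so that no other column is touched.
names(df)[names(df) == "date_of_attending_assessment_centre_i0"] <- "assessment_date_i0"
print(names(df))
```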
- ?decode_values, ?decode_names
- vignette("extract"): extracting phenotype data