Reads supplied fields from UKB project data and writes a serialized dataframe to a .rds file.
bio_phen(
project_dir,
field_subset_file,
pheno_dir = "phenotypes",
out = "ukb_phenotype_subset",
exact = FALSE
)
Path to the enclosing directory of a UKB project.
A path to a one-per-line text file of fields (no header). Fields can be specified as f.field.index.array, or field-index.array.
Path to the enclosing directory of the phenotype data.
Name of the phenotype subset file. The default, "ukb_phenotype_subset", writes ukb_phenotype_subset.rds to the current directory.
Setting exact = TRUE will return all -index.array entries for only exact matches of fields in field_subset_file, e.g., 31 would return all 31-index.array entries, but not entries for fields 3159, 3160, etc. Default FALSE. Note: Do not set exact = TRUE if you have supplied full field names (i.e., including index and array) in your field subset file, e.g., 31-0.0 or 31.0.0.
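The difference between exact and non-exact matching can be illustrated with a small base R sketch. This is not the function's actual implementation: the column names are toy values in the f.field.index.array style, and the prefix-style pattern is only one plausible reading of the non-exact behavior.

```r
# Toy column names in the f.field.index.array style (illustrative only).
cols <- c("f.31.0.0", "f.31.1.0", "f.3159.0.0", "f.3160.0.0")

field <- "31"

# Loose, prefix-style match (assumed analogue of exact = FALSE):
# also picks up fields 3159 and 3160.
loose <- grep(paste0("^f\\.", field), cols, value = TRUE)

# Match on the whole field ID (assumed analogue of exact = TRUE):
# only field 31, with any index and array.
strict <- grep(paste0("^f\\.", field, "\\.[0-9]+\\.[0-9]+$"), cols, value = TRUE)

print(loose)
print(strict)
```

Here the strict pattern anchors the field ID between the leading "f." and the index, which is why 3159 and 3160 are excluded.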
Read the serialized dataframe with readRDS("<name_of_phenotype_subset_file>.rds").
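A minimal end-to-end sketch of the workflow might look like the following. The project path is a hypothetical placeholder and the two field IDs are illustrative choices; the call to bio_phen is guarded so the block does nothing beyond writing the field file when the function or the project directory is not available.

```r
# Write a one-per-line field subset file with no header.
field_subset_file <- tempfile(fileext = ".txt")
writeLines(c("31", "21022"), field_subset_file)

# Hypothetical placeholder; replace with the real UKB project directory.
project_dir <- "/path/to/ukb_project"

# Only run when bio_phen is loaded and the project directory exists.
if (exists("bio_phen") && dir.exists(project_dir)) {
  bio_phen(
    project_dir,
    field_subset_file,
    out = "ukb_phenotype_subset",
    exact = TRUE
  )

  # Read the serialized dataframe back in and inspect its structure.
  df <- readRDS("ukb_phenotype_subset.rds")
  str(df)
}
```

Because the field file above lists bare field IDs rather than full field names, exact = TRUE is appropriate here.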
Periodically, the UKB will update some subset of the data, e.g.,
hospital episode statistics. When this happens, the dataframe created
will include all duplicates with the basket ID as suffix
("<basket_id>"). Decide which to keep; larger basket numbers
correspond to more recent data. To use bio_rename to update the
numeric field names to more descriptive names, first drop the
duplicates you do not want and rename the remaining fields by
deleting the "<basket_id>" suffix.
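The drop-and-rename step can be sketched in base R on a toy dataframe. Several things here are assumptions rather than documented behavior: the suffix is taken to be an underscore followed by a numeric basket ID, and the basket IDs 1111 and 2222 and the column values are made up for illustration.

```r
# Toy dataframe with one field duplicated across two hypothetical baskets.
df <- data.frame(
  f.eid = 1:3,
  f.41270.0.0_1111 = c("A", "B", "C"),
  f.41270.0.0_2222 = c("A", "B", "D"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

keep_basket <- "2222"          # larger number, assumed more recent
suffix_pattern <- "_[0-9]+$"   # assumed suffix format

has_suffix <- grepl(suffix_pattern, names(df))
is_kept <- grepl(paste0("_", keep_basket, "$"), names(df))

# Keep unsuffixed columns and the suffixed columns from the chosen basket.
df <- df[, !has_suffix | is_kept, drop = FALSE]

# Delete the basket suffix from the remaining column names.
names(df) <- sub(suffix_pattern, "", names(df))

print(names(df))
```

This keeps a single basket for every duplicated field. If different fields should come from different baskets, select the columns to keep by name instead of by one basket ID.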