Reads supplied fields from UKB project data and writes a serialized dataframe to a .rds file.

bio_phen(
  project_dir,
  field_subset_file,
  pheno_dir = "phenotypes",
  out = "ukb_phenotype_subset",
  exact = FALSE
)

Arguments

project_dir

Path to the enclosing directory of a UKB project.

field_subset_file

Path to a text file listing one field per line (no header). Fields can be specified as f.field.index.array or field-index.array.
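
A minimal sketch of creating such a file from R. The field IDs are the ones mentioned on this page and serve only as placeholders. The file name fields.txt and the use of tempdir() are assumptions made for illustration.

```r
# Placeholder field IDs; replace with the fields your project needs.
fields <- c("31", "3159", "3160")

# Hypothetical file location; any writable path works.
field_file <- file.path(tempdir(), "fields.txt")

# One field per line, no header.
writeLines(fields, field_file)

# Confirm what was written.
readLines(field_file)
```

The resulting path can then be supplied as field_subset_file.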

pheno_dir

Path to the enclosing directory of the phenotype data.

out

Name of the phenotype subset file. The default, "ukb_phenotype_subset", writes ukb_phenotype_subset.rds to the current directory.

exact

Setting exact = TRUE returns all index.array entries only for exact matches of the fields in field_subset_file. For example, 31 would return every 31-index.array column, but nothing for fields 3159, 3160, etc. Default FALSE. Note: Do not set exact = TRUE if you have supplied full field names (i.e., including index and array) in your field subset file, e.g., 31-0.0 or 31.0.0.
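
The distinction can be illustrated on a vector of column names. This is only a sketch, not the function's implementation: it assumes columns are named field-index.array and that non-exact matching behaves like a prefix match, which this page does not confirm.

```r
# Hypothetical column names in field-index.array form.
cols <- c("31-0.0", "3159-0.0", "3160-0.0")
field <- "31"

# Exact: compare only the field part (everything before the first "-").
exact_hits <- cols[sub("-.*$", "", cols) == field]

# Non-exact (assumed prefix match): any column starting with the field string.
loose_hits <- cols[startsWith(cols, field)]

exact_hits
loose_hits
```

Under these assumptions, the exact comparison keeps only the field 31 column, while the looser match also picks up 3159 and 3160.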

Details

Read the serialized dataframe with readRDS("<name_of_phenotype_subset_file>.rds").
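
A self-contained sketch of the round trip, using a small stand-in dataframe in place of real output. The column names, values, and temporary path are assumptions for illustration only.

```r
# Stand-in for the dataframe the function would write (hypothetical values).
pheno <- data.frame(eid = 1:3, check.names = FALSE)
pheno[["31-0.0"]] <- c(0L, 1L, 1L)

# Hypothetical location; the real file is written to the current directory.
rds_file <- file.path(tempdir(), "ukb_phenotype_subset.rds")
saveRDS(pheno, rds_file)

# Read the serialized dataframe back into the session.
df <- readRDS(rds_file)
str(df)
```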

Periodically, the UKB updates some subset of the data, e.g., hospital episode statistics. When this happens, the dataframe created will include all duplicated fields, with the basket ID as a suffix ("<basket_id>"). Decide which to keep; larger basket numbers correspond to more recent data. To use bio_rename to replace the numeric field names with more descriptive names, first drop the duplicates you do not want, then rename the remaining fields by deleting the "<basket_id>" suffix.
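
The drop-and-rename step might look like the following base R sketch. The field name, the basket IDs 1111 and 2222, and the underscore separating the suffix from the field name are all assumptions made for illustration. Inspect names(df) on your own data to see the actual suffix form before adapting it.

```r
# Stand-in dataframe with one field duplicated across two hypothetical baskets.
df <- data.frame(eid = 1:3, check.names = FALSE)
df[["41202-0.0_1111"]] <- c("A", "B", "C")  # older basket (assumed ID)
df[["41202-0.0_2222"]] <- c("A", "B", "D")  # newer basket (assumed ID)

keep_basket <- "2222"  # larger number, so more recent data
drop_basket <- "1111"

# Drop the duplicate columns from the basket you do not want.
df <- df[, !endsWith(names(df), paste0("_", drop_basket)), drop = FALSE]

# Delete the suffix from the remaining columns.
names(df) <- sub(paste0("_", keep_basket, "$"), "", names(df))

names(df)
```

After this step the column names no longer carry a basket suffix, which is the state described above as the precondition for bio_rename.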