A UK Biobank fileset includes three files: a .tab file containing the raw data, with field codes in place of variable names; an .r (sic) file containing code to read the raw data (it inserts the levels and labels of categorical variables); and an .html file containing tables that map field codes to variable names and list the labels and levels of categorical variables.

ukb_df(fileset, path = ".", n_threads = "dt", data.pos = 2)

Arguments

fileset

The prefix for a UKB fileset, e.g., ukbxxxx (for ukbxxxx.tab, ukbxxxx.r, ukbxxxx.html)

path

The path to the directory containing your UKB fileset. The default value is the current directory.

n_threads

Either "max" (uses the number of cores, `parallel::detectCores()`), "dt" (the default; uses the data.table default, `data.table::getDTthreads()`), or a numeric value (in which case n_threads is set to the smaller of the supplied value and `parallel::detectCores()`).

data.pos

Locates the data in your .html file. The .html file is read into a list; the default value data.pos = 2 selects the second item in the list. (The first item in the list is the title of the table.) You will probably not need to change this value, but if the need arises you can open the .html file in a browser and identify where in the file the data sits.
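
The n_threads rules described above can be sketched in a few lines of R. This is an illustration of the documented behaviour only, not the package's own code: `resolve_threads` and its `cores` and `dt_default` arguments are hypothetical names introduced here so that the logic can be tried without depending on the machine it runs on.

```r
# Hypothetical helper illustrating the documented n_threads rules.
# `cores` and `dt_default` are assumptions added for illustration; their
# defaults are only evaluated when the corresponding branch needs them.
resolve_threads <- function(n_threads = "dt",
                            cores = parallel::detectCores(),
                            dt_default = data.table::getDTthreads()) {
  if (identical(n_threads, "max")) {
    return(cores)                  # use every detected core
  }
  if (identical(n_threads, "dt")) {
    return(dt_default)             # defer to data.table's current setting
  }
  if (is.numeric(n_threads) && length(n_threads) == 1) {
    return(min(n_threads, cores))  # never exceed the detected core count
  }
  stop('n_threads must be "max", "dt", or a single number')
}

# A request for 16 threads on a machine assumed to have 8 cores is capped.
resolve_threads(16, cores = 8)
```

Note that `parallel::detectCores()` can return NA on some platforms, in which case this sketch would also return NA for the "max" and numeric cases.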

Value

A data frame with variable names in snake_case (lowercase words separated by underscores).

Details

The index and array from the UKB field code are preserved in the variable name, as two numbers separated by underscores at the end of the name, e.g. variable_index_array. index refers to the assessment instance (or visit). array captures multiple answers to the same "question". See the UKB documentation for detailed descriptions of index and array.
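
As a rough illustration of the variable_index_array convention, the two trailing numbers can be separated from the rest of a column name with a regular expression. `split_ukb_name` and the column name `height_2_0` are hypothetical, chosen only to show the pattern; real column names depend on your fileset.

```r
# Hypothetical helper: split a name of the form variable_index_array
# into its three parts. Not part of the package.
split_ukb_name <- function(x) {
  # Greedy (.*) keeps any underscores inside the variable part;
  # the last two numeric groups are taken as index and array.
  m <- regmatches(x, regexec("^(.*)_([0-9]+)_([0-9]+)$", x))[[1]]
  if (length(m) == 0) {
    stop("name does not end in _index_array: ", x)
  }
  list(
    variable = m[2],
    index    = as.integer(m[3]),  # assessment instance (visit)
    array    = as.integer(m[4])   # position among multiple answers
  )
}

# Illustrative name only
parts <- split_ukb_name("height_2_0")
str(parts)
```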

Examples
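
Before calling ukb_df, it can help to confirm that all three files of the fileset are present in the target directory. The following is a minimal sketch using base R only; `fileset_paths` is a hypothetical helper, and "ukbxxxx" is the placeholder prefix used above, not a real fileset.

```r
# Hypothetical helper: build the three file paths expected for a
# fileset prefix, following the ukbxxxx.tab / .r / .html naming above.
fileset_paths <- function(fileset, path = ".") {
  exts <- c("tab", "r", "html")
  paths <- file.path(path, paste0(fileset, ".", exts))
  stats::setNames(paths, exts)
}

# "ukbxxxx" is a placeholder; substitute your own prefix and directory.
paths <- fileset_paths("ukbxxxx", path = ".")
missing_files <- paths[!file.exists(paths)]
if (length(missing_files) > 0) {
  message("Missing fileset files: ", paste(missing_files, collapse = ", "))
}
```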