Changelog

Note. All paths mentioned in the Changelog are relative to the project-specific data folder.

29.04.2022 Ken B. Hanscombe

bio_rename bug fix
document handling of duplicate fields indicated with "_" suffix, in both bio_rename and bio_phen

16.02.2022 Ken B. Hanscombe

New/updated functionality

bio_return reads UKB returns. With argument return = 3388 reads PGxPOP returned allele and metabolizing phenotype calls and assigns application specific pseudo IDs.
bio_code_primary_care reads UKB primary care prescription and diagnosis codings maps and lookups (From UKB download primarycare_codings.zip)

New datasets

03.02.2020 Ken B. Hanscombe

Added bio_gen_related_remove which uses GreedyRelated to returns a minimum sample list to remove in order to remove all relationships at a given relatedness threshold, retaining the maximum amount of samples.
Added bio_gen_write_plink_input which take either a vector of sample IDs, or a dataframe with sample IDs in the first column, and writes these to the first two columns of a white-space separated file, with no header.

22.01.2020 Ken B. Hanscombe

Added convenience read functions: bio_gen_fam returns project-specific fam (with header), bio_gen_sqc returns generic sample QC with header and addtional column containing project-specific pseudo-IDs (eid), bio_gen_related returns project-specific relatedness
Added bio_gen_ancestry which returns a dataframe with project-specific pseudo-ID (eid), and 1000 genomes super population (pop). For QC and super population assignment details see Ollie’s Ancestry Specific Quality Control documentation.

20.01.2020 Ken B. Hanscombe

Added exact argument to bio_phen, default value is exact = FALSE which gives previous behaviour, i.e., matches all fields beginning 31. Setting exact = TRUE will return only exact matches for fields supplied, e.g., 31 in the field subset file will return all -index.array entries for field 31, and not 3159, 3160 etc.
bio_record returns either a character vector of available record-level data, a disk.frame, or, if a subset of samples for whom record-level data are required is supplied, a dataframe of all data. As the disk.frame data are “on-disk”, to query the data a relatively low-memory (1G) slurm session is sufficient.
bio_record_map applies a summary function (e.g. names, str, glimpse) to a vector of record level data (default is to apply the function to all available record-level data)

12.10.2020 Ken B. Hanscombe

bio_phen accepts fields specified as either field-index.array (as used in the ukbconv conversion to csv) or f.field.index.array (as used in the ukbconv conversion to r/tab)

12.10.2020 Ken B. Hanscombe

New/updated functionality

bio_covid now also reads “Primary Care Data for COVID-19 Research”: TPP and EMIS prescriptions and GP (clinical) data
bio_gen_ls lists project genetic directory contents
bio_code updated path to resources/, which includes Codings_Showcase.csv

24.09.2020 Ken B. Hanscombe

Updated documentation for project setup and data access can be found under the Articles tab on the ukbkings page. These can also be read as vignettes within the installed R package

New/updated functionality

bio_covid now returns additional codings (corresponding to new columns in the results data), and the new blood group dataset.
bio_hesin reads HES in-patient record-level data if available for the project.
bio_death reads death record data if available for the project.

Genetic data

resources/ now includes a the UKB program ukbgene used to retrieve UKB “link” files (.fam, .sample) and relatedness data.

ukbkings (development version)