• Documented dplicate column name edit before use of bio_rename

Note. All paths mentioned in the Changelog are relative to the project-specific data folder.


29.04.2022 Ken B. Hanscombe

  • bio_rename bug fix

  • document handling of duplicate fields indicated with "_" suffix, in both bio_rename and bio_phen

16.02.2022 Ken B. Hanscombe

New/updated functionality

  • bio_return reads UKB returns. With argument return = 3388 reads PGxPOP returned allele and metabolizing phenotype calls and assigns application specific pseudo IDs.

  • bio_code_primary_care reads UKB primary care prescription and diagnosis codings maps and lookups (From UKB download primarycare_codings.zip)

New datasets

  • Added datasets drug_pharmgkb, drug_gwas, drug_dmd_antidep

03.02.2020 Ken B. Hanscombe

  • Added bio_gen_related_remove which uses GreedyRelated to returns a minimum sample list to remove in order to remove all relationships at a given relatedness threshold, retaining the maximum amount of samples.

  • Added bio_gen_write_plink_input which take either a vector of sample IDs, or a dataframe with sample IDs in the first column, and writes these to the first two columns of a white-space separated file, with no header.

22.01.2020 Ken B. Hanscombe

  • Added convenience read functions: bio_gen_fam returns project-specific fam (with header), bio_gen_sqc returns generic sample QC with header and addtional column containing project-specific pseudo-IDs (eid), bio_gen_related returns project-specific relatedness

  • Added bio_gen_ancestry which returns a dataframe with project-specific pseudo-ID (eid), and 1000 genomes super population (pop). For QC and super population assignment details see Ollie’s Ancestry Specific Quality Control documentation.

20.01.2020 Ken B. Hanscombe

  • Added exact argument to bio_phen, default value is exact = FALSE which gives previous behaviour, i.e., matches all fields beginning 31. Setting exact = TRUE will return only exact matches for fields supplied, e.g., 31 in the field subset file will return all -index.array entries for field 31, and not 3159, 3160 etc.

  • bio_record returns either a character vector of available record-level data, a disk.frame, or, if a subset of samples for whom record-level data are required is supplied, a dataframe of all data. As the disk.frame data are “on-disk”, to query the data a relatively low-memory (1G) slurm session is sufficient.

  • bio_record_map applies a summary function (e.g. names, str, glimpse) to a vector of record level data (default is to apply the function to all available record-level data)

12.10.2020 Ken B. Hanscombe

  • bio_phen accepts fields specified as either field-index.array (as used in the ukbconv conversion to csv) or f.field.index.array (as used in the ukbconv conversion to r/tab)

12.10.2020 Ken B. Hanscombe

New/updated functionality

  • bio_covid now also reads “Primary Care Data for COVID-19 Research”: TPP and EMIS prescriptions and GP (clinical) data

  • bio_gen_ls lists project genetic directory contents

  • bio_code updated path to resources/, which includes Codings_Showcase.csv

24.09.2020 Ken B. Hanscombe

  • Updated documentation for project setup and data access can be found under the Articles tab on the ukbkings page. These can also be read as vignettes within the installed R package

New/updated functionality

  • bio_covid now returns additional codings (corresponding to new columns in the results data), and the new blood group dataset.

  • bio_hesin reads HES in-patient record-level data if available for the project.

  • bio_death reads death record data if available for the project.

Genetic data

  • resources/ now includes a the UKB program ukbgene used to retrieve UKB “link” files (.fam, .sample) and relatedness data.