NEWS.md
bio_rename
Note. All paths mentioned in the Changelog are relative to the project-specific data folder.
29.04.2022 Ken B. Hanscombe
bio_rename
bug fix
document handling of duplicate fields indicated with "_bio_rename
and bio_phen
16.02.2022 Ken B. Hanscombe
New/updated functionality
bio_return reads UKB returns. With argument return = 3388, it reads PGxPOP returned allele and metabolizing phenotype calls and assigns application-specific pseudo IDs.
bio_code_primary_care reads UKB primary care prescription and diagnosis coding maps and lookups (from UKB download primarycare_codings.zip).
New datasets
drug_pharmgkb, drug_gwas, drug_dmd_antidep
03.02.2020 Ken B. Hanscombe
Added bio_gen_related_remove, which uses GreedyRelated to return a minimum list of samples to remove in order to eliminate all relationships at a given relatedness threshold, retaining the maximum number of samples.
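The idea of removing as few samples as possible can be illustrated with a simple greedy heuristic. This is a minimal sketch only: it is not GreedyRelated's actual algorithm, the pairs data frame and the function greedy_remove are hypothetical, and a heuristic like this does not guarantee a true minimum in general.

```r
# Hypothetical related pairs above a chosen relatedness threshold.
pairs <- data.frame(
  id1 = c("A", "A", "A", "D"),
  id2 = c("B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeatedly drop the most connected sample.
greedy_remove <- function(pairs) {
  removed <- character(0)
  while (nrow(pairs) > 0) {
    # Count how many remaining relationships each sample is part of.
    degree <- table(c(pairs$id1, pairs$id2))
    # Pick the most connected sample (ties go to the first name in sorted order).
    worst <- names(degree)[which.max(degree)]
    removed <- c(removed, worst)
    # Discard every pair that involved the removed sample.
    pairs <- pairs[pairs$id1 != worst & pairs$id2 != worst, , drop = FALSE]
  }
  removed
}

to_remove <- greedy_remove(pairs)
```

Here removing two samples breaks all four relationships, leaving three unrelated samples.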
Added bio_gen_write_plink_input, which takes either a vector of sample IDs or a dataframe with sample IDs in the first column, and writes these to the first two columns of a whitespace-separated file with no header.
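The file layout described above can be sketched in base R. This is an illustration under assumptions, not the package's implementation: write_plink_ids is a hypothetical helper, and the sample IDs are made up.

```r
# Hypothetical helper, not the package's implementation.
write_plink_ids <- function(x, path) {
  # Accept a vector of IDs or a data frame with IDs in the first column.
  ids <- if (is.data.frame(x)) x[[1]] else x
  # Write the same ID to the first two columns, space-separated, no header.
  write.table(
    data.frame(fid = ids, iid = ids),
    file = path, quote = FALSE, sep = " ",
    row.names = FALSE, col.names = FALSE
  )
  invisible(path)
}

out <- tempfile(fileext = ".txt")
write_plink_ids(data.frame(eid = c(1001, 1002, 1003), age = c(50, 61, 47)), out)
plink_lines <- readLines(out)
```

A two-column file of this shape is what PLINK options such as --keep and --remove expect (family ID and within-family ID).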
22.01.2020 Ken B. Hanscombe
Added convenience read functions: bio_gen_fam returns project-specific fam (with header), bio_gen_sqc returns generic sample QC with header and an additional column containing project-specific pseudo-IDs (eid), and bio_gen_related returns project-specific relatedness.
Added bio_gen_ancestry, which returns a dataframe with project-specific pseudo-ID (eid) and 1000 Genomes super population (pop). For QC and super population assignment details, see Ollie’s Ancestry Specific Quality Control documentation.
20.01.2020 Ken B. Hanscombe
Added exact argument to bio_phen. The default, exact = FALSE, gives the previous behaviour, i.e., a field such as 31 matches all fields beginning 31. Setting exact = TRUE returns only exact matches for the fields supplied, e.g., 31 in the field subset file will return all -index.array entries for field 31, and not 3159, 3160, etc.
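The difference between the two matching modes can be sketched with regular expressions. This is a minimal illustration, not how bio_phen is implemented: match_fields is a hypothetical helper and the column names are made-up examples in the field-index.array style.

```r
# Hypothetical column names in the field-index.array style.
cols <- c("eid", "31-0.0", "3159-0.0", "3160-0.0", "31-1.0")

# Hypothetical helper illustrating prefix versus exact field matching.
match_fields <- function(fields, cols, exact = FALSE) {
  # exact = TRUE anchors the field ID to the "-" that precedes index.array;
  # exact = FALSE matches any column that begins with the field ID.
  pattern <- if (exact) paste0("^", fields, "-") else paste0("^", fields)
  cols[grepl(paste(pattern, collapse = "|"), cols)]
}

prefix_hits <- match_fields("31", cols)
exact_hits <- match_fields("31", cols, exact = TRUE)
```

With these inputs, prefix matching also picks up the 3159 and 3160 columns, while exact matching keeps only the two columns for field 31.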
bio_record returns either a character vector of available record-level data, a disk.frame, or, if a subset of samples for whom record-level data are required is supplied, a dataframe of all data. Because the disk.frame data are “on-disk”, a relatively low-memory (1G) slurm session is sufficient to query the data.
bio_record_map applies a summary function (e.g. names, str, glimpse) to a vector of record-level data (default is to apply the function to all available record-level data).
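The pattern of applying one summary function across several record-level datasets can be sketched with lapply. This is an assumption-laden illustration rather than the package's code: record_map, its arguments, and the toy datasets are all hypothetical.

```r
# Hypothetical stand-ins for record-level datasets.
records <- list(
  hesin = data.frame(eid = 1:3, ins_index = c(0, 1, 0)),
  death = data.frame(eid = 1:2, date_of_death = c("2019-01-01", "2020-06-30"))
)

# Hypothetical helper: apply a summary function to the selected datasets.
# By default every available dataset is summarised.
record_map <- function(records, f = names, datasets = names(records)) {
  lapply(records[datasets], f)
}

all_names <- record_map(records)
death_dim <- record_map(records, f = dim, datasets = "death")
```

The result is a named list with one summary per dataset, so the output stays easy to scan even when many datasets are available.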
12.10.2020 Ken B. Hanscombe
bio_phen accepts fields specified as either field-index.array (as used in the ukbconv conversion to csv) or f.field.index.array (as used in the ukbconv conversion to r/tab).
12.10.2020 Ken B. Hanscombe
New/updated functionality
bio_covid now also reads “Primary Care Data for COVID-19 Research”: TPP and EMIS prescriptions and GP (clinical) data.
bio_gen_ls lists project genetic directory contents.
bio_code: updated path to resources/, which includes Codings_Showcase.csv.
24.09.2020 Ken B. Hanscombe
New/updated functionality
bio_covid now returns additional codings (corresponding to new columns in the results data), and the new blood group dataset.
bio_hesin reads HES in-patient record-level data if available for the project.
bio_death reads death record data if available for the project.
Genetic data
ukbgene is used to retrieve UKB “link” files (.fam, .sample) and relatedness data.