Uses GreedyRelated to maximize sample size by removing one member of each pair with relatedness above a specified threshold.
bio_gen_related_remove(
  project_dir,
  greedy_related,
  thresh = 0.044,
  keep = NULL,
  seed = 1234
)
project_dir: Path to the enclosing directory of a UKB project.
greedy_related: Path to the GreedyRelated binary.
thresh: KING kinship coefficient threshold. For each pair of samples whose kinship coefficient exceeds this threshold, one member is returned in a data frame for removal from further analyses (default = 0.044).
keep: An optional vector of samples on which to perform relative removal, e.g., samples with data on a phenotype of interest. GreedyRelated ignores samples not included in keep (default = NULL, i.e., all samples are considered).
seed: Seed used for the random number generator (default = 1234).
A data frame of samples to remove.
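To illustrate the general idea behind greedy relative removal, the R sketch below implements a generic greedy heuristic: repeatedly drop the sample involved in the most above-threshold pairs, breaking ties at random, until no such pair remains. This is an assumption-laden illustration, not GreedyRelated's actual algorithm. The function name greedy_remove_sketch and the input columns id1, id2 and kinship are hypothetical; thresh, keep and seed only mirror the documented arguments.

```r
# Hypothetical sketch of greedy relative removal; NOT GreedyRelated's code.
# `pairs` is assumed to be a data frame with columns id1, id2 and kinship
# (KING kinship coefficients for each pair of samples).
greedy_remove_sketch <- function(pairs, thresh = 0.044, keep = NULL, seed = 1234) {
  # Retain only pairs whose kinship exceeds the threshold
  rel <- pairs[pairs$kinship > thresh, c("id1", "id2")]
  # Ignore samples outside `keep`, mirroring the documented behaviour of keep
  if (!is.null(keep)) {
    rel <- rel[rel$id1 %in% keep & rel$id2 %in% keep, , drop = FALSE]
  }
  set.seed(seed)
  removed <- character(0)
  while (nrow(rel) > 0) {
    # Number of remaining related pairs each sample participates in
    degree <- table(c(as.character(rel$id1), as.character(rel$id2)))
    candidates <- names(degree)[degree == max(degree)]
    # Break ties at random; sample.int() handles a single candidate safely
    drop_id <- candidates[sample.int(length(candidates), 1)]
    removed <- c(removed, drop_id)
    # Discard every pair involving the removed sample
    rel <- rel[rel$id1 != drop_id & rel$id2 != drop_id, , drop = FALSE]
  }
  data.frame(id = removed, stringsAsFactors = FALSE)
}

pairs <- data.frame(
  id1 = c("A", "A", "D"),
  id2 = c("B", "C", "E"),
  kinship = c(0.25, 0.2, 0.01),
  stringsAsFactors = FALSE
)
greedy_remove_sketch(pairs)$id  # "A"
```

Removing the single sample "A" breaks both related pairs, so B and C are retained; removing one member per pair independently could instead discard two samples. This is why a greedy choice can preserve a larger sample.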
Regarding the KING robust kinship estimator, the KING documentation states: "A negative kinship coefficient estimation indicates an unrelated relationship. The reason that a negative kinship coefficient is not set to zero is [that] a very negative value may indicate the population structure between the two individuals. Close relatives can be inferred fairly reliably based on the estimated kinship coefficients as shown in the following simple algorithm: an estimated kinship coefficient range >0.354, [0.177, 0.354], [0.0884, 0.177] and [0.0442, 0.0884] corresponds to duplicate/MZ twin, 1st-degree, 2nd-degree, and 3rd-degree relationships respectively."
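The KING ranges quoted above can be applied directly to a vector of kinship coefficients. The following R sketch uses base R's cut(); the helper name classify_king and the label "unrelated/distant" for values below 0.0442 are illustrative assumptions. Because KING's stated ranges share their endpoints, this sketch uses half-open intervals [lower, upper), so a value of exactly 0.354 is labelled duplicate/MZ twin.

```r
# Hypothetical helper: label KING kinship coefficients with the relationship
# degrees listed in the KING documentation.
classify_king <- function(kinship) {
  breaks <- c(-Inf, 0.0442, 0.0884, 0.177, 0.354, Inf)
  labels <- c("unrelated/distant", "3rd-degree", "2nd-degree",
              "1st-degree", "duplicate/MZ twin")
  # right = FALSE gives half-open intervals [lower, upper)
  as.character(cut(kinship, breaks = breaks, labels = labels, right = FALSE))
}

classify_king(c(-0.1, 0.05, 0.1, 0.2, 0.4))
# "unrelated/distant" "3rd-degree" "2nd-degree" "1st-degree" "duplicate/MZ twin"
```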
From the PLINK 2.0 documentation: "Note that KING kinship coefficients are scaled such that duplicate samples have kinship 0.5, not 1. First-degree relations (parent-child, full siblings) correspond to ~0.25, second-degree relations correspond to ~0.125, etc. It is conventional to use a cutoff of ~0.354 (the geometric mean of 0.5 and 0.25) to screen for monozygotic twins and duplicate samples, ~0.177 to add first-degree relations, etc."
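The cutoffs in both quotations follow one pattern: expected kinship halves with each degree of relationship, and each cutoff is the geometric mean of the expected kinship of two adjacent degrees. Writing \(\phi_d\) for the expected kinship of degree \(d\) (with \(d = 0\) for duplicates/MZ twins) and \(c_d\) for the cutoff between degrees \(d\) and \(d+1\):

$$
\phi_d = 2^{-(d+1)}, \qquad c_d = \sqrt{\phi_d\,\phi_{d+1}} = \sqrt{2^{-(2d+3)}} = 2^{-(d + 3/2)}
$$

This gives \(c_0 = 2^{-1.5} \approx 0.354\), \(c_1 = 2^{-2.5} \approx 0.177\), \(c_2 = 2^{-3.5} \approx 0.0884\) and \(c_3 = 2^{-4.5} \approx 0.0442\). The default thresh = 0.044 appears to correspond to \(c_3\), i.e., by default pairs related at roughly the third degree or closer are flagged.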