Uses GreedyRelated to maximize sample size by removing one member of each pair with relatedness above a specified threshold.

bio_gen_related_remove(
  project_dir,
  greedy_related,
  thresh = 0.044,
  keep = NULL,
  seed = 1234
)

Arguments

project_dir

Path to the enclosing directory of a UKB project.

greedy_related

Path to the GreedyRelated binary.

thresh

KING kinship coefficient threshold. For each pair whose kinship coefficient exceeds this threshold, one member is included in the returned data frame of samples to remove from further analyses (default = 0.044).

keep

An optional vector of samples on which to perform relative removal, e.g., samples with data on a phenotype of interest. GreedyRelated ignores samples not included in keep (default = NULL, i.e., all samples are considered).

seed

Seed used for the random number generator (default = 1234).

Value

A data frame of samples to remove.
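
To make the roles of thresh and keep concrete, the following is a minimal sketch of a naive greedy heuristic on a toy kinship table. It is an illustration only and is not the algorithm implemented by GreedyRelated; the helper naive_related_remove, the column names ID1, ID2, Kinship, and the output column ID are all assumptions made for this example.

```r
# Toy kinship table; column names ID1, ID2, Kinship are assumptions.
pairs <- data.frame(
  ID1 = c("A", "A", "B", "D", "E"),
  ID2 = c("B", "C", "C", "E", "F"),
  Kinship = c(0.26, 0.12, 0.25, 0.03, 0.09),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a naive greedy heuristic, NOT GreedyRelated itself.
naive_related_remove <- function(pairs, thresh = 0.044, keep = NULL) {
  # Retain only pairs whose kinship exceeds the threshold
  related <- pairs[pairs$Kinship > thresh, c("ID1", "ID2")]
  # Ignore pairs involving samples outside `keep`, if supplied
  if (!is.null(keep)) {
    related <- related[related$ID1 %in% keep & related$ID2 %in% keep, ]
  }
  removed <- character(0)
  while (nrow(related) > 0) {
    # Count how many remaining related pairs each sample appears in
    counts <- table(c(related$ID1, related$ID2))
    # Drop the sample involved in the most pairs (ties: first alphabetically)
    worst <- names(counts)[which.max(counts)]
    removed <- c(removed, worst)
    related <- related[related$ID1 != worst & related$ID2 != worst, ]
  }
  data.frame(ID = removed, stringsAsFactors = FALSE)
}

# All samples considered
remove_all <- naive_related_remove(pairs)
# Only a subset considered, e.g., samples with phenotype data
remove_subset <- naive_related_remove(pairs, keep = c("A", "B", "D", "E", "F"))
```

In this toy example the pair D and E (kinship 0.03) falls below the default threshold, so it never triggers a removal. Restricting keep drops every pair involving sample C before any removal decision is made.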

Details

On the KING robust kinship estimator, from the KING documentation: A negative kinship coefficient estimation indicates an unrelated relationship. The reason that a negative kinship coefficient is not set to zero is a very negative value may indicate the population structure between the two individuals. Close relatives can be inferred fairly reliably based on the estimated kinship coefficients as shown in the following simple algorithm: an estimated kinship coefficient range >0.354, [0.177, 0.354], [0.0884, 0.177] and [0.0442, 0.0884] corresponds to duplicate/MZ twin, 1st-degree, 2nd-degree, and 3rd-degree relationships respectively.
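
The ranges quoted from the KING documentation can be applied to a vector of kinship estimates with base R's cut(). This is a sketch only: classify_kinship is a hypothetical helper, the label for values below 0.0442 is an assumption, and because the quoted ranges share their end points, the choice to assign a boundary value to the closer relationship category is arbitrary.

```r
# Hypothetical helper mapping KING kinship estimates to the relationship
# categories quoted from the KING documentation.
classify_kinship <- function(kinship) {
  breaks <- c(-Inf, 0.0442, 0.0884, 0.177, 0.354, Inf)
  labels <- c(
    "more distant/unrelated",  # assumed label for values below 0.0442
    "3rd-degree",
    "2nd-degree",
    "1st-degree",
    "duplicate/MZ twin"
  )
  # right = FALSE gives intervals of the form [lower, upper), so a value
  # exactly on a shared boundary falls in the closer relationship category
  as.character(cut(kinship, breaks = breaks, labels = labels, right = FALSE))
}

degrees <- classify_kinship(c(-0.02, 0.05, 0.12, 0.25, 0.49))
```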

From PLINK 2.0 documentation: Note that KING kinship coefficients are scaled such that duplicate samples have kinship 0.5, not 1. First-degree relations (parent-child, full siblings) correspond to ~0.25, second-degree relations correspond to ~0.125, etc. It is conventional to use a cutoff of ~0.354 (the geometric mean of 0.5 and 0.25) to screen for monozygotic twins and duplicate samples, ~0.177 to add first-degree relations, etc.
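
The geometric-mean cutoffs mentioned in the PLINK 2.0 note can be reproduced in a few lines of R. The sketch assumes the expected kinship keeps halving with each further degree of relatedness (the "etc." in the quote), which yields values matching the range boundaries in the KING documentation.

```r
# Expected KING kinship for duplicates, then 1st- to 4th-degree relatives,
# assuming the value halves with each further degree
expected <- 0.5^(1:5)  # 0.5, 0.25, 0.125, 0.0625, 0.03125

# The geometric mean of each adjacent pair gives the cutoff between them
cutoffs <- sqrt(head(expected, -1) * tail(expected, -1))
names(cutoffs) <- c("duplicate/MZ twin", "1st-degree", "2nd-degree", "3rd-degree")

cutoffs_rounded <- round(cutoffs, 4)
# 0.3536, 0.1768, 0.0884, 0.0442
```

On this scale the default thresh = 0.044 sits at roughly the lower bound of the 3rd-degree range, so pairs related to about the 3rd degree or closer are candidates for removal.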