A thin wrapper around purrr::reduce and dplyr::full_join to merge multiple UKB datasets.

ukb_df_full_join(..., by = "eid")

Arguments

...

Supply comma-separated, unquoted names of the UKB datasets to be merged (created with ukb_df). Arguments are passed to list.

by

Variable used to merge multiple dataframes (default = "eid").

Details

The function takes a comma-separated list of unquoted datasets. Because the join key is explicitly set to "eid" only (the default value of the by parameter), any additional variables common to any two tables will have ".x" and ".y" appended to their names. If you are satisfied that the additional variables are identical to the original, the copies can be safely deleted. For example, if setequal(my_ukb_data$var, my_ukb_data$var.x) is TRUE, then my_ukb_data$var.x can be dropped. A dplyr::full_join is like the set operation union in that all observations from all tables are included, i.e., all samples are included even if they are not present in every dataset.
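The suffixing and union behaviour described above can be sketched with two toy data frames. This is only an illustration of the purrr::reduce plus dplyr::full_join pattern, not the package's actual source; the data frames a and b, their columns (sex, bmi) and their values are hypothetical stand-ins for UKB datasets.

```r
library(dplyr)
library(purrr)

# Hypothetical toy tables sharing the key "eid" and one extra column, "sex".
a <- data.frame(eid = c(1, 2, 3), sex = c("F", "M", "F"))
b <- data.frame(eid = c(2, 3, 4), sex = c("M", "F", "M"),
                bmi = c(24.1, 30.2, 27.5))

# Fold full_join over the list of tables, joining on "eid" only.
merged <- reduce(list(a, b), function(x, y) full_join(x, y, by = "eid"))

# The shared non-key column is kept twice, with ".x" and ".y" suffixes.
names(merged)

# All eids from both tables are retained (union of observations).
nrow(merged)

# Once satisfied the copies agree, keep one and restore the original name.
cleaned <- merged %>%
  select(-sex.y) %>%
  rename(sex = sex.x)
```

Note that setequal compares only the sets of distinct values in two vectors, so a row-by-row comparison on the shared eids may be a stricter check before dropping a copy.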

NB. ukb_df_full_join will fail if any variable names are repeated **within** a single UKB dataset. This is unlikely to occur; however, ukb_df creates variable names by combining a snake_case descriptor with the variable's **index** and **array**, so an incorrectly repeated index_array combination results in a duplicated variable name. If the join fails, you can use ukb_df_duplicated_name to find duplicated names. See vignette(topic = "explore-ukb-data", package = "ukbtools") for further details.
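As a rough illustration of what a duplicated name within one dataset looks like, the base R sketch below flags repeated column names. It is not the implementation of ukb_df_duplicated_name; the data frame df and the column name height_0_0 are hypothetical.

```r
# Hypothetical dataset in which one index_array name is repeated.
# check.names = FALSE stops data.frame() from silently making names unique.
df <- data.frame(
  eid = 1:3,
  height_0_0 = c(170, 165, 180),
  height_0_0 = c(170, 165, 180),
  check.names = FALSE
)

# Names that occur more than once within this single dataset.
dup_names <- unique(names(df)[duplicated(names(df))])
dup_names
```

An empty result for every dataset suggests the within-dataset name check should not be what stops the join.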

Examples

if (FALSE) {
# If you have multiple UKB filesets, tidy then merge them.

ukb1234_data <- ukb_df("ukb1234")
ukb2345_data <- ukb_df("ukb2345")
ukb3456_data <- ukb_df("ukb3456")

my_ukb_data <- ukb_df_full_join(ukb1234_data, ukb2345_data, ukb3456_data)
}