Announcement Icon Online training class for Clinical R programming batch starts on Monday, 02Feb2026. Click here for details.

Merging datasets - cartesian product


Lesson Description
-
  • Sometimes, we want to work with the concept of "Merging datasets - cartesian product" in a clear, repeatable way.
  • This lesson walks through a simple example and shows the key steps.
  • We will see one approach on how we can do it in SAS and R.
data DUMMY01;
infile datalines dlm='|' dsd missover;
input grouplabel : $100. statistic : $100. group : best32. intord : best32.;
label ;
format ;
datalines4;
Sex, n(%)|Male|2|1
Sex, n(%)|Female|2|2
Sex, n(%)|Missing|2|99
;;;;
run;

data DUMMY_TREATMENTS;
infile datalines dlm='|' dsd missover;
input treatment : best32.;
label ;
format ;
datalines4;
1
2
3
4
;;;;
run;


proc sql;
    
create table dummy02 as
        
select *
        
from dummy01,dummy_treatments;
quit;
  • This SAS code snippet demonstrates how to perform a Cartesian product using the proc sql procedure. The resulting table, "dummy02," will have a number of rows equal to the product of the number of rows in "dummy01" and "dummy_treatments" tables.
  • The SQL query inside the proc sql block selects all columns (indicated by *) from the "dummy01" table and the "dummy_treatments" table. The Cartesian product of these two tables is created, resulting in a new table named "dummy02" that contains all possible combinations of rows from the two tables.
  • The quit statement is used to end the proc sql block and execute the query.
library(tidyverse)

dummy01<-tribble(
~grouplabel,~statistic,~group,~intord,
"Sex, n(%)","Male",2,1,
"Sex, n(%)","Female",2,2,
"Sex, n(%)","Missing",2,99,
)

dummy_treatments<-tribble(
~treatment,
1,
2,
3,
4,
)

dummy02<-cross_join(dummy01,dummy_treatments)
  • This R Tidyverse code snippet demonstrates how to create a new data frame named "dummy02" by performing a cross join between two existing data frames, "dummy01" and "dummy_treatments," using the cross_join function from the dplyr package.
  • The cross_join function takes two data frames as arguments and returns a new data frame that combines every row from the first data frame with every row from the second data frame. The resulting "dummy02" data frame will contain all possible combinations of rows between "dummy01" and "dummy_treatments".
dummy01 <- data.frame(
  grouplabel = c("Sex, n(%)", "Sex, n(%)", "Sex, n(%)"),
  statistic = c("Male", "Female", "Missing"),
  group = c(2, 2, 2),
  intord = c(1, 2, 99)
  , stringsAsFactors = FALSE
)

dummy_treatments <- data.frame(
  treatment = c(1, 2, 3, 4)
  , stringsAsFactors = FALSE
)

dummy02 <- merge(dummy01, dummy_treatments, by = NULL)
  • merge(..., by = NULL) creates a cartesian product (cross join).