Announcement Icon Online training class for Clinical R programming batch starts on Monday, 02Feb2026. Click here for details.

Retain - Concatenate character results into a single value in a group


Lesson Description
-
  • Sometimes, we want to work with the concept of "Retain - Concatenate character results into a single value in a group" in a clear, repeatable way.
  • This lesson walks through a simple example and shows the key steps.
  • We will see one approach on how we can do it in SAS and R.

*==============================================================================;
* Create the input dataset;
*==============================================================================;
data concat;
    input usubjid visitnum result$;
    datalines;
101 1 Y
101 2 N
101 3 Y
102 1 Y
102 2 N
102 3 .
102 4 N
102 5 Y
;
run;


*==============================================================================;
*Concatenate non-missing results into a single string;
*==============================================================================;

proc sort data=concat;
   by usubjid visitnum;
run;

data concat01;
   set concat;
   by usubjid visitnum;

   length all_results $32767;
   
   
retain all_results;

   
if first.usubjid then all_results=result;
   else if not missing(result) then all_results=catx(";",all_results,result);

   
if last.usubjid;

   
keep usubjid all_results;
run;
library(tidyverse)
library(stringr)

#==============================================================================
# Create the input dataset using tribble
#==============================================================================

concat <- tribble(
  ~usubjid, ~visitnum, ~result,
  101, 1, "Y",
  101, 2, "N",
  101, 3, "Y",
  102, 1, "Y",
  102, 2, "N",
  102, 3, NA,
  102, 4, "N",
  102, 5, "Y"
)

#==============================================================================
# Concatenate non-missing results into a single string
#==============================================================================

concat01 <- concat %>%
  arrange(usubjid, visitnum) %>%
  group_by(usubjid) %>%
  summarize(
    all_results = str_flatten(result, collapse=";",na.rm = TRUE)
  ) %>% ungroup()
concat <- data.frame(
  usubjid = c(101, 101, 101, 102, 102, 102, 102, 102),
  visitnum = c(1, 2, 3, 1, 2, 3, 4, 5),
  result = c("Y", "N", "Y", "Y", "N", NA, "N", "Y")
  , stringsAsFactors = FALSE
)

concat <- concat[order(concat$usubjid, concat$visitnum), ]

concat01 <- aggregate(result ~ usubjid, data = concat, FUN = function(x) {
  paste(x[!is.na(x)], collapse = ";")
})

names(concat01)[2] <- "all_results"
  • aggregate() groups by subject and applies a custom combine function.
  • paste(..., collapse = ";") builds one string of non-missing values.