Announcement Icon Online training class for Clinical R programming batch starts on Monday, 02Feb2026. Click here for details.

Transpose data - long to wide


Lesson Description
-
  • Sometimes, we want to work with the concept of "Transpose data - long to wide" in a clear, repeatable way.
  • This lesson walks through a simple example and shows the key steps.
  • We will see one approach on how we can do it in SAS and R.

data LONG;
infile datalines dlm='|' dsd missover;
input usubjid : $8. lbtestcd : $8. lbstresn : BEST6.;
label ;
format ;
datalines4;
1001|HGB|13
1001|ALT|30
1001|AST|23
1002|HGB|12
1002|ALT|28
1002|AST|15
;;;;
run;

proc sort data=long ;
   by usubjid;
run;
 
proc transpose data=long   out=wide  ;
   by usubjid;
   var lbstresn;
   id lbtestcd;
run;

 
  • The PROC TRANSPOSE procedure is used to transpose or reshape a dataset from long format to wide format. The dataset "long" is transposed into a new dataset named "wide" using the OUT= option. The transposition is performed by grouping the data by the variable "usubjid" with the BY statement.
  • BY usubjid;: This statement specifies the variable "usubjid" as the grouping variable. The transposition will be done separately for each unique value of "usubjid".
  • VAR lbstresn;: This statement specifies the variable "lbstresn" to be transposed. The values of this variable will become the values of the new variables in the transposed dataset.
  • ID lbtestcd;: This statement specifies the variable "lbtestcd" as the identifier variable. The unique values of "lbtestcd" will become the variable names in the transposed dataset.
  • After executing the PROC TRANSPOSE procedure, the dataset "long" is transformed into the "wide" dataset, where each unique value of "usubjid" becomes a separate observation, and the values of "lbstresn" are transposed into new variables named according to the unique values of "lbtestcd".
library(tidyverse)

long <- tribble(
  ~usubjid, ~lbtestcd, ~lbstresn,
  "1001",   "HGB",     13,
  "1001",   "ALT",     30,
  "1001",   "AST",     23,
  "1002",   "HGB",     12,
  "1002",   "ALT",     28,
  "1002",   "AST",     15
)

wide <- long %>%
  pivot_wider(
    id_cols     = usubjid,
    names_from  = lbtestcd,
    values_from = lbstresn
  )
  • The pivot_wider() function is a part of the Tidyverse package in R, specifically the tidyr package. It is used to reshape a dataset from long format to wide format. In this code snippet, the dataset "long" is transformed into a new dataset named "wide" using the pivot_wider() function.
  • id_cols = usubjid: This argument specifies the variable "usubjid" as the identifier column. Each unique value of "usubjid" will become a separate observation in the wide dataset.
  • values_from = lbstresn: This argument specifies the variable "lbstresn" from which the values will be extracted and spread across the wide dataset.
  • names_from = lbtestcd: This argument specifies the variable "lbtestcd" whose unique values will become the column names in the wide dataset.
  • By calling pivot_wider() with the specified arguments, the "long" dataset is reshaped into the "wide" dataset. Each unique value of "usubjid" becomes a separate observation, and the values from "lbstresn" are spread across the wide dataset with new columns created based on the unique values of "lbtestcd".
long <- data.frame(
  usubjid = c(1001, 1001, 1001, 1002, 1002, 1002),
  lbtestcd = c("HGB", "ALT", "AST", "HGB", "ALT", "AST"),
  lbstresn = c(13, 30, 23, 12, 28, 15)
  , stringsAsFactors = FALSE
)

 
wide <- reshape(
  long,
  idvar = "usubjid",
  timevar = "lbtestcd",
  direction = "wide",
  v.names = "lbstresn"
)

 
names(wide) <- sub("^lbstresn\\.", "", names(wide))
  • reshape(..., direction = "wide") pivots repeated values into columns.
  • sub() cleans the generated column name prefixes.