mycsg SASnR2 2400_Transpose-data---long-to-wide

Lesson Description

Sometimes, we want to work with the concept of "Transpose data - long to wide" in a clear, repeatable way.
This lesson walks through a simple example and shows the key steps.
We will see one approach on how we can do it in SAS and R.

SAS (Base SAS)

data LONG;
 infile datalines dlm='|' dsd missover; 
input usubjid : $8. lbtestcd : $8. lbstresn : BEST6.; 
label ;
 format ;
 datalines4; 
1001|HGB|13
1001|ALT|30
1001|AST|23
1002|HGB|12
1002|ALT|28
1002|AST|15
;;;;
 run; 

proc sort data=long ;
    by usubjid;
 run; 
 
proc transpose data=long   out=wide  ;
    by usubjid;
    var lbstresn;
    id lbtestcd;
 run; 

 

The PROC TRANSPOSE procedure is used to transpose or reshape a dataset from long format to wide format. The dataset "long" is transposed into a new dataset named "wide" using the OUT= option. The transposition is performed by grouping the data by the variable "usubjid" with the BY statement.
BY usubjid;: This statement specifies the variable "usubjid" as the grouping variable. The transposition will be done separately for each unique value of "usubjid".
VAR lbstresn;: This statement specifies the variable "lbstresn" to be transposed. The values of this variable will become the values of the new variables in the transposed dataset.
ID lbtestcd;: This statement specifies the variable "lbtestcd" as the identifier variable. The unique values of "lbtestcd" will become the variable names in the transposed dataset.
After executing the PROC TRANSPOSE procedure, the dataset "long" is transformed into the "wide" dataset, where each unique value of "usubjid" becomes a separate observation, and the values of "lbstresn" are transposed into new variables named according to the unique values of "lbtestcd".

R (tidyverse)

library(tidyverse)

long <- tribble(
  ~usubjid, ~lbtestcd, ~lbstresn,
  "1001",   "HGB",     13,
  "1001",   "ALT",     30,
  "1001",   "AST",     23,
  "1002",   "HGB",     12,
  "1002",   "ALT",     28,
  "1002",   "AST",     15
)

wide <- long %>%
  pivot_wider(
    id_cols     = usubjid,
    names_from  = lbtestcd,
    values_from = lbstresn
  )

The pivot_wider() function is a part of the Tidyverse package in R, specifically the tidyr package. It is used to reshape a dataset from long format to wide format. In this code snippet, the dataset "long" is transformed into a new dataset named "wide" using the pivot_wider() function.
id_cols = usubjid: This argument specifies the variable "usubjid" as the identifier column. Each unique value of "usubjid" will become a separate observation in the wide dataset.
values_from = lbstresn: This argument specifies the variable "lbstresn" from which the values will be extracted and spread across the wide dataset.
names_from = lbtestcd: This argument specifies the variable "lbtestcd" whose unique values will become the column names in the wide dataset.
By calling pivot_wider() with the specified arguments, the "long" dataset is reshaped into the "wide" dataset. Each unique value of "usubjid" becomes a separate observation, and the values from "lbstresn" are spread across the wide dataset with new columns created based on the unique values of "lbtestcd".

R (base)

long <- data.frame(
  usubjid = c(1001, 1001, 1001, 1002, 1002, 1002),
  lbtestcd = c("HGB", "ALT", "AST", "HGB", "ALT", "AST"),
  lbstresn = c(13, 30, 23, 12, 28, 15)
  , stringsAsFactors = FALSE
)

 
wide <- reshape(
  long,
  idvar = "usubjid",
  timevar = "lbtestcd",
  direction = "wide",
  v.names = "lbstresn"
)

 
names(wide) <- sub("^lbstresn\\.", "", names(wide))

reshape(..., direction = "wide") pivots repeated values into columns.
sub() cleans the generated column name prefixes.

Transpose data - long to wide

Lesson Description

SAS (Base SAS)

R (tidyverse)

R (base)