mycsg SASnR2 2500_Transpose-data---wide-to-long

Lesson Description

Sometimes, we want to work with the concept of "Transpose data - wide to long" in a clear, repeatable way.
This lesson walks through a simple example and shows the key steps.
We will see one approach on how we can do it in SAS and R.

SAS (Base SAS)

data WIDE;
 infile datalines dlm='|' dsd missover; 
input usubjid : $4. HGB : best32. ALT : best32. AST : best32.; 
label ;
 format ;
 datalines4; 
1001|13|30|23
1002|12|28|15
;;;;
 run; 

proc sort data=wide;
     by usubjid;
 run; 
 
proc transpose data=wide   
out=long(rename=(col1=lbstresn)) 
name=lbtestcd ;
    by usubjid;
    var HGB ALT AST;
 run; 

 

The PROC TRANSPOSE procedure in SAS is used to transform a dataset from wide format to long format. In this code snippet, the dataset "wide" is transposed into a new dataset named "long" using the PROC TRANSPOSE procedure.
data=wide: This specifies the input dataset "wide" that contains the data in wide format.
out=long(rename=(col1=lbstresn)): This specifies the output dataset "long" where the transposed data will be stored. The RENAME statement is used to rename the variable generated by PROC TRANSPOSE as "col1" to "lbstresn" to match the desired variable name.
name=lbtestcd: This specifies the variable "lbtestcd" as the name variable. The values of this variable in the wide dataset will become the values of a new variable in the long dataset, which will contain the original column names.
by usubjid: This specifies the BY statement, indicating that the data should be transposed for each unique value of the variable "usubjid".
var HGB ALT AST;: This specifies the variables to be transposed from wide to long format. In this case, the variables "HGB", "ALT", and "AST" are selected.
By running the PROC TRANSPOSE procedure with the specified options, the "wide" dataset is transposed into the "long" dataset. The transposed dataset will have variables "usubjid", "lbtestcd" (containing the original column names), and "lbstresn" (containing the values from the wide dataset).

R (tidyverse)

wide<-tribble( 
~usubjid,~HGB,~ALT,~AST, 
"1001",13,30,23, 
"1002",12,28,15, 
) 

long<-pivot_longer(wide, 
cols=c(HGB,ALT,AST), 
names_to = "lbtestcd", 
values_to = "lbstresn"
)

The pivot_longer function in R, part of the Tidyverse package, is used to transform a dataset from wide format to long format. In this code snippet, the dataset "wide" is transformed into a new dataset named "long" using the pivot_longer function.
cols = c(HGB, ALT, AST): This specifies the columns to be transformed from wide to long format. In this case, the variables "HGB", "ALT", and "AST" are selected.
names_to = "lbtestcd": This specifies the name of the new variable that will contain the column names from the wide dataset. The values in this variable will correspond to the original column names.
values_to = "lbstresn": This specifies the name of the new variable that will contain the values from the wide dataset. The values in this variable will correspond to the values previously present in the selected columns.
By using the pivot_longer function with the specified arguments, the "wide" dataset is transformed into the "long" dataset. The "long" dataset will have variables "lbtestcd" (containing the original column names) and "lbstresn" (containing the values from the wide dataset), along with any other existing variables.

R (base)

wide <- data.frame(
  usubjid = c(1001, 1002),
  HGB = c(13, 12),
  ALT = c(30, 28),
  AST = c(23, 15)
  , stringsAsFactors = FALSE
)

long <- reshape(
  wide,
  varying = c("HGB", "ALT", "AST"),
  v.names = "lbstresn",
  timevar = "lbtestcd",
  times = c("HGB", "ALT", "AST"),
  idvar = "usubjid",
  direction = "long"
)

row.names(long) <- NULL

This code converts a dataset from wide format back to long format, which is the reverse of what pivot_wider() does in tidyverse.

The reshape() function is a base-R tool that handles both wide→long and long→wide transformations, depending on the direction argument.

The varying argument lists the columns that currently hold repeated measurements.
Here, HGB, ALT, and AST are the wide-format columns that will be stacked into rows.

The v.names argument defines the name of the value column in the long dataset.
All numeric results from the wide columns will be stored under lbstresn.

The timevar argument specifies the column that will identify which measurement each value belongs to.
In this case, lbtestcd will contain values like HGB, ALT, and AST.

The times argument supplies the actual labels that populate the timevar.
These correspond directly to the column names listed in varying.

The idvar argument defines the subject-level identifier that remains constant across reshaped rows.
Here, each usubjid will appear once per laboratory test.

Setting direction = "long" explicitly tells R we are reshaping from wide to long.

After reshaping, row names are automatically generated based on the original structure.
row.names(long) <- NULL is used to reset them, giving a clean, sequential row order.

Transpose data - wide to long

Lesson Description

SAS (Base SAS)

R (tidyverse)

R (base)