Announcement Icon Online training class for Clinical R programming batch starts on Monday, 02Feb2026. Click here for details.

Adding new variables


Lesson Description
-
  • Sometimes, we want to work with the concept of "Adding new variables" in a clear, repeatable way.
  • This lesson walks through a simple example and shows the key steps.
  • We will see one approach on how we can do it in SAS and R.
data DM;
infile datalines dlm='|' dsd missover;
input subjid : $10. age : best32.;
label ;
format ;
datalines4;
1001|35
1002|64
1003|19
;;;;
run;

data dm02;
    set dm;
    agemon=age*12;
    group=
1;
run;

data dm02;
    set dm;
    length agegr1 $30;
    
if age lt 20 then agegr1="<20 Years";
    
else if 20 le age lt 60 then agegr1="20 - <60 Years";
    
else if age ge 60 then  agegr1=">= 60 Years";
run;
  • The provided SAS code snippets showcase different data manipulation techniques for creating new variables based on the values of existing variables in the dataset "dm."
  • In the first code snippet:
  • A new dataset named "dm02" is created using the data statement.
  • The set statement is used to read data from the dataset "dm" into the "dm02" dataset.
  • A new variable named "agemon" is created by multiplying the "age" variable by 12.
  • A new variable named "group" is created and assigned a constant value of 1 for all observations.
  • The run; statement marks the end of the data step and executes the creation of the "dm02" dataset.
  • In the second code snippet:
  • The existing dataset "dm" is overwritten with the same name "dm" using the data statement.
  • The set statement is used to read data from the original "dm" dataset into the modified "dm" dataset.
  • The length statement is used to specify the length of the new variable "agegr1" as 30 characters.
  • Conditional statements (if-else) are used to assign a value to the "agegr1" variable based on the value of the "age" variable:
  • If "age" is less than 20, "agegr1" is assigned the value "= 60 Years."
  • The run; statement marks the end of the data step and executes the creation of the modified "dm" dataset.
  • These SAS code snippets demonstrate how to manipulate and transform data in the "dm" dataset by creating new variables based on the values of existing variables. The first snippet creates new variables "agemon" and "group," while the second snippet creates a new variable "agegr1" based on different age groupings.
dm=60 ~ ">= 60 Years",
   between(age,20,59) ~ "20 - < 60 Years",
   age < 20 ~ "< 20 Years" )
)
  • The provided R Tidyverse code snippets demonstrate data manipulation techniques for creating new variables based on the values of existing variables in the "dm" data frame.
  • In the first code snippet:
  • The mutate function from the dplyr package is used to create a modified data frame named "dm02" based on the "dm" data frame.
  • The mutate function allows for the creation of new variables while preserving the existing variables.
  • The new variable "agemon" is created by multiplying the "age" variable by 12.
  • The new variable "group" is created and assigned a constant value of 1 for all observations.
  • In the second code snippet:
  • The mutate function is used to create a modified data frame named "dm02" based on the "dm" data frame.
  • The case_when function is used within the mutate function to perform conditional assignments for the new variable "agegr1."
  • The case_when function allows for multiple conditional statements to be evaluated sequentially.
  • The first conditional statement checks if the "age" variable is greater than or equal to 60. If true, the value ">= 60 Years" is assigned to "agegr1."
  • The second conditional statement checks if the "age" variable is between 20 and 59 (inclusive). If true, the value "20 -
dm <- data.frame(
  subjid = c(1001, 1002, 1003),
  age = c(35, 64, 19)
  , stringsAsFactors = FALSE
)

dm02 <- dm

dm02$agemon <- dm02$age * 12

dm02$group <- 1

dm02$agegr1 <- ifelse(
  dm02$age >= 60,
  ">= 60 Years",
  ifelse(dm02$age >= 20 & dm02$age <= 59, "20 - < 60 Years", "< 20 Years")
)
  • Use $ to add new columns to a data frame.
  • ifelse() creates grouped labels based on age ranges.
  • Assigning a constant fills the same value in every row.