
Subset observations and variables


Lesson Description
  • Sometimes, we want to select specific observations and variables from an existing dataset.
  • For example, we might keep only male subjects and drop the Sex column afterward.
  • This lesson shows one approach in SAS using WHERE and DROP.
  • We then mirror the same idea in R using filter() and select().
/* Build the example CLASS dataset from pipe-delimited inline data */
data CLASS;
infile datalines dlm='|' dsd missover;
/* Name and Sex are character; Age, Height and Weight are numeric */
input Name : $8. Sex : $1. Age : best32. Height : best32. Weight : best32.;
datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
;;;;
run;

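/* Keep only male subjects, then leave Sex out of the result */
data males;
   set class;          /* read observations from CLASS */
   where sex="M";      /* keep only observations where Sex is "M" */
   drop sex;           /* exclude Sex from the output dataset */
run;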
  • The second DATA step creates a new dataset named "males" from the existing dataset "class", keeping only the observations that meet a condition and dropping one variable.
  • The set statement reads observations from "class".
  • The where statement keeps only the observations where "sex" equals "M". The comparison is case-sensitive, so a value of "m" would not match.
  • The drop statement excludes "sex" from the output dataset. It applies only when the output is written, so "sex" is still available to the where statement inside the step.
  • The run; statement marks the end of the DATA step, which SAS then executes.
  • The resulting "males" dataset holds the three male subjects (Alfred, Henry, and James) with the variables Name, Age, Height, and Weight, ready for further analysis.
library(tidyverse)

# Build the same example data row by row
class <- tribble(
  ~Name,     ~Sex, ~Age, ~Height, ~Weight,
  "Alfred",  "M",  14,   69,      112.5,
  "Alice",   "F",  13,   56.5,    84,
  "Barbara", "F",  13,   65.3,    98,
  "Carol",   "F",  14,   62.8,    102.5,
  "Henry",   "M",  14,   63.5,    102.5,
  "James",   "M",  12,   57.3,    83
)

# Keep only male subjects, then drop the Sex column
males <- class %>%
  filter(Sex == "M") %>%
  select(-Sex)
  • The tidyverse code creates a new data frame named "males" from the existing data frame "class", keeping only the rows that meet a condition and excluding one column.
  • The %>% operator, also known as the pipe operator, passes the result on its left as the first argument to the function on its right, so several operations can be chained in reading order.
  • The filter function keeps only the rows of "class" where the value of "Sex" is "M".
  • The select function, combined with the - sign, excludes the "Sex" column from the result.
  • The <- operator assigns the final result of the pipeline to "males". The original "class" data frame is left unchanged.
  • As in the SAS version, "males" holds the three male subjects (Alfred, Henry, and James) with the columns Name, Age, Height, and Weight.
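To make the role of the pipe concrete, the sketch below writes the same two operations once with %>% and once with explicit intermediate objects. It assumes the dplyr package is installed. The object names males_piped, only_m, and males_steps are illustrative and are not part of the lesson.

```r
library(dplyr)  # provides filter(), select(), %>% and tibble()

# Same six subjects as in the lesson
class <- tibble(
  Name   = c("Alfred", "Alice", "Barbara", "Carol", "Henry", "James"),
  Sex    = c("M", "F", "F", "F", "M", "M"),
  Age    = c(14, 13, 13, 14, 14, 12),
  Height = c(69, 56.5, 65.3, 62.8, 63.5, 57.3),
  Weight = c(112.5, 84, 98, 102.5, 102.5, 83)
)

# Piped version, as in the lesson
males_piped <- class %>% filter(Sex == "M") %>% select(-Sex)

# Same operations with explicit intermediate objects (illustrative names)
only_m      <- filter(class, Sex == "M")   # step 1: keep rows where Sex is "M"
males_steps <- select(only_m, -Sex)        # step 2: drop the Sex column

# Compare the two results
all.equal(males_piped, males_steps)
```

Both forms apply filter() first and select() second. The order matters here: once select(-Sex) has removed the column, a later filter(Sex == "M") would have no Sex column to test.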
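# Base R version: the same subset without the tidyverse
class <- data.frame(
  Name = c("Alfred", "Alice", "Barbara", "Carol", "Henry", "James"),
  Sex = c("M", "F", "F", "F", "M", "M"),
  Age = c(14, 13, 13, 14, 14, 12),
  Height = c(69, 56.5, 65.3, 62.8, 63.5, 57.3),
  Weight = c(112.5, 84, 98, 102.5, 102.5, 83),
  stringsAsFactors = FALSE  # keep Name and Sex as character vectors
)

# Work on a copy so that class itself stays unchanged
males_tmp <- class

# Keep only the rows where Sex is "M"
males_tmp <- males_tmp[males_tmp$Sex == "M", ]

# Keep every column except Sex
males_tmp <- males_tmp[, names(males_tmp) != "Sex"]

males <- males_tmp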
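  • The base R version produces the same result without the tidyverse.
  • A logical row filter, males_tmp$Sex == "M", placed before the comma in the square brackets keeps only the male subjects.
  • A logical column filter, names(males_tmp) != "Sex", placed after the comma drops the "Sex" column by name and gives the final result.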
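As a more compact alternative, base R's subset() can do the row filter and the column drop in one call. This is a sketch of another option, not the lesson's own method, and the name males_subset is illustrative.

```r
# Same example data as in the lesson
class <- data.frame(
  Name = c("Alfred", "Alice", "Barbara", "Carol", "Henry", "James"),
  Sex = c("M", "F", "F", "F", "M", "M"),
  Age = c(14, 13, 13, 14, 14, 12),
  Height = c(69, 56.5, 65.3, 62.8, 63.5, 57.3),
  Weight = c(112.5, 84, 98, 102.5, 102.5, 83),
  stringsAsFactors = FALSE
)

# Second argument: the row condition; select = -Sex drops the Sex column
males_subset <- subset(class, Sex == "M", select = -Sex)

males_subset
```

One difference from bracket indexing: subset() treats a missing value in the condition as FALSE and drops that row, whereas class[class$Sex == "M", ] would return a row of NAs for it. The example data has no missing values, so both approaches give the same three subjects. The R documentation describes subset() as a convenience function for interactive use, so bracket indexing may be the safer choice inside functions.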