
Subset Observations


Lesson Description
  • When working with data, we frequently want to work with only a subset of the observations.
  • In this lesson, we will see how to subset the required observations in SAS and R.


data CLASS;
infile datalines dlm='|' dsd missover;
input Name : $8. Sex : $1. Age : best32. Height : best32. Weight : best32.;
label ;
format ;
datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
;;;;
run;
 
/* Keep only the male students */
data males;
    set class;
    where sex="M";
run;
 
/* Keep only the students aged 11 or 12 */
data preteen;
    set class;
    where age in (11,12);
run;
 
  • The SAS code above creates new datasets by filtering an existing dataset on a condition, so that later analysis or processing works only on the rows of interest.
  • The first DATA step creates a dataset named "males" that keeps only the observations of "class" where the variable "sex" equals "M".
  • The second DATA step creates a dataset named "preteen" that keeps only the observations of "class" where the variable "age" is 11 or 12. In this sample data, only James (age 12) qualifies.
  • Both steps use the set statement to read the original dataset and the where statement to apply the filtering condition.
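library(tidyverse)

# Create the sample class data frame row by row
class <- tribble(
  ~Name, ~Sex, ~Age, ~Height, ~Weight,
  "Alfred", "M", 14, 69, 112.5,
  "Alice", "F", 13, 56.5, 84,
  "Carol", "F", 14, 62.8, 102.5,
  "Henry", "M", 14, 63.5, 102.5,
  "James", "M", 12, 57.3, 83
)

# Keep only the male students
males <- filter(class, Sex == "M")

# Keep only the students aged 11 or 12
preteen <- filter(class, Age %in% c(11, 12))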
  • The R tidyverse code above creates new data frames by filtering an existing data frame on a condition.
  • The first call creates a data frame named "males" that keeps only the rows of "class" where the "Sex" variable is "M". It uses the filter function from the dplyr package.
  • The second call creates a data frame named "preteen" that keeps only the rows of "class" where the "Age" variable is 11 or 12. It uses the filter function together with the %in% operator.
  • In both cases, filter returns a new data frame and leaves "class" unchanged, so the subset can be analyzed or processed separately.
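The same filter function can also combine several conditions. The following is a minimal sketch, not part of the original lesson: the names males_14 and female_or_young are made up for illustration, and the data frame is rebuilt with data.frame so the block runs on its own with only dplyr loaded.

```r
library(dplyr)

# Same sample data as in the lesson
class <- data.frame(
  Name = c("Alfred", "Alice", "Carol", "Henry", "James"),
  Sex = c("M", "F", "F", "M", "M"),
  Age = c(14, 13, 14, 14, 12),
  Height = c(69, 56.5, 62.8, 63.5, 57.3),
  Weight = c(112.5, 84, 102.5, 102.5, 83)
)

# Conditions separated by commas are combined with AND
males_14 <- filter(class, Sex == "M", Age == 14)

# Use | inside a single condition for OR
female_or_young <- filter(class, Sex == "F" | Age < 13)

print(males_14$Name)         # "Alfred" "Henry"
print(female_or_young$Name)  # "Alice" "Carol" "James"
```

Separating conditions with commas tends to read more clearly than chaining them with &, although both forms give the same result.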
# Base R version of the same example
class <- data.frame(
  Name = c("Alfred", "Alice", "Carol", "Henry", "James"),
  Sex = c("M", "F", "F", "M", "M"),
  Age = c(14, 13, 14, 14, 12),
  Height = c(69, 56.5, 62.8, 63.5, 57.3),
  Weight = c(112.5, 84, 102.5, 102.5, 83),
  stringsAsFactors = FALSE
)

# Keep only the male students
males <- class[class$Sex == "M", ]

# Keep only the students aged 11 or 12
preteen <- class[class$Age %in% c(11, 12), ]
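  • class[class$Sex == "M", ] keeps the rows where Sex is "M"; leaving the position after the comma empty keeps all columns.
  • %in% tests membership: class$Age %in% c(11, 12) is TRUE for each row whose Age is 11 or 12, and only those rows are selected.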
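One point worth knowing about bracket subsetting in base R is how it treats missing values. The sketch below assumes a small variant of the sample data in which one student, the hypothetical "Robin", has a missing Age. The names class_na, with_na_row, no_na_which and no_na_subset are illustrative only.

```r
# Variant of the sample data; "Robin" and the missing Age are hypothetical
class_na <- data.frame(
  Name = c("Alfred", "Alice", "James", "Robin"),
  Sex = c("M", "F", "M", "F"),
  Age = c(14, 13, 12, NA)
)

# Logical indexing: the NA comparison result produces an extra all-NA row
with_na_row <- class_na[class_na$Age < 14, ]

# which() returns only the positions where the condition is TRUE
no_na_which <- class_na[which(class_na$Age < 14), ]

# subset() also drops rows where the condition evaluates to NA
no_na_subset <- subset(class_na, Age < 14)

print(nrow(with_na_row))   # 3
print(nrow(no_na_which))   # 2
print(nrow(no_na_subset))  # 2
```

The %in% operator returns FALSE rather than NA for a missing value, so the preteen example is not affected by this; comparisons such as == and < are.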