
Count Missing Values in Selected Variables


Lesson Description
  • Sometimes we need to count how many values are missing in each row, but only across a chosen subset of variables.
  • This lesson walks through a simple example and shows the key steps.
  • We will look at one approach in SAS and two in R (tidyverse and base R).

 


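/* Sample data: in list input, a period (.) marks a missing value
   for both character and numeric variables */
data example;
    input id name $ age score;
    datalines;
1 Alice 25 85
2 . 30 90
3 Charlie . 88
4 . . .
;
run;

/* Count the missing values among name and score in each row */
data example2;
    set example;
    cmiss_subset = cmiss(name, score);
run;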
  • The dataset `example` includes missing values across character and numeric variables.
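  • The function `cmiss(name, score)` counts the number of missing values among the selected variables in each row. Unlike `nmiss`, it accepts both character and numeric arguments.
  • The result is stored in a new variable `cmiss_subset`, which reflects how many of `name` and `score` are missing. For the sample data, it is 0, 1, 0, and 2 for rows 1 to 4.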
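library(tibble)  # tibble()
library(dplyr)   # mutate(), across(), %>%

df <- tibble(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA)
)

# Count the missing values among name and score in each row
df2 <- df %>%
  mutate(cmiss_subset = rowSums(is.na(across(c(name, score)))))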
  • The dataset `df` contains both character and numeric variables with some missing values.
  • `across(c(name, score))` selects the specified variables for row-wise inspection.
  • `is.na()` identifies missing values in each selected column.
  • `rowSums()` counts the number of `NA` values for each row across those columns.
  • The result is stored in `cmiss_subset`, which reflects how many of `name` and `score` are missing per row.
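
The steps above can also be driven by a character vector of column names, which helps when the list of variables comes from elsewhere in a program. The following is a minimal sketch, not part of the original lesson: the name `vars_to_check` is a hypothetical choice, and `all_of()` is the tidyselect helper that turns such a vector into a column selection.

```r
library(tibble)
library(dplyr)

df <- tibble(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA)
)

# Hypothetical vector holding the variables to inspect (an assumption
# for illustration; change it to suit your data)
vars_to_check <- c("name", "score")

df2 <- df %>%
  mutate(cmiss_subset = rowSums(is.na(across(all_of(vars_to_check)))))

df2$cmiss_subset  # values: 0 1 0 2
```

Because `age` is not in `vars_to_check`, the missing `age` in row 3 does not contribute to the count.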
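df <- data.frame(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA),
  stringsAsFactors = FALSE
)

# Work on a copy so the original data frame stays unchanged
df2 <- df

# Count the missing values among name and score in each row
df2$cmiss_subset <- rowSums(is.na(df2[, c("name", "score")]))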
  • `is.na()` creates a missingness matrix for selected columns.
  • `rowSums()` counts missing values per row.
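
To make the two steps visible, the sketch below prints the intermediate missingness matrix and then wraps the logic in a small helper. The helper `count_missing()` is hypothetical and not part of the original lesson. It assumes that `cols` names columns that exist in `data`, and it uses `drop = FALSE` so that selecting a single column still yields a data frame that `rowSums()` can process.

```r
# Hypothetical helper for illustration (not from the original lesson)
count_missing <- function(data, cols) {
  # drop = FALSE keeps a data frame even when cols has length 1,
  # so is.na() always returns a matrix
  miss_matrix <- is.na(data[, cols, drop = FALSE])
  # TRUE counts as 1, so the row sums are the missing counts
  rowSums(miss_matrix)
}

df <- data.frame(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA),
  stringsAsFactors = FALSE
)

# Intermediate step: TRUE marks a missing value
print(is.na(df[, c("name", "score")]))

count_missing(df, c("name", "score"))  # values: 0 1 0 2
count_missing(df, "age")               # values: 0 0 1 1
```

Without `drop = FALSE`, selecting one column with `df[, "age"]` returns a plain vector, and `rowSums()` would fail because its input must have at least two dimensions.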