

Count Missing Values in Selected Variables

Lesson Description


Sometimes we need to count, for each row, how many values are missing in a specific subset of variables rather than in the whole record.
This lesson walks through a simple example and shows the key steps.
We will see one way to do this in SAS and in R.

SAS (Base SAS)


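/* Sample data: with list input, a single period is read as missing,
   for the character variable name as well as the numeric ones */
data example;
    input id name $ age score;
    datalines;
1 Alice 25 85
2 . 30 90
3 Charlie . 88
4 . . .
;
run;

/* cmiss() counts missing values of any type among the listed variables */
data example2;
    set example;
    cmiss_subset = cmiss(name, score);
run;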

The dataset `example` includes missing values across character and numeric variables.
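The function `cmiss(name, score)` counts, for each row, how many of the listed variables are missing. Unlike `nmiss()`, which counts only numeric missing values, `cmiss()` works with both character and numeric variables.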
The result is stored in a new variable `cmiss_subset`, which reflects how many of `name` and `score` are missing.
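One difference to keep in mind when comparing with the R versions below: SAS stores a missing character value as a blank, so `cmiss()` counts blank text as missing, whereas R's `is.na()` does not flag an empty string `""`. The sketch below is a minimal R helper, assuming you want SAS-like behavior when your R data may contain blanks. The name `cmiss_like` and the `df_blank` data are hypothetical, and treating whitespace-only strings as missing is an assumption chosen to mirror SAS blanks.

```r
# Hypothetical helper (not part of base R or any package): counts, per row,
# values in `vars` that are NA or, for character columns, blank or
# whitespace-only, approximating how SAS cmiss() treats blank text.
cmiss_like <- function(data, vars) {
  # drop = FALSE keeps a data frame even when only one column is selected
  sel <- data[, vars, drop = FALSE]

  # One logical vector per column: TRUE where the value counts as missing
  flags <- lapply(sel, function(col) {
    if (is.character(col)) {
      is.na(col) | trimws(col) == ""
    } else {
      is.na(col)
    }
  })

  # cbind the vectors into a logical matrix; rowSums() counts TRUE as 1
  rowSums(do.call(cbind, flags))
}

# Illustrative data: row 2 holds an empty string instead of NA for `name`
df_blank <- data.frame(
  id = 1:4,
  name = c("Alice", "", "Charlie", NA),
  score = c(85, 90, 88, NA),
  stringsAsFactors = FALSE
)

cmiss_like(df_blank, c("name", "score"))           # counts 0, 1, 0, 2
rowSums(is.na(df_blank[, c("name", "score")]))     # counts 0, 0, 0, 2
```

The second call shows why the helper exists: plain `is.na()` treats the blank name in row 2 as present.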

R (tidyverse)


library(tibble)
library(dplyr)

df <- tibble(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA)
)

# Count, per row, how many of name and score are NA
df2 <- df %>% mutate(cmiss_subset = rowSums(is.na(across(c(name, score)))))

The dataset `df` contains both character and numeric variables with some missing values.
`across(c(name, score))` returns just the selected columns as a data frame.
`is.na()` turns that data frame into a logical matrix that is `TRUE` wherever a value is missing.
`rowSums()` adds up each row of that matrix (`TRUE` counts as 1), giving the number of `NA` values across those columns.
The result is stored in `cmiss_subset`, which reflects how many of `name` and `score` are missing per row.
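A variant of the same idea, offered as a sketch rather than the lesson's method: passing `is.na` as the function argument of `across()` yields logical columns directly, and tidyselect syntax such as `-id` widens the selection without listing every column. The names `df3` and `cmiss_all` are illustrative only, and this assumes dplyr 1.0.0 or later, where `across()` is available.

```r
library(tibble)
library(dplyr)

df <- tibble(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA)
)

df3 <- df %>%
  mutate(
    # Missing count over every column except id; computed first so that
    # the new column below is not part of this selection
    cmiss_all = rowSums(across(-id, is.na)),
    # Same count as the lesson, with is.na supplied as the .fns argument
    cmiss_subset = rowSums(across(c(name, score), is.na))
  )

df3$cmiss_all      # counts 0, 1, 1, 3
df3$cmiss_subset   # counts 0, 1, 0, 2
```

Order matters inside `mutate()`: later expressions see columns created earlier in the same call, so a broad selection like `-id` placed after `cmiss_subset` would also sweep in that new column.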

R (base)


df <- data.frame(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA),
  stringsAsFactors = FALSE
)

df2 <- df

df2$cmiss_subset <- rowSums(is.na(df2[, c("name", "score")]))

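`is.na()` applied to the selected columns returns a logical matrix that is `TRUE` where a value is missing.
`rowSums()` adds up each row of that matrix, so `cmiss_subset` holds the number of missing values among `name` and `score` per row.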
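The base R version has one pitfall when the selection shrinks to a single variable: `df2[, "name"]` returns a plain vector, and `rowSums()` then stops because it needs at least two dimensions. A minimal sketch of a reusable helper, assuming column names are passed as a character vector; `count_missing` and `cmiss_name` are hypothetical names, not base R functions or lesson variables.

```r
df <- data.frame(
  id = 1:4,
  name = c("Alice", NA, "Charlie", NA),
  age = c(25, 30, NA, NA),
  score = c(85, 90, 88, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: per-row count of NA values in the columns named in `vars`
count_missing <- function(data, vars) {
  # drop = FALSE keeps a one-column selection as a data frame, so is.na()
  # still returns a matrix and rowSums() does not error
  rowSums(is.na(data[, vars, drop = FALSE]))
}

df2 <- df
df2$cmiss_subset <- count_missing(df2, c("name", "score"))  # counts 0, 1, 0, 2
df2$cmiss_name   <- count_missing(df2, "name")              # counts 0, 1, 0, 1

# Without drop = FALSE, rowSums(is.na(df2[, "name"])) stops with an error,
# because df2[, "name"] is a character vector rather than a data frame.
```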