Announcement Icon Online training class for Clinical R programming batch starts on Monday, 02Feb2026. Click here for details.

Keep Only Variables with Non-missing Values Across Dataset


Lesson Description
-
  • Sometimes, we want to work with the concept of "Keep Only Variables with Non-missing Values Across Dataset" in a clear, repeatable way.
  • This lesson walks through a simple example and shows the key steps.
  • We will see one approach on how we can do it in SAS and R.
data lab;
    input usubjid $ hgb glu chol$ alb alt$ ast ldh$ crp;
    datalines;
101 12.5 110 POS 4.2 NEG 21 . . 
102 13.0 105 NEG 4.1 NEG 25 . . 
103 . . . . . . . . 
104 14.1 108 POS 4.5 NEG 20 . . 
;
run;

data _null_;
    
set lab end=last;

    array charvars {*} _character_;
    
array numvars {*} _numeric_;

   
length keep_chars keep_nums $32767;

    
retain keep_chars keep_nums;

    
do i = 1 to dim(charvars);
        if charvars{i} ne '' then
            keep_chars = catx(
' ', keep_chars, vname(charvars{i}));
    end;

    
do j = 1 to dim(numvars);
        if numvars{j} ne . then
            keep_nums = catx(
' ', keep_nums, vname(numvars{j}));
    end;

    
if last then
        
call symputx('keep_vars', catx(' ', keep_chars, keep_nums));
run;


data lab_clean;
    set lab(keep=&keep_vars.);
run;
  • The dataset `lab` includes numeric and character variables for lab test results.
  • Some tests (like `ldh`, `crp`) are never performed and are entirely missing.
  • Two arrays are used: one for character and one for numeric variables.
  • We loop through each and collect variable names with at least one non-missing value.
  • A macro variable `keep_vars` is created and used to retain only those useful columns in the output dataset.
library(tidyverse)

lab <- tribble(
~usubjid, ~hgb, ~glu, ~chol, ~alb, ~alt, ~ast, ~ldh, ~crp,
"101", 12.5, 110, "POS", 4.2, "NEG", 21, NA, NA,
"102", 13.0, 105, "NEG", 4.1, "NEG", 25, NA, NA,
"103", NA, NA, NA, NA, NA, NA, NA, NA,
"104", 14.1, 108, "POS", 4.5, "NEG", 20, NA, NA )
 

lab_clean <- lab %>%
select(where(~ any(!is.na(.) & . != "")))
  • The dataset `lab` includes a mix of numeric and character results.
  • Some columns (`ldh`, `crp`) are completely missing for all subjects.
  • `where(~ any(!is.na(.) & . != ""))` keeps only columns with at least one non-missing or non-blank value.
  • The resulting `lab_clean` retains only relevant lab tests.
lab <- data.frame(
  usubjid = c(101, 102, 103, 104),
  hgb  = c("12.5", "13.0", "NA", "14.1"),
  glu  = c("110", "105", "NA", "108"),
  chol = c("POS", "NEG", "NA", "POS"),
  alb  = c("4.2", "4.1", "NA", "4.5"),
  alt  = c("NEG", "NEG", "NA", "NEG"),
  ast  = c("21", "25", "NA", "20"),
  ldh  = c("NA", "NA", "NA", "NA"),
  crp  = c("NA", "NA", "NA", "NA"),
  stringsAsFactors = FALSE
)

lab_clean_tmp <- lab

# identify columns that have at least one non-"NA" and non-blank value
keep_cols <- sapply(lab_clean_tmp, function(x) {
  any(!is.na(x) & x != "" & x != "NA")
})

# subset dataset
lab_clean_tmp <- lab_clean_tmp[, keep_cols]

lab_clean_tmp
  • Build a logical mask of columns with any non-missing values.
  • Subset the data frame using that mask.