
Sort Order with Missing Values


Lesson Description
  • Missing values need special attention when sorting, because SAS and R place them differently by default.
  • This lesson sorts a small example dataset that contains missing scores and shows the key steps.
  • It shows one way to do this in SAS and then reproduces the same order in R.

 


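/* Create a sample dataset; a period (.) marks a missing numeric value */
data sort_missing;
    input id $ score;
    datalines;
A 85
B .
C 90
D 70
E .
F 95
G 80
;
run;

/* Sort ascending by score; missing values come first */
proc sort data=sort_missing out=sorted;
    by score;
run;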

 

  • This SAS code shows how PROC SORT handles missing values. SAS treats a numeric missing value (.) as smaller than any number, so missing values are placed first in ascending order and last in descending order. Here 'score' is sorted in ascending order, so rows B and E (missing) appear first, followed by the scores 70, 80, 85, 90 and 95.
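library(tidyverse)

# Same data as the SAS example; NA marks a missing score
data <- tibble(
  id = c("A", "B", "C", "D", "E", "F", "G"),
  score = c(85, NA, 90, 70, NA, 95, 80)
)

# !is.na(score) is FALSE for missing rows, and FALSE sorts before TRUE,
# so missing rows come first; the remaining rows are ordered by score
data_sorted <- data %>%
  arrange(!is.na(score), score)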
  • This R code reproduces PROC SORT's handling of missing values using dplyr's arrange function. On its own, `arrange(score)` places missing values (`NA`) last, so the extra key `!is.na(score)` is added: it is FALSE for missing rows, and FALSE sorts before TRUE, which moves `NA` rows to the beginning as in SAS. For a descending sort, `arrange(is.na(score), desc(score))` places `NA` values at the end, again matching SAS.
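The descending case mentioned above can be sketched as follows. This is a minimal example, assuming the same seven-row dataset as the lesson; the name `data_desc` is introduced here only for illustration.

```r
library(dplyr)

# Same data as the lesson example; NA marks a missing score
data <- data.frame(
  id = c("A", "B", "C", "D", "E", "F", "G"),
  score = c(85, NA, 90, 70, NA, 95, 80),
  stringsAsFactors = FALSE
)

# is.na(score) is FALSE for non-missing rows, so they sort first;
# desc(score) then orders those rows from highest to lowest
data_desc <- data %>%
  arrange(is.na(score), desc(score))

print(data_desc)
```

The non-missing rows come out as F, C, A, G, D, and the two rows with a missing score are placed at the end, which is where SAS puts them in a descending sort.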
# Base R version: same example dataset as a data frame
data <- data.frame(
  id = c("A", "B", "C", "D", "E", "F", "G"),
  score = c(85, NA, 90, 70, NA, 95, 80),
  stringsAsFactors = FALSE
)

# Sort by missingness first (NA rows first), then by score
data_sorted <- data[order(!is.na(data$score), data$score), ]
  • order(!is.na(data$score), data$score) sorts by missingness first and then by value, so rows with NA come first.
  • Change the logical term to is.na(data$score) to move missing values to the end instead.
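The two bullets above can be checked side by side in base R. This is a small sketch, assuming the same dataset; the names `na_first`, `na_last` and `na_first_alt` are illustrative only. It also shows the `na.last` argument of `order()`, which is another way to get the same placement without a logical key.

```r
# Same data as the lesson example
data <- data.frame(
  id = c("A", "B", "C", "D", "E", "F", "G"),
  score = c(85, NA, 90, 70, NA, 95, 80),
  stringsAsFactors = FALSE
)

# Missing values first, then ascending score (matches PROC SORT ascending)
na_first <- data[order(!is.na(data$score), data$score), ]

# Flip the logical term: missing values last, then ascending score
na_last <- data[order(is.na(data$score), data$score), ]

# Alternative: let order() place the NA values via its na.last argument
na_first_alt <- data[order(data$score, na.last = FALSE), ]

print(na_first$id)
print(na_last$id)
```

Either form works. The logical key makes the intent visible in the sort expression and carries over directly to dplyr, while `na.last = FALSE` is shorter when sorting on a single column.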