Last dot concept


Lesson Description
-
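  • In SAS BY-group processing, the automatic variables FIRST.variable and LAST.variable flag the first and last observation of each BY group; this is often called the "first dot / last dot" concept.
  • This lesson uses LAST.variable to keep the tallest person of each sex from a small class dataset.
  • We will see one approach in SAS and two equivalent approaches in R (tidyverse and base R).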

data CLASS;
infile datalines dlm='|' dsd missover;
input Name : $8. Sex : $1. Age : best32. Height : best32. Weight : best32.;
label ;
format ;
datalines4;
Alfred|M|14|69|112.5
Alice|F|13|56.5|84
Barbara|F|13|65.3|98
Carol|F|14|62.8|102.5
Henry|M|14|63.5|102.5
James|M|12|57.3|83
Jane|F|12|59.8|84.5
Janet|F|15|62.5|112.5
Jeffrey|M|13|62.5|84
John|M|12|59|99.5
Joyce|F|11|51.3|50.5
Judy|F|14|64.3|90
Louise|F|12|56.3|77
Mary|F|15|66.5|112
Philip|M|16|72|150
Robert|M|12|64.8|128
Ronald|M|15|67|133
Thomas|M|11|57.5|85
William|M|15|66.5|112
;;;;
run;

*------------------------------------------------------------------------------;
*subset of highest height;
*------------------------------------------------------------------------------;

proc sort data=class;
    by sex height;
run;

data highestheight;
    set class;
    by sex height;
    if last.sex;
    keep sex height name;
run;

 
  • This SAS code keeps, for each value of "Sex", the observation with the greatest "Height".
  • First, PROC SORT sorts the "class" dataset in ascending order by "Sex" and then "Height", so the tallest person in each group is the last row of that group.
  • Then, in the DATA step:
      • The SET statement reads the sorted "class" dataset.
      • The BY statement tells SAS that the data is grouped by "Sex" and "Height", which creates the automatic variables FIRST.Sex, LAST.Sex, FIRST.Height, and LAST.Height.
      • The subsetting IF statement, if last.sex;, outputs an observation only when LAST.Sex is 1, that is, on the last observation of each "Sex" group.
      • The KEEP statement retains only the variables "Sex", "Height", and "Name" in the output dataset.
  • After the step runs, the "highestheight" dataset contains one observation per value of "Sex": the one with the greatest height.
library(tidyverse)

class <- tribble(
  ~Name,     ~Sex, ~Age, ~Height, ~Weight,
  "Alfred",  "M",  14,   69,      112.5,
  "Alice",   "F",  13,   56.5,    84,
  "Barbara", "F",  13,   65.3,    98,
  "Carol",   "F",  14,   62.8,    102.5,
  "Henry",   "M",  14,   63.5,    102.5,
  "James",   "M",  12,   57.3,    83,
  "Jane",    "F",  12,   59.8,    84.5,
  "Janet",   "F",  15,   62.5,    112.5,
  "Jeffrey", "M",  13,   62.5,    84,
  "John",    "M",  12,   59,      99.5,
  "Joyce",   "F",  11,   51.3,    50.5,
  "Judy",    "F",  14,   64.3,    90,
  "Louise",  "F",  12,   56.3,    77,
  "Mary",    "F",  15,   66.5,    112,
  "Philip",  "M",  16,   72,      150,
  "Robert",  "M",  12,   64.8,    128,
  "Ronald",  "M",  15,   67,      133,
  "Thomas",  "M",  11,   57.5,    85,
  "William", "M",  15,   66.5,    112
)

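highestheight <- class %>%
  arrange(Sex, Height) %>%     # sort ascending by Sex, then Height
  group_by(Sex) %>%            # one group per value of Sex
  slice(n()) %>%               # keep the last row of each group
  select(Name, Sex, Height)    # same variables as the SAS KEEP statement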
  • This tidyverse code produces the same result in R: for each value of "Sex", it keeps the observation with the greatest "Height".
  • The steps are chained with the pipe operator %>%:
      • arrange() sorts the "class" data frame in ascending order by "Sex" and then "Height".
      • group_by() groups the sorted data frame by "Sex".
      • slice(n()) keeps the last row of each group; n() returns the number of rows in the current group, so the last row is the one with the greatest height.
      • select() keeps only the variables "Name", "Sex", and "Height".
  • After the code runs, the "highestheight" data frame contains one row per value of "Sex": the one with the greatest height.
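To make the link with SAS's FIRST.Sex and LAST.Sex more explicit, the flags can be built as ordinary columns before filtering. The following is a minimal sketch, not part of the lesson's own code: the five-row data frame demo (a subset of the class data) and the column names first_sex and last_sex are assumptions chosen for illustration.

```r
library(dplyr)

# Five rows taken from the class data, used only for illustration
demo <- data.frame(
  Name   = c("Alfred", "Alice", "Barbara", "Philip", "Joyce"),
  Sex    = c("M", "F", "F", "M", "F"),
  Height = c(69, 56.5, 65.3, 72, 51.3)
)

flagged <- demo %>%
  arrange(Sex, Height) %>%
  group_by(Sex) %>%
  mutate(
    first_sex = row_number() == 1,    # hypothetical analogue of FIRST.Sex
    last_sex  = row_number() == n()   # hypothetical analogue of LAST.Sex
  ) %>%
  ungroup()

# Equivalent of the SAS subsetting IF: keep rows where the "last" flag is TRUE
highest_demo <- flagged %>%
  filter(last_sex) %>%
  select(Name, Sex, Height)
```

Printing flagged shows one TRUE in first_sex and one TRUE in last_sex per group, which is the information SAS exposes through FIRST.Sex and LAST.Sex during the DATA step.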
class <- data.frame(
  Name = c("Alfred", "Alice", "Barbara", "Carol", "Henry", "James", "Jane", "Janet", "Jeffrey", "John", "Joyce", "Judy", "Louise", "Mary", "Philip", "Robert", "Ronald", "Thomas", "William"),
  Sex = c("M", "F", "F", "F", "M", "M", "F", "F", "M", "M", "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  Age = c(14, 13, 13, 14, 14, 12, 12, 15, 13, 12, 11, 14, 12, 15, 16, 12, 15, 11, 15),
  Height = c(69, 56.5, 65.3, 62.8, 63.5, 57.3, 59.8, 62.5, 62.5, 59, 51.3, 64.3, 56.3, 66.5, 72, 64.8, 67, 57.5, 66.5),
  Weight = c(112.5, 84, 98, 102.5, 102.5, 83, 84.5, 112.5, 84, 99.5, 50.5, 90, 77, 112, 150, 128, 133, 85, 112),
  stringsAsFactors = FALSE
)

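highestheight_tmp <- class
# Sort by Sex, then by Height (ascending)
highestheight_tmp <- highestheight_tmp[
  order(highestheight_tmp$Sex, highestheight_tmp$Height),
]
# Keep only the last row of each Sex group
highestheight_tmp <- highestheight_tmp[
  !duplicated(highestheight_tmp$Sex, fromLast = TRUE),
]
# Keep the same three variables as the SAS KEEP statement
highestheight <- highestheight_tmp[, c("Name", "Sex", "Height")]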

highestheight_tmp <- class
We begin by creating a temporary copy of the dataset so the original data remains unchanged during processing.

order(highestheight_tmp$Sex, highestheight_tmp$Height)
The order() function sorts the data first by Sex and then by Height in ascending order.
Within each Sex group, this places the smallest Height first and the largest Height last.

highestheight_tmp <- highestheight_tmp[order(...), ]
Row subsetting applies this ordering to the dataset, physically rearranging the rows based on Sex and Height.

duplicated(highestheight_tmp$Sex, fromLast = TRUE)
The duplicated() function returns a logical vector that is TRUE for every value of Sex that has already been seen.
Setting fromLast = TRUE makes R scan the vector from the end, so the last occurrence of each Sex value is FALSE and all earlier occurrences are TRUE.

!duplicated(..., fromLast = TRUE)
Logical negation (!) flips these flags so that only the last occurrence of each Sex is TRUE; using the result as a row index keeps exactly those rows.

Because the data is sorted by Height within Sex, keeping the last record per Sex selects the subject with the highest Height for that group.
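
The behaviour of duplicated() with fromLast = TRUE is easiest to see on a small vector. The sketch below uses a hypothetical five-element vector, sex_sorted, standing in for the Sex column after sorting; it is an illustration only and not part of the lesson's dataset.

```r
# Toy vector standing in for the Sex column after sorting (illustration only)
sex_sorted <- c("F", "F", "F", "M", "M")

# Scanning from the end: the last "F" and the last "M" are not duplicates
dup_from_last <- duplicated(sex_sorted, fromLast = TRUE)
# TRUE TRUE FALSE TRUE FALSE

# Negation marks the last row of each group, like LAST.Sex in SAS
is_last <- !dup_from_last
# FALSE FALSE TRUE FALSE TRUE

last_rows <- which(is_last)
# 3 5
```

Using is_last as a row index on the sorted data frame keeps rows 3 and 5, the final row of each group.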