*Copyright @ www.mycsg.in;

What is the word meaning of unique

Being the only one of its kind or different from all others.

What do we call a unique record in a dataset

A record is unique when at least one value makes it different from every other observation in the dataset
When specific variables are treated as key variables, we judge uniqueness only on the combination of those key variables
So a record can be unique under one key definition and non-unique under another key definition

Create a sample dataset

This dataset contains exact duplicates for Alfred and multiple records for Carol with different ages
That mix lets us study uniqueness using different BY variable combinations

data class;
   infile cards truncover;
   input name $ sex $ age height weight;
   cards;
Alfred    M     14     69        112.5
Alfred    M     14     69        112.5
Carol     F     14     62.8      102.5
Carol     F     12     55        84
Carol     F     11     51        79
Jane      F     12     59.8      84.5
;
run;

If all variables are considered together, only the two Alfred rows are non-unique exact duplicates
If `name` alone is considered as the key, then Alfred and Carol are non-unique because their names repeat
If `name` and `age` are considered together, then only Alfred remains non-unique in this example

Where do we need to identify non-unique records

We may need to review duplicate subject records, duplicate event records, or repeated lab records within a defined key
Separating unique and non-unique observations helps with data cleaning and issue investigation
`proc sort` provides a convenient way to split these records into separate datasets

Separate unique and non-unique records based on key variables

The `nouniquekeys` option sends non-unique observations to the dataset specified by `out=`
The `uniqueout=` option stores observations whose BY value combination occurs only once
The BY statement defines the key used to judge uniqueness

Create a dataset named class_nouniq_names to hold records with non-unique names

Using `by name;` means uniqueness is checked only on the value of `name`
Jane appears only once and therefore goes to `class_uniq_names`
Alfred and Carol appear more than once and therefore go to `class_nouniq_names`

proc sort data=class out=class_nouniq_names uniqueout=class_uniq_names nouniquekeys;
   by name;
run;

Create a dataset named class_nouniq_name_age to hold records with non-unique name and age combinations

Using `by name age;` changes the uniqueness rule
Carol now has unique rows because each age value differs within the name Carol
Only the Alfred rows remain non-unique because both `name` and `age` match exactly

proc sort data=class out=class_nouniq_name_age uniqueout=class_uniq_name_age nouniquekeys;
   by name age;
run;

Key points to remember

Uniqueness depends on the BY variables used as the key
`nouniquekeys` helps separate non-unique observations from unique observations
`uniqueout=` stores records that occur only once for the specified BY key
This technique is very useful in duplicate record review and data cleaning workflows

*Copyright @ www.mycsg.in;

What is the word meaning of unique

What do we call a unique record in a dataset

Create a sample dataset

SAS Log

Dataset View

Where do we need to identify non-unique records

Separate unique and non-unique records based on key variables

Create a dataset named class_nouniq_names to hold records with non-unique names

SAS Log

Dataset View

Create a dataset named class_nouniq_name_age to hold records with non-unique name and age combinations

SAS Log

Dataset View

Key points to remember