stated clearly and in detail, leaving no room for confusion or doubt.
When does a SAS data step write an observation to the output dataset?
By default, a SAS data step writes the contents of the Program Data Vector (PDV) as an observation to the output dataset at the end of
each iteration, that is, when execution reaches the end of the step's statements, which is marked by the step boundary
In SAS, a 'run' statement, a 'data' statement, or a 'proc' statement is treated as a step boundary
What does an (explicit) output statement do?
An output statement asks SAS to write the current contents of the PDV to the output dataset as an observation
As we are explicitly requesting SAS to write an observation, we call it an 'explicit' output statement
We can provide multiple output statements within a data step
When a data step contains at least one explicit output statement, SAS no longer writes an observation to the output dataset automatically
at the step boundary
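The effect of suppressing the implicit output can be seen in a small sketch like the one below. The dataset name demo_explicit is made up for illustration; sashelp.class and its height variable ship with SAS.

```sas
/* Hypothetical dataset name: demo_explicit */
data demo_explicit;
    set sashelp.class;
    output;        /* the observation is written here, with the original height */
    height = .;    /* this change is made after the write, so it is never stored */
run;
```

Because the step contains an explicit output statement, nothing is written at the end of the iteration, so demo_explicit keeps the original height values.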
Different step boundary scenarios
Run statement as step boundary
Proc statement as step boundary for the data step
Data statement as step boundary for the first data step
The data statement of the step that creates the class04 dataset acts as the step boundary for the data step that creates the class03 dataset
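A minimal sketch of the three scenarios follows. The names class03 and class04 come from the text above; class01 and class02 and the choice of proc print are assumptions made for illustration.

```sas
/* 1. run statement as the step boundary */
data class01;
    set sashelp.class;
run;

/* 2. proc statement as the step boundary for the data step */
data class02;
    set sashelp.class;
proc print data=class02;
run;

/* 3. data statement as the step boundary for the first data step */
data class03;
    set sashelp.class;
data class04;          /* this statement ends the class03 step */
    set sashelp.class;
run;
```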
What happens when a step boundary is not present for a data step
When data step code that has no step boundary is submitted for execution, SAS does not execute the code
and shows a message at the top, as shown in the screenshot
The same applies to any proc step: SAS does not execute a step until a step boundary is encountered
A data step without a step boundary
A proc step without a step boundary
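The two cases might look like the fragments below when submitted in an interactive session. This is only a sketch: class05 is a hypothetical name, and each fragment is meant to be submitted on its own, because submitting both together would let the proc statement act as the boundary for the data step.

```sas
/* Fragment A: submit alone. There is no run statement and no following step, */
/* so SAS waits for more statements instead of executing the data step.       */
data class05;
    set sashelp.class;

/* Fragment B: submit alone. The proc step also waits until a boundary arrives. */
proc print data=sashelp.class;
```

Submitting a run statement afterwards supplies the missing boundary and lets the pending step execute.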
Replacing implicit output instruction with explicit output statement
Basics of explicit output statement
Implicit output, writing an observation to the output dataset, happens at the step boundary
The implicit output can be overridden by an explicit output statement; placing it as the last statement in the data step reproduces the behavior of implicit output
The data steps below produce the same result: the first one works with implicit output, and the second one with explicit output
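A sketch of such a pair, with hypothetical dataset names implicit_copy and explicit_copy:

```sas
/* Implicit output: the observation is written at the end of each iteration */
data implicit_copy;
    set sashelp.class;
run;

/* Explicit output as the last statement: same result as above */
data explicit_copy;
    set sashelp.class;
    output;
run;
```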
Multiple explicit output statements
A SAS data step processes one observation at a time
A SAS data step uses the PDV to build an observation in memory before writing it to the output dataset
SAS writes the contents of PDV as an observation to the output dataset whenever it encounters an output statement
In the below example, for each observation of input dataset, SAS writes the contents of the PDV to the output dataset
twice, once because of the first output statement and second time because of the second output statement
The output dataset class01 will contain twice the number of observations of sashelp.class, as each observation of the input dataset is
written twice to the output dataset because of the two output statements
Similarly, the output dataset class02 will contain three times the number of observations of sashelp.class, as each observation is written
three times because of the three output statements
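The two data steps described above can be sketched as follows, using the dataset names class01 and class02 from the text. The observation counts in the comments assume the 19 observations of sashelp.class.

```sas
/* Two output statements: each input observation is written twice */
data class01;
    set sashelp.class;
    output;
    output;
run;   /* class01 gets 2 x 19 = 38 observations */

/* Three output statements: each input observation is written three times */
data class02;
    set sashelp.class;
    output;
    output;
    output;
run;   /* class02 gets 3 x 19 = 57 observations */
```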
Mentioning the name of the output dataset on the output statement
We can optionally mention the name of the output dataset in the output statement
The three data steps below produce the same output
The first data step is using the explicit output statement. (The instruction is interpreted by SAS as
'Write the observation to the dataset specified on the data statement')
The second data step is using the explicit output statement along with the name of output dataset.
(The instruction is interpreted by SAS as 'Write the observation to the class02 dataset specified on the
data statement'). Remember that more than one dataset name can be specified on the data statement, which means
more than one dataset can be created within a single data step
The third data step is using implicit output instruction present at the step boundary
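The three variants can be sketched as below. The name class02 for the second step comes from the text; class01 and class03 are assumed names for the first and third steps.

```sas
/* 1. Explicit output, no dataset name: write to the dataset on the data statement */
data class01;
    set sashelp.class;
    output;
run;

/* 2. Explicit output naming the dataset listed on the data statement */
data class02;
    set sashelp.class;
    output class02;
run;

/* 3. Implicit output at the end of each iteration */
data class03;
    set sashelp.class;
run;
```

A dataset named on an output statement must also appear on the data statement; otherwise SAS reports an error.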
Creating multiple datasets in a single data step
We can create multiple datasets in a single data step by specifying the names of the output datasets on the data statement
In the example below, each of the 4 data steps creates 2 datasets, and every output dataset is a copy of sashelp.class
In the first data step, implicit output writes each observation of input dataset to all the datasets specified on data statement
In the second data step, the explicit output statement writes each observation of input dataset to both copy11 and copy12 datasets as no
specific output dataset name is mentioned on output statement
In the third data step, we are asking the explicit output statement to write each observation to both copy21 and copy22 datasets by explicitly
mentioning the names of the output datasets
In the fourth data step, we are instructing the first output statement to write the observation only to the copy31 dataset (so the observation will
not be written to copy32 because of this output statement), and we are instructing the second output statement to write the observation only to the copy32
dataset (so the observation will not be written to copy31 because of this output statement)
Examine the log to check the number of observations being written to each output dataset
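The four data steps described above might look like this sketch. The names copy11 through copy32 come from the text; copy01 and copy02 are assumed names for the first step.

```sas
/* 1. Implicit output writes to every dataset on the data statement */
data copy01 copy02;
    set sashelp.class;
run;

/* 2. Explicit output without a name also writes to every dataset */
data copy11 copy12;
    set sashelp.class;
    output;
run;

/* 3. Explicit output naming both datasets */
data copy21 copy22;
    set sashelp.class;
    output copy21 copy22;
run;

/* 4. One output statement per dataset */
data copy31 copy32;
    set sashelp.class;
    output copy31;   /* writes only to copy31 */
    output copy32;   /* writes only to copy32 */
run;
```

For every step, the log should report the same number of observations for each output dataset as there are in sashelp.class.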
Test your understanding: How many observations will be present in each output dataset?
Answer can be found by examining the log section
FYI, sashelp.class has 19 observations
Using Explicit Output Statement to Create Data Subsets
We can use conditional processing to write only specific observations to the output dataset of interest
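As an illustration, the sketch below splits sashelp.class by the sex variable. The dataset names males and females are hypothetical.

```sas
/* Route each observation to one of two datasets based on a condition */
data males females;
    set sashelp.class;
    if sex = 'M' then output males;
    else output females;
run;
```

Each observation is written to exactly one of the two datasets, so their observation counts add up to the count of sashelp.class.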
Creating additional rows for counting 'Total' observations
The requirement is to get the number of students in each age group and also report the total number of students in a new dataset
We can duplicate each student row, setting age to 999 in the duplicate, and then run proc freq on the age variable to get the age-wise
and total student counts in a single step
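One way to sketch this approach is shown below. The dataset name class_total and the nocum and nopercent options are assumptions added for illustration; the options only suppress the cumulative and percentage columns, which the duplicated rows would distort.

```sas
/* Write each student twice: once as-is, once with age set to 999 */
data class_total;
    set sashelp.class;
    output;        /* original row, counted under its real age */
    age = 999;
    output;        /* duplicate row, counted under the 'Total' value 999 */
run;

/* The frequency for age = 999 equals the total number of students */
proc freq data=class_total;
    tables age / nocum nopercent;
run;
```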
Lengthy approach for the above task
First get the number of students in each age group using proc freq, then create a dataset of those counts and add them up to find the total number of students
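A possible version of the lengthy approach is sketched below. The dataset names age_counts and age_counts_all, the helper variable total, and the reuse of 999 as the 'Total' marker are assumptions for illustration.

```sas
/* Step 1: count students per age and save the counts to a dataset */
proc freq data=sashelp.class noprint;
    tables age / out=age_counts(keep=age count);
run;

/* Step 2: copy the per-age counts and append one row holding their sum */
data age_counts_all;
    set age_counts end=last;
    output;                /* keep the per-age row */
    total + count;         /* sum statement: total is retained across iterations */
    if last then do;
        age = 999;         /* marker value for the 'Total' row */
        count = total;
        output;            /* extra row with the overall count */
    end;
    drop total;
run;
```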