Get the list of files in a Directory/Folder into a dataset

Lesson Description

Sometimes we want to get the list of files in a directory (folder) into a dataset in a clear, repeatable way.
This lesson walks through a simple example and shows the key steps.
We will see one way to do it in SAS and two ways to do it in R (tidyverse and base).

SAS (Base SAS)

filename mydir "C:\Users\curio\Desktop\Rough";

data file_list;
  length full_filename filename folder basename extension $256;

  folder = pathname("mydir");
  did = dopen("mydir");

  if did > 0 then do;
    nfiles = dnum(did);
    do i = 1 to nfiles;
      filename = dread(did, i);
      full_filename = cats(folder, "\", filename);

      /* Extract extension and basename */
      if index(filename, ".") then do;
        extension = scan(filename, -1, ".");
        basename = substr(filename, 1, length(filename) - length(extension) - 1);
      end;
      else do;
        extension = "";
        basename = filename;
      end;

      output;
    end;
    rc = dclose(did);
  end;

  drop did nfiles i rc;
run;

We create a FILENAME reference called mydir that points to a specific folder path on our system.
Inside the DATA step that creates the dataset file_list, we define character variables (256 characters each) to hold the full path, file name, folder name, base name, and extension.
To get the actual path behind the filename reference, we use pathname("mydir") and store it in the variable folder.
We open the folder using dopen("mydir"), which returns a directory ID for further processing, or 0 if the folder cannot be opened.
If the folder opens successfully, we retrieve the number of members it contains using dnum(did).
We then loop over each member using a DO loop from 1 to that number.
To get the name of each file, we use dread(did, i) and save it in the variable filename.
We build the full file path by joining the folder and filename using cats(folder, "\", filename), which also removes leading and trailing blanks from each piece.
To decide whether there is an extension, we check if the filename contains a period using index(filename, ".").
If it does, we use scan(filename, -1, ".") to extract the text after the last period as the extension.
To get the base name (the filename without the extension), we use substr() and length() to cut off the period and the extension.
If there is no period in the filename, we set the entire filename as the base name and leave the extension blank.
We use output; to write one observation per file to the final dataset.
Once all files are processed, we close the folder using dclose(did).
Finally, we drop the helper variables did, nfiles, i, and rc using the drop statement to keep the dataset clean.

R (tidyverse)

library(tidyverse)

folder_path <- "C:/Users/curio/Desktop/Rough"

file_list <- tibble(
  full_filename = list.files(path = folder_path, full.names = TRUE),
  filename = list.files(path = folder_path),
  folder = folder_path,
  basename = tools::file_path_sans_ext(filename),
  extension = tools::file_ext(filename)
)

We load the tidyverse package to use tibble() and its data manipulation tools.
We assign the target folder path "C:/Users/curio/Desktop/Rough" to a variable named folder_path.
We call list.files() with full.names = TRUE to get the complete file paths and save them in full_filename.
We call list.files() again without full.names to get just the filenames (without paths) and save them in filename.
We store the folder path as-is in a new column called folder for reference.
To extract the base name of the file (without the extension), we use tools::file_path_sans_ext(filename); tibble() builds columns in order, so filename is already available at this point.
To extract just the file extension (e.g., txt, csv), we use tools::file_ext(filename).
tibble() combines all of this into a tidy table, resulting in a dataset with one row per entry in the folder.
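To see how the two tools:: helpers split names, here is a minimal sketch that runs them on a few made-up file names instead of a real folder. The vector sample_names and its contents are hypothetical and chosen only to show a plain name, a name with several periods, and a name with no extension.

```r
# Hypothetical file names (not from the lesson folder) covering common cases.
sample_names <- c("report.csv", "archive.tar.gz", "README", "notes.final.txt")

# file_ext() keeps only the text after the last period ("" when there is none).
extensions <- tools::file_ext(sample_names)

# file_path_sans_ext() removes only that last extension.
basenames <- tools::file_path_sans_ext(sample_names)

print(data.frame(
  filename = sample_names,
  basename = basenames,
  extension = extensions,
  stringsAsFactors = FALSE
))
```

Only the last extension is split off, so "archive.tar.gz" gives the base name "archive.tar" and the extension "gz", while "README" keeps its full name and gets an empty extension.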

R (base)

folder_path <- "C:/Users/curio/Desktop/Rough"

file_list <- data.frame(
  full_filename = list.files(path = folder_path, full.names = TRUE),
  filename = list.files(path = folder_path),
  folder = folder_path,
  stringsAsFactors = FALSE
)

file_list$basename <- tools::file_path_sans_ext(file_list$filename)
file_list$extension <- tools::file_ext(file_list$filename)

list.files() returns the file names in the folder; full.names = TRUE prepends the folder path to each name.
tools::file_path_sans_ext() and tools::file_ext() split each name into its base name and extension.
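To try the base R approach without depending on a particular folder, here is a minimal sketch that builds a throwaway folder under tempdir() and lists it. The names demo_dir and files_only and the sample file names are hypothetical and not part of the lesson. As a variation, it calls list.files() once and derives the short names with basename(), and it shows that a non-recursive listing also returns subfolders, which can be filtered out with dir.exists().

```r
# Build a small throwaway folder so the example runs anywhere (hypothetical names).
demo_dir <- file.path(tempdir(), "list_files_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("a.txt", "b.csv", "notes"))))
dir.create(file.path(demo_dir, "sub"), showWarnings = FALSE)

# One call to list.files(); short names are derived from the full paths.
paths <- list.files(path = demo_dir, full.names = TRUE)

file_list <- data.frame(
  full_filename = paths,
  filename = basename(paths),
  folder = demo_dir,
  stringsAsFactors = FALSE
)

file_list$basename <- tools::file_path_sans_ext(file_list$filename)
file_list$extension <- tools::file_ext(file_list$filename)

# Non-recursive listings include subfolders; keep only the entries that are not directories.
files_only <- file_list[!dir.exists(file_list$full_filename), ]
print(files_only[, c("filename", "basename", "extension")])
```

Deriving filename from the same vector as full_filename guarantees the two columns line up row by row, even if the folder changes between calls.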
