
Get the list of files in a Directory/Folder into a dataset


Lesson Description
  • Sometimes we need a dataset that lists every file in a directory (folder), built in a clear, repeatable way.
  • This lesson walks through a simple example and shows the key steps.
  • We will see one way to do it in SAS and two ways in R (tidyverse and base R).

 

filename mydir "C:\Users\curio\Desktop\Rough";

data file_list;
  length full_filename filename folder basename extension $256;

  * Resolve the fileref to its physical path and open the directory;
  folder = pathname("mydir");
  did = dopen("mydir");

  if did > 0 then do;
    nfiles = dnum(did);

    do i = 1 to nfiles;
      filename = dread(did, i);
      full_filename = cats(folder, "\", filename);

      * Extract extension and basename;
      if index(filename, ".") then do;
        extension = scan(filename, -1, ".");
        basename = substr(filename, 1, length(filename) - length(extension) - 1);
      end;
      else do;
        extension = "";
        basename = filename;
      end;

      output;
    end;

    rc = dclose(did);
  end;

  drop did nfiles i rc;
run;

 

  • We are creating a FILENAME reference called mydir that points to a specific folder path on our system
  • Inside a DATA step named file_list, we are defining variables to hold the file name, full path, folder name, base name, and extension
  • To get the actual path from the filename reference, we are using pathname("mydir") and storing it in the variable folder
  • We are opening the folder using dopen("mydir"), which returns a directory ID that we can use for further processing
  • If the folder opens successfully (did > 0), we are retrieving the number of members it contains using dnum(did); this count includes subfolders as well as files
  • We are then looping through each file using a DO loop from 1 to the number of files
  • To get the name of each file, we are using dread(did, i) and saving it in the variable filename
  • We are constructing the full file path by combining the folder and filename using cats(folder, "\", filename)
  • To extract the extension, we are checking if the filename contains a period using index(filename, ".")
  • If an extension exists, we are using scan(filename, -1, ".") to extract the last portion as the extension
  • To get the base name (filename without extension), we are using substr() to keep everything before the final period, that is, the first length(filename) - length(extension) - 1 characters
  • If there is no period in the filename, we are setting the entire filename as the base name and leaving the extension blank
  • We are using output; to write each processed file to the final dataset
  • Once all files are processed, we are closing the folder using dclose(did)
  • Finally, we are dropping internal helper variables using the drop statement to keep the dataset clean
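The extension and base-name split described in the bullets above can be traced by hand. The following is a minimal sketch in R, used here only so the logic can be run outside SAS. The helper name split_name and the sample file names are hypothetical and not part of the lesson's code.

```r
# Hypothetical helper that mirrors the SAS branch logic step by step.
split_name <- function(filename) {
  if (grepl(".", filename, fixed = TRUE)) {
    # Like scan(filename, -1, "."): take the last dot-delimited piece.
    pieces <- strsplit(filename, ".", fixed = TRUE)[[1]]
    extension <- pieces[length(pieces)]
    # Like substr(filename, 1, length(filename) - length(extension) - 1).
    base_name <- substr(filename, 1, nchar(filename) - nchar(extension) - 1)
  } else {
    # No period: the whole name is the base name, the extension is blank.
    extension <- ""
    base_name <- filename
  }
  c(basename = base_name, extension = extension)
}

multi_dot <- split_name("report.final.csv")  # basename "report.final", extension "csv"
no_dot    <- split_name("README")            # basename "README", extension ""
```

Only the text after the final period is treated as the extension, so a name with several periods keeps the earlier ones in its base name.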
# R (tidyverse): build the file list as a tibble
library(tidyverse)

folder_path <- "C:/Users/curio/Desktop/Rough"

file_list <- tibble(
  full_filename = list.files(path = folder_path, full.names = TRUE),
  filename = list.files(path = folder_path),
  folder = folder_path,
  # tibble() lets later columns refer to earlier ones, so filename is reused here
  basename = tools::file_path_sans_ext(filename),
  extension = tools::file_ext(filename)
)
  • We are loading the tidyverse package to make use of functions like tibble() and data manipulation tools
  • We are assigning the target folder path "C:/Users/curio/Desktop/Rough" to a variable named folder_path
  • We are using list.files() with full.names = TRUE to get the complete file paths and saving them in full_filename
  • We are also getting just the filenames (without paths) using list.files() again and saving them in filename
  • We are storing the folder path as-is into a new column called folder for reference
  • To extract the base name of the file (without the extension), we are using tools::file_path_sans_ext(filename)
  • To extract just the file extension (e.g., txt, csv), we are using tools::file_ext(filename)
  • We are combining all of this into a tidy table using tibble(), resulting in a dataset with one row per file
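One detail worth knowing: a non-recursive list.files() call returns subfolder names along with file names. If the dataset should contain files only, the directories can be filtered out first. The sketch below is one possible way to do that, not part of the lesson's code. It uses base R so it runs without extra packages, and the temporary folder, its contents, and the object names (demo_dir, files_only) are assumptions made up for illustration.

```r
# Throwaway folder with hypothetical contents so the sketch runs anywhere.
demo_dir <- tempfile("file_list_demo_")
dir.create(demo_dir)
dir.create(file.path(demo_dir, "archive"))                      # a subfolder
file.create(file.path(demo_dir, c("report.csv", "notes.txt")))  # two files

# Non-recursive listings include subfolder names as well as files.
all_paths <- list.files(path = demo_dir, full.names = TRUE)

# Keep only the entries that are not directories.
file_paths <- all_paths[!dir.exists(all_paths)]

# Derive the short name from the full path, so list.files() is called once.
files_only <- data.frame(
  full_filename = file_paths,
  filename = basename(file_paths),
  stringsAsFactors = FALSE
)
files_only$basename <- tools::file_path_sans_ext(files_only$filename)
files_only$extension <- tools::file_ext(files_only$filename)
```

Here all_paths has three entries (the subfolder plus two files), while files_only has one row per file.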
# R (base): the same file list without loading any packages
folder_path <- "C:/Users/curio/Desktop/Rough"

file_list <- data.frame(
  full_filename = list.files(path = folder_path, full.names = TRUE),
  filename = list.files(path = folder_path),
  folder = folder_path,
  stringsAsFactors = FALSE
)

# Add the derived columns once the data frame exists
file_list$basename <- tools::file_path_sans_ext(file_list$filename)
file_list$extension <- tools::file_ext(file_list$filename)
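  • list.files() returns the names of the entries in the folder; adding full.names = TRUE returns them with the folder path prepended.
  • tools::file_path_sans_ext() and tools::file_ext() split each name into its base name and its extension.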
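To see how the two tools functions treat less obvious names, they can be run on a few sample strings. This is a small sketch, and the file names in it are hypothetical examples chosen to cover a name with several periods and a name with none.

```r
# Hypothetical file names chosen to cover the common edge cases.
examples <- c("report.csv", "archive.tar.gz", "README")

ext  <- tools::file_ext(examples)            # "csv" "gz" ""
stem <- tools::file_path_sans_ext(examples)  # "report" "archive.tar" "README"

# One row per name, laid out like the file_list columns.
split_demo <- data.frame(
  filename = examples,
  basename = stem,
  extension = ext,
  stringsAsFactors = FALSE
)
```

Only the last extension is removed, and a name without a period gets an empty extension, which matches the behaviour of the SAS version.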