Tips & Best Practices for the Study File Format API
This page lists some tips and best practices for using the Study File Format (SFF) API.
Use Programmatic Consumption
The SFF API is designed to be accessed programmatically, not by manually reading the files. The manifest file contained in the SFF package organizes the CSV files and metadata for each column, with each Study Data section having its own block. The Study Design section is organized separately within the manifest file. Item Definition data is included in the clinical data section because each Item is a column in the CSV file.
SFF column headers are also designed to be machine-readable. The attributes in JSON format are not guaranteed to appear in a specific order, so parse the columns programmatically by attribute name rather than by position. The one exception is the itemgroups array attribute in the clinical data section, which maintains the order in which each item definition appears within the item group.
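As a minimal sketch of order-independent parsing, the snippet below reads a manifest-like JSON fragment by key. The layout and the key names (columns, name, type, items) are hypothetical and used only for illustration; only the itemgroups attribute name comes from this page, and its inner structure here is assumed.

```python
import json

# Hypothetical manifest fragment: every key name except "itemgroups" is an
# illustrative assumption, not the documented SFF schema.
manifest_text = """
{
  "columns": [
    {"type": "string", "name": "ROWID"},
    {"name": "SUBJECT", "type": "string"}
  ],
  "itemgroups": [
    {"name": "IG_VITALS", "items": ["PULSE", "TEMP"]}
  ]
}
"""

def column_types(manifest: dict) -> dict:
    """Map column name to type by key lookup, never by attribute position."""
    return {col["name"]: col["type"] for col in manifest["columns"]}

def item_order(manifest: dict, group_name: str) -> list:
    """Return a group's items in listed order; JSON arrays keep their order."""
    for group in manifest["itemgroups"]:
        if group["name"] == group_name:
            return list(group["items"])
    raise KeyError(group_name)

manifest = json.loads(manifest_text)
types = column_types(manifest)
vitals = item_order(manifest, "IG_VITALS")
```

Looking attributes up by name keeps the parser working even when the attribute order differs between packages, while the array is read in sequence because its order is meaningful.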
Efficiently Track Changes
Each file in the SFF includes a ROWID column, which serves as a unique identifier for each row. Use this column to track any row-level changes.
- Treat each change as an UPSERT: if a row appears in an incremental SFF file, it indicates an addition or modification to that row.
- The DELETES CSV file lists any rows that were deleted in the latest increment of changes.
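The two rules above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the VALUE column, the sample data, and the choice to apply upserts before deletes are hypothetical, and the DELETES file is assumed to identify rows by ROWID like every other file.

```python
import csv
import io

def apply_increment(table: dict, changes_csv: str, deletes_csv: str) -> dict:
    """Apply one increment to an in-memory table keyed by ROWID."""
    for row in csv.DictReader(io.StringIO(changes_csv)):
        # UPSERT: insert new rows, overwrite rows that already exist.
        table[row["ROWID"]] = row
    for row in csv.DictReader(io.StringIO(deletes_csv)):
        # Assumed: the DELETES file carries a ROWID column. Rows that were
        # never loaded downstream are ignored.
        table.pop(row["ROWID"], None)
    return table

# Hypothetical downstream state and increment contents.
table = {
    "1": {"ROWID": "1", "VALUE": "a"},
    "2": {"ROWID": "2", "VALUE": "b"},
}
changes = "ROWID,VALUE\n2,b2\n3,c\n"
deletes = "ROWID\n1\n"
apply_increment(table, changes, deletes)
```

In a real pipeline the dictionary would typically be a database table with ROWID as its key, so the same logic maps onto an upsert statement followed by a delete.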
Leverage Created Dates
Filenames include a published time for each defined increment: every 15 minutes for incremental extractions and every 24 hours for full extractions. It’s best practice to consume SFF ZIP packages in the order of the created_date returned by the API. Using the created_date, especially for incremental SFF consumption, helps you understand the order in which to apply changes to downstream systems to keep them in sync. For example, the created_date may be around 00:30 UTC, but the published time may show 00:45. This is because the system has captured the time interval between 00:30 and 00:45. Learn more about filenames.
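A minimal sketch of ordering packages by created_date before applying them. The response shape, the name field, and the ISO 8601 timestamp format are assumptions for illustration; only the created_date field name comes from this page.

```python
from datetime import datetime

# Hypothetical API response entries, deliberately out of order.
packages = [
    {"name": "pkg_c", "created_date": "2024-01-01T01:00:00Z"},
    {"name": "pkg_a", "created_date": "2024-01-01T00:30:00Z"},
    {"name": "pkg_b", "created_date": "2024-01-01T00:45:00Z"},
]

def parse_created(value: str) -> datetime:
    """Parse an assumed ISO 8601 UTC timestamp into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def in_apply_order(pkgs: list) -> list:
    """Return packages sorted oldest first by created_date."""
    return sorted(pkgs, key=lambda p: parse_created(p["created_date"]))

ordered = [p["name"] for p in in_apply_order(packages)]
```

Sorting on parsed timestamps rather than raw strings avoids surprises if the API ever returns timestamps with differing offsets or precision.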
Retrieve the Full SFF Package for Setup and Refresh
Retrieve the full SFF package the first time you enable it. If incremental SFF is also enabled, you should only need to retrieve the full package initially. After that, if your data becomes out of sync and you need a full refresh, you can retrieve the full package again. Learn more about SFF and study design changes.