Importing Reference Data

CDB allows you to import non-clinical data via reference objects. Reference objects are a specific type of dataset that can be ingested into CDB alongside clinical data for cleaning and export purposes. They are like Forms, in that they can have a limited set of Items contained within them, but they are different in how they relate to other header and form data. Forms have a predefined relationship to a Study, Site, and Event, but you must define the relationship between a reference object and the header and form data during ingestion. After data in a reference object is imported into CDB, you can use it like any other Form in CQL, as long as its related data is included in the selection.

Availability: Clinical DataBase (CDB) is only available to CDB license holders. Contact your Veeva Services representative for details.

Prerequisites

Before you can import data into CDB, your organization must perform the following tasks:


Users with the CDMS Lead Data Manager standard study role can perform the actions described below by default. If your organization uses custom roles, your role must grant the following permissions:

Type Permission Label Controls
Functional Permission Workbench Tab

Ability to access and use the Data Workbench application, via the Workbench tab

Functional Permission View Import

Ability to access the Import page

Functional Permission API Access

Ability to access and use the Vault CDMS API. (This permission is also required to use CDB.)

Functional Permission Approve Import

Ability to approve or reject an import package that contains configuration changes

Functional Permission Download Import Package

Ability to download import packages

If your Study contains restricted data, you must have the Restricted Data Access permission to view it.

Learn more about Study Roles.


Relating to Clinical Data

You define keys for your reference object to relate it to clinical data. The keys are the Items in reference objects that are related to Items in another Form or Header. You can only relate reference data to a single Form or Header. The data type for a key must match the data type of the related Item. For example, if the data type for the Key in the reference object is INT (integer), then the related Item on the Form must also use INT.

A reference object can have up to four (4) keys.

The Key or combinations of Keys must be unique across the data set. For this reason, reference objects can’t be repeating.

Reference objects don’t have attributes, such as Name or CreatedDate, like forms and headers do. So, you can’t use @Form or @ItemGroup.

Reference objects can be joined with header data (@HDR). You don’t need to specify this join in your CQL statement, as CQL is already aware of it. However, if you use a reference object with forms in a selection clause, then the reference object must be related to at least one of the forms in the selection.

Reference objects are always joined to their source data as a left outer join.

You can’t join reference data to other data using advanced joins.

Preparing Source Data for Import

Unlike clinical data, there aren’t any column requirements for reference data CSV files. Include one column per data item. CDB uses the column header as the Name of the Item.

Like clinical data import, CDB uses the filename of the CSV to name the reference object.

Creating the Reference Manifest File

Your reference manifest file provides the Study and Source for CDB, lists the files in your ZIP, and for each file, lists which columns map to the required data points.

Source is a custom defined value that allows you to identify the contents of the package. This value is stored in the Source field in Workbench, so you can use it to identify data from this package in Workbench via CQL. The Source value must be unique within your Study.

For your Study, provide the Name of your Study (study__v) record. If your study’s name contains a whitespace character, you must use the value with the whitespace in your manifest file, but then you must replace the whitespace with an underscore (_) in your data files.

For Source, specify the source of the data. Vault applies this source to all imported data. Then, as an array, list each CSV file, including its filename (with the extension) and its mappings, as the value for “reference_data”.

You can include multiple entries in reference_data. Each entry represents a different reference data table.

To relate the reference object to the Header (@HDR), use @HDR for the related_form, and the header property for the related_item.

Reference Manifest Filename: Save this file as “reference_manifest.json”. Note that Workbench will only accept reference manifest files with this exact filename (case-sensitive) and extension.

"reference_data": [
        {
        "filename": "MedDRA.csv",
        "reference_object_name": "MedDRA_Query_List",
        "related": [
                    {   "related_form": "AE",
                        "related_itemgroup": "ae_itemgroup",
                        "related_item": "AETERM",
                        "related_item_function":
                                {
                                    "name": "CODEDCODE",
                                    "argument" : ["LLTCD"]
                                }
                        "related_key": "AELLTCD"
                    },
                   ],
        "items": {
            "LIST_CODE": "text",
            "LIST_ID": "integer",
            "LIST_NAME": "text",
            "LLT_CODE": "integer",
            "LLT_NAME": "text",          
            "LANG": "text",
            "AELLTCD": "integer"
                }
        }
] 

In this example, MedDRA.csv (filename) represents the CSV file that contains the reference data. The property reference_object_name is the name of the reference data table in CDB. The property reference_object_name is optional. When omitted, the name for the reference data table in CDB defaults to the value from filename.

The reference data has these fields: LIST_CODE, LIST_ID, LIST_NAME, LLT_CODE, LLT_NAME, LANG, and AELLTCD.

The item AELLTCD (related_key) in reference data is the key. The related_item in the form AE is the coded value for the item AETERM. In order to use the CODEDCODE function with AETERM, a new property called related_item_function is defined. related_item_function defines the name of the function and the argument to be applied to the related_item.

related_item_function requires two (2) attributes:

  • name: Name of the CQL function.
  • argument: An array of arguments passed to the function (as defined by the name attribute).

The data type for AELLTCD must match the data type of the output from applying CODEDCODE to the item AETERM from the AE form.

Use the following keys in your reference_data entries to define your reference object, its items, and its relationship to the clinical data.

Parent Key Key Name Required Dependencies Description
reference_data filename Yes Name of the CSV file that contains reference data
reference_data reference_object_name No Name of the reference data table in CDB
reference_data related Yes Use this property to define a key or a combination of keys
related related_form_source No Name for the Source for the related_form that reference data is related to
related related_form Yes (when Reference Data is related to an Item on the Form) Name of the Form that reference data is related to
related related_itemgroup Yes (when Reference Data is related to an Item on a Form) Name of the Item Group that the reference data is related to
related related_item Yes Name of the Item in in the Form or Header that the reference data is related to
related related_key Yes Name of the defined key in reference data
reference_data items Yes Use this property to define data types and properties of each data type
items {Name of data column} Yes References the column name where the datatype is going to be defined
{Name of data column} type Yes Defines data type: text, integer, date, and datetime
{Name of data column} length No Type must be text Defines length of a text type
{Name of data column} min No Type must be integer Defines the min value of an integer
{Name of data column} max No Type must be integer Defines the max value of an integer
{Name of data column} format No Type must be date or datetime Defines the acceptable formats of date and datetime datatypes. See the table below for a list of valid formats.

Date & DateTime Formats

Type Valid formats
Date
  • dd MM yy
  • dd MM yyyy
  • dd MMM yyyy
  • dd-MM-yy
  • dd-MM-yyyy
  • dd-MMM-yyyy
  • dd.MM.yy
  • dd.MM.yyyy
  • dd/MM/yy
  • dd/MM/yyyy
  • ddMMMyyyy
  • ddMMyy
  • ddMMyyyy
  • MM/dd/yy
  • MM/dd/yyyy
  • MMddyy
  • MMddyyyy
  • MMM dd yyyy
  • MMM/dd/yyyy
  • MMMddyyyy
  • yy-MM-dd
  • yy/MM/dd
  • yyyy MM dd
  • yyyy-MM-dd
  • yyyy.dd.MM
  • yyyy.MM.dd
  • yyyy/MM/dd
  • yyyyMMdd
  • dd/MMM/yy
  • dd-MMM-yy
  • ddMMMyy
  • dd MMM yy
DateTime
  • dd/MM/yyyy HH:mm
  • MM/dd/yyyy HH:mm
  • yyyy-MM-ddTHH:mm:ss+HH:mm
  • yyyy-MM-ddTHH:mm:ssZ
  • yyyyMMddTHH:mm:ssZ

Import Package Requirements

Your import package must be a ZIP file containing your reference_manifest.json and any reference data CSV files for the chosen source. Don’t include any folders in the ZIP folder. All CSV files and the manifest file must be at the same level.

Column Headers in CSV Files: In the current release, your column headers can’t contain whitespace characters.

Importing the Reference Data Package

For CDB to ingest reference data and validate it, clinical data from the related forms must either already exist in CDB or it must be imported in the same upload package as the reference data. To import reference data alongside clinical data, include the reference_data array key in the same manifest file as the clinical data.

To import your data, upload your ZIP file to the “workbench” directory of your FTP staging server, using an FTP client of your choice. Once you upload your ZIP to the FTP staging server, CDB imports it and transforms your data.

When import is finished, Workbench sends you, as well as any other users subscribed to the Source, an email notification. If the reprocessing of a package results in a change from the previous load, Workbench will also send a notification to you and users subscribed to that Source.

Successful Import

Upon successful import:

  • CDB creates all records from the reference objects.
  • CDB automatically sets the Source field with the value provided in your reference manifest on all imported records to uniquely identify the data source.
  • CDB moves your import ZIP file from “workbench” into “workbench/_processed/{study}/{source}” (for failed imports, your ZIP file stays in the root of the “workbench” directory. CDB also appends the date and time of import to the filename.

Viewing Import Status

You can check the status of your import package from Import > Packages. This page lists the status of every import package, from both Vault EDC and 3rd party tools. You can also download import packages and issue logs (errors and warnings) from this page.

Complete Status: For an import package to move into the Complete import status, a Workbench user in your Study must open a listing. Otherwise, the import stays in the In Progress status. If your Study has the auto swap feature enabled, this isn’t required.

Users without the Restricted Data Access permission can download the import package log, but they can’t download the data file. Users with restricted data access can download a package with blinded data.

Any time you import a package into Workbench, Workbench automatically reprocesses the most recent packages for all other sources. For example, when your nightly Workbench Export EDC job runs, and then imports into Workbench, Workbench reprocesses your most recent Lab Data and Imaging packages as well. For earlier packages from the same Source, Workbench marks them as Replaced by the newer package.

You can easily filter the list to show only complete or failed imports using the Import Status Filters. Click Error to show only failed imports, or click Complete to show completed imports.

Workbench Import Statuses

When an import package is able to import with only warnings, Workbench highlights the status in orange to indicate that there are warnings. When import finishes, you can download the issue log to review the warnings.

Status Description
In Progress The import process has begun for this package, and Workbench has not identified any errors or warnings.
In Progress (with warnings) The import process is ongoing, but Workbench has identified a warning.
Error Import failed because there are one or more errors in the import package. Download the issue log and review the errors.
Complete Workbench imported the package successfully, with no errors or warnings.
Complete (with warnings) Workbench imported the package successfully, but there are one or more warnings. Download the issue log and review the warnings.
Reprocess in Progress Workbench has begun reprocessing this package because a new package from another source was imported.
Reprocessed Complete Workbench finished reprocessing this package with no errors or warnings.
Reprocessed Complete (with warnings) Workbench finished reprocessing this package, but there are one or more warnings. Download the issue log and review the warnings.
Reprocessed Error Reprocessing failed because there are one or more errors in the import package. Download the issue log and review the errors.

Download the Import Package

To download the import package:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. Click the Package link. Click to download the import package

  4. Extract the files from the ZIP folder and view them in a tool of your choosing.

Download the Logs

You can download the Import Log (CSV) for any import and the Issue Log (CSV) for an import that fails. The Import Log lists details about the import job and the data ingestion into Workbench.

The import log lists the following:

  • Transformation Start Time
  • Transformation Completion Time
  • Transformation Duration
  • Import Start Time
  • Import Completion Time
  • Import Duration

To download the import log:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. From the Package () menu, select View Package Details.
  4. In the Package Details panel, click Issues.
  5. Optioanl: In the Issues tab, click Download () to download a CSV of the issue log.

Issue Log

The Issue Log lists all errors and warnings that Workbench encountered while importing the package.See a list of possible errors and warnings here.

To view the issue log:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. From the Package () menu, select View Package Details.
  4. In the Package Details panel, click Issues.
  5. Optional: In the Issue Log panel, click Download () to download a CSV of the log.

To download the issue log without first viewing it in the application:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. From the Package () menu, select Download Issue Log.

Viewing Reference Data in Workbench

You can query for reference data using CQL.

Select from a reference object in the same way you would as the header or a form. If you select *, CQL returns all Items on the reference object.

select * from Reference_Object_Name

To retrieve a list of all reference objects for your Study, use the SHOW keyword:

SHOW Reference

Then, you can use the DESCRIBE keyword to retrieve a list of Items on a given reference object:

DESCRIBE Reference <Reference_Object_Name>

Use the ReferenceRelations keyword to retrieve a table of key value pairs for the following properties: reference_object_name, related_form, related_item, and related_key.

ReferenceRelations <Reference_Object_Name>

You can use the KEYMATCH function to check if there is a match between the related_item and the related_key. If there’s a match, CQL returns true. Otherwise, CQL returns false.

KEYMATCH(form_name, reference_object)

For text-type reference object Items, CQL supports the following functions:

  • CONCAT
  • LENGTH
  • TRIM
  • LTRIM
  • RTRIM
  • UPPER
  • LOWER
  • LEFT
  • RIGHT

For date-type reference object Items, CQL supports the following functions:

  • ADDDATE
  • SUBDATE
  • DATEDIFF
  • DATE_FORMAT
  • SITENORMALIZEDDATE
  • RAWDATE
  • SDTMDATEFORMAT
  • STR_TO_DATE
  • CURDATE
  • NOW
  • YEAR
  • MONTH
  • WEEK
  • DAY
  • HOUR
  • MINUTE
  • SECOND
  • LAST_DAY

CQL Limitations

CQL limitations apply when working with reference objects.

The following functions are not available for use with reference data:

  • FROZEN
  • ILB
  • LABEL
  • LASTCODED
  • LASTMODIFIEDDATE
  • LOCKED
  • SIGNED
  • STATUS

You also can’t use advanced joins to join reference data to other data.