Importing Reference Data

CDB allows you to import non-clinical data via reference objects. Reference objects are a specific type of dataset that can be ingested into CDB alongside clinical data for cleaning and export purposes. They are like Forms, in that they can have a limited set of Items contained within them, but they are different in how they relate to other header and form data. Forms have a predefined relationship to a Study, Site, and Event, but you must define the relationship between a reference object and the header and form data during ingestion. After data in a reference object is imported into CDB, you can use it like any other Form in CQL, as long as its related data is included in the selection.

Availability: Clinical DataBase (CDB) is only available to CDB license holders. Contact your Veeva Services representative for details.

Prerequisites

Configuration

Before you can import data into CDB, your organization must perform the following tasks:

Create a Study, Study Country, and Study Sites.
Create and publish Casebook Definition, a complete set of design definition records, and a study schedule in Veeva EDC Studio. See details here
Create Casebook records for each of your Subjects

Required Permissions

Users with the CDMS Lead Data Manager standard study role can perform the actions described below by default. If your organization uses custom roles, your role must grant the following permissions:

Type	Permission Label	Controls
Standard Tab	Workbench Tab	Ability to access and use the Data Workbench application, via the Workbench tab
Functional Permission	View Import	Ability to access the Import page
Functional Permission	API Access	Ability to access and use the Vault EDC API. (This permission is also required to use CDB.)
Functional Permission	Approve Import	Ability to approve or reject an import package that contains configuration changes
Functional Permission	Download Import Package	Ability to download import packages

If your Study contains restricted data, you must have the Restricted Data Access permission to view it.

Learn more about Study Roles.

Relating to Clinical Data

You define keys for your reference object to relate it to clinical data. The keys are the Items in reference objects that are related to Items in another Form or Header. You can only relate reference data to a single Form or Header. The data type for a key must match the data type of the related Item. For example, if the data type for the Key in the reference object is INT (integer), then the related Item on the Form must also use INT.

A reference object can have up to four (4) keys.

The Key or combinations of Keys must be unique across the data set. For this reason, reference objects can’t be repeating.

Reference objects don’t have attributes, such as Name or CreatedDate, like forms and headers do. So, you can’t use @Form or @ItemGroup.

Reference objects can be joined with header data (@HDR). You don’t need to specify this join in your CQL statement, as CQL is already aware of it. However, if you use a reference object with forms in a selection clause, then the reference object must be related to at least one of the forms in the selection.

Reference objects are always joined to their source data as a left outer join.

You can’t join reference data to other data using advanced joins.

Item Column Limit: You can have up to 35 item columns in your reference data import file.

Preparing Source Data for Import

Unlike clinical data, there aren’t any column requirements for reference data CSV files. Include one column per data item. CDB uses the column header as the Name of the Item.

Like clinical data import, CDB uses the filename of the CSV to name the reference object.

Creating the Reference Manifest File

Your reference manifest file provides the Study and Source for CDB, lists the files in your ZIP, and for each file, lists which columns map to the required data points.

Source is a custom defined value that allows you to identify the contents of the package. This value is stored in the Source field in Workbench, so you can use it to identify data from this package in Workbench via CQL. The Source value must be unique within your Study.

For your Study, provide the Name of your Study (study__v) record. If your study’s name contains a whitespace character, you must use the value with the whitespace in your manifest file, but then you must replace the whitespace with an underscore (_) in your data files.

For Source, specify the source of the data. The system applies this source to all imported data. Then, as an array, list each CSV file, including its filename (with the extension) and its mappings, as the value for “reference_data”.

You can include multiple entries in reference_data. Each entry represents a different reference data table.

To relate the reference object to the Header (@HDR), use @HDR for the related_form, and the header property for the related_item.

Reference Manifest Filename: Save this file as “reference_manifest.json”. Note that Workbench will only accept reference manifest files with this exact filename (case-sensitive) and extension.

"reference_data": [
        {
        "filename": "MedDRA.csv",
        "reference_object_name": "MedDRA_Query_List",
        "related": [
                    {   "related_form": "AE",
                        "related_itemgroup": "ae_itemgroup",
                        "related_item": "AETERM",
                        "related_item_function":
                                {
                                    "name": "CODEDCODE",
                                    "argument" : ["LLTCD"]
                                }
                        "related_key": "AELLTCD"
                    },
                   ],
        "items": {
            "LIST_CODE": "text",
            "LIST_ID": "integer",
            "LIST_NAME": "text",
            "LLT_CODE": "integer",
            "LLT_NAME": "text",          
            "LANG": "text",
            "AELLTCD": "integer"
                }
        }
] 

In this example, MedDRA.csv (filename) represents the CSV file that contains the reference data. The property reference_object_name is the name of the reference data table in CDB. The property reference_object_name is optional. When omitted, the name for the reference data table in CDB defaults to the value from filename.

The reference data has these fields: LIST_CODE, LIST_ID, LIST_NAME, LLT_CODE, LLT_NAME, LANG, and AELLTCD.

The item AELLTCD (related_key) in reference data is the key. The related_item in the form AE is the coded value for the item AETERM. In order to use the CODEDCODE function with AETERM, a new property called related_item_function is defined. related_item_function defines the name of the function and the argument to be applied to the related_item.

related_item_function requires two (2) attributes:

name: Name of the CQL function.
argument: An array of arguments passed to the function (as defined by the name attribute).

The data type for AELLTCD must match the data type of the output from applying CODEDCODE to the item AETERM from the AE form.

Use the following keys in your reference_data entries to define your reference object, its items, and its relationship to the clinical data.

Parent Key	Key Name	Required	Dependencies	Description
reference_data	filename	Yes		Name of the CSV file that contains reference data
reference_data	reference_object_name	No		Name of the reference data table in CDB
reference_data	related	Yes		Use this property to define a key or a combination of keys
related	related_form_source	No		Name for the Source for the related_form that reference data is related to
related	related_form	Yes (when Reference Data is related to an Item on the Form)		Name of the Form that reference data is related to
related	related_itemgroup	Yes (when Reference Data is related to an Item on a Form)		Name of the Item Group that the reference data is related to
related	related_item	Yes		Name of the Item in in the Form or Header that the reference data is related to
related	related_key	Yes		Name of the defined key in reference data
reference_data	items	Yes		Use this property to define data types and properties of each data type
items	{Name of data column}	Yes		References the column name where the datatype is going to be defined
{Name of data column}	type	Yes		Defines data type: text, integer, date, and datetime
{Name of data column}	length	No	Type must be text	Defines length of a text type
{Name of data column}	min	No	Type must be integer	Defines the min value of an integer
{Name of data column}	max	No	Type must be integer	Defines the max value of an integer
{Name of data column}	format	No	Type must be date or datetime	Defines the acceptable formats of date and datetime data types. See the table below for a list of valid formats.

Date & DateTime Formats

Type	Valid formats
Date	dd MM yy dd MM yyyy dd MMM yyyy dd-MM-yy dd-MM-yyyy dd-MMM-yyyy dd.MM.yy dd.MM.yyyy dd/MM/yy dd/MM/yyyy ddMMMyyyy ddMMyy ddMMyyyy MM/dd/yy MM/dd/yyyy MMddyy MMddyyyy MMM dd yyyy MMM/dd/yyyy MMMddyyyy yy-MM-dd yy/MM/dd yyyy MM dd yyyy-MM-dd yyyy.dd.MM yyyy.MM.dd yyyy/MM/dd yyyyMMdd dd/MMM/yy dd-MMM-yy ddMMMyy dd MMM yy
DateTime	dd/MM/yyyy HH:mm MM/dd/yyyy HH:mm yyyy-MM-ddTHH:mm:ss+HH:mm yyyy-MM-ddTHH:mm:ssZ yyyyMMddTHH:mm:ssZ

Import Package Requirements

Your import package must be a ZIP file containing your reference_manifest.json and any reference data CSV files for the chosen source. Don’t include any folders in the ZIP folder. All CSV files and the manifest file must be at the same level.

Column Headers in CSV Files: In the current release, your column headers can’t contain whitespace characters.

Importing the Reference Data Package

For CDB to ingest reference data and validate it, clinical data from the related forms must either already exist in CDB or it must be imported in the same upload package as the reference data. To import reference data alongside clinical data, include the reference_data array key in the same manifest file as the clinical data.

You can import data into Workbench using your vault’s FTP server or the CDB Package Loader.

Importing with an FTP Client

To import your data using an FTP client, upload your ZIP file to the “workbench” directory of your FTP staging server, using an FTP client of your choice. Once you upload your ZIP to the FTP staging server, CDB imports it and transforms your data.

Importing with the CDB Package Loader

To import data using the Workbench user interface, upload your ZIP files via the CDB Package Loader. You can access the CDB Package Loader from the top navigation bar in Workbench or from the Navigation Drawer. For instructions on how to import via the CDB Package Loader, read Importing with the CDB Package Loader.

When import is finished, Workbench sends you, as well as any other users subscribed to the Source, an email notification. If the reprocessing of a package results in a change from the previous load, Workbench will also send a notification to you and users subscribed to that Source.

Successful Import

Upon successful import:

CDB creates all records from the reference objects.
CDB automatically sets the Source field with the value provided in your reference manifest on all imported records to uniquely identify the data source.
CDB moves your import ZIP file from:
- “workbench” and into “workbench/_processed/{study}/{source}” if you used an FTP client for import. CDB also appends the date and time of import to the filename.
- Vault File Staging and into “workbench/_processed/{study}/{source}” if you used the CDB Package Loader for import

Viewing Import Status

You can check the status of your import package from Import > Packages. This page lists the status of every import package, from both Vault EDC and third party tools. You can also download import packages and issue logs (errors and warnings) from this page.

Complete Status: For an import package to move into the Complete import status, a Workbench user in your Study must open a listing. Otherwise, the import stays in the In Progress status. If your Study has the auto swap feature enabled, this isn’t required. This is the case for all incremental and OpenEDC studies, as they have the auto swap feature enabled by default.

Users without the Restricted Data Access permission can download the import package log, but they can’t download the data file. Users with Restricted Data Access can download a package with blinded data.

Filtering

You can easily filter the list to show only complete or failed imports using the Import Status Filters. Click Error to show only failed imports, or click Complete to show completed imports.

Reprocessing

Workbench automatically reprocesses production third party data packages that contain the recoverable warning codes listed below 24 hours after the import or last reprocess time.

D-002: Site not found
D-003: Subject not found
D-004: Event not found
D-032: Subject not found

The presence of these codes means that individual rows in the data file could not be imported because a match to the Primary Source was not found.

Workbench does not reprocess packages with other warning or error codes because the reprocess looks for newly entered data from the Primary Source, and only the above warnings can resolve via newly entered data. Note that automatic reprocessing is only for production environments, not TST, TRN, or VAL (even if they have the above warnings).

Workbench Import Statuses

When an import package is able to import with only warnings, Workbench highlights the status in orange to indicate that there are warnings. When import finishes, you can download the issue log to review the warnings.

Status	Description
In Progress	The import process has begun for this package, and Workbench has not identified any errors or warnings.
In Progress (with warnings)	The import process is ongoing, but Workbench has identified a warning.
Error	Import failed because there are one or more errors in the import package. Download the issue log and review the errors.
Complete	Workbench imported the package successfully, with no errors or warnings.
Complete (with warnings)	Workbench imported the package successfully, but there are one or more warnings. Download the issue log and review the warnings.
Reprocess in Progress	Workbench has begun reprocessing this package because a new package from another source was imported.
Reprocessed Complete	Workbench finished reprocessing this package with no errors or warnings.
Reprocessed Complete (with warnings)	Workbench finished reprocessing this package, but there are one or more warnings. Download the issue log and review the warnings.
Reprocessed Error	Reprocessing failed because there are one or more errors in the import package. Download the issue log and review the errors.

Download the Import Package

To download the import package:

Navigate to Import for your Study.
Locate your Source in the source list.
Click Packages to open the Packages page for your Source.
Locate your import package in the list.
Click the Package link.
Extract the files from the ZIP folder and view them in a tool of your choosing.

Download the Logs

You can download the Import Log (CSV) for any import and the Issue Log (CSV) for an import that fails. The Import Log lists details about the import job and the data ingestion into Workbench.

The import log lists the following:

Transformation Start Time
Transformation Completion Time
Transformation Duration
Import Start Time
Import Completion Time
Import Duration

To download the import log:

Navigate to Import for your Study.
Locate your Source in the source list.
Click Packages to open the Packages page for your Source.
Locate your import package in the list.
From the Package () menu, select Download Logs.

Issue Log

The Issue Log lists all errors and warnings that Workbench encountered while importing the package.See a list of possible errors and warnings here.

To view the issue log:

Navigate to the Packages page for your Source.
Locate your import package in the list.
From the Package () menu, select View Package Details.
In the Package Details panel, click Issues.
Optional: In the Issue Log panel, click Download () to download a CSV of the log.

To download the issue log without first viewing it in the application:

Navigate to the Packages page for your Source.
Locate your import package in the list.
From the Package () menu, select Download Logs.

Viewing Reference Data in Workbench

You can query for reference data using CQL.

Select from a reference object in the same way you would as the header or a form. If you select *, CQL returns all Items on the reference object.

select * from Reference_Object_Name

To retrieve a list of all reference objects for your Study, use the SHOW keyword:

SHOW Reference

Then, you can use the DESCRIBE keyword to retrieve a list of Items on a given reference object:

DESCRIBE Reference <Reference_Object_Name>

Use the ReferenceRelations keyword to retrieve a table of key value pairs for the following properties: reference_object_name, related_form, related_item, and related_key.

ReferenceRelations <Reference_Object_Name>

You can use the KEYMATCH function to check if there is a match between the related_item and the related_key. If there’s a match, CQL returns true. Otherwise, CQL returns false.

KEYMATCH(form_name, reference_object)

For text-type reference object Items, CQL supports the following functions:

CONCAT
LENGTH
TRIM
LTRIM
RTRIM
UPPER
LOWER
LEFT
RIGHT

For date-type reference object Items, CQL supports the following functions:

ADDDATE
SUBDATE
DATEDIFF
DATE_FORMAT
SITENORMALIZEDDATE
RAWDATE
SDTMDATEFORMAT
STR_TO_DATE
CURDATE
NOW
YEAR
MONTH
WEEK
DAY
HOUR
MINUTE
SECOND
LAST_DAY

CQL Limitations

CQL limitations apply when working with reference objects.

The following functions are not available for use with reference data:

FROZEN
ILB
LABEL
LASTCODED
LASTMODIFIEDDATE
LOCKED
SIGNED
STATUS

You also can’t use advanced joins to join reference data to other data.