Importing Reference Data
CDB allows you to import non-clinical data via reference objects. Reference objects are a specific type of dataset that can be ingested into CDB alongside clinical data for cleaning and export purposes. They are like Forms, in that they can have a limited set of Items contained within them, but they are different in how they relate to other header and form data. Forms have a predefined relationship to a Study, Site, and Event, but you must define the relationship between a reference object and the header and form data during ingestion. After data in a reference object is imported into CDB, you can use it like any other Form in CQL, as long as its related data is included in the selection.
Availability: Clinical DataBase (CDB) is only available to CDB license holders. Contact your Veeva Services representative for details.
Prerequisites
Before you can import data into CDB, your organization must perform the following tasks:
- Create a Study, Study Country, and Study Sites.
- Create and publish Casebook Definition, a complete set of design definition records, and a study schedule in Vault EDC Studio. See details here
- Create Casebook records for each of your Subjects
Users with the CDMS Lead Data Manager standard study role can perform the actions described below by default. If your organization uses custom roles, your role must grant the following permissions:
Type | Permission Label | Controls |
---|---|---|
Standard Tab | Workbench Tab | Ability to access and use the Data Workbench application, via the Workbench tab |
Functional Permission | View Import | Ability to access the Import page |
Functional Permission | API Access | Ability to access and use the Vault EDC API. (This permission is also required to use CDB.) |
Functional Permission | Approve Import | Ability to approve or reject an import package that contains configuration changes |
Functional Permission | Download Import Package | Ability to download import packages |
If your Study contains restricted data, you must have the Restricted Data Access permission to view it.
Learn more about Study Roles.
Relating to Clinical Data
You define keys for your reference object to relate it to clinical data. The keys are the Items in reference objects that are related to Items in another Form or Header. You can only relate reference data to a single Form or Header. The data type for a key must match the data type of the related Item. For example, if the data type for the Key in the reference object is INT (integer), then the related Item on the Form must also use INT.
A reference object can have up to four (4) keys.
The Key or combinations of Keys must be unique across the data set. For this reason, reference objects can’t be repeating.
Reference objects don’t have attributes, such as Name or CreatedDate, like forms and headers do. So, you can’t use @Form or @ItemGroup.
Reference objects can be joined with header data (@HDR). You don’t need to specify this join in your CQL statement, as CQL is already aware of it. However, if you use a reference object with forms in a selection clause, then the reference object must be related to at least one of the forms in the selection.
Reference objects are always joined to their source data as a left outer join.
You can’t join reference data to other data using advanced joins.
Preparing Source Data for Import
Unlike clinical data, there aren’t any column requirements for reference data CSV files. Include one column per data item. CDB uses the column header as the Name of the Item.
Like clinical data import, CDB uses the filename of the CSV to name the reference object.
Creating the Reference Manifest File
Your reference manifest file provides the Study and Source for CDB, lists the files in your ZIP, and for each file, lists which columns map to the required data points.
Source is a custom defined value that allows you to identify the contents of the package. This value is stored in the Source field in Workbench, so you can use it to identify data from this package in Workbench via CQL. The Source value must be unique within your Study.
For your Study, provide the Name of your Study (study__v
) record. If your study’s name contains a whitespace character, you must use the value with the whitespace in your manifest file, but then you must replace the whitespace with an underscore (_) in your data files.
For Source, specify the source of the data. Vault applies this source to all imported data. Then, as an array, list each CSV file, including its filename (with the extension) and its mappings, as the value for “reference_data”.
You can include multiple entries in reference_data
. Each entry represents a different reference data table.
To relate the reference object to the Header (@HDR
), use @HDR
for the related_form
, and the header property for the related_item
.
Reference Manifest Filename: Save this file as “reference_manifest.json”. Note that Workbench will only accept reference manifest files with this exact filename (case-sensitive) and extension.
"reference_data": [
{
"filename": "MedDRA.csv",
"reference_object_name": "MedDRA_Query_List",
"related": [
{ "related_form": "AE",
"related_itemgroup": "ae_itemgroup",
"related_item": "AETERM",
"related_item_function":
{
"name": "CODEDCODE",
"argument" : ["LLTCD"]
}
"related_key": "AELLTCD"
},
],
"items": {
"LIST_CODE": "text",
"LIST_ID": "integer",
"LIST_NAME": "text",
"LLT_CODE": "integer",
"LLT_NAME": "text",
"LANG": "text",
"AELLTCD": "integer"
}
}
]
In this example, MedDRA.csv (filename) represents the CSV file that contains the reference data. The property reference_object_name
is the name of the reference data table in CDB. The property reference_object_name
is optional. When omitted, the name for the reference data table in CDB defaults to the value from filename
.
The reference data has these fields: LIST_CODE
, LIST_ID
, LIST_NAME
, LLT_CODE
, LLT_NAME
, LANG
, and AELLTCD
.
The item AELLTCD
(related_key
) in reference data is the key. The related_item
in the form AE is the coded value for the item AETERM. In order to use the CODEDCODE
function with AETERM, a new property called related_item_function
is defined. related_item_function
defines the name of the function and the argument to be applied to the related_item
.
related_item_function
requires two (2) attributes:
name
: Name of the CQL function.argument
: An array of arguments passed to the function (as defined by thename
attribute).
The data type for AELLTCD
must match the data type of the output from applying CODEDCODE
to the item AETERM from the AE form.
Use the following keys in your reference_data entries to define your reference object, its items, and its relationship to the clinical data.
Parent Key | Key Name | Required | Dependencies | Description |
---|---|---|---|---|
reference_data | filename | Yes | Name of the CSV file that contains reference data | |
reference_data | reference_object_name | No | Name of the reference data table in CDB | |
reference_data | related | Yes | Use this property to define a key or a combination of keys | |
related | related_form_source | No | Name for the Source for the related_form that reference data is related to | |
related | related_form | Yes (when Reference Data is related to an Item on the Form) | Name of the Form that reference data is related to | |
related | related_itemgroup | Yes (when Reference Data is related to an Item on a Form) | Name of the Item Group that the reference data is related to | |
related | related_item | Yes | Name of the Item in in the Form or Header that the reference data is related to | |
related | related_key | Yes | Name of the defined key in reference data | |
reference_data | items | Yes | Use this property to define data types and properties of each data type | |
items | {Name of data column} | Yes | References the column name where the datatype is going to be defined | |
{Name of data column} | type | Yes | Defines data type: text, integer, date, and datetime | |
{Name of data column} | length | No | Type must be text | Defines length of a text type |
{Name of data column} | min | No | Type must be integer | Defines the min value of an integer |
{Name of data column} | max | No | Type must be integer | Defines the max value of an integer |
{Name of data column} | format | No | Type must be date or datetime | Defines the acceptable formats of date and datetime datatypes. See the table below for a list of valid formats. |
Date & DateTime Formats
Type | Valid formats |
---|---|
Date |
|
DateTime |
|
Import Package Requirements
Your import package must be a ZIP file containing your reference_manifest.json and any reference data CSV files for the chosen source. Don’t include any folders in the ZIP folder. All CSV files and the manifest file must be at the same level.
Column Headers in CSV Files: In the current release, your column headers can’t contain whitespace characters.
Importing the Reference Data Package
For CDB to ingest reference data and validate it, clinical data from the related forms must either already exist in CDB or it must be imported in the same upload package as the reference data. To import reference data alongside clinical data, include the reference_data array key in the same manifest file as the clinical data.
To import your data, upload your ZIP file to the “workbench” directory of your FTP staging server, using an FTP client of your choice. Once you upload your ZIP to the FTP staging server, CDB imports it and transforms your data.
When import is finished, Workbench sends you, as well as any other users subscribed to the Source, an email notification. If the reprocessing of a package results in a change from the previous load, Workbench will also send a notification to you and users subscribed to that Source.
Successful Import
Upon successful import:
- CDB creates all records from the reference objects.
- CDB automatically sets the Source field with the value provided in your reference manifest on all imported records to uniquely identify the data source.
- CDB moves your import ZIP file from “workbench” into “workbench/_processed/{study}/{source}” (for failed imports, your ZIP file is moved into the “workbench/_error”). CDB also appends the date and time of import to the filename.
Viewing Import Status
You can check the status of your import package from Import > Packages. This page lists the status of every import package, from both Vault EDC and 3rd party tools. You can also download import packages and issue logs (errors and warnings) from this page.
Complete Status: For an import package to move into the Complete import status, a Workbench user in your Study must open a listing. Otherwise, the import stays in the In Progress status. If your Study has the auto swap feature enabled, this isn’t required.
Users without the Restricted Data Access permission can download the import package log, but they can’t download the data file. Users with restricted data access can download a package with blinded data.
Any time you import a package into Workbench, Workbench automatically reprocesses the most recent packages for all other sources. For example, when your nightly Workbench Export EDC job runs, and then imports into Workbench, Workbench reprocesses your most recent Lab Data and Imaging packages as well. For earlier packages from the same Source, Workbench marks them as Replaced by the newer package.
You can easily filter the list to show only complete or failed imports using the Import Status Filters. Click Error to show only failed imports, or click Complete to show completed imports.
Workbench Import Statuses
When an import package is able to import with only warnings, Workbench highlights the status in orange to indicate that there are warnings. When import finishes, you can download the issue log to review the warnings.
Status | Description |
---|---|
In Progress | The import process has begun for this package, and Workbench has not identified any errors or warnings. |
In Progress (with warnings) | The import process is ongoing, but Workbench has identified a warning. |
Error | Import failed because there are one or more errors in the import package. Download the issue log and review the errors. |
Complete | Workbench imported the package successfully, with no errors or warnings. |
Complete (with warnings) | Workbench imported the package successfully, but there are one or more warnings. Download the issue log and review the warnings. |
Reprocess in Progress | Workbench has begun reprocessing this package because a new package from another source was imported. |
Reprocessed Complete | Workbench finished reprocessing this package with no errors or warnings. |
Reprocessed Complete (with warnings) | Workbench finished reprocessing this package, but there are one or more warnings. Download the issue log and review the warnings. |
Reprocessed Error | Reprocessing failed because there are one or more errors in the import package. Download the issue log and review the errors. |
Download the Import Package
To download the import package:
- Navigate to Import > Packages.
- Locate your import package in the list.
- Extract the files from the ZIP folder and view them in a tool of your choosing.
Download the Logs
You can download the Import Log (CSV) for any import and the Issue Log (CSV) for an import that fails. The Import Log lists details about the import job and the data ingestion into Workbench.
The import log lists the following:
- Transformation Start Time
- Transformation Completion Time
- Transformation Duration
- Import Start Time
- Import Completion Time
- Import Duration
To download the import log:
- Navigate to Import > Packages.
- Locate your import package in the list.
- From the Package () menu, select View Package Details.
- In the Package Details panel, click Issues.
- Optioanl: In the Issues tab, click Download () to download a CSV of the issue log.
Issue Log
The Issue Log lists all errors and warnings that Workbench encountered while importing the package.See a list of possible errors and warnings here.
To view the issue log:
- Navigate to Import > Packages.
- Locate your import package in the list.
- From the Package () menu, select View Package Details.
- In the Package Details panel, click Issues.
- Optional: In the Issue Log panel, click Download () to download a CSV of the log.
To download the issue log without first viewing it in the application:
- Navigate to Import > Packages.
- Locate your import package in the list.
- From the Package () menu, select Download Issue Log.
Viewing Reference Data in Workbench
You can query for reference data using CQL.
Select from a reference object in the same way you would as the header or a form. If you select *, CQL returns all Items on the reference object.
select * from Reference_Object_Name
To retrieve a list of all reference objects for your Study, use the SHOW
keyword:
SHOW Reference
Then, you can use the DESCRIBE
keyword to retrieve a list of Items on a given reference object:
DESCRIBE Reference <Reference_Object_Name>
Use the ReferenceRelations
keyword to retrieve a table of key value pairs for the following properties: reference_object_name
, related_form
, related_item
, and related_key
.
ReferenceRelations <Reference_Object_Name>
You can use the KEYMATCH function to check if there is a match between the related_item and the related_key. If there’s a match, CQL returns true
. Otherwise, CQL returns false
.
KEYMATCH(form_name, reference_object)
For text-type reference object Items, CQL supports the following functions:
CONCAT
LENGTH
TRIM
LTRIM
RTRIM
UPPER
LOWER
LEFT
RIGHT
For date-type reference object Items, CQL supports the following functions:
ADDDATE
SUBDATE
DATEDIFF
DATE_FORMAT
SITENORMALIZEDDATE
RAWDATE
SDTMDATEFORMAT
STR_TO_DATE
CURDATE
NOW
YEAR
MONTH
WEEK
DAY
HOUR
MINUTE
SECOND
LAST_DAY
CQL Limitations
CQL limitations apply when working with reference objects.
The following functions are not available for use with reference data:
FROZEN
ILB
LABEL
LASTCODED
LASTMODIFIEDDATE
LOCKED
SIGNED
STATUS
You also can’t use advanced joins to join reference data to other data.