Importing 3rd Party Data

CDB allows you to import data from 3rd party systems for cleaning and reporting in the CDB Workbench application. This is intended for importing subject data that exists outside of the subject’s Casebook in Vault EDC, for example, data about a Subject from an IRT system.

Availability: Clinical DataBase (CDB) is only available to CDB license holders. Contact your Veeva Services representative for details.

Prerequisites

Before you can import data into CDB, your organization must perform the following tasks:


Users with the CDMS Lead Data Manager standard study role can perform the actions described below by default. If your organization uses custom roles, your role must grant the following permissions:

Type Permission Label Controls
Standard Tab Workbench Tab

Ability to access and use the Data Workbench application, via the Workbench tab

Functional Permission View Import

Ability to access the Import page

Functional Permission API Access

Ability to access and use the Vault EDC API. (This permission is also required to use CDB.)

Functional Permission Approve Import

Ability to approve or reject an import package that contains configuration changes

Functional Permission Download Import Package

Ability to download import packages


Preparing Data for Import

You can import your data as a series of CSV files to your CDB vault’s FTP server. To import your data, you’ll first need to create the CSV files, which list your clinical data, and a manifest file (.json), describing your data and how CDB should process it. Then, you’ll create a ZIP package (.zip) containing all of those files for import into CDB.

Manifest Builder: CDB includes the CDB Manifest Builder for ease of file creation. The CDB Manifest Builder is a step-by-step wizard that guides you through all manifest file configuration options in a user-friendly interface. In the CDB Manifest Builder, CDB presents you with every import option, which you can set without needing to understand JSON.

Data CSVs

Create a CSV for each data collection form in your study that is independent of EDC. For each CSV, you must provide four (4) required columns, and then a column for each execution data item. For the four required columns, you can name them whatever you choose, as long as you define those names in your manifest file for each CSV. Note that these column names are case-sensitive and must be an exact match to the values provided in your manifest file.

Aside from the study, site, subject, event, and sequence columns, CDB considers every column to be a data item and will import it as such. There is a limit of 410 item columns per listing.

Column Description Manifest File Key
Study

In this column, provide the Name of the Study. For your Study, provide the Name of your Study (study__v) record. If your study’s name contains a whitespace character, you must use the value with the whitespace in your manifest file, but then you must replace the whitespace with an underscore (_) in your data files.

study
Site

In this column, provide the Name (site number) of the Site (from the Site record in Vault EDC).

If you don’t define a Site column in the manifest file and exclude the column from the CSV file, then Workbench can match based on the Subject column. Workbench expects Subject IDs to be unique at the Study level when doing so.

site
Subject

In this column, provide the Subject ID for the subject (from the Subject record’s Name field in Vault EDC).

subject
Event In this column, provide the Event (visit) associated with the data row. Depending on the edc_matching configuration in your manifest file, you can use the Name or External ID of the EDC Event Definition, the Event Date of the EDC Event, or use Events in this column to have Workbench create events separate from the EDC schedule. You can also set a default Event for all rows in the manifest file and exclude this column entirely, which assigns all rows to that Event. Learn more about event matching below. event
Form
*Optional
In this column, provide the Form associated with the data row (from the Name of the Form Definition record in Vault EDC). This column is only needed if you’re importing data for multiple Forms in a single CSV. If you don’t include this column, Workbench assumes that each data row is an occurrence of a single Form.

Learn more about form mapping below.
form
Form Sequence
*Required for repeating Forms, optional for non-repeating Forms

The Sequence Number of the Form, if the form is repeating. A repeating Form represents the collection of the same data more than once during a single Event for a single Subject. Then, the Sequence Number uniquely identifies the data row. If there is more than one row (form) for a Subject and Event, Workbench assumes that the form is repeating and uses this column to set the Sequence Number. This column is required if your Form is repeating, but it is optional if your Form is not repeating. In that case, Workbench sets the Sequence Number to “1” for each row by default.

If you don’t include this column for a repeating Form, it can result in a cartesian product, because Workbench defaults Sequence Number to “1”, resulting in rows with the same Subject/Site/Form/Form Sequence Number identifier.

With the 20R1 release, we renamed “sequence” as “formsequence”. You can continue to import data using “sequence” until the 20R2 release (August 2020).

formsequence
Item Group
*Optional
In this column, provide the Item Group associated with the Item columns after it (from the Name of the Item Group Definition record in Vault EDC). This column is only needed if you’re importing data for forms with multiple Item Groups. If you don’t include this column, Workbench assumes that all Items in the data row are in a single Item Group.

Learn more about item group mapping below
itemgroup
Item Group Sequence
*Required for repeating Item Groups, optional for non-repeating Item Groups

The Sequence Number of the Item Group, if the item group is repeating. A repeating Item Group represents the collection of the same data more than once during a single Form for a single Subject. Then, the Sequence Number uniquely identifies the data row. If there is more than one set of Items in a form row for a Subject and Event, Workbench assumes that the form is repeating and uses this column to set the Sequence Number.

If you don’t include this column for a repeating Item Group, it can result in a cartesian product, because Workbench defaults Sequence Number to “1”, resulting in rows with the same Subject/Site/Form/Form Sequence/Item Group/Item Group Sequence Number identifier.

For non-repeating Item Groups, this column is optional, so you may leave it blank or not provide it if none of your item groups are repeating. In that case, Workbench sets the Sequence Number to “1” for each row by default.

itemgroupsequence
Row ID
*Optional
21R2 & Earlier
Provide a list of columns that, when their values are combined, Workbench can use to uniquely identify a row. This is useful when you don’t have a numerical key to identify a sequence. For example, when importing lab data, you could use Study, Subject, and Lab Test to uniquely identify rows. If you include a rowid column mapping, Workbench ignores any values for sequence and automatically sets the sequence to “1”. If omitted, Workbench uses the default array of columns to identify rows: study, subject, event, form, and formsequence.

While providing a single list of columns for rowid is still supported, it is planned for removal in a future release. Instead, define groupid and distinctid as lists of columns for the rowid. See the row below.
rowid
Row ID
*Optional
21R3 & Later

Group ID: Provide a list of columns that, when their values are combined, Workbench can use to group records together beyond Study, Site, Subject, and Event. Workbench uses the groupid, combined with the distinctid, to automatically increment the Sequence Number of records.

Distinct ID: Provide a list of columns that, when their values are combined, Workbench can use to uniquely identify a record within the context of the group (as defined by the groupid and the Study, Site, Subject, and Event). Workbench assigns an incremental Sequence Number to each uniquely identified repeating Form or Item Group (by distinctid) within the group (by groupid).

Row External ID: You can assign an External ID to a data row, like the external ID from EDC. This can help you identify the source of a query on third party data. Provide the column that contains the value for External ID for the row in rowexternalid. CQL returns this value in core query listings (@QRY.RowExternalID).

rowid with groupid, distinctid, and rowexternalid

Here is an example CSV file:

STUDY_ID SITE SUBJECT_ID VISIT INITIALS DOB GENDER RACE
S.Deetoza 101 101-1001 Screening CDA 03-27-1991 F Hispanic

In this example, we have the following column mappings:

  • Study: STUDY_ID
  • Site: SITE
  • Subject: SUBJECT_ID
  • Event: VISIT

The INITIALS, DOB, GENDER, and RACE columns are all data items.

Manifest File

Your manifest file provides the Study and Source for CDB, lists the files in your ZIP, and for each file, lists which columns map to the required data points.

Source is a custom defined value that allows you to identify the contents of the package. This value is stored in the Source field in Workbench, so you can use it to identify data from this package in Workbench via CQL. The Source value must be unique within your Study.

For your Study, provide the Name of your Study (study__v) record. For your Study, provide the Name of your Study (study__v) record. If your study’s name contains a whitespace character, you must use the value with the whitespace in your manifest file, but then you must replace the whitespace with an underscore (_) in your data files.

For Source, specify the source of the data. Vault applies this source to all imported data. Then, as an array, list each CSV file, including its filename (with the extension) and its column mappings, as the value for “data”.

You can choose to include configuration metadata about each data item in your import file. See details below.

Save this file as “manifest.json”.

Note that Workbench will only accept manifest files with this exact filename (case-sensitive) and extension.

Example Manifest: Single Form

Here is an example manifest file for the Deetoza study that imports a package from an eCOA containing the Survey form:

{
  "study": "Deetoza",
  "source": "eCOA",
  "data": [
    {
      "filename": "Survey.csv",
      "study": "protocol_id",
      "site": "site_id",
      "subject": "patient",
      "event": "visit_name"
    }  
  ]   
}

Example Manifest: Multiple Forms

Here is an example manifest file for the Deetoza study that imports a package from a laboratory vendor, containing the Chemistry and Hematology forms:

{
  "study": "Deetoza",
  "source": "lab",
  "data": [
    {
      "filename": "Chemistry.csv",
      "study": "STUDY_ID",
      "site": "SITE_ID",
      "subject": "SUBJECT_ID",
      "event": "VISIT",
      "formsequence": "LAB_SEQ"
    },
    {
      "filename": "Hematology.csv",
      "study": "STUDY_ID",
      "site": "SITE_ID",
      "subject": "SUBJECT_ID",
      "event": "VISIT",
      "formsequence": "LAB_SEQ"
    }
  ]
}

You could also use a Row ID mapping and import Chemistry and Hematology lab data using a single CSV file. In that CSV file, you could use a Lab Test Set column to indicate Chemistry or Hematology for the row.

{
  "study": "Deetoza",
  "source": "lab",
  "data": [
    {
      "filename": "Labs.csv",
      "study": "STUDY_ID",
      "site": "SITE_ID",
      "subject": "SUBJECT_ID",
      "event": "VISIT",
      "rowid": ["LAB_TEST_SET", "LAB_TEST_SEQ"]
    }
  ]
} 

Optional: Event Matching

By default, Workbench matches the Event from the form’s CSV file to the Name of the matching Event Definition record in Vault EDC. You can also choose to have Workbench match the Events based on the event definition’s External ID. You can also define a default Event in the manifest file and automatically assign all rows to that Event. If your data was collected outside the event scheduled in EDC, you can choose to not map to any EDC event and have Workbench create events as needed to match those in your import package.

If your Event is within a repeating Event Group, Workbench uses the event definition’s External ID and appends a sequence number to it for matching. The sequence number increments for each unique Event in the file. If an Event is reused in more than one Event Group, it isn’t considered repeating. Instead, Workbench matches it to the first matching Event in the EDC schedule.

Default Behavior

If you don’t include any event matching configuration in your manifest file, the following defaults apply:

  • Workbench matches using the Name of the Event Definition.
  • Workbench attempts to match with existing EDC Events. If there is no matching Event in EDC, Workbench creates a new Event.

This default behavior is equivalent to the manifest configuration below:

{
  "study": "Deetoza",
  "source": "Labs", 
  "edc_matching": {
    "event": {
      "target": ["name"],
      "generate": true
    } 
  }
}

Match on Name (Default)

If you don’t include edc_matching in your manifest file, Workbench automatically matches using the Name of the Event Definition. You can also specify Name-based batching in the manifest file.

{
  "study": "Deetoza",
  "source": "Labs", 
  "edc_matching": {
    "event": {
      "target": ["name"]
    } 
  }
}

Match on External ID

To match on the External ID (formerly known as OID) of the Event Definition, you must include edc_matching and set the event target to external_id. See the example excerpt below:

{
  "study": "Deetoza",
  "source": "Labs", 
  "edc_matching": {
    "event": {
      "target": ["external_id"]
    } 
  }
}

Set a Default Event

You can choose an Event as the default Event from the manifest file, instead of listing an Event in the CSV. Then, Workbench automatically assigns all rows in that CSV file to that Event. This is controlled by the default key. Use the Name of the Event Definition for default. You can set a default for a specific data file or set a default at the package level.

Example: Package-level Default Event

{
  "study": "Deetoza",
  "source": "lab",
  "edc_matching": {
     "event": {
       "default": "treatment_visit"
     }
   },
  "data": [
    {
      "filename": "Labs.csv",
      "study": "STUDY_ID",
      "site": "SITE_ID",
      "subject": "SUBJECT_ID",
      "event": "VISIT",
      "rowid": ["LAB_TEST_SET", "LAB_TEST_SEQ"]
    }
  ]
} 

Example: File-level Default Event

{
  "study": "Deetoza",
  "source": "Labs",
  "edc_matching": {
    "event": {
      "default": "treatment_visit",
    }
  }
}

Handling of Unmatched Events

By default, if Workbench can’t match an Event to one in EDC, either by Name or External ID, Workbench creates new Events for any non-EDC events that are specific to the Source. These new Events become part of the overall Workbench Header Event record (accessible via CQL from @HDR.Event). This is controlled by the generate key. The manifest configuration of the default behavior is generate: true. If your data was collected outside the EDC event schedule, you can choose to not match with any EDC events by setting generate to false. If you set generate to false, Workbench won’t create new Event Definitions. Instead, import will fail for any rows without matching EDC Events.

Example: Match on External ID and Ignore Unmatched Events

{
  "study": "Deetoza",
  "source": "Labs",
  "edc_matching": {
    "event": {
      "target": ["external_id"],
      "generate": false
    }
  }
}

If your data was collected entirely outside the EDC schedule, you can choose to have Workbench create new Events for each Event, without trying to match to an EDC Event. To do so, set event to false, as shown in the example below:

{
  "study": "Deetoza",
  "source": "Labs", 
  "edc_matching": {
    "event": false
  }
}

Optional: Form & Item Group Mapping

You can choose to map rows in your CSV to different Forms and Item Groups, including defining how Workbench interprets repeating forms nad item groups within your data set. This allows Workbench to transform a single CSV into multiple Forms and Item Groups, instead of a single Form and single Item Group.

For example, a Labs form could contain multiple Item Groups - one Item Group per each lab test category. You would use an itemgroup column to indicate which Item Group the data items in the row belong to. Workbench treats all unmapped columns in the row as items. Workbench imports those items within the row’s specified item group. You can specify additional metadata for those columns in the item configuration. See details below.

Use the table below to understand how Workbench will handle your data with different Form and Item Group configuration scenarios:

Form Defined Item Group Defined Result
No No Workbench treats each row as one occurrence of a single Form.
Yes No Workbench treats each row as a different Form, identifying the Form from the form column. Each unique value in the form column is handled as a different Form.
No Yes Workbench treats each row as an instance of an Item Group (as defined in the itemgroup column) and groups these rows together by Subject and Form. To treat each Item Group as a repeating Item Group, use an itemgroupsequence column.
Yes Yes Workbench groups rows together by Subject, form, and itemgroup. Each unique value in the form column is handled as a different Form, and each unique value in the itemgroup column is an Item Group.

Example Manifest: Multiple Forms

In the example below, the “Labs.csv” file contains multiple Forms. These are identified in the form column of the “Labs.csv” file.

{
  "study": "Deetoza",
  "source": "lab",
  "data": [
    {
      "filename": "Labs.csv",
      "study": "protocol_id",
      "site": "site_id",
      "subject": "subject_id",
      "event": "visit",
      "form": "form"
    }
  ]
}

Labs.csv

Example Manifest: Multiple Item Groups

In the example manifest file below, the Labs form has multiple Item Groups. These are identified in the lab_category column of the “Labs.csv” file.

{
  "study": "Deetoza",
  "source": "lab",
  "data": [
    {
      "filename": "Labs.csv",
      "study": "protocol_id",
      "site": "site_id",
      "subject": "subject_id",
      "event": "visit",
      "item_group": "lab_category"
    }
  ]
}

Labs.csv

Optional: Item Configuration

Within your manifest file, you can include Item metadata to inform how CDB Workbench handles data items. You can specify a data type for each Item, as well as additional properties depending on your data type selection.

When item configuration is omitted, CDB Workbench treats all items as text.

Simple vs. Advanced Configuration

The item “config” object has two configuration formats, simple and advanced. For simple configuration, you specify only the item’s data type. The item’s properties use the default values. For advanced configuration, you specify the data type and properties.

Simple:
"items" {
    "Weight": "float"
}
Advanced:
"items" {
    "Weight": {
        "type": "float",
        "precision": 2
    }
}

Available Properties by Data Type

There are different configuration properties available for each data type.

Data Type Example Configuration Property Property Description
text
"items": {
    "Subject_Initials": {
        "type": "text",
        "length": 3
    }
}
Length The number of characters allowed. The default length is 1500.
integer
"items": {
   "Actual_Dosage": {
      "type": "integer",
      "min": 1,
      "max": 1000
   }
}
Min The minimum (lowest) allowable value. The default is -4,294,967,295.
Max The maximum (highest) allowable value. The default is 4,294,967,295.
float
"items": {
    "Weight": {
        "type": "float",
        "length": 4,
        "precision": 1,
        "min": 80,
        "max": 400
    }
}
Length

The maximum number of digits allowed, both to the left and to the right of the decimal point.

If the maximum value has more digits than this property, Workbench uses that number instead.

Precision

The number of decimal places allowed. The default is 5.

If the maximum value has more decimal places than this property, Workbench uses that number instead.

Min The minimum (lowest) allowable value. The default is -4,294,967,295.
Max The maximum (highest) allowable value. The default is 4,294,967,295.
date
"items": {
    "Start_Date": {
        "type": "date",
        "format": "yyyy-MM-dd"
    }
}
Format

The format pattern with which to parse date values. See a list of supported date formats below.

The default is "yyyy-MM-dd".

datetime
"items": {
    "End_DateTime": {
        "type": "datetime",
        "format": "MM-dd-yyyy HH:mm"
    }
}
Format

The format pattern with which to parse datetime values. See a list of supported date and time formats below.

The default is "yyyy-MM-dd HH:mm".

time
"items": {
    "Time_Collected": {
        "type": "time",
        "format": "HH:mm:ss"
    }
}
Format

The format pattern with which to parse time values. See a list of supported time formats below.

The default is "HH:mm".

boolean
"items": {
    "Exam_Performed": "boolean"
}
N/A

The boolean data type doesn't have any configuration properties.

Workbench accepts "true"/"false", "yes"/"no", and "1"/"0" for boolean values.

Supported Date & Time Format Patterns

The format patterns listed below are available for date, datetime, and time items. For datetime items, combine a date and time pattern, for example, “yy-MM-dd HH:mm”. If you use a time format pattern for a date, or vice versa, your import will fail with an error (D-012).

Format Pattern Example Description
dd MM yy 02 18 20 2-digit day, 2-digit month, and 2-digit year, using a whitespace ( ) delimiter
dd MM yyyy 02 18 2020 2-digit day, 2-digit month, and full year, using a whitespace ( ) delimiter
dd MMM yyyy 02 Feb 2020 2-digit day, Abbreviated month (text), and full year, using a whitespace ( ) delimiter
dd-MM-yy 18-02-20 2-digit day, 2-digit month, and 2-digit year, using a dash (-) delimiter
dd-MM-yyyy 18-02-2020 2-digit day, 2-digit month, and full year, using a dash (-) delimiter
dd-MMM-yyyy 18-Feb-2020 2-digit day, Abbreviated month (text), and full year, using a dash (-) delimiter
dd-MMM-yyyy HH:mm:ss 18-Feb-2020 12:10:50 2-digit day, Abbreviated month (text), and full year, using a dash (-) delimiter, with the time including seconds
dd/MMM/yy 18/Feb/20 2-digit day, Abbreviated month (text), and 2-digit year, using a forward slash (/) delimiter
dd-MMM-yy 18-Feb-20 2-digit day, Abbreviated month (text), and 2-digit year, using a dash (-) delimiter
ddMMMyy 18Feb20 2-digit day, Abbreviated month (text), and 2-digit year (no delimiter)
dd MMM yy 18 Feb 20 2-digit day, Abbreviated month (text), and 2-digit year, using a whitespace ( ) delimiter
dd.MM.yy 18.02.20 2-digit day, 2-digit month, and 2-digit year, using a period (.) delimiter
dd.MM.yyyy 18.02.2020 2-digit day, 2-digit month, and full year, using a period (.) delimiter
dd/MM/yy 18/02/20 2-digit day, 2-digit month, and 2-digit year, using a forward slash (/) delimiter
dd/MM/yyyy 18/02/2020 2-digit day, 2-digit month, and full year, using a forward slash (/) delimiter
ddMMMyyyy 18Feb2020 2-digit day, Abbreviated month (text), and full year (no delimiter)
ddMMyy 180220 2-digit day, 2-digit month, and 2-digit year (no delimiter)
ddMMyyyy 18022020 2-digit day, 2-digit month, and full year (no delimiter)
MM/dd/yy 02/18/20 2-digit month, 2-digit day, and 2-digit year, using a forward slash (/) delimiter
MM/dd/yyyy 02/18/2020 2-digit month, 2-digit day, and full year, using a forward slash (/) delimiter
MMddyy 021820 2-digit month, 2-digit day, and 2-digit year (no delimiter)
MMddyyyy 02182020 2-digit month, 2-digit day, and full year (no delimiter)
MMM dd yyyy Feb 18 2020 Abbreviated month (text), 2-digit day, and full year, using a whitespace ( ) delimiter
MMM/dd/yyyy Feb/18/2020 Full month (text), 2-digit day, and 4-digit year, using a forward slash (/) delimiter
MMMddyyyy Feb182020 Full month (text), 2-digit day, and 4-digit year (no delimiter)
yy-MM-dd 20-02-18 2-digit year, 2-digit month, and 2-digit day, using a dash (-) delimiter
yy/MM/dd 20/02/18 2-digit year, 2-digit month, and 2-digit day, using a forward slash (/) delimiter
yyyy MM dd 2020 02 18 Full year, 2-digit month, and 2-digit day, using a whitespace ( ) as a delimiter
yyyy-MM-dd 2020-02-18 Full year, 2-digit month, and 2-digit day, using a dash (-) as a delimiter
yyyy-MM-dd'T'HH:mm 2020-02-18T12:10 Full year, 2-digit month, and 2-digit day, using a dash (-) as a delimiter with the time
yyyy.dd.MM 2020.18.02 Full year, 2-digit day, and 2-digit month, using a period (.) as a delimiter
yyyy.MM.dd 2020.02.18 Full year, 2-digit month, and 2-digit day, using a period (.) as a delimiter
yyyy/MM/dd 2020/02/18 Full year, 2-digit month, and 2-digit day, using a forward slash (/) delimeter
yyyyMMdd 20200218 Full year, 2-digit month, 2-digit day (no delimiter)
yyyyMMdd'T'HH:mm 2020218T12:10 Full year, 2-digit month, 2-digit day (no delimiter) with the time
dd/MM/yyyy HH:mm 18/02/2020 18:30 2-digit day, 2-digit month, 4-digit year, using a slash (/) as a delimiter, and the 24-hour time
MM/dd/yyyy HH:mm 02/18/2020 18:30 2-digit month, 2-digit day, 4-digit year, using a slash (/) as a delimiter, and the 24-hour time
yyyy-MM-dd'T'HH:mm:ss+HH:mm 2020-02-18T18:30:22+00:00

4-digit year, 2-digit month, 2-digit day, using a dash (-) as a delimeter, and the 24-hour time with seconds, with the time offset from UTC (+HH:mm)

Note: The manifest file should surround the T with single quotes ('), but do not include those single quotes in the CSV file.

yyyy-MM-dd'T'HH:mm:ssZ 2020-02-18T18:30:22Z

4-digit year, 2-digit month, 2-digit day, using a dash (-) as a delimeter, and the 24-hour time with seconds in UTC time

Note: The manifest file should surround the T with single quotes ('), but do not include those single quotes in the CSV file.

yyyyMMdd'T'HH:mm:ssZ 20200218T18:30:22Z

4-digit year, 2-digit month, 2-digit day, and the 24-hour time with seconds in UTC time

Note: The manifest file should surround the T with single quotes ('), but do not include those single quotes in the CSV file.

yyy-MM-dd'T'HH:mm:ss 2020-2-18T12:10:41 Full year, 2-digit month, 2-digit day using a dash (-) as a delimiter with the time, including seconds

Note: The manifest file should surround the T with single quotes ('), but do not include those single quotes in the CSV file.

ddMMyyyy'T'HH:mm:ss 10122024T16:15:30 2-digit day, 2-digit month, 4-digit year, with no delimiter, and the 24-hour time

Note: The manifest file should surround the T with single quotes ('), but do not include those single quotes in the CSV file.

yyyyMMdd HH:mm:ss 20241210 16:15:30 4-digit year, 2-digit month, 2-digit day, with no delimiter, and the 24-hour time including seconds
MM/dd/yyyy HH:mm:ss 12/10/2024 16:15:30 2-digit month, 2-digit day, 4-digit year, with a slash (/) delimiter, and the 24-hour time including seconds
yyyy-MM-dd'T'HH:mm:ss 2024-12-10T16:15:30 4-digit year, 2-digit month, 2-digit day, with a dash (-) delimiter, and the 24-hour time

Note: The manifest file should surround the T with single quotes ('), but do not include those single quotes in the CSV file.

HH:mm 18:30 24-hour time
HH:mm:ss 18:30:15 24-hour time with seconds

Example Package: Single Form, Item Metadata

Here is an example import package from Verteo Pharma’s randomization vendor, containing the Randomization form with item metadata:

manifest.json:

{
  "study": "Cholecap",
  "source": "IRT",
  "data": [{
    "filename": "Randomization.csv",
    "study": "protocol_id",
    "site": "site_id",
    "subject": "patient",
    "event": "visit_name",
    "items": {
      "randomization_number": {
        "type": "integer",
        "length": "14"
      },
      "date_of_randomization": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }]
}

Randomization.csv:

Randomization.csv for the manifest.json file above

Strict Import

The Strict Import parameter in the manifest file allows you to restrict data ingestion to only those items defined in the manifest, instead of all item columns in your data CSV files. You can apply this option to the entire package or to individual files. File-level settings override the package-level setting.

The "strict_import" parameter accepts true or false. Set this to true to enable strict import.

Optional: Restricting Data (Blinding)

You can restrict (blind) data at the Item, row, listing data file, and Source levels to hide it from users who don’t have permission to access restricted data. For example, your study may only order certain labs for subjects who are using the study drug. Knowing which labs a subject has would unblind the subject. You can restrict this information to prevent blinded users from identifying those subjects.

Restriction Level Manifest File Configuration
Item
"items": {
  "itemname": {
    "type": "text",
    "blinded": true
  }
}
Form Record *To restrict data for a specific form record that is being imported. Use the column that describes if that row is blinded or not.
{
  "study": "Deetoza",
  "source": "Labs"
  {
    ...
    "event": "visit",
    "row_blinded": "column_id"
  }
}
Listing Data File
{
  "study": "Deetoza",
  "source": "Labs"
  {
    "filename": "chempanel.csv",
    "blinded": true,
    ...
  }
}
Source Package
{
  "study": "Deetoza",
  "source": "Labs",
  "blinded": true
  ...
}

For users who have access to restricted data (typically lead data managers), restricted data behaves the same way as unrestricted data. For blinded users (users without the Restricted Data Access permission), the following behavioral rules apply to any imported restricted data:

  • If an Item (column) is restricted:
    • The CQL projection doesn’t return the restricted item’s column.
    • The CQL projection doesn’t return any derived columns that reference the restricted item.
    • If a blinded user references the restricted Item in their CQL statement, CQL still doesn’t return the column.
    • SHOW and DESCRIBE do not return the restricted Item.
  • If a row is restricted:
    • The result set doesn’t return any rows from the Form or Item Group.
  • If a listing file (csv) is restricted:
    • The default @HDR columns are included in the listing, but no Item columns are included.
  • If a Source (package) is restricted:
    • CQL doesn’t return any Items or column results from the restricted Source in any listing.
    • CDB marks all Item Definitions, Item Group Definitions, and Form Definitions within the Source as restricted.
    • All data rows are marked as restricted.
    • The default @HDR columns will still appear in the core listings.

ZIP Package

Once you’ve finished creating your manifest file and your CSVs, zip the files together. Don’t place these files in a folder before zipping them. You can name this ZIP folder whatever you’d like. However, we recommend that you name it with a unique identifier, for example, “Study-Name_Source_datetime.zip”. Don’t include any folders in the ZIP folder. All CSV files and the manifest file must be at the same level.

A ZIP file for the Deetoza study's laboratory import

Accessing your Vault’s FTP Server

Each vault in your domain has its own FTP staging server. The FTP server is a temporary storage area for files you’re uploading to or extracting from Vault.

Server URL

The URL for each staging server is the same as the corresponding vault, for example, veepharm.veevavault.com.

How to Access FTP Servers

You can access your staging server using your favorite FTP client or through the command line.

Use the following settings with an FTP client:

  • Protocol: FTP (File Transfer Protocol)
  • Encryption: Require explicit FTP over TLS (FTPS). This is a security requirement. Your network infrastructure must support FTPS traffic.
  • Port: This does not typically need to be added and will default to Port 21.
  • Host: {DNS}.veevavault.com. For example: “veepharm” is the DNS in veepharm.veevavault.com.
  • User: {DNS}.veevavault.com+{USERNAME}. This uses the same user name that you log in with. For example: veepharm.veevavault.com+tchung@veepharm.com.
  • Password: Your login password for this vault. This is the same password used for your standard login.
  • Login Type: Normal
  • Transfer File Type: Transfer files as binary

If you are experiencing issues uploading large files, increase the FTP client timeout setting to 180 seconds.

If you have remote verification enabled on a proxy or a firewall, FTP traffic from computers on your network to Veeva FTP servers might be refused. If possible, work with your IT department to disable remote verification. If it cannot be disabled, contact Veeva Support.

FTP Directory Structure

Inside your user directory, there is a “workbench” directory. This is where you will upload your 3rd party data. CDB is automatically aware of any files that you put here.

Directory for workbench

CDB moves any successfully imported files into “workbench/_processed”. See details below.

Importing the Data

To import your data, upload your ZIP file to the “workbench” directory of your FTP staging server, using an FTP client of your choice. Once you upload your ZIP to the FTP staging server, CDB imports it and transforms your data.

When import is finished, Workbench sends you, as well as any other users subscribed to the Source, an email notification. If the reprocessing of a package results in a change from the previous load, Workbench will also send a notification to you and users subscribed to that Source.

Successful Import

Upon successful import:

  • CDB creates all definition records and the relationships between them.
  • CDB automatically sets the Source field with the value provided in your manifest on all imported records to uniquely identify the data source. (Any Forms imported from Vault EDC automatically have this value set to “EDC”.)
  • CDB creates a Core Listing for each unique Form in your import package. See details below.
  • CDB moves your import ZIP file from “workbench” into “workbench/_processed/{study}/{source}” (for failed imports, your ZIP file is moved into the “workbench/_error”). CDB also appends the date and time of import to the filename.

You can now view your listing in the Workbench. In the current release, Workbench doesn’t show your new listings right away. First, navigate to your study’s Listings page, click to open another Core Listing, and then return to the Listings page. Then, your new data listings will display in the list. You can now click on the Name of one of your listings to open it.

Failed Import

Upon failed import:

  • CDB appends the date and time of the attempted import to the ZIP file’s filename.
  • CDB moves your import ZIP file from “workbench” into “workbench/_error”.
  • CDB creates an error log (“<import datetime>_<package name>_errors.csv”).

See this list of possible errors and how to resolve them.

Error Limit: Import logs only capture up to 10,000 errors and warnings and will stop recording after meeting this threshold.

Review & Approve Imports with Changes

To ensure that data being imported into CDB doesn’t change from the approved format and structure, CDB detects any changes to package configurations from one load to the next for each third party Source.

If the system detects changes in configuration at the package, file, or blinding level, or changes in the CSV file structure, CDB pauses the import process. The package isn’t processed and enters the Paused status until a user with approval rights either approves or rejects the package. If the user approves the package, CDB records the approval reason and imports the data package. Once a package is approved, CDB notifies subscribers to the approved change via email. CDB displays a change indicator on all associated Listings and Views. Users are able to dismiss this indicator and download the form change log for any listing or view. If the user rejects the package, CDB records the rejection reason, marks the package as Rejected, and assigns it the Not Imported status. CDB then reverts to the last successfully imported data package for the Source.

When a package is in the Paused status, all other uploaded packages for that same Source enter a queue and will wait for the paused package to be approved or rejected. Once the paused package is approved or rejected, CDB skips to the last queued package for processing. Any packages between the last paused package and the last package in the queue aren’t processed. For example, if “Package 1” is uploaded and paused, then packages 2-5 are uploaded while “Package 1” is still in the Paused state, once “Package 1” is approved or rejected, CDB will skip to “Package 5”, leaving packages 2-4 unprocessed.

For the first uploaded package in a new Source, CDB automatically applies the Paused status, and a user must approve or reject that first package before any data from that Source appears in the system.

For packages requiring approval, Workbench now includes a Package Detail panel with tabs for Differences and Associated Objects. The Differences tab displays the changes in the manifest between the current and previous packages. Users can review the previous and current value of the changes that they’re approving. The Associated Objects tab displays the Export Definitions, Listings, and Views that the changes in the package may potentially impact.

To approve or reject an import package requires the Approve Import permission, which is by default assigned to the standard CDMS Super User and CDMS Lead Data Manager study roles.

Approve an Import

To approve an import:

  1. Navigate to Import > Packages for your Study.
  2. Filter the list of import packages by Pending Approvals.
  3. From the Package menu (), select View Package Details.

  4. In the Package Details panel, click Differences to show the changes.
  5. Review the changes.
  6. Click Approve Package.
  7. Optional: Enter a Reason.
  8. Click Approve.

The package is approved and will now enter the queue for processing.

Reject an Import

To reject an import:

  1. Navigate to Import > Packages for your Study.
  2. Filter the list of import packages by Pending Approvals.
  3. From the Package menu (), select View Package Details.

  4. In the Package Details panel, click Differences to show the changes.
  5. Review the changes.
  6. Click Reject Package.
  7. Optional: Enter a Reason.
  8. Click Reject.
  9. In the Reject Package confirmation dialog, click Confirm.

Review Associated Objects

You can review the associated objects for an import package from the Associated Objects tab of the Package Details panel. This tab lists the Export Definitions, Listings, and Views that the changes in the package may potentially impact.

To access this tab, open the Package Details panel and then click to open the Associated Objects tab.

Workbench shows a change icon (orange circle) badge on the form pill for each changed Form.

For any files that are mapped in the “form” attribute of the manifest file, the Associated Objects tab won’t display any objects potentially impacted in that imported file.

Viewing Import Status

You can check the status of your import package from Import > Packages. This page lists the status of every import package, from both Vault EDC and 3rd party tools. You can also download import packages and issue logs (errors and warnings) from this page.

Complete Status: For an import package to move into the Complete import status, a Workbench user in your Study must open a listing. Otherwise, the import stays in the In Progress status. If your Study has the auto swap feature enabled, this isn’t required.

Users without the Restricted Data Access permission can download the import package log, but they can’t download the data file. Users with restricted data access can download a package with blinded data.

Any time you import a package into Workbench, Workbench automatically reprocesses the most recent packages for all other sources. For example, when your nightly Workbench Export EDC job runs, and then imports into Workbench, Workbench reprocesses your most recent Lab Data and Imaging packages as well. For earlier packages from the same Source, Workbench marks them as Replaced by the newer package.

You can easily filter the list to show only complete or failed imports using the Import Status Filters. Click Error to show only failed imports, or click Complete to show completed imports.

Workbench Import Statuses

When an import package is able to import with only warnings, Workbench highlights the status in orange to indicate that there are warnings. When import finishes, you can download the issue log to review the warnings.

Status Description
Queued The package is in the processing queue. There is a package ahead of this one in line that also contains changes, and this package is waiting for a paused package to be approved or rejected.
Paused CDB detected a change in the manifest, and so import is paused until you approve or reject the package.
Approved The change in the manifest was approved. CDB will now import the package.
Rejected The change in the manifest was rejected.
Skipped The package was skipped and not imported. Another package was imported for the Source before this package was processed. This status can only apply to third party packages.
In Progress The import process has begun for this package, and Workbench has not identified any errors or warnings.
In Progress (with warnings) The import process is ongoing, but Workbench has identified a warning.
Error Import failed because there are one or more errors in the import package. Download the issue log and review the errors.
Complete Workbench imported the package successfully, with no errors or warnings.
Complete (with warnings) Workbench imported the package successfully, but there are one or more warnings. Download the issue log and review the warnings.
Not Imported Workbench skipped this package because a newer package for the same source was uploaded before processing began. When a package enters the Not Imported status, Workbench also replaces the processing date with “Replaced”.
Reprocess in Progress Workbench has begun reprocessing this package because a new package from another source was imported.
Reprocessed Complete Workbench finished reprocessing this package with no errors or warnings.
Reprocessed Complete (with warnings) Workbench finished reprocessing this package, but there are one or more warnings. Download the issue log and review the warnings.
Reprocessed Error Reprocessing failed because there are one or more errors in the import package. Download the issue log and review the errors.

Download the Import Package

To download the import package:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. Click the Package link. Click to download the import package

  4. Extract the files from the ZIP folder and view them in a tool of your choosing.

Download the Logs

You can download the Import Log (CSV) for any import and the Issue Log (CSV) for an import that fails. The Import Log lists details about the import job and the data ingestion into Workbench.

The import log lists the following:

  • Transformation Start Time
  • Transformation Completion Time
  • Transformation Duration
  • Import Start Time
  • Import Completion Time
  • Import Duration

To download the import log:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. From the Package () menu, select View Package Details.
  4. In the Package Details panel, click Issues.
  5. Optioanl: In the Issues tab, click Download () to download a CSV of the issue log.

Issue Log

The Issue Log lists all errors and warnings that Workbench encountered while importing the package.See a list of possible errors and warnings here.

To view the issue log:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. From the Package () menu, select View Package Details.
  4. In the Package Details panel, click Issues.
  5. Optional: In the Issue Log panel, click Download () to download a CSV of the log.

To download the issue log without first viewing it in the application:

  1. Navigate to Import > Packages.
  2. Locate your import package in the list.
  3. From the Package () menu, select Download Issue Log.

Viewing Imported Data

Upon upload, Workbench creates a Form for each unique file in the import package. Workbench automatically generates a Core Listing for each unique Form in a Study, whether that form comes from Vault EDC or is imported from a 3rd party system.

The default CQL query for these Core Listings is:

SELECT @HDR, * from source.filename

In the laboratory import example above, CDB creates two Core Listings: Chemistry and Hematology (one for each CSV file), using these queries:

Chemistry
SELECT @HDR, * from labs.Chemistry
Hematology
SELECT @HDR, * from labs.Hematology

Definitions

CDB creates a Form Definition for each CSV file to define your imported data as “forms” within the CDB Workbench application. These records use the CSV filename (without the extension) as the Name, for example, “hematology”. CDB also creates an Item Group Definition to group your data items together within the form. CDB names these by prepending “ig_” to the CSV filename (without the extension), for example, “ig_hematology”.

Both of these definitions display as a column within your listing (form_name and ig_name). You can edit the listing’s CQL query to hide these columns.