Importing 3rd Party Data into Data Workbench


Vault allows you to import data from 3rd party systems for cleaning and reporting in the Data Workbench application. This is intended for importing subject data that exists outside of the subject’s Casebook in Vault EDC, for example, data about a Subject from an IRT system.

Prerequisites

Before you can import data into Vault, your organization and your user account must meet the following prerequisites.

The permissions below are required to perform the actions described in this article. Users with the Vault Owner security profile have them by default. If your vault uses custom Security Profiles, your profile must grant the following permissions:

Type | Permissions | Controls
Security Profile | Tabs: Workbench | Ability to access the Data Workbench application (via the Workbench tab)
Security Profile | Objects: Study: Read | Ability to view records in the Study object, which is required for creating records (via import) in Data Workbench
Security Profile | Application: FTP Staging: Access | Ability to access your vault’s FTP staging server
Security Profile | Application: API: Access API | Ability to make an API call, which is required to create and update records via FTP

Preparing Data for Import

You can import your data as a series of CSV files to your Data Workbench vault’s FTP server. To import your data, you’ll first need to create the CSV files, which list your clinical data, and a manifest file (.json), describing your data and how Vault should process it. Then, you’ll create a ZIP package (.zip) containing all of those files for import into Vault.
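For example, the laboratory import described later in this article could be packaged as follows. The ZIP filename is illustrative; it simply follows the naming convention recommended in the ZIP Package section below.

Deetoza_lab_2020-02-18.zip
    manifest.json
    Chemistry.csv
    Hematology.csv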

Data CSVs

Create a CSV for each data collection Form in your study that is independent of EDC. Each CSV must include four (4) required columns (Study, Site, Subject, and Event), an optional Sequence column, and then a column for each data item. You can name the required columns whatever you choose, as long as you define those names in your manifest file for each CSV. Note that these column names are case-sensitive and must exactly match the values provided in your manifest file. Aside from the study, site, subject, event, and sequence columns, Vault considers every column to be a data item and will import it as such.

Column | Description | Manifest File Key
Study | In this column, provide the Name of the Study (from the Study record in Vault EDC). | study
Site | In this column, provide the Name (site number) of the Site (from the Site record in Vault EDC). | site
Subject | In this column, provide the Subject ID for the subject (from the Subject record’s Name field in Vault EDC). | subject
Event | In this column, provide the Event (visit) associated with the data row (from the Name of the Event Definition record in Vault EDC). | event
Sequence (optional) | The Sequence Number of the Form, if the form is repeating. A repeating Form represents the collection of the same data more than once during a single Event for a single Subject; the Sequence Number uniquely identifies each data row. If there is more than one row (form) for a Subject and Event, Workbench assumes that the form is repeating and uses this column to set the Sequence Number. This column is optional: you can leave it blank or omit it if none of your forms are repeating, in which case Workbench sets the Sequence Number to “1” for each row by default. | sequence

Here is an example CSV file:

STUDY_ID | SITE | SUBJECT_ID | VISIT | INITIALS | DOB | GENDER | RACE
S.Deetoza | 101 | 101-1001 | Screening | CDA | 03-27-1991 | F | Hispanic

In this example, we have the following column mappings:

  • Study: STUDY_ID
  • Site: SITE
  • Subject: SUBJECT_ID
  • Event: VISIT

The INITIALS, DOB, GENDER, and RACE columns are all data items.

In the current release, the CSV itself can’t specify data types. Unless you provide item configuration in your manifest file (see Optional: Item Configuration below), Workbench imports all 3rd party data as the “text” data type.

Manifest File

Your manifest file provides the Study and Source for Vault, lists the files in your ZIP, and for each file, lists which columns map to the required data points.

Source is a custom defined value that allows you to identify the contents of the package. This value is stored in the Source field in Workbench, so you can use it to identify data from this package in Workbench via CQL. The Source value must be unique within your Study.

For your Study, provide the Name of your Study (study__v) record in Vault EDC. For Source, specify the source of the data. Vault applies this source to all imported data. Then, as an array, list each CSV file, including its filename (with the extension) and its column mappings, as the value for “data”.

You can choose to include configuration metadata about each data item in your import file. See details below.

Save this file as “manifest.json”. Workbench only accepts manifest files with this exact filename (case-sensitive) and extension.

Example Manifest: Single Form

Here is an example manifest file for the Deetoza study that imports a package from an eCOA containing the Survey form:

{
  "study": "Deetoza",
  "source": "eCOA",
  "data": [
    {
      "filename": "Survey.csv",
      "study": "protocol_id",
      "site": "site_id",
      "subject": "patient",
      "event": "visit_name"
    }  
  ]   
}
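For reference, a Survey.csv consistent with this manifest might look like the following. Only the four mapped column names (protocol_id, site_id, patient, and visit_name) come from the manifest above; the item columns and all data values are purely illustrative.

protocol_id | site_id | patient | visit_name | PAIN_SCORE | FATIGUE
Deetoza | 101 | 101-1001 | Screening | 3 | 2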

Example Manifest: Multiple Forms

Here is an example manifest file for the Deetoza study that imports a package from a laboratory vendor, containing the Chemistry and Hematology forms:

{
  "study": "Deetoza",
  "source": "lab",
  "data": [
    {
      "filename": "Chemistry.csv",
      "study": "STUDY_ID",
      "site": "SITE_ID",
      "subject": "SUBJECT_ID",
      "event": "VISIT",
      "sequence": "LAB_SEQ"
    },
    {
      "filename": "Hematology.csv",
      "study": "STUDY_ID",
      "site": "SITE_ID",
      "subject": "SUBJECT_ID",
      "event": "VISIT",
      "sequence": "LAB_SEQ"
    }
  ]
}

Optional: Item Configuration

Within your manifest file, you can include Item metadata to inform how Data Workbench handles data items. You can specify a data type for each Item, as well as additional properties depending on your data type selection.

When item configuration is omitted, Data Workbench treats all items as text.

Simple vs. Advanced Configuration

Item configuration within the “items” object supports two formats, simple and advanced. With simple configuration, you specify only the item’s data type, and the item’s other properties use their default values. With advanced configuration, you specify the data type and one or more properties.

Simple:
"items": {
    "Weight": "float"
}
Advanced:
"items": {
    "Weight": {
        "type": "float",
        "precision": 2
    }
}

Available Properties by Data Type

There are different configuration properties available for each data type.

There are different configuration properties available for each data type, as shown in the examples below.

Text

"items": {
    "Subject_Initials": {
        "type": "text",
        "length": 3
    }
}

  • Length: The number of characters allowed. The default length is 1500.

Integer

"items": {
    "Actual_Dosage": {
        "type": "integer",
        "min": 1,
        "max": 1000
    }
}

  • Min: The minimum (lowest) allowable value. The default is -4,294,967,295.
  • Max: The maximum (highest) allowable value. The default is 4,294,967,295.

Float

"items": {
    "Weight": {
        "type": "float",
        "length": 4,
        "precision": 1,
        "min": 80,
        "max": 400
    }
}

  • Length: The maximum number of digits allowed, both to the left and to the right of the decimal point. If the maximum value has more digits than this property, Workbench uses that number instead.
  • Precision: The number of decimal places allowed. The default is 5. If the maximum value has more decimal places than this property, Workbench uses that number instead.
  • Min: The minimum (lowest) allowable value. The default is -4,294,967,295.
  • Max: The maximum (highest) allowable value. The default is 4,294,967,295.

Date

"items": {
    "Start_Date": {
        "type": "date",
        "format": "yyyy-MM-dd"
    }
}

  • Format: The format pattern with which to parse date values. See the list of supported date formats below. The default is "yyyy-MM-dd".

DateTime

"items": {
    "End_DateTime": {
        "type": "datetime",
        "format": "MM-dd-yyyy HH:mm"
    }
}

  • Format: The format pattern with which to parse datetime values. See the list of supported date and time formats below. The default is "yyyy-MM-dd HH:mm".

Time

"items": {
    "Time_Collected": {
        "type": "time",
        "format": "HH:mm:ss"
    }
}

  • Format: The format pattern with which to parse time values. See the list of supported time formats below. The default is "HH:mm".

Boolean

"items": {
    "Exam_Performed": "boolean"
}

The boolean data type doesn't have any configuration properties.

Supported Date & Time Format Patterns

The format patterns listed below are available for date, datetime, and time items. For datetime items, combine a date and time pattern, for example, “yy-MM-dd HH:mm”.

Format Pattern | Example | Description
dd MM yy | 18 02 20 | 2-digit day, 2-digit month, and 2-digit year, using a whitespace ( ) delimiter
dd MM yyyy | 18 02 2020 | 2-digit day, 2-digit month, and full year, using a whitespace ( ) delimiter
dd MMM yyyy | 18 Feb 2020 | 2-digit day, abbreviated month (text), and full year, using a whitespace ( ) delimiter
dd-MM-yy | 18-02-20 | 2-digit day, 2-digit month, and 2-digit year, using a dash (-) delimiter
dd-MM-yyyy | 18-02-2020 | 2-digit day, 2-digit month, and full year, using a dash (-) delimiter
dd-MMM-yyyy | 18-Feb-2020 | 2-digit day, abbreviated month (text), and full year, using a dash (-) delimiter
dd.MM.yy | 18.02.20 | 2-digit day, 2-digit month, and 2-digit year, using a period (.) delimiter
dd.MM.yyyy | 18.02.2020 | 2-digit day, 2-digit month, and full year, using a period (.) delimiter
dd/MM/yy | 18/02/20 | 2-digit day, 2-digit month, and 2-digit year, using a forward slash (/) delimiter
dd/MM/yyyy | 18/02/2020 | 2-digit day, 2-digit month, and full year, using a forward slash (/) delimiter
ddMMMyyyy | 18Feb2020 | 2-digit day, abbreviated month (text), and full year (no delimiter)
ddMMyy | 180220 | 2-digit day, 2-digit month, and 2-digit year (no delimiter)
ddMMyyyy | 18022020 | 2-digit day, 2-digit month, and full year (no delimiter)
MM/dd/yy | 02/18/20 | 2-digit month, 2-digit day, and 2-digit year, using a forward slash (/) delimiter
MM/dd/yyyy | 02/18/2020 | 2-digit month, 2-digit day, and full year, using a forward slash (/) delimiter
MMddyy | 021820 | 2-digit month, 2-digit day, and 2-digit year (no delimiter)
MMddyyyy | 02182020 | 2-digit month, 2-digit day, and full year (no delimiter)
MMM dd yyyy | Feb 18 2020 | Abbreviated month (text), 2-digit day, and full year, using a whitespace ( ) delimiter
MMM/dd/yyyy | Feb/18/2020 | Abbreviated month (text), 2-digit day, and full year, using a forward slash (/) delimiter
MMMddyyyy | Feb182020 | Abbreviated month (text), 2-digit day, and full year (no delimiter)
yy-MM-dd | 20-02-18 | 2-digit year, 2-digit month, and 2-digit day, using a dash (-) delimiter
yy/MM/dd | 20/02/18 | 2-digit year, 2-digit month, and 2-digit day, using a forward slash (/) delimiter
yyyy MM dd | 2020 02 18 | Full year, 2-digit month, and 2-digit day, using a whitespace ( ) delimiter
yyyy-MM-dd | 2020-02-18 | Full year, 2-digit month, and 2-digit day, using a dash (-) delimiter
yyyy.dd.MM | 2020.18.02 | Full year, 2-digit day, and 2-digit month, using a period (.) delimiter
yyyy.MM.dd | 2020.02.18 | Full year, 2-digit month, and 2-digit day, using a period (.) delimiter
yyyy/MM/dd | 2020/02/18 | Full year, 2-digit month, and 2-digit day, using a forward slash (/) delimiter
yyyyMMdd | 20200218 | Full year, 2-digit month, and 2-digit day (no delimiter)
HH:mm | 18:30 | 24-hour time
HH:mm:ss | 18:30:15 | 24-hour time with seconds

Example Package: Single Form, Item Metadata

Here is an example import package from Verteo Pharma’s randomization vendor, containing the Randomization form with item metadata:

manifest.json:

{
  "study": "Cholecap",
  "source": "IRT",
  "data": [{
    "filename": "Randomization.csv",
    "study": "protocol_id",
    "site": "site_id",
    "subject": "patient",
    "event": "visit_name",
    "items": {
      "randomization_number": {
        "type": "integer",
        "length": "14"
      },
      "date_of_randomization": {
        "type": "date",
        "format": "yyyy-MM-dd"
      }
    }
  }]
}

Randomization.csv:

Randomization.csv for the manifest.json file above
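A Randomization.csv consistent with the manifest above might contain columns and rows like these. Only the four mapped column names and the two item names come from the manifest; all data values are illustrative.

protocol_id | site_id | patient | visit_name | randomization_number | date_of_randomization
Cholecap | 101 | 101-0001 | Baseline | 10024381 | 2020-02-18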

ZIP Package

Once you’ve finished creating your manifest file and your CSVs, create a ZIP file containing all of those files. You can name this file whatever you’d like. However, we recommend that you name it with a unique identifier, for example, “Study-Name_Source_datetime.zip”.

A ZIP file for the Deetoza study's laboratory import

Accessing your Vault’s FTP Server

Each vault in your domain has its own FTP staging server. The FTP server is a temporary storage area for files you’re uploading to or extracting from Vault.

Server URL

The URL for each staging server is the same as the corresponding vault, for example, veepharm.veevavault.com.

How to Access FTP Servers

You can access your staging server using your favorite FTP client or through the command line.

Use the following settings with an FTP client:

  • Protocol: FTP (File Transfer Protocol)
  • Encryption: Require explicit FTP over TLS (FTPS). This is a security requirement. Your network infrastructure must support FTPS traffic.
  • Port: You don’t typically need to specify a port; it defaults to port 21.
  • Host: {DNS}.veevavault.com. For example: “veepharm” is the DNS in veepharm.veevavault.com.
  • User: {DNS}.veevavault.com+{USERNAME}. This uses the same user name that you log in with. For example: veepharm.veevavault.com+tchung@veepharm.com.
  • Password: Your login password for this vault. This is the same password used for your standard login.
  • Login Type: Normal
  • Transfer File Type: Transfer files as binary

If you are experiencing issues uploading large files, increase the FTP client timeout setting to 180 seconds.

If you have remote verification enabled on a proxy or a firewall, FTP traffic from computers on your network to Veeva FTP servers might be refused. If possible, work with your IT department to disable remote verification. If it cannot be disabled, contact Veeva Support.

FTP Directory Structure

Inside your user directory, there is a “workbench” directory. This is where you will upload your 3rd party data. Vault automatically detects any files that you put here.

Directory for workbench

Vault moves any successfully imported files into “workbench/_processed”. See details below.

Importing the Data

To import your data, upload your ZIP file to the “workbench” directory of your FTP staging server, using an FTP client of your choice. Once you upload your ZIP to the FTP staging server, Vault imports it and transforms your data.
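If you prefer to script the upload rather than use a desktop FTP client, the sketch below shows one way to do it with Python’s built-in ftplib, using the example host and user name from the settings above. The package filename and password are placeholders.

from ftplib import FTP_TLS

HOST = "veepharm.veevavault.com"                       # your vault's DNS
USER = "veepharm.veevavault.com+tchung@veepharm.com"   # {DNS}.veevavault.com+{USERNAME}
PASSWORD = "your-vault-password"                       # same password as your standard login
PACKAGE = "Deetoza_lab_2020-02-18.zip"                 # placeholder name for your ZIP package

# Connect on the default port (21) using explicit FTP over TLS (FTPS)
ftps = FTP_TLS(HOST, timeout=180)
ftps.login(USER, PASSWORD)
ftps.prot_p()  # encrypt the data channel as well as the control channel

# Upload the package into the "workbench" directory so Vault can import it
ftps.cwd("workbench")
with open(PACKAGE, "rb") as f:
    ftps.storbinary("STOR " + PACKAGE, f)

ftps.quit()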

Successful Import

Upon successful import:

  • Vault creates all definition records and the relationships between them.
  • Vault automatically sets the Source field with the value provided in your manifest on all imported records to uniquely identify the data source. (Any Forms imported from Vault EDC automatically have this value set to “EDC”.)
  • Vault creates a Core Listing for each unique Form in your import package. See details below.
  • Vault moves your import ZIP file from “workbench” into “workbench/_processed/{study}/{source}” and appends the date and time of import to the filename. (For failed imports, your ZIP file stays in the root of the “workbench” directory.)

You can now view your listings in the Data Workbench application. In the current release, Workbench doesn’t show new listings right away. First, navigate to your study’s Listings page, click to open another Core Listing, and then return to the Listings page; your new data listings then display in the list. You can then click the Name of one of your listings to open it.

Failed Import

Upon failed import:

  • Vault appends the date and time of the attempted import to the ZIP file’s filename, but it keeps your ZIP file in the root of the “workbench” directory.
  • Vault creates an error log (“<import datetime>_<package name>_errors.csv”).

See this list of possible errors and how to resolve them.

Viewing Imported Data

Upon upload, Workbench creates a Form for each unique file in the import package. Workbench automatically generates a Core Listing for each unique Form in a Study, whether that form comes from Vault EDC or is imported from a 3rd party system.

The default CQL query for these Core Listings is:

SELECT @HDR, * from source.filename

In the laboratory import example above, Vault creates two Core Listings: Chemistry and Hematology (one for each CSV file), using these queries:

Chemistry:
SELECT @HDR, * from lab.Chemistry

Hematology:
SELECT @HDR, * from lab.Hematology

Definitions

Vault creates a Form Definition for each CSV file to define your imported data as “forms” within the Data Workbench application. These records use the CSV filename (without the extension) as the Name, for example, “hematology”. Vault also creates an Item Group Definition to group your data items together within the form. Vault names these by prepending “ig_” to the CSV filename (without the extension), for example, “ig_hematology”.

Both of these definitions display as columns within your listing (form_name and ig_name). You can edit the listing’s CQL query to hide these columns.
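For example, assuming CQL accepts an explicit column list in place of *, a query like the following might limit the Randomization listing from the earlier IRT example to its two data items, hiding the form_name and ig_name columns:

SELECT @HDR, randomization_number, date_of_randomization from IRT.Randomization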