Skip to content

Variable-level Metadata (Data Dictionaries)

Motivation

Variable level metadata (VLMD), in the form of standardized data dictionaries, provides an exciting opportunity:

  • a way to search, understand, and compare datasets before (potentially sensitive) data is shared.

For an example of this searchability in the context of study level metadata, see the platform's discovery page

  • When data is available, VLMD provides a way to validate the data as well.

  • Supports HEAL projects and goals such as the common data elements program

Functions

extract: Extract the variable level metadata from an existing file with a specific type/format

start: Start a data dictionary from an empty template

validate: Check (validate) an existing HEAL data dictionary file to see if it follows the HEAL specifications after filling out a template or further annotation after extracting from a different format.

Typical workflows for creating a HEAL-compliant data dictionary include:

  1. Create your data dictionary

    (a) Run the vlmd extract command (or convert_to_vlmd if in python) to generate a HEAL-compliant data dictionary via your desired input format

    (b) Run the vlmd template command to start from an empty template.

  2. Add/annotate with additional information in your preferred HEAL data dictionary format (either json or csv).

  3. Run the vlmd validate command with your HEAL data dictioanry as the input to validate.

  4. Repeat (2) and (3) until you are ready to submit. Please note, currently only name and description are required.

Definitions

Important

The main difference* between the CSV and JSON definitions lies in the way the data dictionaries are structured and the additional metadata included in the JSON data dictionary.

The CSV data dictionary is a plain tabular representation with no additional metadata, while the JSON dataset includes fields along with additional metadata in the form of a root description and title.

  • for field-specific differences, see the schemas in the documentation.

For more information on variable-level metadata properties (fields), see the csv field specification and json data dictionary specification.