Variable-level Metadata (Data Dictionaries)
Motivation
Variable-level metadata (VLMD), in the form of standardized data dictionaries, provides several opportunities:
- A way to search, understand, and compare datasets before (potentially sensitive) data is shared. For an example of this searchability in the context of study-level metadata, see the platform's discovery page.
- When data is available, VLMD provides a way to validate the data as well.
- Support for HEAL projects and goals such as the common data elements program.
Functions
extract
: Extract the variable-level metadata from an existing file with a specific type/format
start
: Start a data dictionary from an empty template
validate
: Check (validate) an existing HEAL data dictionary file against the HEAL specifications, for example after filling out a template or after further annotating a dictionary extracted from a different format.
Typical workflows for creating a HEAL-compliant data dictionary include:
1. Create your data dictionary:
   (a) Run the vlmd extract command (or convert_to_vlmd if in Python) to generate a HEAL-compliant data dictionary from your desired input format, or
   (b) Run the vlmd template command to start from an empty template.
2. Add/annotate with additional information in your preferred HEAL data dictionary format (either json or csv).
   - To further annotate and use the data dictionary, see the variable-level metadata field property information below.
3. Run the vlmd validate command with your HEAL data dictionary as the input to validate.
4. Repeat (2) and (3) until you are ready to submit. Please note that currently only name and description are required.
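As a rough illustration of the "only name and description are required" rule, the sketch below checks a CSV data dictionary for rows that lack either value. It uses only the Python standard library and is not the vlmd validate command; the helper name missing_required and the example rows are hypothetical, and the real HEAL specification may impose further rules that this sketch ignores.

```python
import csv
import io

# Assumption: these are the only required columns, as stated in the note above.
REQUIRED = ("name", "description")


def missing_required(csv_text):
    """Return (row_number, field) pairs where a required value is absent or blank.

    Hypothetical helper for illustration; not part of the vlmd tooling.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    for row_number, row in enumerate(reader, start=1):
        for field in REQUIRED:
            # A missing column yields None; an empty cell yields "".
            if not (row.get(field) or "").strip():
                problems.append((row_number, field))
    return problems


# Made-up example: the second variable has no description.
example = (
    "name,description\n"
    "participant_id,Unique participant identifier\n"
    "age,\n"
)
print(missing_required(example))
```

A quick pre-check like this can shorten the annotate-and-validate loop, but the vlmd validate command remains the authoritative check.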
Definitions
Important
The main difference* between the CSV and JSON definitions lies in how the data dictionaries are structured and in the additional metadata included in the JSON data dictionary.
The CSV data dictionary is a plain tabular representation with no additional metadata, while the JSON data dictionary includes the fields along with additional metadata in the form of a root description and title.
* For field-specific differences, see the schemas in the documentation.
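To make the structural difference concrete, here is a minimal sketch that wraps the rows of a CSV data dictionary in a JSON object carrying a root title and description. The key names title, description, and fields are assumptions inferred from the prose above, and csv_to_json_dictionary is a hypothetical helper; consult the JSON data dictionary specification for the actual schema.

```python
import csv
import io
import json


def csv_to_json_dictionary(csv_text, title, description):
    """Wrap tabular field rows in a JSON-style object with root metadata.

    Hypothetical helper; the key names are assumed, not taken from the schema.
    """
    # Each CSV row becomes one field entry keyed by the column headers.
    fields = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
    return {"title": title, "description": description, "fields": fields}


# Made-up example content.
csv_text = "name,description\nage,Age in years at enrollment\n"
data_dictionary = csv_to_json_dictionary(
    csv_text,
    title="Example dictionary",
    description="Illustrative only",
)
print(json.dumps(data_dictionary, indent=2))
```

The CSV form carries only the per-variable rows, while the JSON form has a place for dictionary-level metadata alongside the same field entries.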
For more information on variable-level metadata properties (fields), see the csv field specification and json data dictionary specification.