Skip to content

Excel (xlsx) dataset

Excel workbooks contain tabular data tables across named worksheets.

Extract and infer a partially completed data dictionary from excel worksheets. The command line tool will automatically generate one data dictionary per worksheet. Working in python provides a little more flexibility as you can specify specific worksheets and if you want to combine sheets into one data dictionary (rather than one dd per sheet).

Run the vlmd command

To output multiple sheets as separate data dictionaries

from healdata_utils import convert_to_vlmd

convert_to_vlmd(input_filepath="myexcelfile.xlsx",inputtype="excel-data")

To extract multiple sheets as one data dictionary

Note

The parameter multiple_data_dicts is to specify whether multiple data dictionaries should be inferred (one per sheet). The default value is True. Be careful about using the multiple_data_dicts=False. In most instances, one sheet should correspond to one separate data table and thus have one corresponding data dictionary.

Note, this combines (ie concatenates all data tables) and then infers fields. This use case is when sheets are viewed as "chunks" of one resource/dataset.

from healdata_utils import convert_to_vlmd

convert_to_vlmd(
    input_filepath="myexcelfile.xlsx",
    inputtype="excel-data",
    multiple_data_dicts=False
    )

To extract a subset of sheets as one data dictionary

from healdata_utils import convert_to_vlmd

convert_to_vlmd(
    input_filepath="myexcelfile.xlsx",
    inputtype="excel-data",
    multiple_data_dicts=False,
    sheet_name=["mysheet1","mysheet2"]
    )
vlmd extract --inputtype excel-data myexcelfile.xlsx