HEAL Platform workspaces are secure, cloud-based data analysis environments that can access data from one or more data resources. Workspaces include Jupyter notebooks, Python, and RStudio by default but can be configured to host virtually any application, including analysis workflows, data processing pipelines, and data visualization apps. To launch a workspace, users need to request a Gen3 workspace account at "https://healportal.org/". There are two ways to fund a workspace: a grant-funded account paid for by your institution (STRIDES grant) or a STRIDES credit account supported by the NIH initiative. Note that it may take several days for your account to be approved.
New to Jupyter? Learn more about the popular tool for data scientists on Jupyter.org (disclaimer: CTDS is not responsible for the content).
Guideline to get started¶
Workspace access requires authorization. Please contact HEAL Support for more information.
- After navigating to https://healdata.org/portal/workspace, users will see a list of pre-configured virtual machine (VM) images, as shown below.
Available workspaces on the HEAL Platform (top). Users may need to link their accounts from other repositories (bottom); click here to see how.
- (Generic) Jupyter Notebook with R kernel: Choose this VM if you are familiar with setting up Python- or R-based Notebooks, or if you just exported one or multiple studies from the Discovery Page and want to start your custom analysis.
- (Generic, User-licensed) Stata Notebook: Choose this VM if you are familiar with Stata-based data analysis. This notebook requires a Stata license.
- Tutorial Notebooks: Explore our Jupyter Notebook tutorials written in Python or RStudio, which pull data from various sources of the HEAL Data Ecosystem to leverage statistical programs and data analysis tools.
Feel free to edit and experiment with this collection of notebooks. They are your personal copies!
- Notebooks in Python: (1) BACPAC Synthetic Data Analysis, (2) JCOIN Tracking Opioid Stigma, (3) Opioid Overdose Trajectories, (4) Opioid Prevalence And Overdoses
Notebooks in RStudio: (1) Opioid Environment Toolkit and OEPS
Click “Launch” on any of the above workspace flavors to spin up a copy of that VM. Note: Launching the VM may take several minutes.
The status of launching the workspace is displayed after clicking on “Launch”.
- After launching, the home folders are displayed, one of which is the user's persistent drive ("pd").
- Select the /pd folder. Only files saved in the /pd directory will remain available after termination of a workspace session.
- Attention: Any personal files in the folder “data” will be lost. Personal files in the directory /pd will persist.
- Do not save files in the "data" and “data/healdata.org” folders.
The folder “healdata.org” in the “data” folder will host the data files you have exported from the Discovery Page.
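The persistence rule above can be sketched in Python. This is a minimal illustration, not an official HEAL utility: it assumes the persistent drive appears as a folder named pd inside the home directory, and the helper name save_to_persistent_drive and the file name in the usage comment are hypothetical.

```python
import csv
from pathlib import Path

# Assumption: the persistent drive is the "pd" folder in the home directory.
PERSISTENT_DIR = Path.home() / "pd"


def save_to_persistent_drive(rows, filename, base_dir=PERSISTENT_DIR):
    """Write rows (a list of dicts sharing the same keys) to a CSV file under base_dir."""
    base_dir = Path(base_dir)
    base_dir.mkdir(parents=True, exist_ok=True)
    target = base_dir / filename
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return target


# Hypothetical usage inside a notebook cell:
# save_to_persistent_drive([{"site": "A", "count": 3}], "summary.csv")
```

Writing results through a single helper like this makes it harder to leave output in the "data" folder by accident, where it would be lost when the session ends.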
Start a new notebook by clicking “New” in the top right corner and choosing either Python 3 or R Studio as the base programming language.
- Experiment away! Code blocks are entered in cells, which can be executed individually or all at once. Code documentation and comments can also be entered in cells, and the cell type can be set to support Markdown.
Results, including plots, tables, and graphics, can be generated in the workspace and downloaded as files.
Users can import data files directly into the Notebook code after selecting files from the "Discovery Page". An example is shown below.
- Remember to terminate your workspace once your work is finished, since running workspaces consume costly compute resources. Note that workspaces automatically shut down after 90 minutes of idle time.
Further reading: read more about how to download data files into the Workspaces here.
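To see which exported files are available before loading them into a notebook, something like the following may help. It is a sketch that assumes exported files land under data/healdata.org in the home directory, as described above; the function name list_exported_files is hypothetical.

```python
from pathlib import Path

# Assumption: files exported from the Discovery Page appear under ~/data/healdata.org.
EXPORT_DIR = Path.home() / "data" / "healdata.org"


def list_exported_files(export_dir=EXPORT_DIR):
    """Return all regular files below export_dir, sorted by path."""
    export_dir = Path(export_dir)
    if not export_dir.is_dir():
        return []  # nothing exported yet, or the folder lives elsewhere
    return sorted(p for p in export_dir.rglob("*") if p.is_file())


# Hypothetical usage inside a notebook cell:
# for path in list_exported_files():
#     print(path)
```

The returned paths can then be passed to whichever reader fits the file type.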
Upload, save, and download Files/Notebooks¶
Users can upload data files or Notebooks from the local machine to the home directory by clicking on “Upload” and access them in the Notebook (see below).
Then run in the cells, for example:
import pandas as pd
demo_df = pd.read_csv('/this_is_a_demo.txt', sep='\t')
Users can save the notebook by clicking "File" - "Save as", as shown below.
Users can download notebooks by clicking "File" - "Download as", as shown below.
Environments, Languages, and Tools¶
The following environments are available in the workspaces:
The following programmatic languages are available in Jupyter Notebooks:
The following tools are available in Jupyter Notebooks:
- GitHub (read GitHub documentation)
Python 3 and RStudio in Jupyter¶
Both Python 3 and RStudio are available in Jupyter Notebooks.
Users can expect to be able to use typical Python or R packages, such as those distributed through PyPI or CRAN. For Python and RStudio, users can start a new notebook under "New", as shown below.
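Which packages are preinstalled may differ between workspace images. As a small sketch, the following Python snippet checks whether a set of top-level packages can be imported in the current kernel; the function name missing_packages and the package names in the usage comment are only illustrative.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical usage inside a notebook cell:
# print(missing_packages(["pandas", "numpy", "matplotlib"]))
```

An empty list means every listed package was found.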
Stata in Jupyter¶
Stata is available as a language in Jupyter notebooks (in either Python or R kernels), but it requires a license and a specific workspace.
Users need to first choose the following workspace "(Generic, User-licensed) Stata Notebook" in order to be able to use Stata:
Users need to upload a license file named stata.lic to the /pd folder by selecting "Upload" (top right).
Note that the license can also be added from a terminal. Open a new terminal window under "New" - "Terminal", then change to the /pd directory by typing:
cd pd
Then, create the file using vim:
vim stata.lic
This will open the file in the terminal. Users can then paste the license text and save the file.
Then, users need to start a new notebook under "New" (choose either R or Python). Run the following code in the first cell:
import stata_setup
stata_setup.config("/usr/local/stata17", "mp")
This will return the following:
Users can then begin using the notebook by typing in known Stata commands, for example
%%stata
describe
- If the kernel dies, make sure that you (1) are logged in on the Login Page and (2) have enabled access to the FAIR-enabled repository.
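Before running the setup cell, it may also be worth confirming that the license file and the Stata directory are where the notebook expects them. The following is a rough sketch, not part of the platform: the function name stata_preflight is hypothetical, and the license location in the usage comment assumes the pd folder sits in the home directory.

```python
from pathlib import Path


def stata_preflight(license_path, stata_dir):
    """Return a list of problems found; an empty list means both paths look fine."""
    problems = []
    license_path = Path(license_path)
    if not license_path.is_file():
        problems.append(f"license file not found: {license_path}")
    elif license_path.stat().st_size == 0:
        problems.append(f"license file is empty: {license_path}")
    if not Path(stata_dir).is_dir():
        problems.append(f"Stata directory not found: {stata_dir}")
    return problems


# Hypothetical usage inside a notebook cell:
# print(stata_preflight(Path.home() / "pd" / "stata.lic", "/usr/local/stata17"))
```

The check only inspects the file system; it does not validate the license itself.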
Automatic Workspace Shutdown¶
Workspaces automatically shut down after 90 minutes of idle time, and a pop-up window reminds users before the workspace shuts down.
After the workspace has been shut down, users will be notified with the following pop-up window.