Customizing parameters and adding widgets.
An analytical workflow in the Biodepot-workflow-builder (Bwb) consists of a sequence of computational steps. Each step (or module) is containerized and represented by a graphical widget. These widgets include input parameters for each analytical step, and can be connected to create a workflow. Users can interact with widgets in the following ways:
- Clicking on a widget brings up the widget UI window with tabs for parameter entry, an output console and a tool bar with options to control the execution of the widget.
- Right-clicking on the widget and choosing the edit widget item brings up the widget definition window.
In this episode, you will learn how to interact with widgets, customize their parameters, and add a new widget in the Bwb. We will revisit the Salmon RNA-seq workflow from the basic training, and demonstrate how to:
- Understand the structure of a widget (inputs, outputs, parameters, widget’s execution command, and Docker image setup)
- Change the widget’s run mode, and use the test mode in which the Docker commands are not executed but are generated and recorded in console windows.
- Add a Gnumeric widget to visualize and inspect the counts file.
- Disconnect widgets in the Salmon workflow, and run each module individually.
- Customize a widget by adding and editing parameters (as practice, adding an extra flag in the alignment step using the command parameter).
- Run Jupyter Notebook widgets
Please refer to the full manual for more detailed instructions on how to interact with widgets in the Bwb: https://github.com/BioDepot/BioDepot-workflow-builder#interacting-with-widgets
The structure of a Bwb widget using the Salmon RNA-seq workflow from the basic training for illustration of concepts.
ENTRIES OVERVIEW:
As seen previously, widgets have entries where users select settings, files, folders, and other types of variables as inputs to the widget's command. If we open up the “Salmon Quant” widget, the first tab shows the required entries: the inputs that must be filled in before the widget can run. The optional entries are settings and variables that can be filled in if needed but are not required.
This widget shows several types of entries. Of the required entries, the top two ask for directories (the first takes a single directory, while the second can list multiple directories), and the last is a string input. Of the optional entries, the top four ask for files: the first takes a single file, and the other three can list multiple files. The last entry asks for an integer value, which you can type in or change with the arrows on the right side of the entry.
These entries have been set up for this widget so that when we run “Salmon Quant”, these inputs are read into the command that executes the tool. Most of these entries are currently grayed out because the values for these entries will come from the previous widgets after they are done executing.
Parameters overview
To see what some of the inputs are without executing the workflow, we can go to the Start widget and see what is listed at the start.
Currently all the entries are filled out to specify the directories and samples used, and no optional entries are set up. When the Start widget executes, these entries are sent to the downstream widgets as inputs for their respective entries. You can see how this is wired by clicking on the connections, or "signals", between the Start widget and a downstream widget. For example, the top download widget takes the Start widget's work directory and download links.
On the left are the outputs from the Start widget, and on the right are the inputs of the download widget, with links showing where each output from the Start widget ends up as an input for a download widget entry. The names shown here belong to the Start widget entries that are also set up as outputs. How these names are set up is explained next.
Let's right-click on the Start widget and select "Edit widget". Click on the "Outputs" tab, and you will see the same output names that appeared when clicking on the signals between the Start and download widgets.
As a quick check, open the "Inputs" tab when editing the download widget: the names match what we previously saw in the inputs section when connecting links.
The most important settings for these inputs, outputs, and entries are in the "Parameters" tab. In the Start widget, all the entries seen are set up in this section. Let's click on the Start widget again to see the entries side by side with their corresponding parameters, then click at the bottom of the screen to bring the widget definition window back up.
Parameters are listed here by name first, followed by their settings in curly brackets. The available settings specify the type of the parameter, any flag for it, whether it is an argument (meaning it is passed at the very end of the command when the widget executes), any environment variable for it, how it is labeled as an entry in the widget, any default value, any group it shares with other parameters, and whether it is an optional or required entry.
If we click on the top parameter, we see the values filled out for the settings for the work_dir parameter.
This is a directory parameter labeled "Work directory", which is the entry at the top of the Start widget. It has a flag and an environment variable, which are used in the script when the Start widget executes. Since it is a required entry, we leave the "Optional" section unchecked.
Going down the list, most of these parameters have corresponding entries in the Start widget. Some parameters do not appear in the entries (mate_1, mate_2, unpaired, and output). They are set to “Optional”, and they have no “Label” set. These parameters are intentionally hidden from the entries and are used internally by the Start widget’s Python script, which is executed when the widget runs. They serve as outputs to be sent to the downstream widgets, and you can find them in the Outputs tab.
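The settings described above can be pictured as one small record per parameter. The following is a minimal sketch, not the Bwb's actual storage format: the field names, the parameter types given to mate_1 and mate_2, and the example flag and environment variable values are assumptions made for illustration. It also assumes that a parameter with no label is simply not drawn as an entry, which is the behavior described for mate_1, mate_2, unpaired, and output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parameter:
    """Illustrative model of one row in the Parameters tab (field names are hypothetical)."""
    name: str
    type: str
    flag: Optional[str] = None
    argument: bool = False
    env: Optional[str] = None
    label: Optional[str] = None
    default: Optional[str] = None
    group: Optional[str] = None
    optional: bool = False


def visible_entries(parameters):
    """Return the labels of the parameters that would appear as entries.

    Assumption: a parameter without a label is hidden from the widget UI.
    """
    return [p.label for p in parameters if p.label]


parameters = [
    # The flag and env values are placeholders, not the real Start widget settings.
    Parameter("work_dir", "directory", flag="--work_dir", env="WORK_DIR",
              label="Work directory"),
    # Hidden parameters: optional, with no label (types here are guesses).
    Parameter("mate_1", "file list", optional=True),
    Parameter("mate_2", "file list", optional=True),
]

print(visible_entries(parameters))
```

Only the labeled work_dir parameter is listed, while the hidden parameters remain available to the widget's script and to the Outputs tab.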
Inputs, outputs, and triggers
To register these parameters as Inputs and/or Outputs, you have to copy the assigned name of the parameter and add it to the Inputs/Outputs tabs. If the names do not match, the input or output will not register, and the value for that variable will not be received or sent correctly.
We can look at the Edit mode of the Download widget to see the parameters here briefly, then see the Inputs tab for the input options with their assigned parameter names listed.
For this widget, the “directory” and “URL” parameters are seen in the inputs. The names must match in both tabs for the parameters to be treated as inputs, so that values from other widgets are placed directly into their respective entries. If the names do not match, nothing is passed into the download widget entries for these inputs.
You might have noticed the "trigger" input. This is an additional input that is not assigned to any parameter. It gives you an extra option when connecting widgets: a signal that the upstream widget has stopped executing. This can be useful if you want a widget to execute only once several other widgets are all done, even though not all of them send values to the receiving widget.
There are limitations to what gets linked as inputs to a widget. Only one link can be made to a given input of a widget, while one output can be linked to multiple inputs if necessary. For example, the download widget can only take in URLs from a single widget, so its URL input is unavailable to any other output or widget once it is linked. On the flip side, the Start widget can send the work_dir parameter value to multiple widgets.
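The two rules above (names must match, and an input accepts only one link) can be sketched as a small check. This is an illustration under stated assumptions, not the Bwb's implementation: the `connect` helper, the `download_links` output name, and the `OtherWidget` widget are hypothetical.

```python
def connect(links, source, output_name, target, input_name, target_inputs):
    """Record a link from source.output_name to target.input_name.

    links maps (target, input_name) -> (source, output_name), so each input
    can hold at most one link while an output may appear in many links.
    """
    if input_name not in target_inputs:
        raise ValueError(f"{target} has no input named {input_name!r}")
    key = (target, input_name)
    if key in links:
        raise ValueError(f"{target}.{input_name} is already linked")
    links[key] = (source, output_name)


links = {}
download_inputs = {"directory", "URL", "trigger"}
other_inputs = {"directory"}  # hypothetical second downstream widget

# One output (work_dir) can feed several inputs.
connect(links, "Start", "work_dir", "Download", "directory", download_inputs)
connect(links, "Start", "work_dir", "OtherWidget", "directory", other_inputs)
connect(links, "Start", "download_links", "Download", "URL", download_inputs)

# A second link to the same input is rejected.
try:
    connect(links, "AnotherSource", "urls", "Download", "URL", download_inputs)
except ValueError as error:
    print(error)
```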
Widget Command
In the Commands tab, you can see the commands that are executed in a widget. Staying on the current download widget, its current command reads /root/download.sh. Given how the download widget is currently built with Docker, this shell script lives inside the widget's Docker container and is executed there when the widget runs.
Parameters associated with the widget are also included in the command when the widget executes. How a parameter is added to the command depends on whether the parameter is set as a flag, is set as an argument, or is added directly to the widget command.
Going back to the Parameters tab, there is an option to set the parameter as a flag. An example of a parameter with a flag enabled is “directory”. In the flag option, it has “--directory”. Flags are placed after the widget command and include the flag itself and the value for that parameter.
An example of a parameter set as an Argument is “URL”. Arguments are placed at the very end of the command, after the widget command and flags.
In general, the order of the command is the command seen in the Command tab first, then the flag parameters next, then the argument parameters last.
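The ordering rule can be sketched in a few lines. This is a minimal illustration, not how the Bwb assembles commands internally: the `build_command` helper is hypothetical, and it does no shell quoting. The directory and URL values are the ones used in the Salmon demo.

```python
def build_command(base, flags, arguments):
    """Assemble the base command, then flag parameters, then argument parameters."""
    parts = [base]
    for flag, value in flags:
        parts.append(flag)
        if value is not None:  # some flags carry no value
            parts.append(value)
    parts.extend(arguments)
    return " ".join(parts)


command = build_command(
    "/root/download.sh",
    flags=[("--directory", "/data/salmon_demo_work")],
    arguments=["ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR016/DRR016125/DRR016125_1.fastq.gz"],
)
print(command)
```

The printed command starts with the script from the Command tab, continues with the “directory” flag and its value, and ends with the “URL” argument.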
If we want to add a parameter directly to the command, we can use the notation _bwb{} inside the widget command, with the parameter name inside the curly brackets. For example, the parameter “decompress” is enabled by default for this widget in the workflow, and it is set with the flag --decompress. This would typically be placed after the base widget command (after /root/download.sh). However, what if we want to put it in the command directly? This can be done by going into the widget’s command and placing “_bwb{decompress}” after the original command:
/root/download.sh _bwb{decompress}
The command that is executed for this widget would typically be this:
/root/download.sh --decompress --directory /data/salmon_demo_work
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR016/DRR016125/DRR016125_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR016/DRR016125/DRR016125_2.fastq.gz
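To picture what the _bwb{} notation does, here is a rough sketch of placeholder substitution. It assumes that each placeholder expands to whatever text the named parameter contributes to the command (here, the --decompress flag). How the Bwb performs the substitution internally is not covered in this lesson, and the `expand_bwb` helper and `rendered` mapping are hypothetical.

```python
import re


def expand_bwb(command, rendered):
    """Replace each _bwb{name} placeholder with the text rendered for that parameter."""
    def replace(match):
        return rendered[match.group(1)]
    return re.sub(r"_bwb\{(\w+)\}", replace, command)


# Assumption: the decompress parameter renders as its flag.
rendered = {"decompress": "--decompress"}
print(expand_bwb("/root/download.sh _bwb{decompress}", rendered))
```

Under this assumption, the placeholder controls where in the command the parameter appears, rather than leaving its position to the default flag ordering.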
In the download widget console during execution, you will actually see two commands, one per file, rather than a single command that handles both files. This is because the widget has iteration enabled, which allows the files to be downloaded concurrently. This feature is explained in more detail later.
Widget Docker
Iterating for concurrent execution
As mentioned earlier, it is possible to set some entries to run concurrently. The top download widget takes in two files to download, and each download is handled by its own command. This is made possible by the settings in the Scheduler tab.
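Conceptually, iteration turns one command with two URLs into two commands with one URL each, which can then run side by side. The sketch below only illustrates that idea with Python's standard thread pool. It does not reflect how the Bwb scheduler is implemented, and the `run` function is a placeholder that reports the command instead of starting a container.

```python
from concurrent.futures import ThreadPoolExecutor


def iterate_commands(base, directory, urls):
    """Build one command per URL instead of one command listing every URL."""
    return [f"{base} --decompress --directory {directory} {url}" for url in urls]


def run(command):
    # Placeholder: a real scheduler would launch the container here.
    return f"would run: {command}"


urls = [
    "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR016/DRR016125/DRR016125_1.fastq.gz",
    "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/DRR016/DRR016125/DRR016125_2.fastq.gz",
]
commands = iterate_commands("/root/download.sh", "/data/salmon_demo_work", urls)

# Each command is handed to its own worker, so the downloads could overlap.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run, commands))

for line in results:
    print(line)
```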
RunMode and test mode
RUNMODE:
How a widget is run also depends on its selected RunMode option. Let's open up the top download widget, and look at the bottom-right corner of the widget window.
You see "RunMode", which is currently set to Triggered. Manual means that the widget executes only when you click the blue "Start" button on the bottom-left side of the window, while Automatic means the widget executes automatically once all required entries have values. The Triggered option appears if the widget has input options listed in the Inputs tab of its edit mode. With this option, you select the inputs that must be received from other widgets (or that need a value filled in initially) before the widget runs.
In this case, the download widget waits until it receives the work_dir and URLs from the Start widget before running. These are received by the download widget’s “directory” and “URL” parameters, respectively. Ideally, you should select only the inputs that have links connected to them. For example, if we selected the "trigger" input without an upstream widget connected to it, the download widget would never run, so let's leave that unchecked.
An interesting observation is that the Start widget has no “Triggered” option available as a RunMode.
This is because no inputs are listed in the Inputs tab for this widget, so other widgets have nothing through which to trigger its execution.
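The three RunMode options can be summarized as a small decision function. This is a simplified sketch of the behavior described above, not the Bwb's actual logic: the `should_run` helper and its argument names are hypothetical, and any further conditions the real scheduler may check are left out.

```python
def should_run(run_mode, start_clicked, required_filled, selected_inputs, received_inputs):
    """Decide whether a widget would start, following the RunMode descriptions."""
    if run_mode == "Manual":
        return start_clicked
    if run_mode == "Automatic":
        return required_filled
    if run_mode == "Triggered":
        # Every selected input must have arrived from an upstream widget.
        return all(name in received_inputs for name in selected_inputs)
    raise ValueError(f"unknown RunMode: {run_mode!r}")


selected = {"directory", "URL"}

# Only the work directory has arrived so far: the download widget keeps waiting.
print(should_run("Triggered", False, True, selected, {"directory"}))

# Both selected inputs have arrived: the widget runs.
print(should_run("Triggered", False, True, selected, {"directory", "URL"}))

# Selecting "trigger" with nothing connected to it means it can never arrive.
print(should_run("Triggered", False, True, selected | {"trigger"}, {"directory", "URL"}))
```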
TEST MODE:
Now that we have covered how inputs, outputs, parameters, and their respective entries are set up, let's test how these parameters send their values to other widgets without executing any commands in the widgets, using what we call "Test Mode". Let's go to the Start widget again and click on "Test Mode" at the bottom here, then click Start.
As the message states, no results are generated in this mode. Click "Yes", and you will see the widgets run very quickly. If we open up the Salmon Index widget's console tab, we see the Docker command that would be used if we were to execute the widget.
Some of the connections, or "signals", that were dotted are now solid, indicating that the entries from the Start widget went directly to the other widgets, such as the download widgets and the Salmon Index widget.
The downstream widgets that have all triggered parameters filled in are automatically set to "Test Mode" as well. For the Salmon Quant widget, three entries are still grayed out with nothing filled in, because Test Mode does not execute the Start widget's script that would output the missing values: the output quantification directories entry and the two fastq input entries. Since these inputs are missing, the widget is not "triggered" to run, so no outputs are sent to the Gnumeric widget at the end of the workflow yet.
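Test Mode follows a common "dry run" pattern: build the command, record it, and skip the execution. The sketch below illustrates that pattern only. It is not the Bwb's code, the `execute` helper is hypothetical, and the image name is a placeholder rather than a real Bwb image.

```python
import shlex
import subprocess


def execute(command, console, test_mode=False):
    """Record the command in the console, and run it only outside test mode."""
    console.append(shlex.join(command))
    if test_mode:
        return None  # nothing is executed, so no results are produced
    return subprocess.run(command, check=True)


console = []
result = execute(
    ["docker", "run", "--rm", "hypothetical/image", "/root/download.sh"],
    console,
    test_mode=True,
)
print(console[0])
```

Because nothing runs, any value that a widget's script would normally compute and pass downstream is never produced, which is why some Salmon Quant entries stay empty in Test Mode.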
Add a Gnumeric widget to visualize and inspect the counts
The Gnumeric widget displays the output counts from the Salmon quantification, and it is located at the end of the Salmon RNA-seq demo workflow. The following video demonstrates how the widget was added to the workflow and how to set up the widget to display the counts.
The video contains:
- An overview of the Gnumeric widget settings in the original workflow.
- How to remove the original Gnumeric widget on the workflow.
- How to add a new Gnumeric widget from the tool dock to the workflow.
- Changing the widget settings to match the original widget.
- Making connections between the Gnumeric widget and the Salmon Quant widget.
Disconnecting widgets and running a widget on its own
The Salmon RNA-seq demo workflow is constructed so that the widgets execute in a cascade, starting with the Start widget and finishing with the Gnumeric widget. The signals connecting the widgets determine the order of execution: when a widget finishes, it triggers the downstream widgets to start and sends its outputs to them. However, there are cases where a widget can be unlinked from the others and run on its own. The following video demonstrates how to disconnect widgets from other widgets.
The video contains:
- How to inspect signals connecting between widgets.
- Ways to disable signals and disconnect the connections.
- Modifying a widget's RunMode option, and how to disable triggers after removing signals.
Add & edit parameters, and set up inputs and outputs
Parameters in widgets allow information and settings to be set by the user when executing the widget. They can store values and paths to files and directories, as well as be formatted as a flag, an environmental variable, or a command argument placed at the end of the widget's execution command (a mixture of these formats is also possible). Parameters can take in values from other widgets in the form of an input after those widgets are done executing, and parameter values can be sent to other widgets in the form of an output after the widget is done executing.
Parameters can be seen in the widgets from the Salmon RNA-seq demo workflow. To understand how a parameter works in a widget, the following video demonstrates how to add and edit parameters, as well as making inputs and outputs from parameters.
The video contains:
- How to add a parameter, going through an example scenario of making a new parameter in an existing widget.
- How to edit an existing parameter.
- How to delete a parameter.
- How to set a parameter as an output or input.
- After setting inputs and outputs in a widget, how to connect inputs and outputs to other widgets.
Jupyter Notebook widgets
Jupyter Notebook is a useful tool that mixes live code and documentation in a single document. Notebooks are widely used for data analysis, scientific research, machine learning, bioinformatics, and other fields of work. In workflows, notebooks can be included at the end, after data processing is completed, to analyze the processed data. In Bwb, Jupyter Notebooks are available by default as widgets that have Python and R kernels. The following video demonstrates how to use the Jupyter Notebook widgets to make a new notebook, and explains some of the caveats in installing libraries, packages, and dependencies to run code in the notebook.
The video contains:
- Where to find the Jupyter Notebook widgets in the tool dock.
- An overview of the components and parameters in a Jupyter Notebook widget.
- How to run the widget to open Jupyter Notebooks.
- How code can be run in the notebook using Python and R kernels.
- How to install libraries and packages for Python and R inside the notebook, as well as in a new Jupyter Notebooks Docker image for the widget.
The video involves some code that you will need to type if you want to follow along. Here are the code snippets seen in the video.
To install numpy in a Python kernel notebook:
import sys
!{sys.executable} -m pip install numpy
To install ggplot2 in an R kernel notebook:
install.packages('ggplot2', quiet = TRUE)
The lines of code to add to the Jupyter Notebooks (bioconductor version) Dockerfile to add a list of Python and R packages:
RUN R -e "install.packages(c('ggplot2', 'jsonlite', 'stringr'), repos='http://cran.us.r-project.org')"
RUN pip install numpy pandas matplotlib