Extending Phenoflow-ML

A key advantage of Phenoflow-ML is that it is easily extensible: users can implement and add new functionality for ML-based phenotyping. Doing so involves three elements: (1) the templates, (2) the endpoints of the generator component, and (3) the endpoints of the web component.

The easiest way to incorporate a new type of ML-based phenotype is to take an existing one as a base and adapt it. In this example, we use the Logistic Regression type (already implemented in Phenoflow-ML) as the base for a new Decision Tree Classifier type. The following sections show how to adapt each of the three elements listed above.

The templates

Each ML-based phenotype definition is composed of a set of files (Python files, CWL files, datasets, etc.) that contain the initial parameters with which the definition was created. These files are generated from templates already stored in the system.

The first step in creating the new templates is to copy the directory /src/web/templates/LogisticRegression to /src/web/templates/DecisionTreeClassifier. The templates have been designed to be as generic as possible, so only two files need to be modified: (a) README.md, and (b) step2.py.
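The copy step can be sketched in Python as follows. This is a minimal illustration run against a throwaway directory, not part of Phenoflow-ML; the helper name copy_template and the stand-in file contents are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def copy_template(templates_root: Path, base: str, new: str) -> Path:
    """Copy templates_root/base to templates_root/new and return the new path."""
    source = templates_root / base
    target = templates_root / new
    # copytree raises FileExistsError if the target already exists,
    # which protects an existing template from being overwritten.
    shutil.copytree(source, target)
    return target


# Demonstration on a temporary tree; in the repository the root would be
# src/web/templates. The file contents below are stand-ins.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "templates"
    (root / "LogisticRegression").mkdir(parents=True)
    (root / "LogisticRegression" / "README.md").write_text("Logistic Regression")
    (root / "LogisticRegression" / "step2.py").write_text("# ML step")

    new_dir = copy_template(root, "LogisticRegression", "DecisionTreeClassifier")
    copied_files = sorted(p.name for p in new_dir.iterdir())
    print(copied_files)
```

After the copy, README.md and step2.py in the new directory are the two files left to edit.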

README.md contains general, user-facing information about the ML technique that is implemented; its text should be updated to describe the Decision Tree Classifier.

step2.py contains the Python code that executes the ML technique itself and is therefore the most important file. It has two relevant blocks of code: the _params dictionary, which holds the values of the initial parameters, and the call to the ML technique, which differs for each technique and uses the elements of the _params dictionary. Both blocks need to be adapted to the Decision Tree Classifier.

The endpoints of the generator component

The generator component creates the files of each phenotype definition in the proper format, and exposes this functionality through an API with several endpoints. To do so, it takes the corresponding templates and makes the necessary substitutions according to the initial parameters given by the author of the phenotype definition.

This functionality is implemented in the src/generator/api/routes.py file, where each type of ML-based phenotype has the following endpoints: getStepCwl, which generates the content of the CWL file of a given step; getMainCwl, which generates the content of the main.cwl file; and generateMainYml, which generates the content of the main.yml file.
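The template-substitution idea behind an endpoint such as getStepCwl can be illustrated with Python's standard string.Template. This is only a sketch of the general technique: the template text, the $doc placeholder, and the function get_step_cwl are hypothetical and are not taken from routes.py, which may use a different web framework and substitution mechanism.

```python
from string import Template

# Hypothetical template for the CWL file of one step. The placeholder name
# ($doc) is illustrative and not taken from Phenoflow-ML.
STEP_CWL_TEMPLATE = Template(
    "cwlVersion: v1.0\n"
    "class: CommandLineTool\n"
    "doc: $doc\n"
    "baseCommand: python\n"
    "inputs:\n"
    "  inputModule:\n"
    "    type: File\n"
    "    inputBinding:\n"
    "      position: 1\n"
    "outputs: []\n"
)


def get_step_cwl(doc: str) -> str:
    """Return the step's CWL content with the placeholder filled in."""
    # substitute() raises KeyError if a placeholder is left without a value,
    # so an incomplete set of parameters fails loudly instead of silently.
    return STEP_CWL_TEMPLATE.substitute(doc=doc)


step_cwl = get_step_cwl("Decision Tree Classifier")
print(step_cwl)
```

In a layout like this, the type-specific text is confined to the substituted values, which is consistent with the endpoints needing only name changes when a new type is added.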

These endpoints have been designed to be as generic as possible, so only a few modifications are needed. To add the new type, copy the block of Logistic Regression endpoints to the bottom of the file and, in the copy, (1) replace "LogisticRegression" with "DecisionTreeClassifier", and (2) replace "Logistic Regression" with "Decision Tree Classifier".

The endpoints of the web component

The web component offers different mechanisms (in the form of API endpoints) to create, delete and modify phenotype definitions. To do so, it uses the corresponding templates (making the necessary substitutions according to the initial parameters given by the author of the phenotype definition) and the endpoints of the generator API.

This functionality is implemented in the src/web/routes folder, in which each type of ML-based phenotype has a single file with the following endpoints: addPhenotype, which creates a new ML-based phenotype definition based on the parameters specified by the creator; uploadCsvDataset, which uploads a dataset in CSV format; and generate, which returns a zip file containing all the files needed to execute the phenotype definition.

Taking this into account, we first have to copy the file src/web/routes/LogisticRegression.js to src/web/routes/DecisionTreeClassifier.js. The aforementioned endpoints have been designed to be as generic as possible, so only two modifications are needed in the new file: (1) replace "LogisticRegression" with "DecisionTreeClassifier", and (2) replace "Logistic Regression" with "Decision Tree Classifier".
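The copy-and-replace step above can be automated with a small helper script. The sketch below is an assumption-laden illustration: the helper derive_file and the stand-in file content are invented for the example, and the demonstration runs against a temporary directory rather than the repository.

```python
import tempfile
from pathlib import Path

# The two substitutions listed above, as (old, new) pairs.
RENAMES = [
    ("LogisticRegression", "DecisionTreeClassifier"),
    ("Logistic Regression", "Decision Tree Classifier"),
]


def derive_file(source: Path, target: Path, renames=RENAMES) -> str:
    """Write a copy of source to target with every rename applied."""
    text = source.read_text()
    for old, new in renames:
        # str.replace is case-sensitive and replaces every occurrence.
        text = text.replace(old, new)
    target.write_text(text)
    return text


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "LogisticRegression.js"
    # Stand-in content, not the real route file.
    source.write_text(
        "// Logistic Regression routes\n"
        'const TYPE = "LogisticRegression";\n'
    )
    derived = derive_file(source, Path(tmp) / "DecisionTreeClassifier.js")
    print(derived)
```

Because the replacement is case-sensitive, any differently cased spelling of the type name would be left untouched, so the resulting file is still worth reviewing by hand.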

Finally, the new file also has to be referenced in the src/web/app.js file by adding the following two lines of code:

const DecisionTreeClassifier = require("./routes/DecisionTreeClassifier")

router.use("/DecisionTreeClassifier", DecisionTreeClassifier)