Fernando Aguilar Gómez, IFCA – CSIC, Spain

This team is working on two different case studies for INDIGO: Algae Bloom and TRUFA. In the first case, INDIGO technology will help address the eutrophication of fresh water, that is, an algae bloom that decreases water quality and is potentially harmful to human health. In the second case, the main goal is to provide experts with resources and tools for performing RNA-seq analysis. Both cases are well suited for testing and validating INDIGO solutions.

Champion's team: Algae Bloom

Daniel García (IFCA), physicist, expert on remote instrumentation, will participate in validating the results, curating the data and configuring the predictive model.
Tamara Santiago (Ecohydros), biologist, expert on cyanobacteria and validator of the outputs.
Jose Augusto Monteoliva (Ecohydros), expert on data quality, validator of results.
Experts "on demand": Agustín Monteoliva (Ecohydros), Jesús Marco (IFCA). They will help configure the models and validate the results.

Champion's team: TRUFA

Luis Cabellos, senior engineer in informatics at IFCA, Altamira administrator, HPC expert and one of the main TRUFA developers.
Etiene Kornobis, PhD in Biology-Bioinformatics, expert in RNA-seq analysis, expert in the different TRUFA tools and one of the main developers.

The Case Studies: Algae Bloom and TRUFA (TRanscriptome User-Friendly Analysis)

In the Algae Bloom Case Study, researchers want to use the data collected by an instrumented platform in the water reservoir and in some of its tributaries (rivers that flow into the reservoir). The problem to address is the eutrophication of fresh water, that is, an algae bloom that decreases water quality and is potentially harmful to human health. To achieve this goal, experts need to monitor the evolution of the water status (gathering data with specific instrumentation), use the data as input for a hydrodynamic and biological model, execute the model on appropriate computing resources, and implement a predictive framework that issues warnings to alert the water management authorities when water quality decreases.

For the TRUFA Case Study, the main goal is to provide experts with resources and tools for performing RNA-seq analysis. This service must provide not only computing resources (HPC or Cloud) but also storage space to upload input files and store the outputs of the analyses. TRUFA aims to be very user-friendly, so it is based on a web interface that users can fill in easily, and some of the outputs can be checked directly online.

Case Study & User story

Algae Bloom

User Story A): an SME team wants to model the hydrodynamic behaviour of the water reservoir, to reproduce the thermocline and predict the onset and completion times of the water column stratification, with special interest in its final phase (September/October).

  •  A.1) As an ICT expert, I can set all the input parameters and maps needed to model the water reservoir in 3D using Delft3D. The input comes from the ENV experts in the SME.
  •  A.2) As an ICT expert, I can launch the simulation on HPC resources in the cloud. Each 3D simulation covers April to October of a given year with a 1 h time step.
  •  A.3) As an ICT expert, I can check with the ENV experts in the SME, in a distributed way, that the reference distributions (temperature profiles, water level, etc.) make sense, and compare them to previous monitoring data (when available).
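To give a sense of the size of one run, the number of hourly time steps in the April to October window can be counted directly. This is a minimal sketch that assumes the window runs from 1 April to the end of October; it does not interact with Delft3D, and the actual run window used in the study may differ.

```python
from datetime import datetime, timedelta

def hourly_steps(year):
    """Number of 1 h time steps from 1 April 00:00 to 1 November 00:00 of `year`.

    Assumption: "April to October" means the full months April through October.
    """
    start = datetime(year, 4, 1)
    end = datetime(year, 11, 1)
    # Dividing one timedelta by another with // yields an int.
    return (end - start) // timedelta(hours=1)

print(hourly_steps(2010))
```

Under this assumption the window spans 214 days, i.e. 5136 hourly steps, and the count is the same for every year because February is not included.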

The SME experts analyse the program output with Excel (currently; MATLAB or IPython in the near future) on cloud resources via a remote interactive portal (currently TeamViewer, next VNC/ThinLinc).

If the comparison makes sense, the output is stored. If a prediction is required, several runs are executed using the meteorological scenarios of previous years to estimate the probability of an algae bloom. A statistical model is then applied to estimate the expected values.
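The statistical model used is not specified here. As a minimal sketch, assuming each meteorological-scenario run is summarized by a single scalar indicator (hypothetical, e.g. peak chlorophyll-a in the reservoir) and that a bloom is defined by a hypothetical threshold on that indicator, the bloom probability can be estimated as the fraction of runs exceeding the threshold, and the expected value as the ensemble mean.

```python
from statistics import mean

def summarize_ensemble(indicator_by_year, threshold):
    """Summarize an ensemble of model runs, one per past meteorological year.

    indicator_by_year: dict mapping scenario year -> scalar model output
        (hypothetical indicator chosen for illustration).
    threshold: hypothetical value above which a run counts as a bloom.
    Returns (empirical bloom probability, ensemble mean of the indicator).
    """
    values = list(indicator_by_year.values())
    if not values:
        raise ValueError("ensemble is empty")
    n_bloom = sum(1 for v in values if v > threshold)
    return n_bloom / len(values), mean(values)

# Made-up numbers for illustration only, not measured or simulated results.
runs = {2006: 12.0, 2007: 31.5, 2008: 18.2, 2009: 27.9}
p, expected = summarize_ensemble(runs, threshold=20.0)
print(f"P(bloom) ~ {p:.2f}, expected indicator ~ {expected:.1f}")
```

With only a handful of past years the empirical fraction is coarse (steps of 1/N), which is one reason a fitted statistical model would be applied on top of the raw ensemble.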

If the comparison does not qualify or the output is not correct, the model is re-run varying the input parameters within a larger uncertainty range, and a set of new measurements is transmitted to the BIO team.

User Story B): an SME team wants to predict algae blooms using the model, and validate the predictions against detailed analytical measurements from previous years.

  •  B.1) As an ICT or BIO expert, I can confirm the required input, including the input from A). The experts also define the model metrics (differences in the evolution of relevant functions: oxygen profile, algae profile, etc.).
  •  B.2) As an ICT expert, I can prepare the simulation of the DELWAQ module to estimate how the algae will grow.
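The exact form of the model metrics is left to the experts. One common choice, shown here only as a sketch under that assumption, is the root-mean-square difference between simulated and observed series for each variable, combined with hypothetical expert-chosen weights.

```python
import math

def rms_difference(simulated, observed):
    """Root-mean-square difference between two series sampled at the same points."""
    if not simulated or len(simulated) != len(observed):
        raise ValueError("series must be non-empty and of equal length")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(simulated))

def model_metric(sim, obs, weights):
    """Weighted sum of per-variable RMS differences.

    sim, obs: dict variable -> series (e.g. "oxygen", "algae").
    weights: dict variable -> weight (hypothetical, chosen by the experts).
    """
    return sum(w * rms_difference(sim[name], obs[name]) for name, w in weights.items())

# Made-up values for illustration only.
sim = {"oxygen": [8.0, 7.5, 6.0], "algae": [1.0, 2.0, 4.0]}
obs = {"oxygen": [8.0, 7.0, 5.0], "algae": [1.0, 2.5, 3.0]}
print(round(model_metric(sim, obs, {"oxygen": 1.0, "algae": 0.5}), 3))
```

Variables with different units (dissolved oxygen vs. algae biomass) would normally be normalized before weighting, otherwise the variable with the largest numeric scale dominates the metric.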

All input parameters to be tuned (around 60) are listed, including their uncertainty ranges. Different input maps also have to be used where there is uncertainty (for example, for sediments). An initial random selection of 1000 simulation points is prepared using uniform sampling (after variable normalization), each point requiring around 5 h of computation. These 1000 Monte Carlo (MC) points provide a first estimate of how the output depends on the different variables.
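The sampling step above can be sketched as follows, assuming linear (min-max) normalization of each parameter to [0, 1]. The parameter names and ranges are hypothetical placeholders, not the study's actual DELWAQ parameters, and the number of concurrent runs is likewise an assumption used only for the compute estimate.

```python
import math
import random

# Hypothetical parameters and ranges for illustration only; the real study
# lists around 60 parameters with their uncertainty ranges.
PARAM_RANGES = {
    "algae_mortality_rate": (0.01, 0.2),
    "max_growth_rate": (0.5, 2.5),
    "sediment_oxygen_demand": (0.1, 1.0),
}

def sample_points(ranges, n_points, seed=0):
    """Uniform Monte Carlo sampling in the normalized unit hypercube.

    Each parameter is normalized linearly to [0, 1]; a uniform draw u in that
    space is mapped back to the physical range as lo + u * (hi - lo).
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        point = {}
        for name, (lo, hi) in ranges.items():
            u = rng.random()  # normalized coordinate in [0, 1)
            point[name] = lo + u * (hi - lo)
        points.append(point)
    return points

N_POINTS = 1000        # from the text
HOURS_PER_RUN = 5      # approximate, from the text
CONCURRENT_RUNS = 100  # hypothetical number of simultaneous runs

points = sample_points(PARAM_RANGES, N_POINTS)
total_hours = N_POINTS * HOURS_PER_RUN
wall_hours = math.ceil(N_POINTS / CONCURRENT_RUNS) * HOURS_PER_RUN
print(f"{len(points)} points, {total_hours} run-hours, ~{wall_hours} h wall-clock")
```

The 1000 runs amount to about 5000 run-hours, which is why distributing them over cloud or HPC resources matters. With linear normalization, uniform sampling in [0, 1] is equivalent to uniform sampling over each physical range; a parameter spanning several orders of magnitude would usually be normalized on a log scale instead, which changes the resulting distribution.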

  •  B.3) The parameters are optimized against 2010 data using a multivariable gradient technique, in which the results of different simulations guide the convergence. The outcome is an optimized set of parameters (including, for example, mortality rates).
  •  B.4) The DELWAQ model is run on different climate scenarios to produce a prediction.
  •  B.5) The model is run weekly and contrasted with the real evolution, and the scenario predictions are updated. The impact of different pressures and corrective measures (for example, water level management, bypass of wastewater treatment plants, artificial wetlands, cattle management) is estimated in order to propose the best course of action to the management.
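The specific gradient technique used in B.3 is not stated. As a hedged illustration, the sketch below uses projected gradient descent with forward finite differences on the normalized parameters, where each evaluation of the objective would correspond to a model run compared against the 2010 data. A toy quadratic misfit stands in for the real model, and the step size and iteration count are arbitrary assumptions.

```python
def forward_difference_gradient(f, x, h=1e-3):
    """Approximate the gradient of f at x with forward differences.

    Costs len(x) + 1 evaluations of f; in the case study each evaluation
    would be a full model run.
    """
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        grad.append((f(xp) - fx) / h)
    return grad

def projected_gradient_descent(f, x0, lr=0.1, steps=200, h=1e-3):
    """Minimize f over the normalized box [0, 1]^n by clipping after each step."""
    x = list(x0)
    for _ in range(steps):
        g = forward_difference_gradient(f, x, h)
        x = [min(1.0, max(0.0, xi - lr * gi)) for xi, gi in zip(x, g)]
    return x

# Toy stand-in for the misfit between simulated and observed 2010 data.
TARGET = [0.3, 0.7]

def toy_misfit(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))

best = projected_gradient_descent(toy_misfit, [0.5, 0.5])
print([round(v, 2) for v in best])
```

With around 60 parameters, one forward-difference gradient needs 61 model runs; at about 5 h each, that is roughly 305 run-hours per iteration if executed one after another, so the runs of each iteration would be distributed. Central differences are more accurate but need 2 × 60 = 120 runs per gradient, and the earlier MC points can provide a good starting point.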

TRUFA

When a new user applies for a TRUFA account, the team checks the user's identity and creates a new account, giving the user access to the portal. The user can upload new input files or link them from external databases (like NCBI SRA), and selects the input files to be analysed and the different steps to be performed. Finally, the user launches the analysis.
The TRUFA pipeline script sets up the workflow to be executed on the Altamira supercomputer, establishing a list of jobs and the dependencies between them. The jobs are then processed on Altamira.
The user gets information about the status of the jobs from the web server. Once the jobs have finished, the user can access the file manager to handle the output files: download them, check images and text files directly on the web server, delete, move, tag them as input files, etc.
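The job-and-dependency setup described above can be illustrated with a topological sort (Kahn's algorithm), which yields an order in which every job comes after the jobs it depends on. This is a sketch, not TRUFA's actual pipeline script: the job names are hypothetical RNA-seq steps, and a batch scheduler would normally express these dependencies natively rather than through a precomputed order.

```python
from collections import deque

def submission_order(deps):
    """Kahn's algorithm: order jobs so each one follows all of its dependencies.

    deps maps job -> list of jobs it depends on; every dependency must also
    appear as a key. Raises ValueError if the dependencies contain a cycle.
    """
    indegree = {job: 0 for job in deps}
    children = {job: [] for job in deps}
    for job, parents in deps.items():
        for parent in parents:
            indegree[job] += 1
            children[parent].append(job)
    ready = deque(sorted(job for job, d in indegree.items() if d == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in children[job]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order

# Hypothetical pipeline steps for illustration only.
JOBS = {
    "quality_control": [],
    "read_cleaning": ["quality_control"],
    "assembly": ["read_cleaning"],
    "mapping": ["assembly", "read_cleaning"],
    "annotation": ["assembly"],
    "expression": ["mapping"],
}
print(submission_order(JOBS))
```

Jobs whose in-degree drops to zero at the same time (here `mapping` and `annotation`) have no dependency on each other, so they could run concurrently on the cluster.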

Status today (January 2016)

Algae Bloom

The Algae Bloom case study is currently in the validation phase. The data gathered from the platform is stored on a virtualized server located at IFCA, and the input for modelling can be formatted and sent to different types of computing resources: cloud or HPC (Altamira). Cloud systems are also being used to validate the results through a remote desktop.

TRUFA

TRUFA is composed of different parts, such as the web server and the computing layer. The web server is hosted on a virtualized server at IFCA and is connected to HPC resources (Altamira, the IFCA supercomputer). The modules are currently being restructured so that TRUFA can work with different computing paradigms behind it, such as cloud computing. Different types of orchestration are also being tested to make TRUFA more scalable.

The community: Lifewatch

The Lifewatch community is currently distributed across different European countries, which are involved in projects in many biodiversity-related fields: marine research, birds, genetics, etc. For these two case studies in particular, the communities are more focused. For Algae Bloom, the clear stakeholders are researchers on water quality (in particular Ecohydros, a company working in that field) and water management authorities, but the approach can be extended to other similar cases, in water quality or other kinds of modelling. TRUFA has a growing community (~140 users at the time of writing) from all over the world (India, Brazil, United States, Europe, Japan, etc.). TRUFA is open to researchers with an institutional identity, so its potential community is very wide.

The final user perspective  

For the Algae Bloom case study, the user will be able to get different types of resources to run his/her model. The final user has two ways of interacting with these resources: on the one hand, to manage input/output files and check results, the user needs a web portal or another graphical system (e.g. a graphical file system for managing files); on the other hand, for the computing part, the user needs to access the server remotely and run the model there.

For TRUFA, integration with INDIGO solutions should have practically no effect on the way users interact with the service, because the TRUFA interface is web-based and that layer can easily be exported to different environments. However, TRUFA can adopt some improvements, such as the integration of a more standardized file system.


INDIGO innovation

Algae Bloom

INDIGO will offer an integrated solution in which all components are built around the key element of the case study: data, both the input to and the output from the models. Users will therefore get access to a distributed file system that can also be accessed by the computing resources. That way, users will not need to upload and download files to check them; instead, they will be able to check them online.

TRUFA

Integrating TRUFA with INDIGO solutions will make TRUFA more scalable, thanks to the number of available resources and the flexibility that INDIGO solutions will give the whole system, making it possible to plug in and unplug resources as needed. Other INDIGO solutions, such as the AAI (Authentication and Authorization Infrastructure), can also be integrated into the system. Using INDIGO AAI will help TRUFA administrators manage users more easily than with the current system, while ensuring that users belong to a research institution. Finally, the INDIGO storage solution can help TRUFA adopt more standardized ways of working with analysis results.


Want to know more about this use case? Write to us at info@indigo-datacloud.eu to contact the Champions.