Virtualization of the HADDOCK portal for biomolecular modelling

Proteins are the workhorses of life. They make and break other molecules and interact with a myriad of other biomolecules to carry out their function and control most processes in our cells. In doing so they form an intricate network of interactions, which one could term the “molecular Facebook of life”. Miscommunication in those networks is often at the origin of disease. Understanding how proteins function, and repairing miscommunication by designing new drugs, requires adding three-dimensional structural information to those networks. This can only be achieved with a proper combination of experimental and computational approaches.

However, the size of the problem and the amount of computation needed are major challenges that can only be addressed with proper e-Science solutions.

The Case Study: virtualization of the HADDOCK portal for biomolecular modelling

For more than ten years the Bonvin lab has been developing the HADDOCK software [1] for the modelling of biomolecular complexes. HADDOCK encodes a complex workflow during which thousands of 3D atomic models are generated. The computations, which are based on classical mechanics (Newtonian dynamics), are driven by external experimental or bioinformatics information typically provided by the end user. Several hundred parameters control the way the computations are performed. To hide this complexity and provide easy, user-friendly access to both the software and computational resources, the team developed a web-based portal [2,3], which it has been operating since 2010. The portal offers various access levels, exposing an increasing number of parameters and options to end users. Results are returned via web pages that present interpreted results with both numerical and graphical analysis, adding value to the computations. The HADDOCK portal makes use of both local computational resources and worldwide distributed resources made available via the European Grid Initiative. This easy access to rather complex software has attracted a large community of users worldwide, and HADDOCK is currently the most cited software in its research field.

Status today

The HADDOCK portal has processed over 110,000 user runs since its official launch, for a growing community of more than 6500 registered users worldwide. The portal is hosted on a local cluster, but makes efficient use of the distributed grid resources offered through the European Grid Initiative. This was made possible through two former EU FP7 e-Infrastructure projects (eNMR and WeNMR [4]). User submissions are translated into more than 8 million individual grid jobs per year, running all over Europe but also at grid sites in Asia and the US (via the Open Science Grid), all in a manner transparent to the end user. Job management is based on middleware developed in former European projects (EMI middleware) and on services currently operated by EGI-Engage (e.g. the DIRAC4EGI service).

Operating such a portal is a complex process with many risk factors. The current implementation on bare metal makes it vulnerable to hardware and/or power failures, which would have a direct and large impact on the user community.  

The HADDOCK portal has been in continuous production since 2010. Currently, the majority of jobs are sent to international grid resources. Gaining the ability to offer a fully virtualized portal, while still making efficient use of either local virtual resources (e.g. a virtual cluster) or grid resources, will in the long term improve the quality and reliability of our services.

The end user might not notice this, since all the complexity of the workflows and computing remains hidden from him/her, but the portal operators will greatly benefit from the developments in INDIGO: they will be able to clone and launch new instances of the portal on demand with minimal overhead. This will allow them to fine-tune operations in response to increased demand and/or specific purposes, such as education, workshops or applications in an industrial setting.

Within INDIGO, new tools and solutions are being developed that should make it easy to deploy, configure and customize a virtual instance of the HADDOCK portal and cluster at the click of a mouse, without having to learn complex cloud management operations.

In related research areas, the group has also been developing applications to study biomolecular interactions (e.g. PowerFit and DisVis, see bonvinlab.org/software) that can efficiently exploit GPGPU resources. Web portals for these are currently under development, and we hope to be able to virtualize them as well, as for the HADDOCK portal, but in this particular case on GPGPU-enabled instances. Finally, the data generated by such computations might also need to be stored and shared, either privately for the duration of a project, or made public at the end of a project under an open data policy. Tools under development in INDIGO should facilitate this process.

Besides benefitting from the INDIGO work, the community also greatly benefits from other H2020 European projects, which together will synergistically boost our research and its impact on a worldwide community. Within EGI-Engage (www.egi.eu/about/egi-engage), GPGPU-enabled grid resources are currently being tested, which, next to the cloud solutions of INDIGO, might provide another efficient way of running GPGPU jobs. Developments to incorporate, among others, cryo-electron microscopy data into the HADDOCK portal are taking place under the WestLife Virtual Research Environment project (www.west-life.eu). Further, under the BioExcel Center of Excellence project (www.bioexcel.eu), we intend, among other things, to further improve the user experience by building workflows that connect HADDOCK with other flagship software like Gromacs for both pre- and post-processing.

 


Dr. Alexandre M.J.J. Bonvin is Professor of Computational Structural Biology at Utrecht University, Faculty of Science, Bijvoet Center for Biomolecular Research.

He leads the computational structural biology group in Utrecht (bonvinlab.org/people), composed of master's and PhD students and postdoctoral researchers working on the development of reliable bioinformatics and computational approaches to predict, model and dissect biomolecular interactions at the atomic level.