Yin Chen, Senior technical outreach expert, EGI.eu, Netherlands

EGI offers a set of independent cloud services presented coherently as a single system using common standards. One of EGI's important solutions is the Federated Cloud, which targets researchers and research communities that need access to digital resources in a flexible environment, using common standards to support their data- and compute-intensive experiments. INDIGO solutions will be integrated with the EGI Federated Cloud and help improve the user experience in EGI communities.

Champion’s team

EGI (www.egi.eu) is an international collaboration that federates the digital capabilities, resources and expertise of national and international research communities in Europe and worldwide. Over the last decade, EGI has built a federation of long-term distributed compute and storage infrastructures that support research and innovation. This international e-infrastructure has delivered unprecedented data analysis capabilities to more than 38,000 researchers from many disciplines. The federation brings together more than 350 data and compute centres worldwide. 

Several people from EGI.eu are working in INDIGO:

  • Peter Solagna: Senior Operations Manager of EGI.eu, leading EGI activities in INDIGO
  • Matthew Viljoen: Operations Officer of EGI.eu, working in INDIGO WP3 and WP2
  • Themis Athanassiadou: Technical Outreach Expert of EGI.eu, working in INDIGO WP2
  • Iulia Popescu: Communication Officer of EGI.eu, supporting INDIGO dissemination activities in WP2

The Case Study: the EGI Federated Cloud

EGI offers a set of independent cloud services presented coherently as a single system using common standards. One of EGI's important solutions is the Federated Cloud, which targets researchers and research communities that need access to digital resources in a flexible environment, using common standards to support their data- and compute-intensive experiments.

There are many challenges. Most researchers have two major computing needs. The first is access to large-scale computing and/or data analysis services. The second is the ability to deploy existing, trusted applications and interfaces on these resources. Many e-Infrastructures in Europe already provide access to computational resources. However, migrating applications from an existing provider to these infrastructures can require a considerable amount of resources and effort. In a worst-case scenario, the deployment of some applications may not be possible because of technology choices made by an infrastructure. There is also limited interoperability between some infrastructures. This prevents the public investment in e-Infrastructures from being used to the full and leads to a loss in effectiveness.

Case Study & User story

We bring three important requirements from the EGI FedCloud community to INDIGO. They are commonly requested by the community:

Requirement 1: Users request that the distributed nature of the cloud infrastructure they are using be transparent to them. The user should not need to care whether the infrastructure is spread across multiple sites (or indeed across physical or political boundaries) or which cloud implementation(s) the infrastructure includes (e.g. OpenStack or OpenNebula). The data used by the user may similarly be shared across multiple sites, and this should be transparent to the user, as should the underlying hardware in use by the infrastructure and any other site-specific details of the cloud infrastructure deployment. This requirement is the most important for EGI and is needed by multiple communities, including BILS, HBP, LOFAR, EPOS, DARIAH and ELIXIR.

Requirement 2: Users request that software running on the INDIGO-DataCloud platform be able to access data via POSIX, regardless of the type of underlying storage (e.g. file, block or object storage). Both read and read-write access are required, as is distributed access. Moreover, access control must be implemented that can prevent anybody apart from the owner from accessing the data, if the user requires this.

Requirement 3: This requirement relates to interoperability of the Authentication and Authorization Infrastructure (AAI) of EGI. Users who have an EGI SSO account want to use services that form part of the INDIGO-DataCloud platform to do their work. The user is granted access to these services by virtue of membership of their Virtual Organization (VO). The user's EGI SSO account should be sufficient for this, without the need to set up and maintain multiple logon credentials.
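The VO-based access rule in Requirement 3 can be illustrated with a minimal Python sketch. It is not the EGI or INDIGO AAI implementation: the service names, VO names and the `SERVICE_VOS` mapping are hypothetical and exist only to show the idea that a single identity, plus VO membership, decides which services a user may reach.

```python
# Hypothetical mapping from a service name to the VOs allowed to use it.
# None of these names come from EGI or INDIGO; they are placeholders.
SERVICE_VOS = {
    "orchestrator": {"vo.example.org"},
    "data-share": {"vo.example.org", "other.vo"},
}


def accessible_services(user_vos, service_vos=SERVICE_VOS):
    """Return the sorted services a user may use by virtue of VO membership."""
    memberships = set(user_vos)
    return sorted(
        service
        for service, allowed in service_vos.items()
        if allowed & memberships  # at least one VO in common
    )


if __name__ == "__main__":
    # One identity, one set of VO memberships, no per-service credentials.
    print(accessible_services({"other.vo"}))
```

In this sketch the user never presents a separate credential per service; the only input is the set of VOs attached to the single sign-on identity.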

Status today (January 2016)

The EGI FedCloud has a large user base, but limitations currently prevent users from easily adopting the technology. For example, users need to manually select a site that supports their VO and hosts their VM image. In addition, they have to rely on, and be familiar with, command-line interfaces and the handling of X.509 certificates and proxy certificates.
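The manual site selection described above amounts to a simple filter, which the following Python sketch makes explicit. It assumes a hypothetical in-memory catalogue; the `Site` class, the site names, the VO names and the image names are all invented for illustration and do not reflect any EGI information system or API.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical description of one cloud site in the federation."""
    name: str
    supported_vos: set = field(default_factory=set)
    images: set = field(default_factory=set)


def matching_sites(sites, vo, image):
    """Return the names of sites that support the VO and host the image."""
    return [
        site.name
        for site in sites
        if vo in site.supported_vos and image in site.images
    ]


# Illustrative catalogue; every name below is an assumption.
CATALOGUE = [
    Site("site-a", {"vo.example.org"}, {"ubuntu-14.04"}),
    Site("site-b", {"vo.example.org", "other.vo"}, {"centos-7"}),
    Site("site-c", {"other.vo"}, {"ubuntu-14.04"}),
]

if __name__ == "__main__":
    print(matching_sites(CATALOGUE, "vo.example.org", "ubuntu-14.04"))
```

Today the user performs this filtering by hand; a brokering service could apply the same two conditions automatically before placing a VM.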

"We expect INDIGO solutions would be integrated with EGI FedCloud and help improve use experiences in our communities"

The community

The EGI Federated Cloud is a federation of institutional private cloud infrastructures. The federation supports a number of use cases and communities with diverse requirements and research fields. These include, but are not limited to:

Use cases or communities

Description

Chipster

User-friendly analysis software for high-throughput data. It contains over 300 analysis tools for next generation sequencing (NGS), microarray, proteomics and sequence data.

READemption

A pipeline for the computational evaluation of RNA-Seq data. It was originally developed to process dRNA-Seq reads (as introduced by Sharma et al., Nature, 2010) originating from bacterial samples. It has since been extended to process data generated in different experimental setups and from all domains of life.

JAMS

A Java-based, open-source software platform designed specifically to address the demands of process-based hydrological model development and various aspects of model application. It is a framework for building complex models out of simple components. Several hydrological models have been implemented within JAMS (e.g. J2000, J2000g). These models are usually applied to simulate hydrological dynamics in catchments with a size of 1 km² to 100,000 km², with time steps of hours to months.

HAPPI

SCIence Data Infrastructure for Preservation with focus on Earth Science (SCIDIP-ES) brings together the state of the art in preservation technologies, represented by Earth Science repositories, and researchers for digital data preservation techniques. SCIDIP-ES HAPPI supports the archive manager and curator to capture and manage part of the Preservation Descriptive Information (PDI).

INERTIA

A project that addresses the "structural inertia" of existing Distribution Grids by introducing more active elements combined with the necessary control and distributed coordination mechanisms. To this end, INERTIA will apply Internet of Things/Services principles to Distribution Grid control operations.

DRIHM

A European initiative, running from 1 September 2011 to 28 February 2015, that aims to provide an open, fully integrated workflow platform for predicting, managing and mitigating the risks related to extreme weather phenomena.

BILS (Bioinformatics Infrastructure for Life Sciences)

A distributed national research infrastructure supported by the Swedish Research Council (Vetenskapsrådet) that provides bioinformatics support to life science researchers in Sweden. BILS is also the Swedish node in ELIXIR, the European infrastructure for biological information. BILS is willing to rewrite its current services to scale up compute in the cloud and so exploit cloud elasticity. The BILS portal is a front end to biological tools for users without IT skills. Currently, all user compute jobs run on worker nodes in small clusters. BILS is interested in running these compute jobs on the EGI Federated Cloud in order to scale up.

HBP (Human Brain Project)

HBP aims to accelerate our understanding of the human brain by integrating global neuroscience knowledge and data into supercomputer-based models and simulations. This will be achieved, in part, by engaging the European and global research communities using six collaborative ICT platforms: Neuroinformatics, Brain Simulation, High Performance Computing, Medical Informatics, Neuromorphic Computing and Neurorobotics. For HBP, a key capability is to deliver multi-level brain atlases that enable the analysis and integration of many different types of data into common semantic and spatial coordinate frameworks. HBP is looking for different types of repositories that allow interactive access to selected subsets of data and, ultimately, analysis where the data sets reside. The purpose is to leave the data in place, without moving it outside the repositories.

BBMRI-ERIC CC

Thousands of biobanks in Europe have been collecting data, samples and images of millions of individuals in different stages of their lives, during disease and after recovery. Biobanking is currently evolving from local repositories to a pan-European RI, the BBMRI-ERIC. The BBMRI CC facilitates the implementation of big data storage in combination with data analysis and data federation by integrating technologies from community projects, EGI and other e-Infrastructures. The CC will capture requirements and provide technology demonstrators to:


  • Increase biobank interoperability and data discovery in the BBMRI-ERIC community by providing a secure and standard way to share biobank high-throughput data,
  • Provide the biobanking community with a federated infrastructure for big data storage and intensive data analysis,
  • Facilitate the efficient use of bio-resources by supporting visibility and sharing, while also respecting the protection level required by owners of the data and samples,
  • Facilitate the efficient use of economic resources in BBMRI-ERIC by providing a common informatics infrastructure for storage and processing of big data.

DARIAH CC2

This CC aims to widen the usage of e-Infrastructures for Arts and Humanities research. The CC will develop and provide a workflow-based science gateway based on the general-purpose WS-PGRADE and gLibrary technologies, adapted and tailored to the needs of users from the field of Arts and Humanities. The gateway will provide access and compute services for data residing in distributed grid and cloud storage. The gateway will be validated and enriched with the ‘Multi-Source Distributed Real-Time Search and Information Retrieval’ application (SIR). The CC will engage with Arts and Humanities communities to attract more applications and users to the gateway.

EPOS CC2

This CC aims to drive the future design of grid and cloud use for integrated solid Earth Sciences research as part of the European Plate Observing System (EPOS). The CC will (1) identify and validate authentication and authorisation services, (2) test cloud resources and usage models, and (3) provide knowledge transfer services between the e-Infrastructure and EPOS communities.

Disaster Mitigation CC2

The objective of this CC is to make customised IT services available to climate and disaster mitigation researchers, so that they can gain a deeper understanding of the most serious natural disasters that affect Asia (e.g. earthquakes, tsunamis, typhoons) and mitigate multi-hazards via data-intensive e-Science techniques and collaborations. The task builds strongly on experts from the Asia-Pacific region, who will create virtual research environments with embedded services and simulations that enable the sharing of disaster-related data, tools, applications and knowledge among field workers, scientists and e-Infrastructure experts, shortening the time they need to respond to natural disasters.

The final user perspective 

The users of EGI FedCloud are individual researchers and larger research communities or groups. They are offered an Infrastructure as a Service (IaaS) cloud service in which they can freely choose from a wide range of service providers using the same standards. They can use their existing applications or have the confidence that any new applications can be used with any other service provider within the EGI cloud federation.

From the final user's perspective, the following results are expected:

For Requirement 1: As a user, I can create VMs sharing common data on multiple sites regardless of their cloud implementation or physical location.

For Requirement 2: As a user, I can run a visualization tool on an HBP application running in Docker that accesses the underlying storage via POSIX.

For Requirement 3: As a user, I log on to EGI using my EGI Single Sign On (SSO) credential. I am able to use any services of the INDIGO-DataCloud platform to which I have access rights seamlessly, without needing separate logon credentials or multiple user accounts.
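For the POSIX access named in Requirement 2, the owner-only access control can be expressed with ordinary POSIX permission bits. The Python sketch below assumes the platform exposes storage as a normal mounted directory; a temporary local directory stands in for that mount point, and the helper names `write_owner_only` and `is_owner_only` are hypothetical. It says nothing about how INDIGO maps block or object storage onto POSIX.

```python
import os
import stat
import tempfile


def write_owner_only(path, data):
    """Create a file readable and writable by its owner only (mode 0600)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    # Enforce the mode explicitly in case the file already existed.
    os.chmod(path, 0o600)


def is_owner_only(path):
    """Return True if no group or other permission bits are set."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    # A temporary directory stands in for a POSIX-mounted storage area.
    with tempfile.TemporaryDirectory() as mount_point:
        sample = os.path.join(mount_point, "sample.dat")
        write_owner_only(sample, b"experiment data")
        print(is_owner_only(sample))
```

Because the application only uses standard file calls, the same code would work unchanged whichever storage type sits behind the mount point, which is the point of the requirement.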

INDIGO innovation

The identified requirements relate to a number of functionalities to be delivered by INDIGO-DataCloud, including:

  • Orchestration support for multiple VMs
  • Brokering functionality
  • Federated access to distributed data via tools to share data between VMs in a safe and efficient way.
  • Interoperable AAI solutions

"We are very interested in these solutions and hope they can be well integrated into EGI FedCloud."