Lucio Angelo Antonelli and Stefano Gallozzi, INAF, Italy

The collaboration with the Cherenkov Telescope Array (CTA) team aims at testing new collaborative platforms to increase the technological capacity in CTA scenarios. The principal activities of the team are related to the development of the end-to-end ASTRI ground-based Cherenkov gamma-ray telescope, a prototype for CTA. CTA will work as an observatory producing a huge amount of data (~300 PB), which must be stored, managed, and made easily accessible at different user levels.
The archive use case for INDIGO will provide several tests of the technical solutions best suited to handle such an amount of data.

Champion’s team

The team involved in the ASTRI prototype and the CTA mini-array project belongs mainly to INAF institutes such as the Astronomical Observatory of Rome, the ASI Science Data Center, IAPS Rome, IASF Milan, and the Astrophysical Observatory of Turin; a special collaboration is also provided by the University of Padova, which hosts a GPU/CUDA center.
The principal activities of the team are related to the development of the end-to-end ASTRI ground-based Cherenkov gamma-ray telescope, a prototype for the Cherenkov Telescope Array (CTA). The evolution of the ASTRI project is the SST mini-array project, which consists of nine ASTRI-like telescopes and will be the first seed of the CTA array to be installed at the CTA southern-hemisphere site in Chile.
Our team develops the software implementation of the archive and the analysis pipelines. Other relevant efforts are dedicated to simulations and high-level software products to be used in the CTA context.

The Case Study: CTA Big Data Processing

The principal community challenge involves the huge amount of data to be stored: the prototypes will take ~1 TB/night (~0.3 PB per year), the pre-production phase will produce at least 1 PB per year, and the whole CTA array in production will produce ~40 PB per year. The projections on data rate are highly variable, however, since the most pessimistic scenario reaches ~100 PB per year. These data must be stored and remain available for a very long time (~30-50 years).
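
As a rough sanity check on these figures, here is a minimal back-of-the-envelope sketch of the cumulative archive volume; the phase durations are illustrative assumptions, not the official CTA schedule:

    # Cumulative archive volume from the data rates quoted above.
    # Phase durations (in years) are illustrative assumptions only.
    PHASES = [
        ("prototype", 0.3, 2),       # ~1 TB/night -> ~0.3 PB/year
        ("pre-production", 1.0, 3),  # at least 1 PB/year
        ("production", 40.0, 7),     # ~40 PB/year (pessimistic: ~100 PB/year)
    ]

    total_pb = 0.0
    for name, pb_per_year, years in PHASES:
        total_pb += pb_per_year * years
        print(f"after {name:>14}: {total_pb:6.1f} PB")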

Case Study & User story

The only way to efficiently handle such an amount of data is to configure a distributed archive data model. The group plans to federate geographically distributed storage resources in order to guarantee the horizontal scalability needed to reach the big-data horizon.
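
As an illustration, the following minimal sketch shows how such a federation could expose one logical namespace over distributed endpoints; the endpoint URLs and the replica-placement policy are hypothetical, not the actual CTA design:

    # Minimal sketch of a federated storage namespace: a logical file name
    # is mapped onto several geographically distributed endpoints.
    # Endpoint URLs and the placement policy are hypothetical examples.
    import hashlib

    ENDPOINTS = [
        "https://storage-a.example.org/cta",
        "https://storage-b.example.org/cta",
        "https://storage-c.example.org/cta",
    ]

    def resolve(lfn: str, replicas: int = 2) -> list[str]:
        """Map a logical file name to `replicas` physical URLs.

        A stable hash keeps the mapping deterministic, so endpoints can be
        added to scale the federation horizontally as the data grow.
        """
        start = int(hashlib.sha256(lfn.encode()).hexdigest(), 16) % len(ENDPOINTS)
        return [f"{ENDPOINTS[(start + i) % len(ENDPOINTS)]}/{lfn}"
                for i in range(replicas)]

    print(resolve("astri/2015-11-04/run0042.fits"))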

The way a single user accesses and searches for data identifies the related use cases:

1. Archive Scientist: performs simulations and pipelines using the standard GRID infrastructure and stores sensitive output in the CTA archive. Archive scientists authenticate through standard Grid X.509 certificates and redirect their input and output to the CTA archive federation (see the sketch after this list).
2. Archive & DB Admin: controls the whole set of distributed resources in order to balance the workload and optimize the system. The admin schedules backups and replication of data across the storage federation, and manages the computation for observation campaigns so that coherent data are aggregated in the same place to speed up I/O.
3. Principal Investigator (P.I.), external user: directly accesses the CTA archive through a dedicated authentication and authorization (A&A) interface/portal and analyzes data and metadata using the provided software tools. The P.I. also downloads final products.
4. Scheduler, external user: is responsible for grading and ranking all the observations to be performed. The scheduler accesses the system through dedicated portal A&A interfaces and services.
5. Guest observer: can queue P.I. observation programs through a dedicated interface to the on-site array infrastructures.
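
As referenced in point 1 above, here is a minimal sketch of the archive-scientist workflow. It assumes a Grid environment with the voms-proxy-init and gfal-copy command-line tools installed; the VO name and archive URL are placeholders, not official CTA values:

    # Sketch of the archive-scientist I/O redirection: create an X.509
    # VOMS proxy, then copy a pipeline output file into the archive
    # federation. The VO name and destination URL are placeholders.
    import subprocess

    VO = "vo.cta.example"                      # placeholder VO name
    ARCHIVE = "srm://archive.example.org/cta"  # placeholder federation endpoint

    def store(local_path: str, remote_name: str) -> None:
        # Authenticate with a Grid X.509 proxy carrying VO attributes.
        subprocess.run(["voms-proxy-init", "--voms", VO], check=True)
        # Redirect the pipeline output to the CTA archive federation.
        subprocess.run(
            ["gfal-copy", f"file://{local_path}", f"{ARCHIVE}/{remote_name}"],
            check=True,
        )

    store("/tmp/run0042_dl3.fits", "astri/2015-11-04/run0042_dl3.fits")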

Status today

The CTA community comprises more than 1200 scientists and is foreseen to grow. Each user is a potential stakeholder of the innovative services to be developed and deployed for the community. We currently concentrate on the most challenging task, which is the intelligent storage and handling of the data produced. We foresee the INDIGO use case as a prolific contributor to the CTA technological choices. The archive development is currently tied to the ASTRI prototype phase: all the technologies validated there will also be tested in the pre-production phase.

The CTA community

The CTA community is a worldwide community gathering about 31 countries, ~1200 participants, ~180 institutes, and ~400 FTE. CTA is an array of about 120 telescopes on two sites (20 in the northern hemisphere and 100 in the southern hemisphere). It is foreseen to increase the very-high-energy (VHE) gamma-ray sensitivity by a factor of 10 over a wider band (from 0.02 to 100 TeV), thus allowing more than 1000 new gamma-ray sources to be detected. It will work as an observatory producing a huge amount of data (~300 PB), which must be stored, managed, and made easily accessible at different user levels.

The archive use case for INDIGO will provide several tests of the technical solutions best suited to handle such an amount of data. A feasible demo solution can be envisaged as a distributed federation of storage resources accessed and managed through INDIGO solutions.

The final user perspective

Fragmented data are coherently aggregated and stored in dedicated repositories (Storage Elements) of the GRID providers. Archive pipeline users create a batch queue to process these data into higher-level products; this task is performed on the standard GRID computation infrastructure. Long-term sensitive output is stored, while the rest is erased after the computation. Every operation on the data updates a metacatalog of descriptors that is easily browsable by a database query engine.
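
A minimal sketch of such a metacatalog update, using SQLite as a stand-in for the actual database engine; the descriptor schema is an illustrative assumption:

    # Every operation on a data product records its descriptors in the
    # metacatalog so that a query engine can browse them later.
    # SQLite and this schema are illustrative stand-ins.
    import sqlite3

    con = sqlite3.connect("metacatalog.db")
    con.execute("""CREATE TABLE IF NOT EXISTS products (
        lfn TEXT PRIMARY KEY,   -- logical file name in the federation
        level TEXT,             -- data level of the product
        run_date TEXT,          -- observation date
        operation TEXT          -- last operation applied to the product
    )""")

    def register(lfn, level, run_date, operation):
        con.execute(
            "INSERT OR REPLACE INTO products VALUES (?, ?, ?, ?)",
            (lfn, level, run_date, operation),
        )
        con.commit()

    register("astri/2015-11-04/run0042_dl3.fits", "DL3", "2015-11-04", "pipeline")

    # The metacatalog is then browsable with plain queries:
    for row in con.execute("SELECT lfn, level FROM products WHERE operation = 'pipeline'"):
        print(row)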

INDIGO innovation

The main goal of the INDIGO collaboration and the CTA use case is to test new collaborative platforms in order to exploit technological developments in the CTA scenario.

The federation of storage resources can help here and can define a simple, unified way to access data, so that all the open data of the CTA community could be made available to INDIGO users by granting the proper authorization. A common platform for A&A can also help to homogenize scientific communities, so that the same tools are used to deploy production infrastructures and scientific services on demand.
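
To illustrate what a unified A&A layer buys the end user, here is a minimal sketch of token-based access to a federated endpoint; the token service, endpoint URLs, and scope are hypothetical, and the actual INDIGO A&A components may differ:

    # One bearer token from a common A&A service grants access to any
    # storage endpoint in the federation. All URLs are hypothetical.
    import requests

    def get_token(user: str, password: str) -> str:
        # OAuth2-style password grant against a hypothetical A&A service.
        r = requests.post(
            "https://aai.example.org/token",
            data={"grant_type": "password", "username": user,
                  "password": password, "scope": "cta:read"},
        )
        r.raise_for_status()
        return r.json()["access_token"]

    def fetch(token: str, lfn: str) -> bytes:
        # The same token is accepted by every endpoint of the federation.
        r = requests.get(
            f"https://storage-a.example.org/cta/{lfn}",
            headers={"Authorization": f"Bearer {token}"},
        )
        r.raise_for_status()
        return r.content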


Want to know more about this use case? Write to us at info@indigo-datacloud.eu to contact the Champions.