IA2, an Italian astrophysical research infrastructure project of the Italian National Institute for Astrophysics, is looking for new solutions for efficient data transfer and storage, data preservation, and user space for data reduction and computing.
IA2 is an ambitious Italian astrophysical research infrastructure project of INAF, the Italian National Institute for Astrophysics. It aims to coordinate different national initiatives to improve the quality of astrophysical data services such as data storage and preservation, data retrieval through Virtual Observatory (VO) compliant services, and data computing using appropriate workflows.
The main topic of this use case is delivering new solutions for efficient data transfer and storage, data preservation, and user space for data reduction and computing. User space and web access to data are the key points of an efficient service to the astronomical community. Robust and highly available techniques are required to meet the requirements of data distribution and data processing on efficient computational systems. Storage, computation, and the ability to handle user permissions flexibly in a collaborative environment are the major requests coming from the community.
Case Study & User story
The accumulated data in the Large Binocular Telescope (LBT) raw data archive is relatively small compared with other big projects of today. LBT is expected to produce an archive of standard FITS data products with a growth rate on the order of 1 TB per year. Although the management and processing challenges of populating and maintaining the LBT scientific archive are of much the same kind as in larger projects, these standard data products represent only the first part of the full science extraction chain. The storage and computing resources associated with both the raw and the reduced data archive are therefore expected to be fast enough to allow the whole astronomical community to extract scientific information from the available data, both proprietary and public.
A user should be able to:
- access a dedicated environment in which he/she has sufficient permissions to create a proposal (web form with a persistence system behind it),
- verify if the proposal is accepted and the observing time allocated (user notification system or ticketing system),
- receive information about the status of the observing project,
- interact with the telescope team in order to adjust or optimize the technical details of the observation,
- retrieve the raw or pre-reduced data,
- run a specific sequence of tasks of a pipeline (workflows systems),
- save the data products and share results with other users,
- discuss the observing results via collaborative tools.
Ideally, all these operations would be performed with one set of credentials, in a Single Sign On fashion. Users would probably also like to manage the permissions of their workspace, granting or withdrawing access for other users.
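The workspace permission requirement can be illustrated with a minimal sketch. It assumes a simple owner-managed access list; the `Workspace` class, its method names, and the permission strings are hypothetical and do not describe an existing IA2 or INDIGO component.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Workspace:
    """Hypothetical user workspace with an owner and a per-user access list."""

    owner: str
    # Maps a user name to the permissions granted to that user, e.g. {"read"}.
    acl: Dict[str, Set[str]] = field(default_factory=dict)

    def _require_owner(self, actor: str) -> None:
        # Assumption: only the owner may change who has access.
        if actor != self.owner:
            raise PermissionError(f"{actor} does not own this workspace")

    def grant(self, actor: str, user: str, permission: str) -> None:
        """Owner gives `user` one permission on the workspace."""
        self._require_owner(actor)
        self.acl.setdefault(user, set()).add(permission)

    def withdraw(self, actor: str, user: str, permission: str) -> None:
        """Owner removes one permission from `user`, if it was granted."""
        self._require_owner(actor)
        permissions = self.acl.get(user)
        if permissions is not None:
            permissions.discard(permission)
            if not permissions:
                del self.acl[user]

    def can(self, user: str, permission: str) -> bool:
        """The owner can do everything; others need an explicit grant."""
        return user == self.owner or permission in self.acl.get(user, set())
```

In a real deployment the user names would come from the Single Sign On identity rather than plain strings, and the access list would be persisted alongside the workspace.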
Status today (January 2016)
At present (January 2016) the team is working on identifying the actions required from the user's point of view and the specific technological and modelling requirements. The technological challenges of this use case are modest, because of the small amount of data and the low data rates of such a telescope. Parallel computing was tested and evaluated in the recent past, but no specific requirements for it have emerged. Data distribution and data delivery to remote sites still rely on the public network, but some requirements on network topology could allow the use of multicast techniques for parallel data delivery to different sites.
In principle the LBT use case is nearly identical to that of several other ground-based telescopes. The major telescopes in the world are publicly funded and are available to public research institutes. For some years, public institutes such as INAF have adopted a data policy that allows publication of the data one year after the end of the observing program. After that period, the data are public to the entire astronomical community.
Through VO compliant services, the LBT INAF data are available to the whole astronomical community. The LBT data archive can handle both private and public data at the same time, and the change of data status from private to public is automatic.
The astronomical community is worldwide and, thanks to the work done within the VO, all the VO clients available to retrieve, manage, and handle astronomical data work with fully interoperable data.
The final users will access their dedicated user space, composed of storage, high performance computing, and collaborative tools, in a transparent way. They will submit requests for observing time, handle data sets, reduce these data to extract more detailed scientific information, and save the products in a reserved area. The credentials to access the user space should be unique, and the possibility to share some parts or data subsets with other users should be a requirement of such a system.
In this scenario, INDIGO could help to define the appropriate technology stack, if one is already available, suggest technological solutions to handle workflows and to manage workspaces shared between users, and suggest best practices and solutions for data distribution and, in particular, for computing in a distributed environment.
The user workspace is one of the problems to be addressed in order to organize the work in a secure but collaborative way. Correlating data distributed over two continents with the same processing software is a challenging issue even for relatively small quantities of data, as in our case.
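The automatic change of data status can be sketched as a date comparison. This is only an illustration under stated assumptions: the function names are hypothetical, "one year" is taken as one calendar year after the end of the observing program, and the release day itself is counted as public.

```python
from datetime import date


def release_date(program_end: date) -> date:
    """Date on which data become public: one calendar year after program end."""
    try:
        return program_end.replace(year=program_end.year + 1)
    except ValueError:
        # 29 February has no counterpart in the following year;
        # assumption: release on 1 March instead.
        return date(program_end.year + 1, 3, 1)


def is_public(program_end: date, today: date) -> bool:
    """True once the proprietary period has elapsed (release day included)."""
    return today >= release_date(program_end)
```

An archive could evaluate `is_public` at query time, so that no separate job is needed to move data from the private to the public state.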