The Centre for Informatics and Computing (CIC) at the Ruder Boskovic Institute (RBI) is composed of PhD students, post-doctoral and senior researchers working on various projects, mainly focused on e-Science technologies and services, distributed computing systems, grid/cloud infrastructures and applications, data mining and advanced scientific visualizations.
RBI/CIC is strongly oriented towards international cooperation and has significant experience participating in the EU Framework Programmes. RBI has been involved from the very beginning in the construction of the Digital Research Infrastructure for the Arts and Humanities (DARIAH), starting as a partner in the “Preparing DARIAH” project. After the DARIAH preparatory phase, RBI was actively involved at the national level in establishing Croatia as a member of DARIAH-ERIC, and at the international level as a partner and DARIAH community representative in the EGI-Engage and INDIGO-DataCloud projects.
The Arts and Humanities (A&H) have seen exponential growth in digital research material, especially in the last decade, as a result of new born-digital material and large digitisation efforts. Because humanities disciplines now generate and analyse an increasing amount of data, parts of their research process are becoming more data-intensive and have to be supported by emerging research infrastructures. As the volume of digital A&H content grows, so does the need to store the data and to access, search and analyse it easily.
The A&H domain includes various scientific disciplines (history, linguistics, musicology, archaeology, art history, philosophy, literature, political science, psychology, religious studies, etc.), and each of these disciplines has its own research data with corresponding standards and metadata.
Given the heterogeneity of A&H data and the limited experience that researchers in this area have with e-Infrastructure resources, the challenge addressed in this Case Study is the development of a user-friendly platform that simplifies the process of creating, managing and sharing data repositories.
Case Study & User Story
Data used in A&H research differs in size (from a few KB for a text file to several GB for a film recording), in quantity (from a few image files of a rare and valuable manuscript to several million image files of a whole library) and in type, as there is a variety of formats for text, image, audio and video. The proposed data repository platform should serve as a solution for storing and managing data across various A&H research projects, for example:
- A musicological project provides a complete overview of the work of one composer, including scores, letters and orchestral recordings, and needs to view and edit MEI-encoded music documents in CMN (Common Music Notation)
- A digitisation project establishes a virtual library comprising manuscripts that have been spread all over the world
- An archaeologist virtually reconstructs buildings from their remains. The data from the excavation results will be used to create 3D models of the landscapes and buildings
- A scholar analyses the historic development of narrative techniques based on a large collection of literary texts comprising about 2000 novels
The main goal of this platform is to simplify the process of creating and managing repositories of digital assets from various A&H disciplines. To meet this requirement, user-friendliness should be one of the platform's key features. Depending on their role, users should be able to create collections and to store and retrieve data responsively and reliably. The platform should also include a data discovery mechanism and metadata management.
The main idea is that each individual DARIAH member (i.e. a person authorized by the DARIAH IdP) should be able to access this platform and create his/her own customized repository.
The repository manager (the person who created the repository) is then able to define different access rights for different end-users. Depending on these access rights, several levels of end-users can be identified, and the level determines which actions an end-user may perform, such as uploading, downloading, annotating and/or browsing content.
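The access model described above can be illustrated with a minimal sketch. The four actions (browse, download, annotate, upload) come from the text; the level names (`viewer`, `reader`, `annotator`, `contributor`), the class `RepositoryAccess` and its methods are hypothetical and chosen only for illustration, since the actual levels have not been specified.

```python
from enum import Flag, auto


class Action(Flag):
    """Actions named in the text, modelled as combinable bit flags."""
    BROWSE = auto()
    DOWNLOAD = auto()
    ANNOTATE = auto()
    UPLOAD = auto()


# Hypothetical access levels: the text only states that several levels exist.
ACCESS_LEVELS = {
    "viewer": Action.BROWSE,
    "reader": Action.BROWSE | Action.DOWNLOAD,
    "annotator": Action.BROWSE | Action.DOWNLOAD | Action.ANNOTATE,
    "contributor": (Action.BROWSE | Action.DOWNLOAD
                    | Action.ANNOTATE | Action.UPLOAD),
}


class RepositoryAccess:
    """Tracks which end-user holds which access level in one repository."""

    def __init__(self, manager):
        self.manager = manager
        self._grants = {}  # maps user name -> Action flags

    def grant(self, granted_by, user, level):
        # Only the repository manager may define access rights.
        if granted_by != self.manager:
            raise PermissionError("only the repository manager can grant access")
        self._grants[user] = ACCESS_LEVELS[level]

    def is_allowed(self, user, action):
        # The manager may perform every action in their own repository.
        if user == self.manager:
            return True
        # Users without a grant hold the empty flag set, so nothing is allowed.
        return action in self._grants.get(user, Action(0))


access = RepositoryAccess(manager="alice")
access.grant("alice", "bob", "reader")
print(access.is_allowed("bob", Action.DOWNLOAD))  # True
print(access.is_allowed("bob", Action.UPLOAD))    # False
```

Modelling actions as flags keeps each level a simple union of permitted actions, so a new level can be added without changing the permission check.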
Currently, the DARIAH community has more than 3000 active users, and the number is growing quickly. All of them are potential users of the data repository platform. Right now, the platform is in a preparatory phase: user stories and technical requirements have been defined and are being analysed by the INDIGO-DataCloud development team.
DARIAH (https://www.dariah.eu/) is a pan-European social and technical infrastructure for arts and humanities scholars working with computational methods, focused on supporting digital research as well as the teaching of digital research methods. It is composed of people, expertise, methods, tools and technologies for investigating, exploring and supporting work across the wide spectrum of the digital arts and humanities.
DARIAH is a network that connects several hundred scholars and dozens of research facilities, currently in 17 European countries. As a large and broad community, DARIAH brings together a great number of collaborators, mainly researchers from various arts and humanities disciplines, as well as computer scientists and technical infrastructure experts.
In the usual scenario, a DARIAH member (project/institute manager, research group leader, or individual researcher) will use the solution provided by INDIGO-DataCloud through a user-friendly and suitably designed web portal to make an initial request to create a new repository.
Upon successful authorization of the user, a new, empty repository instance will be launched and the repository URL returned to the repository manager. Once the requested repository is up and running, the repository manager can define the repository, its collections and items, and fill it with data. After the repository is set up and filled, an end-user will be able to log in with his/her DARIAH credentials through a central web portal and perform certain actions on all the repositories to which he/she has access, such as downloading replicas of items, annotating data, browsing and searching, as well as uploading new content, depending on his/her user rights.
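The request flow described above can be sketched as follows. This is an illustration under stated assumptions, not the INDIGO-DataCloud design: the names `Portal`, `request_repository`, `Repository`, `Collection` and `Item` are hypothetical, authorization is reduced to a lookup in a set of known user names, and `https://repositories.example.org` is a placeholder address.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class Item:
    name: str
    size_bytes: int


@dataclass
class Collection:
    title: str
    items: list = field(default_factory=list)


@dataclass
class Repository:
    manager: str
    url: str
    collections: dict = field(default_factory=dict)

    def add_collection(self, title):
        """Create an empty collection and return it so items can be added."""
        collection = Collection(title)
        self.collections[title] = collection
        return collection


class Portal:
    """Hypothetical web portal that launches repositories on request."""

    def __init__(self, authorized_users,
                 base_url="https://repositories.example.org"):  # placeholder
        self.authorized_users = set(authorized_users)
        self.base_url = base_url
        self.repositories = {}  # maps repository URL -> Repository

    def request_repository(self, user):
        # Stand-in for authorization via the DARIAH identity provider.
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not an authorized DARIAH member")
        # Launch a new, empty repository; the requester becomes its manager.
        url = f"{self.base_url}/{uuid4().hex}"
        repository = Repository(manager=user, url=url)
        self.repositories[url] = repository
        return repository


portal = Portal(authorized_users={"alice"})
repo = portal.request_repository("alice")       # new, empty repository
scores = repo.add_collection("Scores")          # manager defines a collection
scores.items.append(Item("sonata.mei", 2048))   # and fills it with data
print(repo.url)
```

The repository starts empty and is addressed by its URL, mirroring the order of steps in the text: request, launch, return of the URL, then definition of collections and items by the manager.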
Every repository should have a general description, and these descriptions should be publicly available. Non-registered users should be able to browse the public repositories based on those descriptions, and registered users should be able to request access rights for particular repositories or collections.
A considerable number of projects involve collaboration between various A&H institutions and e-infrastructure providers, at the moment mostly at the national level. Such projects often involve service implementers working together with a domain-specific group of researchers to set up a particular repository of digitised assets.
However, what is lacking is an e-Infrastructure service oriented towards individual researchers and small to medium-sized research groups who want to store, manage and share their data but are not necessarily linked to any institution or do not have adequate technical knowledge. The INDIGO solution addressed in this Case Study, namely the user-friendly data repository platform, would therefore be oriented mainly towards meeting the needs of data storage and management, as well as collaboration and data sharing among individual A&H researchers, research projects and research groups within DARIAH.
Currently, the DARIAH community does not offer a data repository platform for its members at a global level. INDIGO will therefore provide a new technical solution, offered to DARIAH members as a service for creating and managing digital data repositories.