Program

The program has the following main characteristics:

  • each of the 5 days presents one module relevant to DS and DM

  • each module is introduced by a 2-hour session on concepts and principles, followed by practical sessions in which participants gain hands-on experience

The modules can be summarised as:

  1. Proper Repository: how to organise and assess a proper repository with the help of a professional repository application (DSpace: http://www.dspace.org/, https://github.com/ufal/clarin-dspace)

  2. Registering Environmental Data: upload environmental data into the repository, create metadata, assign augmented Persistent Identifiers (PID: https://www.handle.net/, http://www.pidconsortium.eu/)

  3. Collection Building and Using: create collections as subjects of analysis, expose metadata, cite collections, etc.

  4. Data Typing: use data typing as an essential element to carry out transformation, visualisation and analysis via a Data Type Registry (DTR: https://www.rd-alliance.org/group/data-type-registries-wg/post/data-type-registry-first-prototype.html)

  5. Analysing Data: address the stored data and metadata via PIDs for analysis using Beaker notebooks (http://beakernotebook.com/), Python and R (https://www.r-project.org/)

Programming and scripting will mainly be done in Python. The tools to be used (DSpace, Handles, DTR, Beaker, R, etc.) are open source and can be used freely in the course.
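
As an illustration of what addressing data via PIDs (module 5) might look like in Python, the following is a minimal sketch and not course material. It assumes the public Handle proxy REST endpoint at https://hdl.handle.net/api/handles/ and the JSON record layout that endpoint returns. The function names, the sample handle 12345/example and the sample URL are hypothetical and serve only to show the idea.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the public Handle proxy REST endpoint.
HANDLE_PROXY = "https://hdl.handle.net/api/handles/"


def handle_api_url(handle: str) -> str:
    """Build the REST URL for a handle of the form 'prefix/suffix'."""
    return HANDLE_PROXY + quote(handle, safe="/")


def extract_values(record: dict, value_type: str = "URL") -> list:
    """Return the data values of the given type from a handle record."""
    return [
        entry["data"]["value"]
        for entry in record.get("values", [])
        if entry.get("type") == value_type
    ]


def resolve(handle: str, timeout: float = 10.0) -> list:
    """Fetch a handle record over the network and return its URL values."""
    with urlopen(handle_api_url(handle), timeout=timeout) as response:
        record = json.load(response)
    return extract_values(record)


# Offline illustration with a made-up record (hypothetical handle and URL).
sample_record = {
    "responseCode": 1,
    "handle": "12345/example",
    "values": [
        {
            "index": 1,
            "type": "URL",
            "data": {"format": "string", "value": "https://repo.example.org/item/1"},
        },
        {
            "index": 100,
            "type": "HS_ADMIN",
            "data": {"format": "admin", "value": {}},
        },
    ],
}

print(extract_values(sample_record))
```

Calling resolve() with a real handle would perform a network request. The parsing step is kept in a separate function so that it can be tried out without network access.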