
Podcast Blechhammer
Research data management at Schmalkalden University of Applied Sciences introduces itself.
Svetlana Knaub enquires.
M
This is the start of our new podcast on the topic of Research Data Management, abbreviated “RDM”,
at Schmalkalden University. The podcast aims to encourage researchers to engage with the subject of RDM
in order to organize their research more efficiently and use their results more sustainably.
Mr. Fehling, you are a contact person for research data management at our university. Perhaps you could start
by briefly introducing yourself and then tell us something about the HAWK project on research data management.
A
My name is Peer Fehling. I am a trained chemist, so the research data I have dealt with myself is primarily measurement data. The research data management community in Thuringia, as nationwide, is made up of people from all scientific areas, so that it benefits from diverse input from the individual disciplines. I found my way into research data management when I took up my position at Schmalkalden University in the FDM-HAWK project.
The FDM-HAWK project has been running at Schmalkalden University of Applied Sciences since December 2022. The German abbreviation FDM-HAWK stands for: Competence Cluster Research Data Management at Universities of Applied Sciences in Thuringia. This already shows that several institutions are involved.
M
Which facilities are these?
A
In addition to the Schmalkalden University of Applied Sciences, the Erfurt University of Applied Sciences, the Ernst Abbe University of Applied Sciences Jena and the Nordhausen University of Applied Sciences are also involved.
M
What is the project about?
A
During the research process, starting with the planning of projects, the application for funding and the realization of the projects, a great deal of data is collected. The aim of research data management is to preserve this data throughout the data life cycle in the best possible way, based on standardized rules, and to make it available to other interested parties beyond the end of the project.
M
Why is this data of interest to third parties?
A
More than ever, modern research is carried out by highly specialized teams and the results of individual research groups are closely connected to other groups. This often involves considerable human and financial resources. Some projects are financed by funding organizations with taxpayers’ money. In the data-driven age, data and information are the true treasure of research that must be preserved. And data that is collected with taxpayers' money should also be accessible to any interested parties, for instance to avoid duplicate surveys and thus double funding.
M
Why is the project focusing specifically on universities of applied sciences,
and on those in Thuringia in particular?
A
It must be said that the topic of research data management has gained increasing attention in recent years. One of the main reasons for this has been the efforts of funding organizations to create binding standards for handling research data. In this context, the "Guidelines of the German Research Foundation on Good Scientific Practice for Handling Research Data" should be mentioned. Other funding organizations, such as the Volkswagen Foundation or the European Union, require applicants to make statements on the handling of the data generated in their research projects.
The establishment of research data management at universities in Germany began several years ago, not least because basic research there is financed to a large extent with taxpayers’ money.
M
How should we visualize this "establishment of research data management"?
A
Local state initiatives for research data management have been founded, which actively support universities in the development of research data management. In Thuringia, this is the Thuringian Competence Network for Research Data Management, German abbreviation TKFDM, which emerged from the University of Jena. But there are also RDM initiatives in other German federal states. Examples include HeFDI for Hesse, BW-FDM in Baden-Württemberg and FDM Bavaria. At universities, temporary positions were initially created as contact points for researchers on the topic of "research data management", and many of these positions have since been made permanent. This is an important prerequisite for the planned consolidation of research data management structures at universities of applied sciences.
M
And what is special about Universities of Applied Sciences when it comes to research data management compared to universities?
A
One major difference is the focus of research. While universities focus on basic research, research at universities of applied sciences is very industry- and application-oriented. Accordingly, the stakeholders have a strong interest in protecting sensitive data and securing competitive advantages from research activities. However, this is also possible with an appropriately tailored research data management system, for example through targeted licenses and the protection of exploitation rights. Nevertheless, it should not be forgotten that the re-use of third-party data also brings advantages for one's own research.
M
Where can I find out more about research data management and get started quickly?
A
At Schmalkalden University, there is a section on research data management on the website of the "Research and Transfer" department, where further information is compiled. But of course, we invite everyone to follow this podcast, which is intended as an introduction to the topic of "research data management".
M
Mr. Fehling, thank you very much for this informative overview of the project
"Research data management competence cluster at Universities of Applied Sciences in Thuringia".
In the next episode, we want to clarify what is meant by research data and research data management.
And with that, we say goodbye.

Podcast Blechhammer
Research data and research data management - getting started.
Svetlana Knaub finds out how.
M
Welcome everybody. Today's topic of our podcast on "Research data management at Universities of Applied Sciences"
deals with the following questions: What is research data and what is research data management?
Mr. Fehling, could you please explain what is meant by research data?
A
Everyone can probably imagine what research data means, at least from the perspective of their own academic background. However, this subject-specific perspective makes it challenging to establish a standardized definition. Just think of the multitude of measurement data generated in the natural and engineering sciences or the data surveys often used in the social sciences.
M
That seems to be almost all the data that has anything to do with research?
A
In the "Guidelines on the Handling of Research Data", the German Research Foundation has used an enumeration to address the topic of "What is research data" and summarised measurement data, laboratory values, texts, objects from collections or samples that are created, developed or analyzed as part of scientific work under the heading of "research data". Methodological test procedures, such as questionnaires or software, simulations and survey data, that means data related to individual “observation units", such as people, households or companies, are also mentioned.
M
That sounds pretty bulky!
A
We can simplify this and say that research data is all data that is generated or used in the planning, realization and documentation of scientific projects. Scientific projects include project work as well as bachelor's, master's or doctoral theses.
M
And why is research data so important today?
After all, such data already existed in the past, even though in a different form.
A
That’s correct. But the current situation is as follows: Modern research is no longer carried out exclusively by outstanding individuals and no longer takes place in a "quiet room". Instead, we are dealing with teamwork between highly specialized scientific units that work together across universities at a national and international level. At this point, we need to make the connection to digitalization. Modern research generates a constantly growing flood of digital data, which, in addition to practical application, represents the real treasure of research. It is not difficult to recognize that research data forms the basis for successful scientific work and reflects its success. Modern research is "data-driven", which means that strategic decisions about the alignment of research are made on the basis of analyzing and interpreting data.
M
The importance of research data management is now also becoming clear...
A
That's right. Basically, every researcher already takes care of their data. However, this process is becoming increasingly demanding and time-consuming, not to mention the information technology "know-how". It involves resource planning, available storage structures, data protection and data security, backup strategies, data archiving and much more.
M
How does research data management support researchers?
A
Research data management provides suitable tools for these tasks and advises researchers on the preparation, implementation and organization of their work, effectively leaving more time for the actual research. The security and usability of the research data are always guaranteed during the project and beyond.
M
Which university institutions are involved in this process?
A
In addition to a local contact person for research data management at the university, the university data center and the library are also involved. These form the "research data infrastructure". In a broader sense, it is also about information management. The basis for this is the electronic archiving and subsequent utilization of research data. In a nutshell, research data management combines all methodological, conceptual, organizational and technical measures and procedures for handling research data during its “life cycle".
M
We hope that we have been able to contribute to the understanding of the terms
"research data" and "research data management".
The next episode will deal with the topic of "Research Data Management and Data Life Cycle".
Until then, we say goodbye.

Podcast Blechhammer
Research data management - preferably FAIR!
Svetlana Knaub finds out how this works.
M
Welcome to the third episode on Research Data Management at Schmalkalden University.
In the second episode, we talked about research data and research data management.
At the end, when it came to how exactly the support for researchers in organizing their research data looks like,
the term "data lifecycle” came up. What is the data life cycle?
A
The data life cycle is an illustrative model for handling research data over the timeline of its existence. In simplified terms, it is divided into individual sections that do not run in strict succession, but overlap in time to some extent.
These sections are:
1. Planning the research project
2. The generation of data
3. Analyzing and processing the data
4. Sharing and publishing the data
5. Archiving the data, and, last but not least,
6. The subsequent use of the data.
It should be noted that this structure appears under different labels, although the content is the same.
M
Can you explain the individual phases in more detail, please?
A
Of course. The research project begins with the planning phase. It is important to clarify which data is required, generated, processed and stored. This includes logistical and infrastructural considerations regarding available storage options and storage capacities, the regulation of responsibilities, the definition of file and directory structures, but also the acquisition of funds for structured research data management.
M
That's quite a lot of information on this first point. How do you keep track of it all?
A
All this information flows into a so-called data management plan. This is a useful tool, and a number of structured templates are available for it. At the Business Information Center of Schmalkalden University, corresponding processes are being designed, and handouts in the form of a flowchart are available to help with creating data management plans. We will return to this in a later episode, as data management plans are now required by many funding providers in the application process for research projects.
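Purely as an illustration, the kinds of information that flow into a data management plan can be sketched as a small structured record. None of the field names or values below come from an official template; they are made up for this example.

```python
# Illustrative outline of the information a data management plan collects.
# All keys and values are hypothetical, not taken from any official template.
dmp = {
    "project": "Hypothetical sensor study",
    "data_types": ["measurement series", "calibration logs"],
    "estimated_volume_gb": 50,
    "storage": "university network drive with nightly backup",
    "responsibilities": {"data_curation": "project lead"},
    "file_naming": "YYYY-MM-DD_experiment_run.csv",
    "archiving_period_years": 10,  # cf. the 10-year good-practice requirement
}

# Print the plan as a simple checklist.
for key, value in dmp.items():
    print(f"{key}: {value}")
```

A structured record like this can later be transferred into one of the template-based tools mentioned above.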
M
And this is followed in second place by the "Data generation” section.
A
Right. Research data is generated through experiments, measurements, observations, simulations, surveys or other processes. This varies greatly from subject to subject. In this context, it is important to obtain the consent of the data owner when using third-party data or to check the license restrictions.
M
As a small note: We will also return to the topic of the legal aspects and licensing of research data in a later episode.
A
That's right. Which brings us to the third section, "Analyzing and processing the research data". The responsibility for this lies with the researchers themselves. The associated processes include, for example, digitizing, transcribing, checking, validating, interpreting and, in the case of personal data, anonymizing or pseudonymizing the data.
M
And how do I know in the end how I got from my raw data to the processed data?
A
This is a very important aspect. So we need a kind of description for our data: this is metadata. Put simply, metadata is data about data. Its collection is a core component of processing research data, and metadata plays an important role in the retrieval and re-use of research data.
M
Can we perhaps give an example of the use of metadata?
A
The metadata of digital photographs is well known. They show when the photo was taken and with which camera settings. The GPS coordinates provide information about the location.
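To make the photo example concrete, the following sketch records such descriptive metadata in a sidecar file next to the data it describes. The field names, camera name and coordinates are illustrative only, not an EXIF or any other standard.

```python
import json

# Illustrative (non-standard) metadata for a digital photo, similar in
# spirit to the EXIF information embedded in image files.
photo_metadata = {
    "file": "IMG_0423.jpg",
    "captured": "2023-06-14T09:31:05",
    "camera": "Example Cam X100",            # hypothetical device name
    "aperture": "f/2.8",
    "exposure_s": 0.004,
    "gps": {"lat": 50.720, "lon": 10.452},   # made-up coordinates
}

# Store the metadata in a sidecar file alongside the photo itself,
# so the description travels with the data.
with open("IMG_0423.json", "w") as f:
    json.dump(photo_metadata, f, indent=2)
```

The same pattern, data file plus a machine-readable description next to it, carries over directly to research data.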
M
And the topic of metadata will also be discussed in more detail in a later podcast episode.
And that brings us to section four of the research data lifecycle, the "sharing and publishing" of research data.
What needs to be considered here?
A
Before data is shared or published, copyright and access rights, patent rights or licenses should be defined. If data is to be published in a research data repository, that means on a special public or institutional server, it is possible to exert a targeted influence in this respect when selecting the repository. Persistent identifiers, or PIDs for short, can be created to uniquely identify and reference the data. PIDs make it easier to find the data on the network. A special form of PID are, for example, Digital Object Identifiers, DOIs, which some of you may have already come across in the literature.
M
What do I ultimately gain from sharing or publishing my research data?
A
The published research data are a direct expression of the success of the research and enhance not only the researcher's own reputation, but also that of the researcher’s scientific institution. This should be expectation and incentive enough. In addition, dissemination in existing networks and communities opens up collaboration opportunities for future research projects.
M
That is plausible. And this brings us to the fifth section: Archiving research data. Why should research data be archived?
A
Quite clearly: To make scientific results comprehensible in the long term.
M
What does long-term mean?
A
Long-term means at least 10 years. Based on the guidelines of the German Research Foundation on "Good Scientific Practice", this requirement is already being implemented by most universities and colleges in internal guidelines on the "Handling of Research Data", including at Schmalkalden University of Applied Sciences.
M
What is archiving?
A
For archiving, the data, or different versions of it, are copied to long-term storage, where they are kept permanently and securely. The original data can then be deleted. Archive storage is not necessarily designed for short-term access; it is often designed as a kind of data depot with correspondingly longer access times. We will return to the topic of archiving in a later episode of the podcast.
M
Which data should be archived?
A
In principle, all important project data should be archived after completion of a research project. This allows you to fulfill your obligation to provide evidence later.
M
This leaves us with the final stage of the research data life cycle, the “Re-usability of research data".
What does this mean and who uses the research data later?
A
Re-usability means that published research data, for example in data journals or repositories, can be used later by the researchers themselves or by third parties, with or without restrictions. Contextual re-evaluation of research data, for instance viewing it from a different perspective, can open up completely new research questions and approaches. Time-consuming and cost-intensive preliminary investigations are reduced, and the overall scientific output of the research is improved both qualitatively and quantitatively. The owners of the research data can decide for themselves who may use the data, and whether with restrictions or not.
M
That was a lot of information about the research data lifecycle. In the end, can we derive a recommendation on how to handle research data correctly and make the best use of knowledge about the research data lifecycle?
A
Yes, there are the international FAIR principles. FAIR, spelled F-A-I-R, is an acronym. The letters stand for:
F for findable
A for accessible
I for interoperable and
R for reusable.
This describes guidelines for handling research data so that it is suitable for reuse by both humans and machines. We have heard about some of the means of achieving this in today's discussion of the data lifecycle. Data that complies with the FAIR principles is also called FAIR data, a term that is worth remembering.
M
Perhaps we can summarize once again how we make our research data FAIR?
A
Findability is ensured by assigning persistent identifiers to the data, such as a DOI, and an ORCID, the Open Researcher and Contributor ID, to the author. The accessibility of the data is ensured by its licensing and a long-term storage method; the metadata also makes a decisive contribution to this. Interoperability is achieved by using open and free data formats with long-term usability. The use of standardized terms, that means a special vocabulary, is also an important aspect; this applies equally to data and metadata. Many of the points already mentioned are important for the reusability of research data, such as the use of open file formats, structured metadata, standardized vocabulary or machine-readable licenses. We will go into the licenses in more detail in a later episode.
M
Mr. Fehling, thank you for the interesting explanations.
In the next episode of our podcast, we will be looking at
Open Science, Open Access and Open Data.
See you next time.

Podcast Blechhammer
Open science - free access to information.
Svetlana Knaub seeks clarity.
M
Welcome to the fourth episode of our Research Data Management podcast.
In the third episode, we talked about the data lifecycle, the FAIR principles and FAIR research data.
Today, we will take a look at the relationship between Open Science, Open Access, Open Data and research data management.
What does one have to do with the other?
A
There is no precise definition of the term "Open Science". Open Science is intended to promote the transparency of scientific processes in general and access to scientific information in particular. Individual elements of the research process are made freely accessible, including, for example, publications, laboratory reports, software and research data. The barrier-free exchange of scientific findings thus enables a higher quality of science and is part of the digitization strategy of both the German federal states and the European Union.
M
What does this mean for us as a University of Applied Sciences
with regard to the practical implementation of research results in industry?
A
The economy naturally benefits directly from an easier transfer of scientific knowledge. Innovative strength and competitiveness are improved and the quality of industry-related research is enhanced, which also benefits future collaboration projects. However, Open Science does not only affect research that can be profitably implemented in industry, but all scientific disciplines.
M
And how can Open Access and Open Data be categorized in relation to Open Science?
A
Open Science is a generic term for a group of measures that all aim to improve the accessibility, dissemination and re-usability of scientific knowledge. In addition to Open Access, this also includes Open Source and Open Data. Open Access aims to achieve unrestricted access to scientific publications. Open Source concerns the reuse of software and is probably already familiar to many. Finally, Open Data endeavours to make research data freely available.
However, further thought is already being given to Open Hardware for experimental setups, Open Services for support services and Open Educational Resources for teaching materials, opening up new fields of action in the sense of "free availability".
M
And where do we stand as a University of Applied Sciences in this process?
A
Since 2021, Schmalkalden University has had an Open Access Policy, that means a set of guidelines and recommendations on this topic, which can be viewed on the university's website. The university names Open Science and Open Access as part of its canon of values. All members of the university are called upon to participate in the realization of the Open Science concept within the scope of their possibilities. This implies, for example, submitting publications to Open Access journals, permanently securing the exploitation rights to electronic publication versions, that means not assigning them to publishers, and publishing in freely accessible form. A distinction is made here between the "Golden Path", the first publication in an Open Access medium, and the "Green Path", the secondary publication of scientific work, simultaneously with the first publication or afterwards, in an Open Access repository such as the Digital Library of Thuringia or the Thuringian Research Data Repository REFODAT, which is now available.
M
Where can I get advice on Open Access Publication channels at our university?
A
The first point of contact is the library, which also provides the relevant information on its website. In 2021, Schmalkalden University joined the nationwide DEAL agreement with Springer Nature, which has been extended until 2028. This provides campus-wide access to around 2000 journals published by Springer. Articles by first authors of the university in Springer Closed Access journals, that means subscription journals, are made available worldwide as Open Access, and the publication fees in pure Open Access journals from Springer Nature are paid by the state of Thuringia. There is also an Open Access publication fund for the state of Thuringia. In any case, you should inform yourself in advance about the publication and exploitation rights. There are also numerous financing models for Open Access publications.
M
And what about the implementation of the Open Science concept
with regard to Open Data and research data at Schmalkalden University?
A
The Open Access Policy recommends storing research data in a way that is findable, accessible, interoperable and reusable in accordance with the FAIR principles. We talked about how this can be achieved in the last episode.
M
I would like to thank you for your insights into the topics of "Open Science, Open Access and Open Data".
The next episode is about data documentation and the importance of metadata in research data management.
Until then, we say goodbye.

Podcast Blechhammer
Metadata describes the world of data.
Svetlana Knaub questions the details.
M
Welcome to the fifth episode of our Research Data Management podcast. Today, the topics of
"data documentation and metadata" will be discussed in more detail in the context of research data management.
As briefly mentioned in the 3rd episode of our podcast, metadata is data about data.
Using the example of a digital photo file, Mr. Fehling has already explained that, for example,
the date, aperture or GPS coordinates are such data. What is the significance of data documentation and
metadata for research data management?
A
On the one hand, data documentation is important for the reproducibility of research in terms of good scientific practice; on the other hand, it is also important for the subsequent use of research data. If it is not known under what conditions the data was created or what it says, it is practically worthless. The data used to describe research data is called metadata. Metadata is therefore data about data that is indispensable for interpreting the research data, that means for understanding it. Ideally, metadata is both human-readable and machine-readable and thus enables the data to be interpreted by technical systems. And a dataset that cannot be found, or is difficult to find due to missing metadata, cannot be reused. This would eliminate a core element of effective research data management.
M
Let's summarize: Metadata should ideally... ?
A
... be structured, standardized and machine-readable. Only by describing the data with metadata can the research comply with the FAIR principles. Ultimately, each dataset can only be as useful as the metadata that describes it.
M
We remember episode 3 of our podcast, where the FAIR principles were discussed in more detail.
Can you briefly mention them again, as we are sure to come across them more often?
A
Of course. FAIR is an acronym that summarizes the requirements for the preparation of research data. It means:
F for findable
A for accessible
I for interoperable, that means it can be processed across platforms
and
R for reusable.
M
What information about the research data should definitely be included in the metadata?
A
There is the "5 W rule":
Who, What, Where, When and Why?
So, WHO created the data and HOW, WHAT does the data say, WHERE was it created, WHEN and for WHAT purpose, that means WHY.
This makes it clear that metadata is created at all stages of the research data lifecycle, starting with planning, through data collection, data analysis, data archiving or storage and subsequent use. The research data is fully described with information on the research project, the relevant data set and the files it contains. This project-related information is set out in data management plans, which are already mandatory for many funding providers when applications are submitted. We will discuss this in a later episode of our RDM podcast.
M
What options are available to the researcher for creating metadata?
You mentioned that, ideally, metadata should be stored in a standardized way.
Are there tools for this?
A
A simple form is the creation of a README file. Some will be familiar with this format from software, where important information about authorship, versions or licenses is stored. Similarly, a README file for research data contains descriptive information about the data. Keyword: the 5 Ws. The README file is often written in Markdown syntax, and corresponding templates are available on the internet, for example on the GitHub platform. Another option is the codebook, which contains information on all variables of a data set. Imagine a table in a non-proprietary, that means freely usable, file format, for example comma-separated values, CSV. There should not be several tables on one sheet; title lines, comments, blank lines, evaluations and special characters should be omitted; and values should be separated into number and unit of measurement. This is also called "well-structured data".
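As a sketch of what well-structured data can mean in practice, the following hypothetical example writes a small measurement table to a CSV file: one table per file, a single header row, no comments or blank lines, and the numeric value kept separate from its unit. The column names and values are invented for illustration.

```python
import csv

# Hypothetical measurement data; names and values are illustrative only.
# Note that the numeric value and its unit sit in separate columns.
rows = [
    {"sample_id": "S01", "temperature_value": 21.4, "temperature_unit": "degC"},
    {"sample_id": "S02", "temperature_value": 22.1, "temperature_unit": "degC"},
    {"sample_id": "S03", "temperature_value": 20.9, "temperature_unit": "degC"},
]

# One table per file, one header line, no blank lines or comments:
# this keeps the file easy to parse for both humans and machines.
with open("measurements.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["sample_id", "temperature_value", "temperature_unit"]
    )
    writer.writeheader()
    writer.writerows(rows)
```

A codebook for this file would then describe each of the three columns: its meaning, data type and permitted values.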
Well-structured metadata can be obtained by using metadata schemas, which are available as templates. They can be generic, that means generally valid, or subject-specific. Administrative and bibliographic metadata can be standardized across disciplines; the creation of process metadata and descriptive metadata is more demanding.
M
Can you give some examples of frequently used metadata schemas?
A
A well-known generic metadata standard is Dublin Core, originated by the Dublin Core Metadata Initiative. It describes the data history using 15 core fields. All fields are optional and can be extended if required, so that the standard can be tailored to your data. Another generic option is the DataCite Metadata Generator. It creates data documentation in XML format on a question-and-answer basis and is based on Dublin Core. It is maintained by the DataCite consortium.
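As a minimal sketch, a Dublin Core record using a few of the 15 core fields could be assembled with Python's standard XML tools. The namespace URI is the official Dublin Core element set; the title, author name and DOI below are placeholders, not a real dataset.

```python
import xml.etree.ElementTree as ET

# Official namespace of the Dublin Core Metadata Element Set.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# A minimal record using a few of the 15 Dublin Core fields;
# all values below are placeholders for illustration.
record = ET.Element("record")
for field, value in [
    ("title", "Example measurement dataset"),
    ("creator", "Jane Doe"),
    ("date", "2024-01-15"),
    ("format", "text/csv"),
    ("identifier", "doi:10.1234/example"),  # hypothetical DOI
]:
    el = ET.SubElement(record, f"{{{DC}}}{field}")
    el.text = value

# Serialize the record to XML.
xml_text = ET.tostring(record, encoding="utf-8").decode()
print(xml_text)
```

Since all Dublin Core fields are optional, such a record can start small and be extended field by field as the dataset description grows.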
M
And the subject-specific metadata standards?
A
Among the subject-specific metadata standards, CMDI, that means Component Metadata Infrastructure, for the field of "Artificial Intelligence", and EngMeta, that means Engineering Metadata, for the engineering sciences should be mentioned.
M
Where can I find something about available metadata standards?
A
A good overview of metadata standards is provided by the Metadata Standards Catalog of the Research Data Alliance, an international organization that aims to promote the open exchange of data. The website FAIRsharing.org, a curated registry of data and metadata standards, and the Digital Curation Centre, a British organization focusing on data management and the digital archiving of data, should also be mentioned.
M
The metadata schema therefore specifies how the information on my research data is structured.
Is it up to me which terms or keywords I use for this?
A
That is an important point. The content should also meet certain standards, and special vocabularies and terminologies are available for this purpose. They are intended to bring different or incorrect spellings to a common denominator or to correct them. The terms are organized into categories called taxonomies. These categories can then be related to each other in a model-like manner to form ontologies. The result is a network of knowledge on a topic, or across disciplines, that can be used easily, efficiently and without contradiction thanks to its standardization. The matter is greatly simplified here; in reality it is more complex.
M
And where can I find more detailed information on this topic?
A
One example is the NFDI4ING Terminology Service of the National Research Data Infrastructure, a service provided specifically for the engineering sciences. Here, subject-specific terminologies for different areas of engineering are developed and networked. The subject areas are divided into 7 archetypes, which are all abbreviated by first names. The archetype DORIS, for example, stands for High Performance Measurement and Computation.
M
And what exactly does the National Research Data Infrastructure do?
A
The National Research Data Infrastructure, abbreviated NFDI, is a non-profit association founded in 2021 and funded by the federal and state governments of Germany. The aim is to make research data usable in the long term through networking. To this end, research institutions from various fields work together. The NFDI provides services, training courses and standards for handling data. The NFDI is divided into 5 sections. One of them is called "Metadata, Terminologies and Provenance". In each section, several subject-specific consortia work together thematically. As of 2024, there are 27 consortia in total.
M
But back to the topic of metadata. Where is the metadata stored?
A
The metadata is stored directly with the data it describes. This can be directly in the file, as with a photo, or linked to the actual data.
M
And how do I find the metadata or the data in my search?
A
Metadata is assigned a persistent identifier, or PID, upon publication. The Digital Object Identifier, abbreviated DOI, of publications, for instance, is well known. This creates the link between metadata and research data. The findability of the metadata itself is realized through registration and indexing in a metadata directory, which can be searched for information. It is important to note that metadata remains available even if the actual reference data no longer exists, perhaps because the server is offline or the archiving period has expired.
This means that important information on data history and usage rights is available even without the actual data. We will return to this topic in a later episode on the subject of "Publishing research data and repositories".
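The role of the DOI as a persistent link can be sketched in a few lines: a DOI is not itself a web address, but the doi.org resolver turns it into one. The DOI used in the example is made up.

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI like '10.1234/abcd' into a resolvable link.

    The doi.org resolver redirects the request to the current
    location of the object, so the link stays valid even if the
    data later moves to a different server.
    """
    return f"https://doi.org/{doi}"

# Hypothetical DOI for illustration only.
print(doi_to_url("10.1234/example.dataset"))
# https://doi.org/10.1234/example.dataset
```

This indirection is exactly what makes PIDs "persistent": the identifier stays fixed while the location it resolves to can change.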
M
Mr. Fehling, thank you for the information on "Data documentation and metadata".
The data management plan and useful tools are the focus of the next episode. Bye.