Open science and qualitative research evaluation

”Responsible research evaluation must look past abstract quantitative indicators and examine research in its true context, which requires qualitative research evaluation approaches and methods.” In this blog article, Joona Lehtomäki, a science adviser at the division of strategic research at the Academy of Finland (Research Councils Finland), writes about research evaluation, the role of metrics, the impact of research, open science and the qualitative turn in research evaluation. Lehtomäki also outlines what a turn towards more contextualized and qualitative research evaluation would mean in practice.

Text: Joona Lehtomäki

Proponents of open science welcome a future where the word ”open” is no longer needed and what are now features of open science become ”just” science. If you share this quasi-utopian vision, you probably know that this future is precarious and needs active work both within academia and beyond. Doing research in the open not only lacks incentives, but can even become a career impediment because of how academic merit is currently distributed. The way in which researchers are evaluated – e.g. how grants are won and how tenure is secured – therefore matters immensely in determining when the future described above will arrive.

Not everything that counts can be counted

Evaluation is certainly familiar to all academics. Research funding is granted and papers are published based on peer review. Research groups, departments and whole universities are evaluated based on increasingly quantitative metrics. What can well be described as the tyranny of metrics is not unique to academia, as it permeates much of modern society, including schools, medical care, businesses and government. All are increasingly managed by quantifying individual performance in one way or another. In academia, this leads to all sorts of maladies, such as the publish-or-perish culture and an increasing reliance on metrics such as journal impact factors (JIF) or the h-index.

The adverse effects of relying too much on quantitative metrics are well documented. In response, international initiatives such as the San Francisco Declaration on Research Assessment (DORA) and the Leiden Manifesto have gained traction. Neither of the initiatives denies the usefulness of quantitative metrics. Rather, they stress the need to place quantitative evaluation in its rightful place: in support of qualitative expert assessment. What researchers find out and how they arrive at their conclusions should carry more weight than where the results are published. The temptation and (often false) precision of quantitative metrics should be resisted, and research should be understood in its context and through a qualitative lens.

Signing DORA, for example, means committing to being explicit about the criteria used in research evaluation and to emphasizing the content rather than the venue of scientific outputs. Importantly for incentivising open science, the commitment also entails valuing the full suite of research outputs in addition to publications, including research data and code. Multiple initiatives are already paving the way for making diverse and open research outputs count. The European Commission has developed the Open Science Career Assessment Matrix, which contains a broad spectrum of open-science-related criteria for research evaluation. Similarly, Knowledge Exchange, a consortium of six European research infrastructure and digital service providers, is developing an Openness Profile as part of its Open Scholarship work to give credit to currently largely ignored contributions.

These are all necessary and good first steps. But while counting open access publications, open datasets and open source computational products can incentivize open science, it’s still counting (though I’m not suggesting the examples given above are just counting). Counting research outputs and getting credit for them is not bad per se. It can turn sour if counting is the only thing that counts. Without evaluation of the contextual contribution of research outputs, much of the potential and realized impact of research risks being missed.

The qualitative turn

Responsible research evaluation must look past abstract quantitative indicators and examine research in its true context, which requires qualitative research evaluation approaches and methods. This is why the DORA recommendations explicitly mention qualitative indicators and the Leiden Manifesto includes contextualization and qualitative judgement in many of its principles. In Finland, the Responsible Research Evaluation Guidelines currently being developed echo the same principles.

What would a turn towards more qualitative research evaluation mean in practice? A simple first step, for example, would be to replace full-blown publication lists annotated with JIFs with a relatively short list of more diverse research outputs (including but not restricted to publications). This list would also include descriptions of why exactly these outputs are impactful and relevant to whatever the researcher is being evaluated for. This approach does not imply an immediate disruption of the whole evaluation system, but it does nudge it in a more contextualized and diverse direction.

Dutch universities, research institutions and national funders have been progressive when it comes to more diverse modes of research evaluation. For example, UMC Utrecht provides an example of what ”reading instead of counting”, i.e. more qualitative and contextualized evaluation, might look like. They have developed and piloted a set of research evaluation indicators that focus not only on research outputs, but also on work related to research structures (leadership and culture, collaboration with stakeholders, and continuity and infrastructure) and processes (e.g. setting research priorities, posing the right questions, and design, conduct and analysis). The indicators also explicitly account for the value of the diverse outputs and outcomes open science can provide. Furthermore, the Association of Universities in the Netherlands has recently published a position paper elaborating a broader spectrum of academic recognition, including open science.

The UMC Utrecht example highlights two additional points, which I think are also relevant for open science. First, the change doesn’t need to happen everywhere at the same time, and individual institutions can take action even on their own. Second, when it does come to systemic change, the transition will not be easy or free. For example, in the case of UMC Utrecht, the organizers observed that ”[s]ince researchers and staff were largely unfamiliar with these more labour-intensive evaluation practices, this was initially not met with only enthusiasm”.

While calls for more qualitative research evaluation are well justified, the costs of ”re-complicating” evaluation can be substantial. How do we embrace more qualitative and contextualized evaluation practices in a system in which most evaluators (i.e. peer reviewers) are volunteers and already overburdened? There are no clear answers to this question yet, and multiple things will need to happen. Researchers will hopefully find doing more in-depth evaluation more motivating than the current way of spending only a little time on a given research output. However, given the limited time and resources, this will probably mean cutting down the number of research outputs evaluated, even radically. This shouldn’t be a problem if the whole point is to concentrate more on the whole research process and its potential impact instead of on the number of outputs.

Joona Lehtomäki hopes to see a shift of focus from research outputs to the whole impact pathway of research. Image: Laura Hiisivuori

Brace for impact

Much of research evaluation today tries to nail down the impact of research. While ”impact” has become established in the vocabulary of most funders, research organizations and researchers, the definition of the term remains elusive. Fecher and Friesike have recently launched an ambitious project on building a systemic view of research impact. In their view, the failure to achieve research impact can often be attributed to the misconceptualization of the term ”impact” itself. However, some commonalities are emerging, and impact could become a key concept in the transition towards a more qualitative evaluation system that also values open science.

Much of the discussion related to the benefits of open science still revolves around relatively narrow conceptualizations of impact, such as the citation advantage of open access publications. Concentrating on the myriad ways in which research can have impact, either within academia or in broader society, is in fact valuing the whole knowledge production process, not only particular parts of it like publications. Open science has a large role to play in the conceptualization of research impact. Open science is a great facilitator, if not a prerequisite, for understanding the contextualized relevance of research and how its outputs and outcomes came about.

It is my hope that the move towards more qualitative research evaluation goes hand in hand with changes in how research is planned, implemented and reported, so that the emphasis shifts from outputs to the whole impact pathway of research. This is a tall order, I know, but fortunately there are a lot of reasons to be hopeful.


Joona Lehtomäki (ORCID, @jlehtoma) works as a science adviser at the division of strategic research at the Academy of Finland (Research Councils Finland). He has a soft spot for open science and a complicated relationship with research impact.

One Reply to “Open science and qualitative research evaluation”

  1. Thanks, this is a very interesting text. However, I wonder if we should make a distinction between different functions of research evaluation (e.g. research grant reviews, article reviews in journals, evaluation of candidates for academic positions, evaluation of the impact of research funding, and research assessments of HE institutions).

    Different quantitative indicators such as publication lists and journal level citation indicators don’t automatically dominate all of these. For example, we already have the qualitative turn in regard to research assessments of (Finnish) universities: quantitative data is used to inform qualitative assessment by review panels on different aspects of research in universities.

    It may also be that the dominance of quantitative indicators is field-dependent: it has longer and more profound roots in STEM fields while SSH fields still have other methods for evaluation. Of course the increased use of quantitative indicators based on data sources such as the Web of Science which are insufficient for SSH fields is a problem.
