Nordic Agile Survey: Pandemic Agility

As posted earlier, we conducted a survey round in October–November 2020 including question items about the impacts of the pandemic and its potential relations to agility. We have now analyzed those results and published a new research paper in the XP 2021 workshop:

Impacts of COVID-19 Pandemic for Software Development in Nordic Companies – Agility Helps to Respond

The key findings showed that although the impacts have mostly been negative, it has not been all so. The pandemic has impacted different companies differently both in negative and positive ways. The majority of the responses indicated that agility has helped to respond to the situation.


We are currently preparing another survey round to be conducted by the end of 2021. In fact, perhaps unevenly, the pandemic is (still) affecting many companies, so a longitudinal study is warranted.

For more information about the Nordic Agile Survey, see here.

PROFESSOR OR ASSISTANT/ASSOCIATE PROFESSOR IN SOFTWARE ENGINEERING

The Department of Computer Science at the Faculty of Science of the University of Helsinki invites applications for a

PROFESSOR OR ASSISTANT/ASSOCIATE PROFESSOR IN SOFTWARE ENGINEERING

aiming to strengthen research areas in computer science at the university.

Position description
We are looking for a new professor or assistant/associate professor to carry out research into the field of software engineering.

The person to be chosen must have a strong scientific track record in the field of software engineering evidenced by publications in top tier forums of the field. The University of Helsinki and the Department of Computer Science offer excellent potential collaborators in the relevant areas, such as data science and other disciplines of different faculties of the university.

The appointee is expected to conduct research on an internationally high level. We expect a strong network of international and interdisciplinary cooperation and collaboration. In addition, strong record in empirical software engineering and particularly in close industrial collaboration is considered a significant advantage. The teaching area of the position is software engineering. The person chosen for the position is expected to teach and develop teaching particularly in the Master’s Programme in Computer Science and potentially also in the Bachelor Programme in Computer Science and the Bachelor’s Programme in Science. In addition, the appointee will participate in doctoral education at the Department of Computer Science, collaborate with other groups in the department, acquire research funding for their research group, and participate in the interaction between science and society at large.

More information: https://www2.helsinki.fi/en/open-positions/professor-or-assistantassociate-professor-in-software-engineering

MLOps research and thesis positions

Machine learning (ML) is becoming an increasingly common technology that is used in software-based systems. While software and software engineering practices are still a quintessential part of the design of such systems, the design space has data and ML and respective data scientist stakeholders as new concerns. Hence, MLOps — as a derivative of DevOps — is currently emerging to better integrate data and ML into the software engineering way of working, as well as integrate ML better in the software-based systems. MLOps is about both practices and tools for ML-based systems that technically enable iterative software engineering practice.

The ESE research group is strengthening its research focus in the area of MLOps. In particular, we are participating in projects that study MLOps, such as AIGA https://ai-governance.eu/ and recently started IMLE4 https://itea4.org/project/iml4e.html.

We have funded thesis positions in the area of MLOps in these research projects that can be tailored to the interest of the applicant. For details, contact Mikko Raatikainen (first.last@helsinki.fi).

Misbehaviour and fault tolerance in machine learning systems

Our interview paper on misbehaviour and fault tolerance in machine learning systems was recently published in Journal of Systems and Software. Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability, such as reliability and security, of these systems. Systems can be tested and monitored, but this does not provide protection against faults and failures in adapted ML systems themselves.

We studied software designs that aim at introducing fault tolerance in ML systems so that possible problems in ML components of the systems can be avoided. The research was conducted as a case study, and its data was collected through five semi-structured interviews with experienced software architects.

We present a conceptualisation of the misbehaviour of ML systems, the perceived role of fault tolerance, and the designs used. The problems in the systems rise from problems in inputs, concept drift, bugs and inaccuracies in the models, their faulty deployment, and not really understanding what the utilised model does. Common patterns to incorporating ML components in design in a fault tolerant fashion have started to emerge. ML models are, for example, guarded by monitoring the inputs and their distribution, and enforcing business rules on acceptable outputs. Multiple, specialised ML models are used to adapt to the variations and changes in the surrounding world, and simpler fall-over techniques like default outputs are put in place to have systems up and running in the face of problems.

However, the general role of these patterns is not widely acknowledged. This is mainly due to the relative immaturity of using ML as part of a complete software system: the field still lacks established frameworks and practices beyond training to implement, operate, and maintain the software that utilises ML. ML software engineering needs further analysis and development on all fronts.

The full paper can be read here (open access): https://doi.org/10.1016/j.jss.2021.111096

Validation Methods For AI Systems

Our systematic literature review on validation methods for AI systems was recently published in Journal of Systems and Software. Artificial intelligence (AI) – especially in the form of machine learning (ML) – has steadily, yet firmly, made its way to our lives. Suggestions on what to buy, what too see, what to listen – all works of AI. Easily available tools make these strong techniques implementable even to those with little to no knowledge or experience on the implications the intelligent components can have on the built systems. This begs the question: how to trust these systems?

The paper studies the methods to validate practical AI systems reported in the research literature. In other words, we classified and described the methods used to ensure that AI systems with potential and or actual use work as intended and designed. The review was based on 90 relevant papers, narrowed down from an initial set of more than 1100 papers. The systems presented in the papers were analysed based on their domain, task, complexity, and applied validation methods. The systems performed 18 different kinds of tasks in 14 different application domains.

The validation methods were synthesized into a taxonomy consisting of different forms of trials and simulations, model-centred approaches, and expert opinions. Trials were further divided into trials in real and mock environment, and simulations could be conducted fully virtually, or having hardware or even the entire system in the validation loop. With the provided descriptions and examples, the taxonomy should be easily used as a basis for further attempts to synthesize the validation of AI systems or even to propose general ideas on how to validate systems.

To ensure the dependability of the systems beyond the initial validation – which we gave an umbrella term “continuous validation” -, the papers implemented failure monitors, safety channels, redundancy, voting, and input and output restrictions. These were, however, described to a lesser degree in the papers.

The full paper can be read here (open access): https://doi.org/10.1016/j.jss.2021.111050

Digital world needs architecting more than ever

The slogan of our Computer Science department is “Architects of the digital world” (cs.helsinki.fi). The meaning of this slogan can be understood and approached from different perspectives. In this blog post, we aim to do our part by addressing what architecting the digital world means from the viewpoint of software, and in particular, that of software engineering research, where the word (software) architecture has been partially reserved for technical interpretation.

The digital word – and software in it – is already around us. Many of the devices, such as cars, are almost mobile computers carrying massive amounts of software onboard for controlling their behavior; our surroundings, such as homes, are packed with devices that run software, for example, to enable connectivity to our mobile devices for telling how they are doing; many if not most services are being digitalized – or have a digital component; and even many of our social interactions takes place in the digital world. The increasing digitalization just keeps expanding all that. 

So where’s the architecture and architecting in all of the above? Briefly, in the blueprint that defines the DNA of the system as a whole. In the small scale, software is code created by software developers, with the assistance of software-based tools, frameworks, cloud environments, and so on. However, when things get big, complex, integrated, interoperable, or critical, conscious architecting is needed to give the whole a reasonable shape and to ensure the whole has desired properties. Often referred to as architecturally significant, such properties may include the protection of privacy, availability of services, modifiability of software to different uses, fault tolerance, energy efficiency, and so on, to mention some. Furthermore, as the modern systems and their software evolve throughout their lifespans, only future-proof architectural designs will keep the software systems sustainable. In software engineering research and practice, these are the kinds of concerns architects and architecture design methods aim at addressing. 

To elaborate, architecting means, first of all, understanding what is expected of the digital world. Designing and maintaining complex and large, often interconnected software systems of the digital world demand robust architecting. Architecting is then about making the important design decisions that give the digital world its shape and answers “how to position the load-bearing structures”. Architecting is also about risk management of those parts of the digital world that have not yet been realised. When planning something not yet existing, an architect needs to envision the technical solutions with the acceptably low risk of feasibility for meeting the expectations over the entire life cycle.

It seems inevitable that the future digital world will start shaping itself with little overall control or planning, when more and more systems are constructed that depend on each other in their operations. In this new context, the role of software engineering research is to try to understand where things may start cracking and crumbling – and then see what architectural solutions would be needed to solidify the digital world. This naturally requires understanding the expectation the society sets for the digital world. In addition to the expected needs, the digital world sets constraints on software systems. For example, as more and more regulated software systems become connected, the need for architecture work becomes more stringent. Architecture becomes the essential communication tool that enables risk managers and regulatory affairs professionals to effectively perform their duties in the area of safety, cybersecurity or privacy.

To summarize: The expectations that are architecturally significant, i.e., fundamental in the design and thus hard to change if wrong and that enable or prohibit the expected properties of the digital world, include for example:

  • When and how should the software protect the privacy of personal data, particularly when pieces of data are combined from distinct sources?
  • How to make the public sector software systems interoperable, open and cost efficient – with some open and crowd sourced architecting?
  • How to control the risks related to systems critical to society or individual human life, taking into account their various intertwined dependencies (e.g., banking systems currently used also for general-purpose user identification purposes)?
  • How to take into account the sustainability goals of the future digital world (e.g., electricity consumption)?
  • How to design software systems in which artificial intelligence, including machine learning, can be embedded in an understandable, trustworthy, transparent, and testable manner? 

This is where the architects of the digital world are needed – to help make the architectural decisions regarding the software that runs the world. This is not only about technical issues and possibilities but also about understanding the contexts of use, decisions about policies that constrain technical solutions, incentives for collaboration for common benefit, or ensuring that the voice of minorities is also heard. It is therefore in the interest of the society that the architectural decisions of important services and systems of the digital world and their justifications are transparent and open to scrutiny. 

As a research group in software engineering and architectural topics in particular, these are issues we envision to encounter on our journey when living up to the slogan of the department: “Architects of the digital world”. 

– the ESE Team

Nordic Agile Survey 2020

We have again continued our agile survey research with Nitor started in 2018 (see Agile Now in Finland Continued):

  • In November 2020, we presented further results of the two survey rounds conducted in 2018 and 2019: ICSOB 2020.
  • In December 2020, in collaboration with Karlstad University, we finished another survey round targeted to respondents in Finland and Sweden. We received a substantial set of responses, and we are currently working on the analysis to publish the research results. The questionnaire included items about the impacts of the current situation of the pandemic and its potential relations to agility of companies.

 

Collective Executions: Coordinating Joint Device Behavior in the Fog

This is a re-post from JSS Editor’s Selection on Medium. Read the original post from here.

The contemporary computing environment has shifted from single node computing to increasingly distributed computing as the number of interconnected devices has grown. Despite of this shift in the computing paradigm, the way we think about software has not dramatically changed over the time: Software application is yet considered to be solely executed by a single entity at the time — one entity possibly coordinating the others.

The interconnected entities can together be considered as the runtime environment.

With our novel concept of Collective Executions we have taken the first steps towards new kind of computing paradigm and programming model where multiple devices dynamically, at the same time form the runtime environment where multiple devices are at the same time executing the same piece of software. The devices — as well as other entities, like humans and services — are then assigned to certain roles in the execution. Underneath a novel framework takes care of forming this runtime that enables such collective software execution by taking care of things like state synchronization, device coordination etc.

Coalescence and disintegration — Novel way of thinking the application state

In the very core of this model is the concept of coalescence and disintegration. In physics, these concepts describe the phenomena where two water drops merge into each other, or split. The analogy in our approach is that the software instances running on different entities can be merged as well into one instance. Then the state synchronization between the newly joined devices begins. Similarly, when disintegration takes place, the state synchronization ends.

The coalescence and disintegration take place when the distance between the executing entities meets certain thresholds in n-dimensional space. For example, the distance can be physical distance — like the proximity of other devices executing the same application, or it can be for instance the social distance: the family members, friends or people having some other certain social relationship may cause the application instances to coalescence and jointly start executing the same application.

Figure 1: Coalescence and Disintegration of Collective Executions

The distance can also be measured in the virtual world, where today common interests connect millions of people: People liking the same video game trailers could then jointly start a multiple player game.

More formally, we define the distance for two collective execution instances:

Where p and q are the instances of the Collective Execution E and dist is their mutual distance along the dimension of i. As the value of the distance is the maximum, it is the so-called Chebyshev distance between the instances.

An Example Application

To illustrate how the Collective Executions work in practice, let’s consider the following PhotoSharing example: A group of people has gathered together. The group consists of friends and family members, so the distance in the social world is short. This can act as a trigger that their PhotoSharing executions are enabled to coalescence.

From the user’s perspective, the idea of the PhotoSharing is to initiate discussion among the people that have gathered together. Hence while the PhotoSharing is running, it also notices that the physical distance is under the threshold value that is being used for proactively suggest viewing their most recent and previously un-shared photos related to specific topic, like recent trips to abroad or family life for instance. (Common interests, distance in the virtual world). The Collective Execution then forms the state by synchronizing the unseen photos and the currently shown photo among the participating devices. One device at the time acts as a coordinator and coordinates the other to show certain photo. Other devices act as screens, showing the same selected photo at the same time. This has been depicted in Figure 2.

Figure 2: PhotoSharing Collective Execution coordinating entities.

Future Work

It’s obvious, that there are numerous privacy and security concerns in such proactive approach. Hence plenty of research would be required for instance to make sure that no personal and intimate data will leak into wrong hands. Also, user experience perspective on the security should not be overlooked to make sure that the users a) understand how and by whom their data is being used, and b) not to burden the user with constant interruptions for regarding the data usage or proactive invocation of the activities.

Acknowledgements

This blog post is based on our articles in JSS and IEEE Software that we from the department of Computer Science, University of Helsinki, Finland have co-authored with our colleagues at the departments of Electronics and Communications Engineering and Pervasive Computing, Tampere University, Finland.

References

Mäkitalo, N., Aaltonen, T., Raatikainen, M., Ometov, A., Andreev, S., Koucheryavy, Y., & Mikkonen, T. (2019). Action-Oriented Programming Model: Collective Executions and Interactions in the Fog. Journal of Systems and Software, 157, 110391. DOI: https://doi.org/10.1016/j.jss.2019.110391

Mäkitalo, N., Ometov, A., Kannisto, J., Andreev, S., Koucheryavy, Y., & Mikkonen, T. (2017). Safe, secure executions at the network edge: coordinating cloud, edge, and fog computing. IEEE Software, 35(1), 30–37. DOI: http://doi.org/10.1109/MS.2017.4541037

 

Notes on blockchains

Originally published in the 4APIs project blog on the 24th of August 2020, https://4apis.fi/blog/

The best-known applications of blockchain technology are cryptocurrencies, but there is considerable interest in applying blockchains as a data storage method in various different fields. Blockchains can be used to record transactions in a reliable, secure and immutable manner. Transactions are saved to linked blocks that form a digital, encrypted ledger. Each party or node in a blockchain maintains a copy of this immutable ledger. Consensus among the parties is achieved by using, for example, proof-of-work.

Blockchains can be permissionless or permissioned. Permissionless, public blockchain systems allow anyone to join the blockchain, whereas permissioned, private blockchain systems use membership control to allow only identified parties to join. In public blockchains, anyone can read or write data, but while reading is free, writing to a blockchain requires paying a fee in cryptocurrency. The fee will be paid to a miner who first completes the proof-of-work to secure a new block containing the transaction data. In private blockchains, the owner of the blockchain can decide on the transaction fees.

Blockchains can be used to eliminate the need for trust among the parties sending transactions to each other. All transactions are visible in the distributed ledger and tampering the transaction history would require the malicious party taking control of the majority (51%) of the blockchain network’s mining hash rate.

Ethereum [1] is the most popular permissionless blockchain that allows writing of smart contracts on the blockchain. Smart contracts consist of contracts or business logic that are installed on the blockchain system. Parties in the blockchain can execute smart contracts to create different transactions that are then validated by other parties and saved to the blockchain.

There are several different platforms for building permissioned, private blockchains. Some of the most widely used are Hyperledger [2] and R3’s Corda [3]. Private blockchains are meant to allow saving sensitive data to a blockchain so that only selected parties are able to view it. However, it is possible to save encrypted, private data also to a public blockchain.

Since public blockchains use computationally more expensive consensus protocols and have more nodes, private blockchains can potentially offer better scalability and faster transactions. However, private blockchains are not truly decentralized and, for example, Hyperledger Fabric was found at least in 2019 to have issues with network delays causing desynchronization in the blockchain [4].

There has been a lot of interest in the possibilities of blockchain technology, and hopes of revolutions in many different areas such as finance and Internet of Things (IoT). Blockchains can provide secure ways to manage confidential data and identity information, and thus provide potential use cases also in health care.

However, so far there are only a few fully operational blockchains besides systems related to cryptocurrencies and Bitcoin [5] remains the most successful real-world application of blockchain technology.

Ethereum was the first blockchain to support the implementation of smart contracts, which enable building decentralized applications (dapps) on Ethereum blockchain. There are various potential use cases for dapps and plenty of tutorials on dapp development available online. Despite this, most dapps have practically no users or transactions. On 9th of June 2020 on DappRadar [6], the list of Ethereum dapps showed over 1880 dapps, but only 330 had had at least one user during the previous seven days. All Top Ten dapps (based on user count during the previous seven days), save one, were related to money exchange, high risk investments and decentralized finance. There are some gaming applications on Ethereum (e.g. CryptoKitties, My Crypto Heroes), but most dapps appear to be related to finances and gambling.

In Deloitte’s Global Blockchain survey 2019, 53% of organizations saw blockchains as critical and being in top five of their priorities [7]. In the same survey one of the top five “organizational barriers to greater investment in blockchain technology” was the lack of in-house capabilities. As the need for blockchain professionals is likely to grow in the future, in Finland a project has been launched to provide education on blockchain technology in universities [8].

Supply chains are one area where the use of blockchain technology can potentially streamline the process and reduce paperwork besides creating a transparent, immutable record of the product history. However, Gartner’s report (2019) estimates that “contrary to initial market hype and for the time being, blockchain is not enabling a major digital business revolution, and may not enable one until at least 2028” [9]. This is due to several factors that currently make it challenging for organisations to adopt blockchain technology.

In a blockchain system, there is overhead from replicating the data. For example, in Ethereum, those users that host the full node need approximately 180 GB disk space [10]. However, not all users need to download the full node, as there are also light nodes that store only the transaction headers and are able to request other information from full nodes. Light nodes can be used, for example, in mobile phones or embedded devices.

For instance, Peker et al. (2020) studied the cost of saving IoT sensor data to Ethereum blockchain. In their experiment the cost of storing 6000 data points (256-bit integers) was approximately 335 – 467 US dollars depending on the method used [11]. According to other informal estimates, the cost of storing 1kB of data to Ethereum blockchain would have been approximately 1.6 US dollars in 2018, and the storage of 1GB was over 1.6 million US dollars. According to Kumar et al. (2020) “the cost of storage on a public blockchain platform can be staggering, a few thousand times higher than on a distributed database system or in the cloud. On a permissioned blockchain system, the cost is likely to be less but still one or two orders of magnitude higher.” [12]

Due to mining being computationally expensive, public blockchain systems consume more energy than regular distributed databases. Bitcoin mining is notoriously energy consuming, and sustainability issues are one area where more research is needed. The popularity of Bitcoin and other cryptocurrencies has also led to different scams related to cryptocurrencies, and for example, infected websites harnessing visitors’ computers to mine Bitcoin.

In their article Kumar et al. (2020) suggest that “blockchain technology should be deployed selectively, mainly for interorganizational transactions among untrusted parties, and in applications that need high levels of provenance and visibility.” For example, tracking the origin and shipping of precious gemstones or other expensive or critical commodities is one area where blockchain systems have been trialled. Regarding supply chains, major challenges for using blockchains include creating legislation and standards all parties can agree on, and getting everyone to use blockchain technology despite additional costs. Joining a consortium is usually necessary to properly utilize blockchains.

To conclude, as blockchains are today a high-cost and high-overhead storage method, careful consideration is needed to determine the proper use cases. Also, a decision should be made on what data to store to the blockchain, as it might be feasible to store only the most critical parts of the whole data to save resources. Blockchain technology is being piloted in various different fields, and in the future, blockchains are likely to be utilized in a much wider scale.

References and recommended reading

[1] https://ethereum.org/en/

[2] https://www.hyperledger.org/

[3] https://www.r3.com/corda-platform/

[4] Nguyen, T.S.L., Jourjon, G., Potop-Butucaru, M. & Thai, K.L. 2019, “Impact of network delays on Hyperledger Fabric”, INFOCOM 2019 – IEEE Conference on Computer Communications Workshops, INFOCOM WORKSHOPS 2019, pp. 222-227.

[5] https://bitcoin.org/en/

[6] https://dappradar.com/rankings/protocol/eth

[7] Deloitte’s Global Blockchain survey 2019

[8] https://www.eura2014.fi/rrtiepa/projekti.php?projektikoodi=S22027

[9] Gartner 2019: Blockchain Unraveled: Determining Its Suitability for Your Organization https://www.gartner.com/en/doc/3913807-blockchain-unraveled-determining-its-suitability-for-your-organization

[10] https://medium.com/@marcandrdumas/are-ethereum-full-nodes-really-full-an-experiment-b77acd086ca7

[11] Peker, Y.K., Rodriguez, X., Ericsson, J., Lee, S.J. & Perez, A.J. 2020, “A cost analysis of internet of things sensor data storage on blockchain via smart contracts”, Electronics (Switzerland), vol. 9, no. 2.

[12] Kumar, A., Liu, R., Shan, Z. 2020, “Is Blockchain a Silver Bullet for Supply Chain Management? Technical Challenges and Research Opportunities”, Decision Sciences 51 (1), pp. 8-37.

 

ICWE’20 Successfully Completed

ICWE’20 live from Helsinki, Finland, June 9-12, 2020

June 9-12, 2020 was exciting times for Aalto University, Metropolia University of Applied Sciences, Tampere University, and University of Helsinki. We jointly hosted the 20th International Conference on Web Engineering (ICWE’20, https://icwe2020.webengineering.org/). The conference is one of the key events for the Web Engineering community, and this year it included one day of workshops, tutorials, and PhD symposium, followed by three days of the main event.

When we for the first time promoted the event in Daejeon, South Korea in June 2019, the world was a very different place. We were worried about things such as venue (downtown, easily accessible, with nice places to visit nearby), getting to Finland (Finnair has a great network for entering Europe from various places) and social program (Helsinki has great restaurants and an option to go to a sauna by the seaside, in downtown). However, during the spring things took a different turn, and we basically had to abandon all the plans virtually overnight, leaving us two options, either to postpone the event or to go completely online.

 

We decided to go for the latter — what would have been more natural for a community of web engineering researchers and practitioners? This meant that the website became the portal of just about everything related to the conference, including links to live sessions, video presentations, slides, Slack channels, and so on. Despite our worries while organising things, it turned out that indeed it is feasible to organize an international conference completely online. Here’s a couple of ideas that we adopted that worked well.

  • Staying focused online is harder than staying focused when someone is physically present and presenting. To allow the participants to familiarize with the topics, we collected video presentations of the accepted papers and put them online well before the event.
  • To further focus on the essentials when online, we changed the format of the sessions. Usually, ICWE has had 1.5 hour sessions, with three 30 min presentations, consisting of 20-25 min presentation and 5-10 min questions. We shortened the sessions to 60 min, and the presentations to 20 min. Since full videos of presentations were available for viewing beforehand, the format of the presentation was 5 min for recap by the authors(s), and 15 min of interaction, questions, comments, and so on. This worked surprisingly well, and, given that communication is slower online than in in-premise situations, provided a nice opportunity to interact with the presenters.
  • Keynotes were streamed live, to maintain excitement throughout the presentation. The three keynotes, presented by David Bryant (Fellow in Emerging Technologies, Mozilla), Prof. Dr. Olaf Zimmermann (University of Applied Sciences of Eastern Switzerland) and Jaakko Lempinen (Head of AI, YLE) all delivered number of experiences as well as inspired a lively discussion afterwards.
  • Every session in the conference had a Slack channel, where the topics related to that session could be discussed. Many session chairs used the channel to discuss practicalities with the presenters, and the audience used that to raise questions. This definitely would work on a physical conference, too.

While the event did not require any physical traveling, this is not to say that it was an easy four-day activity. Instead, the event was intense, and it was clearly possible to see the intensity of people, at least when they had their camera on. Having the event completely online also meant taking into account details such as the time zones while composing the program, which added a layer of complexity in the planning of the event.

Goodbye all you ICWE’20 delegates, and thanks to those of you who helped us to organize the event. It was great to have you all in Finland, even if only remotely.

Kari^2, Markku, Niko and Tommi