Tutorial on systems with antifragility to downtime

This article is free to access.

Abstract

An antifragile system of software and stakeholders, including designers, developers, and operators, learns from incidents how to avoid outages and maintain high uptime. This tutorial article reviews how to design and operate such socio-technical systems with antifragility to downtime. It documents the importance of four design principles and two operational principles by exploring the polar opposite anti-principles and the interplay between the principles and the anti-principles. The design principles mandate a software design of separate and isolatable processes with sufficient diversity and redundancy. The processes should communicate asynchronously over an external network. The operational principles imply that the software development teams should repeatedly inject artificial failures into the production system to understand its behavior and detect and mitigate vulnerabilities as the system and its environment change.
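As a rough illustration of two ideas named in the abstract, redundancy among isolated processes and repeated injection of artificial failures, the following minimal Python sketch simulates a group of replicas and checks whether a request is still served after some of them are deliberately taken down. It is not taken from the article: the names `Replica`, `call_with_failover`, `inject_failure`, and `run_experiment` are hypothetical, the replicas are plain in-memory objects rather than real networked processes, and asynchronous communication is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass
class Replica:
    """One isolated process in a redundant group (hypothetical model)."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        # A replica that is down refuses every request.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def call_with_failover(replicas: list[Replica], request: str) -> str:
    """Try each replica in turn; fail only if every replica is down."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # the failure stays contained; try the next replica
    raise RuntimeError("all replicas are down")


def inject_failure(replicas: list[Replica], rng: random.Random) -> Replica:
    """Artificially take down one randomly chosen healthy replica."""
    victim = rng.choice([r for r in replicas if r.healthy])
    victim.healthy = False
    return victim


def run_experiment(n_replicas: int, n_failures: int, seed: int = 0) -> bool:
    """Return True if the service still answers after the injected failures.

    Assumes n_failures <= n_replicas.
    """
    rng = random.Random(seed)
    replicas = [Replica(f"replica-{i}") for i in range(n_replicas)]
    for _ in range(n_failures):
        inject_failure(replicas, rng)
    try:
        call_with_failover(replicas, "ping")
        return True
    except RuntimeError:
        return False
```

Under these assumptions, `run_experiment(3, 2)` reports that the service survives the loss of two out of three replicas, while `run_experiment(3, 3)` reports an outage. An experiment like this only shows what the current configuration tolerates, which is one reason the abstract calls for repeating the injections as the system and its environment change.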

Cite

Hole, K. J. (2022). Tutorial on systems with antifragility to downtime. Computing, 104(1), 73–93. https://doi.org/10.1007/s00607-020-00895-6
