Monitoring for RSE-ops vs. DevOps

Monitoring in both RSE-ops and DevOps generally means examining metrics and logs to assess the performance of systems and software. The term can also be extended to cover performance testing, provenance and reproducibility of scientific workflows, and data organization. In both spaces, the administrators of a resource typically choose how their systems are monitored. In cloud development, monitoring services might be more easily integrated with other services, and in HPC, tools more traditionally associated with DevOps, such as Kubernetes, Grafana, and Prometheus, are starting to be used [1]. However, best practices for monitoring have not been established, nor have best practices for running workflows and storing provenance.


  1. “Towards a Framework for Monitoring and Analyzing High Performance Computing Environments Using Kubernetes and Prometheus.” https://ieeexplore.ieee.org/document/9060302.