Healthcheck

The Healthcheck system monitors HPC workloads by running user-defined Python rules against job data at a configurable schedule. When a rule detects an anomaly, OKA records the issue, assigns a severity level, and optionally sends notifications.

Healthchecks are managed from the Healthcheck menu.