Supporting Types

Enumerations

These enumerations are used as parameter or return types in SDK methods.

ComputeLoadResolution

class applications.ait.enums.ComputeLoadResolution(*values)

Bases: StrEnum

Supported time resolutions for compute_load().

A restricted subset of Resolution: compute_load() accepts only the values present in RESOLUTION_DIC (lib/common/oka_constants/constants.py). Once compute_load() is refactored to support all Resolution values, this enum should be removed and its usages replaced with Resolution.

Member    Value
SECOND    1second
MINUTE    1minute
HOUR      1hour
DAY       1day
MONTH     1month

DataType

class applications.ait.enums.DataType(*values)

Bases: StrEnum

Resource dimension used across OKA data queries.

Typed replacement for the "core" / "GPU" string literals passed to services and providers to select the hardware resource being measured. Used in state, congestion, load, and other apps.

When GPU is selected, metric categories that don’t have a dedicated GPU field (cost, energy, …) fall back to GPU-hours.

Example:

>>> provider.get_jobs_status(data_type=DataType.GPU)

Member    Value
CORE      core
GPU       GPU

GpuAccountingField

class applications.resources.dto.gpu.GpuAccountingField(*values)

Bases: StrEnum

Subset of AccountingField relevant to GPU analysis.

Only two fields are valid for GPU distribution queries: the number of GPUs allocated to the job (ALLOC_GPUS) and the number originally requested (REQ_GPUS).

Member        Value
ALLOC_GPUS    Allocated_GPU
REQ_GPUS      Requested_GPU

GroupingField

class applications.state.dto.state.GroupingField(*values)

Bases: StrEnum

Standard ES fields used to group jobs in grouped query variants.

Members cover the most common grouping dimensions. Pass a raw string for custom or cluster-specific fields not listed here (e.g. "Application", "WCKey").

Example:

>>> provider.get_jobs_status_grouped(grouping_field=GroupingField.ACCOUNT)
>>> provider.get_jobs_status_grouped(grouping_field="Application")  # custom

Member       Value
ACCOUNT      Account
UID          UID
GID          GID
PARTITION    Partition
USERNAME     User

JobState

class applications.ait.enums.JobState(*values)

Bases: StrEnum

HPC job state as reported by the scheduler.

Inherits from str so that members compare equal to plain strings:

>>> JobState.CANCELLED == "CANCELLED"  # True
>>> {"CANCELLED": 42}[JobState.CANCELLED]  # 42

Drop-in replacement for the job state string constants in oka_constants.constants (CANCELLED, FAILED, …). Existing code that compares against raw strings keeps working; new code gains autocompletion, exhaustiveness checks, and Enum iteration.

Member           Value
CANCELLED        CANCELLED
COMPLETED        COMPLETED
FAILED           FAILED
NODE_FAIL        NODE_FAIL
PREEMPTED        PREEMPTED
TIMEOUT          TIMEOUT
BOOT_FAIL        BOOT_FAIL
REQUEUED         REQUEUED
RUNNING          RUNNING
RESIZING         RESIZING
SUSPENDED        SUSPENDED
PENDING          PENDING
CONFIGURING      CONFIGURING
COMPLETING       COMPLETING
OUT_OF_MEMORY    OUT_OF_MEMORY
REVOKED          REVOKED

LoadDatetimeField

class applications.load.dto.enums.LoadDatetimeField(*values)

Bases: StrEnum

Subset of DatetimeCol values used across the load application.

Restricts the full DatetimeCol set (SUBMIT, ELIGIBLE, START, END) to the two fields the load app actually defaults to and checks against.

Example:

>>> provider.get_core_load(datetime_col=LoadDatetimeField.SUBMIT)
>>> LoadDatetimeField.ELIGIBLE in mapping_keys

Member      Value
SUBMIT      Submit
ELIGIBLE    Eligible

MemoryAccountingField

class applications.resources.dto.memory.MemoryAccountingField(*values)

Bases: StrEnum

Subset of memory fields valid for memory distribution queries.

Only two fields are relevant: peak memory actually used by the job (MAX_RSS) and memory originally requested (REQ_MEM).

Member     Value
MAX_RSS    MaxRSS
REQ_MEM    Requested_Memory

MetricCategory

class applications.ait.enums.MetricCategory(*values)

Bases: StrEnum

Metric category used across OKA data queries.

Typed replacement for the raw string constants in oka_constants.constants (STATE, COREHOURS, GPUHOURS, COST, ENERGY, CARBON_FOOTPRINT). String values match the Elasticsearch field names, so members are drop-in replacements — no conversion needed:

>>> MetricCategory.JOBS == "State"            # True
>>> MetricCategory.CORE_HOURS == "Core_hours" # True

Used by providers, services, and views across multiple apps (state, load, consumers, kpi, …). Use this instead of raw strings or oka_constants lookups in user scripts and provider calls.

Example:

>>> provider.get_jobs_status(category=MetricCategory.CORE_HOURS)
>>> provider.get_jobs_status(category=MetricCategory.COST)

Member              Value
JOBS                State
CORE_HOURS          Core_hours
GPU_HOURS           GPU_hours
COST                Cost
ENERGY              Energy
CARBON_FOOTPRINT    CO2

Resolution

class applications.ait.enums.Resolution(*values)

Bases: ResolutionMixin, StrEnum

Time bucket size for data aggregation queries.

Each member carries metadata needed by different layers of the stack:

  • es_interval: Elasticsearch calendar_interval value (e.g. "1d").

  • pandas_freq: Pandas frequency alias for resampling (e.g. "D").

  • millis: Duration in milliseconds (used for JS chart intervals).

  • duration_hours: Duration in hours (used for resource-hour normalization).

  • date_format: strftime format for display.

These properties replace the legacy dicts RESOLUTION_DIC (oka_constants), DURATION_AS_HOURS, TIME_STEPS, and TIME_FORMATTING (ait/constants).

Example:

>>> Resolution.DAY.es_interval    # "1d"
>>> Resolution.DAY.duration_hours # 24.0
>>> Resolution("1hour").pandas_freq  # "h"

Member     Value
SECOND     1second
MINUTE     1minute
TEN_MIN    10min
HOUR       1hour
DAY        1day
WEEK       1week
MONTH      1month
YEAR       1year

ResultStatus

class applications.ait.enums.ResultStatus(*values)

Bases: StrEnum

Outcome status for service/provider query results.

Used on DTOs to indicate whether the query returned data, and if not, why. This replaces the legacy pattern of returning {"error": message} dicts or raising exceptions for empty results.

SDK users can check the status before iterating over results:

>>> if result.status == ResultStatus.OK:
...     for entry in result.entries:
...         print(entry.state, entry.count)
... else:
...     print(f"No data: {result.status.value}")

Members carry the same string values as the legacy constants DB_ERROR_MESSAGE, FILTER_ERROR_MESSAGE, and RESOLUTION_MESSAGE from oka_constants to ease migration.

Member           Value
OK               ok
NO_DATA          No data
NO_RESULTS       No results found
TOO_MANY_DATA    Too many data - Please tune your date filter to a smaller period or try another resolution

SubmissionDatetimeCol

class applications.throughput.dto.common.SubmissionDatetimeCol(*values)

Bases: StrEnum

Pre-start timestamp columns only (Submit and Eligible).

Restricted subset for metrics where the reference date must precede job execution — i.e. wait time and slowdown. Using Start or End as a “from” date would produce nonsensical results for those metrics.

Enum values are the PascalCase Elasticsearch field names, matching AccountingField directly. Use .session_key when a lowercase form is needed for URL path segments.

Example:

>>> col = SubmissionDatetimeCol.ELIGIBLE
>>> str(col)         # → "Eligible"  (ES field name, ready to use)
>>> col.session_key  # → "eligible"  (for URL/session storage)

Member      Value
SUBMIT      Submit
ELIGIBLE    Eligible

ThroughputDatetimeCol

class applications.throughput.dto.common.ThroughputDatetimeCol(*values)

Bases: StrEnum

Any datetime column available in throughput queries.

Full set of job-lifecycle timestamps: submission, eligibility, start, and end. Use this for metrics that can be bucketed on any event (e.g. job frequency, interarrival time).

Enum values are the PascalCase Elasticsearch field names, matching AccountingField directly. Use .session_key when a lowercase form is needed for URL path segments.

Example:

>>> col = ThroughputDatetimeCol.START
>>> str(col)      # → "Start"  (ES field name, ready to use)
>>> col.session_key  # → "start"  (for URL/session storage)

Member      Value
SUBMIT      Submit
ELIGIBLE    Eligible
START       Start
END         End

Data Models

These Pydantic models appear as nested types within DTOs or as return values from provider methods.

CategoryThreshold

class applications.throughput.dto.exec_time.CategoryThreshold(*, name: str, min_percent: float | None = None, max_percent: float | None = None, color: str = '#cccccc', tooltip: str = '')

Bases: BaseModel

A single ratio threshold category for the exectime/timelimit sunburst.

name

Display name for the category (e.g. "Optimal").

Type:

str

min_percent

Lower bound of the ratio range (inclusive), or None for an open lower bound.

Type:

float | None

max_percent

Upper bound of the ratio range (exclusive), or None for an open upper bound.

Type:

float | None

color

Hex color string for frontend rendering (e.g. "#4caf50").

Type:

str

tooltip

Tooltip text shown on hover in the frontend.

Type:

str

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

CoreBinStats

class applications.resources.dto.cores.CoreBinStats(*, bin_label: str, job_count: Annotated[int, Ge(ge=0)], core_hours_sum: Annotated[float, Ge(ge=0)], core_hours_mean: Annotated[float, Ge(ge=0)])

Bases: BaseModel

Statistics for a single core allocation bin.

bin_label

Human-readable bin range, e.g. "[1, 4[".

Type:

str

job_count

Number of jobs that allocated cores in this bin.

Type:

int

core_hours_sum

Total core-hours consumed by those jobs.

Type:

float

core_hours_mean

Mean core-hours per job in this bin.

Type:

float

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

CoresGroupStats

class applications.resources.dto.cores.CoresGroupStats(*, group_name: str | int, grouping_type: str, bin_labels: list[str], job_counts: list[Annotated[int | float, Ge(ge=0)]], core_hours_sum: list[Annotated[float, Ge(ge=0.0)]], core_hours_mean: list[Annotated[float, Ge(ge=0.0)]])

Bases: BaseModel

Core distribution data for a single group.

group_name

Value of the grouping field for this group (e.g. a username).

Type:

str | int

grouping_type

ES field used for grouping (e.g. "uid", "account").

Type:

str

bin_labels

Ordered bin range labels shared across all metrics.

Type:

list[str]

job_counts

Number of jobs per bin.

Type:

list[int | float]

core_hours_sum

Total core-hours per bin.

Type:

list[float]

core_hours_mean

Mean core-hours per job per bin.

Type:

list[float]

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

CoresMemoryBinStats

class applications.resources.dto.cores_memory.CoresMemoryBinStats(*, memory_bin_label: str, job_counts: list[int | float], core_hours: list[float])

Bases: BaseModel

Per-memory-bin counts and core-hours across all core-allocation bins.

memory_bin_label

Human-readable memory bin range, e.g. "[1GB, 4GB[".

Type:

str

job_counts

Number of jobs per core-allocation bin for this memory bin.

Type:

list[int | float]

core_hours

Total core-hours per core-allocation bin for this memory bin.

Type:

list[float]

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

CoresMemoryGroupStats

class applications.resources.dto.cores_memory.CoresMemoryGroupStats(*, group_name: str | int, grouping_type: str, core_bin_labels: list[str], bins: list[CoresMemoryBinStats])

Bases: BaseModel

Cores-vs-memory matrix data for a single group.

group_name

Value of the grouping field for this group (e.g. a username).

Type:

str | int

grouping_type

ES field used for grouping (e.g. "uid", "account").

Type:

str

core_bin_labels

Ordered core-allocation bin labels for this group.

Type:

list[str]

bins

One entry per memory bin, carrying job counts and core-hours.

Type:

list[applications.resources.dto.cores_memory.CoresMemoryBinStats]

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

ExecTimeFilterConfig

class applications.throughput.dto.exec_time.ExecTimeFilterConfig(*, include_timeout: bool = True, include_other_end_states: bool = False)

Bases: BaseModel

Job-state filter settings for the exectime/timelimit analysis.

Controls which terminal states are included in the ratio calculation. Non-terminal states (RUNNING, PENDING, etc.) are always excluded.

include_timeout

When True, TIMEOUT jobs are included.

Type:

bool

include_other_end_states

When True, CANCELLED, FAILED, NODE_FAIL, PREEMPTED, BOOT_FAIL, OUT_OF_MEMORY, and REVOKED jobs are included.

Type:

bool

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
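
The effect of the two flags can be sketched as a function from a config to a set of state names. ExecTimeFilterConfig below is a frozen dataclass stand-in for the Pydantic model and included_states is a hypothetical helper. The sketch assumes that COMPLETED jobs are always included, which the descriptions above imply but do not state.

```python
from dataclasses import dataclass

# State names as listed in the include_other_end_states description above.
OTHER_END_STATES = frozenset(
    {
        "CANCELLED",
        "FAILED",
        "NODE_FAIL",
        "PREEMPTED",
        "BOOT_FAIL",
        "OUT_OF_MEMORY",
        "REVOKED",
    }
)


@dataclass(frozen=True)
class ExecTimeFilterConfig:
    """Dataclass stand-in for the Pydantic model, same fields and defaults."""
    include_timeout: bool = True
    include_other_end_states: bool = False


def included_states(config: ExecTimeFilterConfig) -> frozenset[str]:
    """Hypothetical helper: terminal states kept for the ratio calculation."""
    # Assumption: COMPLETED jobs are always part of the analysis.
    states = {"COMPLETED"}
    if config.include_timeout:
        states.add("TIMEOUT")
    if config.include_other_end_states:
        states |= OTHER_END_STATES
    # Non-terminal states (RUNNING, PENDING, ...) are never added.
    return frozenset(states)
```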

ExtendedStats

class applications.resources.dto.stats.ExtendedStats(*, min: float | None = None, max: float | None = None, mean: float | None = None, count: Annotated[int, Ge(ge=0)] = 0, std: float | None = None)

Bases: BaseModel

Descriptive statistics returned by an ES extended_stats aggregation.

min

Minimum observed value, or None when count is zero.

Type:

float | None

max

Maximum observed value, or None when count is zero.

Type:

float | None

mean

Mean value, or None when count is zero.

Type:

float | None

count

Number of documents included in the aggregation.

Type:

int

std

Standard deviation, or None when count is zero.

Type:

float | None

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

GpuBinStats

class applications.resources.dto.gpu.GpuBinStats(*, bin_label: str, job_count: Annotated[int, Ge(ge=0)], gpu_hours_sum: Annotated[float | None, Ge(ge=0.0)] = None, gpu_hours_mean: Annotated[float | None, Ge(ge=0.0)] = None)

Bases: BaseModel

Statistics for a single GPU allocation bin.

bin_label

Human-readable bin range, e.g. "[1, 4[".

Type:

str

job_count

Number of jobs that used GPUs in this bin.

Type:

int

gpu_hours_sum

Total GPU-hours consumed by those jobs. None when computing requested (not allocated) GPUs.

Type:

float | None

gpu_hours_mean

Mean GPU-hours per job in this bin. None when computing requested (not allocated) GPUs.

Type:

float | None

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

GpuGroupStats

class applications.resources.dto.gpu.GpuGroupStats(*, group_name: str | int, grouping_type: str, bin_labels: list[str], job_counts: list[Annotated[int | float, Ge(ge=0)]], gpu_hours_sum: list[Annotated[float, Ge(ge=0.0)]] | None = None, gpu_hours_mean: list[Annotated[float, Ge(ge=0.0)]] | None = None)

Bases: BaseModel

GPU distribution data for a single group.

group_name

Value of the grouping field for this group (e.g. a username).

Type:

str | int

grouping_type

ES field used for grouping (e.g. "uid", "account").

Type:

str

bin_labels

Ordered bin range labels shared across all metrics.

Type:

list[str]

job_counts

Number of jobs per bin.

Type:

list[int | float]

gpu_hours_sum

Total GPU-hours per bin. None when computing requested (not allocated) GPUs.

Type:

list[float] | None

gpu_hours_mean

Mean GPU-hours per job per bin. None when computing requested (not allocated) GPUs.

Type:

list[float] | None

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

LoadStats

class applications.load.dto.load.LoadStats(*, mean: float | None = None, std: float | None = None, min: float | None = None, p10: float | None = None, p20: float | None = None, p30: float | None = None, p40: float | None = None, median: float | None = None, p60: float | None = None, p70: float | None = None, p80: float | None = None, p90: float | None = None, max: float | None = None)

Bases: BaseModel

Descriptive statistics for a single load timeseries (one RUNNING or WAITING series in a single bucket resolution).

Matches the output of oka.lib.common.utility.get_stats — pandas Series.describe(percentiles=[.1, .2, .3, .4, .5, .6, .7, .8, .9]) with 50% renamed to median. Values rounded to one decimal; None where pandas returned NaN.

mean

Arithmetic mean.

Type:

float | None

std

Standard deviation.

Type:

float | None

min

Minimum value.

Type:

float | None

p10

10th percentile.

Type:

float | None

p20

20th percentile.

Type:

float | None

p30

30th percentile.

Type:

float | None

p40

40th percentile.

Type:

float | None

median

Median (50th percentile).

Type:

float | None

p60

60th percentile.

Type:

float | None

p70

70th percentile.

Type:

float | None

p80

80th percentile.

Type:

float | None

p90

90th percentile.

Type:

float | None

max

Maximum value.

Type:

float | None

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

MemoryBinStats

class applications.resources.dto.memory.MemoryBinStats(*, bin_label: str, job_count: Annotated[int, Ge(ge=0)])

Bases: BaseModel

Statistics for a single memory bin.

bin_label

Human-readable bin range, e.g. "[1 GB, 4 GB[" or "[64 GB".

Type:

str

job_count

Number of jobs whose memory fell in this bin.

Type:

int

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

MemoryGroupStats

class applications.resources.dto.memory.MemoryGroupStats(*, group_name: str | int, grouping_type: str, bin_labels: list[str], job_counts: list[int | float])

Bases: BaseModel

Memory distribution data for a single group.

group_name

Value of the grouping field for this group (e.g. a username).

Type:

str | int

grouping_type

ES field used for grouping (e.g. "uid", "account").

Type:

str

bin_labels

Ordered GB bin range labels shared across all metrics.

Type:

list[str]

job_counts

Number of jobs per bin.

Type:

list[int | float]

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

MemoryRatioBinStats

class applications.resources.dto.consumed_vs_requested_memory.MemoryRatioBinStats(*, bin_label: str, job_count: Annotated[int, Ge(ge=0)])

Bases: BaseModel

Statistics for a single memory ratio percentage bin.

bin_label

Human-readable bin range, e.g. "[10%, 20%[" or "[100%".

Type:

str

job_count

Number of jobs whose consumed/requested ratio fell in this bin.

Type:

int

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

MemoryRatioGroupStats

class applications.resources.dto.consumed_vs_requested_memory.MemoryRatioGroupStats(*, group_name: str | int, grouping_type: str, bin_labels: list[str], job_counts: list[int | float])

Bases: BaseModel

Memory ratio distribution data for a single group.

group_name

Value of the grouping field for this group (e.g. a username).

Type:

str | int

grouping_type

ES field used for grouping (e.g. "uid", "account").

Type:

str

bin_labels

Ordered percentage bin range labels shared across all metrics.

Type:

list[str]

job_counts

Number of jobs per bin.

Type:

list[int | float]

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

NodesBinStats

class applications.resources.dto.nodes.NodesBinStats(*, bin_label: str, job_count: Annotated[int, Ge(ge=0)])

Bases: BaseModel

Statistics for a single node allocation bin.

bin_label

Human-readable bin range, e.g. "[1, 4[" or "[16".

Type:

str

job_count

Number of jobs whose node count fell in this bin.

Type:

int

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
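
The bin label notation, with half-open ranges such as "[1, 4[" and an open-ended last bin such as "[16", can be reproduced from a list of lower edges. The sketch below is an illustration only: bin_labels, count_per_bin, and the edge values are hypothetical, and how OKA actually chooses its bin edges is not described here.

```python
from bisect import bisect_right


def bin_labels(edges: list[int]) -> list[str]:
    """Label half-open bins [low, high[ plus an open-ended last bin."""
    labels = [f"[{low}, {high}[" for low, high in zip(edges, edges[1:])]
    labels.append(f"[{edges[-1]}")
    return labels


def count_per_bin(node_counts: list[int], edges: list[int]) -> list[int]:
    """Count jobs per bin; edges must be sorted in ascending order."""
    counts = [0] * len(edges)
    for nodes in node_counts:
        # Index of the last edge that is <= nodes.
        index = bisect_right(edges, nodes) - 1
        if index >= 0:  # values below the first edge are ignored
            counts[index] += 1
    return counts


# Invented edges and per-job node counts.
EDGES = [1, 4, 16]
labels = bin_labels(EDGES)
counts = count_per_bin([1, 2, 4, 15, 16, 64], EDGES)
```

Because the lower bound is inclusive and the upper bound exclusive, a job with exactly 4 nodes lands in "[4, 16[" rather than "[1, 4[".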

NodesGroupStats

class applications.resources.dto.nodes.NodesGroupStats(*, group_name: str | int, grouping_type: str, bin_labels: list[str], job_counts: list[int | float])

Bases: BaseModel

Node allocation distribution data for a single group.

group_name

Value of the grouping field for this group (e.g. a username).

Type:

str | int

grouping_type

ES field used for grouping (e.g. "uid", "account").

Type:

str

bin_labels

Ordered bin range labels shared across all metrics.

Type:

list[str]

job_counts

Number of jobs per bin.

Type:

list[int | float]

model_config: ClassVar[ConfigDict] = {'frozen': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

Domain Models

These Django models represent core OKA domain objects passed to or returned from SDK classes.

Workload

class core_applications.workload.models.workload.Workload(*args, **kwargs)

Bases: Model

Define a specific scope for healthcheck evaluations.

A workload represents a mutable configuration that defines:

  • Which clusters to monitor

  • What filters to apply (using QueryBuilder JSON format)

  • User ownership

name

Unique name of the workload (e.g. "ai_team_production").

description

Detailed description of what this workload monitors.

clusters

ManyToMany relationship to clusters to monitor.

filters

JSON object with QueryBuilder-style filters for data providers.

created_by

User who created this workload.

created_at

Timestamp when this workload was created.

exception DoesNotExist

Bases: ObjectDoesNotExist

exception MultipleObjectsReturned

Bases: MultipleObjectsReturned

exception NotUpdated

Bases: ObjectNotUpdated, DatabaseError

property cluster_names: list[str]

Get list of cluster names for this workload.

Returns:

List of cluster name strings.

property cluster_uids: list[str]

Get list of cluster UIDs for this workload.

Returns:

List of cluster UID strings.