Filters

OKA offers advanced filtering. The same filters can be used in the OKA interface to select the data to display, or in the OKA backend to select the input data used by the pipelines.

The filtering options available depend on the plugin selected in the OKA UI (see Filters). Likewise, the filtering capabilities depend on the pipeline:

  • For the log_js_fetch pipeline: Only date filtering is available, to request the jobs present between two dates.

  • For OKA Predict (Predictor) and MeteoCluster pipelines: Full filtering capabilities on dates and advanced features.

Creation

The easiest way to create a filter is through OKA UI by saving the filters into a profile (see Save filters as profile). You can also create a filter profile manually using the admin panel (see Administrator panels). Be aware that the filters must follow a specific JSON format in order to be understood by OKA:

  • start_date: Filter from this date (included).

  • end_date: Filter to this date (included).

  • date_col: Filter on this date column:

    • Submit: Jobs submission date

    • Start: Jobs start date

    • Eligible: Date when the jobs are eligible

    • End: Jobs end date

    • date: Date of measurement (load values, nodes statistics, energy measurement…)

  • multiple_filters: Filter on features (jobs features…). For example:

    "multiple_filters": {"rules": [{"id": "Account", "type": "string", "field": "Account", "input": "text", "value": "default", "operator": "equal"},
    {"id": "Allocated_CPUS", "type": "double", "field": "Allocated_CPUS", "input": "text", "value": "1", "operator": "greater"}], "condition": "AND"}
    
  • time_delta: Filter on a period of x days. If you provide start_date and time_delta, it filters from start_date to start_date + time_delta. If you provide end_date and time_delta, it filters from end_date - time_delta to end_date. If you provide time_delta only, it filters from now - time_delta to now. time_delta=30 thus has the same meaning as the RangeKey Last 30 days.

image0

When used with a pipeline that loads job scheduler logs, the filter in the example above gathers only the jobs present on the cluster from 2022-01-11 00:00:00 until 2022-02-20 00:00:00.
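
The time_delta rules listed above can be sketched in Python as follows. This is a minimal illustration, not OKA's implementation: the function name resolve_window and its now parameter are hypothetical. The behaviour when both start_date and end_date are given (time_delta is ignored) is an assumption, since the documentation does not cover that case.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def resolve_window(
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
    time_delta: Optional[int] = None,
    now: Optional[datetime] = None,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Return the (start, end) window implied by the three date keys.

    Hypothetical helper: time_delta is interpreted as a number of days.
    """
    # No time_delta, or both bounds given: use the dates as they are.
    # (Ignoring time_delta when both bounds exist is an assumption.)
    if time_delta is None or (start_date is not None and end_date is not None):
        return start_date, end_date

    delta = timedelta(days=time_delta)
    if start_date is not None:
        # start_date + time_delta
        return start_date, start_date + delta
    if end_date is not None:
        # end_date - time_delta
        return end_date - delta, end_date

    # time_delta only: window ending now (same idea as "Last 30 days").
    reference = now if now is not None else datetime.now()
    return reference - delta, reference


# Usage: 40 days ending on 2022-02-20.
window = resolve_window(end_date=datetime(2022, 2, 20), time_delta=40)
print(window)
```

Passing now explicitly keeps the helper deterministic, which is convenient when checking a filter profile before associating it with a pipeline.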

Usage

Filters can be loaded and applied in OKA UI to filter the data to display (see Save filters as profile).

Filters can be used with the pipelines to filter the logs to ingest (log_js_fetch) or the data used to train the models (OKA Predict (Predictor) and MeteoCluster). This makes it possible to train a model for a specific job workload characterised by the features defined in the filters. Use the admin interface > Conf pipelines > Filters to associate a filter with a pipeline.

image1
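
To illustrate how a multiple_filters block narrows a workload, the sketch below applies the example rules from the Creation section to a few made-up job records. It is not OKA's code: the helper names (rule_matches, job_matches), the sample jobs, and the type casting are assumptions. Only the two operators shown in the example (equal and greater) are handled, and any condition other than AND is treated as OR, which is also an assumption.

```python
import json

# The multiple_filters example from the Creation section, wrapped in an object.
FILTER_JSON = """
{
  "multiple_filters": {
    "condition": "AND",
    "rules": [
      {"id": "Account", "type": "string", "field": "Account",
       "input": "text", "value": "default", "operator": "equal"},
      {"id": "Allocated_CPUS", "type": "double", "field": "Allocated_CPUS",
       "input": "text", "value": "1", "operator": "greater"}
    ]
  }
}
"""

# Assumed mapping from rule "type" to a Python cast.
CASTS = {"string": str, "double": float}

# Only the operators that appear in the documented example.
OPERATORS = {
    "equal": lambda left, right: left == right,
    "greater": lambda left, right: left > right,
}


def rule_matches(job: dict, rule: dict) -> bool:
    """Check one rule against one job record (hypothetical helper)."""
    cast = CASTS[rule["type"]]
    compare = OPERATORS[rule["operator"]]
    return compare(cast(job[rule["field"]]), cast(rule["value"]))


def job_matches(job: dict, multiple_filters: dict) -> bool:
    """Combine all rules with the block's condition (AND, otherwise OR)."""
    results = [rule_matches(job, rule) for rule in multiple_filters["rules"]]
    if multiple_filters["condition"] == "AND":
        return all(results)
    return any(results)


filters = json.loads(FILTER_JSON)["multiple_filters"]

# Made-up job records, for illustration only.
jobs = [
    {"Account": "default", "Allocated_CPUS": 4},
    {"Account": "default", "Allocated_CPUS": 1},
    {"Account": "other", "Allocated_CPUS": 8},
]

selected = [job for job in jobs if job_matches(job, filters)]
print(selected)
```

With these sample records, only the first job satisfies both rules: it belongs to the default account and has more than one allocated CPU.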