Past Releases Analyze-IT

Information on Analyze-IT is available at:
https://ucit.fr/index.php/analyze-it/

Release 4.3 - June 01, 2021

NEW:

Adding new resources info regarding allocated GPUs (supported only with Slurm at this point)

ENHANCEMENTS:

Updating extractSlurmData script:

Handling unknown sacct / scontrol location

Now retrieving all available fields but User and Group by default

Updating Slurm parser to use the --duplicates option and handle duplicated JobIDs

BUG FIXES:

Fixed handling of unknown UID for Concurrent Users plugin

Fixed extractSlurmData batch retrieval method

Fixed handling of NaN value when processing Congestion and Throughput loads’ graphs

Fixed handling of regions that do not use USD in Cost pricing library

Release 4.2 - March 29, 2021

NEW:

Adding running and waiting time representation feature to Throughput plugin

Adding new options to select the timezone to apply: -an-tz-from and -an-tz-to

Adding a new parameter for the LSF parser to lower memory consumption while parsing: -js-p lowmem=True

ENHANCEMENTS:

Updating the way dates and time are displayed on graph X axis depending on active resolution

Updating the columns available for filtering with option -fc-c

Setting Consumers plugin option for “real names” -cd-rn as True by default

Updating Consumers plugin option -cd-g to allow the use of “-” to exclude groups

Updating job scheduler parsers: if the UID and GID of a job do not exist, and retrieval of these IDs with the user/group name fails as is, then the uppercase and lowercase versions of such names are also tried.
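The name-lookup fallback described in the last item could look roughly like the sketch below. It is an illustration only, not Analyze-IT's code: `resolve_uid` and its `lookup` parameter are hypothetical names, and the default lookup assumes a Unix system where the standard `pwd` module is available.

```python
from typing import Callable, Optional


def _system_lookup(name: str) -> int:
    """Resolve a user name to a UID using the system password database."""
    import pwd  # Unix only; imported lazily so the sketch loads anywhere

    return pwd.getpwnam(name).pw_uid  # raises KeyError for unknown names


def resolve_uid(name: str, lookup: Callable[[str], int] = _system_lookup) -> Optional[int]:
    """Try the name as is, then its uppercase and lowercase variants."""
    candidates = [name, name.upper(), name.lower()]
    for candidate in dict.fromkeys(candidates):  # drop duplicates, keep order
        try:
            return lookup(candidate)
        except KeyError:
            continue
    return None  # no variant of the name is known
```

Passing a custom `lookup` (for example a dictionary's `__getitem__`) makes the fallback easy to try without touching the real user database. The same pattern would apply to group names with `grp.getgrnam`.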

BUG FIXES:

Fixed LSF parser errors

Fixed the way the Slurm parser handles unknown dates and missing WCKeys

Fixed option to switch graph to a logarithmic scale

Fixed zooming issues on Congestion graph.

Release 4.1 - October 12, 2020

NEW:

Added ‘Cumulative Job Load’ feature to Throughput plugin

ENHANCEMENTS:

Costs: Improve the way we handle jobs without pricing information

Improving OpenLava parsing

BUG FIXES:

Fixed download graph button

Fixed Costs plugin error: the number of instances was not properly taken into account

Fixed memory consumption: kBytes instead of kbits

Fixed Resubmission plugin to handle missing jobname

Release 4.0 - June 23, 2020

NEW:

Normalization of configuration parameters’ names

Added mandatory --inputfile parameter to specify log files’ path

ENHANCEMENTS:

Added a --cost_savings option to the costs plugins to specify a savings percentage applied to the global bill

Parameters --savedata and --savefiltereddata are now combined into a single option: --save-data

Added a new option to the --save-data parameter to transform logs into a pickle file without generating a report

Updated the Load/nodes view: display results in a heatmap and a table

Display times/durations in human readable format (consumers)

Added a way to process a “Cost” column and display the corresponding results in the consumers plugin. Also added a command line option to specify the unit: -cd-cu (default €)

BUG FIXES:

Updated @Profile mechanism.

Time related values are now displayed in a “human readable” format for the Consumers plugin details graph

Various bug fixes related to missing data in the input data

Release 3.5 - April 02, 2020

NEW:

Added support for “data enhancers”. You can provide csv files that contain data to be added to the job scheduler parsed data, given a pivot column and a column of new values, or a Python file to add any type of column or modify the content of the job scheduler’s data. New columns are automatically analyzed by the groupDetails plugin.
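To make the csv variant concrete, here is a minimal sketch of joining a new column onto parsed job records through a pivot column. Everything in it is assumed for illustration: the `enhance_jobs` function, the `User` and `Department` column names, and the sample data are hypothetical, and the actual enhancer file format is the one described in the product documentation.

```python
import csv
import io


def enhance_jobs(jobs, enhancer_csv, pivot, new_column, default=None):
    """Add `new_column` to each job dict, matched on the `pivot` column."""
    reader = csv.DictReader(enhancer_csv)
    # Map each pivot value (e.g. a user name) to its new value.
    mapping = {row[pivot]: row[new_column] for row in reader}
    return [{**job, new_column: mapping.get(job.get(pivot), default)} for job in jobs]


# Hypothetical parsed job records and enhancer file content.
jobs = [
    {"JobID": "1", "User": "alice"},
    {"JobID": "2", "User": "bob"},
    {"JobID": "3", "User": "carol"},
]
enhancer = io.StringIO("User,Department\nalice,CFD\nbob,Genomics\n")
enhanced = enhance_jobs(jobs, enhancer, pivot="User", new_column="Department")
```

Jobs whose pivot value is missing from the csv file (here `carol`) receive the `default` value instead of being dropped, which keeps the job count unchanged.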

ENHANCEMENTS:

Added support for Slurm WCKEYS column

Added multiple options (--cost_instance_savings, --cost_savings) to the costs plugins to specify savings percentages applied to different bills

Added option --cost_instance_overhead to specify the overhead of starting/shutting down an instance

BUG FIXES:

Fixing Slurm 19 error while extracting data

Release 3.4 - January 28, 2020

NEW:

New packaged version for Debian 8 / 9 / 10

ENHANCEMENTS:

Extract data script can be used to retrieve logs from remote cluster

BUG FIXES:

Fixing error when handling missing values

Release 3.3 - December 05, 2019

NEW:

Adding new “Cost” plugin to estimate workload cost on AWS

ENHANCEMENTS:

Adding password protection to http access.

Adding the possibility to set default “Home” page of a report to be a specific plugin

Release 3.2 - November 08, 2019

NEW:

Adding link to UCit helpdesk

Setting specific httpd output for report generation on cloud environment

Adding -tgz option to create a tarball of the generated report

ENHANCEMENTS:

Adding new log file in debug mode

Improving Slurm parsing

Updating Copyrights and EULA

Improving zoom mechanism with the possibility to choose between ‘Area’ and ‘Drag’ mode

Improving license verification system to handle cloud environment

Improving congestion/contention algorithm

BUG FIXES:

Fixing Congestion/Contention graph’s default scale choice

Release 3.1 - August 09, 2019

ENHANCEMENTS:

Added -o option to extractData to specify the output filename

Upgrading third-party lib

Improving Slurm parsing

BUG FIXES:

Fixing HTML links generation to job details pages within ‘Consumers’ plugin

Fixed a bug in the computation of cluster running & waiting load in Congestion plugin

Fixing bug within Throughput plugin

Release 3.0 - June 18, 2019

CHANGES:

New EULA.

New logo and colors added

Duration displayed as human readable values for all plugins

Removed the --noindex and --fullmenu options

Supported OS now includes Ubuntu 16.04 and 18.04

ENHANCEMENTS:

New design for the report (new menu etc.)

New Slowdown CDF representation

Allowing users to specify their own logo within conf/imgs dir as logo.[png,jpg]

Handling “run profile”: Users can now define a “.profile” file and call it using “@name.profile”

The command line used to generate the report can now be found in the help page

New congestion/contention plugin to analyze how the cluster is used over time versus the requested computed resources

BUG FIXES:

Fixed slowdown computation when the eligible time was not available

Release 2.2 - April 12, 2019

CHANGES:

Updated html / css to handle div, container, menu the same way within every plugin.

Data filtering: allow filtering on dates (==, !=, >, >=, <=, <), and updated allowed filters for strings (>, >=, <=, <).
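As an illustration of what date filtering with these operators amounts to, the sketch below maps each operator to a Python comparison and applies it to a date column. It does not reproduce the tool's filter syntax: `filter_by_date`, the `Submit` column name, and the date format are assumptions made for the example.

```python
import operator
from datetime import datetime

# Comparison operators listed in the release note, mapped to Python functions.
OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<=": operator.le,
    "<": operator.lt,
}


def filter_by_date(jobs, column, op, value, fmt="%Y-%m-%d %H:%M:%S"):
    """Keep the jobs whose date in `column` satisfies `<date> <op> <value>`."""
    compare = OPERATORS[op]
    threshold = datetime.strptime(value, fmt)
    return [job for job in jobs if compare(datetime.strptime(job[column], fmt), threshold)]


# Hypothetical job records with a submission date.
jobs = [
    {"JobID": "1", "Submit": "2019-04-01 08:00:00"},
    {"JobID": "2", "Submit": "2019-04-10 12:30:00"},
    {"JobID": "3", "Submit": "2019-04-12 23:59:59"},
]
recent = filter_by_date(jobs, "Submit", ">=", "2019-04-10 00:00:00")
```

Parsing both sides into `datetime` objects before comparing avoids the pitfalls of comparing dates as plain strings when formats differ.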

ENHANCEMENTS:

Improving report html’s files rendering time.

Reducing memory usage during analysis.

Reducing the size of generated html files (new ‘minify’ option).

New packaging system in place for easy build and release of AIT.

BUG FIXES:

Removing destination folder prior to generation of new report.

Removing illegal characters from html menu’s links regarding groupDetails.

Fixed crash of the Resources plugin when run without node analysis.

Release 2.1 - January 11, 2019

CHANGES:

Throughput Submission Frequency displayed within tabs.

Concurrent Users displayed within tabs.

ENHANCEMENTS:

Added zoom and pan for all time series graphs (e.g., Cluster Load, Throughput Submission Frequency…).

BUG FIXES:

Torque/PBS: better handling of non utf-8 characters.

SLURM: Extract job scheduler data in local time instead of UTC.

Prevent crash when handling very long job name.

File not properly moved into bin folder during update.

Release 2.0 - November 5, 2018

CHANGES:

Analyze-IT has been entirely rewritten: analyses are now seen as plugins, and custom plugins can be added.

Grid Engine is now supported. The analysis relies on the accounting file (e.g., /usr/share/gridengine/default/common/accounting). Note that in the generated reports, Parallel Environments will be displayed as QOS, and Queues as Partitions.

The -t (--top) option has been removed for now. The analysis now returns all the values and not only the -t <top> ones. This leads to longer analyses, but also more accurate ones.

The cluster optimization index has been removed; it will soon be replaced by configurable KPIs (stay tuned!).

The names of the analyses and of some options have changed.

You can change the order in which the analyses are shown in the reports by changing their order in the -a option.

New analyses:

Memory analysis (if the information is in the logs).

Weekday and hour of the day for job submission

ENHANCEMENTS:

Pages have been modified to be more lightweight whenever possible, and have been reorganized.

To extract data from the job scheduler, you can now directly use bin/extractData <JS>, instead of <version>/jobschedulers/extractData<JS>.sh

Graphs in the throughput analysis can now be displayed with a linear or logarithmic scale.

Faster analysis and a lower memory usage

Added statistical information in tables for many analyses (concurrent users, throughput, resources consumption…)

Reorganized command line options per analysis types, which should make them clearer.

Start and end dates (-s and -e options) can now be specified without any hour. In this case the start of the day (00:00:00) is used for the start date, and the end of the day (23:59:59) for the end date.

The number of jobs displayed by the graphs in the groupDetails analysis is limited by default to 5000 per graph to speed up page loading. You can increase this number with the -gds option.

The cluster load graph now displays the maximum number of cores used as a red line, and (if it has been specified) the total number of available cores as a black line.
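The start and end date defaults described in this list (00:00:00 for a start date without an hour, 23:59:59 for an end date without an hour) can be sketched as follows. The `parse_bound` helper and the `YYYY-MM-DD` date format are assumptions for the example; the format actually accepted by -s and -e is the one given in the tool's help.

```python
from datetime import datetime, time


def parse_bound(text, is_end=False):
    """Parse 'YYYY-MM-DD HH:MM:SS' or 'YYYY-MM-DD' into a datetime.

    When no hour is given, a start date falls back to 00:00:00 and an
    end date to 23:59:59 of the same day.
    """
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        day = datetime.strptime(text, "%Y-%m-%d").date()
        return datetime.combine(day, time(23, 59, 59) if is_end else time(0, 0, 0))


start = parse_bound("2018-11-05")             # start of that day
end = parse_bound("2018-11-05", is_end=True)  # end of that day
```

With this behavior, giving the same day for both bounds selects that whole day rather than an empty interval.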

BUG FIXES:

Many small bug fixes and typos have been corrected in all the analyses, and in the job scheduler parsers.

Release 1.1-r3 - September 24, 2018

BUG FIXES:

Fixed a bug that occurs when the slowdown or cluster load scores cannot be computed

Fixed data filtering on categorical columns (e.g., State)

Release 1.1-r2 - September 18, 2018

ENHANCEMENTS:

Script extractSlurmData.sh now has a -e/--end option to specify the end time.

BUG FIXES:

Fixed -mnp option: the specified number of processes wasn’t taken into account when generating html files in parallel

Release 1.1 - August 28, 2018

CHANGES:

Input data filtering can now be done using multiple operators (~=|!=|==|<=|>=|<|>)

Added report of number of allocated cores per job

Added report of requested vs. consumed memory per job

ENHANCEMENTS:

Added failed jobs to the list of jobs analyzed for resubmission

Added QOS to the list of consumers

Added statistical information about the cluster load

Slowdown is now computed in two different ways: considering the submission time or the eligible time to compute the waiting time

Added an option to save the data once filtered into a pickle file (allows for faster runs of multiple analyses on the same filtered data)

Tables in “consumers” are now dynamic: it is possible to sort, filter, paginate…

Better support for Slurm <15

Faster parsing of Slurm accounting logs

Faster computation of distances between job names when using option -gj

New installation directory organization to ease update procedures
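The two slowdown variants mentioned in the list above differ only in the point from which waiting time is counted. The sketch below assumes the common definition slowdown = (wait + run) / run; the release notes do not state the exact formula Analyze-IT uses, and the `slowdown` function and the sample timestamps are hypothetical.

```python
from datetime import datetime


def slowdown(wait_from, start, end):
    """Return (wait + run) / run, with waiting measured from `wait_from`."""
    run = (end - start).total_seconds()
    if run <= 0:
        return None  # undefined for jobs with no run time
    wait = max((start - wait_from).total_seconds(), 0.0)
    return (wait + run) / run


submit = datetime(2018, 8, 28, 8, 0, 0)
eligible = datetime(2018, 8, 28, 9, 0, 0)  # e.g. a dependency cleared at 09:00
start = datetime(2018, 8, 28, 10, 0, 0)
end = datetime(2018, 8, 28, 12, 0, 0)

from_submit = slowdown(submit, start, end)      # waits 2 h, runs 2 h
from_eligible = slowdown(eligible, start, end)  # waits 1 h, runs 2 h
```

Measuring from the eligible time excludes delays that the scheduler could not have avoided, such as waiting on a job dependency, so it never yields a larger value than measuring from submission.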

BUG FIXES:

Fixed various graphical bugs in the web pages

Fixed handling of errors

Correctly parse job steps in Slurm logs

Release 1.0 - February 6, 2018

First version of Analyze-IT. See documentation for a complete description of Analyze-IT features.

Past Releases Predict-IT

Information on Predict-IT is available at:
https://ucit.fr/index.php/predict-it/

Release 1.6 - June 01, 2021

ENHANCEMENTS:

Updating extractSlurmData script:

Handling unknown sacct/scontrol location

Now retrieving all available fields but User and Group by default

Updating Slurm parser to use the --duplicates option and handle duplicated JobIDs

Updating extractSlurmDataBeforeTraining to handle --duplicates

BUG FIXES:

Fixed extractSlurmData batch retrieval method

Release 1.5 - July 9, 2020

NEW:

Added support for “data enhancers”. You can provide csv files that contain data to be added to the job scheduler parsed data, given a pivot column and a column of new values, or a Python file to add any type of column or modify the content of the job scheduler’s data. New features are then automatically used for training. This allows, for example, the addition of application-specific metrics.

Added 15 new features related to target statistics using data enhancers.

Predict-IT now trains the selected model once more with all data available, instead of keeping a model trained only on the train dataset (thus missing the jobs in the test dataset). This allows for better accuracies in production.

Added the possibility to launch multiple Predict-IT instances in parallel with different configurations. This makes it possible to build specific predictors (e.g., per application), or to test multiple configurations.

ENHANCEMENTS:

Added new configuration options for model features and bins selection.

Added support for Slurm WCKEYS column

Added support for automatic batch retrieval, splitting a big sacct call into smaller requests
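The idea behind batch retrieval can be sketched as splitting the requested period into consecutive windows, each small enough for a single request. This is only an illustration of the principle: the `split_period` helper and the 7-day window size are assumptions, not the behavior of the extraction script.

```python
from datetime import datetime, timedelta


def split_period(start, end, days_per_batch=7):
    """Yield consecutive (batch_start, batch_end) windows covering [start, end)."""
    step = timedelta(days=days_per_batch)
    cursor = start
    while cursor < end:
        batch_end = min(cursor + step, end)  # the last window may be shorter
        yield cursor, batch_end
        cursor = batch_end


batches = list(split_period(datetime(2020, 1, 1), datetime(2020, 1, 20)))
```

Each window could then be handed to one accounting query, for instance through sacct's `--starttime` and `--endtime` options, and the results concatenated.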

BUG FIXES:

Fixed tracking script: tracking dates were not updated every time

Fixed Slurm 19 error while extracting data

Fixed an error that occurred when the server configuration file was specified on the command line and the default path did not exist.

Various bug fixes related to missing data in the input data

Release 1.4 - March 11, 2020

CHANGES:

Added filter on input dataset.

ENHANCEMENTS:

Added last training status on metrics page.

Added server parameters in help page.

Added a way to run multiple PIT configurations using server.conf files with specific port, host and dirName.

Added comparison between previous and newly trained models.

Added a way to turn on / off the use of balanced bins directly within server.conf.

BUG FIXES:

Error when computing F1-score for unique class (e.g. after filtering).

Release 1.3 - December 17, 2019

CHANGES:

Added wait time and time to result prediction.

Added tracking-script functionality to collect jobs from the last X hours, request predictions for each of them, and visualize them.

Added option in server.conf to save training “test” metrics as csv files (Obs, Pred, Confidence).

Added option in server.conf to define the total CPU number of the cluster.

ENHANCEMENTS:

Added 2 RMSE calculations in metrics.

Added unweighted accuracy gauge in metrics to investigate global accuracy without taking into account the support for each class.

Redesigned help page.
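The difference between an accuracy weighted by class support and the unweighted accuracy mentioned in this list can be shown on a small example. The sketch assumes that "unweighted" means the plain average of per-class accuracies; the release notes do not give the formula, and the `accuracies` function and the class labels are hypothetical.

```python
from collections import defaultdict


def accuracies(observed, predicted):
    """Return (weighted, unweighted) accuracy for the given labels."""
    hits, support = defaultdict(int), defaultdict(int)
    for obs, pred in zip(observed, predicted):
        support[obs] += 1
        hits[obs] += int(obs == pred)
    # Weighted: every job counts once, so large classes dominate.
    weighted = sum(hits.values()) / sum(support.values())
    # Unweighted: every class counts once, whatever its support.
    unweighted = sum(hits[c] / support[c] for c in support) / len(support)
    return weighted, unweighted


# Hypothetical run-time classes: 8 "short" jobs and 2 "long" jobs.
observed = ["short"] * 8 + ["long"] * 2
predicted = ["short"] * 8 + ["short", "long"]
weighted, unweighted = accuracies(observed, predicted)
```

Here the weighted accuracy is 0.9 while the unweighted one is 0.75, which exposes the weaker predictions on the rare "long" class.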

BUG FIXES:

BAD_REQUEST when requesting prediction using predictit.client.

Missing column when handling SGE.

Fixing Comparison page.

Release 1.2 - July 26, 2019

CHANGES:

Added AdaBoost in the list of available algorithms.

New EULA.

New logo added.

ENHANCEMENTS:

Script extractSlurmData.sh now has a -e/--end option to specify the end time.

To extract data from the job scheduler, you can now directly use bin/extractData <JS>, instead of <version>/jobschedulers/extractData<JS>.sh

It is now possible to ask for the last N days of data instead of having to define a specific starttime.

BUG FIXES:

Many small bug fixes and typos have been corrected in all the job scheduler parsers.

Torque/PBS: better handling of non utf-8 characters.

SLURM: Extract job scheduler data in local time instead of UTC.

Release 1.1 - August 28, 2018

CHANGES:

Added memory consumption (Max RSS) prediction.

Predictions now come with a confidence level (from 0 to 1).

init.d scripts.

ENHANCEMENTS:

Better support for Slurm <15.

Faster parsing of Slurm accounting logs.

Metrics web pages now display a global indicator showing the accuracy of the model, along with a graph showing the evolution of this metric through time.

New installation directory organization to ease update procedures.

BUG FIXES:

Accessing the metrics web pages when the models haven’t been trained no longer displays a JSON error.

Fixed various graphical bugs in the metrics web pages.

Fixed bug that happened when cv=False.

Correctly parse job steps in Slurm logs.

Release 1.0 - February 6, 2018

First version of Predict-IT. See documentation for a complete description of Predict-IT features.
