Past Releases Analyze-IT

Information on Analyze-IT is available at:

https://ucit.fr/index.php/analyze-it/

Release 4.3 - June 01, 2021

  • NEW:

    • Adding new resources info regarding allocated GPUs (supported only with Slurm at this point)

  • ENHANCEMENTS:

    • Updating extractSlurmData script

      • Handling unknown sacct / scontrol location

      • Now retrieving all available fields but User and Group by default

    • Updating Slurm parser to use the --duplicates option and handle duplicated JobIDs

  • BUG FIXES:

    • Fixed handling of unknown UID for Concurrent Users plugin

    • Fixed extractSlurmData batch retrieval method

    • Fixed handling of NaN value when processing Congestion and Throughput loads’ graphs

    • Fixed handling of regions that do not use USD in the Cost pricing library

Release 4.2 - March 29, 2021

  • NEW:

    • Adding running and waiting time representation feature to Throughput plugin

    • Adding new options to select the timezone to apply: -an-tz-from and -an-tz-to

    • Adding a new parameter for the LSF parser to lower memory consumption while parsing: -js-p lowmem=True

  • ENHANCEMENTS:

    • Updating the way dates and time are displayed on graph X axis depending on active resolution

    • Updating available columns to use filtering on for option -fc-c

    • Setting Consumers plugin option for “real names” -cd-rn as True by default

    • Updating Consumers plugin option -cd-g to allow the use of “-” to exclude groups

    • Updating job scheduler parsers: if the UID and GID of a job do not exist, and retrieving these IDs from the user/group name fails as is, then the uppercase and lowercase versions of the name are also tried.

  • BUG FIXES:

    • Fixed LSF parser errors

    • Fixed the way the Slurm parser handles unknown dates and missing WCKeys

    • Fixed option to switch graph to a logarithmic scale

    • Fixed zooming issues on Congestion graph.
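
The name-case fallback described in the job scheduler parsers enhancement above can be sketched as follows. This is a minimal illustration only: the function name `resolve_uid`, the `lookup` parameter, and the order in which variants are tried are assumptions, not the actual Analyze-IT implementation.

```python
import pwd


def resolve_uid(username, lookup=lambda name: pwd.getpwnam(name).pw_uid):
    """Return the UID for username, retrying with case variants.

    `lookup` must raise KeyError for unknown names (as pwd.getpwnam does).
    The retry order (as is, uppercase, lowercase) is an assumption.
    """
    for candidate in (username, username.upper(), username.lower()):
        try:
            return lookup(candidate)
        except KeyError:
            continue
    return None  # no variant of the name is known
```

Passing `lookup` as a parameter keeps the sketch testable without depending on the accounts present on the local system.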

Release 4.1 - October 12, 2020

  • NEW:

    • Added ‘Cumulative Job Load’ feature to Throughput plugin

  • ENHANCEMENTS:

    • Costs: Improve the way we handle jobs without pricing information

    • Improving OpenLava parsing

  • BUG FIXES:

    • Fixed download graph button

    • Fixed Costs plugin error, number of instances not properly taken into account

    • Fixed memory consumption: kBytes instead of kbits

    • Fixed Resubmission plugin to handle missing jobname

Release 4.0 - June 23, 2020

  • NEW:

    • Normalization of configuration parameters’ names

    • Added mandatory --inputfile parameter to specify log files’ path

  • ENHANCEMENTS:

    • Added a --cost_savings option to the costs plugins to specify a savings percentage applied to the global bill

    • Parameters --savedata and --savefiltereddata are now combined into a single option: --save-data

    • Added a new option to the --save-data parameter in order to transform logs into a pickle file without generating a report

    • Updated the Load/nodes view: display results in a heatmap and a table

    • Display times/durations in human readable format (consumers)

    • Added a way to process a “Cost” column and display the corresponding results in the consumers plugin. Also added a command line option to specify the unit: -cd-cu (default €)

  • BUG FIXES:

    • Updated @Profile mechanism.

    • Time related values are now displayed in a “human readable” format for the Consumers plugin details graph

    • Various bug fixes related to missing data in the input data

Release 3.5 - April 02, 2020

  • NEW:

    • Added support for “data enhancers”. You can provide either csv files that contain data to be added to the job scheduler parsed data, given a pivot column and a column of new values, or a Python file that adds any type of column or modifies the content of the job scheduler’s data. New columns are automatically analyzed by the groupDetails plugin.

  • ENHANCEMENTS:

    • Added support for Slurm WCKEYS column

    • Added multiple options, --cost_instance_savings and --cost_savings, to the costs plugins to specify savings percentages applied to different bills

    • Added option --cost_instance_overhead to specify the overhead of starting/shutting down an instance

  • BUG FIXES:

    • Fixing Slurm 19 error while extracting data
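
To illustrate the csv form of a “data enhancer” mentioned in this release, here is a minimal sketch of joining a csv file onto parsed job records through a pivot column. The function `enhance`, the column names, and the in-memory job representation are hypothetical; the real enhancer file format and API are described in the product documentation, not here.

```python
import csv
import io


def enhance(jobs, enhancer_csv, pivot, new_column):
    """Return copies of the job dicts with new_column added, matched on pivot."""
    reader = csv.DictReader(enhancer_csv)
    mapping = {row[pivot]: row[new_column] for row in reader}
    # Jobs whose pivot value is absent from the csv get None.
    return [dict(job, **{new_column: mapping.get(job[pivot])}) for job in jobs]


# Made-up data for illustration.
jobs = [{"JobID": "1", "User": "alice"}, {"JobID": "2", "User": "bob"}]
enhancer = io.StringIO("User,Department\nalice,physics\n")
enhanced = enhance(jobs, enhancer, pivot="User", new_column="Department")
```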

Release 3.4 - January 28, 2020

  • NEW:

    • New packaged version for Debian 8 / 9 / 10

  • ENHANCEMENTS:

    • The extract data script can be used to retrieve logs from a remote cluster

  • BUG FIXES:

    • Fixing error when handling missing values

Release 3.3 - December 05, 2019

  • NEW:

    • Adding new “Cost” plugin to estimate workload cost on AWS

  • ENHANCEMENTS:

    • Adding password protection to http access.

    • Adding the possibility to set default “Home” page of a report to be a specific plugin

Release 3.2 - November 08, 2019

  • NEW:

    • Adding link to UCit helpdesk

    • Setting specific httpd output for report generation on cloud environment

    • Adding -tgz option to create a tarball of the generated report

  • ENHANCEMENTS:

    • Adding new log file in debug mode

    • Improving Slurm parsing

    • Updating Copyrights and EULA

    • Improving zoom mechanism with the possibility to choose between ‘Area’ and ‘Drag’ mode

    • Improving license verification system to handle cloud environment

    • Improving congestion/contention algorithm

  • BUG FIXES:

    • Fixing Congestion/Contention graph’s default scale choice

Release 3.1 - August 09, 2019

  • ENHANCEMENTS:

    • Added -o option to extractData to specify the output filename

    • Upgrading third-party lib

    • Improving Slurm parsing

  • BUG FIXES:

    • Fixing HTML links generation to job details pages within ‘Consumers’ plugin

    • Fixed a bug in the computation of cluster running & waiting load in Congestion plugin

    • Fixing bug within Throughput plugin

Release 3.0 - June 18, 2019

  • CHANGES:

    • New EULA.

    • New logo and colors added

    • Duration displayed as human readable values for all plugins

    • Removed the --noindex and --fullmenu options

    • Supported OS now includes Ubuntu 16.04 and 18.04

  • ENHANCEMENTS:

    • New design for the report (new menu etc.)

    • New Slowdown CDF representation

    • Allowing users to specify their own logo within conf/imgs dir as logo.[png,jpg]

    • Handling “run profile”: Users can now define a “.profile” file and call it using “@name.profile”

    • The command line used to generate the report can now be found in the help page

    • New congestion/contention plugin to analyze how the cluster is used over time versus the requested computed resources

  • BUG FIXES:

    • Fixed slowdown computation when the eligible time was not available

Release 2.2 - April 12, 2019

  • CHANGES:

    • Updated html / css to handle div, container, menu the same way within every plugin.

    • Data filtering: allow filtering on dates (==, !=, >, >=, <=, <), and updated allowed filters for strings (>, >=, <=, <).

  • ENHANCEMENTS:

    • Improving report html’s files rendering time.

    • Reducing memory usage during analysis.

    • Reducing the size of generated html files (new ‘minify’ option).

    • New packaging system in place for easy build and release of AIT.

  • BUG FIXES:

    • Removing destination folder prior to generation of new report.

    • Removing illegal characters from html menu’s links regarding groupDetails.

    • Fixed crash of the Resources plugin when run without node analysis.
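
The date filtering operators listed in this release can be illustrated with a small sketch based on Python's standard `operator` module. The function `filter_jobs`, the column names, and the job records are hypothetical and only show the idea of mapping an operator string to a comparison; they are not the Analyze-IT filter syntax.

```python
import operator
from datetime import datetime

# Comparison operators named in the release note, mapped to functions.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<=": operator.le,
    "<": operator.lt,
}


def filter_jobs(jobs, column, op, value):
    """Keep the jobs for which `job[column] <op> value` holds."""
    compare = OPS[op]
    return [job for job in jobs if compare(job[column], value)]


# Made-up data for illustration.
jobs = [
    {"JobID": 1, "Submit": datetime(2019, 4, 1)},
    {"JobID": 2, "Submit": datetime(2019, 4, 10)},
]
recent = filter_jobs(jobs, "Submit", ">=", datetime(2019, 4, 5))
```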

Release 2.1 - January 11, 2019

  • CHANGES:

    • Throughput Submission Frequency displayed within tabs.

    • Concurrent Users displayed within tabs.

  • ENHANCEMENTS:

    • Added zoom and pan for all time series graphs (e.g., Cluster Load, Throughput Submission Frequency…).

  • BUG FIXES:

    • Torque/PBS: better handling of non utf-8 characters.

    • SLURM: Extract job scheduler data in local time instead of UTC.

    • Prevent crash when handling very long job name.

    • File not properly moved into bin folder during update.

Release 2.0 - November 5, 2018

  • CHANGES:

    • Analyze-IT has been entirely rewritten: analyses are now seen as plugins, and custom plugins can be added.

    • Grid Engine is now supported. The analysis relies on the accounting file (e.g., /usr/share/gridengine/default/common/accounting). Note that in the generated reports, Parallel Environments will be displayed as QOS, and Queues as Partitions.

    • The -t (--top) option has been removed for now. The analysis now returns all the values and not only the -t <top> ones. This leads to longer analyses, but also more accurate ones.

    • The cluster optimization index has been removed, it will be soon replaced by configurable KPIs (stay tuned!).

    • The names of the analyses and of some options have changed.

    • You can change the order in which the analyses are shown in the reports by changing their order in the -a option.

    • New analyses:

      • Memory analysis (if the information is in the logs).

      • Weekday and hour of the day for job submission.

  • ENHANCEMENTS:

    • Pages have been modified to be more lightweight whenever possible, and have been reorganized.

    • To extract data from the job scheduler, you can now directly use bin/extractData <JS>, instead of <version>/jobschedulers/extractData<JS>.sh

    • Graphs in the throughput analysis can now be displayed with a linear or logarithmic scale.

    • Faster analysis and a lower memory usage

    • Added statistical information in tables for many analyses (concurrent users, throughput, resources consumption…)

    • Reorganized command line options per analysis types, which should make them clearer.

    • Start and end dates (-s and -e options) can now be specified without any hour. In this case the start of the day (00:00:00) is used for the start date, and the end of the day (23:59:59) for the end date.

    • The number of jobs displayed by the graphs in the groupDetails analysis is limited by default to 5000 per graph to speed up page loading. You can increase this number with the -gds option.

    • The cluster load graph now displays the maximum number of cores used as a red line, and (if it has been specified) the total number of available cores as a black line.

  • BUG FIXES:

    • Many small bugs and typos have been corrected in all the analyses, and in the job scheduler parsers.
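
The default hours applied to the -s and -e options in this release can be sketched as below. The accepted date formats (`YYYY-MM-DD` and `YYYY-MM-DD HH:MM:SS`) and the function name `parse_bound` are assumptions made for the example; only the 00:00:00 and 23:59:59 defaults come from the release note.

```python
from datetime import datetime, time


def parse_bound(text, is_end):
    """Parse a date, filling in the start or end of the day when no hour is given."""
    try:
        return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        day = datetime.strptime(text, "%Y-%m-%d").date()
        default = time(23, 59, 59) if is_end else time(0, 0, 0)
        return datetime.combine(day, default)


start = parse_bound("2018-11-05", is_end=False)
end = parse_bound("2018-11-05", is_end=True)
```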

Release 1.1-r3 - September 24, 2018

  • BUG FIXES:

    • Fixed a bug that occurs when the slowdown or cluster load scores cannot be computed

    • Fixed data filtering on categorical columns (e.g., State)

Release 1.1-r2 - September 18, 2018

  • ENHANCEMENTS:

    • Script extractSlurmData.sh now has a -e/--end option to specify the end time.

  • BUG FIXES:

    • Fixed -mnp option: the specified number of processes wasn’t taken into account when generating html files in parallel

Release 1.1 - August 28, 2018

  • CHANGES:

    • Input data filtering can now be done using multiple operators (~=|!=|==|<=|>=|<|>)

    • Added report of number of allocated cores per job

    • Added report of requested vs. consumed memory per job

  • ENHANCEMENTS:

    • Added failed jobs to the list of jobs analyzed for resubmission

    • Added QOS to the list of consumers

    • Added statistical information about the cluster load

    • Slowdown is now computed in two different ways: considering the submission time or the eligible time to compute the waiting time

    • Added an option to save the data once filtered into a pickle file (allows for faster runs of multiple analyses on the same filtered data)

    • Tables in “consumers” are now dynamic, it is possible to sort, filter, paginate…

    • Better support for Slurm <15

    • Faster parsing of Slurm accounting logs

    • Faster computation of distances between job names when using option -gj

    • New installation directory organization to ease update procedures

  • BUG FIXES:

    • Fixed various graphical bugs in the web pages

    • Fixed handling of errors

    • Correctly parse job steps in Slurm logs
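
The two slowdown variants introduced in this release differ only in the reference point used for the waiting time. The sketch below uses the common definition of slowdown, (wait + run) / run; whether Analyze-IT uses exactly this formula is an assumption, and the function name and timestamps are made up for illustration.

```python
def slowdown(reference_time, start_time, end_time):
    """Slowdown = (wait + run) / run, with wait measured from reference_time."""
    wait = start_time - reference_time
    run = end_time - start_time
    if run <= 0:
        return None  # undefined for jobs with no run time
    return (wait + run) / run


# Timestamps in seconds since an arbitrary origin.
submit, eligible, start, end = 0, 600, 1800, 5400
from_submit = slowdown(submit, start, end)      # wait counted from submission
from_eligible = slowdown(eligible, start, end)  # wait counted from eligibility
```

With these values the submission-based slowdown is larger than the eligibility-based one, because the time spent before the job became eligible is counted as waiting only in the first case.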

Release 1.0 - February 6, 2018

First version of Analyze-IT. See documentation for a complete description of Analyze-IT features.

Past Releases Predict-IT

Information on Predict-IT is available at:

https://ucit.fr/index.php/predict-it/

Release 1.6 - June 01, 2021

  • ENHANCEMENTS:

    • Updating extractSlurmData script:

      • Handling unknown sacct/scontrol location

      • Now retrieving all available fields but User and Group by default

    • Updating Slurm parser to use the --duplicates option and handle duplicated JobIDs

    • Updating extractSlurmDataBeforeTraining to handle --duplicates

  • BUG FIXES:

    • Fixed extractSlurmData batch retrieval method

Release 1.5 - July 9, 2020

  • NEW:

    • Added support for “data enhancers”. You can provide either csv files that contain data to be added to the job scheduler parsed data, given a pivot column and a column of new values, or a Python file that adds any type of column or modifies the content of the job scheduler’s data. New features are then automatically used for training. This allows, for example, the addition of application specific metrics.

    • Added 15 new features related to target statistics using data enhancers.

    • Predict-IT now trains the selected model once more with all data available, instead of keeping a model trained only on the train dataset (thus missing the jobs in the test dataset). This allows for better accuracies in production.

    • Added the possibility to launch multiple Predict-IT instances in parallel with different configurations. This makes it possible to build specific predictors (e.g., per application), or to test multiple configurations.

  • ENHANCEMENTS:

    • Added new configuration options for model features and bins selection.

    • Added support for Slurm WCKEYS column

    • Added support for automatic batch retrieval as smaller requests for big sacct call

  • BUG FIXES:

    • Fixed tracking script: tracking dates were not updated every time

    • Fixed Slurm 19 error while extracting data

    • Fixed an error that occurred when the server configuration file was specified on the command line and the default path did not exist.

    • Various bug fixes related to missing data in the input data
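
The automatic batch retrieval mentioned in this release splits one large sacct request into smaller ones. A minimal sketch of the idea is to cut the requested period into fixed-size windows, each of which could then be passed to a separate sacct call through its start and end time options. The function name `split_range` and the 7-day window are assumptions; the actual batching strategy used by the extract script is not described here.

```python
from datetime import datetime, timedelta


def split_range(start, end, days=7):
    """Yield (chunk_start, chunk_end) pairs covering the period from start to end."""
    step = timedelta(days=days)
    current = start
    while current < end:
        chunk_end = min(current + step, end)
        yield current, chunk_end
        current = chunk_end


chunks = list(split_range(datetime(2020, 1, 1), datetime(2020, 1, 20)))
```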

Release 1.4 - March 11, 2020

  • CHANGES:

    • Added filter on input dataset.

  • ENHANCEMENTS:

    • Added last training status on metrics page.

    • Added server parameters in help page.

    • Added a way to run multiple PIT configurations using server.conf files with specific port, host and dirName.

    • Added comparison between previous and newly trained models.

    • Added a way to turn on / off the use of balanced bins directly within server.conf.

  • BUG FIXES:

    • Error when computing F1-score for unique class (e.g. after filtering).

Release 1.3 - December 17, 2019

  • CHANGES:

    • Added wait time and time to result prediction.

    • Added tracking-script functionality to collect jobs since last X hours, call for predictions on each of them and visualize them.

    • Added option in server.conf to save training “test” metrics as csv files (Obs, Pred, Confidence).

    • Added option in server.conf to define the total CPU number of the cluster.

  • ENHANCEMENTS:

    • Added 2 RMSE calculations in metrics.

    • Added unweighted accuracy gauge in metrics to investigate global accuracy without taking into account the support for each class.

    • Redesigned help page.

  • BUG FIXES:

    • BAD_REQUEST when requesting prediction using predictit.client.

    • Missing column when handling SGE.

    • Fixing Comparison page.
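
The difference between the usual accuracy and the unweighted accuracy gauge added in this release can be illustrated as follows. The sketch interprets "unweighted" as the mean of per-class accuracies, each class counting equally regardless of its support; this interpretation, the function name, and the sample labels are assumptions for illustration.

```python
from collections import defaultdict


def accuracies(observed, predicted):
    """Return (weighted, unweighted) accuracy over class labels."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for obs, pred in zip(observed, predicted):
        total[obs] += 1
        correct[obs] += obs == pred
    # Weighted: every job counts equally, so large classes dominate.
    weighted = sum(correct.values()) / len(observed)
    # Unweighted: every class counts equally, whatever its support.
    unweighted = sum(correct[c] / total[c] for c in total) / len(total)
    return weighted, unweighted


# Made-up, imbalanced example: 8 "short" jobs and 2 "long" jobs.
obs = ["short"] * 8 + ["long"] * 2
pred = ["short"] * 8 + ["short", "long"]
weighted, unweighted = accuracies(obs, pred)
```

In this example the weighted accuracy stays high because the dominant class is always predicted correctly, while the unweighted value exposes the weaker result on the rare class.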

Release 1.2 - July 26, 2019

  • CHANGES:

    • Added AdaBoost in the list of available algorithms.

    • New EULA.

    • New logo added.

  • ENHANCEMENTS:

    • Script extractSlurmData.sh now has a -e/--end option to specify the end time.

    • To extract data from the job scheduler, you can now directly use bin/extractData <JS>, instead of <version>/jobschedulers/extractData<JS>.sh

    • It is now possible to ask for the last N days of data instead of having to define a specific starttime.

  • BUG FIXES:

    • Many small bugs and typos have been corrected in the job scheduler parsers.

    • Torque/PBS: better handling of non utf-8 characters.

    • SLURM: Extract job scheduler data in local time instead of UTC.

Release 1.1 - August 28, 2018

  • CHANGES:

    • Added memory consumption (Max RSS) prediction.

    • Predictions now come with a confidence level (from 0 to 1).

    • Added init.d scripts.

  • ENHANCEMENTS:

    • Better support for Slurm <15.

    • Faster parsing of Slurm accounting logs.

    • Metrics web pages now display a global indicator showing the accuracy of the model, along with a graph showing the evolution of this metric through time.

    • New installation directory organization to ease update procedures.

  • BUG FIXES:

    • Accessing the metrics web pages when the models haven’t been trained no longer displays a JSON error.

    • Fixed various graphical bugs in the metrics web pages.

    • Fixed bug that happened when cv=False.

    • Correctly parse job steps in Slurm logs.

Release 1.0 - February 6, 2018

First version of Predict-IT. See documentation for a complete description of Predict-IT features.