Hyperparameter Searches
A general framework to perform hyperparameter searches on single- and multi-GPU systems
Note, we currently only support simple grid searches.
How to run a hyperparameter search
The main script in this package is hypnettorch.hpsearch.hpsearch.
$ python -m hypnettorch.hpsearch.hpsearch --help
Before being able to run a hyperparameter search, however, the search grid has to be configured. To this end, your simulation needs its own implementation of the configuration file hypnettorch.hpsearch.hpsearch_config_template. Please refer to the corresponding documentation for information on how to configure a hyperparameter search.
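For orientation, here is a minimal sketch of what such a configuration module could contain. This is not the actual template: the module name my_hpsearch_config and all option names below are placeholders, and the template defines further attributes that have to be filled in as well.
# my_hpsearch_config.py - hypothetical minimal search configuration.
# Copy hypnettorch/hpsearch/hpsearch_config_template.py and fill in all
# attributes described there; 'grid' and 'conditions' are the two fields
# edited for every new search.
grid = {
    'lr': [0.001, 0.0001],        # assumed command-line option --lr
    'use_adam': [False, True],    # flag option: included or omitted
    'adam_beta2': [0.99, 0.999],  # assumed command-line option --adam_beta2
}
conditions = [
    # When Adam is disabled, beta2 is irrelevant, so pin it to one value.
    ({'use_adam': [False]}, {'adam_beta2': [0.999]}),
]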
Execute on a single- or multi-GPU system without job scheduling
The simplest mode of execution is to run all hyperparameter configurations sequentially in the foreground. For instance, on a computer without GPUs, you can start the hpsearch on the CPU as follows:
$ python -m hypnettorch.hpsearch.hpsearch --visible_gpus=-1
Assuming that your simulation automatically runs on a visible GPU, you can also apply this sequential foreground execution to a GPU of your choice (e.g., GPU 2):
$ CUDA_VISIBLE_DEVICES=2 python -m hypnettorch.hpsearch.hpsearch --visible_gpus=-1
Alternatively, the hpsearch can assign GPU resources to jobs itself. In this case, multiple hyperparameter configurations may run in parallel (across multiple GPUs as well as multiple runs per GPU). For this operation mode, you are required to install the package GPUtil.
Please carefully study the arguments of the hpsearch.
$ python -m hypnettorch.hpsearch.hpsearch --help
Assume you want to run your search on GPUs 0,1,2,7 with a hard limit of 5 jobs assigned per GPU by the hpsearch (a limit you choose based on available CPU and RAM resources). Note, option --max_num_jobs_per_gpu currently does not account for processes running on a GPU that were not assigned by this hpsearch. In addition, a run may only be assigned to a GPU if at most 75% of its memory is in use and its compute utilization is at most 60%. Since runs take some time to start up and allocate GPU resources, you additionally specify the argument --sim_startup_time. Every time a job is assigned to a GPU, this time has to pass before a new job may be assigned (so that the first job has time to acquire GPU memory and compute resources):
$ python -m hypnettorch.hpsearch.hpsearch --visible_gpus=0,1,2,7 --max_num_jobs_per_gpu=5 --allowed_memory=0.75 --allowed_load=0.6 --sim_startup_time=30
Execute on a cluster with IBM Platform LSF
You may also run the hpsearch on a cluster that uses the IBM Platform LSF job scheduler. In this case, you have to install the package bsub. To tell the hpsearch that it should schedule jobs via bsub, simply append the options --run_cluster --scheduler=lsf. Here is an example call:
$ bsub -n 1 -W 120:00 -e hpsearch_mysim.err -o hpsearch_mysim.out -R "rusage[mem=8000]" python -m hypnettorch.hpsearch.hpsearch --grid_module=my_hpsearch_config --run_cluster --scheduler=lsf --num_jobs=50 --num_hours=24 --num_searches=1000 --resources="\"rusage[mem=8000, ngpus_excl_p=1]\""
In the example above, the hpsearch itself runs for up to 120 hours on the cluster, requiring 8GB of RAM during that time. Individual jobs will run for up to 24 hours each. The hpsearch will explore at most 1000 hyperparameter configurations, with at most 50 jobs scheduled in parallel (new jobs are scheduled as soon as old ones finish, until the hard limit of 1000 runs is reached). Each job will require 1 GPU and 8GB of RAM.
Execute on a cluster with Slurm Workload Manager
The hpsearch can also be run on a cluster with the Slurm job scheduler via the arguments --run_cluster --scheduler=slurm. To do so, simply create a job script my_hpsearch.sh for the hpsearch as follows:
#!/bin/bash
#SBATCH --job-name=hpsearch
#SBATCH --output=hpsearch_%j.out
#SBATCH --error=hpsearch_%j.err
#SBATCH --time=24:00:00
#SBATCH --mem=8G
python -m hypnettorch.hpsearch.hpsearch --grid_module=my_hpsearch_config --run_cluster --scheduler=slurm --slurm_mem=8G --slurm_gres=gpu:1 --num_jobs=25 --num_hours=4
The hpsearch can be executed via the command:
$ sbatch my_hpsearch.sh
Execute on a cluster with an unsupported job scheduler
Unfortunately, on a cluster with an unsupported job scheduler, the hpsearch can only be executed in the sequential foreground mode via --visible_gpus=-1. For instance, on a cluster running the Slurm job scheduler (note, Slurm is actually supported, see above) you could run the hpsearch in sequential foreground mode via a script my_hpsearch.sh:
#!/bin/bash
#SBATCH --job-name=hpsearch
#SBATCH --output=hpsearch_%j.out
#SBATCH --error=hpsearch_%j.err
#SBATCH --time=120:00:00
#SBATCH --mem=8G
#SBATCH --gres gpu:1
python -m hypnettorch.hpsearch.hpsearch --grid_module=my_hpsearch_config --visible_gpus=-1
Note, in this case you request the resources required by your jobs for the hpsearch itself. Now, you could execute the hpsearch via
$ sbatch my_hpsearch.sh
Postprocessing
The postprocessing script hypnettorch.hpsearch.hpsearch_postprocessing is currently very rudimentary. Its most important task is to make sure that the results of all completed runs are listed in a CSV file (note, the hpsearch might be killed prematurely while some jobs are still running). Please check out
$ python3 -m hypnettorch.hpsearch.hpsearch_postprocessing --help
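Once the CSV exists, it can be inspected with standard tools. Below is a minimal sketch; the file name search_results.csv, the ';' delimiter, and the column test_accuracy are assumptions that may differ in your setup:
import pandas as pd

# Load the hpsearch results table (file name and delimiter are assumptions).
df = pd.read_csv('./out/hyperparam_search/search_results.csv', sep=';')

# Sort by a hypothetical performance column to find the best configuration.
print(df.sort_values('test_accuracy', ascending=False).head())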
How to use this framework with your simulation
In order to utilize the scripts in this subpackage, you have to create a copy of the template hypnettorch.hpsearch.hpsearch_config_template and fill it with content as described inside the module; see probabilistic.prob_mnist.hpsearch_config_split_bbb for an example.
Additionally, you need to make sure that your simulation provides a command-line option like --out_dir (which specifies the output directory) and that it writes a performance summary file that can be used to evaluate runs.
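As a rough sketch, the simulation side might look like this. The option name --out_dir is what the hpsearch expects, whereas the summary file name and the keywords written below are placeholders; their actual names are declared in your copy of the configuration template:
# Hypothetical excerpt from a simulation script compatible with the hpsearch.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--out_dir', type=str, default='./out/run',
                    help='Output directory expected by the hpsearch.')
# ... simulation-specific options that appear in the search grid ...
args = parser.parse_args()

# ... run the actual simulation and compute a final score ...
test_accuracy = 0.97  # placeholder result

# Write a performance summary that the hpsearch can parse. The file name and
# keywords below are assumptions; they must match what your copy of
# hpsearch_config_template declares.
os.makedirs(args.out_dir, exist_ok=True)
with open(os.path.join(args.out_dir, 'performance_summary.txt'), 'w') as f:
    f.write('test_accuracy %f\n' % test_accuracy)
    f.write('finished 1\n')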
Gather random seeds for a given experiment
This script can be used to gather random seeds for a given configuration, i.e., to test the robustness of that particular configuration.
The configuration can either be provided directly, or via the path to a simulation output folder or hyperparameter search output folder. A simulation output folder is recognized by the file config.pickle, which contains the configuration, i.e., all command-line arguments (cf. function hypnettorch.sim_utils.setup_environment()). If a hyperparameter search output folder (cf. hypnettorch.hpsearch.hpsearch) is provided, the best run will be selected.
Example 1: Assume you are in the simulation directory and want to start the random seed gathering from there for a simulation in folder ./out/example_run. Note, we assume here that the base run in ./out/example_run finished successfully and can already be used as one random seed.
$ python -m hypnettorch.hpsearch.gather_random_seeds --grid_module=my_hpsearch_config --run_dir=./out/example_run --num_seeds=10 | tee /dev/tty | awk 'END{print}' | xargs bash -c 'echo --grid_module=$0 --grid_config=$1 --force_out_dir --dont_force_new_dir --out_dir=$2' | xargs python -m hypnettorch.hpsearch.hpsearch
Example 2: Alternatively, the hpsearch can be started directly via the option --start_gathering.
$ python -m hypnettorch.hpsearch.gather_random_seeds --grid_module=my_hpsearch_config --run_dir=./out/example_run --num_seeds=4 --start_gathering --config_name=example_run_seed_gathering
Example 3: An example instantiation of this script can be found in module probabilistic.regression.gather_seeds_bbb.
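Such an instantiation is typically a thin wrapper around the run() function documented below. Here is a hedged sketch; the module name and all keyword values are illustrative assumptions:
# gather_seeds_my_sim.py - hypothetical instantiation of the seed-gathering
# script, loosely modeled after the example module mentioned above.
from hypnettorch.hpsearch import gather_random_seeds

if __name__ == '__main__':
    # 'grid_module' must name your copy of the hpsearch config template;
    # the remaining keyword values below are illustrative assumptions.
    gather_random_seeds.run(grid_module='my_hpsearch_config',
                            results_dir='./out/random_seeds',
                            summary_keys=['test_accuracy'],
                            summary_sem=False)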
- hypnettorch.hpsearch.gather_random_seeds.build_grid_and_conditions(cmd_args, config, seeds_list)
Build the hpconfig for the random seed gathering.
- Parameters:
cmd_args – CLI arguments of this script.
config – The config to be translated into a search grid.
seeds_list (list) – The random seeds to be gathered.
- Returns:
(tuple): Tuple containing:
grid (dict): The search grid.
conditions (list): Constraints for the search grid.
- hypnettorch.hpsearch.gather_random_seeds.get_best_hpsearch_config(out_dir)
Load the config file from the best run of a hyperparameter search.
This function loads the results of the hyperparameter search and selects the configuration that led to the best performance score.
- hypnettorch.hpsearch.gather_random_seeds.get_hpsearch_call(cmd_args, num_seeds, grid_config, hpsearch_dir=None)
Generate the command line for the hpsearch.
- hypnettorch.hpsearch.gather_random_seeds.get_single_run_config(out_dir)
Load the config file from a specified experiment.
- Parameters:
out_dir (str) – The path to the experiment.
- Returns:
The Namespace object containing argument names and values.
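A brief usage sketch; the path and the accessed attribute are placeholders:
from hypnettorch.hpsearch.gather_random_seeds import get_single_run_config

# Load the config.pickle of a finished run as an argparse.Namespace.
config = get_single_run_config('./out/example_run')
print(config.random_seed)  # hypothetical attribute of the loaded Namespace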
- hypnettorch.hpsearch.gather_random_seeds.run(grid_module=None, results_dir='./out/random_seeds', config=None, ignore_kwds=None, forced_params=None, summary_keys=None, summary_sem=False, summary_precs=None, hpmod_path=None)
Run the script.
- Parameters:
grid_module (str, optional) – Name of the reference module which contains the hyperparameter search config that can be modified to gather random seeds.
results_dir (str, optional) – The path where the hpsearch should store its results.
config – The Namespace object containing argument names and values. If provided, all random seeds will be gathered from zero, with no reference run.
ignore_kwds (list, optional) – A list of keywords in the config file to exclude from the grid.
forced_params (dict, optional) – Dict of key-value pairs specifying hyperparameter values that should be fixed across runs.
summary_keys (list, optional) – If provided, the mean and std of those summary keys will be written by function write_seeds_summary(). Otherwise, the performance key defined in grid_module will be used.
summary_sem (bool) – Whether SEM or SD should be calculated in function write_seeds_summary().
summary_precs (list or int, optional) – The precision with which the summary statistics according to summary_keys should be listed.
hpmod_path (str, optional) – If the hpsearch doesn't reside in the same directory as the calling script, then we need to know from where to start the hpsearch.
- hypnettorch.hpsearch.gather_random_seeds.write_seeds_summary(results_dir, summary_keys, summary_sem, summary_precs, ret_seeds=False, summary_fn=None, seeds_summary_fn='seeds_summary_text.txt')
Write the MEAN and STD (resp. SEM), aggregated across all seeds, to a text file.
- Parameters:
results_dir (str) – The results directory.
summary_keys (list) – See argument summary_keys of function run().
summary_sem (bool) – See argument summary_sem of function run().
summary_precs (list or int, optional) – See argument summary_precs of function run().
summary_fn (str, optional) – If given, this will determine the name of the summary file within individual runs.
seeds_summary_fn (str, optional) – The name to give to the summary file across all seeds.
ret_seeds (bool, optional) – If activated, the random seeds of all considered runs are returned as a list.
Hyperparameter Search Configuration File
- conditions: Define exceptions for the grid search.
- grid: Parameter grid for grid search.
Note, this is just a template for a hyperparameter configuration and not an actual source file.
A configuration file for our custom hyperparameter search script hypnettorch.hpsearch.hpsearch. To set up a configuration file for your simulation, simply create a copy of this template and follow the instructions in this file to fill all defined attributes.
Once the configuration is set up for your simulation, you simply need to modify the fields grid and conditions to prepare a new grid search.
Note, if you are implementing this template for the first time, you also have to modify the code below the "DO NOT CHANGE THE CODE BELOW" heading. Normal users should not change the code below this heading.
- hypnettorch.hpsearch.hpsearch_config_template.conditions = [({'option1': [1]}, {'option2': [-1]})]
Define exceptions for the grid search.
Sometimes, not the whole grid should be searched. For instance, if an SGD optimizer has been chosen, then it doesn't make sense to search over multiple beta2 values of an Adam optimizer. Therefore, one can specify special conditions or exceptions. Note, all conditions specified here will be enforced; thus, they overwrite the grid options above.
How to specify a condition? A condition is a key-value tuple, where both the key and the value are dictionaries in the same format as the grid above. If any configuration matches the values specified in the "key" dict, the values specified in the "value" dict will be searched instead.
Note, if arguments are commented out above but appear in the conditions, the condition will be ignored.
Also keep in mind, that the hpsearch is not checking for conflicting conditions and they are enforced sequentially. For instance, assume condition 2 would change commands such that condition 1 would fire again. But condition 1 is never tested again, so these commands would make it into the final hpsearch (unless later conditions modify them again).
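To make the optimizer example above concrete, such a condition could look as follows (the option names and values here are hypothetical and must match entries of your grid):
conditions = [
    # For configurations using SGD, don't search over Adam's beta2 values.
    ({'optimizer': ['"sgd"']}, {'beta2': [0.999]}),
]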
- hypnettorch.hpsearch.hpsearch_config_template.grid = {'flag_option': [False, True], 'float_option': [0.5, 1.0], 'string_option': ['"example string"', '"another string"']}
Parameter grid for grid search.
Define a dictionary with parameter names as keys and a list of values for each parameter. For flag arguments, simply use the values [False, True]. Note, the output directory is set by the hyperparameter search script. Therefore, it always assumes that the argument --out_dir exists, and you should not add out_dir to this grid!
Example:
grid = {'option1': [10], 'option2': [0.1, 0.5], 'option3': [False, True]}
This dictionary would correspond to the following 4 configurations:
python3 SCRIPT_NAME.py --option1=10 --option2=0.1
python3 SCRIPT_NAME.py --option1=10 --option2=0.5
python3 SCRIPT_NAME.py --option1=10 --option2=0.1 --option3
python3 SCRIPT_NAME.py --option1=10 --option2=0.5 --option3
If fields are commented out (missing), the default value is used. Note, you can specify special conditions below.
Hyperparameter Search - Postprocessing
A postprocessing for a hyperparameter search that has been executed via the script hypnettorch.hpsearch.hpsearch.
Hyperparameter Search Script
A very simple hyperparameter search. The results will be gathered as a CSV file.
Here is an example of how to start a hyperparameter search on a cluster using bsub:
$ bsub -n 1 -W 48:00 -e hpsearch.err -o hpsearch.out \
-R "rusage[mem=8000]" \
python -m hypnettorch.hpsearch.hpsearch --run_cluster --num_jobs=20
For more demanding jobs (e.g., ImageNet), one may request more resources:
$ bsub -n 1 -W 96:00 -e hpsearch.err -o hpsearch.out \
-R "rusage[mem=16000]" \
python -m hypnettorch.hpsearch.hpsearch --run_cluster --num_jobs=20 \
--num_hours=48 --resources="\"rusage[mem=8000, ngpus_excl_p=1]\""
Please fill in the grid parameters in the corresponding config file (see command-line argument --grid_module).
- hypnettorch.hpsearch.hpsearch.hpsearch_cli_arguments(parser, show_num_searches=True, show_out_dir=True, dout_dir='./out/hyperparam_search', show_grid_module=True)
The CLI arguments of the hpsearch.