Hyperparameter Searches
A general framework to perform hyperparameter searches on single- and multi-GPU systems
Note, we currently only support simple grid searches.
How to run a hyperparameter search
The main script in this package is hypnettorch.hpsearch.hpsearch.
$ python -m hypnettorch.hpsearch.hpsearch --help
Before being able to run a hyperparameter search, however, the search grid has to be configured. To this end, your simulation needs its own implementation of the configuration file hypnettorch.hpsearch.hpsearch_config_template. Please refer to the corresponding documentation for information on how to configure a hyperparameter search.
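For orientation, here is a minimal sketch of what such a configuration module could contain. This is not the actual template: the module name my_hpsearch_config and all option names below are placeholders, and the template defines further attributes that have to be filled in as well.
# my_hpsearch_config.py - hypothetical minimal search configuration.
# Copy hypnettorch/hpsearch/hpsearch_config_template.py and fill in all
# attributes described there; 'grid' and 'conditions' are the two fields
# edited for every new search.
grid = {
    'lr': [0.001, 0.0001],        # assumed command-line option --lr
    'use_adam': [False, True],    # flag option: included or omitted
    'adam_beta2': [0.99, 0.999],  # assumed command-line option --adam_beta2
}
conditions = [
    # When Adam is disabled, beta2 is irrelevant, so pin it to one value.
    ({'use_adam': [False]}, {'adam_beta2': [0.999]}),
]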
Execute on a single- or multi-GPU system without job scheduling
The simplest mode of execution is to run all hyperparameter configurations sequentially in the foreground. For instance, on a computer without GPUs, you can start the hpsearch on the CPU as follows:
$ python -m hypnettorch.hpsearch.hpsearch --visible_gpus=-1
Assuming that your simulation automatically runs on a visible GPU, you can also apply this sequential foreground execution to a GPU of your choice (e.g., GPU 2):
$ CUDA_VISIBLE_DEVICES=2 python -m hypnettorch.hpsearch.hpsearch --visible_gpus=-1
Alternatively, the hpsearch can assign GPU resources to jobs itself. In this case, multiple hyperparameter configurations may run in parallel (across multiple GPUs as well as multiple runs per GPU). For this operation mode, you are required to install the package GPUtil.
Please carefully study the arguments of the hpsearch.
$ python -m hypnettorch.hpsearch.hpsearch --help
Assume you want to run your search on GPUs 0,1,2,7 with a hard limit of 5 jobs assigned per GPU by the hpsearch (a limit you choose based on available CPU and RAM resources). Note, option --max_num_jobs_per_gpu currently does not account for processes running on a GPU that were not assigned by this hpsearch. In addition, a run may only be assigned to a GPU if at most 75% of its memory is in use and its compute utilization is at most 60%. Since runs take some time to start up and allocate GPU resources, you additionally specify the argument --sim_startup_time. Every time a job is assigned to a GPU, this time has to pass before a new job may be assigned (so that the first job has time to acquire GPU memory and compute resources):
$ python -m hypnettorch.hpsearch.hpsearch --visible_gpus=0,1,2,7 --max_num_jobs_per_gpu=5 --allowed_memory=0.75 --allowed_load=0.6 --sim_startup_time=30
Execute on a cluster with IBM Platform LSF
You may also run the hpsearch on a cluster that uses the IBM Platform LSF job scheduler. In this case, you have to install the package bsub. To tell the hpsearch that it should schedule jobs via bsub, simply append the options --run_cluster --scheduler=lsf. Here is an example call:
$ bsub -n 1 -W 120:00 -e hpsearch_mysim.err -o hpsearch_mysim.out -R "rusage[mem=8000]" python -m hypnettorch.hpsearch.hpsearch --grid_module=my_hpsearch_config --run_cluster --scheduler=lsf --num_jobs=50 --num_hours=24 --num_searches=1000 --resources="\"rusage[mem=8000, ngpus_excl_p=1]\""
In the example above, the hpsearch itself runs for up to 120 hours on the cluster, requiring 8GB of RAM during that time. Individual jobs will run for up to 24 hours each. The hpsearch will explore at most 1000 hyperparameter configurations, with at most 50 jobs scheduled in parallel (new jobs are scheduled as soon as old ones finish, until the hard limit of 1000 runs is reached). Each job will require 1 GPU and 8GB of RAM.
Execute on a cluster with Slurm Workload Manager
The hpsearch can also be run on a cluster with the Slurm job scheduler via the arguments --run_cluster --scheduler=slurm. To do so, simply create a job script my_hpsearch.sh for the hpsearch as follows:
#!/bin/bash
#SBATCH --job-name=hpsearch
#SBATCH --output=hpsearch_%j.out
#SBATCH --error=hpsearch_%j.err
#SBATCH --time=24:00:00
#SBATCH --mem=8G
python -m hypnettorch.hpsearch.hpsearch --grid_module=my_hpsearch_config --run_cluster --scheduler=slurm --slurm_mem=8G --slurm_gres=gpu:1 --num_jobs=25 --num_hours=4
The hpsearch can be executed via the command:
$ sbatch my_hpsearch.sh
Execute on a cluster with an unsupported job scheduler
Unfortunately, on a cluster with an unsupported job scheduler, the hpsearch can only be executed in the sequential foreground mode via --visible_gpus=-1. For instance, on a cluster running the Slurm job scheduler (note, Slurm is actually supported, see above) you could run the hpsearch in sequential foreground mode via a script my_hpsearch.sh:
#!/bin/bash
#SBATCH --job-name=hpsearch
#SBATCH --output=hpsearch_%j.out
#SBATCH --error=hpsearch_%j.err
#SBATCH --time=120:00:00
#SBATCH --mem=8G
#SBATCH --gres gpu:1
python -m hypnettorch.hpsearch.hpsearch --grid_module=my_hpsearch_config --visible_gpus=-1
Note, in this case you request the resources required by your jobs for the hpsearch itself. Now, you could execute the hpsearch via
$ sbatch my_hpsearch.sh
Postprocessing
The postprocessing script hypnettorch.hpsearch.hpsearch_postprocessing is currently very rudimentary. Its most important task is to make sure that the results of all completed runs are listed in a CSV file (note, the hpsearch might be killed prematurely while some jobs are still running). Please check out
$ python3 -m hypnettorch.hpsearch.hpsearch_postprocessing --help
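Once the CSV exists, it can be inspected with standard tools. Below is a minimal sketch; the file name search_results.csv, the ';' delimiter, and the column test_accuracy are assumptions that may differ in your setup:
import pandas as pd

# Load the hpsearch results table (file name and delimiter are assumptions).
df = pd.read_csv('./out/hyperparam_search/search_results.csv', sep=';')

# Sort by a hypothetical performance column to find the best configuration.
print(df.sort_values('test_accuracy', ascending=False).head())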
How to use this framework with your simulation
In order to utilize the scripts in this subpackage, you have to create a copy of the template hypnettorch.hpsearch.hpsearch_config_template and fill it with content as described inside the module; see probabilistic.prob_mnist.hpsearch_config_split_bbb for an example.
Additionally, you need to make sure that your simulation provides a command-line option like --out_dir (which specifies the output directory) and that it writes a performance summary file that can be used to evaluate runs.
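As a rough sketch, the simulation side might look like this. The option name --out_dir is what the hpsearch expects, whereas the summary file name and the keywords written below are placeholders; their actual names are declared in your copy of the configuration template:
# Hypothetical excerpt from a simulation script compatible with the hpsearch.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--out_dir', type=str, default='./out/run',
                    help='Output directory expected by the hpsearch.')
# ... simulation-specific options that appear in the search grid ...
args = parser.parse_args()

# ... run the actual simulation and compute a final score ...
test_accuracy = 0.97  # placeholder result

# Write a performance summary that the hpsearch can parse. The file name and
# keywords below are assumptions; they must match what your copy of
# hpsearch_config_template declares.
os.makedirs(args.out_dir, exist_ok=True)
with open(os.path.join(args.out_dir, 'performance_summary.txt'), 'w') as f:
    f.write('test_accuracy %f\n' % test_accuracy)
    f.write('finished 1\n')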
Gather random seeds for a given experiment
This script can be used to gather random seeds for a given configuration, i.e., to test the robustness of that particular configuration.
The configuration can either be provided directly, or via the path to a simulation output folder or hyperparameter search output folder. A simulation output folder is recognized by the file config.pickle, which contains the configuration, i.e., all command-line arguments (cf. function hypnettorch.sim_utils.setup_environment()). If a hyperparameter search output folder (cf. hypnettorch.hpsearch.hpsearch) is provided, the best run will be selected.
Example 1: Assume you are in the simulation directory and want to start the random seed gathering from there for a simulation in folder ./out/example_run. Note, we assume here that the base run in ./out/example_run finished successfully and can already be used as one random seed.
$ python -m hypnettorch.hpsearch.gather_random_seeds --grid_module=my_hpsearch_config --run_dir=./out/example_run --num_seeds=10 | tee /dev/tty | awk 'END{print}' | xargs bash -c 'echo --grid_module=$0 --grid_config=$1 --force_out_dir --dont_force_new_dir --out_dir=$2' | xargs python -m hypnettorch.hpsearch.hpsearch
Example 2: Alternatively, the hpsearch can be started directly via the option --start_gathering.
$ python -m hypnettorch.hpsearch.gather_random_seeds --grid_module=my_hpsearch_config --run_dir=./out/example_run --num_seeds=4 --start_gathering --config_name=example_run_seed_gathering
Example 3: An example instantiation of this script can be found in module probabilistic.regression.gather_seeds_bbb.
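Such an instantiation is typically a thin wrapper around the run() function documented below. Here is a hedged sketch; the module name and all keyword values are illustrative assumptions:
# gather_seeds_my_sim.py - hypothetical instantiation of the seed-gathering
# script, loosely modeled after the example module mentioned above.
from hypnettorch.hpsearch import gather_random_seeds

if __name__ == '__main__':
    # 'grid_module' must name your copy of the hpsearch config template;
    # the remaining keyword values below are illustrative assumptions.
    gather_random_seeds.run(grid_module='my_hpsearch_config',
                            results_dir='./out/random_seeds',
                            summary_keys=['test_accuracy'],
                            summary_sem=False)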
- hypnettorch.hpsearch.gather_random_seeds.build_grid_and_conditions(cmd_args, config, seeds_list)
Build the hpconfig for the random seed gathering.
- Parameters:
cmd_args – CLI arguments of this script.
config – The config to be translated into a search grid.
seeds_list (list) – The random seeds to be gathered.
- Returns:
(tuple): Tuple containing:
grid (dict): The search grid.
conditions (list): Constraints for the search grid.
- hypnettorch.hpsearch.gather_random_seeds.get_best_hpsearch_config(out_dir)
Load the config file from the best run of a hyperparameter search.
This function loads the results of the hyperparameter search and selects the configuration that led to the best performance score.
- hypnettorch.hpsearch.gather_random_seeds.get_hpsearch_call(cmd_args, num_seeds, grid_config, hpsearch_dir=None)
Generate the command line for the hpsearch.
- hypnettorch.hpsearch.gather_random_seeds.get_single_run_config(out_dir)
Load the config file from a specified experiment.
- Parameters:
out_dir (str) – The path to the experiment.
- Returns:
The Namespace object containing argument names and values.
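A brief usage sketch; the path and the accessed attribute are placeholders:
from hypnettorch.hpsearch.gather_random_seeds import get_single_run_config

# Load the config.pickle of a finished run as an argparse.Namespace.
config = get_single_run_config('./out/example_run')
print(config.random_seed)  # hypothetical attribute of the loaded Namespace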
- hypnettorch.hpsearch.gather_random_seeds.run(grid_module=None, results_dir='./out/random_seeds', config=None, ignore_kwds=None, forced_params=None, summary_keys=None, summary_sem=False, summary_precs=None, hpmod_path=None)
Run the script.
- Parameters:
grid_module (str, optional) – Name of the reference module which contains the hyperparameter search config that can be modified to gather random seeds.
results_dir (str, optional) – The path where the hpsearch should store its results.
config – The Namespace object containing argument names and values. If provided, all random seeds will be gathered from zero, with no reference run.
ignore_kwds (list, optional) – A list of keywords in the config file to exclude from the grid.
forced_params (dict, optional) – Dict of key-value pairs specifying hyperparameter values that should be fixed across runs.
summary_keys (list, optional) – If provided, the mean and std of those summary keys will be written by function write_seeds_summary(). Otherwise, the performance key defined in grid_module will be used.
summary_sem (bool) – Whether SEM or SD should be calculated in function write_seeds_summary().
summary_precs (list or int, optional) – The precision with which the summary statistics according to summary_keys should be listed.
hpmod_path (str, optional) – If the hpsearch doesn't reside in the same directory as the calling script, then we need to know from where to start the hpsearch.
- hypnettorch.hpsearch.gather_random_seeds.write_seeds_summary(results_dir, summary_keys, summary_sem, summary_precs, ret_seeds=False, summary_fn=None, seeds_summary_fn='seeds_summary_text.txt')
Write the MEAN and STD (resp. SEM), aggregated across all seeds, to a text file.
- Parameters:
results_dir (str) – The results directory.
summary_keys (list) – See argument summary_keys of function run().
summary_sem (bool) – See argument summary_sem of function run().
summary_precs (list or int, optional) – See argument summary_precs of function run().
summary_fn (str, optional) – If given, this will determine the name of the summary file within individual runs.
seeds_summary_fn (str, optional) – The name to give to the summary file across all seeds.
ret_seeds (bool, optional) – If activated, the random seeds of all considered runs are returned as a list.
Hyperparameter Search Configuration File
- conditions: Define exceptions for the grid search.
- grid: Parameter grid for grid search.
Note, this is just a template for a hyperparameter configuration and not an actual source file.
A configuration file for our custom hyperparameter search script hypnettorch.hpsearch.hpsearch. To set up a configuration file for your simulation, simply create a copy of this template and follow the instructions in this file to fill all defined attributes.
Once the configuration is set up for your simulation, you simply need to modify the fields grid and conditions to prepare a new grid search.
Note, if you are implementing this template for the first time, you also have to modify the code below the "DO NOT CHANGE THE CODE BELOW" heading. Normal users should not change the code below this heading.
- hypnettorch.hpsearch.hpsearch_config_template.conditions = [({'option1': [1]}, {'option2': [-1]})]
Define exceptions for the grid search.
Sometimes, not the whole grid should be searched. For instance, if an SGD optimizer has been chosen, then it doesn't make sense to search over multiple beta2 values of an Adam optimizer. Therefore, one can specify special conditions or exceptions. Note, all conditions specified here will be enforced; thus, they overwrite the grid options above.
How to specify a condition? A condition is a key-value tuple, where both the key and the value are dictionaries in the same format as the grid above. If any configuration matches the values specified in the "key" dict, the values specified in the "value" dict will be searched instead.
Note, if arguments are commented out above but appear in the conditions, the condition will be ignored.
Also keep in mind, that the hpsearch is not checking for conflicting conditions and they are enforced sequentially. For instance, assume condition 2 would change commands such that condition 1 would fire again. But condition 1 is never tested again, so these commands would make it into the final hpsearch (unless later conditions modify them again).
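To make the optimizer example above concrete, such a condition could look as follows (the option names and values here are hypothetical and must match entries of your grid):
conditions = [
    # For configurations using SGD, don't search over Adam's beta2 values.
    ({'optimizer': ['"sgd"']}, {'beta2': [0.999]}),
]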
- hypnettorch.hpsearch.hpsearch_config_template.grid = {'flag_option': [False, True], 'float_option': [0.5, 1.0], 'string_option': ['"example string"', '"another string"']}
Parameter grid for grid search.
Define a dictionary with parameter names as keys and a list of values for each parameter. For flag arguments, simply use the values [False, True]. Note, the output directory is set by the hyperparameter search script. Therefore, it always assumes that the argument --out_dir exists, and you should not add out_dir to this grid!
Example:
grid = {'option1': [10], 'option2': [0.1, 0.5], 'option3': [False, True]}
This dictionary would correspond to the following 4 configurations:
python3 SCRIPT_NAME.py --option1=10 --option2=0.1
python3 SCRIPT_NAME.py --option1=10 --option2=0.5
python3 SCRIPT_NAME.py --option1=10 --option2=0.1 --option3
python3 SCRIPT_NAME.py --option1=10 --option2=0.5 --option3
If fields are commented out (missing), the default value is used. Note, you can specify special conditions below.
Hyperparameter Search - Postprocessing
A postprocessing for a hyperparameter search that has been executed via the script hypnettorch.hpsearch.hpsearch.
Hyperparameter Search Script
A very simple hyperparameter search. The results will be gathered as a CSV file.
Here is an example of how to start a hyperparameter search on a cluster using bsub:
$ bsub -n 1 -W 48:00 -e hpsearch.err -o hpsearch.out \
-R "rusage[mem=8000]" \
python -m hypnettorch.hpsearch.hpsearch --run_cluster --num_jobs=20
For more demanding jobs (e.g., ImageNet), one may request more resources:
$ bsub -n 1 -W 96:00 -e hpsearch.err -o hpsearch.out \
-R "rusage[mem=16000]" \
python -m hypnettorch.hpsearch.hpsearch --run_cluster --num_jobs=20 \
--num_hours=48 --resources="\"rusage[mem=8000, ngpus_excl_p=1]\""
Please fill in the grid parameters in the corresponding config file (see command-line argument --grid_module).
- hypnettorch.hpsearch.hpsearch.hpsearch_cli_arguments(parser, show_num_searches=True, show_out_dir=True, dout_dir='./out/hyperparam_search', show_grid_module=True)
The CLI arguments of the hpsearch.