Hyperparameter Searches

A general framework to perform hyperparameter searches on single- and multi-GPU systems

Note that we currently only support simple grid searches.

Postprocessing

The postprocessing script hypnettorch.hpsearch.hpsearch_postprocessing is currently very rudimentary. Its most important task is to ensure that the results of all completed runs are listed in a CSV file (note that the hpsearch might be killed prematurely while some jobs are still running).

Please check out

$ python3 -m hypnettorch.hpsearch.hpsearch_postprocessing --help

How to use this framework with your simulation

In order to utilize the scripts in this subpackage, you have to create a copy of the template hypnettorch.hpsearch.hpsearch_config_template and fill the template with content as described inside the module. For instance, see probabilistic.prob_mnist.hpsearch_config_split_bbb as an example.

Additionally, you need to make sure that your simulation has a command-line option like --out_dir (that specifies the output directory) and that your simulation writes a performance summary file, that can be used to evaluate simulations.
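To illustrate these two requirements, a simulation compatible with the hpsearch might look like the following sketch. The summary file name performance_summary.json and the keys finished and acc are made up for illustration; the actual file name and keys are determined by your copy of the hpsearch config template.

```python
import argparse
import json
import os


def run_simulation(argv=None):
    """A minimal simulation skeleton usable with the hpsearch."""
    parser = argparse.ArgumentParser()
    # The hpsearch sets the output directory of every run via this option.
    parser.add_argument('--out_dir', type=str, default='./out/run')
    parser.add_argument('--lr', type=float, default=0.001)
    args = parser.parse_args(argv)

    os.makedirs(args.out_dir, exist_ok=True)

    # ... training would happen here ...
    accuracy = 0.0  # placeholder for the actual performance measure

    # Write a performance summary that can later be used to evaluate the run.
    summary = {'finished': 1, 'acc': accuracy}
    with open(os.path.join(args.out_dir,
                           'performance_summary.json'), 'w') as f:
        json.dump(summary, f)
    return summary
```

The hpsearch only needs to be able to set the output directory and to find the summary file inside it; everything else about the simulation is up to you.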

Gather random seeds for a given experiment

This script can be used to gather random seeds for a given configuration. It is thus intended to test the robustness of that configuration.

The configuration can either be provided directly, or the path to a simulation output folder or hyperparameter search output folder is provided. A simulation output folder is recognized by the file config.pickle which contains the configuration, i.e., all command-line arguments (cf. function hypnettorch.sim_utils.setup_environment()). If a hyperparameter search output folder (cf. hypnettorch.hpsearch.hpsearch) is provided, the best run will be selected.
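Since config.pickle is simply a pickled Namespace of command-line arguments, it can also be inspected manually; a minimal sketch, assuming the file layout described above:

```python
import os
import pickle


def load_run_config(out_dir):
    """Load the pickled command-line configuration of a finished run."""
    with open(os.path.join(out_dir, 'config.pickle'), 'rb') as f:
        config = pickle.load(f)
    return config  # an argparse.Namespace
```

For example, `load_run_config('./out/example_run')` would return the Namespace whose attributes are the command-line arguments of that run.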

Example 1: Assume you are in the simulation directory and want to start the random seed gathering from there for a simulation in folder ./out/example_run. Note that we assume here that the base run in ./out/example_run finished successfully and can already be used as one random seed.

$ python -m hypnettorch.hpsearch.gather_random_seeds --grid_module=my_hpsearch_config --run_dir=./out/example_run --num_seeds=10 | tee /dev/tty | awk 'END{print}' | xargs bash -c 'echo --grid_module=$0 --grid_config=$1 --force_out_dir --dont_force_new_dir --out_dir=$2' | xargs python -m hypnettorch.hpsearch.hpsearch

Example 2: Alternatively, the hpsearch can be started directly via the option --start_gathering.

$ python -m hypnettorch.hpsearch.gather_random_seeds --grid_module=my_hpsearch_config --run_dir=./out/example_run --num_seeds=4 --start_gathering --config_name=example_run_seed_gathering

Example 3: An example instantiation of this script can be found in module probabilistic.regression.gather_seeds_bbb.

hypnettorch.hpsearch.gather_random_seeds.build_grid_and_conditions(cmd_args, config, seeds_list)[source]

Build the hpconfig for the random seed gathering.

Parameters:
  • cmd_args – CLI arguments of this script.

  • config – The config to be translated into a search grid.

  • seeds_list (list) – The random seeds to be gathered.

Returns:

Tuple containing:

  • grid (dict): The search grid.

  • conditions (list): Constraints for the search grid.

Return type:

(tuple)

hypnettorch.hpsearch.gather_random_seeds.get_best_hpsearch_config(out_dir)[source]

Load the config file from the best run of a hyperparameter search.

This function loads the results of the hyperparameter search and selects the configuration that led to the best performance score.

Parameters:

out_dir (str) – The path to the hpsearch result folder.

Returns:

Tuple containing:

  • config: The config of the best run.

  • best_out_dir: The path to the best run.

Return type:

(tuple)

hypnettorch.hpsearch.gather_random_seeds.get_hpsearch_call(cmd_args, num_seeds, grid_config, hpsearch_dir=None)[source]

Generate the command line for the hpsearch.

Parameters:
  • cmd_args – The command line arguments.

  • num_seeds (int) – Number of searches.

  • grid_config (str) – Location of search grid.

  • hpsearch_dir (str, optional) – Where the hpsearch should write its results to.

Returns:

The command line to be executed.

Return type:

(str)

hypnettorch.hpsearch.gather_random_seeds.get_single_run_config(out_dir)[source]

Load the config file from a specified experiment.

Parameters:

out_dir (str) – The path to the experiment.

Returns:

The Namespace object containing argument names and values.

hypnettorch.hpsearch.gather_random_seeds.run(grid_module=None, results_dir='./out/random_seeds', config=None, ignore_kwds=None, forced_params=None, summary_keys=None, summary_sem=False, summary_precs=None, hpmod_path=None)[source]

Run the script.

Parameters:
  • grid_module (str, optional) – Name of the reference module which contains the hyperparameter search config that can be modified to gather random seeds.

  • results_dir (str, optional) – The path where the hpsearch should store its results.

  • config – The Namespace object containing argument names and values. If provided, all random seeds will be gathered from zero, with no reference run.

  • ignore_kwds (list, optional) – A list of keywords in the config file to exclude from the grid.

  • forced_params (dict, optional) – Dict of key-value pairs specifying hyperparameter values that should be fixed across runs.

  • summary_keys (list, optional) – If provided, the mean and std of these summary keys will be written by function write_seeds_summary(). Otherwise, the performance key defined in grid_module will be used.

  • summary_sem (bool) – Whether SEM or SD should be calculated in function write_seeds_summary().

  • summary_precs (list or int, optional) – The precision with which the summary statistics according to summary_keys should be listed.

  • hpmod_path (str, optional) – If the hpsearch doesn’t reside in the same directory as the calling script, then we need to know from where to start the hpsearch.

hypnettorch.hpsearch.gather_random_seeds.write_seeds_summary(results_dir, summary_keys, summary_sem, summary_precs, ret_seeds=False, summary_fn=None, seeds_summary_fn='seeds_summary_text.txt')[source]

Write the MEAN and STD (resp. SEM), aggregated across all seeds, to a text file.

Parameters:
  • results_dir (str) – The results directory.

  • summary_keys (list) – See argument summary_keys of function run().

  • summary_sem (bool) – See argument summary_sem of function run().

  • summary_precs (list or int, optional) – See argument summary_precs of function run().

  • summary_fn (str, optional) – If given, this will determine the name of the summary file within individual runs.

  • seeds_summary_fn (str, optional) – The name to give to the summary file across all seeds.

  • ret_seeds (bool, optional) – If activated, the random seeds of all considered runs are returned as a list.
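As a reminder of the difference between the two statistics: the SD measures the spread of the per-seed scores themselves, while the SEM (SD divided by the square root of the number of seeds) measures the uncertainty of their mean. A self-contained sketch of the aggregation (the function name is hypothetical, not part of the library):

```python
import math
import statistics


def aggregate_seeds(scores, use_sem=False):
    """Aggregate a performance score across random seeds.

    Returns the mean together with either the sample standard deviation
    (use_sem=False) or the standard error of the mean (use_sem=True).
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (ddof=1)
    if use_sem:
        return mean, sd / math.sqrt(len(scores))
    return mean, sd
```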

Hyperparameter Search Configuration File

hypnettorch.hpsearch.hpsearch_config_template.conditions

Define exceptions for the grid search.

hypnettorch.hpsearch.hpsearch_config_template.grid

Parameter grid for grid search.

Note, this is just a template for a hyperparameter configuration and not an actual source file.

A configuration file for our custom hyperparameter search script hypnettorch.hpsearch.hpsearch. To set up a configuration file for your simulation, simply create a copy of this template and follow the instructions in this file to fill all defined attributes.

Once the configuration is set up for your simulation, you simply need to modify the fields grid and conditions to prepare a new grid search.

Note, if you are implementing this template for the first time, you also have to modify the code below the “DO NOT CHANGE THE CODE BELOW” heading. Normal users should not need to change the code below this heading.

hypnettorch.hpsearch.hpsearch_config_template.conditions = [({'option1': [1]}, {'option2': [-1]})]

Define exceptions for the grid search.

Sometimes, not the whole grid should be searched. For instance, if an SGD optimizer has been chosen, then it doesn’t make sense to search over multiple beta2 values of an Adam optimizer. Therefore, one can specify special conditions or exceptions. Note, all conditions specified here will be enforced; thus, they overwrite the grid options above.

How to specify a condition? A condition is a key-value tuple, where both the key and the value are dictionaries in the same format as the grid above. If a configuration matches the values specified in the “key” dict, the values specified in the “value” dict will be searched instead.

Note, if arguments are commented out above but appear in the conditions, the condition will be ignored.

Also keep in mind that the hpsearch does not check for conflicting conditions; they are enforced sequentially. For instance, assume condition 2 changes commands such that condition 1 would fire again. Since condition 1 is never tested again, these commands would make it into the final hpsearch (unless later conditions modify them again).
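To make the SGD/Adam example above concrete, and assuming hypothetical command-line options --optimizer and --adam_beta2 (the option names are made up for illustration), a condition pinning beta2 whenever SGD is selected could look like this:

```python
# If a configuration uses the SGD optimizer, search only a single (dummy)
# beta2 value instead of the full list given in the grid.
# The option names 'optimizer' and 'adam_beta2' are hypothetical.
conditions = [
    ({'optimizer': ['sgd']}, {'adam_beta2': [0.999]}),
]
```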

hypnettorch.hpsearch.hpsearch_config_template.grid = {'flag_option': [False, True], 'float_option': [0.5, 1.0], 'string_option': ['"example string"', '"another string"']}

Parameter grid for grid search.

Define a dictionary with parameter names as keys and a list of values for each parameter. For flag arguments, simply use the values [False, True]. Note, the output directory is set by the hyperparameter search script. Therefore, it always assumes that the argument --out_dir exists and you should not add out_dir to this grid!

Example

grid = {'option1': [10], 'option2': [0.1, 0.5],
        'option3': [False, True]}

This dictionary would correspond to the following 4 configurations:

python3 SCRIPT_NAME.py --option1=10 --option2=0.1
python3 SCRIPT_NAME.py --option1=10 --option2=0.5
python3 SCRIPT_NAME.py --option1=10 --option2=0.1 --option3
python3 SCRIPT_NAME.py --option1=10 --option2=0.5 --option3

If fields are commented out (missing), their default values are used. Note that you can specify special conditions below.
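The Cartesian-product expansion illustrated above can be sketched as follows. This is only a simplification of what the hpsearch does internally, assuming the convention shown in the example that boolean values toggle a flag rather than pass a value:

```python
import itertools


def grid_to_commands(grid, script='SCRIPT_NAME.py'):
    """Expand a parameter grid into one command line per configuration."""
    keys = list(grid)
    commands = []
    # Iterate over the Cartesian product of all value lists.
    for values in itertools.product(*(grid[k] for k in keys)):
        args = []
        for k, v in zip(keys, values):
            if isinstance(v, bool):
                if v:  # flags are only passed when True
                    args.append('--%s' % k)
            else:
                args.append('--%s=%s' % (k, v))
        commands.append('python3 %s %s' % (script, ' '.join(args)))
    return commands
```

Applied to the example grid above, this yields exactly the four command lines listed there.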

Hyperparameter Search - Postprocessing

A postprocessing script for a hyperparameter search that has been executed via the script hypnettorch.hpsearch.hpsearch.

Hyperparameter Search Script

A very simple hyperparameter search. The results will be gathered as a CSV file.

Here is an example of how to start a hyperparameter search on a cluster using bsub:

$ bsub -n 1 -W 48:00 -e hpsearch.err -o hpsearch.out \
  -R "rusage[mem=8000]" \
  python -m hypnettorch.hpsearch.hpsearch --run_cluster --num_jobs=20

For more demanding jobs (e.g., ImageNet), one may request more resources:

$ bsub -n 1 -W 96:00 -e hpsearch.err -o hpsearch.out \
  -R "rusage[mem=16000]" \
  python -m hypnettorch.hpsearch.hpsearch --run_cluster --num_jobs=20 \
  --num_hours=48 --resources="\"rusage[mem=8000, ngpus_excl_p=1]\""

Please fill in the grid parameters in the corresponding config file (see command-line argument grid_module).

hypnettorch.hpsearch.hpsearch.hpsearch_cli_arguments(parser, show_num_searches=True, show_out_dir=True, dout_dir='./out/hyperparam_search', show_grid_module=True)[source]

The CLI arguments of the hpsearch.

hypnettorch.hpsearch.hpsearch.run(argv=None, dout_dir='./out/hyperparam_search')[source]

Run the hyperparameter search script.

Parameters:
  • argv (list, optional) – If provided, it will be treated as a list of command-line arguments that is passed to the parser in place of sys.argv.

  • dout_dir (str, optional) – The default value of command-line option --out_dir.

Returns:

The path to the CSV file containing the results of this search.

Return type:

(str)