Parameters

This section introduces the parameters of HAWKS, with a brief description of what each one is and what it does. For some, the technical background is beyond the scope of this documentation and is omitted (for details, please refer to the associated paper).

The parameters can be passed into HAWKS via create_generator() as either a dict or (the path to) a JSON file.

Note

Certain future functionality may require the use of a dict, but this will be for limited use cases.
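As a minimal sketch of the two input forms, the snippet below builds a partial config as a dict and writes the same content to a JSON file. The parameter values and the file name hawks_config.json are illustrative only, and the hawks calls are guarded so that the snippet still runs where the package is not installed.

```python
import json
import tempfile
from pathlib import Path

# A partial config: anything not listed here falls back to the defaults.
config = {
    "dataset": {"num_clusters": 5},
    "ga": {"num_gens": 20},
}

# Write the same config to a JSON file (the file name is arbitrary).
config_path = Path(tempfile.mkdtemp()) / "hawks_config.json"
config_path.write_text(json.dumps(config, indent=4))

try:
    import hawks
except ImportError:
    hawks = None  # HAWKS is not installed; skip the calls below

if hawks is not None:
    # Option 1: pass the dict directly.
    gen_from_dict = hawks.create_generator(config)
    # Option 2: pass the path to the JSON file.
    gen_from_file = hawks.create_generator(str(config_path))
```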

Explanation

The full set of parameters can be seen in the Defaults. In the following, each of the config sub-dicts ("hawks", "dataset", "constraints", "ga", and "objectives") will be explained separately.

The Python type of each parameter is also given. For the relevant JSON type, see this conversion table.
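To illustrate the Python-to-JSON type mapping, the sketch below encodes a handful of parameters with the standard library json module. The selection of parameters is arbitrary and is flattened into one dict purely for illustration; a real config nests them in their sub-dicts.

```python
import json

params = {
    "folder_name": None,          # None  -> null
    "num_runs": 1,                # int   -> number
    "initial_cov_upper": 0.5,     # float -> number
    "save_stats": False,          # bool  -> false
    "mode": "single",             # str   -> string
    "num_examples": [500, 1000],  # list  -> array
}

encoded = json.dumps(params)
print(encoded)

# Decoding the JSON text gives back the same Python values.
decoded = json.loads(encoded)
```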

HAWKS Parameters

Parameters for the "hawks" sub-dict. Primarily used for BaseGenerator.

Name Type Description
"folder_name" str The name of the directory in which to save everything. This is empty by default to avoid saving in an interactive session.
"mode" str The mode for HAWKS. Currently, only "single" is available.
"n_objectives" int The number of objectives to be optimized, which can only be 1 at the moment.
"num_runs" int The number of runs of HAWKS with different seed numbers.
"seed_num" int The seed number that will be used to generate the seed numbers for each individual run (according to "num_runs"). If one is not provided, it is generated randomly and saved back into the original config for reproducibility. When "num_runs" > 1, the subsequent seeds are based on this initial one.
"comparison" str Method to extract the best individual from the final population. Using "fitness" selects the individual with the best fitness, whereas "ranking" selects the first individual from the population sorted by stochastic ranking.
"save_best_data" bool Save the dataset from the best (most fit) individual of each run, for each config.
"save_stats" bool Save the output values (fitness, penalties etc.) for every individual.
"save_config" bool Flag to save the (full) config associated with the run.

Objective Parameters

Parameters for the "objectives" sub-dict. Primarily used for Objective. The "objectives" JSON object is expected to be in the form below:

{
    "objectives": {
        "objective_name": {
            "arg": "value"
        },
        "objective_name_2": {
            "arg": "value",
            "arg2": "value"
        }
    }
}

where:

Name Type Description
"objective_name" str The name of the objective (when converted to lowercase, it must match the name of the desired objective in hawks.objectives).
"arg" str The name of an argument to the objective. The available arguments, and the values they accept, vary by objective. See either Defaults for an example, or the relevant hawks.objectives documentation.

Dataset Parameters

Parameters for the "dataset" sub-dict. Primarily used for Dataset.

Name Type Description
"num_examples" int The size of the dataset, i.e. the number of datapoints/examples. Warning: High values of this can slow down HAWKS, as the silhouette width does not scale well (despite some computational tricks used here).
"num_clusters" int The number of clusters to be generated.
"num_dims" int The dimensionality of the datasets.
"equal_clusters" bool Whether the clusters should be equally sized or not. If not, they are randomly sized such that they sum to "num_examples" (though this might be ± a few data points).
"min_clust_size" int The minimum number of datapoints that a cluster should have. This guarantees that each cluster is at least this size.

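The interaction between "num_examples", "num_clusters", "equal_clusters", and "min_clust_size" can be sketched as follows. This is an illustrative assumption of one way to draw cluster sizes, not the procedure HAWKS itself uses; the function name sample_cluster_sizes is hypothetical, and unlike HAWKS (which may be off by a few data points) this sketch always sums exactly to "num_examples".

```python
import random


def sample_cluster_sizes(num_examples, num_clusters, equal_clusters,
                         min_clust_size, seed=None):
    """Return a list of cluster sizes that sums to num_examples."""
    if num_clusters * min_clust_size > num_examples:
        raise ValueError("min_clust_size is too large for num_examples")
    if equal_clusters:
        # Split as evenly as possible; early clusters absorb the remainder.
        base, extra = divmod(num_examples, num_clusters)
        return [base + (1 if i < extra else 0) for i in range(num_clusters)]
    rng = random.Random(seed)
    # Start every cluster at the minimum size, then hand out the
    # remaining points one at a time to randomly chosen clusters.
    sizes = [min_clust_size] * num_clusters
    for _ in range(num_examples - num_clusters * min_clust_size):
        sizes[rng.randrange(num_clusters)] += 1
    return sizes


sizes = sample_cluster_sizes(1000, 10, equal_clusters=False,
                             min_clust_size=5, seed=0)
print(sizes, sum(sizes))
```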
GA Parameters

Parameters for the "ga" sub-dict. Primarily used for ga.

Name Type Description
"num_gens" int The number of generations to evolve over.
"num_indivs" int The number of individuals in the population.
"mut_method_mean" str The method used to mutate the mean. At present, only "random" is available.
"mut_args_mean" dict The arguments for the above, in the format required by the function for this mutation (with the name of the method as the key). See the Defaults for all possible arguments.
"mut_method_cov" str The method used to mutate the covariance. At present, only "haar" is available.
"mut_args_cov" dict The arguments for the above, in the format required by the function for this mutation (with the name of the method as the key). See the Defaults for all possible arguments.
"mut_prob_mean" float or str The probability of mutating the mean. Either a float between 0 and 1, or "length" to calculate the probability based on the length of the genotype (recommended).
"mut_prob_cov" float or str The probability of mutating the covariance. Either a float between 0 and 1, or "length" to calculate the probability based on the length of the genotype (recommended).
"mate_scheme" str The method for crossover. Accepts either "dv" (which can swap the mean and covariance separately between individuals), or "cluster" (which swaps whole clusters between individuals).
"mate_prob" float The probability of crossover.
"prob_fitness" float The probability that comparison will be performed based on fitness in the stochastic ranking. Requires "environ_selection" = "sr" (though that is the only option at present).
"elites" float The percentage of elites (the most fit individuals in the population) that will be preserved between generations. Not currently recommended, as it interferes with stochastic ranking.
"initial_mean_upper" float The initial upper range for initializing the means.
"initial_cov_upper" float The initial upper range for initializing the covariances.
"environ_selection" str The environmental selection operator. At present, only stochastic ranking ("sr") is available. See select_environ_func() for details.
"parent_selection" str The parental selection operator. See select_parent_func() for details.

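To give a feel for what "prob_fitness" controls, here is a minimal sketch of generic stochastic ranking (a bubble-sort-like procedure). It is not the HAWKS implementation: the function name stochastic_rank is hypothetical, and it assumes that lower fitness is better and that a penalty of 0 means no constraint is violated.

```python
import random


def stochastic_rank(fitnesses, penalties, prob_fitness=0.5, rng=None):
    """Return indices of individuals ordered from best to worst."""
    rng = rng or random.Random()
    order = list(range(len(fitnesses)))
    for _ in range(len(order)):
        swapped = False
        for j in range(len(order) - 1):
            a, b = order[j], order[j + 1]
            both_feasible = penalties[a] == 0 and penalties[b] == 0
            # Compare on fitness if both are feasible, or otherwise
            # with probability prob_fitness; else compare on penalty.
            if both_feasible or rng.random() < prob_fitness:
                out_of_order = fitnesses[a] > fitnesses[b]
            else:
                out_of_order = penalties[a] > penalties[b]
            if out_of_order:
                order[j], order[j + 1] = b, a
                swapped = True
        if not swapped:
            break  # no swaps in a full sweep: ranking has settled
    return order


ranking = stochastic_rank([0.3, 0.1, 0.2], [0.0, 0.5, 0.0],
                          prob_fitness=0.5, rng=random.Random(0))
print(ranking)
```

With "prob_fitness" at 1.0 the ranking reduces to a plain sort on fitness, and at 0.0 infeasible individuals are compared on their penalties alone.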
Constraint Parameters

Parameters for the "constraints" sub-dict. It is expected to be in the form below, with any number of "constraint_name" sub-dicts.

{
    "constraints": {
        "constraint_name": {
            "threshold": "value",
            "limit": "value"
        }
    }
}

where:

Name Type Description
"constraint_name" str The name of the constraint, which must match the function for it specified in hawks.constraints.
"threshold" float The value used to identify a constraint violation. The type can vary, but is generally a float.
"limit" str Whether the "threshold" is an "upper" or "lower" limit (only these two options are available). In the former case, values above the threshold will be penalized, and the inverse for a lower limit.

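The meaning of "threshold" and "limit" can be sketched with a small helper. This is an assumption for illustration: the function name constraint_violation is hypothetical, and using the distance past the threshold as the size of the penalty is one simple choice, not necessarily how HAWKS computes its penalties.

```python
def constraint_violation(value, threshold, limit):
    """Return how far value lies on the penalized side of threshold."""
    if limit == "upper":
        # Values above the threshold are penalized.
        return max(0.0, value - threshold)
    if limit == "lower":
        # Values below the threshold are penalized.
        return max(0.0, threshold - value)
    raise ValueError('limit must be "upper" or "lower"')


# With an upper limit of 20, a value of 25 violates the constraint,
# whereas a value of 15 does not.
print(constraint_violation(25, 20, "upper"))
print(constraint_violation(15, 20, "upper"))
```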
Defaults

The default values below are taken from defaults.json. They are used for any parameters that are not specified.

Default parameters
{
    "hawks": {
        "folder_name": null,
        "mode": "single",
        "n_objectives": 1,
        "num_runs": 1,
        "seed_num": null,
        "comparison": "fitness",
        "save_best_data": false,
        "save_stats": false,
        "save_config": false
    },
    "objectives": {
        "silhouette": {
            "target": 0.8
        }
    },
    "dataset": {
        "num_examples": 1000,
        "num_clusters": 10,
        "num_dims": 2,
        "equal_clusters": false,
        "min_clust_size": 5
    },
    "ga": {
        "num_gens": 50,
        "num_indivs": 10,
        "mut_method_mean": "random",
        "mut_args_mean": {
            "random": {
                "scale": 1.0,
                "dims": "each"
            }
        },
        "mut_method_cov": "haar",
        "mut_args_cov": {
            "haar": {
                "power": 0.3
            }
        },
        "mut_prob_mean": "length",
        "mut_prob_cov": "length",
        "mate_scheme": "dv",
        "mate_prob": 0.7,
        "prob_fitness": 0.5,
        "elites": 0,
        "initial_mean_upper": 1.0,
        "initial_cov_upper": 0.5,
        "environ_selection": "sr",
        "parent_selection": "tournament"
    },
    "constraints": {
        "overlap": {
            "threshold": 0.0,
            "limit": "upper"
        },
        "eigenval_ratio": {
            "threshold": 20,
            "limit": "upper"
        }
    }
}
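The fallback to defaults can be pictured as a recursive merge of the user's config over the default one. The sketch below is an assumption about the general idea rather than the HAWKS implementation (the function name merge_with_defaults is hypothetical), and it uses a cut-down defaults dict. In particular, whether HAWKS merges or replaces the "objectives" and "constraints" sub-dicts is not specified here.

```python
import copy


def merge_with_defaults(defaults, user):
    """Return a new config where user values override the defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so that unspecified sibling keys keep their defaults.
            merged[key] = merge_with_defaults(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# A small subset of the defaults, for illustration only.
defaults = {
    "dataset": {"num_examples": 1000, "num_clusters": 10},
    "ga": {"num_gens": 50},
}
user = {"dataset": {"num_clusters": 3}}

full_config = merge_with_defaults(defaults, user)
print(full_config)
```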

Multi-config

Multiple runs of HAWKS with varying parameters can be specified in a single config by wrapping the values of a parameter in a list. For example, "num_examples": [500, 1000, 1500] will run HAWKS three times, once for each number of examples. List-valued parameters are combined combinatorially, so a warning is raised when more than 1,000 runs are expected.

This makes experimenting easier, which is covered in Running Experiments. An example of this is given here.
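The combinatorial expansion can be sketched with itertools.product. This is an illustrative assumption, not the HAWKS code: the function names are hypothetical, every list-valued leaf is treated as a set of alternatives, and any interaction with "num_runs" is ignored.

```python
import itertools
import math
import warnings


def flatten(config, prefix=()):
    """Yield (key path, value) pairs for every leaf of a nested dict."""
    for key, value in config.items():
        if isinstance(value, dict):
            yield from flatten(value, prefix + (key,))
        else:
            yield prefix + (key,), value


def expand_config(config, max_runs=1000):
    """Return one single-valued config per combination of list values."""
    paths, options = [], []
    for path, value in flatten(config):
        paths.append(path)
        options.append(value if isinstance(value, list) else [value])
    total = math.prod(len(opts) for opts in options)
    if total > max_runs:
        warnings.warn(f"{total} runs expected")
    configs = []
    for combo in itertools.product(*options):
        single = {}
        for path, value in zip(paths, combo):
            node = single
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = value
        configs.append(single)
    return configs


multi = {
    "dataset": {"num_examples": [500, 1000, 1500], "num_clusters": [5, 10]},
    "ga": {"num_gens": 50},
}
configs = expand_config(multi)
print(len(configs))  # 3 * 2 = 6 combinations
```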