Hashing Mechanics
As discussed in Parameter set hashes and operational parameters, curifactory computes a hash of every parameter set run through an experiment, and uses that hash to uniquely identify/version the outputs associated with it. This page discusses in more detail how the hash is computed and how you can modify it.
The overall process involves iterating through every field of the parameter class, getting a string representation for the value of each, computing the md5 hash of that string, summing up the integer values of each md5 hash, and then turning this final (very large) number into a hexadecimal string. We sum the individual md5 hashes so that the order in which the fields are iterated doesn’t affect the hash.
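The summing step described above can be sketched in a few lines of Python. This is a minimal illustration, not curifactory's actual implementation: combine_field_hashes is a hypothetical helper, and the input strings stand in for whatever per-field representations the library produces.

```python
import hashlib


def combine_field_hashes(representations):
    """Sum the md5 digests of each string and return the total as hex.

    `representations` is any iterable of strings, one per hashed field.
    (Hypothetical helper for illustration only.)
    """
    total = 0
    for text in representations:
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        total += int(digest, 16)  # interpret the 128-bit digest as an integer
    return f"{total:x}"  # hexadecimal string, without a "0x" prefix


# Integer addition is commutative, so iteration order does not matter.
forward = combine_field_hashes(["6", "crazy"])
backward = combine_field_hashes(["crazy", "6"])
assert forward == backward
```

Because the digests are added as integers instead of being concatenated, any ordering of the same strings yields the same final value.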
There is an ordered list of mechanisms that curifactory will go through to try to produce the string representation:
1. If it’s an internal curifactory parameter, skip it so that it does not influence the hash. This includes name, hash, overwrite, and the hash_representations field itself.
2. If the value of the parameter is None, skip it.
3. If the current field is in this parameter class’s hash_representations, call its specified function (or skip this parameter if its entry there is set to None).
4. If the parameter is another dataclass, recursively use all of these mechanics on the fields inside it.
5. If the parameter is a callable, use its __qualname__.
6. Otherwise, default to calling repr on it.
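The ordering of these mechanisms can be sketched as a chain of checks. The code below is an assumption-laden illustration and does not use curifactory: describe_fields, INTERNAL_FIELDS, Demo, Inner, and relu are all hypothetical names, and the way nested dataclass results are turned into a single string here is a guess, not the library's behavior.

```python
from dataclasses import dataclass, fields, is_dataclass

# Field names treated as internal, taken from the list above.
INTERNAL_FIELDS = {"name", "hash", "overwrite", "hash_representations"}


def describe_fields(param_set):
    """Return {field_name: string or None}; None means the field is skipped."""
    overrides = getattr(param_set, "hash_representations", None) or {}
    result = {}
    for f in fields(param_set):
        value = getattr(param_set, f.name)
        if f.name in INTERNAL_FIELDS:
            result[f.name] = None  # 1. internal parameter
        elif value is None:
            result[f.name] = None  # 2. value is None
        elif f.name in overrides:
            func = overrides[f.name]  # 3. custom function, or None to skip
            result[f.name] = None if func is None else func(param_set, value)
        elif is_dataclass(value) and not isinstance(value, type):
            # 4. nested dataclass instance: apply the same rules recursively
            result[f.name] = repr(describe_fields(value))
        elif callable(value):
            # 5. callable: fall back to repr if it has no __qualname__
            result[f.name] = getattr(value, "__qualname__", repr(value))
        else:
            result[f.name] = repr(value)  # 6. default
    return result


@dataclass
class Inner:
    depth: int = 2


def relu(x):
    return max(x, 0)


@dataclass
class Demo:
    name: str = "test"
    some_value: int = 5
    nothing: int = None
    operational: int = 9
    inner: Inner = None
    activation: object = None
    hash_representations: dict = None


demo = Demo(
    some_value=6,
    inner=Inner(),
    activation=relu,
    hash_representations={"operational": None},
)
print(describe_fields(demo))
```

In this sketch name, nothing, operational, and hash_representations all map to None (skipped), each for a different reason, while some_value falls through to repr.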
“Skipping” a parameter only means that it does not take part in determining the hash. If you run my_param_set.params_hash(dry=True), then instead of returning the hash it returns a dictionary listing, for every attribute, the hashing mechanism that would be used and the string representation that would be passed to the md5 hashing algorithm. For any parameter that will be skipped, it lists the reason why. See the example below:
from dataclasses import dataclass
from typing import Any

# Import paths assumed; adjust them to match your curifactory version.
from curifactory import ExperimentParameters
from curifactory.hashing import set_hash_functions

@dataclass
class Params(ExperimentParameters):
    some_value: int = 5
    operational_param: int = 9
    something_crazy: Any = "crazy"
    nothing: int = None
    hash_representations: dict = set_hash_functions(
        operational_param=None,  # None: leave this field out of the hash
        something_crazy=lambda self, obj: str(obj),  # custom string representation
    )

Params(name="test", some_value=6).params_hash(dry=True)
#> {'name': ('SKIPPED: blacklist', None),
#>  'hash': ('SKIPPED: blacklist', None),
#>  'overwrite': ('SKIPPED: blacklist', None),
#>  'hash_representations': ('SKIPPED: blacklist', None),
#>  'some_value': ('repr(param_set.some_value)', '6'),
#>  'operational_param': ('SKIPPED: set to None in hash_representations', None),
#>  'something_crazy': ("param_set.hash_representations['something_crazy'](param_set, param_set.something_crazy)",
#>                      'crazy'),
#>  'nothing': ('SKIPPED: value is None', None)}
Skipping parameters in a single set
As demonstrated so far, we can set hash_representations on the parameter class itself with set_hash_functions, and this is normally the preferred way to ensure a particular parameter is always ignored for the hash. However, sometimes it makes sense to ignore a parameter in only one or a few cases, perhaps for all the parameter sets coming from a single parameter file. You can set hash_representations on the fly after creating a parameter set:
@dataclass
class Params(ExperimentParameters):
    i_matter: int = 5
    sometimes_i_matter: int = 7

p1 = Params(name="test", i_matter=3)
p2 = Params(name="test2", i_matter=2)
p2.hash_representations["sometimes_i_matter"] = None

p1.params_hash(dry=True)
#> { ...
#>  'i_matter': ('repr(param_set.i_matter)', '3'),
#>  'sometimes_i_matter': ('repr(param_set.sometimes_i_matter)', '7')}

p2.params_hash(dry=True)
#> { ...
#>  'i_matter': ('repr(param_set.i_matter)', '2'),
#>  'sometimes_i_matter': ('SKIPPED: set to None in hash_representations', None)}
The outputs of the dry params_hash calls above show that p1 includes the actual value of the sometimes_i_matter parameter, while p2 skips it.