Experiment

This is the ‘main’ runnable function and CLI, which handles setting up logging, folders, reports, and running the passed experiment.

This file contains a __name__ == "__main__" block and can be run directly.

Functions:

collect_parameter_sets(manager, param_files, …)

Load the requested parameter files, run their get_params, and filter down based on any other relevant CLI args.

list_experiments()

Print out all valid experiments that have a def run() function, including any top-of-file docstrings associated with each.

list_params()

Print out all valid parameter files that have a def get_params() function, including any top-of-file docstrings associated with each.

regex_lister(module_name, regex[, try_import])

Used by both list_experiments and list_params.

run_experiment(experiment_name[, …])

The experiment entrypoint function.

curifactory.experiment.collect_parameter_sets(manager, param_files: list, overwrite_override: bool, parallel: int, param_set_names: list, param_set_indices_resolved: list, dry: bool, store_full: bool, parallel_mode: bool)

Load the requested parameter files, run their get_params, and filter down based on any other relevant CLI args.

curifactory.experiment.list_experiments()

Print out all valid experiments that have a def run() function, including any top-of-file docstrings associated with each.

curifactory.experiment.list_params()

Print out all valid parameter files that have a def get_params() function, including any top-of-file docstrings associated with each.

curifactory.experiment.regex_lister(module_name, regex, try_import=True)

Used by both list_experiments and list_params. This scans every file in the passed folder for the requested regex, and tries to import the files that have a match.
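
The scan-then-import behavior described above can be illustrated with a minimal sketch. This is not curifactory's implementation: the function name find_matching_modules, its return value, and the choice to silently skip files that fail to import are all assumptions made for illustration.

```python
import importlib.util
import re
from pathlib import Path


def find_matching_modules(folder, regex, try_import=True):
    """Hypothetical helper: list .py files in `folder` whose source matches `regex`.

    Assumption: when try_import is True, a file that matches the regex but
    raises on import is skipped rather than reported.
    """
    pattern = re.compile(regex, re.MULTILINE)
    matches = []
    for path in sorted(Path(folder).glob("*.py")):
        # Cheap text scan first, so unrelated files are never imported.
        if not pattern.search(path.read_text()):
            continue
        if try_import:
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            try:
                spec.loader.exec_module(module)
            except Exception:
                # Matched the regex but is not importable; leave it out.
                continue
        matches.append(path.stem)
    return matches
```

Under these assumptions, a call such as find_matching_modules("experiments", r"^def run\(") would return the names of importable files that define a top-level run function.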

curifactory.experiment.run_experiment(experiment_name, param_files=None, overwrite_override=None, cache_dir_override=None, mngr: Optional[curifactory.manager.ArtifactManager] = None, log: bool = False, log_debug: bool = False, dry: bool = False, dry_cache: bool = False, store_full: bool = False, log_errors: bool = False, prefix: Optional[str] = None, build_docker: bool = False, build_notebook: bool = False, run_string: Optional[str] = None, stage_overwrites: Optional[list] = None, param_set_names: Optional[list] = None, param_set_indices: Optional[list] = None, global_param_set_indices: Optional[list] = None, parallel: Optional[int] = None, parallel_mode: bool = False, parallel_lock: Optional[multiprocessing.context.BaseContext.Lock] = None, parallel_queue: Optional[multiprocessing.context.BaseContext.Queue] = None, run_num_override: Optional[int] = None, run_ts_override: Optional[datetime.datetime] = None, lazy: bool = False, ignore_lazy: bool = False, no_dag: bool = False, map_only: bool = False, hashes_only: bool = False, print_params: Union[bool, str] = False, no_color: bool = False, quiet: bool = False, progress: bool = False, plain: bool = False, notes: Optional[str] = None, all_loggers: bool = False)

The experiment entrypoint function. This executes the given experiment with the given parameters.

Parameters
  • experiment_name (str) – The name of the experiment script (without the .py).

  • param_files (List[str]) – A list of names of parameter files (without the .py).

  • overwrite_override (bool) – Whether to force overwrite on all cache data.

  • cache_dir_override (str) – Specify a non-default cache location. This would be used if running with the cache from a previous --store-full run.

  • mngr (ArtifactManager) – An artifact manager to use for the experiment. One will be automatically created if none is passed.

  • log (bool) – Whether to write a log file or not.

  • log_debug (bool) – Whether to include DEBUG level messages in the log.

  • dry (bool) – Setting dry to true will suppress saving any files (including logs), and will not update parameter stores. (It should have no effect on any files.)

  • dry_cache (bool) – Setting this to true only suppresses saving cache files. This is recommended if you’re running with a cache_dir_override for some previous --store-full run, so you don’t accidentally overwrite or add new data to the --store-full directory.

  • store_full (bool) – Store environment info, log, output report, and all cached files in a run-specific folder (data/runs by default).

  • log_errors (bool) – Whether to include error messages in the log output.

  • prefix (str) – Instead of using the experiment name to group cached data, use this prefix instead.

  • build_docker (bool) – If true, build a docker image with all of the run cache afterwards.

  • build_notebook (bool) – If true, add a notebook with run info and default cells to reproduce after run execution.

  • run_string (str) – An automatically populated string representing the CLI command for the run. Do not change this.

  • stage_overwrites (List[str]) – A list of stage names to overwrite. This is useful if there are specific stages you sometimes want to recompute while the remainder of the data stays cached.

  • param_set_names (List[str]) – A list of parameter set names to run. If this is specified, only parameter sets with these names will be passed on to the experiment.

  • param_set_indices (List[str]) – A list of parameter set indices to run. If this is specified, only the parameter sets returned from the passed parameter files, indexed by the ranges specified, will be passed to the experiment. Note that you can specify ranges delimited with ‘-’, e.g. ‘3-7’.

  • global_param_set_indices (List[str]) – A list of parameter set indices to run, indexing the entire collection of parameter sets passed to the experiment instead of each individual parameters file. This can be used to help more intelligently parallelize runs. Formatting follows the same rules as param_set_indices.

  • parallel (int) – How many subprocesses to split this run into. If specified, the experiment will be run that many times with divided-up global_param_set_indices in order to generate cached data for all parameters, and then re-run a final time with all cached data combined. Note that any speedup from this depends on how many steps are cached and how well.

  • parallel_mode (bool) – This is handled by the parallel parameter and informs a particular subprocess run that it is being executed as part of a parallel run.

  • parallel_lock (multiprocessing.Lock) – If this function is called from a multiprocessing context, use this lock to help prevent files being written to and read simultaneously.

  • parallel_queue (multiprocessing.Queue) – If this function is called from a multiprocessing context, use this queue for communicating success/errors back to the main process.

  • run_num_override (int) – Handled in parallel mode. Because parallel-process experiments do not get a run number (manager.store is not called), the log would otherwise be stored in an incorrectly named file.

  • run_ts_override (datetime.datetime) – Handled in parallel mode. Because parallel-process experiments do not get the same timestamp if started a few seconds later, the log would otherwise be stored in an incorrectly named file.

  • lazy (bool) – If true, attempts to set all stage outputs as Lazy objects. Outputs that do not have a cacher specified will be given a PickleCacher. Note that objects without a cacher that do not handle pickle serialization correctly may cause errors.

  • ignore_lazy (bool) – Run the experiment with all lazy object caching disabled, keeping everything in memory. This can save time when memory is less of an issue.

  • no_dag (bool) – Prevent pre-execution mapping of experiment records and stages. Recommended if doing anything fancy with records like dynamically creating them based on results of previous records. Mapping is done by running the experiment but skipping all stage execution.

  • map_only (bool) – Runs the pre-execution mapping of an experiment and immediately exits, printing the map to stdout. Note that setting this to True automatically sets dry.

  • hashes_only (bool) – Runs only the parameter set collection from parameter files and then prints out the corresponding hashes to stdout. Note that setting this to True automatically sets dry.

  • print_params (Union[bool, str]) – Runs only the parameter set collection from parameter files and then prints out the corresponding parameters to stdout. Note that setting this to True automatically sets dry.

  • no_color (bool) – Suppress fancy colors in console output.

  • quiet (bool) – Suppress all console log output.

  • progress (bool) – Display fancy rich progress bars for each record.

  • plain (bool) – Use normal text log output rather than rich log. Note that this negates progress.

  • notes (str) – A git-log-like message to store in the run info for the current run. If this is an empty string, the user is prompted for an input string.

  • all_loggers (bool) – Whether to include all non-curifactory library loggers in the output logs as well.
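
The range syntax accepted by param_set_indices and global_param_set_indices (e.g. ‘3-7’) could be expanded along the following lines. This is only a sketch: resolve_indices is a hypothetical name, and treating ranges as inclusive on both ends is an assumption that the parameter descriptions above do not confirm.

```python
def resolve_indices(specs):
    """Hypothetical helper: expand strings like "2" or "3-7" into sorted ints.

    Assumption: a range "a-b" includes both a and b.
    """
    resolved = set()
    for spec in specs:
        if "-" in spec:
            # A range such as "3-7": split once and expand every index in it.
            start, end = spec.split("-", 1)
            resolved.update(range(int(start), int(end) + 1))
        else:
            # A single index such as "2".
            resolved.add(int(spec))
    return sorted(resolved)


print(resolve_indices(["0", "3-7"]))  # [0, 3, 4, 5, 6, 7]
```

Using a set means overlapping entries such as ["2", "1-2"] collapse to a single copy of each index.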

Returns

Whatever is returned from the experiment run().

Example

from curifactory import experiment, ArtifactManager
mngr = ArtifactManager()
experiment.run_experiment('exp_name', ['params1', 'params2'], mngr=mngr, dry=True)
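
The parallel parameter is described above as running the experiment several times with divided-up global_param_set_indices. One plausible way to divide the indices is sketched below. The helper split_indices and the contiguous, near-equal chunking strategy are assumptions for illustration; curifactory may distribute indices differently.

```python
def split_indices(total, parallel):
    """Hypothetical helper: divide indices 0..total-1 into `parallel` contiguous chunks.

    Chunk sizes differ by at most one; earlier chunks receive the extras.
    """
    base, extra = divmod(total, parallel)
    chunks = []
    start = 0
    for i in range(parallel):
        size = base + (1 if i < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


print(split_indices(7, 3))  # [[0, 1, 2], [3, 4], [5, 6]]
```

In this sketch each chunk would be handed to one subprocess, after which a final run could combine the cached results.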