Experiment
This is the ‘main’ runnable function and CLI, which handles setting up logging, folders, reports, and running the passed experiment.
This file contains an if __name__ == "__main__" block and can be run directly.
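Functions:
list_experiments() – Print out all valid experiments that have a def run() function.
list_params() – Print out all valid parameter files that have a def get_params() function.
main() – ‘Main’ command line entrypoint; parses command line flags and makes the appropriate run_experiment() call.
regex_lister(module_name, regex[, try_import]) – Used by both list_experiments and list_params.
run_experiment(experiment_name, parameters_list[, ...]) – The experiment entrypoint function.
write_experiment_notebook(experiment_name, ...) – Creates a Jupyter notebook prepopulated with experiment info and cells to re-run the experiment and discover all associated data stored in record states.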
- curifactory.experiment.list_experiments()
Print out all valid experiments that have a def run() function, including any top-of-file docstrings associated with each.
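As an illustration of what "has a def run() function" and "top-of-file docstring" mean, the sketch below checks a module's source statically with the standard library ast module. It is an alternative technique, not curifactory's implementation (which, per regex_lister, scans files with a regex and imports the matches); describe_module is a hypothetical helper.

```python
import ast


def describe_module(source: str, entry_name: str = "run"):
    """Return (has_entry, docstring) for a module's source code.

    has_entry is True if the module defines a top-level function named
    entry_name; docstring is the module's top-of-file docstring (or None).
    """
    tree = ast.parse(source)
    has_entry = any(
        isinstance(node, ast.FunctionDef) and node.name == entry_name
        for node in tree.body  # only top-level statements are checked
    )
    return has_entry, ast.get_docstring(tree)


example = '"""Train a baseline model."""\n\ndef run(argsets, manager):\n    return None\n'
print(describe_module(example))  # (True, 'Train a baseline model.')
```

Passing entry_name="get_params" would perform the equivalent check for parameter files.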
- curifactory.experiment.list_params()
Print out all valid parameter files that have a def get_params() function, including any top-of-file docstrings associated with each.
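A parameter file that list_params would report might look like the sketch below: a top-of-file docstring plus a get_params() function returning a list of argument objects. The Args dataclass and its fields are hypothetical stand-ins so the example runs without curifactory installed; real parameter files would use curifactory's own argument class.

```python
"""Learning-rate sweep parameters."""
from dataclasses import dataclass
from typing import List


@dataclass
class Args:
    # Hypothetical stand-in for curifactory's argument class.
    name: str
    learning_rate: float = 0.01


def get_params() -> List[Args]:
    """Return one Args instance per configuration to run."""
    return [
        Args(name="baseline"),
        Args(name="fast", learning_rate=0.1),
    ]


print([a.name for a in get_params()])  # ['baseline', 'fast']
```

The name field is what args_names in run_experiment() would filter on.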
- curifactory.experiment.main()
‘Main’ command line entrypoint; parses command line flags and makes the appropriate run_experiment() call.
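The sketch below shows the general shape of such an entrypoint using the standard library argparse module. The flag names and the build_run_kwargs helper are assumptions chosen to mirror run_experiment()'s parameter names; they are not curifactory's exact CLI.

```python
import argparse
import sys
from typing import List, Optional


def build_run_kwargs(argv: Optional[List[str]] = None) -> dict:
    """Parse CLI flags into keyword arguments for a run_experiment() call.

    The flags below are illustrative assumptions, not curifactory's exact CLI.
    """
    argv = sys.argv[1:] if argv is None else argv
    parser = argparse.ArgumentParser(prog="experiment")
    parser.add_argument("experiment_name")
    parser.add_argument("-p", "--parameters", nargs="+", default=[])
    parser.add_argument("--dry", action="store_true")
    parser.add_argument("--store-full", action="store_true")  # dest becomes store_full
    parser.add_argument("--parallel", type=int, default=None)
    args = parser.parse_args(argv)
    return {
        "experiment_name": args.experiment_name,
        "parameters_list": args.parameters,
        "dry": args.dry,
        "store_full": args.store_full,
        "parallel": args.parallel,
        # Reconstruct the command line so it can be stored with the run info.
        "run_string": " ".join(["experiment", *argv]),
    }
```

A main() built this way would end with experiment.run_experiment(**build_run_kwargs()), passing the parsed flags straight through as keyword arguments.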
- curifactory.experiment.regex_lister(module_name, regex, try_import=True)
Used by both list_experiments and list_params. This scans every file in the passed folder for the requested regex, and tries to import the files that have a match.
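A minimal sketch of the scan-then-import behavior described above, using only the standard library. files_matching and try_import are hypothetical helpers; a pattern such as r"^def run\(" would mirror what list_experiments looks for, and catching import errors per file is an assumption about what try_import=True implies.

```python
import importlib.util
import re
from pathlib import Path
from types import ModuleType
from typing import List, Optional, Tuple


def files_matching(folder: str, pattern: str) -> List[str]:
    """Return the stems of .py files in folder whose source matches pattern."""
    regex = re.compile(pattern, re.MULTILINE)  # let ^ match at each line start
    return [
        path.stem
        for path in sorted(Path(folder).glob("*.py"))
        if regex.search(path.read_text(encoding="utf-8"))
    ]


def try_import(folder: str, stem: str) -> Tuple[Optional[ModuleType], Optional[Exception]]:
    """Import folder/stem.py, returning (module, None) or (None, error)."""
    spec = importlib.util.spec_from_file_location(stem, Path(folder) / f"{stem}.py")
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
    except Exception as exc:  # a broken file should not stop the listing
        return None, exc
    return module, None
```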
- curifactory.experiment.run_experiment(experiment_name, parameters_list, overwrite_override=None, cache_dir_override=None, mngr: Optional[curifactory.manager.ArtifactManager] = None, log: bool = False, log_debug: bool = False, dry: bool = False, dry_cache: bool = False, store_full: bool = False, log_errors: bool = False, prefix: Optional[str] = None, build_docker: bool = False, build_notebook: bool = False, run_string: Optional[str] = None, stage_overwrites: Optional[list] = None, args_names: Optional[list] = None, args_indices: Optional[list] = None, global_args_indices: Optional[list] = None, parallel: Optional[int] = None, parallel_mode: bool = False, parallel_lock: Optional[multiprocessing.context.BaseContext.Lock] = None, parallel_queue: Optional[multiprocessing.context.BaseContext.Queue] = None, run_num_override: Optional[int] = None, run_ts_override: Optional[datetime.datetime] = None, lazy: bool = False, ignore_lazy: bool = False, no_map: bool = False, no_color: bool = False, quiet: bool = False, progress: bool = False, plain: bool = False, notes: Optional[str] = None)
The experiment entrypoint function. This executes the given experiment with the given parameters.
- Parameters
experiment_name (str) – The name of the experiment script (without the .py extension).
parameters_list (List[str]) – A list of names of parameter files (without the .py extension).
overwrite_override (bool) – Whether to force overwrite on all cache data.
cache_dir_override (str) – Specify a non-default cache location. This would be used if running with the cache from a previous --store-full run.
mngr (ArtifactManager) – An artifact manager to use for the experiment. One will be automatically created if none is passed.
log (bool) – Whether to write a log file or not.
log_debug (bool) – Whether to include DEBUG level messages in the log.
dry (bool) – Setting dry to true suppresses saving any files (including logs) and does not update parameter stores. It should have no effect on any files.
dry_cache (bool) – Setting this to true only suppresses saving cache files. This is recommended if you’re running with a cache_dir_override for some previous --store-full run, so you don’t accidentally overwrite or add new data to the --store-full directory.
store_full (bool) – Store environment info, log, output report, and all cached files in a run-specific folder (data/runs by default).
log_errors (bool) – Whether to include error messages in the log output.
prefix (str) – Use this prefix, rather than the experiment name, to group cached data.
build_docker (bool) – If true, build a docker image with all of the run cache afterwards.
build_notebook (bool) – If true, add a notebook with run info and default cells for reproducing the run after it executes.
run_string (str) – An automatically populated string representing the CLI command for the run; do not change this.
stage_overwrites (List[str]) – A list of stage names to overwrite. This is useful if there are specific stages you sometimes want to recompute while the remainder of the data stays cached.
args_names (List[str]) – A list of argument names to run. If this is specified, only the arguments with these names, as returned from the passed parameter files, will be passed to the experiment.
args_indices (List[str]) – A list of argument indices to run. If this is specified, only the arguments returned from the passed parameter files, indexed by the ranges specified, will be passed to the experiment. Note that you can specify ranges delineated with ‘-’, e.g. ‘3-7’.
global_args_indices (List[str]) – A list of argument indices to run, indexing the entire set of parameters passed to the experiment instead of each individual parameter file. This can be used to parallelize runs more intelligently. Formatting follows the same rules as args_indices.
parallel (int) – How many subprocesses to split this run into. If specified, the experiment will be run that many times with divided-up global_args_indices in order to generate cached data for all parameters, and then re-run a final time with all cached data combined. Any speedup therefore depends on how many stages are cached and how effectively.
parallel_mode (bool) – Set by the parallel parameter to inform a particular subprocess run that it is being executed as part of a parallel run.
parallel_lock (multiprocessing.Lock) – If this function is called from a multiprocessing context, use this lock to help prevent files being written to and read simultaneously.
parallel_queue (multiprocessing.Queue) – If this function is called from a multiprocessing context, use this queue for communicating success/errors back to the main process.
run_num_override (int) – Handled in parallel mode. Because parallel process experiments do not get a run number (manager.store is not called), the log would otherwise be stored in an incorrectly named file.
run_ts_override (datetime.datetime) – Handled in parallel mode. Because parallel process experiments started a few seconds later do not get the same timestamp, the log would otherwise be stored in an incorrectly named file.
lazy (bool) – If true, attempts to set all stage outputs as Lazy objects. Outputs that do not have a cacher specified will be given a PickleCacher. Note that objects without a cacher that do not handle pickle serialization correctly may cause errors.
ignore_lazy (bool) – Run the experiment with lazy object caching disabled, keeping everything in memory. This can save time when memory is less of an issue.
no_map (bool) – Prevent pre-execution mapping of experiment records and stages. Recommended if doing anything fancy with records like dynamically creating them based on results of previous records. Mapping is done by running the experiment but skipping all stage execution.
no_color (bool) – Suppress fancy colors in console output.
quiet (bool) – Suppress all console log output.
progress (bool) – Display fancy rich progress bars for each record.
plain (bool) – Use normal text log output rather than rich log. Note that this negates progress.
notes (str) – A git-log-like message to store in the run info for the current run. If this is an empty string, query the user for an input string.
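The index formats accepted by args_indices and global_args_indices can be illustrated with a small expansion helper. This is a sketch, not curifactory's parser: expand_indices and select are hypothetical, and treating ‘3-7’ as inclusive of both endpoints is an assumption. select applies the indices to one combined list, as global_args_indices does; args_indices would apply the same expansion to each parameter file's list separately.

```python
from typing import Iterable, List


def expand_indices(specs: Iterable[str]) -> List[int]:
    """Expand index specs such as ["0", "3-7"] into a sorted list of ints.

    Assumption: a range "a-b" includes both endpoints.
    """
    indices = set()
    for spec in specs:
        if "-" in spec:
            start, end = (int(part) for part in spec.split("-", 1))
            indices.update(range(start, end + 1))
        else:
            indices.add(int(spec))
    return sorted(indices)


def select(argsets: list, specs: Iterable[str]) -> list:
    """Keep only the argsets whose position appears in the expanded specs."""
    wanted = set(expand_indices(specs))
    return [a for i, a in enumerate(argsets) if i in wanted]


print(expand_indices(["0", "3-5"]))  # [0, 3, 4, 5]
```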
- Returns
Whatever is returned from the experiment’s run() function.
Example
```python
from curifactory import experiment, ArtifactManager

mngr = ArtifactManager()
# dry=True runs the experiment without writing any files.
experiment.run_experiment('exp_name', ['params1', 'params2'], mngr=mngr, dry=True)
```
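The parallel parameter is described as dividing global_args_indices among subprocesses. One plausible way to do that division is sketched below; split_global_indices is hypothetical, and the inclusive ‘start-end’ range format is an assumption.

```python
from typing import List


def split_global_indices(total: int, parallel: int) -> List[List[str]]:
    """Divide indices 0..total-1 into up to `parallel` contiguous chunks.

    Each chunk is a list holding one inclusive "start-end" range string
    (assumed format), one chunk per subprocess.
    """
    if parallel < 1:
        raise ValueError("parallel must be at least 1")
    chunks = []
    base, extra = divmod(total, parallel)
    start = 0
    for i in range(parallel):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first chunks
        if size == 0:
            continue  # more processes than parameter sets
        end = start + size - 1
        chunks.append([f"{start}-{end}"])
        start = end + 1
    return chunks


print(split_global_indices(10, 3))  # [['0-3'], ['4-6'], ['7-9']]
```

Each inner list would be passed as global_args_indices to one subprocess run (with parallel_mode=True), after which a final run without index restrictions combines the cached results.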
- curifactory.experiment.write_experiment_notebook(experiment_name, parameters_list, argsets, manager, path, directory_change_back_depth=2, use_global_cache=None, errored=False, suppress_global_warning=False)
Creates a Jupyter notebook prepopulated with experiment info and cells to re-run the experiment and discover all associated data stored in record states. This function is called by the run_experiment() function.
- Parameters
experiment_name (str) – The name of the experiment that was run.
parameters_list (List[str]) – List of parameter file names
argsets (List[Args]) – List of all Args used from the parameter files.
manager (ArtifactManager) – The ArtifactManager used in the experiment.
path (str) – The path to the directory to store the notebook in.
directory_change_back_depth (int) – How many directories up the notebook must move to reach the project root (so that imports and cache paths are all correct).
use_global_cache (bool) – Whether we’re using the normal experiment cache or a separate, specific cache folder (mostly used to display a warning in the notebook).
errored (bool) – Whether the experiment errored while running; if so, a warning is displayed in the notebook.
suppress_global_warning (bool) – Don’t show a warning if use_global_cache is true.
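To show roughly what such a notebook contains, the sketch below builds a minimal Jupyter notebook (nbformat 4) as a plain dict and writes it as JSON. minimal_notebook and save_notebook are hypothetical; the cell contents are assumptions modeled on the run_experiment() example rather than curifactory's actual template, and depth plays the role of directory_change_back_depth.

```python
import json
from pathlib import Path
from typing import List


def _code_cell(lines: List[str]) -> dict:
    """Return an unexecuted nbformat 4 code cell."""
    return {"cell_type": "code", "metadata": {}, "execution_count": None,
            "outputs": [], "source": lines}


def minimal_notebook(experiment_name: str, parameters_list: List[str], depth: int = 2) -> dict:
    """Build a minimal nbformat 4 notebook with run info and a re-run cell."""
    up = "/".join([".."] * depth) or "."  # depth 2 -> "../..", depth 0 -> "."
    cells = [
        {"cell_type": "markdown", "metadata": {},
         "source": [f"# Experiment: {experiment_name}\n",
                    f"Parameter files: {', '.join(parameters_list)}"]},
        _code_cell(["import os\n", f"os.chdir({up!r})  # move up to the project root"]),
        _code_cell([
            "from curifactory import experiment, ArtifactManager\n",
            "mngr = ArtifactManager()\n",
            f"results = experiment.run_experiment({experiment_name!r}, {parameters_list!r}, mngr=mngr)",
        ]),
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def save_notebook(notebook: dict, path: str) -> None:
    """Write the notebook dict to disk as .ipynb JSON."""
    Path(path).write_text(json.dumps(notebook, indent=1), encoding="utf-8")
```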