Configuration and directory structure

Project directories

Curifactory expects the following directory structure by default:

  • data/

    • cache/: all cached intermediate data

    • runs/: full runs from --store-full

  • docker/: (not required unless you intend to create Docker images with --docker)

  • experiments/: the runnable experiment scripts

  • logs/: all logging files from experiment runs, organized by reference name (experiment name, run number, timestamp)

  • notebooks/: (not required unless you intend to create notebooks with --notebook)

    • experiments/: the output notebooks from running experiments with --notebook are stored here.

  • params/: the parameter scripts for experiments

  • reports/: output HTML reports from each experiment run.
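As a rough illustration of the layout above, the following sketch creates the same directory tree with Python's standard library. The helper name create_default_layout and the DEFAULT_DIRS list are hypothetical and exist only for this example; they are not part of Curifactory, and this is not necessarily how Curifactory itself sets up a project.

```python
from pathlib import Path

# Default layout as listed above. "docker" and "notebooks/experiments" are
# only needed when using --docker and --notebook respectively.
DEFAULT_DIRS = [
    "data/cache",
    "data/runs",
    "docker",
    "experiments",
    "logs",
    "notebooks/experiments",
    "params",
    "reports",
]


def create_default_layout(root):
    """Hypothetical helper: create every default directory under root."""
    root = Path(root)
    created = []
    for rel in DEFAULT_DIRS:
        path = root / rel
        # parents=True also creates "data/" and "notebooks/" as needed;
        # exist_ok=True makes the call safe to repeat on an existing project.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_default_layout("my_project") would leave existing directories untouched and only add the missing ones.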

Configuration

Curifactory allows you to change the default paths where various components are stored in your project by setting them in a curifactory_config.json file in the project root.

The default values for the configuration are shown in this example:

{
    "experiments_module_name": "experiments",
    "params_module_name": "params",
    "manager_cache_path": "data/",
    "cache_path": "data/cache",
    "runs_path": "data/runs",
    "logs_path": "logs/",
    "notebooks_path": "notebooks/",
    "reports_path": "reports/",
    "report_css_path": "reports/style.css"
}
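
To make the relationship between the file and the defaults concrete, here is a minimal sketch of reading curifactory_config.json and layering it over the default values listed above. The function name load_config is hypothetical, and falling back to defaults for keys missing from the file is an assumption of this sketch rather than documented Curifactory behavior.

```python
import json
from pathlib import Path

# Default values copied from the example configuration above.
DEFAULTS = {
    "experiments_module_name": "experiments",
    "params_module_name": "params",
    "manager_cache_path": "data/",
    "cache_path": "data/cache",
    "runs_path": "data/runs",
    "logs_path": "logs/",
    "notebooks_path": "notebooks/",
    "reports_path": "reports/",
    "report_css_path": "reports/style.css",
}


def load_config(project_root="."):
    """Hypothetical helper: return DEFAULTS overridden by the config file."""
    config = dict(DEFAULTS)  # copy so DEFAULTS itself is never mutated
    config_file = Path(project_root) / "curifactory_config.json"
    if config_file.exists():
        with open(config_file) as f:
            # Assumption: keys absent from the file keep their default value.
            config.update(json.load(f))
    return config
```

For example, a file containing only {"logs_path": "output/logs/"} would change the log directory in the returned dictionary while every other entry keeps its default.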

experiments_module_name - The name of the folder/module where experiment scripts are stored. This is treated as a Python module: running an experiment essentially runs import experiments.[experiment_script_name]. This means you can organize your experiments folder into submodules. Note that since this is a module name, if the folder sits inside a subfolder, be sure to use ‘.’ in your config instead of ‘/’.
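
The difference between a folder path and a module name can be sketched as follows. Both helper names are hypothetical and only illustrate the dotted-name convention described above; they are not Curifactory functions.

```python
from pathlib import PurePosixPath


def folder_to_module_name(folder):
    """Hypothetical helper: turn a '/'-separated folder into a dotted name."""
    # PurePosixPath drops the trailing slash and splits on '/'.
    return ".".join(PurePosixPath(folder).parts)


def experiment_import_path(experiments_module_name, script_name):
    """Hypothetical helper: the dotted name that the import would refer to."""
    return f"{experiments_module_name}.{script_name}"
```

Under these assumptions, a project that keeps its experiments in src/experiments/ would set "experiments_module_name" to the value of folder_to_module_name("src/experiments/"), which is "src.experiments", and an experiment script named my_experiment (a made-up name) would correspond to the import path src.experiments.my_experiment.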

params_module_name - The name of the folder/module where parameter scripts are kept. As with experiments, this is a module name, and you can organize your parameters into submodules.

manager_cache_path - The folder where artifact manager data is kept, namely the experiment store.

cache_path - The directory used for caching all stage outputs.

runs_path - The directory where full runs are saved with the --store-full flag; see Full stores (--store-full, --dry-cache).

logs_path - The directory where every experiment run log file is stored.

notebooks_path - The directory where all output notebooks from experiments run with --notebook are stored.

reports_path - The directory where every experiment run report is generated.

report_css_path - The CSS file to copy into each report directory. A default stylesheet is provided when a project is set up with curifactory init.