DAG

Classes for managing an experiment’s map/DAG logic.

Classes:

DAG()

A DAG represents an entire mapped version of an experiment as a graph, where the nodes are stages and the connections are the outputs of one stage mapped to the associated inputs of another.

ExecutionNode(record, stage_name)

Represents a particular stage for a particular record - a node in the overall experiment graph.

class curifactory.dag.DAG

A DAG represents an entire mapped version of an experiment as a graph, where the nodes are stages and the connections are the outputs of one stage mapped to the associated inputs of another.

The DAG is constructed as the first step of an experiment run (unless the experiment is run with no_dag=True, or --no-dag on the CLI) by setting the artifact manager to a special map_mode. All of the experiment code runs, but every stage short-circuits before execution, after information about it has been collected (the record it’s part of, which outputs are cached, etc.). This information is then used to determine which stages actually need to execute, working backwards from the leaf stages. This differs from curifactory’s base no_dag mode in that a stage’s need-to-execute is based primarily on whether any later stage both requires its outputs and itself needs to execute, which results in a recursive check backwards from the leaf stages.
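The backward check described above can be illustrated with a small standalone sketch. It does not use curifactory itself: ToyStage and plan_execution are hypothetical names, and the rules that a stage with no outputs always runs and that a fully cached stage stops the recursion are simplifying assumptions, not the library's exact logic.

```python
from dataclasses import dataclass


@dataclass
class ToyStage:
    """Hypothetical stand-in for a mapped stage: a name plus artifact names."""
    name: str
    inputs: list
    outputs: list


def plan_execution(stages, cached):
    """Return the names of stages that must run, in execution order."""
    # Map each artifact name to the stage that produces it.
    producer = {out: s for s in stages for out in s.outputs}
    # Every artifact name that some stage consumes.
    consumed = {inp for s in stages for inp in s.inputs}
    # Leaves: stages whose outputs (if any) are not consumed by another stage.
    leaves = [s for s in stages if not any(o in consumed for o in s.outputs)]
    order = []

    def visit(stage):
        # Assumption: a stage whose outputs are all cached is skipped,
        # and nothing upstream of it is visited on its behalf.
        if stage.outputs and all(o in cached for o in stage.outputs):
            return
        # Otherwise its producers are checked first, so they end up earlier.
        for inp in stage.inputs:
            if inp in producer:
                visit(producer[inp])
        if stage.name not in order:
            order.append(stage.name)

    for leaf in leaves:
        visit(leaf)
    return order


stages = [
    ToyStage("load", [], ["raw"]),
    ToyStage("train", ["raw"], ["model"]),
    ToyStage("report", ["model"], []),
]
print(plan_execution(stages, cached=set()))      # ['load', 'train', 'report']
print(plan_execution(stages, cached={"model"}))  # ['report']
```

In the second call, "train" is skipped because its only output is cached, and "load" is never visited because no stage that needs to execute asks for its output.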

Methods:

analyze()

Construct execution trees and execution list.

build_execution_tree_recursive(record, stage)

Recursively builds the stage dependency tree based on inputs/outputs.

build_execution_trees()

Build an execution tree (essentially the sub-DAG) for every leaf node found.

child_records(record)

Return a list of all records for which the provided record is an input record.

determine_execution_list()

Determine which stages need to execute and collect them into execution_list.

determine_execution_list_recursive(node, …)

Determines if the requested stage will need to execute or not, and if so prepends itself and all prior stages needed to execute through recursive calls.

find_leaves()

Get all of the nodes that have no outputs depended on by any others; these should be all of the “last” stages in the experiment and/or utility stages.

get_record_string(record_index)

Get a string representation for the given record.

is_leaf(record, stage_name)

Check if the given stage is a leaf.

is_output_used_anywhere(record, …)

Check if the specified output is used as input in any stage.

print_experiment_map()

Print the representations for each record.

Attributes:

artifacts

This should essentially be an equivalent copy of ArtifactManager.artifacts.

execution_list

The list of individual (non-recursive) ExecutionNode representations, each of the form (RECORD_ID, STAGE_NAME).

execution_trees

The set of node execution trees - each node here is a “leaf stage”, or stage with no outputs that other stages depend on.

analyze()

Construct execution trees and execution list.

artifacts: list

This should essentially be an equivalent copy of ArtifactManager.artifacts. All of a record’s stage inputs and outputs should correctly index into this.

build_execution_tree_recursive(record: curifactory.record.Record, stage: str) → curifactory.dag.ExecutionNode

Recursively builds the stage dependency tree based on inputs/outputs. This does not condition anything on cache or overwrite status; it is exclusively the “inverted” stage path, with the provided stage as the root.
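As a rough illustration of the “inverted” tree, the sketch below builds one from a plain dictionary of stage inputs and outputs. TreeNode, build_tree and stage_io are hypothetical names introduced here, and records are ignored entirely, so this is a simplified model rather than curifactory's implementation.

```python
class TreeNode:
    """Hypothetical stand-in for an execution node."""

    def __init__(self, stage_name, parent=None):
        self.stage_name = stage_name
        self.parent = parent        # the node that uses this node's outputs
        self.dependencies = []      # nodes that produce this node's inputs


def build_tree(stage_name, stage_io, parent=None):
    """Recursively build the dependency tree rooted at stage_name.

    stage_io maps a stage name to a tuple of (input names, output names).
    Cache status is deliberately not considered.
    """
    node = TreeNode(stage_name, parent)
    inputs, _ = stage_io[stage_name]
    for other, (_, outputs) in stage_io.items():
        # Any other stage producing one of our inputs becomes a dependency.
        if other != stage_name and any(i in outputs for i in inputs):
            node.dependencies.append(build_tree(other, stage_io, node))
    return node


stage_io = {
    "load": ([], ["raw"]),
    "clean": (["raw"], ["data"]),
    "train": (["data"], ["model"]),
    "test": (["data", "model"], ["score"]),
}
tree = build_tree("test", stage_io)
print([d.stage_name for d in tree.dependencies])  # ['clean', 'train']
```

Here "clean" shows up twice, once directly under "test" and once under "train", because both stages consume its output.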

build_execution_trees()

Build an execution tree (essentially the sub-DAG) for every leaf node found.

child_records(record: curifactory.record.Record) → list

Return a list of all records for which the provided record is an input record. (This occurs when calling record.make_copy() and for aggregates.)
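The relationship can be pictured with a toy model in which each record keeps a list of the records it was derived from. ToyRecord, its input_records attribute and the standalone toy_child_records function are assumptions made for this sketch only, not a description of the real Record class.

```python
class ToyRecord:
    """Hypothetical record that remembers which records it was built from."""

    def __init__(self, name, input_records=None):
        self.name = name
        self.input_records = input_records or []


def toy_child_records(record, all_records):
    """Return every record that lists `record` among its input records."""
    return [r for r in all_records if record in r.input_records]


a = ToyRecord("a")
b = ToyRecord("b")
aggregate = ToyRecord("aggregate", input_records=[a, b])  # aggregate over a and b
a_copy = ToyRecord("a_copy", input_records=[a])           # copy derived from a
records = [a, b, aggregate, a_copy]

print([r.name for r in toy_child_records(a, records)])  # ['aggregate', 'a_copy']
```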

determine_execution_list()

Determine which stages need to execute and collect them into execution_list.

determine_execution_list_recursive(node: curifactory.dag.ExecutionNode, overwrite_check_only: False) → bool

Determines if the requested stage will need to execute or not, and if so prepends itself and all prior stages needed to execute through recursive calls.

execution_list: list

The list of individual (non-recursive) ExecutionNode representations, each of the form (RECORD_ID, STAGE_NAME).

execution_trees: list

The set of node execution trees - each node here is a “leaf stage”, or stage with no outputs that other stages depend on. This is essentially the inverted tree, because each leaf stage is the root of an execution tree whose sub-trees are all of the dependencies required for it to run.

find_leaves() → list

Get all of the nodes that have no outputs depended on by any others. These should be all of the “last” stages in the experiment and/or utility stages (e.g. a stage that only handles reporting and doesn’t output any artifacts).

Returns

A list of tuples where the first element is the record index and the second is the name of the stage.

get_record_string(record_index: int) → str

Get a string representation for the given record. This collects all of the associated stages, inputs and outputs for each, and cache status for each artifact.

is_leaf(record: curifactory.record.Record, stage_name: str) → bool

Check if the given stage is a leaf, based on two conditions:

  1. Stage is a leaf if it has no output artifacts.

  2. Stage is a leaf if it has outputs but they aren’t used as inputs in any other stages.
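The two conditions can be expressed in a few lines over a plain dictionary of stage inputs and outputs. toy_is_leaf and stage_io are hypothetical names, and the sketch ignores records and search start indices, so it only mirrors the conditions listed above rather than the actual method.

```python
def toy_is_leaf(stage_name, stage_io):
    """Return True if the stage meets either leaf condition.

    stage_io maps a stage name to a tuple of (input names, output names).
    """
    _, outputs = stage_io[stage_name]
    # Condition 1: no output artifacts at all.
    if not outputs:
        return True
    # Condition 2: outputs exist, but no other stage takes them as inputs.
    for other, (inputs, _) in stage_io.items():
        if other != stage_name and any(o in inputs for o in outputs):
            return False
    return True


stage_io = {
    "load": ([], ["raw"]),
    "train": (["raw"], ["model"]),
    "report": (["model"], []),      # leaf by condition 1
    "export": (["raw"], ["csv"]),   # leaf by condition 2
}
print([name for name in stage_io if toy_is_leaf(name, stage_io)])
# ['report', 'export']
```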

is_output_used_anywhere(record: curifactory.record.Record, stage_search_start_index: int, output: str) → bool

Check if the specified output is used as input in any stage.

print_experiment_map()

Print the representations for each record.

class curifactory.dag.ExecutionNode(record: curifactory.record.Record, stage_name: str)

Represents a particular stage for a particular record - a node in the overall experiment graph.

Parameters
  • record (Record) – The record in which this stage would execute.

  • stage_name (str) – The name of the stage that would execute.

Methods:

chain_rep()

Return this node represented as a tuple of the record index and stage name.

string_rep([level])

Recursively collect and return this node’s index and name and that of its subtrees.

Attributes:

dependencies

Dependencies are the ‘subtree’ - the other nodes/stages that create the outputs that match this node’s inputs.

parent

The parent is a node that depends on this one/uses its outputs.

chain_rep() → tuple

Return this node represented as a tuple of the record index and stage name.

dependencies: list

Dependencies are the ‘subtree’ - the other nodes/stages that create the outputs that match this node’s inputs.

parent: curifactory.dag.ExecutionNode

The parent is a node that depends on this one/uses its outputs. (This means that the same execution node might appear in multiple execution trees if its output is used by more than one other stage.)

string_rep(level=0) → str

Recursively collect and return this node’s index and name and that of its subtrees.
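To show how a level parameter can drive a recursive representation, here is a minimal sketch with a hypothetical PrintableNode class. The two-space indentation and the "(index, name)" line format are assumptions chosen for the example; the real output format of string_rep may differ.

```python
class PrintableNode:
    """Hypothetical node holding a record index, a stage name and subtrees."""

    def __init__(self, record_index, stage_name):
        self.record_index = record_index
        self.stage_name = stage_name
        self.dependencies = []

    def chain_rep(self):
        # Non-recursive representation: just this node.
        return (self.record_index, self.stage_name)

    def string_rep(self, level=0):
        # One line for this node, indented by depth, then each subtree
        # rendered one level deeper.
        line = "  " * level + f"({self.record_index}, {self.stage_name})"
        return "\n".join(
            [line] + [dep.string_rep(level + 1) for dep in self.dependencies]
        )


root = PrintableNode(0, "test")
train = PrintableNode(0, "train")
load = PrintableNode(0, "load")
train.dependencies.append(load)
root.dependencies.append(train)

print(root.chain_rep())  # (0, 'test')
print(root.string_rep())
# (0, test)
#   (0, train)
#     (0, load)
```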