DAG
Classes for managing an experiment’s map/DAG logic.
Classes:
DAG
A DAG represents an entire mapped version of an experiment as a graph, where the nodes are stages and the connections are the outputs of one stage mapped to the associated inputs of another.
ExecutionNode
Represents a particular stage for a particular record - a node in the overall experiment graph.
-
class
curifactory.dag.
DAG
A DAG represents an entire mapped version of an experiment as a graph, where the nodes are stages and the connections are the outputs of one stage mapped to the associated inputs of another.
The DAG is constructed as the first step of an experiment run (provided it isn’t run with no_dag=True or --no-dag on the CLI) by setting the artifact manager to a special map_mode. The experiment code is all run, but every stage short-circuits before execution, after collecting information about it (the record it’s part of, which outputs are cached, etc.). This information is then used to determine which stages actually need to execute, working backwards from the leaf stages. This differs from curifactory’s base no_dag mode because the need-to-execute for every stage is based primarily on whether any future stage actually requires this one’s outputs and itself needs to execute (resulting in a recursive check backwards from the leaf stages).
Methods:
analyze
()Construct execution trees and execution list.
build_execution_tree_recursive
(record, stage)Recursively builds the stage dependency tree based on inputs/outputs.
build_execution_trees
()Build an execution tree (essentially the sub-DAG) for every leaf node found.
child_records
(record)Return a list of all records for which the provided record is an input record.
determine_execution_list
()I’ve got them on the list, they’ll none of them be missed.
determine_execution_list_recursive
(node, …)Determines if the requested stage will need to execute or not, and if so prepends itself and all prior stages needed to execute through recursive calls.
find_leaves
()Get all of the nodes that have no outputs depended on by any others. These should be all of the “last” stages in the experiment and/or utility stages.
get_record_string
(record_index)Get a string representation for the given record.
is_leaf
(record, stage_name)Check if the given stage is a leaf (it has no outputs, or its outputs aren’t used as inputs in any other stage).
is_output_used_anywhere
(record, …)Check if the specified output is used as input in any stage.
print_experiment_map
()Print the representations for each record.
Attributes:
artifacts
This should essentially be an equivalent copy of ArtifactManager.artifacts.
execution_list
This is the list of ExecutionNode individual (non-recursive) string representations:
(RECORD_ID, STAGE_NAME)
execution_trees
The set of node execution trees - each node here is a “leaf stage”, or stage with no outputs that other stages depend on.
-
analyze
() Construct execution trees and execution list.
-
artifacts
: list This should essentially be an equivalent copy of ArtifactManager.artifacts. All of a record’s stage inputs and outputs should correctly index into this.
-
build_execution_tree_recursive
(record: curifactory.record.Record, stage: str) → curifactory.dag.ExecutionNode Recursively builds the stage dependency tree based on inputs/outputs. This does not condition anything based on cache or overwrite status, this is exclusively the “inverted” stage path (provided stage is the root).
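The "inverted" tree described above can be illustrated with a minimal sketch. This is not curifactory's implementation: `Stage`, `Node`, and `build_tree` are hypothetical stand-ins, a single record is assumed, and cache and overwrite status are ignored, as in the description.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stage:
    """Hypothetical stand-in for a stage: a name plus artifact names."""
    name: str
    inputs: list
    outputs: list


@dataclass
class Node:
    """Hypothetical stand-in for ExecutionNode."""
    stage_name: str
    parent: Optional["Node"] = field(default=None, repr=False)
    dependencies: list = field(default_factory=list)


def build_tree(stages, stage_name, parent=None):
    """Build the inverted dependency tree rooted at stage_name."""
    by_name = {s.name: s for s in stages}
    node = Node(stage_name, parent)
    for needed in by_name[stage_name].inputs:
        for other in stages:
            if needed in other.outputs:
                # The stage producing this input becomes a sub-tree of the node.
                node.dependencies.append(build_tree(stages, other.name, node))
    return node


stages = [
    Stage("load", [], ["data"]),
    Stage("train", ["data"], ["model"]),
    Stage("report", ["data", "model"], []),
]
tree = build_tree(stages, "report")
```

In this sketch the `load` stage appears twice under `report` (once directly and once below `train`), because its output is used by more than one stage.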
-
build_execution_trees
() Build an execution tree (essentially the sub-DAG) for every leaf node found.
-
child_records
(record: curifactory.record.Record) → list Return a list of all records for which the provided record is an input record. (This occurs when calling
record.make_copy()
and for aggregates.)
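As a rough illustration of the lookup described here, the sketch below assumes each record keeps a list of the records it was derived from. `FakeRecord`, its `input_records` attribute, and the free function `child_records` are hypothetical names used only for this example, not the library's API.

```python
class FakeRecord:
    """Hypothetical stand-in for a record that remembers its input records."""

    def __init__(self, name, input_records=()):
        self.name = name
        self.input_records = list(input_records)


def child_records(all_records, record):
    """Return every record that lists `record` among its input records."""
    return [
        r for r in all_records
        if any(parent is record for parent in r.input_records)
    ]


a = FakeRecord("a")
b = FakeRecord("b", input_records=[a])       # e.g. a copy made from a
agg = FakeRecord("agg", input_records=[a, b])  # e.g. an aggregate over a and b
c = FakeRecord("c")
children_of_a = [r.name for r in child_records([a, b, agg, c], a)]
```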
-
determine_execution_list
() I’ve got them on the list, they’ll none of them be missed.
-
determine_execution_list_recursive
(node: curifactory.dag.ExecutionNode, overwrite_check_only: False) → bool Determines if the requested stage will need to execute or not, and if so prepends itself and all prior stages needed to execute through recursive calls.
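The recursive "prepend" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's logic: nodes are plain dicts, a stage is skipped only when all of its outputs are already cached, overwrite handling is left out, and a stage reached a second time is moved earlier in the list so that it still precedes everything that depends on it.

```python
def determine_execution(node, cached, execution_list):
    """Return True if `node` must run, prepending it and its needed dependencies.

    node: {"name": str, "outputs": [str], "dependencies": [node, ...]} (hypothetical shape)
    cached: set of artifact names that can be loaded from cache
    """
    outputs = node["outputs"]
    if outputs and all(o in cached for o in outputs):
        return False  # everything this stage produces is already cached
    name = node["name"]
    if name in execution_list:
        # Already scheduled: move it earlier so it precedes its dependants.
        execution_list.remove(name)
    execution_list.insert(0, name)
    for dep in node["dependencies"]:
        determine_execution(dep, cached, execution_list)
    return True


load = {"name": "load", "outputs": ["data"], "dependencies": []}
train = {"name": "train", "outputs": ["model"], "dependencies": [load]}
report = {"name": "report", "outputs": [], "dependencies": [load, train]}

nothing_cached = []
determine_execution(report, set(), nothing_cached)

model_cached = []
determine_execution(report, {"model"}, model_cached)
```

With nothing cached, every stage is scheduled in dependency order. With `model` cached, `train` is skipped, while `load` still runs because `report` needs `data` directly.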
-
execution_list
: list This is the list of ExecutionNode individual (non-recursive) string representations:
(RECORD_ID, STAGE_NAME)
-
execution_trees
: list The set of node execution trees - each node here is a “leaf stage”, or stage with no outputs that other stages depend on. This is essentially the inverted tree, because stage “leafs” will each be a root of an execution tree, where the sub-trees are all the dependencies required for it to run.
-
find_leaves
() → list Get all of the nodes that have no outputs depended on by any others. These should be all of the “last” stages in the experiment and/or utility stages (e.g. a stage that just handles reporting and doesn’t really output any artifacts).
- Returns
a list of tuples where the first element is the record index and the second is the name of the stage.
-
get_record_string
(record_index: int) → str Get a string representation for the given record. This collects all of the associated stages, inputs and outputs for each, and cache status for each artifact.
-
is_leaf
(record: curifactory.record.Record, stage_name: str) → bool Check if the given stage is a leaf, based on two conditions:
- Stage is a leaf if it has no output artifacts.
- Stage is a leaf if it has outputs but they aren’t used as inputs in any other stages.
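The two conditions above can be expressed in a few lines. The sketch below is illustrative only: stages are represented as hypothetical `(name, inputs, outputs)` tuples for a single record, rather than the record and artifact structures the library actually uses.

```python
def is_leaf(stages, stage_name):
    """Check the two leaf conditions for the named stage.

    stages: list of (name, inputs, outputs) tuples (hypothetical representation)
    """
    for name, _inputs, outputs in stages:
        if name == stage_name:
            break
    else:
        raise KeyError(stage_name)
    # Condition 1: the stage has no output artifacts.
    if not outputs:
        return True
    # Condition 2: none of its outputs are used as inputs by any stage.
    used_inputs = {i for _name, inputs, _outputs in stages for i in inputs}
    return not any(o in used_inputs for o in outputs)


stages = [
    ("load", [], ["data"]),
    ("train", ["data"], ["model"]),
    ("report", ["model"], []),       # no outputs
    ("export", ["data"], ["csv"]),   # output never consumed
]
leaves = [name for name, _i, _o in stages if is_leaf(stages, name)]
```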
-
is_output_used_anywhere
(record: curifactory.record.Record, stage_search_start_index: int, output: str) → bool Check if the specified output is used as input in any stage.
-
print_experiment_map
() Print the representations for each record.
-
-
class
curifactory.dag.
ExecutionNode
(record: curifactory.record.Record, stage_name: str) Represents a particular stage for a particular record - a node in the overall experiment graph.
- Parameters
record (Record) – The record in which this stage would execute.
stage_name (str) – The name of the stage that would execute.
Methods:
chain_rep
()Return this node represented as a tuple of the record index and stage name.
string_rep
([level])Recursively collect and return this node’s index and name and that of its subtrees.
Attributes:
dependencies
Dependencies are the ‘subtree’ - the other nodes/stages that create the outputs that match this node’s inputs.
parent
The parent is a node that depends on this one/uses its outputs.
-
chain_rep
() → tuple Return this node represented as a tuple of the record index and stage name.
-
dependencies
: list Dependencies are the ‘subtree’ - the other nodes/stages that create the outputs that match this node’s inputs.
-
parent
: curifactory.dag.ExecutionNode The parent is a node that depends on this one/uses its outputs. (This means that the same execution node might appear in multiple execution trees if its output is used by more than one other stage.)
-
string_rep
(level=0) → str Recursively collect and return this node’s index and name and that of its subtrees.
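A recursive representation with a `level` argument might look like the sketch below. The `Node` class and the exact output format (two spaces of indentation per level, `index: name` per line) are assumptions made for illustration; the real output format is not documented here.

```python
class Node:
    """Hypothetical stand-in for ExecutionNode."""

    def __init__(self, record_index, stage_name):
        self.record_index = record_index
        self.stage_name = stage_name
        self.dependencies = []

    def chain_rep(self):
        """Non-recursive representation: (record index, stage name)."""
        return (self.record_index, self.stage_name)

    def string_rep(self, level=0):
        """Recursive representation, indenting each dependency one level deeper."""
        line = "  " * level + f"{self.record_index}: {self.stage_name}\n"
        return line + "".join(d.string_rep(level + 1) for d in self.dependencies)


load = Node(0, "load")
train = Node(0, "train")
train.dependencies.append(load)
report = Node(0, "report")
report.dependencies.extend([load, train])

text = report.string_rep()
```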