Artifact

Classes:

Artifact([name, cacher])

ArtifactFilter([starting_artifacts, ...])

ArtifactList([name, artifacts])

DBArtifact([name, connection_str])

Artifact that represents a DuckDB connection.

StageReportables()

Functions:

pointer_based_property(name)

pointer_based_property_getter(self, name)

pointer_based_property_setter(self, value, name)

class curifactory.experimental.artifact.Artifact(name=None, cacher=None)

Methods:

artifact_list([building_list])

Recursively builds a list of _all_ artifacts prior to this one.

artifact_list_debug()

artifact_tree()

check_shared_artifact(other_artifact)

Two artifacts are considered equivalent (can be shared) if their hash and name are the same.

compute_hash()

context_names_minus(minus)

copy()

dependencies()

Gets any input artifacts from the compute stage.

determine_overwrite()

If any artifacts upstream from this one have overwrite specified, this artifact needs to be overwritten as well.

filter(search_str)

from_cacher(cacher)

from_list(name, artifacts)

from_metadata([metadata, path])

get()

get_from_db()

load_from_uuid(uuid[, building_stages, ...])

Recursively load the full DAG prior to this artifact and then return this artifact.

map([mapped, need, source])

replace(artifact)

reset_map()

verify()

visualize([g])

Attributes:

artifacts

cache_status

Integer (unofficial enum) referring to whether this artifact is in cache or doesn't even have a cacher.

cacher

Cacher instance/strategy used to save/load the computed object.

compute

The Stage object that outputs this artifact.

computed

A boolean representing whether the compute stage that outputs this artifact has run or not.

context

The Pipeline that "owns" this artifact.

context_name

contextualized_name

db_id

UUID of this artifact within the store database.

generated_time

Timestamp of when the compute stage outputting this artifact ran.

hash_debug

A dictionary of parameters/parameter values that results in the hash for this artifact.

hash_str

The string of the numerical hash encoding the parameters for the stage that produces this artifact.

internal_id

Refers to the id() of the underlying Artifact object.

map_status

Integer (unofficial enum) referring to what will be done with this artifact in a run (e.g. compute, skip, use cache, or overwrite).

name

Name of the artifact

obj

The computed object itself - this is populated either by running the compute stage or by loading it with the cacher if previously computed.

overwrite

Boolean flag for whether to force this artifact's compute stage to run (ignoring cached values).

previous_context_names

Any previous pipeline names through which this artifact has been passed.

artifact_list(building_list=None)

Recursively builds a list of _all_ artifacts prior to this one.

Parameters:

building_list (list)
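
As a rough illustration of the recursive traversal described above, the following sketch uses hypothetical stand-in classes (`SketchStage`, `SketchArtifact`) rather than the real curifactory classes. Only the idea of walking back through each compute stage's inputs comes from the description; the ordering, the identity-based deduplication, and the choice to append the artifact itself at the end are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchStage:
    """Hypothetical stand-in for a stage: it only records its input artifacts."""

    inputs: list[SketchArtifact] = field(default_factory=list)


@dataclass
class SketchArtifact:
    """Hypothetical stand-in for an artifact, optionally produced by a stage."""

    name: str
    compute: SketchStage | None = None

    def dependencies(self):
        # Inputs of the stage that outputs this artifact (empty if no stage).
        return list(self.compute.inputs) if self.compute is not None else []

    def artifact_list(self, building_list=None):
        # The same list object is threaded through every recursive call.
        if building_list is None:
            building_list = []
        for dep in self.dependencies():
            dep.artifact_list(building_list)
        # Assumption: deduplicate by identity and include this artifact last.
        if not any(existing is self for existing in building_list):
            building_list.append(self)
        return building_list


# A small diamond-shaped graph: "model" depends on "clean" and "raw",
# and "clean" depends on "raw".
raw = SketchArtifact("raw")
clean = SketchArtifact("clean", SketchStage([raw]))
model = SketchArtifact("model", SketchStage([clean, raw]))

names = [a.name for a in model.artifact_list()]
print(names)
```

Passing `building_list` explicitly lets a caller accumulate the upstream artifacts of several targets into one list without duplicates.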

artifact_list_debug()

artifact_tree()

property artifacts

property cache_status

Integer (unofficial enum) referring to whether this artifact is in cache or doesn’t even have a cacher. See __init__.py for possible statuses.

property cacher

Cacher instance/strategy used to save/load the computed object.

check_shared_artifact(other_artifact)

Two artifacts are considered equivalent (can be shared) if their hash and name are the same.
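
The equivalence rule can be sketched as a simple two-field comparison. `NamedHashArtifact` below is a hypothetical stand-in, not the curifactory class, and the hash strings are made-up values; the real method may perform additional checks that are not documented here.

```python
from dataclasses import dataclass


@dataclass
class NamedHashArtifact:
    """Hypothetical stand-in carrying only the two fields the rule mentions."""

    name: str
    hash_str: str

    def check_shared_artifact(self, other_artifact):
        # Equivalent (shareable) only when both the name and the hash match.
        return (
            self.name == other_artifact.name
            and self.hash_str == other_artifact.hash_str
        )


a = NamedHashArtifact("model", "abc123")
b = NamedHashArtifact("model", "abc123")
c = NamedHashArtifact("model", "def456")  # same name, different parameters
d = NamedHashArtifact("metrics", "abc123")  # same hash, different name

print(a.check_shared_artifact(b))
print(a.check_shared_artifact(c))
print(a.check_shared_artifact(d))
```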

property compute

The Stage object that outputs this artifact.

compute_hash()

property computed

A boolean representing whether the compute stage that outputs this artifact has run or not.

property context

The Pipeline that “owns” this artifact.

property context_name

context_names_minus(minus)

Parameters:

minus (str)

property contextualized_name

copy()

property db_id

UUID of this artifact within the store database.

dependencies()

Gets any input artifacts from the compute stage.

Return type:

list[Artifact]

determine_overwrite()

If any artifacts upstream from this one have overwrite specified, this artifact needs to be overwritten as well.

Return type:

bool
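
The upstream propagation described above can be sketched as a recursive check over input artifacts. `OverwriteSketch` is a hypothetical stand-in with a flat `inputs` list in place of a compute stage; whether the real method also updates the artifact's own `overwrite` flag is not stated here, so the sketch only returns the result.

```python
class OverwriteSketch:
    """Hypothetical stand-in: an artifact with inputs and an overwrite flag."""

    def __init__(self, name, inputs=(), overwrite=False):
        self.name = name
        self.inputs = list(inputs)
        self.overwrite = overwrite

    def determine_overwrite(self) -> bool:
        # Overwrite if flagged directly, or if anything upstream is flagged.
        if self.overwrite:
            return True
        return any(dep.determine_overwrite() for dep in self.inputs)


raw = OverwriteSketch("raw", overwrite=True)
clean = OverwriteSketch("clean", inputs=[raw])
model = OverwriteSketch("model", inputs=[clean])
unrelated = OverwriteSketch("unrelated")

print(model.determine_overwrite())
print(unrelated.determine_overwrite())
```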

filter(search_str)

Parameters:

search_str (str)

Return type:

ArtifactFilter

static from_cacher(cacher)

static from_list(name, artifacts)

static from_metadata(metadata=None, path=None)

property generated_time

Timestamp of when the compute stage outputting this artifact ran.

get()

get_from_db()

property hash_debug

A dictionary of parameters/parameter values that results in the hash for this artifact.

property hash_str

The string of the numerical hash encoding the parameters for the stage that produces this artifact.

property internal_id

Refers to the id() of the underlying Artifact object. Because Artifacts also act loosely as pointers, this can be used to determine which underlying object is being pointed to.
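
To illustrate the pointer idea in general terms, the sketch below uses two hypothetical classes, `Target` and `Handle`, that are not part of curifactory. A `Handle` forwards to an underlying object and exposes that object's `id()`, so two handles can be compared to see whether they refer to the same underlying object. How curifactory implements its pointer behavior internally is not shown here.

```python
class Target:
    """Hypothetical underlying object that handles point to."""

    def __init__(self, name):
        self.name = name


class Handle:
    """Hypothetical pointer-like wrapper around a Target."""

    def __init__(self, target):
        self._target = target

    @property
    def internal_id(self):
        # Python's built-in id() identifies the underlying object, not the handle.
        return id(self._target)

    def point_to(self, other):
        # Re-point this handle at whatever the other handle refers to.
        self._target = other._target


shared_target = Target("model")
h1 = Handle(shared_target)
h2 = Handle(shared_target)
h3 = Handle(Target("model"))  # same name, but a distinct underlying object

same_before = h1.internal_id == h3.internal_id
h3.point_to(h1)
same_after = h1.internal_id == h3.internal_id
print(h1.internal_id == h2.internal_id, same_before, same_after)
```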

static load_from_uuid(uuid, building_stages=None, building_artifacts=None)

Recursively load the full DAG prior to this artifact and then return this artifact.

Parameters:
  • building_stages (dict)

  • building_artifacts (dict)

map(mapped=None, need=True, source=None)

Parameters:
  • mapped (dict)

  • need (bool)

property map_status

Integer (unofficial enum) referring to what will be done with this artifact in a run (e.g. compute, skip, use cache, or overwrite). See __init__.py for possible statuses.

property name

Name of the artifact

property obj

The computed object itself - this is populated either by running the compute stage or by loading it with the cacher if previously computed.

property overwrite

Boolean flag for whether to force this artifact’s compute stage to run (ignoring cached values). Setting this to True impacts all downstream artifacts.

property previous_context_names

Any previous pipeline names through which this artifact has been passed. When a pipeline is passed into a new pipeline, or a previous pipeline’s artifact is copied into a new one, that previous pipeline name is retained here.

replace(artifact)

reset_map()

verify()

visualize(g=None, **kwargs)

class curifactory.experimental.artifact.ArtifactFilter(starting_artifacts=None, filter_string='')

Methods:

copy()

filter(search_str)

list()

TODO: maybe this should return ArtifactList instead?

replace(new_artifact)

resolve()

copy()

filter(search_str)

Parameters:

search_str (str)

Return type:

ArtifactFilter

list()

TODO: maybe this should return ArtifactList instead?

replace(new_artifact)

resolve()

Return type:

Artifact

class curifactory.experimental.artifact.ArtifactList(name=None, artifacts=None)

Methods:

append(value)

Parameters:

name (str)

append(value)

class curifactory.experimental.artifact.DBArtifact(name=None, connection_str=None, **kwargs)

Artifact that represents a DuckDB connection.

Methods:

compute_hash()

Parameters:
  • name (str)

  • connection_str (str)

compute_hash()

class curifactory.experimental.artifact.StageReportables

curifactory.experimental.artifact.pointer_based_property(name)

curifactory.experimental.artifact.pointer_based_property_getter(self, name)

curifactory.experimental.artifact.pointer_based_property_setter(self, value, name)