Flyte Entity Identifiers

Flyte entity identifiers uniquely identify every entity in the Flyte platform, covering both definitional entities (tasks, workflows, and launch plans) and their runtime executions. The identifiers are hierarchical, and they underpin traceability, API interactions, and lifecycle management for your data pipelines.

Definitional Entity Identifiers

The Identifier class uniquely identifies definitional entities such as tasks, workflows, and launch plans. Each Identifier has five attributes:

resource_type: The type of Flyte entity being identified. The value is one of the ResourceType constants listed below.
project: The project to which the entity belongs.
domain: The domain within the project where the entity resides.
name: The name of the entity within its project and domain. Several versions of an entity can share the same name.
version: The specific version of the entity. Flyte supports versioning for all definitional entities, allowing for iterative development and deployment.

The ResourceType class defines the available types:

UNSPECIFIED: A default or unknown resource type.
TASK: Identifies a Flyte task definition.
WORKFLOW: Identifies a Flyte workflow definition.
LAUNCH_PLAN: Identifies a Flyte launch plan definition.

Example: Creating a Task Identifier

from flytekit.models.core.identifier import Identifier, ResourceType

# Create an identifier for a specific task
task_id = Identifier(
    resource_type=ResourceType.TASK,
    project="my-project",
    domain="development",
    name="my_data_processing_task",
    version="v1.0.0"
)

print(f"Task Identifier: {task_id}")
# Expected output (the exact string format may vary by flytekit version):
# Task Identifier: TASK:my-project:development:my_data_processing_task:v1.0.0

The to_flyte_idl() and from_flyte_idl() methods convert between the Python object and its protobuf (IDL) message, which is the format used when interacting with the Flyte Admin service.

Execution Identifiers

Flyte also provides identifiers for tracking the runtime execution of workflows, nodes, and tasks. These identifiers are hierarchical, reflecting the nested structure of Flyte executions.

Workflow Execution Identifier

The WorkflowExecutionIdentifier uniquely identifies a specific instance of a workflow execution. It is defined by:

project: The project where the workflow execution occurred.
domain: The domain within the project where the workflow execution occurred.
name: The unique name assigned to this particular workflow execution.

Example: Creating a Workflow Execution Identifier

from flytekit.models.core.identifier import WorkflowExecutionIdentifier

workflow_exec_id = WorkflowExecutionIdentifier(
    project="my-project",
    domain="development",
    name="my_workflow_run_12345"
)

print(f"Workflow Execution Identifier: {workflow_exec_id}")

Node Execution Identifier

The NodeExecutionIdentifier identifies a specific execution of a node within a workflow. It is composed of:

node_id: The unique identifier of the node within the workflow definition.
execution_id: The WorkflowExecutionIdentifier of the parent workflow execution.

Example: Creating a Node Execution Identifier

from flytekit.models.core.identifier import NodeExecutionIdentifier

node_exec_id = NodeExecutionIdentifier(
    node_id="my_task_node",
    execution_id=workflow_exec_id,  # Using the workflow_exec_id from the previous example
)

print(f"Node Execution Identifier: {node_exec_id}")

Task Execution Identifier

The TaskExecutionIdentifier pinpoints a specific attempt of a task execution within a node. This is the most granular execution identifier and includes:

task_id: The Identifier of the task definition being executed.
node_execution_id: The NodeExecutionIdentifier of the parent node execution.
retry_attempt: An integer indicating the specific retry attempt (0 for the first attempt, 1 for the first retry, and so on).

Example: Creating a Task Execution Identifier

from flytekit.models.core.identifier import TaskExecutionIdentifier

# Using task_id and node_exec_id from previous examples
task_exec_id = TaskExecutionIdentifier(
    task_id=task_id,
    node_execution_id=node_exec_id,
    retry_attempt=0,  # First attempt
)

print(f"Task Execution Identifier: {task_exec_id}")
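
Because each execution identifier holds a reference to its parent, a task attempt can be traced back to the workflow execution that produced it. The sketch below illustrates that walk. The helper describe_task_attempt is hypothetical and not part of flytekit, and it assumes the identifier objects expose the fields listed above as attributes of the same names. The SimpleNamespace objects are stand-ins so the snippet runs without a Flyte installation.

```python
from types import SimpleNamespace


def describe_task_attempt(task_exec_id) -> str:
    """Walk from a task execution identifier up to its workflow execution."""
    node_exec_id = task_exec_id.node_execution_id  # parent node execution
    wf_exec_id = node_exec_id.execution_id         # parent workflow execution
    return (
        f"{wf_exec_id.project}/{wf_exec_id.domain}/{wf_exec_id.name}"
        f"/{node_exec_id.node_id}/attempt-{task_exec_id.retry_attempt}"
    )


# Stand-in objects that mirror the attribute names of the identifier classes.
wf = SimpleNamespace(project="my-project", domain="development", name="my_workflow_run_12345")
node = SimpleNamespace(node_id="my_task_node", execution_id=wf)
task_attempt = SimpleNamespace(task_id=None, node_execution_id=node, retry_attempt=0)

path = describe_task_attempt(task_attempt)
print(path)
# my-project/development/my_workflow_run_12345/my_task_node/attempt-0
```

If the attribute-name assumption holds, passing a real TaskExecutionIdentifier to the helper would work the same way.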

Signal Identifier

The SignalIdentifier uniquely identifies a signal within a workflow execution. Signals are typically used for external inputs or gates that pause workflow execution until a condition is met. It is composed of:

signal_id: A user-provided name for the signal (often corresponding to a gate node).
execution_id: The WorkflowExecutionIdentifier of the workflow execution that the signal belongs to.

Example: Creating a Signal Identifier

from flytekit.models.core.identifier import SignalIdentifier

# Using workflow_exec_id from a previous example
signal_id = SignalIdentifier(
    signal_id="wait_for_approval",
    execution_id=workflow_exec_id
)

print(f"Signal Identifier: {signal_id}")

Common Use Cases and Best Practices

Flyte Entity Identifiers are critical for:

Interacting with the Flyte Admin API: All API calls to fetch, update, or manage Flyte entities and executions rely on these identifiers as primary keys. For example, retrieving a specific task definition or monitoring a workflow's status requires providing the correct Identifier or WorkflowExecutionIdentifier.
Monitoring and Observability: Tools and dashboards built on top of Flyte leverage these identifiers to track the progress, status, and logs of individual tasks, nodes, and workflows.
Debugging and Troubleshooting: When an execution fails, the detailed hierarchical identifiers allow pinpointing the exact task attempt that encountered an issue.
Building Custom Flyte Clients and Integrations: Any custom application that needs to programmatically interact with Flyte will extensively use these identifier objects.

Best Practices:

Consistency: Maintain consistent naming conventions for project, domain, and name across your Flyte entities to ensure clarity and ease of management.
Versioning: Always use meaningful versions for definitional entities. This enables reproducible runs and safe updates without affecting ongoing executions.
Hierarchical Navigation: Understand the nested nature of execution identifiers. To construct a TaskExecutionIdentifier, you need its parent NodeExecutionIdentifier, which in turn needs its parent WorkflowExecutionIdentifier. This structure is fundamental to how Flyte organizes execution data.
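
One possible way to keep versions meaningful is to derive them from the content being registered, so that unchanged code maps to the same version and any change produces a new one. The sketch below shows that idea with the standard library only. The helper content_version is hypothetical and not part of flytekit, and hashing the source text is just one convention among several (a release tag or a Git commit hash would serve the same purpose).

```python
import hashlib


def content_version(source_code: str, length: int = 12) -> str:
    """Return a short, deterministic version string for the given source text."""
    digest = hashlib.sha256(source_code.encode("utf-8")).hexdigest()
    return digest[:length]


v1 = content_version("def process(x):\n    return x + 1\n")
v1_again = content_version("def process(x):\n    return x + 1\n")
v2 = content_version("def process(x):\n    return x + 2\n")

print(v1 == v1_again)  # identical source maps to the same version
print(v1 != v2)        # changed source yields a different version
```

The resulting string could then be passed as the version argument when constructing an Identifier.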
