Transformer

Transformers are the primary building block for complex data pipelines in the Transformation Engine.

Overview

Transformers take at least one incoming data connection and turn it into at least one outgoing data connection. The specific type of each data connection is configurable when registering a new Transformer Template.

Each transformer is a function of:

  • Code (a container, Python, JavaScript, etc.)

  • Inputs and Outputs (Kafka Topics, S3 buckets, etc.)

  • Configuration supplied by the Transformation Engine

  • Configuration supplied by clients when instantiating it as part of a pipeline

A transformer can be defined to take any number of inputs and outputs, where each input or output can be a Kafka Topic, a path on MinIO, or a direct connection to another Transformer. For Kubernetes-based transformers (i.e. containers), environment variables are the main way dynamic configuration is provided. Input and output dataset references (e.g. a Kafka Topic to publish or subscribe to, or a MinIO file path) are provided by the Transformation Engine as environment variables, and the names of those variables are controlled by the transformer’s JSON configuration.
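As a rough illustration of the container side, the sketch below reads connection references from the environment. It borrows the variable names SOURCE_TOPIC and DEST_TOPIC from the example template in the Transformer Templates section. The helper name read_connections is hypothetical, and treating a missing input as an error is an assumption rather than documented engine behavior.

```python
import os


def read_connections(env=os.environ):
    """Return (source, dest) connection references found in `env`.

    Hypothetical helper: the variable names come from the example
    template, where SOURCE_TOPIC has arity min=1 and DEST_TOPIC min=0.
    """
    source = env.get("SOURCE_TOPIC")
    if not source:
        # Assumption: an input with min arity 1 must always be present.
        raise RuntimeError("SOURCE_TOPIC is not set")
    # DEST_TOPIC is optional in the example, so None is an acceptable value.
    dest = env.get("DEST_TOPIC")
    return source, dest
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary outside a container.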

Transformer Templates

A Transformer Template is a JSON object that describes how a Transformer can be used within the Transformation Engine. Applying configuration to a Transformer Template within a Pipeline produces a Transformer, which is the instantiated process.

The Transformer Template JSON is what the Transformation Engine uses to map a client’s request to a running process. The following example shows a complete template:

{
    "uid": "649152b4-ac10-4e2d-9a58-f3ec00d6d1c1",
    "name": "My Transformer",
    "description": "Transforms data",
    "status": "available",
    "security_markings": "UNCLASSIFIED",
    "types": ["sink"],
    "inputs": {
        "SOURCE_TOPIC": {
            "display_name": "Source Topic",
            "conn_type": "INTERNAL_KAFKA",
            "arity": {
                "min": 1,
                "max": 1
            }
        }
    },
    "outputs": {
        "DEST_TOPIC": {
            "display_name": "Destination Topic",
            "conn_type": "INTERNAL_KAFKA",
            "arity": {
                "min": 0,
                "max": 1
            }
        }
    },
    "configuration": {
        "environment": [
            {
                "name": "MY_VAR_1",
                "description": "Something useful for UIs",
                "default_value": "83412"
            },
            {
                "name": "MY_VAR_2",
                "description": "A required variable",
                "required": true
            },
            {
                "name": "MY_SECRET",
                "description": "A sensitive variable",
                "sensitive": true
            }
        ],
        "static_environment": [
            {"name": "STATIC_VAR_1", "value": "rdp-backend"},
            {"name": "KEYCLOAK_CLIENT_SECRET", "valueFrom": {"secretKeyRef": {"key": "rdpPlatformClientSecret", "name": "keycloak-realm-init"}}}
        ],
        "engine_provided_environment": [
            "KEYCLOAK_REALM_URL",
            "KEYCLOAK_URL",
            "KEYCLOAK_REALM"
        ],
        "environment_mapping": {
            "KEYCLOAK_REALM_URL": "MY_CUSTOM_REALM_URL"
        }
    },
    "instantiation": {
        "job_image": {
            "image": "ghcr.io/raft-tech/my-transformer:v1.0",
            "pull_policy": "IfNotPresent",
            "image_pull_secret": "regcred-default",
            "default_replicas": 1,
            "args": ["--arg", "val"]
        }
    }
}

The key sections are:

uid

Unique identifier for the transformer. If one is not specified, a random UUID will be generated and assigned.
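The fallback described above can be pictured with a short sketch. The function name ensure_uid is hypothetical, and the use of a version 4 UUID is an assumption based on the word "random"; the engine's actual generation logic is not shown in this document.

```python
import uuid


def ensure_uid(template):
    """Return a copy of `template` that is guaranteed to carry a uid.

    Hypothetical helper: an existing uid is kept unchanged, otherwise a
    random (version 4) UUID is generated and assigned.
    """
    if template.get("uid"):
        return dict(template)
    return {**template, "uid": str(uuid.uuid4())}
```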

name, description, status

Basic metadata shown in the UI.

security_markings

Classification level applied to this template (e.g. UNCLASSIFIED, SECRET).

types

Array of transformer types. Options: sink, source, ai_agent, ai_svc, rdp_workflow, geoserver_client.

inputs / outputs

Named data connections. Each key becomes an environment variable name for container-based transformers. conn_type specifies the storage medium (e.g. INTERNAL_KAFKA). arity sets the minimum (min) and maximum (max) number of connections that are valid.
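To make the arity rule concrete, here is a minimal sketch of a check a client could run before submitting a pipeline. The function check_arity and the shape of the connections argument are hypothetical. It also assumes that a missing min means 0 and a missing max means no upper bound, which the document does not state.

```python
def check_arity(template, connections):
    """Return a list of arity violations (empty if everything is valid).

    `template` is a parsed Transformer Template. `connections` maps
    "inputs"/"outputs" to {connection name: number of connections}.
    """
    problems = []
    for section in ("inputs", "outputs"):
        for name, spec in template.get(section, {}).items():
            count = connections.get(section, {}).get(name, 0)
            arity = spec.get("arity", {})
            low = arity.get("min", 0)   # assumption: default minimum is 0
            high = arity.get("max")     # assumption: None means unbounded
            if count < low or (high is not None and count > high):
                problems.append(
                    f"{section}.{name}: {count} connection(s), "
                    f"expected between {low} and {high}"
                )
    return problems
```

With the example template, one SOURCE_TOPIC connection and no DEST_TOPIC connection passes the check, while omitting SOURCE_TOPIC is reported as a violation.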

configuration.environment

Array of environment variables that clients fill in when creating a pipeline. Each can have a default_value, be marked required, or be flagged sensitive.
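The interplay of default_value, required, and sensitive can be sketched as follows. Both function names are hypothetical, and the precedence shown (client value first, then default, then a required check) is an assumption about how such fields are usually resolved, not a description of the engine's code.

```python
def resolve_environment(declared, supplied):
    """Merge client-supplied values with the declared variables.

    `declared` is the configuration.environment array from a template;
    `supplied` is a {name: value} dict provided by the client.
    """
    resolved, missing = {}, []
    for var in declared:
        name = var["name"]
        if name in supplied:
            resolved[name] = supplied[name]
        elif "default_value" in var:
            resolved[name] = var["default_value"]
        elif var.get("required", False):
            missing.append(name)
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    return resolved


def redact(declared, resolved):
    """Mask values of variables flagged sensitive, e.g. before logging."""
    sensitive = {v["name"] for v in declared if v.get("sensitive")}
    return {k: ("***" if k in sensitive else v) for k, v in resolved.items()}
```

Optional variables with no default and no supplied value are simply left out of the result in this sketch.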

configuration.static_environment

Environment variables set as-is on the container. Supports both literal values and Kubernetes secret references (corev1.EnvVar format).

configuration.engine_provided_environment

Environment variable names populated automatically by the Transformation Engine (e.g. Keycloak URLs).

configuration.environment_mapping

Renames engine-provided variables to custom names expected by your code.
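A minimal sketch of the renaming step, assuming the mapping is keyed by the engine-provided name as in the example template. The function apply_mapping is hypothetical, and whether the engine also keeps the original name is not stated here; this sketch drops it.

```python
def apply_mapping(engine_env, mapping):
    """Rename engine-provided variables according to environment_mapping.

    `engine_env` is a {name: value} dict of engine-provided variables;
    `mapping` is {engine name: custom name}. Unmapped names pass through.
    """
    return {mapping.get(name, name): value for name, value in engine_env.items()}
```

With the example template's mapping, a value provided under KEYCLOAK_REALM_URL would reach the container as MY_CUSTOM_REALM_URL, while KEYCLOAK_URL and KEYCLOAK_REALM keep their names.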

instantiation.job_image

Container image configuration: image name, pull policy, pull secret, default replica count, and optional args.