Dataset

Datasets map to underlying storage media in [R]DP.

Overview

Datasets are primarily used in the Transformation Engine for two reasons:

To store intermediate or final results of Transformations for later ingest or visualization.
To allow cross-Transformer coordination at high data volumes while providing fault tolerance.

Many Transformers work entirely with Datasets; that is, all of their inputs and outputs are Dataset connections. [R]DP Workflows are the exception: they represent a fundamentally different Transformer paradigm.

There are several types of Datasets available:

INTERNAL_KAFKA Datasets map to streaming topics in [R]DP and allow configuration of the underlying topic settings.
INTERNAL_MINIO Datasets map to buckets, key prefixes, or both in [R]DP’s blob storage.
INTERNAL_ICEBERG Datasets map to Iceberg tables within [R]DP.
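
The three types above can be summarized as a small lookup from type name to backing storage. The sketch below is illustrative only and is not the [R]DP API or schema: the DatasetType enum, the DatasetSketch class, its fields, and the example names and settings (such as "partitions") are all hypothetical. Only the three type names and the storage each maps to come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class DatasetType(Enum):
    """Dataset types named in this document, with the storage each maps to."""
    INTERNAL_KAFKA = "streaming topic"
    INTERNAL_MINIO = "bucket and/or key prefix in blob storage"
    INTERNAL_ICEBERG = "Iceberg table"


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a Dataset definition (not the real schema)."""
    name: str
    kind: DatasetType
    # Placeholder for type-specific options, e.g. topic settings for
    # INTERNAL_KAFKA. The keys used below are assumptions for illustration.
    settings: dict = field(default_factory=dict)

    def describe(self) -> str:
        # Summarize which storage medium backs this Dataset.
        return f"{self.name}: {self.kind.name} -> {self.kind.value}"


# Example Datasets with made-up names and settings.
datasets = [
    DatasetSketch("events", DatasetType.INTERNAL_KAFKA, {"partitions": 3}),
    DatasetSketch("raw-files", DatasetType.INTERNAL_MINIO, {"prefix": "raw/"}),
    DatasetSketch("curated", DatasetType.INTERNAL_ICEBERG),
]

for dataset in datasets:
    print(dataset.describe())
```

Keeping the type as an enum rather than a free-form string makes it harder to reference a Dataset type that does not exist, which is the main point of the sketch.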