Skip to content

Physical Relations

There is no true distinction between logical and physical operations in Substrait. By convention, certain operations are classified as physical, but all operations can be potentially used in any kind of plan. A particular set of transformations or target operators may (by convention) be considered the “physical plan” but this is a characteristic of the system consuming substrait as opposed to a definition within Substrait.

Hash Equijoin Operator

The hash equijoin join operator will build a hash table out of the right input based on a set of join keys. It will then probe that hash table for incoming inputs, finding matches.

Signature Value
Inputs 2
Outputs 1
Property Maintenance Distribution is maintained. Orderedness of the left set is maintained in INNER join cases, otherwise it is eliminated.
Direct Output Order Same as the Join operator.

Hash Equijoin Properties

Property Description Required
Left Input A relational input.(Probe-side) Required
Right Input A relational input.(Build-side) Required
Left Keys References to the fields to join on in the left input. Required
Right Keys References to the fields to join on in the right input. Required
Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults true.
Join Type One of the join types defined in the Join operator. Required

NLJ (Nested Loop Join) Operator

The nested loop join operator does a join by holding the entire right input and then iterating over it using the left input, evaluating the join expression on the Cartesian product of all rows, only outputting rows where the expression is true. Will also include non-matching rows in the OUTER, LEFT and RIGHT operations per the join type requirements.

Signature Value
Inputs 2
Outputs 1
Property Maintenance Distribution is maintained. Orderedness is eliminated.
Direct Output Order Same as the Join operator.

NLJ Properties

Property Description Required
Left Input A relational input. Required
Right Input A relational input. Required
Join Expression A boolean condition that describes whether each record from the left set “match” the record from the right set. Optional. Defaults to true (a Cartesian join).
Join Type One of the join types defined in the Join operator. Required

Merge Equijoin Operator

The merge equijoin does a join by taking advantage of two sets that are sorted on the join keys. This allows the join operation to be done in a streaming fashion.

Signature Value
Inputs 2
Outputs 1
Property Maintenance Distribution is maintained. Orderedness is eliminated.
Direct Output Order Same as the Join operator.

Merge Join Properties

Property Description Required
Left Input A relational input. Required
Right Input A relational input. Required
Left Keys References to the fields to join on in the left input. Required
Right Keys References to the fields to join on in the right input. Reauired
Post Join Predicate An additional expression that can be used to reduce the output of the join operation post the equality condition. Minimizes the overhead of secondary join conditions that cannot be evaluated using the equijoin keys. Optional, defaults true.
Join Type One of the join types defined in the Join operator. Required

Exchange Operator

The exchange operator will redistribute data based on an exchange type definition. Applying this operation will lead to an output that presents the desired distribution.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Orderedness is maintained. Distribution is overwritten based on configuration.
Direct Output Order Order of the input.

Exchange Types

Type Description
Scatter Distribute data using a system defined hashing function that considers one or more fields. For the same type of fields and same ordering of values, the same partition target should be identified for different ExchangeRels
Single Bucket Define an expression that provides a single i32 bucket number. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition.
Multi Bucket Define an expression that provides a List<i32> of bucket numbers. Optionally define whether the expression will only return values within the valid number of partition counts. If not, the system should modulo the return value to determine a target partition. The records should be sent to all bucket numbers provided by the expression.
Broadcast Send all records to all partitions.
Round Robin Send records to each target in sequence. Can follow either exact or approximate behavior. Approximate will attempt to balance the number of records sent to each destination but may not exactly distribute evenly and may send batches of records to each target before moving to the next.

Exchange Properties

Property Description Required
Input The relational input. Required.
Distribution Type One of the distribution types defined above. Required.
Partition Count The number of partitions targeted for output. Optional. If not defined, implementation system should decide the number of partitions. Note that when not defined, single or multi bucket expressions should not be constrained to count.
Expression Mapping Describes a relationship between each partition ID and the destination that partition should be sent to. Optional. A partition may be sent to 0..N locations. Value can either be a URI or arbitrary value.

Merging Capture

A receiving operation that will merge multiple ordered streams to maintain orderedness.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Orderedness and distribution are maintained.
Direct Output Order Order of the input.

Merging Capture Properties

Property Description Required
Blocking Whether the merging should block incoming data. Blocking should be used carefully, based on whether a deadlock can be produced. Optional, defaults to false

Simple Capture

A receiving operation that will merge multiple streams in an arbitrary order.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Orderness is empty after this operation. Distribution are maintained.
Direct Output Order Order of the input.

Naive Capture Properties

Property Description Required
Input The relational input. Required

Top-N Operation

The top-N operator reorders a dataset based on one or more identified sort fields as well as a sorting function. Rather than sort the entire dataset, the top-N will only maintain the total number of records required to ensure a limited output. A top-n is a combination of a logical sort and logical fetch operations.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit.
Direct Output Order The field order of the input.

Top-N Properties

Property Description Required
Input The relational input. Required
Sort Fields List of one or more fields to sort by. Uses the same properties as the orderedness property. One sort field required
Offset A positive integer. Declares the offset for retrieval of records. Optional, defaults to 0.
Count A positive integer. Declares the number of records that should be returned. Required

Hash Aggregate Operation

The hash aggregate operation maintains a hash table for each grouping set to coalesce equivalent tuples.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. No orderness guaranteed.
Direct Output Order Same as defined by Aggregate operation.

Hash Aggregate Properties

Property Description Required
Input The relational input. Required
Grouping Sets One or more grouping sets. Optional, required if no measures.
Per Grouping Set A list of expression grouping that the aggregation measured should be calculated for. Optional, defaults to 0.
Measures A list of one or more aggregate expressions. Implementations may or may not support aggregate ordering expressions. Optional, required if no grouping sets.

Streaming Aggregate Operation

The streaming aggregate operation leverages data ordered by the grouping expressions to calculate data each grouping set tuple-by-tuple in streaming fashion. All grouping sets and orderings requested on each aggregate must be compatible to allow multiple grouping sets or aggregate orderings.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Maintains distribution if all distribution fields are contained in every grouping set. Maintains input ordering.
Direct Output Order Same as defined by Aggregate operation.

Streaming Aggregate Properties

Property Description Required
Input The relational input. Required
Grouping Sets One or more grouping sets. If multiple grouping sets are declared, sets must all be compatible with the input sortedness. Optional, required if no measures.
Per Grouping Set A list of expression grouping that the aggregation measured should be calculated for. Optional, defaults to 0.
Measures A list of one or more aggregate expressions. Aggregate expressions ordering requirements must be compatible with expected ordering. Optional, required if no grouping sets.

Consistent Partition Window Operation

A consistent partition window operation is a special type of project operation where every function is a window function and all of the window functions share the same sorting and partitioning. This allows for the sort and partition to be calculated once and shared between the various function evaluations.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Maintains distribution and ordering.
Direct Output Order Same as Project operator (input followed by each window expression).

Window Properties

Property Description Required
Input The relational input. Required
Window Functions One or more window functions. At least one required.

Expand Operation

The expand operation creates duplicates of input records based on the Expand Fields. Each Expand Field can be a Switching Field or an expression. Switching Fields are described below. If an Expand Field is an expression then its value is consistent across all duplicate rows.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Distribution is maintained if all the distribution fields are consistent fields with direct references. Ordering can only be maintained down to the level of consistent fields that are kept.
Direct Output Order The expand fields followed by an i32 column describing the index of the duplicate that the row is derived from.

Expand Properties

Property Description Required
Input The relational input. Required
Direct Fields Expressions describing the output fields. These refer to the schema of the input. Each Direct Field must be an expression or a Switching Field Required

Switching Field Properties

A switching field is a field whose value is different in each duplicated row. All switching fields in an Expand Operation must have the same number of duplicates.

Property Description Required
Duplicates List of one or more expressions. The output will contain a row for each expression. Required

Hashing Window Operation

A window aggregate operation that will build hash tables for each distinct partition expression.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Maintains distribution. Eliminates ordering.
Direct Output Order Same as Project operator (input followed by each window expression).

Hashing Window Properties

Property Description Required
Input The relational input. Required
Window Expressions One or more window expressions. At least one required.

Streaming Window Operation

A window aggregate operation that relies on a partition/ordering sorted input.

Signature Value
Inputs 1
Outputs 1
Property Maintenance Maintains distribution. Eliminates ordering.
Direct Output Order Same as Project operator (input followed by each window expression).

Streaming Window Properties

Property Description Required
Input The relational input. Required
Window Expressions One or more window expressions. Must be supported by the sortedness of the input. At least one required.