Logical Relations
Read Operator
The read operator is an operator that produces one output. A simple example would be the reading of a Parquet file. It is expected that many types of reads will be added over time.
Signature | Value |
Inputs | 0 |
Outputs | 1 |
Property Maintenance | N/A (no inputs) |
Direct Output Order | Defaults to the schema of the data read after the optional projection (masked complex expression) is applied. |
Read Properties
Property | Description | Required |
Definition | The contents of the read property definition. | Required |
Direct Schema | Defines the schema of the output of the read (before any projection or emit remapping/hiding). | Required |
Filter | A boolean Substrait expression that describes the filter of an iceberg dataset. TBD: define how field referencing works. | Optional, defaults to none. |
Projection | A masked complex expression describing the portions of the content that should be read | Optional, defaults to all of schema |
Output properties | Declaration of orderedness and/or distribution properties this read produces. | Optional, defaults to no properties. |
Properties | A list of name/value pairs associated with the read. | Optional, defaults to empty |
Read Definition Types
Read definition types are built by the community and added to the specification. This is a portion of specification that is expected to grow rapidly.
Virtual Table
Property | Description | Required |
Data | Required | Required |
Files Type
Property | Description | Required |
Items | An array of Items (path or path glob) associated with the read. | Required |
Format per item | Enumeration of available formats. Only current option is PARQUET. | Required |
Slicing parameters per item | Information to use when reading a slice of a file. | Optional |
Slicing Files
A read operation is allowed to only read part of a file. This is convenient, for example, when distributing a read operation across several nodes. The slicing parameters are specified as byte offsets into the file.
Many file formats consist of indivisible “chunks” of data (e.g. Parquet row groups). If this happens the consumer can determine which slice a particular chunk belongs to. For example, one possible approach is that a chunk should only be read if the midpoint of the chunk (dividing by 2 and rounding down) is contained within the asked-for byte range.
Filter Operation
The filter operator eliminates one or more records from the input data based on a boolean filter expression.
Signature | Value |
Inputs | 1 |
Outputs | 1 |
Property Maintenance | Orderedness, Distribution, remapped by emit |
Direct Output Order | The field order as the input. |
Filter Properties
Property | Description | Required |
Input | The relational input. | Required |
Expression | A boolean expression which describes which records are included/excluded. | Required |
Sort Operation
The sort operator reorders a dataset based on one or more identified sort fields and a sorting function for each.
Signature | Value |
Inputs | 1 |
Outputs | 1 |
Property Maintenance | Will update orderedness property to the output of the sort operation. Distribution property only remapped based on emit. |
Direct Output Order | The field order of the input. |
Sort Properties
Property | Description | Required |
Input | The relational input. | Required |
Sort Fields | List of one or more fields to sort by. Uses the same properties as the orderedness property. | One sort field required |
Project Operation
The project operation will produce one or more additional expressions based on the inputs of the dataset.
Signature | Value |
Inputs | 1 |
Outputs | 1 |
Property Maintenance | Distribution maintained, mapped by emit. Orderedness: Maintained if no window operations. Extended to include projection fields if fields are direct references. If window operations are present, no orderedness is maintained. |
Direct Output Order | The field order of the input + the list of new expressions in the order they are declared in the expressions list. |
Project Properties
Property | Description | Required |
Input | The relational input. | Required |
Expressions | List of one or more expressions to add to the input. | At least one expression required |
Cross Product Operation
The cross product operation will combine two separate inputs into a single output. It pairs every record from the left input with every record of the right input.
Signature | Value |
Inputs | 2 |
Outputs | 1 |
Property Maintenance | Distribution is maintained. Orderedness is empty post operation. |
Direct Output Order | The emit order of the left input followed by the emit order of the right input. |
Cross Product Properties
Property | Description | Required |
Left Input | A relational input. | Required |
Right Input | A relational input. | Required |
Join Operation
The join operation will combine two separate inputs into a single output, based on a join expression. A common subtype of joins is an equality join where the join expression is constrained to a list of equality (or equality + null equality) conditions between the two inputs of the join.
Signature | Value |
Inputs | 2 |
Outputs | 1 |
Property Maintenance | Distribution is maintained. Orderedness is empty post operation. Physical relations may provide better property maintenance. |
Direct Output Order | The emit order of the left input followed by the emit order of the right input. |
Join Properties
Property | Description | Required |
Left Input | A relational input. | Required |
Right Input | A relational input. | Required |
Join Expression | A boolean condition that describes whether each record from the left set “match” the record from the right set. Field references correspond to the direct output order of the data. | Required. Can be the literal True. |
Join Type | One of the join types defined below. | Required |
Join Types
Type | Description |
Inner | Return records from the left side only if they match the right side. Return records from the right side only when they match the left side. For each cross input match, return a record including the data from both sides. Non-matching records are ignored. |
Outer | Return all records from both the left and right inputs. For each cross input match, return a record including the data from both sides. For any remaining non-match records, return the record from the corresponding input along with nulls for the opposite input. |
Left | Return all records from the left input. For each cross input match, return a record including the data from both sides. For any remaining non-matching records from the left input, return the left record along with nulls for the right input. |
Right | Return all records from the right input. For each cross input match, return a record including the data from both sides. For any remaining non-matching records from the right input, return the left record along with nulls for the right input. |
Semi | Returns records from the left input. These are returned only if the records have a join partner on the right side. |
Anti | Return records from the left input. These are returned only if the records do not have a join partner on the right side. |
Single | Returns one join partner per entry on the left input. If more than one join partner exists, there are two valid semantics. 1) Only the first match is returned. 2) The system throws an error. If there is no match between the left and right inputs, NULL is returned. |
Set Operation
The set operation encompasses several set-level operations that support combining datasets based, possibly excluding records based on various types of record level matching.
Signature | Value |
Inputs | 2 or more |
Outputs | 1 |
Property Maintenance | Maintains distribution if all inputs have the same ordinal distribution. Orderedness is not maintained. |
Direct Output Order | All inputs are ordinally matched and returned together. All inputs must have matching record types. |
Set Properties
Property | Description | Required |
Primary Input | The primary input of the dataset. | Required |
Secondary Inputs | One or more relational inputs. | At least one required |
Set Operation Type | From list below. | Required |
Set Operation Types
Property | Description |
Minus (Primary) | Returns the primary input excluding any matching records from secondary inputs. |
Minus (Multiset) | Returns the primary input minus any records that are included in all sets. |
Intersection (Primary) | Returns all rows primary rows that intersect at least one secondary input. |
Intersection (Multiset) | Returns all rows that intersect at least one record from each secondary inputs. |
Union Distinct | Returns all the records from each set, removing any rows that are duplicated (within or across sets). |
Union All | Returns all records from each set, allowing duplicates. |
Fetch Operation
The fetch operation eliminates records outside a desired window. Typically corresponds to a fetch/offset SQL clause. Will only returns records between the start offset and the end offset.
Signature | Value |
Inputs | 2 or more |
Outputs | 1 |
Property Maintenance | Maintains distribution and orderedness. |
Direct Output Order | Unchanged from input. |
Fetch Properties
Property | Description | Required |
Input | A relational input, typically with a desired orderedness property. | Required |
Offset | A positive integer. Declares the offset for retrieval of records. | Optional, defaults to 0. |
Count | A positive integer. Declares the number of records that should be returned. | Required |
Aggregate Operation
The aggregate operation groups input data on one or more sets of grouping keys, calculating each measure for each combination of grouping key.
Signature | Value |
Inputs | 1 |
Outputs | 1 |
Property Maintenance | Maintains distribution if all distribution fields are contained in every grouping set. No orderedness guaranteed. |
Direct Output Order | The list of distinct columns from each grouping set (ordered by their first appearance) followed by the list of measures in declaration order, followed by an integer describing the associated particular grouping set the value is derived from. |
Aggregate Properties
Property | Description | Required |
Input | The relational input. | Required |
Grouping Sets | One or more grouping sets. | Optional, required if no measures. |
Per Grouping Set | A list of expression grouping that the aggregation measured should be calculated for. | Optional, defaults to 0. |
Measures | A list of one or more aggregate expressions along with an optional filter. | Optional, required if no grouping sets. |
Write Operator
The write operator is an operator that consumes one output and writes it to storage. A simple example would be writing Parquet files. It is expected that many types of writes will be added over time.
Signature | Value |
Inputs | 1 |
Outputs | 0 |
Property Maintenance | N/A (no output) |
Direct Output Order | N/A (no output) |
Write Properties
Property | Description | Required |
Definition | The contents of the write property definition. | Required |
Field names | The names of all struct fields in breadth-first order. | Required |
Masked Complex Expression | The masking expression applied to the input record prior to write. | Optional, defaults to all |
Rotation description fields | A list of fields that can be used for stream description whenever a stream is reset. | Optional, defaults to none. |
Rotation indicator | An input field ID that describes when the current stream should be “rotated”. Individual write definition types may support the ability to rotate the output into one or more streams. This could mean closing and opening a new file, finishing and restarting a TCP connection, etc. If a rotation indicator is available, it will be 0 except when a rotation should occur. Rotation indication are frequently defined by things like discrete partition values but could be done based on number of records or other arbitrary criteria. | Optional, defaults to none. |
Write Definition Types
Write definition types are built by the community and added to the specification. This is a portion of specification that is expected to grow rapidly.
Virtual Table
Property | Description | Required |
Name | The in-memory name to give the dataset. | Required |
Pin | Whether it is okay to remove this dataset from memory or it should be kept in memory. | Optional, defaults to false. |
Files Type
Property | Description | Required |
Path | A URI to write the data to. Supports the inclusion of field references that are listed as available in properties as a “rotation description field”. | Required |
Format | Enumeration of available formats. Only current option is PARQUET. | Required |
Discussion Points
- How to handle correlated operations?