Specialized Record Expressions¶
While every operation could in principle be reduced to a function, doing so would sometimes be overly simplistic. Instead, it is helpful to define a small number of specialized expression constructs.
These constructs should represent genuinely different expression types rather than syntactic sugar. For example, CAST and EXTRACT are SQL operations that are presented using specialized syntax, yet they can easily be modeled using a function paradigm with minimal complexity.
Literal Expressions¶
For each data type, it is possible to create a literal value for that data type. The representation depends on the serialization format. Literal expressions include both a type literal and a possibly null value.
Nested Type Constructor Expressions¶
These expressions allow structs, lists, and maps to be constructed from a set of expressions. For example, they allow a struct expression like (field 0 - field 1, field 0 + field 1) to be represented.
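As a rough illustration of the struct example above, the following Python sketch builds that struct from one input record. The function name `build_struct` and the use of a tuple to stand in for a struct are assumptions made for this example, not part of Substrait.

```python
def build_struct(record):
    """Return the struct (field 0 - field 1, field 0 + field 1) for one record.

    `record` is any indexable sequence; a tuple stands in for a struct here.
    """
    field0, field1 = record[0], record[1]
    return (field0 - field1, field0 + field1)


result = build_struct((7, 3))
```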
Cast Expression¶
To convert a value from one type to another, Substrait defines a cast expression. A cast expression declares an expected type, an input argument, and an enumeration specifying failure behavior, which indicates whether the cast should return null on failure or throw an exception.
Note that Substrait always requires a cast expression whenever the current type is not exactly equal to (one of) the expected types. For example, it is illegal to directly pass a value of type i8[0] to a function that only supports an i8?[0] argument.
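The two failure behaviors can be sketched in Python as follows. This is only an illustration: `FailureBehavior` and `cast_to_int` are hypothetical names, and Python's `int()` stands in for a string-to-integer cast. It is not Substrait's actual API or enumeration.

```python
from enum import Enum


class FailureBehavior(Enum):
    """Hypothetical stand-in for the cast expression's failure enumeration."""
    RETURN_NULL = "return_null"
    THROW_EXCEPTION = "throw_exception"


def cast_to_int(value, on_failure):
    """Cast a string to an integer, applying the requested failure behavior."""
    try:
        return int(value)
    except ValueError:
        if on_failure is FailureBehavior.RETURN_NULL:
            return None  # null on failure
        raise  # surface the error on failure
```

With this sketch, `cast_to_int("hello", FailureBehavior.RETURN_NULL)` yields `None`, while the same call with `FailureBehavior.THROW_EXCEPTION` raises a `ValueError`.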
If Expression¶
An if expression is composed of one if clause, zero or more else if clauses, and an else clause. In pseudocode, it is envisioned as:
if <boolean expression> then <result expression 1>
else if <boolean expression> then <result expression 2> (zero or more times)
else <result expression 3>
When an if expression is declared, all return expressions must have the same type.
Shortcut Behavior¶
An if expression is expected to logically short-circuit on a positive outcome: once a condition evaluates to true, the remaining else if and else branches are skipped, so a skipped branch cannot cause an error. For example, the following should not throw an error even though the cast operation would fail if it were evaluated.
if 'value' = 'value' then 0
else cast('hello' as integer)
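One way to picture this behavior is a small Python evaluator in which every condition and result is a zero-argument callable, so a branch is only computed when it is selected. The names `eval_if` and `failing_cast` are hypothetical and exist only for this sketch.

```python
def eval_if(clauses, else_result):
    """Evaluate an if expression with short-circuit semantics.

    clauses: list of (condition, result) pairs, each a zero-argument callable.
    else_result: zero-argument callable for the else branch.
    """
    for condition, result in clauses:
        if condition():
            return result()  # later branches are never evaluated
    return else_result()


def failing_cast():
    # Stands in for cast('hello' as integer), which fails if evaluated.
    return int("hello")


# Mirrors the example above: the condition is true, so the cast never runs.
value = eval_if([(lambda: "value" == "value", lambda: 0)], failing_cast)
```

Because the else branch is passed as a callable rather than a computed value, the failing cast is never invoked and `value` is `0`.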
Switch Expression¶
Switch expressions allow a selection of alternate branches based on the value of a given expression. They are an optimized form of a generic if expression in which every condition is an equality comparison against the same value. In pseudocode:
switch(value)
<value 1> => <return 1> (1 or more times)
<else> => <return default>
Return values for a switch expression must all be of identical type.
Shortcut Behavior¶
As in if expressions, switch expression evaluation should not be interrupted by “roads not taken”.
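A minimal Python sketch of this evaluation order, under the assumption that each branch result is supplied as a zero-argument callable so that unselected branches are never computed. `eval_switch` is a hypothetical name used only here.

```python
def eval_switch(value, cases, default):
    """Evaluate a switch expression.

    cases: list of (match_value, result) pairs; result is a zero-argument callable.
    default: zero-argument callable for the else branch.
    """
    for match_value, result in cases:
        if value == match_value:
            return result()  # only the matching branch is evaluated
    return default()


cases = [
    (1, lambda: "one"),
    (2, lambda: "two"),
    (3, lambda: int("hello")),  # would fail, but only if 3 is selected
]
selected = eval_switch(2, cases, lambda: "other")
```

Here `selected` is `"two"`, and the failing branch for `3` is a "road not taken" that has no effect on the result.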
Or List Equality Expression¶
A specialized structure that is often used is a large list of possible values. In SQL, these are typically large IN lists. They can be composed from one or more fields. There are two common patterns, single value and multi value. In pseudocode they are represented as:
Single Value:
expression, [<value1>, <value2>, ... <valueN>]
Multi Value:
[expressionA, expressionB], [[value1a, value1b], [value2a, value2b], ... [valueNa, valueNb]]
For single value expressions, these are a compact equivalent of expression = value1 OR expression = value2 OR .. OR expression = valueN. When using an expression of this type, two things are required: the test expression and all of the value expressions it is compared against must be of the same type, and a function signature for equality must be available for that type.
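Both patterns can be sketched in Python as below. The single value form follows the OR chain described above. The multi value form is written under the assumption that a row matches when every expression equals the value in the same position, as with SQL row-value IN lists. The function names are hypothetical, and null handling is deliberately left out.

```python
def in_single(expression, values):
    """expression = value1 OR expression = value2 OR ... OR expression = valueN."""
    return any(expression == v for v in values)


def in_multi(expressions, value_rows):
    """True if some row equals the test expressions position by position.

    Assumed semantics: (a = v1a AND b = v1b) OR (a = v2a AND b = v2b) OR ...
    """
    return any(
        len(row) == len(expressions)
        and all(e == v for e, v in zip(expressions, row))
        for row in value_rows
    )
```

For example, `in_single(3, [1, 2, 3])` is true, and `in_multi([1, "a"], [[1, "b"], [1, "a"]])` is true because the second row matches in both positions.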
Execution Context Variables¶
Execution context variables are a special class of expressions whose behavior depends on the current execution context. The evaluation of these variables is controlled by the execution_behavior setting described below.
| Execution Context Variables | Description | Return Type |
|---|---|---|
| current_date | a variable containing the current date | date |
| current_timestamp | a variable containing the current timestamp in current_timezone | PRECISION_TIMESTAMP_TZ |
| current_timezone | a variable containing the current session timezone as a string defined by IANA timezone database (https://www.iana.org/time-zones). | string |
Execution Behavior¶
The execution behavior settings in a Substrait Plan control how execution context variables are evaluated during plan execution. This is specified in the Plan message’s execution_behavior field.
The execution behavior defines a VariableEvaluationMode that controls the scope and frequency of execution context variable evaluation:
- VARIABLE_EVALUATION_MODE_PER_PLAN: Variables are evaluated once per Substrait plan execution. All records in a single plan execution will see the same values for execution context variables like current_date and current_timestamp.
- VARIABLE_EVALUATION_MODE_PER_RECORD: Variables are evaluated once per record during execution. Each record may see different values for execution context variables if the execution context changes between records.
This setting is particularly important for time-based functions where the evaluation time affects the returned value. For example, with VARIABLE_EVALUATION_MODE_PER_PLAN mode, all rows processed in a single plan execution will have the same current_date value, while with VARIABLE_EVALUATION_MODE_PER_RECORD mode, the date could potentially change between rows if the plan spans a date boundary during execution.
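The difference between the two modes can be simulated in Python with an injected clock. Everything here is an assumption made for illustration: the `VariableEvaluationMode` class only mirrors the mode names above, and `run_plan` and `make_clock` are hypothetical helpers, not part of any Substrait library.

```python
from datetime import date, timedelta
from enum import Enum


class VariableEvaluationMode(Enum):
    """Illustrative mirror of the two modes named above."""
    PER_PLAN = "VARIABLE_EVALUATION_MODE_PER_PLAN"
    PER_RECORD = "VARIABLE_EVALUATION_MODE_PER_RECORD"


def run_plan(records, mode, clock):
    """Pair each record with current_date; `clock` returns the current date."""
    if mode is VariableEvaluationMode.PER_PLAN:
        plan_date = clock()  # evaluated once for the whole execution
        return [(record, plan_date) for record in records]
    # PER_RECORD: the clock is read again for every record
    return [(record, clock()) for record in records]


def make_clock(start):
    """Hypothetical clock that advances one day each time it is read."""
    state = {"today": start}

    def clock():
        today = state["today"]
        state["today"] = today + timedelta(days=1)
        return today

    return clock


per_plan = run_plan(["a", "b"], VariableEvaluationMode.PER_PLAN, make_clock(date(2024, 1, 1)))
per_record = run_plan(["a", "b"], VariableEvaluationMode.PER_RECORD, make_clock(date(2024, 1, 1)))
```

The exaggerated clock makes the date boundary visible: in `per_plan` both records carry 2024-01-01, while in `per_record` the second record carries 2024-01-02.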