Lambda Expressions¶
Lambda expressions represent inline, anonymous functions within query plans, enabling operations that require nested computation over data. They are primarily used to express array operations (such as transform or filter) and support higher-order functions that apply computations element-wise over collections. A lambda consists of explicit parameter types and a body expression that references those parameters.
Overview¶
Lambda expressions are a type of expression in Substrait (like IfThen, Subquery, or Nested expressions) that can be passed as arguments to higher-order functions or invoked directly.
Documentation Syntax
This documentation uses the syntax (param: type, ...) -> expression as an illustrative notation to explain lambda concepts in a readable form. There is no formal syntax specified in the Substrait spec for compactly representing lambdas.
Lambda Expression Structure¶
A lambda expression consists of:
| Component | Description | Protobuf Field | Required |
|---|---|---|---|
| Parameters | Struct type defining the lambda’s parameters. Each field in the struct represents a parameter that can be accessed via LambdaParameterReference (a type of FieldReference). The struct’s nullability must be NULLABILITY_REQUIRED. | parameters | Yes |
| Body Expression | The expression to evaluate (can reference parameters via LambdaParameterReference). The type of this expression is the return type of the lambda. | body | Yes |
message Lambda {
  // Parameters this lambda accepts, represented as a struct where each field corresponds
  // to a parameter. Parameters can be accessed using FieldReference with
  // FieldReference.LambdaParameterReference as root_type and StructField to select
  // specific parameters. The struct's nullability must be NULLABILITY_REQUIRED.
  Type.Struct parameters = 1;
  // The lambda body expression. Lambda parameters can be referenced using FieldReference
  // with FieldReference.LambdaParameterReference as root_type.
  Expression body = 2;
}
Type Derivation¶
The type of a lambda expression is a func type. The parameters of the func are the parameters of the lambda. The return type of the func is the type of the lambda's body expression.
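The derivation rule can be illustrated with a small Python sketch. The classes `ParamRef`, `Literal`, and `Multiply` are hypothetical stand-ins invented for this illustration and are not Substrait messages; the rule that multiplication requires two operands of the same type is likewise an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class ParamRef:
    """Reference to parameter number `field` of the current lambda (hypothetical)."""
    field: int


@dataclass(frozen=True)
class Literal:
    """A literal tagged with its type name, e.g. Literal("i32", 2) (hypothetical)."""
    type_name: str
    value: int


@dataclass(frozen=True)
class Multiply:
    """Multiplication of two sub-expressions (hypothetical)."""
    left: "Expr"
    right: "Expr"


Expr = Union[ParamRef, Literal, Multiply]


def body_type(expr: Expr, parameter_types: Tuple[str, ...]) -> str:
    """Infer the type name of a body expression."""
    if isinstance(expr, ParamRef):
        return parameter_types[expr.field]
    if isinstance(expr, Literal):
        return expr.type_name
    left = body_type(expr.left, parameter_types)
    right = body_type(expr.right, parameter_types)
    if left != right:  # assumption of this sketch: operands must agree
        raise TypeError(f"cannot multiply {left} by {right}")
    return left


def lambda_type(parameter_types: Tuple[str, ...], body: Expr) -> Tuple[Tuple[str, ...], str]:
    """Return (parameter types, return type), the two parts of the func type."""
    return parameter_types, body_type(body, parameter_types)


# (x: i32) -> x * 2
double_type = lambda_type(("i32",), Multiply(ParamRef(0), Literal("i32", 2)))
print(double_type)  # (('i32',), 'i32')
```

The return type is never declared on the lambda itself; it falls out of typing the body against the parameter struct.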
Parameter References¶
Lambda parameters are referenced within the lambda body using FieldReferences with LambdaParameterReference as the root type. Lambda parameters are conceptually treated as a struct, where each parameter occupies a position that can be accessed via StructField references.
LambdaParameterReference Fields¶
LambdaParameterReference is a nested message within FieldReference that identifies which lambda scope to reference:
| Field | Description | Values |
|---|---|---|
| steps_out | Number of lambda boundaries to traverse (0 = current lambda) | 0, 1, 2… |
message LambdaParameterReference {
  // Number of lambda boundaries to traverse up for this reference.
  // For nested lambdas:
  //   0 = innermost lambda (current lambda's parameters as a struct)
  //   1 = one lambda level out (outer lambda's parameters as a struct)
  //   2 = two lambda levels out, etc.
  uint32 steps_out = 1;
}
To access a specific parameter, wrap LambdaParameterReference in a FieldReference and use direct_reference with StructField to specify which parameter (field 0 = first parameter, field 1 = second parameter, etc.).
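One way to picture the lookup is as indexing into a stack of parameter structs. The following Python sketch is only an illustration: `resolve_parameter` and the `scopes` list are hypothetical names, and the sketch makes no claim about how any Substrait consumer implements resolution.

```python
from typing import Any, Sequence


def resolve_parameter(scopes: Sequence[Sequence[Any]], steps_out: int, field: int) -> Any:
    """Look up one lambda parameter.

    `scopes` lists the parameter values of every enclosing lambda,
    outermost first, so scopes[-1] belongs to the current lambda.
    """
    if not 0 <= steps_out < len(scopes):
        raise LookupError(f"steps_out={steps_out} does not name an enclosing lambda")
    frame = scopes[len(scopes) - 1 - steps_out]  # walk outward from the innermost scope
    return frame[field]  # StructField index: 0 = first parameter


# Outer lambda was called with (10, 20); the current (inner) lambda with (7,).
scopes = [(10, 20), (7,)]
current_first = resolve_parameter(scopes, steps_out=0, field=0)
outer_second = resolve_parameter(scopes, steps_out=1, field=1)
print(current_first, outer_second)  # 7 20
```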
Simple Example¶
# Represents: `(x: i32) -> x * 2`
#
# message Expression.Lambda
parameters: {
  types: [
    {
      i32: {
        nullability: NULLABILITY_REQUIRED
      }
    }
  ]
}
body: {
  scalar_function: {
    function_reference: 1  # Reference to multiply function
    arguments: [
      {
        # First argument: lambda parameter x
        value: {
          selection: {
            lambda_parameter_reference: {
              steps_out: 0  # 0 = current lambda
            }
            direct_reference: {
              struct_field: {
                field: 0  # 0 = first parameter (x)
              }
            }
          }
        }
      },
      {
        value: {
          literal: {
            i32: 2
          }
        }
      }
    ]
    output_type: {
      i32: {
        nullability: NULLABILITY_REQUIRED
      }
    }
  }
}
Accessing Fields within Parameters¶
Because lambda parameters are accessed using FieldReference, all field navigation mechanisms are available for drilling into complex objects. For example, when a lambda parameter is a struct, you can access deeply nested fields like person.address.city:
# Access nested field within a struct parameter
# Example: lambda parameter is a Person struct with fields:
#   field 0: name (string)
#   field 1: age (i32)
#   field 2: address (struct with fields:)
#     field 0: street (string)
#     field 1: city (string)
#     field 2: zip (string)
#
# This demonstrates accessing: person.address.city
# Equivalent to: lambda_parameter[0].struct_field[2].struct_field[1]
#
# message Expression.FieldReference
lambda_parameter_reference: {
  steps_out: 0  # Current lambda's parameters
}
direct_reference: {
  struct_field: {
    field: 0  # First parameter (person)
    child: {
      struct_field: {
        field: 2  # Third field of person (address)
        child: {
          struct_field: {
            field: 1  # Second field of address (city)
          }
        }
      }
    }
  }
}
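The chain of nested `struct_field` entries behaves like a path of positional indexes. As a rough Python analogy, with structs modeled as tuples and a hypothetical helper `select_path` (the sample values are made up for illustration):

```python
from typing import Any, Sequence


def select_path(parameters: Sequence[Any], path: Sequence[int]) -> Any:
    """Follow a chain of positional field indexes, outermost index first."""
    value: Any = parameters
    for index in path:
        value = value[index]
    return value


# Person struct: (name, age, address), where address is (street, city, zip).
person = ("Ada", 36, ("12 Example St", "London", "N1 0AA"))
parameters = (person,)  # the lambda's parameter struct has one field

# field 0 (person) -> field 2 (address) -> field 1 (city)
city = select_path(parameters, [0, 2, 1])
print(city)  # London
```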
Function Type Syntax¶
In YAML extension definitions, function types are specified using the func keyword with generic type parameters. This notation applies only to extension YAML signatures; in plans, lambdas are always represented as Expression.Lambda with parameters (a struct type) and body.
Single parameter (represents a lambda with 1 field in the parameters struct):
func<any1 -> any2> # Single parameter without parentheses
func<(any1) -> any2> # Single parameter with parentheses (equivalent)
Multiple parameters (represents a lambda with 2+ fields in the parameters struct):
func<(any1, any2) -> any3> # Multiple parameters (parentheses required)
func<(any1, any2, any3) -> any4> # Three parameters
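The parenthesization rule can be summarized with a short Python sketch. `format_func_type` is a hypothetical helper written for this page, not part of any Substrait tooling, and it deliberately rejects an empty parameter list because this section only describes signatures with one or more parameters.

```python
from typing import Sequence


def format_func_type(parameter_types: Sequence[str], return_type: str,
                     parenthesize_single: bool = False) -> str:
    """Render a func<...> signature string from parameter and return type names."""
    if not parameter_types:
        raise ValueError("this sketch only covers one or more parameters")
    if len(parameter_types) == 1 and not parenthesize_single:
        params = parameter_types[0]  # parentheses are optional for one parameter
    else:
        params = "(" + ", ".join(parameter_types) + ")"  # required for two or more
    return f"func<{params} -> {return_type}>"


print(format_func_type(["any1"], "any2"))                            # func<any1 -> any2>
print(format_func_type(["any1"], "any2", parenthesize_single=True))  # func<(any1) -> any2>
print(format_func_type(["any1", "any2"], "any3"))                    # func<(any1, any2) -> any3>
```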
Nullability¶
The Func type has its own nullability field, which applies to the function value itself, not to its return type. A nullable function type (func?<i32 -> i32>) means the function reference may be null, whereas a non-nullable function type with a nullable return type (func<i32 -> i32?>) always refers to an existing function whose result may be null.
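Python's type hints offer a loose analogy for the two cases. The aliases and helpers below are hypothetical, and returning `None` when the function is missing is a choice made by this sketch only; it is not a statement about how Substrait evaluates a null function.

```python
from typing import Callable, Optional

# Analogue of func?<i32 -> i32>: the function value itself may be absent.
NullableFunc = Optional[Callable[[int], int]]

# Analogue of func<i32 -> i32?>: the function always exists, its result may be null.
FuncWithNullableResult = Callable[[int], Optional[int]]


def call_nullable_func(fn: NullableFunc, x: int) -> Optional[int]:
    """The caller has to test the function value before calling it."""
    if fn is None:
        return None  # behavior chosen for this sketch only
    return fn(x)


def half_if_even(x: int) -> Optional[int]:
    """Always callable; yields None (null) for odd inputs."""
    return x // 2 if x % 2 == 0 else None


missing: NullableFunc = None
print(call_nullable_func(missing, 4))            # None: the function was null
print(call_nullable_func(lambda x: x + 1, 4))    # 5
print(half_if_even(4), half_if_even(3))          # 2 None
```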
Example: The transform Function¶
The transform function transforms each element of a list using a lambda. Here’s how it’s defined in the functions_list extension:
%YAML 1.2
---
urn: extension:io.substrait:functions_list
scalar_functions:
  - name: "transform"
    description: >-
      Transforms each element of a list using a lambda function.
      Also known as "map" in functional programming.
      Returns a new list where each element is the result of applying
      the transformer function to the corresponding element in the input list.
      The lambda receives one parameter (the current element) and must return
      the transformed value.
    impls:
      - args:
          - name: input
            value: list<any1>
          - name: transformer
            value: func<any1 -> any2>
        nullability: MIRROR
        return: list<any2>
The func<any1 -> any2> type indicates the lambda accepts one parameter of type any1 and returns type any2. Numbered any types bind consistently: every occurrence of the same label within a signature (for example, any1 in both list<any1> and func<any1 -> any2>) must resolve to the same concrete type.
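As a rough Python sketch of the semantics and of the any1/any2 binding: `transform` below mirrors the element-wise behavior from the description, and `transform_return_type` is a hypothetical checker over type-name strings. Null handling (the MIRROR nullability setting) is intentionally left out of the sketch.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")  # plays the role of any1
B = TypeVar("B")  # plays the role of any2


def transform(values: List[A], transformer: Callable[[A], B]) -> List[B]:
    """Apply `transformer` to each element, preserving order."""
    return [transformer(v) for v in values]


def transform_return_type(input_type: str, lambda_param_type: str,
                          lambda_return_type: str) -> str:
    """Bind any1 from list<any1>, check the lambda agrees, return list<any2>."""
    prefix, suffix = "list<", ">"
    if not (input_type.startswith(prefix) and input_type.endswith(suffix)):
        raise TypeError(f"expected a list type, got {input_type}")
    any1 = input_type[len(prefix):-len(suffix)]
    if lambda_param_type != any1:  # the same label must resolve to the same type
        raise TypeError(f"lambda takes {lambda_param_type}, list holds {any1}")
    return f"list<{lambda_return_type}>"


doubled = transform([1, 2, 3], lambda x: x * 2)
print(doubled)                                              # [2, 4, 6]
print(transform_return_type("list<i32>", "i32", "string"))  # list<string>
```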
Closures¶
Lambda bodies can reference data from outside their parameter list through FieldReferences. References to input records (via RootReference) and outer query records (via OuterReference) work as they do elsewhere in Substrait. A nested lambda can also capture the parameters of an enclosing lambda by using a LambdaParameterReference with steps_out greater than 0.
Outer Lambda Parameters¶
steps_out = 0 refers to the current lambda’s parameters. To reference an enclosing lambda (i.e., a lambda further out in scope), use steps_out > 0 (1 = immediate parent, 2 = grandparent, etc.). Combine this with StructField to access specific parameters from that scope.
# Represents: `(outer_x: i32) -> ((inner_y: i32) -> add(outer_x, inner_y))`
# Demonstrates steps_out:
# - steps_out: 1 with struct_field: 0 -> outer_x
# - steps_out: 0 with struct_field: 0 -> inner_y
#
# message Expression.Lambda
parameters: {types: [{i32: {nullability: NULLABILITY_REQUIRED}}]}
body: {
  lambda: {
    parameters: {types: [{i32: {nullability: NULLABILITY_REQUIRED}}]}
    body: {
      scalar_function: {
        function_reference: 1  # reference to add
        arguments: [
          {value: {selection: {lambda_parameter_reference: {steps_out: 1}, direct_reference: {struct_field: {field: 0}}}}},  # outer_x
          {value: {selection: {lambda_parameter_reference: {steps_out: 0}, direct_reference: {struct_field: {field: 0}}}}}  # inner_y
        ]
      }
    }
  }
}
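The nested lambda above corresponds to an ordinary closure. The Python sketch below spells out the scope list explicitly so the two steps_out values are visible; the function names and the `scopes` list are hypothetical and exist only for this illustration.

```python
from typing import Callable, Tuple


def outer_lambda(outer_params: Tuple[int]) -> Callable[[Tuple[int]], int]:
    """(outer_x: i32) -> ((inner_y: i32) -> add(outer_x, inner_y))"""

    def inner_lambda(inner_params: Tuple[int]) -> int:
        scopes = [outer_params, inner_params]  # outermost first
        outer_x = scopes[-1 - 1][0]  # steps_out: 1, field: 0
        inner_y = scopes[-1 - 0][0]  # steps_out: 0, field: 0
        return outer_x + inner_y

    return inner_lambda


add_three = outer_lambda((3,))  # binds outer_x = 3
result = add_three((4,))        # binds inner_y = 4
print(result)  # 7
```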
Lambda Invocation¶
Lambda expressions can be invoked using the LambdaInvocation expression type, allowing a lambda to be defined and called in a single expression.
A lambda invocation consists of:
| Component | Description | Protobuf Field | Required |
|---|---|---|---|
| Lambda | The inline lambda expression to invoke | lambda | Yes |
| Arguments | A Nested.Struct containing expressions for each lambda parameter. Each field corresponds to a lambda parameter and must evaluate to the matching parameter type. | arguments | Yes |
The arguments field must be a Nested.Struct with exactly as many fields as the lambda has parameters. The type of each expression field must match the corresponding parameter type. The return type is derived from the type of the lambda’s body expression.
message LambdaInvocation {
  // The lambda expression to invoke.
  Lambda lambda = 1;
  // Arguments to pass to the lambda, as a struct expression. The struct must have
  // exactly one Expression field for each lambda parameter, and the expression at
  // each position must have a type that matches the corresponding parameter type.
  Nested.Struct arguments = 2;
}
Example¶
Invoking ((x: i32) -> x * 2)(5) to compute 10:
# Represents the invocation of the lambda `(x: i32) -> x * 2` on argument `5`
# i.e. `((x: i32) -> x * 2)(5)`
#
# message Expression.LambdaInvocation
lambda: {
  # Lambda parameter: x (type i32)
  parameters: {
    types: [
      {
        i32: {
          nullability: NULLABILITY_REQUIRED
        }
      }
    ]
  }
  # Lambda body: x * 2
  body: {
    scalar_function: {
      function_reference: 1  # Reference to multiply function
      arguments: [
        {
          # First argument: lambda parameter x
          value: {
            selection: {
              lambda_parameter_reference: {
                steps_out: 0  # 0 = current lambda
              }
              direct_reference: {
                struct_field: {
                  field: 0  # 0 = first parameter (x)
                }
              }
            }
          }
        },
        {
          # Second argument: literal 2
          value: {
            literal: {
              i32: 2
            }
          }
        }
      ]
      output_type: {
        i32: {
          nullability: NULLABILITY_REQUIRED
        }
      }
    }
  }
}
# Invocation arguments: struct with one field containing 5
arguments: {
  fields: [
    {
      literal: {
        i32: 5
      }
    }
  ]
}
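The arity and type-matching rules for invocation can be sketched in Python. `LambdaSketch` and `invoke` are hypothetical names for this illustration, and types are compared as plain strings, which is far simpler than real Substrait type matching.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence, Tuple


@dataclass(frozen=True)
class LambdaSketch:
    """Hypothetical stand-in for a lambda: parameter type names and a body."""
    parameter_types: Tuple[str, ...]
    body: Callable[[Tuple[Any, ...]], Any]  # receives the parameter struct


def invoke(lam: LambdaSketch, arguments: Sequence[Tuple[str, Any]]) -> Any:
    """Invoke `lam` on (type name, value) pairs, one pair per parameter."""
    if len(arguments) != len(lam.parameter_types):
        raise ValueError(
            f"expected {len(lam.parameter_types)} arguments, got {len(arguments)}")
    for position, ((arg_type, _), param_type) in enumerate(
            zip(arguments, lam.parameter_types)):
        if arg_type != param_type:
            raise TypeError(
                f"argument {position} has type {arg_type}, expected {param_type}")
    return lam.body(tuple(value for _, value in arguments))


# ((x: i32) -> x * 2)(5)
double = LambdaSketch(("i32",), lambda params: params[0] * 2)
result = invoke(double, [("i32", 5)])
print(result)  # 10
```

Checking the argument count and types before evaluating the body mirrors the requirement that the arguments struct has exactly one matching field per lambda parameter.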
See Also¶
- Field References - How to reference data in expressions
- Scalar Functions - General scalar function documentation
- functions_list Extension - Complete list of higher-order functions