Extended Expression

Extended Expression messages are provided for expression-level protocols as an alternative to using a Plan. They mainly target expression-only evaluations, such as those computed in Filter/Project/Aggregation rels. Unlike the original Expression defined in the Substrait protocol, Extended Expression messages carry the additional information needed to completely describe the computation context: the input data schema, the referred function signatures, and the output schema.

Since Extended Expression will be used separately from the Plan rel representation, it needs to include basic fields such as Version.

message ExtendedExpression {
  // Substrait version of the expression. Optional up to 0.17.0, required for later
  // versions.
  Version version = 7;

  // a list of yaml specifications this expression may depend on
  repeated substrait.extensions.SimpleExtensionURI extension_uris = 1;

  // a list of extensions this expression may depend on
  repeated substrait.extensions.SimpleExtensionDeclaration extensions = 2;

  // one or more expression trees, in the same order as in the plan rel
  repeated ExpressionReference referred_expr = 3;

  NamedStruct base_schema = 4;

  // additional extensions associated with this expression.
  substrait.extensions.AdvancedExtension advanced_extensions = 5;

  // A list of com.google.Any entities that this plan may use. Can be used to
  // warn if some embedded message types are unknown. Note that this list may
  // include message types that are ignorable (optimizations) or that are
  // unused. In many cases, a consumer may be able to work with a plan even if
  // one or more message types defined here are unknown.
  repeated string expected_type_urls = 6;

}
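
As a rough illustration of how these fields fit together, the following Python sketch models the message with a plain dataclass. It is not the generated protobuf class: the name ExtendedExpressionSketch, the string-valued version, the dictionary used in place of NamedStruct, and the example column names and types are all assumptions made for brevity. The two checks mirror what the message comments and this page state: a version is required for releases after 0.17.0, and a message carries one or more referred expressions.

```python
from dataclasses import dataclass, field


@dataclass
class ExtendedExpressionSketch:
    """Plain-Python stand-in for ExtendedExpression (not the generated protobuf class)."""

    version: str        # a string here for brevity; the real field is a Version message
    base_schema: dict   # column name -> type label, a simplified stand-in for NamedStruct
    referred_expr: list  # one or more expression trees, kept opaque in this sketch
    extension_uris: list = field(default_factory=list)
    extensions: list = field(default_factory=list)
    expected_type_urls: list = field(default_factory=list)

    def validate(self) -> None:
        """Check the two constraints stated in the message comments and text."""
        if not self.version:
            raise ValueError("version is required for Substrait versions after 0.17.0")
        if not self.referred_expr:
            raise ValueError("at least one referred expression is required")


# Hypothetical example values; the expression is a placeholder string,
# not a real Substrait expression tree.
message = ExtendedExpressionSketch(
    version="0.20.0",
    base_schema={"price": "fp64", "quantity": "i32"},
    referred_expr=["price * quantity"],
)
message.validate()
```

The point of the sketch is only that an Extended Expression is self-describing: the schema, the expressions, and the extension references travel together in one message, with no enclosing Plan.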

Input and output data schema

Similar to base_schema defined in ReadRel, the input data schema describes the name, type, nullability, and layout information of the input data for the target expression evaluation. It also has a field name to define the name of the output data.

Referred expression

An Extended Expression will have one or more referred expressions, each of which can be either an Expression or an AggregateFunction. Additional types of expressions may be added in the future.

For a message with multiple expressions, users may list the expressions in the same order as they occur in the original Plan rel. However, the consumer does NOT have to evaluate them in this order. A consumer only needs to ensure that the columns in the final output are organized in the same order as defined in the message.
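
The ordering rule can be sketched in a few lines of Python. This is a minimal illustration, not a Substrait consumer: the function evaluate_in_any_order, the (output name, callable) pairs standing in for referred expressions, and the sample row are all hypothetical. The sketch deliberately evaluates the expressions in reverse to show that evaluation order is free, then emits the output columns in message order.

```python
def evaluate_in_any_order(referred, row):
    """Evaluate each (output_name, fn) pair and return columns in message order.

    `referred` is a list of (output_name, callable) pairs in the order they
    appear in the message; `row` is a dict of input column values.
    """
    results = {}
    # Evaluate in reverse on purpose: a consumer may pick any evaluation order.
    for index in reversed(range(len(referred))):
        _, fn = referred[index]
        results[index] = fn(row)
    # The output columns must follow the order defined in the message.
    return [(referred[i][0], results[i]) for i in range(len(referred))]


referred = [
    ("total", lambda r: r["a"] + r["b"]),
    ("is_big", lambda r: r["a"] > 10),
]
columns = evaluate_in_any_order(referred, {"a": 3, "b": 4})
print(columns)  # [('total', 7), ('is_big', False)]
```

Keeping results keyed by message position, rather than by completion order, is what lets the evaluation strategy change without affecting the output layout.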

Function extensions

Function extensions work the same for both Extended Expression and the original Expression defined in the Substrait protocol.