Skip to content

Extending

Substrait is a community project and requires consensus about new additions to the specification in order to maintain consistency. The best way to get consensus is to discuss ideas. The main ways to communicate are:

  • Substrait Mailing List
  • Substrait Slack
  • Community Meeting

Minor changes

Simple changes like typos and bug fixes do not require as much effort. File an issue or send a PR and we can discuss it there.

Complex changes

For complex features it is useful to discuss the change first. It will be useful to gather some background information to help get everyone on the same page.

Outline the issue

Language

Every engine has its own terminology. Every Spark user probably knows what an “attribute” is. Velox users will know what a “RowVector” means. Etc. However, Substrait is used by people that come from a variety of backgrounds and you should generally assume that its users do not know anything about your own implementation. As a result, all PRs and discussion should endeavor to use Substrait terminology wherever possible.

Motivation

What problems does this relation solve? If it is a more logical relation then how does it allow users to express new capabilities? If it is more of an internal relation then how does it map to existing logical relations? How is it different than other existing relations? Why do we need this?

Examples

Provide example input and output for the relation. Show example plans. Try and motivate your examples, as best as possible, with something that looks like a real world problem. These will go a long ways towards helping others understand the purpose of a relation.

Alternatives

Discuss what alternatives are out there. Are there other ways to achieve similar results? Do some systems handle this problem differently?

Survey existing implementation

It’s unlikely that this is the first time that this has been done. Figuring out

Prototype the feature

Novel approaches should be implemented as an extension first.

Substrait design principles

Substrait is designed around interoperability so a feature only used by a single system may not be accepted. But don’t dispair! Substrait has a highly developed extension system for this express purpose.

You don’t have to do it alone

If you are hoping to add a feature and these criteria seem intimidating then feel free to start a mailing list discussion before you have all the information and ask for help. Investigating other implementations, in particular, is something that can be quite difficult to do on your own.