
functions_string.yaml

This document is generated from functions_string.yaml. The extension URN is extension:io.substrait:functions_string.

Scalar Functions

concat

Concatenate strings. The null_handling option determines whether or not null values will be recognized by the function. If null_handling is set to IGNORE_NULLS, null value arguments will be ignored when strings are concatenated. If set to ACCEPT_NULLS, the result will be null if any argument passed to the concat function is null.

Implementations:

  • concat(input: varchar<L1>, option:null_handling): -> varchar<L1>
  • concat(input: string, option:null_handling): -> string
Options:
  • null_handling ['IGNORE_NULLS', 'ACCEPT_NULLS']
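
The two null_handling modes can be sketched in Python as follows. This is a minimal illustration, not the reference implementation: None stands in for a null value, and the variadic Python signature is an assumption made for convenience.

```python
from typing import Optional


def concat(*args: Optional[str], null_handling: str) -> Optional[str]:
    """Sketch of concat; None models a null argument (assumption)."""
    if null_handling == "IGNORE_NULLS":
        # Null arguments are skipped entirely.
        return "".join(a for a in args if a is not None)
    if null_handling == "ACCEPT_NULLS":
        # Any null argument makes the whole result null.
        if any(a is None for a in args):
            return None
        return "".join(args)
    raise ValueError(f"unknown null_handling: {null_handling}")
```

With IGNORE_NULLS, concat("a", None, "b") yields "ab"; with ACCEPT_NULLS the same call yields None.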

    like

    Are two strings like each other. The case_sensitivity option applies to the match argument.

    Implementations:

    • like(input: varchar<L1>, match: varchar<L2>, option:case_sensitivity): -> boolean
    • like(input: string, match: string, option:case_sensitivity): -> boolean
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    substring

    Extract a substring of a specified length starting from position start. A start value of 1 refers to the first character of the string. When length is not specified, the function will extract a substring starting from position start and ending at the end of the string. The negative_start option applies to the start parameter. WRAP_FROM_END means the index will start from the end of the input and move backwards. The last character has an index of -1, the second to last character has an index of -2, and so on. LEFT_OF_BEGINNING means the returned substring will start from the left of the first character. A start of -1 will begin 2 characters left of the input, while a start of 0 begins 1 character left of the input.

    Implementations:

    • substring(input: varchar<L1>, start: i32, length: i32, option:negative_start): -> varchar<L1>
    • substring(input: string, start: i32, length: i32, option:negative_start): -> string
    • substring(input: fixedchar<L1>, start: i32, length: i32, option:negative_start): -> string
    • substring(input: varchar<L1>, start: i32, option:negative_start): -> varchar<L1>
    • substring(input: string, start: i32, option:negative_start): -> string
    • substring(input: fixedchar<L1>, start: i32, option:negative_start): -> string
    Options:
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING', 'ERROR']
  • negative_start ['WRAP_FROM_END', 'LEFT_OF_BEGINNING']
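
The negative_start modes described above can be sketched in Python as a 1-based window that is clipped to the input. This is an illustration under stated assumptions: a wrapped start that still falls before the first character is clamped to 1, and a start of 0 goes through the same window arithmetic in every mode. Neither case is defined by the text above.

```python
from typing import Optional


def substring(s: str, start: int, length: Optional[int] = None, *,
              negative_start: str) -> str:
    """Sketch of substring with 1-based positions."""
    n = len(s)
    if start < 0:
        if negative_start == "ERROR":
            raise ValueError("negative start is not allowed")
        if negative_start == "WRAP_FROM_END":
            # -1 is the last character, -2 the second to last, and so on.
            # Assumption: clamp to the first character if it wraps too far.
            start = max(n + start + 1, 1)
        elif negative_start != "LEFT_OF_BEGINNING":
            raise ValueError(f"unknown negative_start: {negative_start}")
    # Requested window is [start, end) in 1-based positions.
    end = n + 1 if length is None else start + length
    lo = max(start, 1)      # clip the window to the input
    hi = min(end, n + 1)
    return s[lo - 1:hi - 1] if hi > lo else ""
```

Under LEFT_OF_BEGINNING, substring("hello", -1, 4, ...) covers positions -1 through 2, of which only positions 1 and 2 exist, so the result is "he".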

    regexp_match_substring

    Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be extracted is specified using the occurrence argument. Specifying 1 means the first occurrence will be extracted, 2 means the second occurrence, and so on. The occurrence argument should be a positive integer. The position argument specifies the character at which the search for pattern matches begins. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return the substring matching the full regular expression. Specifying 1 will return the substring matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, the position value is out of range, or the group value is out of range.

    Implementations:

    • regexp_match_substring(input: varchar<L1>, pattern: varchar<L2>, position: i64, occurrence: i64, group: i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>
    • regexp_match_substring(input: string, pattern: string, position: i64, occurrence: i64, group: i64, option:case_sensitivity, option:multiline, option:dotall): -> string
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
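
How position, occurrence, and group interact can be sketched in Python. Note the assumptions: Python's re module stands in for the ICU engine (the two dialects differ in places), CASE_INSENSITIVE_ASCII is not modelled, and returning None when nothing matches is a choice of this sketch, since the text above leaves out-of-range cases undefined.

```python
import re
from typing import Optional


def regexp_match_substring(s: str, pattern: str, position: int = 1,
                           occurrence: int = 1, group: int = 0, *,
                           case_sensitivity: str = "CASE_SENSITIVE",
                           multiline: str = "MULTILINE_DISABLED",
                           dotall: str = "DOTALL_DISABLED") -> Optional[str]:
    """Sketch using Python's re as a stand-in for ICU (assumption)."""
    flags = 0
    if case_sensitivity == "CASE_INSENSITIVE":
        flags |= re.IGNORECASE
    elif case_sensitivity != "CASE_SENSITIVE":
        raise ValueError("case_sensitivity value not modelled in this sketch")
    if multiline == "MULTILINE_ENABLED":
        flags |= re.MULTILINE   # ^ and $ match at every line boundary
    if dotall == "DOTALL_ENABLED":
        flags |= re.DOTALL      # . also matches line terminators
    compiled = re.compile(pattern, flags)
    # position is 1-based; Python's search offset is 0-based.
    matches = compiled.finditer(s, position - 1)
    for i, m in enumerate(matches, start=1):
        if i == occurrence:
            return m.group(group)
    return None  # assumption: no such occurrence
```

For example, with input "a1 b22 c333" and pattern ([a-z])(\d+), asking for occurrence 2 and group 2 returns "22".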

    regexp_match_substring

    Extract a substring that matches the given regular expression pattern. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The first occurrence of the pattern from the beginning of the string is extracted. It returns the substring matching the full regular expression. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile.

    Implementations:

    • regexp_match_substring(input: string, pattern: string, option:case_sensitivity, option:multiline, option:dotall): -> string
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']

    regexp_match_substring_all

    Extract all substrings that match the given regular expression pattern. This will return a list of extracted strings with one value for each occurrence of a match. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The position argument specifies the character at which the search for pattern matches begins. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive integer. The regular expression capture group can be specified using the group argument. Specifying 0 will return substrings matching the full regular expression. Specifying 1 will return substrings matching only the first capture group, and so on. The group argument should be a non-negative integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the position value is out of range, or the group value is out of range.

    Implementations:

    • regexp_match_substring_all(input: varchar<L1>, pattern: varchar<L2>, position: i64, group: i64, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>>
    • regexp_match_substring_all(input: string, pattern: string, position: i64, group: i64, option:case_sensitivity, option:multiline, option:dotall): -> List<string>
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']

    starts_with

    Whether the input string starts with the substring. The case_sensitivity option applies to the substring argument.

    Implementations:

    • starts_with(input: varchar<L1>, substring: varchar<L2>, option:case_sensitivity): -> boolean
    • starts_with(input: varchar<L1>, substring: string, option:case_sensitivity): -> boolean
    • starts_with(input: varchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> boolean
    • starts_with(input: string, substring: string, option:case_sensitivity): -> boolean
    • starts_with(input: string, substring: varchar<L1>, option:case_sensitivity): -> boolean
    • starts_with(input: string, substring: fixedchar<L1>, option:case_sensitivity): -> boolean
    • starts_with(input: fixedchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> boolean
    • starts_with(input: fixedchar<L1>, substring: string, option:case_sensitivity): -> boolean
    • starts_with(input: fixedchar<L1>, substring: varchar<L2>, option:case_sensitivity): -> boolean
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    ends_with

    Whether the input string ends with the substring. The case_sensitivity option applies to the substring argument.

    Implementations:

    • ends_with(input: varchar<L1>, substring: varchar<L2>, option:case_sensitivity): -> boolean
    • ends_with(input: varchar<L1>, substring: string, option:case_sensitivity): -> boolean
    • ends_with(input: varchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> boolean
    • ends_with(input: string, substring: string, option:case_sensitivity): -> boolean
    • ends_with(input: string, substring: varchar<L1>, option:case_sensitivity): -> boolean
    • ends_with(input: string, substring: fixedchar<L1>, option:case_sensitivity): -> boolean
    • ends_with(input: fixedchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> boolean
    • ends_with(input: fixedchar<L1>, substring: string, option:case_sensitivity): -> boolean
    • ends_with(input: fixedchar<L1>, substring: varchar<L2>, option:case_sensitivity): -> boolean
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    contains

    Whether the input string contains the substring. The case_sensitivity option applies to the substring argument.

    Implementations:

    • contains(input: varchar<L1>, substring: varchar<L2>, option:case_sensitivity): -> boolean
    • contains(input: varchar<L1>, substring: string, option:case_sensitivity): -> boolean
    • contains(input: varchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> boolean
    • contains(input: string, substring: string, option:case_sensitivity): -> boolean
    • contains(input: string, substring: varchar<L1>, option:case_sensitivity): -> boolean
    • contains(input: string, substring: fixedchar<L1>, option:case_sensitivity): -> boolean
    • contains(input: fixedchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> boolean
    • contains(input: fixedchar<L1>, substring: string, option:case_sensitivity): -> boolean
    • contains(input: fixedchar<L1>, substring: varchar<L2>, option:case_sensitivity): -> boolean
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    strpos

    Return the position of the first occurrence of a string in another string. The first character of the string is at position 1. If no occurrence is found, 0 is returned. The case_sensitivity option applies to the substring argument.

    Implementations:

    • strpos(input: string, substring: string, option:case_sensitivity): -> i64
    • strpos(input: varchar<L1>, substring: varchar<L1>, option:case_sensitivity): -> i64
    • strpos(input: fixedchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> i64
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
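
The 1-based result and the 0 returned for "not found" map neatly onto Python's str.find, which is 0-based and returns -1 on failure. A minimal sketch, covering only the CASE_SENSITIVE behavior:

```python
def strpos(input: str, substring: str) -> int:
    """Sketch of case-sensitive strpos with 1-based positions."""
    # str.find returns -1 when the substring is absent, so adding 1
    # yields 0 for "not found" and a 1-based position otherwise.
    return input.find(substring) + 1
```

For example, strpos("substrait", "str") is 4 and strpos("abc", "z") is 0.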

    regexp_strpos

    Return the position of an occurrence of the given regular expression pattern in a string. The first character of the string is at position 1. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The position argument specifies the character at which the search for pattern matches begins. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive integer. The occurrence argument specifies which occurrence to return the position of. Specifying 1 means the position of the first occurrence will be returned, 2 means the position of the second occurrence, and so on. The occurrence argument should be a positive integer. If no occurrence is found, 0 is returned. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the occurrence value is out of range, or the position value is out of range.

    Implementations:

    • regexp_strpos(input: varchar<L1>, pattern: varchar<L2>, position: i64, occurrence: i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    • regexp_strpos(input: string, pattern: string, position: i64, occurrence: i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']

    count_substring

    Return the number of non-overlapping occurrences of a substring in an input string. The case_sensitivity option applies to the substring argument.

    Implementations:

    • count_substring(input: string, substring: string, option:case_sensitivity): -> i64
    • count_substring(input: varchar<L1>, substring: varchar<L2>, option:case_sensitivity): -> i64
    • count_substring(input: fixedchar<L1>, substring: fixedchar<L2>, option:case_sensitivity): -> i64
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
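
"Non-overlapping" means that once a match is counted, the search resumes after its last character. Python's str.count has the same behavior, so it serves as a small sketch. Using casefold for CASE_INSENSITIVE is an assumption of this sketch, and CASE_INSENSITIVE_ASCII is not modelled.

```python
def count_substring(input: str, substring: str,
                    case_sensitivity: str = "CASE_SENSITIVE") -> int:
    """Sketch of count_substring; counts non-overlapping occurrences."""
    if case_sensitivity == "CASE_INSENSITIVE":
        # Assumption: Unicode case folding approximates this mode.
        input, substring = input.casefold(), substring.casefold()
    elif case_sensitivity != "CASE_SENSITIVE":
        raise ValueError("case_sensitivity value not modelled in this sketch")
    # str.count scans left to right and never reuses a matched character.
    return input.count(substring)
```

For example, "aaaa" contains "aa" twice (positions 1 and 3), not three times, and "banana" contains "ana" once.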

    regexp_count_substring

    Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The position argument specifies the character at which the search for pattern matches begins. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive integer. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile or the position value is out of range.

    Implementations:

    • regexp_count_substring(input: string, pattern: string, position: i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    • regexp_count_substring(input: varchar<L1>, pattern: varchar<L2>, position: i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    • regexp_count_substring(input: fixedchar<L1>, pattern: fixedchar<L2>, position: i64, option:case_sensitivity, option:multiline, option:dotall): -> i64
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']

    regexp_count_substring

    Return the number of non-overlapping occurrences of a regular expression pattern in an input string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The match starts at the first character of the input string. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile.

    Implementations:

    • regexp_count_substring(input: string, pattern: string, option:case_sensitivity, option:multiline, option:dotall): -> i64
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']

    replace

    Replace all occurrences of the substring with the replacement string. The case_sensitivity option applies to the substring argument.

    Implementations:

    • replace(input: string, substring: string, replacement: string, option:case_sensitivity): -> string
    • replace(input: varchar<L1>, substring: varchar<L2>, replacement: varchar<L3>, option:case_sensitivity): -> varchar<L1>
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']

    concat_ws

    Concatenate strings together separated by a separator.

    Implementations:

    • concat_ws(separator: string, string_arguments: string): -> string
    • concat_ws(separator: varchar<L2>, string_arguments: varchar<L1>): -> varchar<L1>

    repeat

    Repeat a string count number of times.

    Implementations:

    • repeat(input: string, count: i64): -> string
    • repeat(input: varchar<L1>, count: i64): -> varchar<L1>

    reverse

    Returns the string in reverse order.

    Implementations:

    • reverse(input: string): -> string
    • reverse(input: varchar<L1>): -> varchar<L1>
    • reverse(input: fixedchar<L1>): -> fixedchar<L1>

    replace_slice

    Replace a slice of the input string. A specified ‘length’ of characters will be deleted from the input string beginning at the ‘start’ position and will be replaced by a new string. A start value of 1 indicates the first character of the input string. If ‘start’ is negative or zero, or greater than the length of the input string, a null string is returned. If ‘length’ is negative, a null string is returned. If ‘length’ is zero, the new string is inserted at the specified ‘start’ position and no characters are deleted. If ‘length’ is greater than the number of characters remaining in the input string, deletion will occur up to the last character of the input string.

    Implementations:

    • replace_slice(input: string, start: i64, length: i64, replacement: string): -> string
    • replace_slice(input: varchar<L1>, start: i64, length: i64, replacement: varchar<L2>): -> varchar<L1>
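
The edge cases above can be sketched in Python. This is an illustration rather than the reference behavior; in particular, modelling the "null string" as None is an assumption.

```python
from typing import Optional


def replace_slice(s: str, start: int, length: int,
                  replacement: str) -> Optional[str]:
    """Sketch of replace_slice; None models a null result (assumption)."""
    # Out-of-range start or negative length yields null.
    if start <= 0 or start > len(s) or length < 0:
        return None
    i = start - 1  # convert the 1-based start to a 0-based index
    # Slicing past the end is safe in Python, so an oversized length
    # simply deletes through the last character.
    return s[:i] + replacement + s[i + length:]
```

For example, replace_slice("abcdef", 2, 3, "XY") gives "aXYef", and a length of 0 turns the call into a pure insertion: replace_slice("abcdef", 3, 0, "-") gives "ab-cdef".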

    lower

    Transform the string to lower case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Implementations:

    • lower(input: string, option:char_set): -> string
    • lower(input: varchar<L1>, option:char_set): -> varchar<L1>
    • lower(input: fixedchar<L1>, option:char_set): -> fixedchar<L1>
    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    upper

    Transform the string to upper case characters. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Implementations:

    • upper(input: string, option:char_set): -> string
    • upper(input: varchar<L1>, option:char_set): -> varchar<L1>
    • upper(input: fixedchar<L1>, option:char_set): -> fixedchar<L1>
    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    swapcase

    Transform the string’s lowercase characters to uppercase and uppercase characters to lowercase. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Implementations:

    • swapcase(input: string, option:char_set): -> string
    • swapcase(input: varchar<L1>, option:char_set): -> varchar<L1>
    • swapcase(input: fixedchar<L1>, option:char_set): -> fixedchar<L1>
    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    capitalize

    Capitalize the first character of the input string. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Implementations:

    • capitalize(input: string, option:char_set): -> string
    • capitalize(input: varchar<L1>, option:char_set): -> varchar<L1>
    • capitalize(input: fixedchar<L1>, option:char_set): -> fixedchar<L1>
    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    title

    Converts the input string into titlecase. Capitalize the first character of each word in the input string except for articles (a, an, the). Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Implementations:

    • title(input: string, option:char_set): -> string
    • title(input: varchar<L1>, option:char_set): -> varchar<L1>
    • title(input: fixedchar<L1>, option:char_set): -> fixedchar<L1>
    Options:
  • char_set ['UTF8', 'ASCII_ONLY']

    initcap

    Capitalizes the first character of each word in the input string, including articles, and lowercases the rest. Implementation should follow the utf8_unicode_ci collations according to the Unicode Collation Algorithm described at http://www.unicode.org/reports/tr10/.

    Implementations:

    • initcap(input: string, option:char_set): -> string
    • initcap(input: varchar<L1>, option:char_set): -> varchar<L1>
    • initcap(input: fixedchar<L1>, option:char_set): -> fixedchar<L1>
    Options:
  • char_set ['UTF8', 'ASCII_ONLY']
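
The difference between title and initcap is easiest to see side by side. The sketch below rests on several assumptions the descriptions do not settle: words are separated by single spaces, title leaves the remaining characters of each word untouched, an article is left as-is even when it is the first word, and the char_set option is ignored.

```python
ARTICLES = {"a", "an", "the"}  # articles named in the title description


def initcap(s: str) -> str:
    """Capitalize every word, articles included, and lowercase the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in s.split(" "))


def title(s: str) -> str:
    """Capitalize the first character of every word except articles."""
    return " ".join(
        w if w.lower() in ARTICLES else w[:1].upper() + w[1:]
        for w in s.split(" ")
    )
```

For example, title("the tale of a fox") gives "the Tale Of a Fox", while initcap("the tale of a fox") gives "The Tale Of A Fox".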

    char_length

    Return the number of characters in the input string. The length includes trailing spaces.

    Implementations:

    • char_length(input: string): -> i64
    • char_length(input: varchar<L1>): -> i64
    • char_length(input: fixedchar<L1>): -> i64

    bit_length

    Return the number of bits in the input string.

    Implementations:

    • bit_length(input: string): -> i64
    • bit_length(input: varchar<L1>): -> i64
    • bit_length(input: fixedchar<L1>): -> i64

    octet_length

    Return the number of bytes in the input string.

    Implementations:

    • octet_length(input: string): -> i64
    • octet_length(input: varchar<L1>): -> i64
    • octet_length(input: fixedchar<L1>): -> i64
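
The three length functions differ only for non-ASCII or multi-byte input. The sketch below assumes the string is stored as UTF-8 and that a "character" is a Unicode code point; the descriptions above do not state either, so treat both as assumptions.

```python
def char_length(s: str) -> int:
    # Number of code points, trailing spaces included.
    return len(s)


def octet_length(s: str) -> int:
    # Number of bytes, assuming a UTF-8 encoding.
    return len(s.encode("utf-8"))


def bit_length(s: str) -> int:
    # Eight bits per byte.
    return 8 * octet_length(s)
```

For the input "h\u00e9llo" (an "e" with an acute accent as the second character), char_length is 5, octet_length is 6, and bit_length is 48.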

    regexp_replace

    Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The occurrence of the pattern to be replaced is specified using the occurrence argument. Specifying 1 means only the first occurrence will be replaced, 2 means the second occurrence, and so on. Specifying 0 means all occurrences will be replaced. The position argument specifies the character at which the search for pattern matches begins. Specifying 1 means to search for matches starting at the first character of the input string, 2 means the second character, and so on. The position argument should be a positive integer. The replacement string can refer to capture groups using numbered backreferences. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile, the replacement contains an illegal back-reference, the occurrence value is out of range, or the position value is out of range.

    Implementations:

    • regexp_replace(input: string, pattern: string, replacement: string, position: i64, occurrence: i64, option:case_sensitivity, option:multiline, option:dotall): -> string
    • regexp_replace(input: varchar<L1>, pattern: varchar<L2>, replacement: varchar<L3>, position: i64, occurrence: i64, option:case_sensitivity, option:multiline, option:dotall): -> varchar<L1>
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']
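
The roles of position and occurrence, including the special meaning of occurrence 0, can be sketched in Python. As before, Python's re module is only a stand-in for ICU: in particular, Python writes backreferences as \1 in the replacement, whereas ICU uses a different syntax. The options are omitted here to keep the sketch short.

```python
import re


def regexp_replace(s: str, pattern: str, replacement: str,
                   position: int = 1, occurrence: int = 0) -> str:
    """Sketch using Python's re as a stand-in for ICU (assumption)."""
    compiled = re.compile(pattern)
    pieces = []
    last = 0  # end of the previously copied region
    # position is 1-based; Python's search offset is 0-based.
    for i, m in enumerate(compiled.finditer(s, position - 1), start=1):
        if occurrence == 0 or i == occurrence:
            pieces.append(s[last:m.start()])
            # expand() substitutes numbered backreferences such as \1.
            pieces.append(m.expand(replacement))
            last = m.end()
            if occurrence != 0:
                break  # only one occurrence was requested
    pieces.append(s[last:])
    return "".join(pieces)
```

For example, replacing \d with "#" in "a1b2c3" gives "a#b#c#" for occurrence 0 and "a1b#c3" for occurrence 2.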

    regexp_replace

    Search a string for a substring that matches a given regular expression pattern and replace it with a replacement string. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The replacement string can refer to capture groups using numbered backreferences. All occurrences of the pattern will be replaced. The search for matches starts at the first character of the input. The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string. Behavior is undefined if the regex fails to compile or the replacement contains an illegal back-reference.

    Implementations:

    • regexp_replace(input: string, pattern: string, replacement: string, option:case_sensitivity, option:multiline, option:dotall): -> string
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']

    ltrim

    Remove any occurrence of the characters from the left side of the string. If no characters are specified, spaces are removed.

    Implementations:

    • ltrim(input: varchar<L1>, characters: varchar<L2>): -> varchar<L1>
    • ltrim(input: string, characters: string): -> string

    rtrim

    Remove any occurrence of the characters from the right side of the string. If no characters are specified, spaces are removed.

    Implementations:

    • rtrim(input: varchar<L1>, characters: varchar<L2>): -> varchar<L1>
    • rtrim(input: string, characters: string): -> string

    trim

    Remove any occurrence of the characters from the left and right sides of the string. If no characters are specified, spaces are removed.

    Implementations:

    • trim(input: varchar<L1>, characters: varchar<L2>): -> varchar<L1>
    • trim(input: string, characters: string): -> string
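
In ltrim, rtrim, and trim the characters argument acts as a set of characters to strip, not as a prefix or suffix to match. Python's strip family works the same way, which gives a compact sketch. Modelling the "no characters specified" case as a Python default argument is an assumption of this sketch.

```python
def ltrim(s: str, characters: str = " ") -> str:
    # Removes any leading run made up of the given characters.
    return s.lstrip(characters)


def rtrim(s: str, characters: str = " ") -> str:
    # Removes any trailing run made up of the given characters.
    return s.rstrip(characters)


def trim(s: str, characters: str = " ") -> str:
    # Removes matching characters from both sides.
    return s.strip(characters)
```

Note the default is a single space rather than Python's bare strip(), which would also remove tabs and newlines. For example, trim("xxhixyx", "xy") gives "hi".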

    lpad

    Left-pad the input string with the string of ‘characters’ until the specified length of the string has been reached. If the input string is longer than ‘length’, remove characters from the right-side to shorten it to ‘length’ characters. If the string of ‘characters’ is longer than the remaining ‘length’ needed to be filled, only pad until ‘length’ has been reached. If ‘characters’ is not specified, the default value is a single space.

    Implementations:

    • lpad(input: varchar<L1>, length: i32, characters: varchar<L2>): -> varchar<L1>
    • lpad(input: string, length: i32, characters: string): -> string
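
The padding and truncation rules for lpad can be sketched in Python. This is an illustration under assumptions: an empty ‘characters’ string leaves the input unpadded, and a non-positive ‘length’ yields an empty string. Neither case is covered by the description above.

```python
def lpad(s: str, length: int, characters: str = " ") -> str:
    """Sketch of lpad following the description above."""
    if length <= len(s):
        # Input is already long enough: drop characters from the right.
        return s[:max(length, 0)]
    need = length - len(s)
    # Repeat the pad string and cut it off once the gap is filled.
    pad = (characters * need)[:need]
    return pad + s
```

For example, lpad("abc", 6, "xy") gives "xyxabc": the pad string is cycled and cut to the three characters needed.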

    rpad

    Right-pad the input string with the string of ‘characters’ until the specified length of the string has been reached. If the input string is longer than ‘length’, remove characters from the left-side to shorten it to ‘length’ characters. If the string of ‘characters’ is longer than the remaining ‘length’ needed to be filled, only pad until ‘length’ has been reached. If ‘characters’ is not specified, the default value is a single space.

    Implementations:

    • rpad(input: varchar<L1>, length: i32, characters: varchar<L2>): -> varchar<L1>
    • rpad(input: string, length: i32, characters: string): -> string

    center

    Center the input string by padding the sides with a single character until the specified length of the string has been reached. By default, if reaching the length requires an odd number of padding characters, the extra padding character will be applied to the right side. The side with extra padding can be controlled with the padding option. Behavior is undefined if the number of characters passed to the character argument is not 1.

    Implementations:

    • center(input: varchar<L1>, length: i32, character: varchar<L1>, option:padding): -> varchar<L1>
    • center(input: string, length: i32, character: string, option:padding): -> string
    Options:
  • padding ['RIGHT', 'LEFT']
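
The effect of the padding option only shows when the number of padding characters is odd. A minimal Python sketch, assuming that an input already at or beyond the requested length is returned unchanged (the description does not cover that case), and raising an error where the description says behavior is undefined:

```python
def center(s: str, length: int, character: str = " ",
           padding: str = "RIGHT") -> str:
    """Sketch of center; the odd padding character goes to `padding` side."""
    if len(character) != 1:
        raise ValueError("character must be exactly one character")
    if padding not in ("RIGHT", "LEFT"):
        raise ValueError(f"unknown padding: {padding}")
    total = length - len(s)
    if total <= 0:
        return s  # assumption: nothing to pad
    half, extra = divmod(total, 2)  # extra is 1 when total is odd
    left = half + (extra if padding == "LEFT" else 0)
    right = half + (extra if padding == "RIGHT" else 0)
    return character * left + s + character * right
```

For example, center("ab", 5, "*") gives "*ab**" with RIGHT and "**ab*" with LEFT.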

    left

    Extract count characters starting from the left of the string.

    Implementations:

    • left(input: varchar<L1>, count: i32): -> varchar<L1>
    • left(input: string, count: i32): -> string

    right

    Extract count characters starting from the right of the string.

    Implementations:

    • right(input: varchar<L1>, count: i32): -> varchar<L1>
    • right(input: string, count: i32): -> string

    string_split

    Split a string into a list of strings, based on a specified separator character.

    Implementations:

    • string_split(input: varchar<L1>, separator: varchar<L2>): -> List<varchar<L1>>
    • string_split(input: string, separator: string): -> List<string>

    regexp_string_split

    Split a string into a list of strings, based on a regular expression pattern. The substrings matched by the pattern will be used as the separators to split the input string and will not be included in the resulting list. The regular expression pattern should follow the International Components for Unicode implementation (https://unicode-org.github.io/icu/userguide/strings/regexp.html). The case_sensitivity option specifies case-sensitive or case-insensitive matching. Enabling the multiline option will treat the input string as multiple lines. This makes the ^ and $ characters match at the beginning and end of any line, instead of just the beginning and end of the input string. Enabling the dotall option makes the . character match line terminator characters in a string.

    Implementations:

    • regexp_string_split(input: varchar<L1>, pattern: varchar<L2>, option:case_sensitivity, option:multiline, option:dotall): -> List<varchar<L1>>
    • regexp_string_split(input: string, pattern: string, option:case_sensitivity, option:multiline, option:dotall): -> List<string>
    Options:
  • case_sensitivity ['CASE_SENSITIVE', 'CASE_INSENSITIVE', 'CASE_INSENSITIVE_ASCII']
  • dotall ['DOTALL_DISABLED', 'DOTALL_ENABLED']
  • multiline ['MULTILINE_DISABLED', 'MULTILINE_ENABLED']

Aggregate Functions

    string_agg

    Concatenates a column of string values with a separator.

    Implementations:

    • string_agg(input: string, separator: string): -> string