Overview

The Chunk Block splits a string into an array of strings based on a token count. This is particularly useful for handling large text inputs that exceed the token limits of Large Language Models (LLMs), or for truncating a string to a specific token count.

Inputs

input
string
required

The string to be split into chunks. Non-string inputs will be coerced to strings.

model
string

The AI model to use for tokenization. This input is only available when “Use Model Input” is enabled in the settings.

Outputs

chunks
string[]

An array containing all the chunks after splitting the input string.

first
string

The first chunk from the chunks array. Useful for truncating text from the start.

last
string

The last chunk from the chunks array. Useful for truncating text from the end.

indexes
number[]

A list of sequential numbers starting from 1, one for each chunk. Useful for filtering or zipping with the chunks array.

count
number

The total number of chunks created.
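
To make the relationship between the outputs concrete, here is a minimal Python sketch that derives first, last, indexes, and count from a list of chunks. The function name describe_outputs is hypothetical and is not part of the Chunk Block. Returning empty strings for first and last when there are no chunks is an assumption; the documentation above does not specify that case.

```python
from typing import Any


def describe_outputs(chunks: list[str]) -> dict[str, Any]:
    """Derive the documented outputs from an existing list of chunks.

    Hypothetical helper for illustration only; not the block's implementation.
    """
    return {
        "chunks": chunks,
        # Assumption: an empty chunk list yields empty strings here.
        "first": chunks[0] if chunks else "",
        "last": chunks[-1] if chunks else "",
        # Indexes are sequential and start from 1, one per chunk.
        "indexes": list(range(1, len(chunks) + 1)),
        "count": len(chunks),
    }


outputs = describe_outputs(["alpha", "beta", "gamma"])
# Pair each 1-based index with its chunk, as suggested for zipping.
numbered = list(zip(outputs["indexes"], outputs["chunks"]))
```

Here numbered pairs each chunk with its position, which is the kind of filtering or zipping the indexes output is meant to support.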

Editor Settings

AI Model
string
default:"gpt-3.5-turbo"

The model to use for tokenizing the text. Different models may tokenize text differently. Can be overridden by the “model” input if “Use Model Input” is enabled.

Use Model Input
boolean
default:false

When enabled, adds a “model” input port that can override the “AI Model” setting.

Number of tokens per chunk
number
default:1024

The target number of tokens for each chunk. The actual chunk sizes may vary slightly to maintain text coherence.

Overlap (in %)
number
default:0

The percentage of overlap between consecutive chunks. For example, with a 50% overlap and 1000 tokens per chunk, each chunk will share approximately 500 tokens with the next chunk. This helps maintain context between chunks.
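
The interaction between chunk size and overlap can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the block's actual algorithm: whitespace-separated words stand in for model tokens (a real model tokenizer would split text differently), the names chunk_tokens and chunk_text are hypothetical, and the sketch does not adjust chunk boundaries for text coherence.

```python
def chunk_tokens(tokens: list, tokens_per_chunk: int, overlap_percent: float) -> list[list]:
    """Split a token list into windows that share overlap_percent of their tokens."""
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap_percent must be in the range [0, 100)")

    # Number of tokens shared between one chunk and the next.
    overlap = int(tokens_per_chunk * overlap_percent / 100)
    # Each new chunk starts this many tokens after the previous one.
    stride = max(1, tokens_per_chunk - overlap)

    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + tokens_per_chunk])
        # Stop once a chunk has reached the end of the input.
        if start + tokens_per_chunk >= len(tokens):
            break
    return chunks


def chunk_text(text: str, tokens_per_chunk: int, overlap_percent: float = 0) -> list[str]:
    """Assumption: whitespace-separated words stand in for model tokens."""
    words = text.split()
    return [" ".join(c) for c in chunk_tokens(words, tokens_per_chunk, overlap_percent)]


# 2500 placeholder tokens, 1000 tokens per chunk, 50% overlap.
example = chunk_tokens(list(range(2500)), 1000, 50)
shared = set(example[0]) & set(example[1])
```

With these numbers the stride is 500 tokens, so consecutive chunks in this sketch share 500 tokens, matching the example in the description above.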

Example: Chunking a Long Text

  1. Create a Text Block with a long piece of text.
  2. Add a Chunk Block and connect the Text Block to its input.
  3. Configure the desired token count and overlap in the settings.
  4. Run the flow. The text will be split into chunks based on your settings.

Error Handling

The Chunk Block automatically coerces non-string inputs into strings. It has no other notable error handling behavior.

See Also