The Chunk Block splits a string into an array of strings based on token count. This is particularly useful for handling large text inputs that exceed the token limits of large language models (LLMs), or for truncating strings to a specific token count.
The model to use for tokenizing the text. Different models may tokenize text differently. Can be overridden by the “model” input if “Use Model Input” is enabled.
The percentage of overlap between consecutive chunks. For example, with a 50% overlap and 1000 tokens per chunk, each chunk will share approximately 500 tokens with the next chunk. This helps maintain context between chunks.
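The overlap logic above can be sketched in Python. This is a minimal illustration, not the block's actual implementation: it assumes the text has already been tokenized (a plain list of integers stands in for real tokenizer output), and it derives the stride between chunks from the overlap percentage.

```python
def chunk_tokens(tokens, tokens_per_chunk, overlap_pct):
    """Split a token list into chunks of tokens_per_chunk, where each
    chunk shares roughly overlap_pct percent of its tokens with the next."""
    # The stride between chunk starts shrinks as the overlap grows.
    step = max(1, int(tokens_per_chunk * (1 - overlap_pct / 100)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + tokens_per_chunk])
        if start + tokens_per_chunk >= len(tokens):
            break  # the final chunk reaches the end of the input
    return chunks

tokens = list(range(10))  # stand-in for real tokenizer output
chunks = chunk_tokens(tokens, tokens_per_chunk=4, overlap_pct=50)
# With 4 tokens per chunk and 50% overlap, the stride is 2, so each
# chunk shares its last 2 tokens with the start of the next chunk.
```

With 1000 tokens per chunk and 50% overlap, the same arithmetic gives a stride of 500 tokens, matching the ~500 shared tokens described above.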
Chunking is useful to avoid hitting token count limits in LLMs. You can split a long string into multiple chunks, process each chunk separately, and then combine the results.
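The split-process-combine workflow described above can be sketched as follows. This is a simplified illustration: whitespace-separated words stand in for model tokens, and `process` is a placeholder for whatever per-chunk step (e.g. an LLM call) you would run on each chunk.

```python
def split_text(text, max_tokens):
    # Whitespace words stand in for model tokens in this sketch;
    # a real implementation would use the model's tokenizer.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def process(chunk):
    # Placeholder for a per-chunk LLM call (e.g. summarization).
    return f"[{len(chunk.split())} tokens processed]"

text = " ".join(f"w{i}" for i in range(11))
chunks = split_text(text, max_tokens=4)       # 11 words -> chunks of 4, 4, 3
results = [process(c) for c in chunks]        # process each chunk separately
combined = " ".join(results)                  # then combine the results
```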
How does the overlap feature work?
The overlap percentage determines how much text is shared between consecutive chunks. For example, with 1000 tokens per chunk and 50% overlap, each chunk will share approximately 500 tokens with the next chunk. This helps maintain context and coherence between chunks.