Extract Data Block

Overview

The Extract Data Block uses AI capabilities to extract structured data from unstructured text input. By defining a flexible schema, you can specify which fields to extract, their data types, and provide detailed descriptions to guide the AI model in accurately extracting the desired information.

Inputs

systemPrompt

string

The system prompt to send to the model. Optional. Used to provide high-level guidance to the AI model.

context

string

required

The context to extract the specified schema(s) from. Required.

Outputs

all

object[]

An array of objects containing all extracted data, structured according to the defined schema.

[schemaField]

any[]

For each field defined in the schema, an array containing the extracted values for that field. The type will match the schema field type.

duration

number

The time in milliseconds that the block took to execute.

Editor Settings

AI Model

string

default:"gpt-3.5-turbo"

The AI model to use for data extraction. Available models are dynamically populated based on the LLM provider configuration.

Schema

object

The schema defining what data to extract. Specify field names, types, and descriptions to guide the extraction.

Parameters

Temperature

number

default:0

What sampling temperature to use. Higher values like 0.8 make output more random, while lower values like 0.2 make it more focused and deterministic.

Top P

number

default:1

Alternative to temperature sampling. Only tokens comprising the top P probability mass are considered. For example, 0.1 means only tokens in the top 10% probability are considered.

Use Top P

boolean

default:false

Whether to use top p sampling instead of temperature sampling.

Max Tokens

number

default:8192

The maximum number of tokens to generate in the completion.

Stop

string

A sequence where the API will stop generating further tokens.

Presence Penalty

number

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Frequency Penalty

number

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Seed

number

If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

Advanced

Cache Responses

boolean

default:false

If enabled, requests with the same parameters and messages will be cached for immediate responses without an API call.

Use for subgraph partial output

boolean

If enabled, streaming responses from this node will be shown in Subgraph nodes that call this graph.

Error on invalid output(s)

boolean

If enabled, the node will error if any of the results returned do not conform to the schema.

Error Handling

The block will retry failed attempts up to 3 times with exponential backoff:

Minimum retry delay: 500ms
Maximum retry delay: 5000ms
Retry factor: 2.5x
Includes randomization
Maximum retry time: 5 minutes

Error messages will be logged for:

Invalid schema validation
Missing LLM provider configuration
API errors and timeouts
Aborted requests

Always validate the output of the Extract Data block, especially when working with critical data or making important decisions based on the extracted information.

Example: Extracting Contact Information

Add an Extract Data block to your flow.
Connect your input text (e.g., an email or document) to the Context input.

Define a schema in the block settings, for example:

{
  "name": {"type": "string", "description": "Full name of the person"},
  "email": {"type": "string", "description": "Email address"},
  "phone": {"type": "string", "description": "Phone number"},
  "company": {"type": "string", "description": "Company name"}
}

Select your desired provider and model.
Run your flow. The block will output an object with the extracted information.

Block Documentation

AI Blocks

Draft Blocks

Loader Blocks

Logic Blocks

Data Blocks

Modifier Blocks

Advanced Blocks

IO Blocks

Agent Blocks

Overview

Inputs

Outputs

Editor Settings

Parameters

Advanced

Error Handling

Example: Extracting Contact Information

See Also

Block Documentation

AI Blocks

Draft Blocks

Loader Blocks

Logic Blocks

Data Blocks

Modifier Blocks

Advanced Blocks

IO Blocks

Agent Blocks

​Overview

​Inputs

​Outputs

​Editor Settings

​Parameters

​Advanced

​Error Handling

​Example: Extracting Contact Information

​See Also

Overview

Inputs

Outputs

Editor Settings

Parameters

Advanced

Error Handling

Example: Extracting Contact Information

See Also