Overview

The Extract Data Block uses an AI model to extract structured data from unstructured text. You define a schema that specifies which fields to extract, their data types, and a description for each field that guides the model toward the desired information.

Inputs

systemPrompt
string

Optional system prompt sent to the model. Use it to give the model high-level guidance.

context
string
required

The input text from which the fields defined in the schema are extracted. Required.

Outputs

all
object[]

An array of objects containing all extracted data, structured according to the defined schema.

[schemaField]
any[]

For each field defined in the schema, an array containing the extracted values for that field. The element type matches the type declared for that field in the schema.
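As a rough illustration of how the all output relates to the per-field outputs, the sketch below derives one array per field from an array of extracted objects. The function name split_by_field and the sample records are hypothetical, and the assumption that the per-field arrays line up with all by index is not confirmed by this page.

```python
from typing import Any


def split_by_field(all_items: list[dict[str, Any]], fields: list[str]) -> dict[str, list[Any]]:
    """Build one array per schema field from the array of extracted objects."""
    # A field missing from an object becomes None so every array keeps the same length.
    return {field: [item.get(field) for item in all_items] for field in fields}


# Hypothetical sample standing in for the `all` output.
sample_all = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
per_field = split_by_field(sample_all, ["name", "email"])
```

Here per_field["name"] plays the role of the name output and per_field["email"] the role of the email output.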

duration
number

The time in milliseconds that the block took to execute.

Editor Settings

AI Model
string
default:"gpt-3.5-turbo"

The AI model to use for data extraction. Available models are dynamically populated based on the LLM provider configuration.

Schema
object

The schema defining what data to extract. Specify field names, types, and descriptions to guide the extraction.

Parameters

Temperature
number
default:0

The sampling temperature. Higher values such as 0.8 make the output more random, while lower values such as 0.2 make it more focused and deterministic.

Top P
number
default:1

An alternative to temperature sampling. Only the most likely tokens whose cumulative probability reaches P are considered. For example, 0.1 means only the tokens making up the top 10% of probability mass are considered.

Use Top P
boolean
default:false

Whether to use top p sampling instead of temperature sampling.
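To make the idea of probability mass concrete, here is a minimal sketch of top-p (nucleus) filtering over a toy distribution. It illustrates the general technique only; the function name top_p_filter and the sample probabilities are hypothetical, and the actual sampling happens inside the model provider, not in this block.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept: dict[str, float] = {}
    cumulative = 0.0
    # Walk tokens from most to least likely.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


# Hypothetical next-token distribution.
toy = {"a": 0.5, "b": 0.3, "c": 0.2}
narrow = top_p_filter(toy, 0.1)  # only "a" survives
wide = top_p_filter(toy, 0.7)    # "a" and "b" survive
```

A small Top P therefore restricts generation to a handful of high-probability tokens, while a value of 1 leaves the distribution untouched.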

Max Tokens
number
default:8192

The maximum number of tokens to generate in the completion.

Stop
string

A sequence at which the model stops generating further tokens.

Presence Penalty
number

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Frequency Penalty
number

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Seed
number

If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

Advanced

Cache Responses
boolean
default:false

If enabled, requests with the same parameters and messages will be cached for immediate responses without an API call.

Use for subgraph partial output
boolean

If enabled, streaming responses from this block will be shown in Subgraph nodes that call this graph.

Error on invalid output(s)
boolean

If enabled, the block raises an error if any returned result does not conform to the schema.

Error Handling

The block retries failed requests up to 3 times with exponential backoff:

  • Minimum retry delay: 500ms
  • Maximum retry delay: 5000ms
  • Retry factor: 2.5x
  • Includes randomization
  • Maximum retry time: 5 minutes
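The retry settings above can be sketched as a delay calculation. This is an approximation under stated assumptions, not the block's actual implementation: the function name retry_delay_ms is hypothetical, the jitter is assumed to multiply the delay by a random factor between 1 and 2, and the 5 minute overall limit is not modeled.

```python
import random
from typing import Optional

MIN_DELAY_MS = 500   # minimum retry delay
MAX_DELAY_MS = 5000  # maximum retry delay
FACTOR = 2.5         # retry factor
RETRIES = 3          # maximum number of retries


def retry_delay_ms(attempt: int, randomize: bool = True,
                   rng: Optional[random.Random] = None) -> float:
    """Delay in milliseconds before retry number `attempt` (1-based)."""
    delay = MIN_DELAY_MS * FACTOR ** (attempt - 1)
    if randomize:
        # Assumption: jitter scales the delay by a factor in [1, 2).
        source = rng if rng is not None else random
        delay *= 1 + source.random()
    # The delay never exceeds the configured maximum.
    return min(delay, MAX_DELAY_MS)


# Delays without jitter for the three retries: [500.0, 1250.0, 3125.0]
schedule = [retry_delay_ms(n, randomize=False) for n in range(1, RETRIES + 1)]
```

With jitter enabled, each delay falls somewhere between the value shown and twice that value, capped at 5000 ms.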

Error messages are logged for:

  • Schema validation failures
  • Missing LLM provider configuration
  • API errors and timeouts
  • Aborted requests

Always validate the output of the Extract Data block, especially when working with critical data or making important decisions based on the extracted information.
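One way to follow this advice is to check each extracted object against the schema before using it downstream. The sketch below assumes the schema shape shown in the example on this page (field name mapped to an object with a type key). The helper names and the type names "number" and "boolean" are assumptions, since only "string" appears on this page.

```python
from typing import Any


def matches(value: Any, type_name: str) -> bool:
    """Check a value against a schema type name (type names beyond "string" are assumed)."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if type_name == "boolean":
        return isinstance(value, bool)
    return True  # unknown type names are not checked


def validate_items(items: list[dict[str, Any]], schema: dict[str, dict[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means every item conforms."""
    problems = []
    for index, item in enumerate(items):
        for field, spec in schema.items():
            if field not in item:
                problems.append(f"item {index}: missing field '{field}'")
            elif not matches(item[field], spec["type"]):
                problems.append(f"item {index}: field '{field}' is not a {spec['type']}")
    return problems
```

Passing the all output and the block's schema to validate_items gives a list of issues that a flow could log or use to reject the result.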

Example: Extracting Contact Information

  1. Add an Extract Data block to your flow.
  2. Connect your input text (e.g., an email or document) to the Context input.
  3. Define a schema in the block settings, for example:
    {
      "name": {"type": "string", "description": "Full name of the person"},
      "email": {"type": "string", "description": "Email address"},
      "phone": {"type": "string", "description": "Phone number"},
      "company": {"type": "string", "description": "Company name"}
    }
    
  4. Select your desired provider and model.
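  5. Run your flow. The all output contains an array of objects with the extracted information, and each schema field (name, email, phone, company) is also available as its own output array.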
