Overview

The Ask AI Block allows you to send messages to over 250 different AI models for chat completions. This versatile block provides a wide range of options for AI-powered conversations and text generation tasks.
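
Conceptually, the block performs a chat completion call on your behalf. As a rough, OpenAI-style illustration of such a request (the endpoint, model name, and API-key variable below are placeholders, not the block's actual configuration):

  // Illustrative OpenAI-style chat completion request (not the block's internals).
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // placeholder key variable
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // placeholder model name
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: "Hello!" },
      ],
      temperature: 0.7,
    }),
  });
  const data = await res.json();
  console.log(data.choices[0].message.content);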

Inputs

systemPrompt
string | chat-message

The system prompt to send to the model. Optional. Used to give the model high-level guidance.

prompt
string | string[] | chat-message | chat-message[]

The prompt message or messages to send to the model. Required. Strings will be converted into chat messages of type user, with no name.
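
A minimal sketch of that conversion, assuming a simplified chat-message shape (the block's internal type may differ):

  // Assumed shape for illustration; not the block's exact internal type.
  type ChatMessage = {
    type: "system" | "user" | "assistant";
    message: string;
    name?: string;
  };

  // Plain strings become user messages with no name; chat messages pass through.
  function toChatMessages(
    prompt: string | string[] | ChatMessage | ChatMessage[],
  ): ChatMessage[] {
    const items = Array.isArray(prompt) ? prompt : [prompt];
    return items.map((p) =>
      typeof p === "string" ? { type: "user", message: p } : p,
    );
  }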

model
string

The model to use for the chat. Only available when “Use Model Input” is enabled.

temperature
number

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Only available when “Use Temperature Input” is enabled.

top_p
number

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Only available when “Use Top P Input” is enabled.
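
For intuition, nucleus sampling keeps only the smallest set of highest-probability tokens whose cumulative probability reaches top_p, then samples from that set. A sketch of the filtering step (illustrative, not any provider's implementation):

  // Keep the smallest set of tokens whose cumulative probability >= topP.
  function topPFilter(
    probs: Array<[token: string, p: number]>,
    topP: number,
  ): Array<[string, number]> {
    const sorted = [...probs].sort((a, b) => b[1] - a[1]);
    const kept: Array<[string, number]> = [];
    let cumulative = 0;
    for (const [token, p] of sorted) {
      kept.push([token, p]);
      cumulative += p;
      if (cumulative >= topP) break;
    }
    return kept; // renormalize these and sample from them
  }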

useTopP
boolean

Whether to use top p sampling or temperature sampling. Only available when “Use Top P Input” is enabled.

maxTokens
number

The maximum number of tokens to generate in the chat completion. Only available when “Use Max Tokens Input” is enabled.

stop
string

A sequence where the API will stop generating further tokens. Only available when “Use Stop Input” is enabled.
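
The effect is that generation halts once the stop sequence is produced; the sequence itself is typically not included in the returned text. A toy illustration:

  // Illustration only: text at and after the stop sequence is dropped.
  const truncateAtStop = (text: string, stop: string): string => text.split(stop)[0];
  truncateAtStop("Name: Ada\n###\nName: Alan", "###"); // => "Name: Ada\n"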

presencePenalty
number

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. Only available when “Use Presence Penalty Input” is enabled.

frequencyPenalty
number

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. Only available when “Use Frequency Penalty Input” is enabled.
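
Both penalties adjust a token's logit before sampling. OpenAI documents the adjustment roughly as below; this sketch restates it (variable names are illustrative):

  // Roughly: mu[j] = logit[j] - count[j] * frequencyPenalty
  //                 - (count[j] > 0 ? 1 : 0) * presencePenalty
  // where count[j] is how often token j already appears in the text so far.
  function penalizedLogit(
    logit: number,
    count: number,
    presencePenalty: number,
    frequencyPenalty: number,
  ): number {
    return (
      logit - count * frequencyPenalty - (count > 0 ? 1 : 0) * presencePenalty
    );
  }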

seed
number

If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Only available when “Use Seed Input” is enabled.

Outputs

response
string

The textual response from the model.

in-messages
chat-message[]

All messages sent to the model.

all-messages
chat-message[]

All messages, with the response appended.
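
A one-line sketch of the relationship between these outputs (message shape assumed for illustration):

  type Msg = { type: "system" | "user" | "assistant"; message: string };
  declare const inMessages: Msg[]; // everything sent to the model
  declare const response: string;  // the model's textual reply

  // all-messages is in-messages with the reply appended as an assistant message.
  const allMessages: Msg[] = [...inMessages, { type: "assistant", message: response }];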

responseTokens
number

The number of tokens in the response from the LLM. If the model returns multiple responses, this is the sum across all of them.

cost
number

The estimated cost of the API call in USD.
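
The estimate follows from token counts and the model's per-token pricing. A sketch with entirely hypothetical prices (real pricing varies by model):

  // Hypothetical USD prices per million tokens; real values vary by model.
  const PRICE_PER_MTOK = { input: 0.15, output: 0.6 };

  function estimateCostUSD(promptTokens: number, responseTokens: number): number {
    return (
      (promptTokens / 1_000_000) * PRICE_PER_MTOK.input +
      (responseTokens / 1_000_000) * PRICE_PER_MTOK.output
    );
  }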

duration
number

The time taken to complete the request in milliseconds.

Editor Settings

Model
string

The AI model to use for responses. Choose from over 250 available models across various providers.

Use Prompt Input
boolean
default:true

Whether to take the prompt from the prompt input port or enter it directly in the settings.

Temperature
number
default:0.7

Controls randomness in the output. Lower values make the output more focused and deterministic.

Top P
number
default:1

Alternative to temperature sampling. Only tokens comprising the top P probability mass are considered.

Use Top P
boolean
default:false

Whether to use top p sampling instead of temperature sampling.

Max Tokens
number
default:8192

The maximum number of tokens to generate in the completion.

Stop
string

A sequence where the API will stop generating further tokens.

Presence Penalty
number

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics.

Frequency Penalty
number

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

Seed
number

If specified, OpenAI will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.

Advanced

Custom Max Tokens
number

Overrides the maximum number of tokens the model can support. Leave blank to use the model's preconfigured token limit.

Cache Responses
boolean
default:false

If enabled, requests with the same parameters and messages will be cached for immediate responses without an API call.
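
One plausible way such a cache is keyed (the exact key derivation here is an assumption, not the block's implementation):

  // Assumed design: key the cache on the full parameter set plus messages.
  const responseCache = new Map<string, string>();

  function cacheKey(params: Record<string, unknown>, messages: unknown[]): string {
    // Identical requests built the same way serialize identically.
    return JSON.stringify({ params, messages });
  }

  function getCached(
    params: Record<string, unknown>,
    messages: unknown[],
  ): string | undefined {
    return responseCache.get(cacheKey(params, messages));
  }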

Use for subgraph partial output
boolean

If enabled, streaming responses from this node will be shown in Subgraph nodes that call this graph.

Example: Simple Question Answering

  1. Add an Ask AI block to your flow.
  2. Add a Text block and enter your question in its editor.
  3. Connect the output of the Text block to the Prompt input of the Ask AI block.
  4. Select your desired model in the Ask AI block settings.
  5. Run your flow. The AI’s response will appear at the bottom of the Ask AI block.

Error Handling

The block will retry failed attempts up to 3 times with exponential backoff; a sketch of this policy follows the list:

  • Minimum retry delay: 500ms
  • Maximum retry delay: 5000ms
  • Retry factor: 2.5x
  • Includes randomization
  • Maximum retry time: 5 minutes
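
A sketch of that retry policy (the 5-minute overall cap is omitted for brevity, and the jitter math is illustrative):

  async function withRetries<T>(call: () => Promise<T>): Promise<T> {
    const minDelay = 500;   // ms
    const maxDelay = 5_000; // ms
    const factor = 2.5;
    const maxRetries = 3;
    for (let attempt = 0; ; attempt++) {
      try {
        return await call();
      } catch (err) {
        if (attempt >= maxRetries) throw err;
        const base = Math.min(maxDelay, minDelay * factor ** attempt);
        const jittered = base * (0.5 + Math.random()); // randomization
        await new Promise((resolve) => setTimeout(resolve, jittered));
      }
    }
  }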

Error messages will be logged for:

  • Missing prompt input
  • API rate limits (will retry)
  • API timeouts (will retry)
  • Token limit exceeded
  • Invalid model configuration
  • Other API errors

Be mindful of rate limits when using the Ask AI block, especially when batching requests.
