Read the contents of a file as text or markdown
The Read Text block extracts text content from PDF and DOCX files and outputs it as either plain text or Markdown. It is particularly useful for processing text-based documents within your workflow.
The input file to be read. Must be a PDF or DOCX file with a valid media type. This should be a file data value, typically provided by a File block or another block that outputs file data.
The extracted text content from the file, formatted as either plain text or markdown based on the Output Text Format setting.
Output Text Format: determines how the file content is interpreted and output. Options are Text and Markdown.
For the file input, only PDF and DOCX files are currently supported. Ensure your input files are in one of these formats and have valid media types.
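A pre-check along the following lines can catch unsupported files before they reach the block. This is a minimal sketch, not the block's own validation: the function name `is_supported_file` and the `SUPPORTED_MEDIA_TYPES` table are hypothetical, and it assumes the standard registered media types for PDF and DOCX.

```python
from pathlib import Path

# Standard media types for the two supported formats (assumed to be
# what the block means by a "valid media type").
SUPPORTED_MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def is_supported_file(filename: str, media_type: str) -> bool:
    """Return True if the extension is PDF/DOCX and the media type matches it."""
    extension = Path(filename).suffix.lower()
    expected = SUPPORTED_MEDIA_TYPES.get(extension)
    if expected is None:
        return False
    # Ignore optional parameters such as "; charset=binary".
    base_type = media_type.split(";")[0].strip().lower()
    return base_type == expected


print(is_supported_file("report.pdf", "application/pdf"))
print(is_supported_file("notes.txt", "text/plain"))
```

Checking both the extension and the media type guards against files that were renamed without being converted.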
What file types are supported?
The Read Text block currently supports PDF and DOCX files.
How does the Markdown output differ from Text?
With Markdown output, the block preserves the formatting and structure of the original document using Markdown syntax. Text output provides the raw text content without formatting.
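The difference between the two formats can be illustrated with a toy renderer. This is only a sketch of the idea, not the block's actual converter: the `Element` tuple and the `render` function are hypothetical, and the structure it handles (headings, bullets, paragraphs) is an assumption chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical document model: (kind, text) pairs, where kind is
# "heading", "bullet", or "paragraph".
Element = Tuple[str, str]


def render(elements: List[Element], output_format: str = "text") -> str:
    """Render elements as Markdown (structure kept) or plain text (structure dropped)."""
    lines = []
    for kind, text in elements:
        if output_format == "markdown" and kind == "heading":
            lines.append(f"# {text}")
        elif output_format == "markdown" and kind == "bullet":
            lines.append(f"- {text}")
        else:
            # Plain text keeps only the words, not the structure markers.
            lines.append(text)
    return "\n".join(lines)


doc = [
    ("heading", "Quarterly Report"),
    ("bullet", "Revenue grew"),
    ("paragraph", "Details follow."),
]

print(render(doc, "markdown"))
# # Quarterly Report
# - Revenue grew
# Details follow.

print(render(doc, "text"))
# Quarterly Report
# Revenue grew
# Details follow.
```

Markdown output is usually the better choice when a downstream step needs to tell headings and list items apart. Text output suits steps that only need the words.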
How does the block handle large files?
The block includes retry logic for processing large files, with configurable timeout and retry settings. For extremely large files, consider potential memory usage and processing time implications.
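The general pattern of retrying with a per-attempt timeout can be sketched as follows. This is not the block's implementation: `run_with_retries` and its parameters (`max_retries`, `timeout_s`, `backoff_s`) are hypothetical names used for illustration, and the block's real settings may be named and behave differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_retries: int = 3,
    timeout_s: float = 30.0,
    backoff_s: float = 1.0,
) -> T:
    """Run task, retrying on failure or timeout, up to max_retries attempts."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_retries + 1):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            future = pool.submit(task)
            # Raises a timeout error if the attempt exceeds timeout_s.
            return future.result(timeout=timeout_s)
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                # Linear backoff: wait longer after each failed attempt.
                time.sleep(backoff_s * attempt)
        finally:
            # Do not block on a timed-out worker; note that the thread
            # itself keeps running until the task returns.
            pool.shutdown(wait=False)
    raise RuntimeError(f"task failed after {max_retries} attempts") from last_error
```

A call might look like `run_with_retries(lambda: extract_text(path), max_retries=3, timeout_s=60)`, where `extract_text` stands in for whatever function performs the extraction. Because a timed-out thread is not cancelled, very large files can still consume memory after an attempt is abandoned.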