Apache Arrow Integration

This section covers Apache Arrow-specific cursors, result sets, and data converters.

Arrow Cursors

class pyathena.arrow.cursor.ArrowCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs)[source]

Cursor for handling Apache Arrow Table results from Athena queries.

This cursor returns query results as Apache Arrow Tables, which provide efficient columnar data processing and memory usage. Arrow Tables are especially useful for analytical workloads and data science applications.

The cursor supports both regular CSV-based results and high-performance UNLOAD operations that return results in Parquet format for improved performance with large datasets.

description

Sequence of column descriptions for the last query.

rowcount

Number of rows affected by the last query (-1 for SELECT queries).

arraysize

Default number of rows to fetch with fetchmany().

Example

>>> from pyathena.arrow.cursor import ArrowCursor
>>> cursor = connection.cursor(ArrowCursor)
>>> cursor.execute("SELECT * FROM large_table")
>>> table = cursor.as_arrow()  # Returns pyarrow.Table
>>> df = table.to_pandas()  # Convert to pandas if needed

>>> # High-performance UNLOAD for large datasets
>>> cursor = connection.cursor(ArrowCursor, unload=True)
>>> cursor.execute("SELECT * FROM huge_table")
>>> table = cursor.as_arrow()  # Faster Parquet-based result

__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs) None[source]

Initialize an ArrowCursor.

Parameters:
  • s3_staging_dir – S3 location for query results.

  • schema_name – Default schema name.

  • catalog_name – Default catalog name.

  • work_group – Athena workgroup name.

  • poll_interval – Query status polling interval in seconds.

  • encryption_option – S3 encryption option (SSE_S3, SSE_KMS, CSE_KMS).

  • kms_key – KMS key ARN for encryption.

  • kill_on_interrupt – Cancel running query on keyboard interrupt.

  • unload – Enable UNLOAD for high-performance Parquet output.

  • result_reuse_enable – Enable Athena query result reuse.

  • result_reuse_minutes – Minutes to reuse cached results.

  • on_start_query_execution – Callback invoked when query starts.

  • connect_timeout – Socket connection timeout in seconds for S3 operations. Defaults to AWS SDK default (typically 1 second) if not specified.

  • request_timeout – Request timeout in seconds for S3 operations. Defaults to AWS SDK default (typically 3 seconds) if not specified. Increase this value if you experience timeout errors when using role assumption with STS or have high latency to S3.

  • **kwargs – Additional connection parameters.

Example

>>> # Use higher timeouts for role assumption scenarios
>>> cursor = connection.cursor(
...     ArrowCursor,
...     connect_timeout=10,
...     request_timeout=30
... )
static get_default_converter(unload: bool = False) DefaultArrowTypeConverter | DefaultArrowUnloadTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) ArrowCursor[source]

Execute a SQL query and return results as Apache Arrow Tables.

Executes the SQL query on Amazon Athena and configures the result set for Apache Arrow Table output. Arrow format provides high-performance columnar data processing with efficient memory usage.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style ('qmark' or 'pyformat').

  • on_start_query_execution – Callback called when query starts.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

Example

>>> cursor.execute("SELECT * FROM sales WHERE year = 2023")
>>> table = cursor.as_arrow()  # Returns Apache Arrow Table
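The parameters argument pairs with paramstyle. A minimal, hedged sketch of the pyformat placeholder convention (the table and column names are illustrative; value quoting and substitution are performed by pyathena itself and are not shown here):

```python
import re

# Parameterized query sketch (pyformat style). With an ArrowCursor you
# would run cursor.execute(sql, params); pyathena binds the values.
sql = "SELECT * FROM sales WHERE year = %(year)s AND region = %(region)s"
params = {"year": 2023, "region": "us-east-1"}

# Every %(name)s placeholder in the statement needs a matching key.
placeholders = set(re.findall(r"%\((\w+)\)s", sql))
missing = placeholders - set(params)
```

With 'qmark' style the statement would instead use `?` placeholders and a positional parameter list.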
as_arrow() Table[source]

Return query results as an Apache Arrow Table.

Converts the entire result set into an Apache Arrow Table for efficient columnar data processing. Arrow Tables provide excellent performance for analytical workloads and interoperability with other data processing frameworks.

Returns:

Apache Arrow Table containing all query results.

Raises:

ProgrammingError – If no query has been executed or no results are available.

Example

>>> cursor = connection.cursor(ArrowCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> table = cursor.as_arrow()
>>> print(f"Table has {table.num_rows} rows and {table.num_columns} columns")
as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Converts the Apache Arrow Table to a Polars DataFrame for interoperability with the Polars data processing library.

Returns:

Polars DataFrame containing all query results.

Raises:

ImportError – If polars is not installed.

Example

>>> cursor = connection.cursor(ArrowCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> df = cursor.as_polars()
>>> print(f"DataFrame has {df.height} rows and {df.width} columns")
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.
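Because fetchmany() returns an empty list once the result set is exhausted, results can be streamed in batches rather than loaded all at once. A sketch that assumes only the DB API 2.0 cursor interface described here:

```python
def iter_batches(cursor, size=None):
    """Yield lists of rows until fetchmany() returns an empty batch.

    When size is None, the cursor's arraysize is used, as documented.
    """
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            break
        yield batch
```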

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.
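Since fetchone() returns None once the rows are exhausted, row-at-a-time iteration can be sketched as follows (cursor-agnostic; nothing here is pyathena-specific):

```python
def iter_rows(cursor):
    """Yield rows one at a time until fetchone() returns None."""
    while (row := cursor.fetchone()) is not None:
        yield row
```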

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default; provided for DB API 2.0 compatibility.

setoutputsize(size, column=None)

Does nothing by default; provided for DB API 2.0 compatibility.

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.arrow.async_cursor.AsyncArrowCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs)[source]

Asynchronous cursor that returns results in Apache Arrow format.

This cursor extends AsyncCursor to provide asynchronous query execution with results returned as Apache Arrow Tables or RecordBatches. It’s optimized for high-performance analytics workloads and interoperability with the Apache Arrow ecosystem.

Features:
  • Asynchronous query execution with concurrent futures

  • Apache Arrow columnar data format for high performance

  • Memory-efficient processing of large datasets

  • Support for UNLOAD operations with Parquet output

  • Integration with pandas, Polars, and other Arrow-compatible libraries

arraysize

Number of rows to fetch per batch (configurable).

Example

>>> from pyathena.arrow.async_cursor import AsyncArrowCursor
>>>
>>> cursor = connection.cursor(AsyncArrowCursor, unload=True)
>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>>
>>> # Get result when ready
>>> result_set = future.result()
>>> arrow_table = result_set.as_arrow()
>>>
>>> # Convert to pandas if needed
>>> df = arrow_table.to_pandas()
>>>
>>> # Convert to Polars if needed (requires polars)
>>> polars_df = result_set.as_polars()

Note

Requires pyarrow to be installed. UNLOAD operations generate Parquet files in S3 for optimal Arrow compatibility. For Polars interoperability, polars must be installed separately.

__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs) None[source]

Initialize an AsyncArrowCursor.

Parameters:
  • s3_staging_dir – S3 location for query results.

  • schema_name – Default schema name.

  • catalog_name – Default catalog name.

  • work_group – Athena workgroup name.

  • poll_interval – Query status polling interval in seconds.

  • encryption_option – S3 encryption option (SSE_S3, SSE_KMS, CSE_KMS).

  • kms_key – KMS key ARN for encryption.

  • kill_on_interrupt – Cancel running query on keyboard interrupt.

  • max_workers – Maximum number of workers for concurrent execution.

  • arraysize – Number of rows to fetch per batch.

  • unload – Enable UNLOAD for high-performance Parquet output.

  • result_reuse_enable – Enable Athena query result reuse.

  • result_reuse_minutes – Minutes to reuse cached results.

  • connect_timeout – Socket connection timeout in seconds for S3 operations. Defaults to AWS SDK default (typically 1 second) if not specified.

  • request_timeout – Request timeout in seconds for S3 operations. Defaults to AWS SDK default (typically 3 seconds) if not specified. Increase this value if you experience timeout errors when using role assumption with STS or have high latency to S3.

  • **kwargs – Additional connection parameters.

Example

>>> # Use higher timeouts for role assumption scenarios
>>> cursor = connection.cursor(
...     AsyncArrowCursor,
...     connect_timeout=10.0,
...     request_timeout=30.0
... )
static get_default_converter(unload: bool = False) DefaultArrowTypeConverter | DefaultArrowUnloadTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

property arraysize: int
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
cancel(query_id: str) Future[None]

Cancel a running query asynchronously.

Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.

Parameters:

query_id – The Athena query execution ID to cancel.

Returns:

Future object that completes when the cancellation request finishes.

Example

>>> query_id, future = cursor.execute("SELECT * FROM huge_table")
>>> # Later, cancel the query
>>> cancel_future = cursor.cancel(query_id)
>>> cancel_future.result()  # Wait for cancellation to complete
close(wait: bool = False) None
property connection: Connection[Any]
description(query_id: str) Future[list[tuple[str, str, None, None, int, int, str]] | None]
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) tuple[str, Future[AthenaArrowResultSet | Any]][source]

Execute a SQL query asynchronously.

Starts query execution on Amazon Athena and returns immediately without waiting for completion. The query runs in the background while your application can continue with other work.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Number of queries to check for result caching (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse for identical queries (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

A tuple of (query_id, future) where:

  • query_id: Athena query execution ID for tracking

  • future: Future object for result retrieval

Example

>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>> print(f"Query started: {query_id}")
>>> # Do other work while query runs...
>>> result_set = future.result()  # Wait for completion
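Several queries can also be started before waiting on any of them. A hedged fan-out sketch, assuming only the documented (query_id, future) return shape of execute():

```python
from concurrent.futures import wait

def run_concurrently(cursor, statements):
    """Start every statement, then gather results keyed by query ID."""
    pending = {}
    for sql in statements:
        query_id, future = cursor.execute(sql)
        pending[future] = query_id
    # Block until all submitted queries have finished.
    done, _ = wait(pending)
    return {pending[f]: f.result() for f in done}
```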
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute multiple queries asynchronously (not supported).

This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.

Parameters:
  • operation – SQL query string.

  • seq_of_parameters – Sequence of parameter sets.

  • **kwargs – Additional arguments.

Raises:

NotSupportedError – Always raised as this operation is not supported.

Note

For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.
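The batch-processing pattern mentioned in the note can be sketched as one execute() call per parameter set, collecting the futures. This assumes only the documented (query_id, future) return shape; note that each call is a separate Athena query, so keep batches small:

```python
def execute_each(cursor, operation, seq_of_parameters):
    """Stand-in for executemany(): one asynchronous query per parameter set."""
    submissions = [cursor.execute(operation, p) for p in seq_of_parameters]
    # Wait for every query in submission order and return its result.
    return [future.result() for _query_id, future in submissions]
```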

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
poll(query_id: str) Future[AthenaQueryExecution]

Poll for query completion asynchronously.

Waits for the query to reach a terminal state (succeeded, failed, or cancelled) and returns its final execution status. The polling runs in a background thread; calling result() on the returned Future blocks for the remainder of the query's execution.

Parameters:

query_id – The Athena query execution ID to poll.

Returns:

Future object containing the final AthenaQueryExecution status.

Note

This method performs polling internally, so it will take time proportional to your query execution duration.
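A hedged sketch of waiting on a query and verifying its terminal state. The state names are Athena's documented lifecycle values, and the attribute names match the properties listed in this document:

```python
def wait_for_success(cursor, query_id):
    """Block until the query finishes; raise if it did not succeed."""
    execution = cursor.poll(query_id).result()
    if execution.state != "SUCCEEDED":
        raise RuntimeError(
            f"query {query_id} ended in state {execution.state}: "
            f"{execution.state_change_reason}"
        )
    return execution
```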

query_execution(query_id: str) Future[AthenaQueryExecution]

Get query execution details asynchronously.

Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.

Parameters:

query_id – The Athena query execution ID.

Returns:

Future object containing AthenaQueryExecution with query details.
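Because query_execution() reports the current status without waiting for completion, progress can be monitored without blocking on poll(). A sketch assuming Athena's documented state values:

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}

def watch(cursor, query_id, interval=1.0, max_checks=600):
    """Check status periodically until the query reaches a terminal state."""
    for _ in range(max_checks):
        execution = cursor.query_execution(query_id).result()
        if execution.state in TERMINAL_STATES:
            return execution
        time.sleep(interval)
    raise TimeoutError(f"query {query_id} still running after {max_checks} checks")
```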

setinputsizes(sizes)

Does nothing by default; provided for DB API 2.0 compatibility.

setoutputsize(size, column=None)

Does nothing by default; provided for DB API 2.0 compatibility.

Arrow Result Set

class pyathena.arrow.result_set.AthenaArrowResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, block_size: int | None = None, unload: bool = False, unload_location: str | None = None, connect_timeout: float | None = None, request_timeout: float | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs)[source]

Result set that provides Apache Arrow Table results with columnar optimization.

This result set handles CSV and Parquet result files from S3, converting them to Apache Arrow Tables which provide efficient columnar data processing and memory usage. It’s optimized for analytical workloads and large dataset operations.

Features:
  • Efficient columnar data processing with Apache Arrow

  • Support for both CSV and Parquet result formats

  • Optimized memory usage for large datasets

  • Advanced timestamp parsing with multiple format support

  • Zero-copy operations where possible

DEFAULT_BLOCK_SIZE

Default block size for Arrow operations (128MB).

Example

>>> # Used automatically by ArrowCursor
>>> cursor = connection.cursor(ArrowCursor)
>>> cursor.execute("SELECT * FROM large_table")
>>>
>>> # Get Arrow Table
>>> table = cursor.as_arrow()
>>>
>>> # Convert to pandas if needed
>>> df = table.to_pandas()
>>>
>>> # Or work with Arrow directly
>>> print(f"Table has {table.num_rows} rows and {table.num_columns} columns")

Note

This class is used internally by ArrowCursor and typically not instantiated directly by users. Requires pyarrow to be installed.

DEFAULT_BLOCK_SIZE = 134217728
__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, block_size: int | None = None, unload: bool = False, unload_location: str | None = None, connect_timeout: float | None = None, request_timeout: float | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) None[source]
property timestamp_parsers: list[str]
property column_types: dict[str, type[Any]]
property converters: dict[str, Callable[[str | None], Any | None]]
fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]
fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]
as_arrow() Table[source]
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
property arraysize: int
as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Converts the Apache Arrow Table to a Polars DataFrame for interoperability with the Polars data processing library.

Returns:

Polars DataFrame containing all query results.

Raises:

ImportError – If polars is not installed.

Example

>>> cursor = connection.cursor(ArrowCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> df = cursor.as_polars()
>>> # Use with Polars operations
property catalog: str | None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
close() None[source]

Arrow Data Converters

class pyathena.arrow.converter.DefaultArrowTypeConverter[source]

Optimized type converter for Apache Arrow Table results.

This converter is specifically designed for the ArrowCursor and provides optimized type conversion for Apache Arrow’s columnar data format. It converts Athena data types to Python types that are efficiently handled by Apache Arrow.

The converter focuses on:
  • Converting date/time types to appropriate Python objects

  • Handling decimal and binary types for Arrow compatibility

  • Preserving JSON and complex types

  • Maintaining high performance for columnar operations

Example

>>> from pyathena.arrow.converter import DefaultArrowTypeConverter
>>> converter = DefaultArrowTypeConverter()
>>>
>>> # Used automatically by ArrowCursor
>>> cursor = connection.cursor(ArrowCursor)
>>> # converter is applied automatically to results

Note

This converter is used by default in ArrowCursor. Most users don’t need to instantiate it directly.

__init__() None[source]
convert(type_: str, value: str | None, type_hint: str | None = None) Any | None[source]
class pyathena.arrow.converter.DefaultArrowUnloadTypeConverter[source]

Type converter for Arrow UNLOAD operations.

This converter is designed for use with UNLOAD queries that write results directly to Parquet files in S3. Since UNLOAD operations bypass the normal conversion process and write data in native Parquet format, this converter has minimal functionality.

Note

Used automatically when ArrowCursor is configured with unload=True. UNLOAD results are read directly as Arrow tables from Parquet files.

__init__() None[source]
convert(type_: str, value: str | None, type_hint: str | None = None) Any | None[source]

Arrow Utilities

pyathena.arrow.util.to_column_info(schema: Schema) tuple[dict[str, Any], ...][source]

Convert a PyArrow schema to Athena column information.

Iterates through all fields in the schema and converts each field’s type information to an Athena-compatible column metadata dictionary.

Parameters:

schema – A PyArrow Schema object containing field definitions.

Returns:

A tuple of dictionaries, each containing column metadata with keys:

  • Name: The column name

  • Type: The Athena SQL type name

  • Precision: Numeric precision (0 for non-numeric types)

  • Scale: Numeric scale (0 for non-numeric types)

  • Nullable: Either “NULLABLE” or “NOT_NULL”

pyathena.arrow.util.get_athena_type(type_: DataType) tuple[str, int, int][source]

Map a PyArrow data type to an Athena SQL type.

Converts PyArrow type identifiers to corresponding Athena SQL type names with appropriate precision and scale values. Handles all common Arrow types including numeric, string, binary, temporal, and complex types.

Parameters:

type_ – A PyArrow DataType object to convert.

Returns:

A tuple of (type_name, precision, scale) where:

  • type_name: The Athena SQL type (e.g., “varchar”, “bigint”, “timestamp”)

  • precision: The numeric precision or max length

  • scale: The numeric scale (decimal places)

Note

Unknown types default to “string” with maximum varchar length. Decimal types preserve their original precision and scale.