Polars Integration

This section covers Polars-specific cursors, result sets, and data converters.

Polars Cursors

class pyathena.polars.cursor.PolarsCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs)[source]

Cursor for handling Polars DataFrame results from Athena queries.

This cursor returns query results as Polars DataFrames using Polars’ native reading capabilities. It does not require PyArrow for basic functionality, but can optionally provide Arrow Table access when PyArrow is installed.

The cursor supports both regular CSV-based results and high-performance UNLOAD operations that return results in Parquet format for improved performance with large datasets.

description

Sequence of column descriptions for the last query.

rowcount

Number of rows affected by the last query (-1 for SELECT queries).

arraysize

Default number of rows to fetch with fetchmany().

Example

>>> from pyathena.polars.cursor import PolarsCursor
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM large_table")
>>> df = cursor.as_polars()  # Returns polars.DataFrame

>>> # Optional: Get Arrow Table (requires pyarrow)
>>> table = cursor.as_arrow()

>>> # High-performance UNLOAD for large datasets
>>> cursor = connection.cursor(PolarsCursor, unload=True)
>>> cursor.execute("SELECT * FROM huge_table")
>>> df = cursor.as_polars()  # Faster Parquet-based result

Note

Requires polars to be installed. PyArrow is optional and only needed for as_arrow() functionality.

__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs) None[source]

Initialize a PolarsCursor.

Parameters:
  • s3_staging_dir – S3 location for query results.

  • schema_name – Default schema name.

  • catalog_name – Default catalog name.

  • work_group – Athena workgroup name.

  • poll_interval – Query status polling interval in seconds.

  • encryption_option – S3 encryption option (SSE_S3, SSE_KMS, CSE_KMS).

  • kms_key – KMS key ARN for encryption.

  • kill_on_interrupt – Cancel running query on keyboard interrupt.

  • unload – Enable UNLOAD for high-performance Parquet output.

  • result_reuse_enable – Enable Athena query result reuse.

  • result_reuse_minutes – Minutes to reuse cached results.

  • on_start_query_execution – Callback invoked when query starts.

  • block_size – S3 read block size.

  • cache_type – S3 caching strategy.

  • max_workers – Maximum worker threads for parallel S3 operations.

  • chunksize – Number of rows per chunk for memory-efficient processing. If specified, data is loaded lazily in chunks for all data access methods including fetchone(), fetchmany(), and iter_chunks().

  • **kwargs – Additional connection parameters.

Example

>>> cursor = connection.cursor(PolarsCursor, unload=True)
>>> # With chunked processing
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
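To make the chunksize semantics concrete, here is a stdlib-only sketch (plain lists standing in for Polars DataFrames) of how chunked versus all-at-once iteration behaves; it is illustrative only, not pyathena's implementation.

```python
# Illustrative sketch: a generator mimicking the chunking behavior that
# chunksize enables. When chunksize is None, everything arrives at once.
def chunked(rows, chunksize=None):
    """Yield rows in fixed-size chunks; yield all rows at once if chunksize is None."""
    rows = list(rows)
    if chunksize is None:
        yield rows
        return
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]

chunks = list(chunked(range(10), chunksize=4))
# Yields three chunks of 4, 4, and 2 rows
```

The same iteration interface works whether or not chunking is configured, which is why fetchone(), fetchmany(), and iter_chunks() behave consistently in both modes.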
static get_default_converter(unload: bool = False) DefaultPolarsTypeConverter | DefaultPolarsUnloadTypeConverter | Any[source]

Get the default type converter for Polars results.

Parameters:

unload – If True, returns converter for UNLOAD (Parquet) results.

Returns:

Type converter appropriate for the result format.

execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) PolarsCursor[source]

Execute a SQL query and return results as Polars DataFrames.

Executes the SQL query on Amazon Athena and configures the result set for Polars DataFrame output using Polars’ native reading capabilities.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • on_start_query_execution – Callback called when query starts.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters passed to Polars read functions.

Returns:

Self reference for method chaining.

Example

>>> cursor.execute("SELECT * FROM sales WHERE year = 2023")
>>> df = cursor.as_polars()  # Returns Polars DataFrame
as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Returns the query results as a Polars DataFrame. This is the primary method for accessing results with PolarsCursor.

Returns:

Polars DataFrame containing all query results.

Raises:

ProgrammingError – If no query has been executed or no results are available.

Example

>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> df = cursor.as_polars()
>>> print(f"DataFrame has {df.height} rows and {df.width} columns")
>>> filtered = df.filter(pl.col("value") > 100)
as_arrow() Table[source]

Return query results as an Apache Arrow Table.

Converts the Polars DataFrame to an Apache Arrow Table for interoperability with other Arrow-compatible tools and libraries.

Returns:

Apache Arrow Table containing all query results.

Raises:

ImportError – If pyarrow is not installed.

Example

>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> table = cursor.as_arrow()
>>> print(f"Table has {table.num_rows} rows and {table.num_columns} columns")
iter_chunks() Iterator[pl.DataFrame][source]

Iterate over result chunks as Polars DataFrames.

This method provides an iterator interface for processing result sets. When chunksize is specified, it yields DataFrames in chunks using lazy evaluation for memory-efficient processing. When chunksize is not specified, it yields the entire result as a single DataFrame, providing a consistent interface regardless of chunking configuration.

Yields:

Polars DataFrame for each chunk of rows, or the entire DataFrame if chunksize was not specified.

Raises:

ProgrammingError – If no result set is available.

Example

>>> # With chunking for large datasets
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
>>> cursor.execute("SELECT * FROM large_table")
>>> for chunk in cursor.iter_chunks():
...     process_chunk(chunk)  # Each chunk is a Polars DataFrame
>>>
>>> # Without chunking - yields entire result as single chunk
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM small_table")
>>> for df in cursor.iter_chunks():
...     process(df)  # Single DataFrame with all data
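The chunked pattern above supports streaming aggregation, where only per-chunk partial results are held in memory. A minimal stdlib sketch, with plain lists standing in for DataFrame chunks:

```python
# Illustrative pattern: combine per-chunk partial aggregates so the full
# result set is never materialized at once.
chunks = [[1, 2, 3], [4, 5], [6]]

total = 0
count = 0
for chunk in chunks:
    total += sum(chunk)   # per-chunk partial sum
    count += len(chunk)   # per-chunk row count

mean = total / count  # combined result from streamed chunks
```

With real Polars chunks, the per-chunk step would be a DataFrame aggregation (e.g., a sum over a column) combined the same way.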
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.polars.async_cursor.AsyncPolarsCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, chunksize: int | None = None, **kwargs)[source]

Asynchronous cursor that returns results as Polars DataFrames.

This cursor extends AsyncCursor to provide asynchronous query execution with results returned as Polars DataFrames using Polars’ native reading capabilities. It does not require PyArrow for basic functionality, but can optionally provide Arrow Table access when PyArrow is installed.

Features:
  • Asynchronous query execution with concurrent futures

  • Native Polars CSV and Parquet reading (no PyArrow required)

  • Memory-efficient columnar data processing

  • Support for UNLOAD operations with Parquet output

  • Optional Arrow interoperability when PyArrow is installed

arraysize

Number of rows to fetch per batch (configurable).

Example

>>> from pyathena.polars.async_cursor import AsyncPolarsCursor
>>>
>>> cursor = connection.cursor(AsyncPolarsCursor, unload=True)
>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>>
>>> # Get result when ready
>>> result_set = future.result()
>>> df = result_set.as_polars()
>>>
>>> # Optional: Convert to Arrow Table if pyarrow is installed
>>> table = result_set.as_arrow()

Note

Requires polars to be installed. PyArrow is optional and only needed for as_arrow() functionality. UNLOAD operations generate Parquet files in S3 for optimal performance.
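The (query_id, future) pattern that execute() returns can be sketched with only the standard library; fetch_result below is a hypothetical stand-in for pyathena's background polling and S3 read, not its actual internals.

```python
# Stdlib-only sketch of the (query_id, future) execution pattern.
from concurrent.futures import ThreadPoolExecutor
import uuid

def fetch_result(query_id):
    # Placeholder for: poll Athena until the query finishes,
    # then read the result files from S3 into a DataFrame.
    return f"result-for-{query_id}"

executor = ThreadPoolExecutor(max_workers=4)
query_id = str(uuid.uuid4())
future = executor.submit(fetch_result, query_id)

# The caller blocks only when the result is actually needed.
result = future.result()
executor.shutdown()
```

This is why execute() returns immediately: the expensive work runs on the executor's worker threads, and future.result() is the only blocking call.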

__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, chunksize: int | None = None, **kwargs) None[source]

Initialize an AsyncPolarsCursor.

Parameters:
  • s3_staging_dir – S3 location for query results.

  • schema_name – Default schema name.

  • catalog_name – Default catalog name.

  • work_group – Athena workgroup name.

  • poll_interval – Query status polling interval in seconds.

  • encryption_option – S3 encryption option (SSE_S3, SSE_KMS, CSE_KMS).

  • kms_key – KMS key ARN for encryption.

  • kill_on_interrupt – Cancel running query on keyboard interrupt.

  • max_workers – Maximum number of workers for concurrent execution.

  • arraysize – Number of rows to fetch per batch.

  • unload – Enable UNLOAD for high-performance Parquet output.

  • result_reuse_enable – Enable Athena query result reuse.

  • result_reuse_minutes – Minutes to reuse cached results.

  • block_size – S3 read block size.

  • cache_type – S3 caching strategy.

  • chunksize – Number of rows per chunk for memory-efficient processing. If specified, data is loaded lazily in chunks for all data access methods including fetchone(), fetchmany(), and iter_chunks().

  • **kwargs – Additional connection parameters.

Example

>>> cursor = connection.cursor(AsyncPolarsCursor, unload=True)
>>> # With chunked processing
>>> cursor = connection.cursor(AsyncPolarsCursor, chunksize=50000)
static get_default_converter(unload: bool = False) DefaultPolarsTypeConverter | DefaultPolarsUnloadTypeConverter | Any[source]

Get the default type converter for Polars results.

Parameters:

unload – If True, returns converter for UNLOAD (Parquet) results.

Returns:

Type converter appropriate for the result format.

property arraysize: int

Get the number of rows to fetch per batch.

LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
cancel(query_id: str) Future[None]

Cancel a running query asynchronously.

Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.

Parameters:

query_id – The Athena query execution ID to cancel.

Returns:

Future object that completes when the cancellation request finishes.

Example

>>> query_id, future = cursor.execute("SELECT * FROM huge_table")
>>> # Later, cancel the query
>>> cancel_future = cursor.cancel(query_id)
>>> cancel_future.result()  # Wait for cancellation to complete
close(wait: bool = False) None
property connection: Connection[Any]
description(query_id: str) Future[list[tuple[str, str, None, None, int, int, str]] | None]
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) tuple[str, Future[AthenaPolarsResultSet | Any]][source]

Execute a SQL query asynchronously and return results as Polars DataFrames.

Executes the SQL query on Amazon Athena asynchronously and returns a future that resolves to a result set for Polars DataFrame output.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters passed to Polars read functions.

Returns:

Tuple of (query_id, future) where future resolves to AthenaPolarsResultSet.

Example

>>> query_id, future = cursor.execute("SELECT * FROM sales")
>>> result_set = future.result()
>>> df = result_set.as_polars()  # Returns Polars DataFrame
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute multiple queries asynchronously (not supported).

This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.

Parameters:
  • operation – SQL query string.

  • seq_of_parameters – Sequence of parameter sets.

  • **kwargs – Additional arguments.

Raises:

NotSupportedError – Always raised as this operation is not supported.

Note

For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
poll(query_id: str) Future[AthenaQueryExecution]

Poll for query completion asynchronously.

Waits for the query to complete (succeed, fail, or be cancelled) and returns the final execution status. This method blocks until completion but runs the polling in a background thread.

Parameters:

query_id – The Athena query execution ID to poll.

Returns:

Future object containing the final AthenaQueryExecution status.

Note

This method performs polling internally, so it will take time proportional to your query execution duration.

query_execution(query_id: str) Future[AthenaQueryExecution]

Get query execution details asynchronously.

Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.

Parameters:

query_id – The Athena query execution ID.

Returns:

Future object containing AthenaQueryExecution with query details.

setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

Polars Result Set

class pyathena.polars.result_set.AthenaPolarsResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, unload: bool = False, unload_location: str | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs)[source]

Result set that provides Polars DataFrame results with optional Arrow interoperability.

This result set handles CSV and Parquet result files from S3, converting them to Polars DataFrames using Polars’ native reading capabilities. It does not require PyArrow for basic functionality, but can optionally provide Arrow Table access when PyArrow is installed.

Features:
  • Native Polars CSV and Parquet reading (no PyArrow required)

  • Efficient columnar data processing with Polars

  • Optional Arrow interoperability when PyArrow is available

  • Support for both CSV and Parquet result formats

  • Chunked iteration for memory-efficient processing of large datasets

  • Optimized memory usage through columnar format

Example

>>> # Used automatically by PolarsCursor
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM large_table")
>>>
>>> # Get Polars DataFrame
>>> df = cursor.as_polars()
>>>
>>> # Work with Polars
>>> print(f"DataFrame has {df.height} rows and {df.width} columns")
>>> filtered = df.filter(pl.col("value") > 100)
>>>
>>> # Optional: Get Arrow Table (requires pyarrow)
>>> table = cursor.as_arrow()
>>>
>>> # Memory-efficient chunked iteration
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
>>> cursor.execute("SELECT * FROM huge_table")
>>> for chunk in cursor.iter_chunks():
...     process_chunk(chunk)

Note

This class is used internally by PolarsCursor and typically not instantiated directly by users. Requires polars to be installed. PyArrow is optional and only needed for as_arrow() functionality.

__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, unload: bool = False, unload_location: str | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) None[source]

Initialize the Polars result set.

Parameters:
  • connection – The Athena connection object.

  • converter – Type converter for Athena data types.

  • query_execution – Query execution metadata.

  • arraysize – Number of rows to fetch per batch.

  • retry_config – Configuration for retry behavior.

  • unload – Whether this is an UNLOAD query result.

  • unload_location – S3 location for UNLOAD results.

  • block_size – Block size for S3 file reading.

  • cache_type – Cache type for S3 file system.

  • max_workers – Maximum number of worker threads.

  • chunksize – Number of rows per chunk for memory-efficient processing. If specified, data is loaded lazily in chunks for all data access methods including fetchone(), fetchmany(), and iter_chunks().

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion.

  • **kwargs – Additional arguments passed to Polars read functions.

property dtypes: dict[str, Any]

Get Polars-compatible data types for result columns.

property converters: dict[str, Callable[[str | None], Any | None]]

Get converter functions for each column.

Returns:

Dictionary mapping column names to their converter functions.

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the query result.

Returns:

A single row as a tuple, or None if no more rows are available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch the next set of rows of the query result.

Parameters:

size – Number of rows to fetch. Defaults to arraysize.

Returns:

A list of rows as tuples.

fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows of the query result.

Returns:

A list of all remaining rows as tuples.

as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Returns the query results as a Polars DataFrame. This is the primary method for accessing results with PolarsCursor.

Note

When chunksize is set, calling this method will collect all chunks into a single DataFrame, loading all data into memory. Use iter_chunks() for memory-efficient processing of large datasets.

Returns:

Polars DataFrame containing all query results.

Example

>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> df = cursor.as_polars()
>>> print(f"DataFrame has {df.height} rows")
>>> filtered = df.filter(pl.col("value") > 100)
as_arrow() Table[source]

Return query results as an Apache Arrow Table.

Converts the Polars DataFrame to an Apache Arrow Table for interoperability with other Arrow-compatible tools and libraries.

Returns:

Apache Arrow Table containing all query results.

Raises:

ImportError – If pyarrow is not installed.

Example

>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> table = cursor.as_arrow()
>>> # Use with other Arrow-compatible libraries
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
property arraysize: int
property catalog: str | None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
iter_chunks() PolarsDataFrameIterator[source]

Iterate over result chunks as Polars DataFrames.

This method provides an iterator interface for processing large result sets. When chunksize is specified, it yields DataFrames in chunks using lazy evaluation for memory-efficient processing. When chunksize is not specified, it yields the entire result as a single DataFrame.

Returns:

PolarsDataFrameIterator that yields Polars DataFrames for each chunk of rows, or the entire DataFrame if chunksize was not specified.

Example

>>> # With chunking for large datasets
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
>>> cursor.execute("SELECT * FROM large_table")
>>> for chunk in cursor.iter_chunks():
...     process_chunk(chunk)  # Each chunk is a Polars DataFrame
>>>
>>> # Without chunking - yields entire result as single chunk
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM small_table")
>>> for df in cursor.iter_chunks():
...     process(df)  # Single DataFrame with all data
close() None[source]

Close the result set and release resources.

class pyathena.polars.result_set.PolarsDataFrameIterator(reader: Iterator[pl.DataFrame] | pl.DataFrame, converters: dict[str, Callable[[str | None], Any | None]], column_names: list[str])[source]

Iterator for chunked DataFrame results from Athena queries.

This class wraps either a Polars DataFrame iterator (for chunked reading) or a single DataFrame, providing a unified iterator interface. It applies optional type conversion to each DataFrame chunk as it’s yielded.

The iterator is used by AthenaPolarsResultSet to provide chunked access to large query results, enabling memory-efficient processing of datasets that would be too large to load entirely into memory.

Example

>>> # Iterate over DataFrame chunks
>>> for df_chunk in iterator:
...     process(df_chunk)
>>>
>>> # Iterate over individual rows
>>> for idx, row in iterator.iterrows():
...     print(row)

Note

This class is primarily for internal use by AthenaPolarsResultSet. Most users should access results through PolarsCursor methods.

__init__(reader: Iterator[pl.DataFrame] | pl.DataFrame, converters: dict[str, Callable[[str | None], Any | None]], column_names: list[str]) None[source]

Initialize the iterator.

Parameters:
  • reader – Either a DataFrame iterator (for chunked) or a single DataFrame.

  • converters – Dictionary mapping column names to converter functions.

  • column_names – List of column names in order.

__next__() pl.DataFrame[source]

Get the next DataFrame chunk.

Returns:

The next Polars DataFrame chunk.

Raises:

StopIteration – When no more chunks are available.

__iter__() PolarsDataFrameIterator[source]

Return self as iterator.

__enter__() PolarsDataFrameIterator[source]

Context manager entry.

__exit__(exc_type, exc_value, traceback) None[source]

Context manager exit.

close() None[source]

Close the iterator and release resources.

iterrows() Iterator[tuple[int, dict[str, Any]]][source]

Iterate over rows as (index, row_dict) tuples.

Yields:

Tuple of (row_index, row_dict) for each row across all chunks.
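The iterrows() contract — a row index that continues across chunk boundaries — can be sketched in plain Python, with lists of dicts standing in for DataFrame chunks:

```python
# Illustrative sketch of iterrows() semantics: a single global row index
# that runs continuously across all chunks.
def iterrows(chunks):
    idx = 0
    for chunk in chunks:
        for row in chunk:
            yield idx, row
            idx += 1

chunks = [[{"a": 1}, {"a": 2}], [{"a": 3}]]
rows = list(iterrows(chunks))
```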

as_polars() pl.DataFrame[source]

Collect all chunks into a single DataFrame.

Returns:

Single Polars DataFrame containing all data.

Polars Data Converters

class pyathena.polars.converter.DefaultPolarsTypeConverter[source]

Optimized type converter for Polars DataFrame results.

This converter is specifically designed for the PolarsCursor and provides optimized type conversion for Polars DataFrames.

The converter focuses on:
  • Converting date/time types to appropriate Python objects

  • Handling decimal and binary types

  • Preserving JSON and complex types

  • Maintaining high performance for columnar operations

Example

>>> from pyathena.polars.converter import DefaultPolarsTypeConverter
>>> converter = DefaultPolarsTypeConverter()
>>>
>>> # Used automatically by PolarsCursor
>>> cursor = connection.cursor(PolarsCursor)
>>> # converter is applied automatically to results

Note

This converter is used by default in PolarsCursor. Most users don’t need to instantiate it directly.

__init__() None[source]
get_dtype(type_: str, precision: int = 0, scale: int = 0) Any[source]

Get the Polars data type for a given Athena type.

Parameters:
  • type_ – The Athena data type name.

  • precision – The precision for decimal types.

  • scale – The scale for decimal types.

Returns:

The Polars data type.

convert(type_: str, value: str | None, type_hint: str | None = None) Any | None[source]
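As a rough illustration of what CSV-based conversion involves, the stdlib-only sketch below parses Athena's textual representations into Python objects. The function names are hypothetical and are not pyathena's actual converter methods.

```python
# Illustrative only: the kind of per-column string-to-Python conversions a
# CSV-based converter performs. None is passed through for NULL values.
from datetime import datetime, date
from decimal import Decimal

def convert_date(value):
    return date.fromisoformat(value) if value is not None else None

def convert_timestamp(value):
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f") if value is not None else None

def convert_decimal(value):
    return Decimal(value) if value is not None else None

d = convert_date("2023-05-01")
ts = convert_timestamp("2023-05-01 12:30:00.000")
num = convert_decimal("123.45")
```

UNLOAD results skip this step entirely, since Parquet files already carry typed columnar data.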
class pyathena.polars.converter.DefaultPolarsUnloadTypeConverter[source]

Type converter for Polars UNLOAD operations.

This converter is designed for use with UNLOAD queries that write results directly to Parquet files in S3. Since UNLOAD operations bypass the normal conversion process and write data in native Parquet format, this converter has minimal functionality.

Note

Used automatically when PolarsCursor is configured with unload=True. UNLOAD results are read directly as Polars DataFrames from Parquet files.

__init__() None[source]
convert(type_: str, value: str | None, type_hint: str | None = None) Any | None[source]

Polars Utilities

pyathena.polars.util.to_column_info(schema: pl.Schema) tuple[dict[str, Any], ...][source]

Convert a Polars schema to Athena column information.

Iterates through all fields in the schema and converts each field’s type information to an Athena-compatible column metadata dictionary.

Parameters:

schema – A Polars Schema object containing field definitions.

Returns:

A tuple of dictionaries, each containing column metadata with keys:

  • Name: The column name

  • Type: The Athena SQL type name

  • Precision: Numeric precision (0 for non-numeric types)

  • Scale: Numeric scale (0 for non-numeric types)

  • Nullable: Always “NULLABLE” for Polars types

pyathena.polars.util.get_athena_type(dtype: Any) tuple[str, int, int][source]

Map a Polars data type to an Athena SQL type.

Converts Polars type identifiers to corresponding Athena SQL type names with appropriate precision and scale values. Handles all common Polars types including numeric, string, binary, temporal, and complex types.

Parameters:

dtype – A Polars DataType object to convert.

Returns:

A tuple of (type_name, precision, scale) where:

  • type_name: The Athena SQL type (e.g., “varchar”, “bigint”, “timestamp”)

  • precision: The numeric precision or max length

  • scale: The numeric scale (decimal places)

Note

Unknown types default to “string” with maximum varchar length. Decimal types preserve their original precision and scale.
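A minimal sketch of this dispatch, assuming plain string keys rather than the Polars DataType objects the real function receives; the mapping entries are illustrative, with unknown names falling back to "string" as the Note documents.

```python
# Hypothetical sketch of a Polars-to-Athena type-name mapping; the real
# get_athena_type() dispatches on Polars DataType objects and also returns
# precision and scale.
ATHENA_TYPE_MAP = {
    "Int64": "bigint",
    "Int32": "integer",
    "Float64": "double",
    "Boolean": "boolean",
    "String": "varchar",
    "Date": "date",
    "Datetime": "timestamp",
}

def athena_type_for(name):
    # Unknown types fall back to "string", mirroring the documented default.
    return ATHENA_TYPE_MAP.get(name, "string")
```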