Polars Integration¶
This section covers Polars-specific cursors, result sets, and data converters.
Polars Cursors¶
- class pyathena.polars.cursor.PolarsCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs)[source]¶
Cursor for handling Polars DataFrame results from Athena queries.
This cursor returns query results as Polars DataFrames using Polars’ native reading capabilities. It does not require PyArrow for basic functionality, but can optionally provide Arrow Table access when PyArrow is installed.
The cursor supports both regular CSV-based results and high-performance UNLOAD operations that return results in Parquet format for improved performance with large datasets.
- description¶
Sequence of column descriptions for the last query.
- rowcount¶
Number of rows affected by the last query (-1 for SELECT queries).
- arraysize¶
Default number of rows to fetch with fetchmany().
Example
>>> from pyathena.polars.cursor import PolarsCursor
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM large_table")
>>> df = cursor.as_polars()  # Returns polars.DataFrame

# Optional: Get Arrow Table (requires pyarrow)
>>> table = cursor.as_arrow()

# High-performance UNLOAD for large datasets
>>> cursor = connection.cursor(PolarsCursor, unload=True)
>>> cursor.execute("SELECT * FROM huge_table")
>>> df = cursor.as_polars()  # Faster Parquet-based result
Note
Requires polars to be installed. PyArrow is optional and only needed for as_arrow() functionality.
- __init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs) None[source]¶
Initialize a PolarsCursor.
- Parameters:
s3_staging_dir – S3 location for query results.
schema_name – Default schema name.
catalog_name – Default catalog name.
work_group – Athena workgroup name.
poll_interval – Query status polling interval in seconds.
encryption_option – S3 encryption option (SSE_S3, SSE_KMS, CSE_KMS).
kms_key – KMS key ARN for encryption.
kill_on_interrupt – Cancel running query on keyboard interrupt.
unload – Enable UNLOAD for high-performance Parquet output.
result_reuse_enable – Enable Athena query result reuse.
result_reuse_minutes – Minutes to reuse cached results.
on_start_query_execution – Callback invoked when query starts.
block_size – S3 read block size.
cache_type – S3 caching strategy.
max_workers – Maximum worker threads for parallel S3 operations.
chunksize – Number of rows per chunk for memory-efficient processing. If specified, data is loaded lazily in chunks for all data access methods including fetchone(), fetchmany(), and iter_chunks().
**kwargs – Additional connection parameters.
Example
>>> cursor = connection.cursor(PolarsCursor, unload=True)
>>> # With chunked processing
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
- static get_default_converter(unload: bool = False) DefaultPolarsTypeConverter | DefaultPolarsUnloadTypeConverter | Any[source]¶
Get the default type converter for Polars results.
- Parameters:
unload – If True, returns converter for UNLOAD (Parquet) results.
- Returns:
Type converter appropriate for the result format.
- execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) PolarsCursor[source]¶
Execute a SQL query and return results as Polars DataFrames.
Executes the SQL query on Amazon Athena and configures the result set for Polars DataFrame output using Polars’ native reading capabilities.
- Parameters:
operation – SQL query string to execute.
parameters – Query parameters for parameterized queries.
work_group – Athena workgroup to use for this query.
s3_staging_dir – S3 location for query results.
cache_size – Number of queries to check for result caching.
cache_expiration_time – Cache expiration time in seconds.
result_reuse_enable – Enable Athena result reuse for this query.
result_reuse_minutes – Minutes to reuse cached results.
paramstyle – Parameter style (‘qmark’ or ‘pyformat’).
on_start_query_execution – Callback called when query starts.
result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.
**kwargs – Additional execution parameters passed to Polars read functions.
- Returns:
Self reference for method chaining.
Example
>>> cursor.execute("SELECT * FROM sales WHERE year = 2023")
>>> df = cursor.as_polars()  # Returns Polars DataFrame
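The result_set_type_hints argument has no example above; its shape can be sketched as a plain dictionary. The column names and DDL signatures below are hypothetical, chosen only to illustrate the dict[str | int, str] annotation:

```python
# Hypothetical column names mapped to Athena DDL type signatures.
# Keys may be column names (str) or positional indices (int), per the
# dict[str | int, str] annotation above.
type_hints = {
    "tags": "array<varchar>",           # complex type: list of strings
    "attributes": "map<varchar, int>",  # complex type: string -> int map
    0: "decimal(18, 4)",                # first column, addressed by position
}

# The hints would be passed straight through, e.g.:
# cursor.execute("SELECT ...", result_set_type_hints=type_hints)
for key, ddl in type_hints.items():
    assert isinstance(key, (str, int)) and isinstance(ddl, str)
```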
- as_polars() pl.DataFrame[source]¶
Return query results as a Polars DataFrame.
This is the primary method for accessing results with PolarsCursor.
- Returns:
Polars DataFrame containing all query results.
- Raises:
ProgrammingError – If no query has been executed or no results are available.
Example
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> df = cursor.as_polars()
>>> print(f"DataFrame has {df.height} rows and {df.width} columns")
>>> filtered = df.filter(pl.col("value") > 100)
- as_arrow() Table[source]¶
Return query results as an Apache Arrow Table.
Converts the Polars DataFrame to an Apache Arrow Table for interoperability with other Arrow-compatible tools and libraries.
- Returns:
Apache Arrow Table containing all query results.
- Raises:
ProgrammingError – If no query has been executed or no results are available.
ImportError – If pyarrow is not installed.
Example
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> table = cursor.as_arrow()
>>> print(f"Table has {table.num_rows} rows and {table.num_columns} columns")
- iter_chunks() Iterator[pl.DataFrame][source]¶
Iterate over result chunks as Polars DataFrames.
This method provides an iterator interface for processing result sets. When chunksize is specified, it yields DataFrames in chunks using lazy evaluation for memory-efficient processing. When chunksize is not specified, it yields the entire result as a single DataFrame, providing a consistent interface regardless of chunking configuration.
- Yields:
Polars DataFrame for each chunk of rows, or the entire DataFrame if chunksize was not specified.
- Raises:
ProgrammingError – If no result set is available.
Example
>>> # With chunking for large datasets
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
>>> cursor.execute("SELECT * FROM large_table")
>>> for chunk in cursor.iter_chunks():
...     process_chunk(chunk)  # Each chunk is a Polars DataFrame
>>>
>>> # Without chunking - yields entire result as single chunk
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM small_table")
>>> for df in cursor.iter_chunks():
...     process(df)  # Single DataFrame with all data
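The incremental-aggregation pattern this enables can be sketched without an Athena connection. Plain lists of dicts stand in for Polars DataFrames here, and iter_chunks_stub is a hypothetical stand-in for the cursor method, not pyathena code:

```python
# Simulate chunked iteration: with PolarsCursor(chunksize=...), each
# yielded chunk would be a pl.DataFrame and the aggregation would use
# Polars expressions instead of plain Python sums.
def iter_chunks_stub(rows, chunksize):
    """Yield rows in fixed-size chunks, like iter_chunks() with chunksize set."""
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]

rows = [{"value": v} for v in range(10)]
total = 0
n_chunks = 0
for chunk in iter_chunks_stub(rows, chunksize=4):
    total += sum(r["value"] for r in chunk)  # running aggregate per chunk
    n_chunks += 1

print(n_chunks, total)  # 3 45  (chunks of 4 + 4 + 2 rows)
```

Only one chunk is held in memory at a time, which is the point of setting chunksize on the real cursor.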
- DEFAULT_RESULT_REUSE_MINUTES = 60¶
- LIST_DATABASES_MAX_RESULTS = 50¶
- LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50¶
- LIST_TABLE_METADATA_MAX_RESULTS = 50¶
- cancel() None¶
Cancel the currently executing query.
- Raises:
ProgrammingError – If no query is currently executing.
- property connection: Connection[Any]¶
- executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None¶
Execute a SQL query multiple times with different parameters.
- Parameters:
operation – SQL query string to execute.
seq_of_parameters – Sequence of parameter sets, one per execution.
**kwargs – Additional keyword arguments passed to each execute() call.
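A sketch of what seq_of_parameters looks like for a pyformat-style statement; the table name, column names, and values below are hypothetical:

```python
# Each dict is one execution of the statement, matching the
# list[dict[str, Any] | list[str] | None] annotation above.
operation = "INSERT INTO sales VALUES (%(region)s, %(amount)s)"
seq_of_parameters = [
    {"region": "east", "amount": 100},
    {"region": "west", "amount": 250},
    {"region": "north", "amount": 75},
]

# cursor.executemany(operation, seq_of_parameters) would then run the
# statement once per parameter dict.
assert all(set(p) == {"region", "amount"} for p in seq_of_parameters)
```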
- fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]¶
Fetch all remaining rows from the result set.
- Returns:
List of tuples representing all remaining rows.
- Raises:
ProgrammingError – If no result set is available.
- fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]¶
Fetch multiple rows from the result set.
- Parameters:
size – Maximum number of rows to fetch. Defaults to arraysize.
- Returns:
List of tuples representing the fetched rows.
- Raises:
ProgrammingError – If no result set is available.
- fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None¶
Fetch the next row of the result set.
- Returns:
A tuple representing the next row, or None if no more rows.
- Raises:
ProgrammingError – If no result set is available.
- get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata¶
- list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]¶
- property result_set: AthenaResultSet | None¶
- property rowcount: int¶
Get the number of rows affected by the last operation.
For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.
- Returns:
The number of rows, or -1 if not applicable or unknown.
- setinputsizes(sizes)¶
Does nothing by default
- setoutputsize(size, column=None)¶
Does nothing by default
- class pyathena.polars.async_cursor.AsyncPolarsCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, chunksize: int | None = None, **kwargs)[source]¶
Asynchronous cursor that returns results as Polars DataFrames.
This cursor extends AsyncCursor to provide asynchronous query execution with results returned as Polars DataFrames using Polars’ native reading capabilities. It does not require PyArrow for basic functionality, but can optionally provide Arrow Table access when PyArrow is installed.
- Features:
Asynchronous query execution with concurrent futures
Native Polars CSV and Parquet reading (no PyArrow required)
Memory-efficient columnar data processing
Support for UNLOAD operations with Parquet output
Optional Arrow interoperability when PyArrow is installed
- arraysize¶
Number of rows to fetch per batch (configurable).
Example
>>> from pyathena.polars.async_cursor import AsyncPolarsCursor
>>>
>>> cursor = connection.cursor(AsyncPolarsCursor, unload=True)
>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>>
>>> # Get result when ready
>>> result_set = future.result()
>>> df = result_set.as_polars()
>>>
>>> # Optional: Convert to Arrow Table if pyarrow is installed
>>> table = result_set.as_arrow()
Note
Requires polars to be installed. PyArrow is optional and only needed for as_arrow() functionality. UNLOAD operations generate Parquet files in S3 for optimal performance.
- __init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, chunksize: int | None = None, **kwargs) None[source]¶
Initialize an AsyncPolarsCursor.
- Parameters:
s3_staging_dir – S3 location for query results.
schema_name – Default schema name.
catalog_name – Default catalog name.
work_group – Athena workgroup name.
poll_interval – Query status polling interval in seconds.
encryption_option – S3 encryption option (SSE_S3, SSE_KMS, CSE_KMS).
kms_key – KMS key ARN for encryption.
kill_on_interrupt – Cancel running query on keyboard interrupt.
max_workers – Maximum number of workers for concurrent execution.
arraysize – Number of rows to fetch per batch.
unload – Enable UNLOAD for high-performance Parquet output.
result_reuse_enable – Enable Athena query result reuse.
result_reuse_minutes – Minutes to reuse cached results.
block_size – S3 read block size.
cache_type – S3 caching strategy.
chunksize – Number of rows per chunk for memory-efficient processing. If specified, data is loaded lazily in chunks for all data access methods including fetchone(), fetchmany(), and iter_chunks().
**kwargs – Additional connection parameters.
Example
>>> cursor = connection.cursor(AsyncPolarsCursor, unload=True)
>>> # With chunked processing
>>> cursor = connection.cursor(AsyncPolarsCursor, chunksize=50000)
- static get_default_converter(unload: bool = False) DefaultPolarsTypeConverter | DefaultPolarsUnloadTypeConverter | Any[source]¶
Get the default type converter for Polars results.
- Parameters:
unload – If True, returns converter for UNLOAD (Parquet) results.
- Returns:
Type converter appropriate for the result format.
- LIST_DATABASES_MAX_RESULTS = 50¶
- LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50¶
- LIST_TABLE_METADATA_MAX_RESULTS = 50¶
- cancel(query_id: str) Future[None]¶
Cancel a running query asynchronously.
Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.
- Parameters:
query_id – The Athena query execution ID to cancel.
- Returns:
Future object that completes when the cancellation request finishes.
Example
>>> query_id, future = cursor.execute("SELECT * FROM huge_table")
>>> # Later, cancel the query
>>> cancel_future = cursor.cancel(query_id)
>>> cancel_future.result()  # Wait for cancellation to complete
- property connection: Connection[Any]¶
- execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) tuple[str, Future[AthenaPolarsResultSet | Any]][source]¶
Execute a SQL query asynchronously and return results as Polars DataFrames.
Executes the SQL query on Amazon Athena asynchronously and returns a future that resolves to a result set for Polars DataFrame output.
- Parameters:
operation – SQL query string to execute.
parameters – Query parameters for parameterized queries.
work_group – Athena workgroup to use for this query.
s3_staging_dir – S3 location for query results.
cache_size – Number of queries to check for result caching.
cache_expiration_time – Cache expiration time in seconds.
result_reuse_enable – Enable Athena result reuse for this query.
result_reuse_minutes – Minutes to reuse cached results.
paramstyle – Parameter style (‘qmark’ or ‘pyformat’).
result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.
**kwargs – Additional execution parameters passed to Polars read functions.
- Returns:
Tuple of (query_id, future) where future resolves to AthenaPolarsResultSet.
Example
>>> query_id, future = cursor.execute("SELECT * FROM sales")
>>> result_set = future.result()
>>> df = result_set.as_polars()  # Returns Polars DataFrame
- executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None¶
Execute multiple queries asynchronously (not supported).
This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.
- Parameters:
operation – SQL query string.
seq_of_parameters – Sequence of parameter sets.
**kwargs – Additional arguments.
- Raises:
NotSupportedError – Always raised as this operation is not supported.
Note
For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.
- get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata¶
- list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]¶
- poll(query_id: str) Future[AthenaQueryExecution]¶
Poll for query completion asynchronously.
Waits for the query to complete (succeed, fail, or be cancelled) and returns the final execution status. This method blocks until completion but runs the polling in a background thread.
- Parameters:
query_id – The Athena query execution ID to poll.
- Returns:
Future object containing the final AthenaQueryExecution status.
Note
This method performs polling internally, so it will take time proportional to your query execution duration.
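The Future-based shape that poll() returns can be sketched with the standard library alone. poll_stub and the "SUCCEEDED" state string below are stand-ins for illustration, not the actual pyathena implementation:

```python
import concurrent.futures
import time

def poll_stub(executor, duration=0.05):
    """Submit a blocking wait to a thread pool and hand back a Future,
    mirroring how poll() returns Future[AthenaQueryExecution]."""
    def wait_for_completion():
        time.sleep(duration)  # stands in for the query's execution time
        return "SUCCEEDED"    # stands in for the final query state
    return executor.submit(wait_for_completion)

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    future = poll_stub(executor)
    # The caller is free to do other work here, then block on .result().
    state = future.result()

print(state)  # SUCCEEDED
```

The background thread absorbs the polling delay, so the caller only blocks when it actually needs the final status.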
- query_execution(query_id: str) Future[AthenaQueryExecution]¶
Get query execution details asynchronously.
Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.
- Parameters:
query_id – The Athena query execution ID.
- Returns:
Future object containing AthenaQueryExecution with query details.
- setinputsizes(sizes)¶
Does nothing by default
- setoutputsize(size, column=None)¶
Does nothing by default
Polars Result Set¶
- class pyathena.polars.result_set.AthenaPolarsResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, unload: bool = False, unload_location: str | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs)[source]¶
Result set that provides Polars DataFrame results with optional Arrow interoperability.
This result set handles CSV and Parquet result files from S3, converting them to Polars DataFrames using Polars’ native reading capabilities. It does not require PyArrow for basic functionality, but can optionally provide Arrow Table access when PyArrow is installed.
- Features:
Native Polars CSV and Parquet reading (no PyArrow required)
Efficient columnar data processing with Polars
Optional Arrow interoperability when PyArrow is available
Support for both CSV and Parquet result formats
Chunked iteration for memory-efficient processing of large datasets
Optimized memory usage through columnar format
Example
>>> # Used automatically by PolarsCursor
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM large_table")
>>>
>>> # Get Polars DataFrame
>>> df = cursor.as_polars()
>>>
>>> # Work with Polars
>>> print(f"DataFrame has {df.height} rows and {df.width} columns")
>>> filtered = df.filter(pl.col("value") > 100)
>>>
>>> # Optional: Get Arrow Table (requires pyarrow)
>>> table = cursor.as_arrow()
>>>
>>> # Memory-efficient chunked iteration
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
>>> cursor.execute("SELECT * FROM huge_table")
>>> for chunk in cursor.iter_chunks():
...     process_chunk(chunk)
Note
This class is used internally by PolarsCursor and typically not instantiated directly by users. Requires polars to be installed. PyArrow is optional and only needed for as_arrow() functionality.
- __init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, unload: bool = False, unload_location: str | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) None[source]¶
Initialize the Polars result set.
- Parameters:
connection – The Athena connection object.
converter – Type converter for Athena data types.
query_execution – Query execution metadata.
arraysize – Number of rows to fetch per batch.
retry_config – Configuration for retry behavior.
unload – Whether this is an UNLOAD query result.
unload_location – S3 location for UNLOAD results.
block_size – Block size for S3 file reading.
cache_type – Cache type for S3 file system.
max_workers – Maximum number of worker threads.
chunksize – Number of rows per chunk for memory-efficient processing. If specified, data is loaded lazily in chunks for all data access methods including fetchone(), fetchmany(), and iter_chunks().
result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion.
**kwargs – Additional arguments passed to Polars read functions.
- property converters: dict[str, Callable[[str | None], Any | None]]¶
Get converter functions for each column.
- Returns:
Dictionary mapping column names to their converter functions.
- fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]¶
Fetch the next row of the query result.
- Returns:
A single row as a tuple, or None if no more rows are available.
- fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]¶
Fetch the next set of rows of the query result.
- Parameters:
size – Number of rows to fetch. Defaults to arraysize.
- Returns:
A list of rows as tuples.
- fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]¶
Fetch all remaining rows of the query result.
- Returns:
A list of all remaining rows as tuples.
- as_polars() pl.DataFrame[source]¶
Return query results as a Polars DataFrame.
This is the primary method for accessing results with PolarsCursor.
Note
When chunksize is set, calling this method will collect all chunks into a single DataFrame, loading all data into memory. Use iter_chunks() for memory-efficient processing of large datasets.
- Returns:
Polars DataFrame containing all query results.
Example
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> df = cursor.as_polars()
>>> print(f"DataFrame has {df.height} rows")
>>> filtered = df.filter(pl.col("value") > 100)
- as_arrow() Table[source]¶
Return query results as an Apache Arrow Table.
Converts the Polars DataFrame to an Apache Arrow Table for interoperability with other Arrow-compatible tools and libraries.
- Returns:
Apache Arrow Table containing all query results.
- Raises:
ImportError – If pyarrow is not installed.
Example
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM my_table")
>>> table = cursor.as_arrow()
>>> # Use with other Arrow-compatible libraries
- DEFAULT_RESULT_REUSE_MINUTES = 60¶
- property connection: Connection[Any]¶
- property is_unload: bool¶
Check if the query is an UNLOAD statement.
- Returns:
True if the query is an UNLOAD statement, False otherwise.
- iter_chunks() PolarsDataFrameIterator[source]¶
Iterate over result chunks as Polars DataFrames.
This method provides an iterator interface for processing large result sets. When chunksize is specified, it yields DataFrames in chunks using lazy evaluation for memory-efficient processing. When chunksize is not specified, it yields the entire result as a single DataFrame.
- Returns:
PolarsDataFrameIterator that yields Polars DataFrames for each chunk of rows, or the entire DataFrame if chunksize was not specified.
Example
>>> # With chunking for large datasets
>>> cursor = connection.cursor(PolarsCursor, chunksize=50000)
>>> cursor.execute("SELECT * FROM large_table")
>>> for chunk in cursor.iter_chunks():
...     process_chunk(chunk)  # Each chunk is a Polars DataFrame
>>>
>>> # Without chunking - yields entire result as single chunk
>>> cursor = connection.cursor(PolarsCursor)
>>> cursor.execute("SELECT * FROM small_table")
>>> for df in cursor.iter_chunks():
...     process(df)  # Single DataFrame with all data
- class pyathena.polars.result_set.PolarsDataFrameIterator(reader: Iterator[pl.DataFrame] | pl.DataFrame, converters: dict[str, Callable[[str | None], Any | None]], column_names: list[str])[source]¶
Iterator for chunked DataFrame results from Athena queries.
This class wraps either a Polars DataFrame iterator (for chunked reading) or a single DataFrame, providing a unified iterator interface. It applies optional type conversion to each DataFrame chunk as it’s yielded.
The iterator is used by AthenaPolarsResultSet to provide chunked access to large query results, enabling memory-efficient processing of datasets that would be too large to load entirely into memory.
Example
>>> # Iterate over DataFrame chunks
>>> for df_chunk in iterator:
...     process(df_chunk)
>>>
>>> # Iterate over individual rows
>>> for idx, row in iterator.iterrows():
...     print(row)
Note
This class is primarily for internal use by AthenaPolarsResultSet. Most users should access results through PolarsCursor methods.
- __init__(reader: Iterator[pl.DataFrame] | pl.DataFrame, converters: dict[str, Callable[[str | None], Any | None]], column_names: list[str]) None[source]¶
Initialize the iterator.
- Parameters:
reader – Either a DataFrame iterator (for chunked) or a single DataFrame.
converters – Dictionary mapping column names to converter functions.
column_names – List of column names in order.
- __next__() pl.DataFrame[source]¶
Get the next DataFrame chunk.
- Returns:
The next Polars DataFrame chunk.
- Raises:
StopIteration – When no more chunks are available.
- __iter__() PolarsDataFrameIterator[source]¶
Return self as iterator.
- __enter__() PolarsDataFrameIterator[source]¶
Context manager entry.
Polars Data Converters¶
- class pyathena.polars.converter.DefaultPolarsTypeConverter[source]¶
Optimized type converter for Polars DataFrame results.
This converter is specifically designed for the PolarsCursor and provides optimized type conversion for Polars DataFrames.
- The converter focuses on:
Converting date/time types to appropriate Python objects
Handling decimal and binary types
Preserving JSON and complex types
Maintaining high performance for columnar operations
Example
>>> from pyathena.polars.converter import DefaultPolarsTypeConverter
>>> converter = DefaultPolarsTypeConverter()
>>>
>>> # Used automatically by PolarsCursor
>>> cursor = connection.cursor(PolarsCursor)
>>> # converter is applied automatically to results
Note
This converter is used by default in PolarsCursor. Most users don’t need to instantiate it directly.
- class pyathena.polars.converter.DefaultPolarsUnloadTypeConverter[source]¶
Type converter for Polars UNLOAD operations.
This converter is designed for use with UNLOAD queries that write results directly to Parquet files in S3. Since UNLOAD operations bypass the normal conversion process and write data in native Parquet format, this converter has minimal functionality.
Note
Used automatically when PolarsCursor is configured with unload=True. UNLOAD results are read directly as Polars DataFrames from Parquet files.
Polars Utilities¶
- pyathena.polars.util.to_column_info(schema: pl.Schema) tuple[dict[str, Any], ...][source]¶
Convert a Polars schema to Athena column information.
Iterates through all fields in the schema and converts each field’s type information to an Athena-compatible column metadata dictionary.
- Parameters:
schema – A Polars Schema object containing field definitions.
- Returns:
A tuple of dictionaries, each containing column metadata with the keys:
Name: The column name
Type: The Athena SQL type name
Precision: Numeric precision (0 for non-numeric types)
Scale: Numeric scale (0 for non-numeric types)
Nullable: Always "NULLABLE" for Polars types
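The documented return shape can be reconstructed in plain Python. This sketch starts from a hypothetical name-to-type mapping rather than a real pl.Schema, and the precision values shown are assumptions, not values taken from the library:

```python
# Illustrative reconstruction of the documented output shape; the real
# to_column_info derives types from a pl.Schema, whereas this stub
# starts from a plain name -> (type, precision, scale) mapping.
def to_column_info_stub(fields):
    return tuple(
        {
            "Name": name,
            "Type": type_name,
            "Precision": precision,
            "Scale": scale,
            "Nullable": "NULLABLE",  # always NULLABLE for Polars types
        }
        for name, (type_name, precision, scale) in fields.items()
    )

columns = to_column_info_stub({
    "id": ("bigint", 19, 0),          # precision values are assumptions
    "name": ("varchar", 2147483647, 0),
})
print(columns[0]["Type"])  # bigint
```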
- pyathena.polars.util.get_athena_type(dtype: Any) tuple[str, int, int][source]¶
Map a Polars data type to an Athena SQL type.
Converts Polars type identifiers to corresponding Athena SQL type names with appropriate precision and scale values. Handles all common Polars types including numeric, string, binary, temporal, and complex types.
- Parameters:
dtype – A Polars DataType object to convert.
- Returns:
A tuple of (type_name, precision, scale), where:
type_name: The Athena SQL type (e.g., "varchar", "bigint", "timestamp")
precision: The numeric precision or max length
scale: The numeric scale (decimal places)
Note
Unknown types default to “string” with maximum varchar length. Decimal types preserve their original precision and scale.
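An illustrative subset of the mapping described above, keyed by type name strings rather than real pl.DataType objects; the precision and scale values are assumptions chosen for the sketch:

```python
# Hypothetical Polars-type-name -> (athena_type, precision, scale) table;
# the real get_athena_type inspects pl.DataType objects instead.
ATHENA_TYPES = {
    "Int64": ("bigint", 19, 0),
    "Float64": ("double", 17, 0),
    "Utf8": ("varchar", 2147483647, 0),
    "Date": ("date", 0, 0),
    "Boolean": ("boolean", 0, 0),
}

def get_athena_type_stub(dtype_name):
    # Unknown types fall back to "string" at maximum varchar length,
    # per the note above.
    return ATHENA_TYPES.get(dtype_name, ("string", 2147483647, 0))

print(get_athena_type_stub("Int64"))    # ('bigint', 19, 0)
print(get_athena_type_stub("Unknown"))  # ('string', 2147483647, 0)
```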