Native Asyncio

This section covers the native asyncio connection, cursors, and base classes.

Connection

class pyathena.DBAPITypeObject[source]

Type Objects and Constructors

https://www.python.org/dev/peps/pep-0249/#type-objects-and-constructors

pyathena.connect(*args, cursor_class: None = ..., **kwargs) Connection[Cursor][source]
pyathena.connect(*args, cursor_class: type[ConnectionCursor], **kwargs) Connection[ConnectionCursor]

Create a new database connection to Amazon Athena.

This function provides the main entry point for establishing connections to Amazon Athena. It follows the DB API 2.0 specification and returns a Connection object that can be used to create cursors for executing SQL queries.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if the workgroup doesn’t have a result location. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback (required for workgroups with managed query result storage).

  • region_name – AWS region name. If not specified, uses the default region from your AWS configuration.

  • schema_name – Athena database/schema name. Defaults to “default”.

  • catalog_name – Athena data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can be used instead of s3_staging_dir if the workgroup has a result location configured.

  • poll_interval – Time in seconds between polls for query completion. Defaults to 1.0.

  • encryption_option – S3 encryption option for query results. Can be “SSE_S3”, “SSE_KMS”, or “CSE_KMS”.

  • kms_key – KMS key ID for encryption when using SSE_KMS or CSE_KMS.

  • profile_name – AWS profile name to use for authentication.

  • role_arn – ARN of IAM role to assume for authentication.

  • role_session_name – Session name when assuming a role.

  • cursor_class – Custom cursor class to use. If not specified, uses the default Cursor class.

  • kill_on_interrupt – Whether to cancel running queries when interrupted. Defaults to True.

  • **kwargs – Additional keyword arguments passed to the Connection constructor.

Returns:

A Connection object that can be used to create cursors and execute queries.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Example

>>> import pyathena
>>> conn = pyathena.connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
...     schema_name='mydatabase'
... )
>>> cursor = conn.cursor()
>>> cursor.execute("SELECT * FROM mytable LIMIT 10")
>>> results = cursor.fetchall()
async pyathena.aio_connect(*args, **kwargs) AioConnection[source]

Create a new async database connection to Amazon Athena.

This is the async counterpart of connect(). It returns an AioConnection whose cursors use native asyncio for polling and API calls, keeping the event loop free.

Parameters:

**kwargs – Arguments forwarded to AioConnection.create(). See connect() for the full list of supported arguments.

Returns:

An AioConnection that produces AioCursor instances by default.

Example

>>> import pyathena
>>> conn = await pyathena.aio_connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
... )
>>> async with conn.cursor() as cursor:
...     await cursor.execute("SELECT 1")
...     print(await cursor.fetchone())
class pyathena.aio.connection.AioConnection(**kwargs: Any)[source]

Async-aware connection to Amazon Athena.

Wraps the synchronous Connection with async context manager support and provides create() for non-blocking initialization.

Example

>>> async with await AioConnection.create(
...     s3_staging_dir="s3://bucket/path/",
...     region_name="us-east-1",
... ) as conn:
...     async with conn.cursor() as cursor:
...         await cursor.execute("SELECT 1")
...         print(await cursor.fetchone())
__init__(**kwargs: Any) None[source]

Initialize a new Athena database connection.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if workgroup doesn’t have result location. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback. This is required when connecting to a workgroup with managed query result storage enabled.

  • region_name – AWS region name. Uses default region if not specified.

  • schema_name – Default database/schema name. Defaults to “default”.

  • catalog_name – Data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can substitute for s3_staging_dir if workgroup has result location configured.

  • poll_interval – Seconds between query status polls. Defaults to 1.0.

  • encryption_option – S3 encryption for results (“SSE_S3”, “SSE_KMS”, “CSE_KMS”).

  • kms_key – KMS key ID when using SSE_KMS or CSE_KMS encryption.

  • profile_name – AWS profile name for authentication.

  • role_arn – IAM role ARN to assume for authentication.

  • role_session_name – Session name when assuming IAM role.

  • external_id – External ID for role assumption (if required by role).

  • serial_number – MFA device serial number for role assumption.

  • duration_seconds – Role session duration in seconds. Defaults to 3600.

  • converter – Custom type converter. Uses DefaultTypeConverter if None.

  • formatter – Custom parameter formatter. Uses DefaultParameterFormatter if None.

  • retry_config – Retry configuration for API calls. Uses default if None.

  • cursor_class – Default cursor class for this connection.

  • cursor_kwargs – Default keyword arguments for cursor creation.

  • kill_on_interrupt – Cancel running queries on interrupt. Defaults to True.

  • session – Pre-configured boto3 Session. Creates new session if None.

  • config – Boto3 Config object for client configuration.

  • result_reuse_enable – Enable Athena query result reuse. Defaults to False.

  • result_reuse_minutes – Minutes to reuse cached results.

  • on_start_query_execution – Callback function called when query starts.

  • **kwargs – Additional arguments passed to boto3 Session and client.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Note

Either s3_staging_dir or work_group must be specified. Environment variables AWS_ATHENA_S3_STAGING_DIR and AWS_ATHENA_WORK_GROUP are checked if parameters are not provided.

When using a workgroup with managed query result storage, pass s3_staging_dir="" to prevent the environment variable fallback from sending a ResultConfiguration that conflicts with ManagedQueryResultsConfiguration.

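As a configuration sketch of the note above (the bucket-free workgroup name is hypothetical), connecting to a workgroup with managed query result storage looks like this; passing s3_staging_dir="" suppresses the AWS_ATHENA_S3_STAGING_DIR fallback so no conflicting ResultConfiguration is sent:

```python
import pyathena

# Assumed workgroup name; managed query result storage is enabled on it.
conn = pyathena.connect(
    work_group="my-managed-workgroup",
    s3_staging_dir="",   # explicitly disable S3 staging (skip env var fallback)
    region_name="us-east-1",
)
```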
async classmethod create(**kwargs: Any) AioConnection[source]

Async factory for creating an AioConnection.

Runs the (potentially blocking) __init__ in a thread so that STS calls (role_arn / serial_number) do not block the loop.

Parameters:

**kwargs – Arguments forwarded to AioConnection.__init__.

Returns:

A fully initialized AioConnection.

__enter__()

Enter the runtime context for the connection.

Returns:

Self for use in context manager protocol.

__exit__(exc_type, exc_val, exc_tb)

Exit the runtime context and close the connection.

Parameters:
  • exc_type – Exception type if an exception occurred.

  • exc_val – Exception value if an exception occurred.

  • exc_tb – Exception traceback if an exception occurred.

property client: BaseClient

Get the boto3 Athena client used for query operations.

Returns:

The configured boto3 Athena client.

close() None

Close the connection.

Closes the database connection. This method is provided for DB API 2.0 compatibility. Since Athena connections are stateless, this method currently does not perform any actual cleanup operations.

Note

This method is called automatically when using the connection as a context manager (with statement).

commit() None

Commit any pending transaction.

This method is provided for DB API 2.0 compatibility. Since Athena does not support transactions, this method does nothing.

Note

Athena queries are auto-committed and cannot be rolled back.

cursor(cursor: type[FunctionalCursor] | None = None, **kwargs) FunctionalCursor | ConnectionCursor

Create a new cursor object for executing queries.

Creates and returns a cursor object that can be used to execute SQL queries against Amazon Athena. The cursor inherits connection settings but can be customized with additional parameters.

Parameters:
  • cursor – Custom cursor class to use. If not provided, uses the connection’s default cursor class.

  • **kwargs – Additional keyword arguments to pass to the cursor constructor. These override connection defaults.

Returns:

A cursor object that can execute SQL queries.

Example

>>> cursor = connection.cursor()
>>> cursor.execute("SELECT * FROM my_table LIMIT 10")
>>> results = cursor.fetchall()

# Using a custom cursor type
>>> from pyathena.pandas.cursor import PandasCursor
>>> pandas_cursor = connection.cursor(PandasCursor)
>>> df = pandas_cursor.execute("SELECT * FROM my_table").fetchall()

property retry_config: RetryConfig

Get the retry configuration for AWS API calls.

Returns:

The RetryConfig object that controls retry behavior for failed requests.

rollback() None

Rollback any pending transaction.

This method is required by DB API 2.0 but is not supported by Athena since Athena does not support transactions.

Raises:

NotSupportedError – Always raised since transactions are not supported.

property session: Session

Get the boto3 session used for AWS API calls.

Returns:

The configured boto3 Session object.

Aio Cursors

class pyathena.aio.cursor.AioCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs)[source]

Native asyncio cursor for Amazon Athena.

Unlike AsyncCursor (which uses ThreadPoolExecutor), this cursor uses asyncio.sleep for polling and asyncio.to_thread for boto3 calls, keeping the event loop free.

Example

>>> async with await AioConnection.create(...) as conn:
...     async with conn.cursor() as cursor:
...         await cursor.execute("SELECT * FROM my_table")
...         rows = await cursor.fetchall()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs) None[source]
property arraysize: int
async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) AioCursor[source]

Execute a SQL query asynchronously.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Query result cache size (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.
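
Because execute() returns the cursor itself, calls can be chained once each coroutine is awaited. The sketch below uses a minimal stand-in cursor (not pyathena's class) purely to illustrate the chaining pattern:

```python
import asyncio

class StubCursor:
    """Minimal stand-in illustrating execute() returning self for chaining."""
    def __init__(self, rows):
        self._rows = rows

    async def execute(self, operation):
        # A real cursor would start the query and poll; here we just return self.
        return self

    async def fetchall(self):
        return list(self._rows)

async def main():
    cursor = StubCursor([(1,), (2,)])
    # Chaining: await execute() to get the cursor back, then await fetchall().
    return await (await cursor.execute("SELECT 1")).fetchall()

print(asyncio.run(main()))  # → [(1,), (2,)]
```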

async fetchone() Any | dict[Any, Any | None] | None[source]

Fetch the next row of a query result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchmany(size: int | None = None) list[Any | dict[Any, Any | None]][source]

Fetch multiple rows from a query result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchall() list[Any | dict[Any, Any | None]][source]

Fetch all remaining rows from a query result set.

Returns:

List of tuples representing all remaining rows in the result set.

Raises:

ProgrammingError – If called before executing a query that returns results.
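
A common way to consume large result sets with these methods is to drain them in batches with fetchmany() rather than loading everything at once with fetchall(). The loop below shows the shape of that pattern against a minimal stand-in cursor (an empty list signals exhaustion, as with the real cursor):

```python
import asyncio

class StubCursor:
    """Stand-in cursor: serves rows in arraysize-sized batches like fetchmany()."""
    arraysize = 2

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    async def fetchmany(self, size=None):
        size = size or self.arraysize
        batch = self._rows[self._pos:self._pos + size]
        self._pos += len(batch)
        return batch

async def drain(cursor):
    batches = []
    while True:
        batch = await cursor.fetchmany()
        if not batch:  # an empty list means the result set is exhausted
            break
        batches.append(batch)
    return batches

print(asyncio.run(drain(StubCursor([(1,), (2,), (3,)]))))  # → [[(1,), (2,)], [(3,)]]
```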

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
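
The semantics here are the DB API 2.0 ones: one execute() call per parameter set, with no result set retained, which is why executemany() is mainly useful for DML. The stand-in below (not pyathena's class; the SQL string is illustrative) records each execution to make that visible:

```python
import asyncio

class RecordingCursor:
    """Stand-in that records each execution, mirroring executemany() semantics."""
    def __init__(self):
        self.calls = []

    async def execute(self, operation, parameters=None):
        self.calls.append((operation, parameters))
        return self

    async def executemany(self, operation, seq_of_parameters):
        # One execute() per parameter set; results are not retained.
        for parameters in seq_of_parameters:
            await self.execute(operation, parameters)

async def main():
    cursor = RecordingCursor()
    await cursor.executemany(
        "INSERT INTO users VALUES (%(id)s, %(name)s)",  # pyformat-style placeholders
        [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    )
    return len(cursor.calls)

print(asyncio.run(main()))  # → 2
```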

property execution_parameters: list[str]
property expected_bucket_owner: str | None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.aio.cursor.AioDictCursor(**kwargs)[source]

Native asyncio cursor that returns rows as dictionaries.

Example

>>> async with await AioConnection.create(...) as conn:
...     cursor = conn.cursor(AioDictCursor)
...     await cursor.execute("SELECT id, name FROM users")
...     row = await cursor.fetchone()
...     print(row["name"])
__init__(**kwargs) None[source]
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) AioCursor

Execute a SQL query asynchronously.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Query result cache size (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async fetchall() list[Any | dict[Any, Any | None]]

Fetch all remaining rows from a query result set.

Returns:

List of dictionaries representing all remaining rows in the result set.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchmany(size: int | None = None) list[Any | dict[Any, Any | None]]

Fetch multiple rows from a query result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of dictionaries representing the fetched rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchone() Any | dict[Any, Any | None] | None

Fetch the next row of a query result set.

Returns:

A dictionary representing the next row, or None if no more rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Result Set

class pyathena.aio.result_set.AthenaAioResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None)[source]

Async result set that provides async fetch methods.

Skips the synchronous _pre_fetch by passing _pre_fetch=False to the parent __init__ and provides an async create() classmethod factory instead.

__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) None[source]
async classmethod create(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) AthenaAioResultSet[source]

Async factory method.

Creates an AthenaAioResultSet and awaits the initial data fetch.

Parameters:
  • connection – The database connection.

  • converter – Type converter for result values.

  • query_execution – Query execution metadata.

  • arraysize – Number of rows to fetch per request.

  • retry_config – Retry configuration for API calls.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion.

Returns:

A fully initialized AthenaAioResultSet.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Automatically fetches the next page from Athena when the current page is exhausted and more pages are available.

Returns:

A tuple representing the next row, or None if no more rows.
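
The transparent paging that fetchone() performs can be pictured with the following stand-in (not pyathena's implementation): it refills its buffer from a paged source, the way the real result set requests the next page from Athena when the current one is exhausted:

```python
import asyncio

class PagedResultSet:
    """Stand-in: fetchone() refills from the next page when the buffer empties."""
    def __init__(self, pages):
        self._pages = list(pages)  # e.g. pages of rows from GetQueryResults
        self._buffer = []

    async def _fetch_next_page(self):
        # The real result set would call the Athena API here (in a thread).
        return self._pages.pop(0) if self._pages else []

    async def fetchone(self):
        if not self._buffer:
            self._buffer = await self._fetch_next_page()
        return self._buffer.pop(0) if self._buffer else None

async def main():
    rs = PagedResultSet([[(1,), (2,)], [(3,)]])
    rows = []
    while (row := await rs.fetchone()) is not None:
        rows.append(row)
    return rows

print(asyncio.run(main()))  # → [(1,), (2,), (3,)]
```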

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of row tuples. May contain fewer rows than requested if fewer are available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Returns:

List of all remaining row tuples.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
property arraysize: int
property catalog: str | None
close() None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.aio.result_set.AthenaAioDictResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None)[source]

Async result set that returns rows as dictionaries.

Inherits _get_rows from AthenaDictResultSet and async fetch methods from AthenaAioResultSet via multiple inheritance.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) None
property arraysize: int
property catalog: str | None
close() None
property completion_date_time: datetime | None
property connection: Connection[Any]
async classmethod create(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) AthenaAioResultSet

Async factory method.

Creates an AthenaAioResultSet and awaits the initial data fetch.

Parameters:
  • connection – The database connection.

  • converter – Type converter for result values.

  • query_execution – Query execution metadata.

  • arraysize – Number of rows to fetch per request.

  • retry_config – Retry configuration for API calls.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion.

Returns:

A fully initialized AthenaAioResultSet.

property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
dict_type

alias of dict

property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of all remaining row tuples.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of row tuples. May contain fewer rows than requested if fewer are available.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Automatically fetches the next page from Athena when the current page is exhausted and more pages are available.

Returns:

A tuple representing the next row, or None if no more rows.

property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Base Classes

class pyathena.aio.common.AioBaseCursor(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs)[source]

Async base cursor that overrides I/O methods with async equivalents.

Reuses BaseCursor.__init__, all _build_* methods, and constants. Only the methods that perform network I/O or blocking sleep are overridden to use asyncio.to_thread / asyncio.sleep.
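
The override pattern described above can be sketched in isolation: a blocking client call is pushed to a worker thread with asyncio.to_thread, and the wait between polls uses asyncio.sleep, so other tasks keep running. The names below (blocking_get_state, poll) are illustrative, not pyathena's:

```python
import asyncio
import time

def blocking_get_state(calls):
    # Stand-in for a blocking boto3 call such as get_query_execution().
    time.sleep(0.05)
    calls.append(1)
    return "SUCCEEDED" if len(calls) >= 3 else "RUNNING"

async def poll(poll_interval=0.01):
    calls = []
    while True:
        # Run the blocking call in a thread so the event loop stays free.
        state = await asyncio.to_thread(blocking_get_state, calls)
        if state == "SUCCEEDED":
            return len(calls)
        await asyncio.sleep(poll_interval)  # non-blocking wait between polls

print(asyncio.run(poll()))  # → 3
```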

async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase][source]
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata[source]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata][source]
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
__init__(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs) None
abstractmethod close() None
property connection: Connection[Any]
abstractmethod execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, **kwargs)
abstractmethod executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

class pyathena.aio.common.WithAsyncFetch(**kwargs)[source]

Mixin providing shared fetch, lifecycle, and async protocol for SQL cursors.

Provides properties (arraysize, result_set, query_id, rownumber, rowcount), lifecycle methods (close, executemany, cancel), default sync fetch (for cursors whose result sets load all data eagerly in __init__), and the async iteration protocol.

Subclasses override execute() and optionally __init__ and format-specific helpers.
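The async iteration protocol the mixin provides can be sketched in isolation. `WithAsyncFetchSketch` below is a hypothetical stand-in (the real mixin reads from an AthenaResultSet), showing how `__anext__` delegates a synchronous `fetchone` to a worker thread:

```python
import asyncio

class WithAsyncFetchSketch:
    """Minimal stand-in illustrating the async iteration protocol."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def fetchone(self):
        # Synchronous fetch; returns None when the result set is exhausted.
        return next(self._rows, None)

    def __aiter__(self):
        return self

    async def __anext__(self):
        # Delegate the (potentially blocking) sync fetch to a worker thread.
        row = await asyncio.to_thread(self.fetchone)
        if row is None:
            raise StopAsyncIteration
        return row

async def collect():
    cursor = WithAsyncFetchSketch([(1, "a"), (2, "b")])
    return [row async for row in cursor]

print(asyncio.run(collect()))  # [(1, 'a'), (2, 'b')]
```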

__init__(**kwargs) None[source]
property arraysize: int
property result_set: AthenaResultSet | None
property query_id: str | None
property rownumber: int | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

close() None[source]

Close the cursor and release associated resources.

async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None[source]

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
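The DB API 2.0 contract behind executemany (one execute() per parameter set, intermediate results discarded) can be sketched with a hypothetical recording cursor; `CursorSketch` below is illustrative, not PyAthena's implementation:

```python
import asyncio

class CursorSketch:
    """Records each execute() call to show executemany's fan-out."""

    def __init__(self):
        self.executed = []

    async def execute(self, operation, parameters=None, **kwargs):
        self.executed.append((operation, parameters))
        return self

    async def executemany(self, operation, seq_of_parameters, **kwargs):
        # One execute() per parameter set; only the last result survives,
        # which is why executemany is not useful for SELECT statements.
        for parameters in seq_of_parameters:
            await self.execute(operation, parameters, **kwargs)

async def main():
    c = CursorSketch()
    await c.executemany(
        "INSERT INTO t VALUES (%(id)s)",  # pyformat-style placeholder
        [{"id": 1}, {"id": 2}, {"id": 3}],
    )
    return len(c.executed)

print(asyncio.run(main()))  # 3
```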

async cancel() None[source]

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property catalog: str | None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
abstractmethod execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, **kwargs)
property execution_parameters: list[str]
property expected_bucket_owner: str | None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Pandas Cursor

class pyathena.aio.pandas.cursor.AioPandasCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, engine: str = 'auto', chunksize: int | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, auto_optimize_chunksize: bool = False, **kwargs)[source]

Native asyncio cursor that returns results as pandas DataFrames.

Uses asyncio.to_thread() for both result set creation and fetch operations, keeping the event loop free. This is especially important when chunksize is set, as fetch calls trigger lazy S3 reads.

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioPandasCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     df = cursor.as_pandas()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, engine: str = 'auto', chunksize: int | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, auto_optimize_chunksize: bool = False, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultPandasTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, keep_default_na: bool = False, na_values: Iterable[str] | None = ('',), quoting: int = 1, **kwargs) AioPandasCursor[source]

Execute a SQL query asynchronously and return results as pandas DataFrames.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • keep_default_na – Whether to keep default pandas NA values.

  • na_values – Additional values to treat as NA.

  • quoting – CSV quoting behavior (pandas csv.QUOTE_* constants).

  • **kwargs – Additional pandas read_csv/read_parquet parameters.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

as_pandas() DataFrame | PandasDataFrameIterator[source]

Return DataFrame or PandasDataFrameIterator based on chunksize setting.

Returns:

DataFrame when chunksize is None, PandasDataFrameIterator when chunksize is set.
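The chunksize contract described above can be sketched with plain pandas; `as_pandas_sketch` is a hypothetical stand-in for the cursor's behavior, not its implementation:

```python
import pandas as pd

def as_pandas_sketch(rows, columns, chunksize=None):
    # Mirrors the cursor's contract: a single DataFrame when chunksize is
    # None, otherwise an iterator of DataFrames of at most chunksize rows.
    if chunksize is None:
        return pd.DataFrame(rows, columns=columns)
    return (
        pd.DataFrame(rows[i : i + chunksize], columns=columns)
        for i in range(0, len(rows), chunksize)
    )

rows = [(1, "a"), (2, "b"), (3, "c")]
df = as_pandas_sketch(rows, ["id", "name"])
print(len(df))  # 3

chunks = list(as_pandas_sketch(rows, ["id", "name"], chunksize=2))
print([len(c) for c in chunks])  # [2, 1]
```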

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Arrow Cursor

class pyathena.aio.arrow.cursor.AioArrowCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs)[source]

Native asyncio cursor that returns results as Apache Arrow Tables.

Uses asyncio.to_thread() for both result set creation and fetch operations, keeping the event loop free.

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioArrowCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     table = cursor.as_arrow()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultArrowTypeConverter | DefaultArrowUnloadTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) AioArrowCursor[source]

Execute a SQL query asynchronously and return results as Arrow Tables.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

as_arrow() Table[source]

Return query results as an Apache Arrow Table.

Returns:

Apache Arrow Table containing all query results.

as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Returns:

Polars DataFrame containing all query results.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Polars Cursor

class pyathena.aio.polars.cursor.AioPolarsCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs)[source]

Native asyncio cursor that returns results as Polars DataFrames.

Uses asyncio.to_thread() for both result set creation and fetch operations, keeping the event loop free. This is especially important when chunksize is set, as fetch calls trigger lazy S3 reads.

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioPolarsCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     df = cursor.as_polars()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultPolarsTypeConverter | DefaultPolarsUnloadTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) AioPolarsCursor[source]

Execute a SQL query asynchronously and return results as Polars DataFrames.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • **kwargs – Additional execution parameters passed to Polars read functions.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Returns:

Polars DataFrame containing all query results.

as_arrow() Table[source]

Return query results as an Apache Arrow Table.

Returns:

Apache Arrow Table containing all query results.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio S3FS Cursor

class pyathena.aio.s3fs.cursor.AioS3FSCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, csv_reader: type[DefaultCSVReader] | type[AthenaCSVReader] | None = None, **kwargs)[source]

Native asyncio cursor that reads CSV results via AioS3FileSystem.

Uses AioS3FileSystem for S3 operations, which replaces ThreadPoolExecutor parallelism with asyncio.gather + asyncio.to_thread. Fetch operations are wrapped in asyncio.to_thread() because CSV reading is blocking I/O.
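The asyncio.gather + asyncio.to_thread pattern that replaces ThreadPoolExecutor can be sketched without S3; `read_range` below is a hypothetical stand-in for a blocking ranged GetObject call:

```python
import asyncio
import time

def read_range(key: str, start: int, end: int):
    # Stand-in for a blocking S3 GetObject range read.
    time.sleep(0.01)
    return (key, start, end)

async def read_parallel(key: str, size: int, block: int = 4):
    # asyncio.gather + asyncio.to_thread in place of ThreadPoolExecutor.map:
    # each blocking range read runs in a worker thread, results come back
    # in submission order.
    tasks = [
        asyncio.to_thread(read_range, key, start, min(start + block, size))
        for start in range(0, size, block)
    ]
    return await asyncio.gather(*tasks)

parts = asyncio.run(read_parallel("results.csv", 10))
print(len(parts))  # 3
print(parts[0])    # ('results.csv', 0, 4)
```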

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioS3FSCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     row = await cursor.fetchone()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, csv_reader: type[DefaultCSVReader] | type[AthenaCSVReader] | None = None, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultS3FSTypeConverter[source]

Get the default type converter for S3FS cursor.

Parameters:

unload – Unused. S3FS cursor does not support UNLOAD operations.

Returns:

DefaultS3FSTypeConverter instance.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) AioS3FSCursor[source]

Execute a SQL query asynchronously via S3FileSystem CSV reader.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread because AthenaS3FSResultSet reads rows lazily from S3.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread because AthenaS3FSResultSet reads rows lazily from S3.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread because AthenaS3FSResultSet reads rows lazily from S3.

Returns:

A list of all remaining rows, as tuples (or dicts, depending on the cursor's row format).

Raises:

ProgrammingError – If no result set is available.
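For large result sets, the three fetch methods above compose naturally into a batched loop. A hedged sketch, assuming the cursor already holds an executed query; the per-batch processing here is a placeholder:

```python
import asyncio

# Stream an executed query's rows in fixed-size batches rather than
# materializing everything at once with fetchall(). fetchmany() returns
# an empty list once the result set is exhausted.
async def stream_rows(cursor, batch_size=1000):
    total = 0
    while True:
        batch = await cursor.fetchmany(batch_size)
        if not batch:
            break
        total += len(batch)  # replace with real per-batch processing
    return total
```

Since AthenaS3FSResultSet reads rows lazily from S3, this keeps memory use bounded by the batch size rather than the full result set.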

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
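A short sketch of the pattern (the table name and pyformat placeholders are illustrative; each parameter set runs as its own query, so this is not a bulk insert):

```python
import asyncio

# Each dict in the sequence triggers one execute() call, i.e. one Athena
# query per parameter set. Names and values are placeholders.
async def insert_events(cursor):
    rows = [
        {"id": 1, "name": "start"},
        {"id": 2, "name": "stop"},
    ]
    await cursor.executemany(
        "INSERT INTO events (id, name) VALUES (%(id)s, %(name)s)",
        rows,
    )
    return len(rows)
```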

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1, per the DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.
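To illustrate the DML case (the statement and table names are placeholders):

```python
import asyncio

# After a DML or CTAS statement, rowcount reports the affected rows;
# after a SELECT it stays -1 per DB API 2.0.
async def count_inserted(cursor):
    await cursor.execute("INSERT INTO target_table SELECT * FROM source_table")
    return cursor.rowcount
```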

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default.

setoutputsize(size, column=None)

Does nothing by default.

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Spark Cursor

class pyathena.aio.spark.cursor.AioSparkCursor(session_id: str | None = None, description: str | None = None, engine_configuration: dict[str, Any] | None = None, notebook_version: str | None = None, session_idle_timeout_minutes: int | None = None, **kwargs)[source]

Native asyncio cursor for executing PySpark code on Athena.

Overrides post-init I/O methods of SparkBaseCursor with async equivalents. Session management (_exists_session, _start_session, etc.) stays synchronous because __init__ runs inside asyncio.to_thread.

Since SparkBaseCursor.__init__ performs I/O (session management), cursor creation must be wrapped in asyncio.to_thread:

cursor = await asyncio.to_thread(conn.cursor)

Example

>>> import asyncio
>>> async with await pyathena.aio_connect(
...     work_group="spark-workgroup",
...     cursor_class=AioSparkCursor,
... ) as conn:
...     cursor = await asyncio.to_thread(conn.cursor)
...     await cursor.execute("spark.sql('SELECT 1').show()")
...     print(await cursor.get_std_out())
__init__(session_id: str | None = None, description: str | None = None, engine_configuration: dict[str, Any] | None = None, notebook_version: str | None = None, session_idle_timeout_minutes: int | None = None, **kwargs) None[source]
property calculation_execution: AthenaCalculationExecution | None
async get_std_out() str | None[source]

Get the standard output from the Spark calculation execution.

Returns:

The standard output as a string, or None if no output is available.

async get_std_error() str | None[source]

Get the standard error from the Spark calculation execution.

Returns:

The standard error as a string, or None if no error output is available.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, session_id: str | None = None, description: str | None = None, client_request_token: str | None = None, work_group: str | None = None, **kwargs) AioSparkCursor[source]

Execute PySpark code asynchronously.

Parameters:
  • operation – PySpark code to execute.

  • parameters – Unused, kept for API compatibility.

  • session_id – Spark session ID override.

  • description – Calculation description.

  • client_request_token – Idempotency token.

  • work_group – Unused, kept for API compatibility.

  • **kwargs – Additional parameters.

Returns:

Self reference for method chaining.
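A hedged sketch of submitting a calculation with an idempotency token (the token value and PySpark snippet are illustrative; re-submitting with the same client_request_token does not start a duplicate calculation):

```python
import asyncio

# conn is assumed to come from aio_connect(..., cursor_class=AioSparkCursor).
# Cursor creation is wrapped in asyncio.to_thread because __init__ performs
# session-management I/O.
async def run_spark(conn):
    cursor = await asyncio.to_thread(conn.cursor)
    await cursor.execute(
        "spark.sql('SELECT 1').show()",
        client_request_token="example-token-0001",
    )
    return await cursor.get_std_out()
```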

async cancel() None[source]

Cancel the currently running calculation.

Raises:

ProgrammingError – If no calculation is running.

async close() None[source]

Close the cursor by terminating the Spark session.

async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None[source]
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property calculation_id: str | None
property completion_date_time: datetime | None
property connection: Connection[Any]
property description: str | None
property dpu_execution_in_millis: int | None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

static get_default_engine_configuration() dict[str, Any]
get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property progress: str | None
property result_s3_uri: str | None
property result_type: str | None
property session_id: str
setinputsizes(sizes)

Does nothing by default.

setoutputsize(size, column=None)

Does nothing by default.

property state: str | None
property state_change_reason: str | None
property std_error_s3_uri: str | None
property std_out_s3_uri: str | None
property submission_date_time: datetime | None
property working_directory: str | None