Native Asyncio

This section covers the native asyncio connection, cursors, and base classes.

Connection

class pyathena.DBAPITypeObject[source]

Type Objects and Constructors

https://www.python.org/dev/peps/pep-0249/#type-objects-and-constructors

pyathena.connect(*args, cursor_class: None = ..., **kwargs) Connection[Cursor][source]
pyathena.connect(*args, cursor_class: type[ConnectionCursor], **kwargs) Connection[ConnectionCursor]

Create a new database connection to Amazon Athena.

This function provides the main entry point for establishing connections to Amazon Athena. It follows the DB API 2.0 specification and returns a Connection object that can be used to create cursors for executing SQL queries.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if the workgroup doesn’t have a result location. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback (required for workgroups with managed query result storage).

  • region_name – AWS region name. If not specified, uses the default region from your AWS configuration.

  • schema_name – Athena database/schema name. Defaults to “default”.

  • catalog_name – Athena data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can be used instead of s3_staging_dir if the workgroup has a result location configured.

  • poll_interval – Time in seconds between polls for query completion. Defaults to 1.0.

  • encryption_option – S3 encryption option for query results. Can be “SSE_S3”, “SSE_KMS”, or “CSE_KMS”.

  • kms_key – KMS key ID for encryption when using SSE_KMS or CSE_KMS.

  • profile_name – AWS profile name to use for authentication.

  • role_arn – ARN of IAM role to assume for authentication.

  • role_session_name – Session name when assuming a role.

  • cursor_class – Custom cursor class to use. If not specified, uses the default Cursor class.

  • kill_on_interrupt – Whether to cancel running queries when interrupted. Defaults to True.

  • **kwargs – Additional keyword arguments passed to the Connection constructor.

Returns:

A Connection object that can be used to create cursors and execute queries.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Example

>>> import pyathena
>>> conn = pyathena.connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
...     schema_name='mydatabase'
... )
>>> cursor = conn.cursor()
>>> cursor.execute("SELECT * FROM mytable LIMIT 10")
>>> results = cursor.fetchall()
async pyathena.aio_connect(*args, **kwargs) AioConnection[source]

Create a new async database connection to Amazon Athena.

This is the async counterpart of connect(). It returns an AioConnection whose cursors use native asyncio for polling and API calls, keeping the event loop free.

Parameters:

**kwargs – Arguments forwarded to AioConnection.create(). See connect() for the full list of supported arguments.

Returns:

An AioConnection that produces AioCursor instances by default.

Example

>>> import pyathena
>>> conn = await pyathena.aio_connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
... )
>>> async with conn.cursor() as cursor:
...     await cursor.execute("SELECT 1")
...     print(await cursor.fetchone())
class pyathena.aio.connection.AioConnection(**kwargs: Any)[source]

Async-aware connection to Amazon Athena.

Wraps the synchronous Connection with async context manager support and provides create() for non-blocking initialization.

Example

>>> async with await AioConnection.create(
...     s3_staging_dir="s3://bucket/path/",
...     region_name="us-east-1",
... ) as conn:
...     async with conn.cursor() as cursor:
...         await cursor.execute("SELECT 1")
...         print(await cursor.fetchone())
__init__(**kwargs: Any) None[source]

Initialize a new Athena database connection.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if workgroup doesn’t have result location. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback. This is required when connecting to a workgroup with managed query result storage enabled.

  • region_name – AWS region name. Uses default region if not specified.

  • schema_name – Default database/schema name. Defaults to “default”.

  • catalog_name – Data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can substitute for s3_staging_dir if workgroup has result location configured.

  • poll_interval – Seconds between query status polls. Defaults to 1.0.

  • encryption_option – S3 encryption for results (“SSE_S3”, “SSE_KMS”, “CSE_KMS”).

  • kms_key – KMS key ID when using SSE_KMS or CSE_KMS encryption.

  • profile_name – AWS profile name for authentication.

  • role_arn – IAM role ARN to assume for authentication.

  • role_session_name – Session name when assuming IAM role.

  • external_id – External ID for role assumption (if required by role).

  • serial_number – MFA device serial number for role assumption.

  • duration_seconds – Role session duration in seconds. Defaults to 3600.

  • converter – Custom type converter. Uses DefaultTypeConverter if None.

  • formatter – Custom parameter formatter. Uses DefaultParameterFormatter if None.

  • retry_config – Retry configuration for API calls. Uses default if None.

  • cursor_class – Default cursor class for this connection.

  • cursor_kwargs – Default keyword arguments for cursor creation.

  • kill_on_interrupt – Cancel running queries on interrupt. Defaults to True.

  • session – Pre-configured boto3 Session. Creates new session if None.

  • config – Boto3 Config object for client configuration.

  • result_reuse_enable – Enable Athena query result reuse. Defaults to False.

  • result_reuse_minutes – Minutes to reuse cached results.

  • on_start_query_execution – Callback function called when query starts.

  • **kwargs – Additional arguments passed to boto3 Session and client.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Note

Either s3_staging_dir or work_group must be specified. Environment variables AWS_ATHENA_S3_STAGING_DIR and AWS_ATHENA_WORK_GROUP are checked if parameters are not provided.

When using a workgroup with managed query result storage, pass s3_staging_dir="" to prevent the environment variable fallback from sending a ResultConfiguration that conflicts with ManagedQueryResultsConfiguration.

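As a configuration sketch of the note above (the bucket-free workgroup name is hypothetical), connecting to a workgroup with managed query result storage looks like this; passing s3_staging_dir="" suppresses the AWS_ATHENA_S3_STAGING_DIR fallback so no conflicting ResultConfiguration is sent:

```python
import pyathena

# Assumed workgroup name; managed query result storage is enabled on it.
conn = pyathena.connect(
    work_group="my-managed-workgroup",
    s3_staging_dir="",   # explicitly disable S3 staging (skip env var fallback)
    region_name="us-east-1",
)
```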
async classmethod create(**kwargs: Any) AioConnection[source]

Async factory for creating an AioConnection.

Runs the (potentially blocking) __init__ in a thread so that STS calls (role_arn / serial_number) do not block the loop.

Parameters:

**kwargs – Arguments forwarded to AioConnection.__init__.

Returns:

A fully initialized AioConnection.

__enter__()

Enter the runtime context for the connection.

Returns:

Self for use in context manager protocol.

__exit__(exc_type, exc_val, exc_tb)

Exit the runtime context and close the connection.

Parameters:
  • exc_type – Exception type if an exception occurred.

  • exc_val – Exception value if an exception occurred.

  • exc_tb – Exception traceback if an exception occurred.

property client: BaseClient

Get the boto3 Athena client used for query operations.

Returns:

The configured boto3 Athena client.

close() None

Close the connection.

Closes the database connection. This method is provided for DB API 2.0 compatibility. Since Athena connections are stateless, this method currently does not perform any actual cleanup operations.

Note

This method is called automatically when using the connection as a context manager (with statement).

commit() None

Commit any pending transaction.

This method is provided for DB API 2.0 compatibility. Since Athena does not support transactions, this method does nothing.

Note

Athena queries are auto-committed and cannot be rolled back.

cursor(cursor: type[FunctionalCursor] | None = None, **kwargs) FunctionalCursor | ConnectionCursor

Create a new cursor object for executing queries.

Creates and returns a cursor object that can be used to execute SQL queries against Amazon Athena. The cursor inherits connection settings but can be customized with additional parameters.

Parameters:
  • cursor – Custom cursor class to use. If not provided, uses the connection’s default cursor class.

  • **kwargs – Additional keyword arguments to pass to the cursor constructor. These override connection defaults.

Returns:

A cursor object that can execute SQL queries.

Example

>>> cursor = connection.cursor()
>>> cursor.execute("SELECT * FROM my_table LIMIT 10")
>>> results = cursor.fetchall()

# Using a custom cursor type
>>> from pyathena.pandas.cursor import PandasCursor
>>> pandas_cursor = connection.cursor(PandasCursor)
>>> df = pandas_cursor.execute("SELECT * FROM my_table").fetchall()

property retry_config: RetryConfig

Get the retry configuration for AWS API calls.

Returns:

The RetryConfig object that controls retry behavior for failed requests.

rollback() None

Rollback any pending transaction.

This method is required by DB API 2.0 but is not supported by Athena since Athena does not support transactions.

Raises:

NotSupportedError – Always raised since transactions are not supported.

property session: Session

Get the boto3 session used for AWS API calls.

Returns:

The configured boto3 Session object.

Aio Cursors

class pyathena.aio.cursor.AioCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs)[source]

Native asyncio cursor for Amazon Athena.

Unlike AsyncCursor (which uses ThreadPoolExecutor), this cursor uses asyncio.sleep for polling and asyncio.to_thread for boto3 calls, keeping the event loop free.

Example

>>> async with await AioConnection.create(...) as conn:
...     async with conn.cursor() as cursor:
...         await cursor.execute("SELECT * FROM my_table")
...         rows = await cursor.fetchall()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs) None[source]
property arraysize: int
async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) AioCursor[source]

Execute a SQL query asynchronously.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Query result cache size (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.
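
Because execute() returns the cursor itself, calls can be chained once each coroutine is awaited. The sketch below uses a minimal stand-in cursor (not pyathena's class) purely to illustrate the chaining pattern:

```python
import asyncio

class StubCursor:
    """Minimal stand-in illustrating execute() returning self for chaining."""
    def __init__(self, rows):
        self._rows = rows

    async def execute(self, operation):
        # A real cursor would start the query and poll; here we just return self.
        return self

    async def fetchall(self):
        return list(self._rows)

async def main():
    cursor = StubCursor([(1,), (2,)])
    # Chaining: await execute() to get the cursor back, then await fetchall().
    return await (await cursor.execute("SELECT 1")).fetchall()

print(asyncio.run(main()))  # → [(1,), (2,)]
```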

async fetchone() Any | dict[Any, Any | None] | None[source]

Fetch the next row of a query result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchmany(size: int | None = None) list[Any | dict[Any, Any | None]][source]

Fetch multiple rows from a query result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchall() list[Any | dict[Any, Any | None]][source]

Fetch all remaining rows from a query result set.

Returns:

List of tuples representing all remaining rows in the result set.

Raises:

ProgrammingError – If called before executing a query that returns results.
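
A common way to consume large result sets with these methods is to drain them in batches with fetchmany() rather than loading everything at once with fetchall(). The loop below shows the shape of that pattern against a minimal stand-in cursor (an empty list signals exhaustion, as with the real cursor):

```python
import asyncio

class StubCursor:
    """Stand-in cursor: serves rows in arraysize-sized batches like fetchmany()."""
    arraysize = 2

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    async def fetchmany(self, size=None):
        size = size or self.arraysize
        batch = self._rows[self._pos:self._pos + size]
        self._pos += len(batch)
        return batch

async def drain(cursor):
    batches = []
    while True:
        batch = await cursor.fetchmany()
        if not batch:  # an empty list means the result set is exhausted
            break
        batches.append(batch)
    return batches

print(asyncio.run(drain(StubCursor([(1,), (2,), (3,)]))))  # → [[(1,), (2,)], [(3,)]]
```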

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
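
The semantics here are the DB API 2.0 ones: one execute() call per parameter set, with no result set retained, which is why executemany() is mainly useful for DML. The stand-in below (not pyathena's class; the SQL string is illustrative) records each execution to make that visible:

```python
import asyncio

class RecordingCursor:
    """Stand-in that records each execution, mirroring executemany() semantics."""
    def __init__(self):
        self.calls = []

    async def execute(self, operation, parameters=None):
        self.calls.append((operation, parameters))
        return self

    async def executemany(self, operation, seq_of_parameters):
        # One execute() per parameter set; results are not retained.
        for parameters in seq_of_parameters:
            await self.execute(operation, parameters)

async def main():
    cursor = RecordingCursor()
    await cursor.executemany(
        "INSERT INTO users VALUES (%(id)s, %(name)s)",  # pyformat-style placeholders
        [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    )
    return len(cursor.calls)

print(asyncio.run(main()))  # → 2
```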

property execution_parameters: list[str]
property expected_bucket_owner: str | None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.aio.cursor.AioDictCursor(**kwargs)[source]

Native asyncio cursor that returns rows as dictionaries.

Example

>>> async with await AioConnection.create(...) as conn:
...     cursor = conn.cursor(AioDictCursor)
...     await cursor.execute("SELECT id, name FROM users")
...     row = await cursor.fetchone()
...     print(row["name"])
__init__(**kwargs) None[source]
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) AioCursor

Execute a SQL query asynchronously.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Query result cache size (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async fetchall() list[Any | dict[Any, Any | None]]

Fetch all remaining rows from a query result set.

Returns:

List of dictionaries representing all remaining rows in the result set.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchmany(size: int | None = None) list[Any | dict[Any, Any | None]]

Fetch multiple rows from a query result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of dictionaries representing the fetched rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

async fetchone() Any | dict[Any, Any | None] | None

Fetch the next row of a query result set.

Returns:

A dictionary representing the next row, or None if no more rows.

Raises:

ProgrammingError – If called before executing a query that returns results.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Result Set

class pyathena.aio.result_set.AthenaAioResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None)[source]

Async result set that provides async fetch methods.

Skips the synchronous _pre_fetch by passing _pre_fetch=False to the parent __init__ and provides an async create() classmethod factory instead.

__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) None[source]
async classmethod create(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) AthenaAioResultSet[source]

Async factory method.

Creates an AthenaAioResultSet and awaits the initial data fetch.

Parameters:
  • connection – The database connection.

  • converter – Type converter for result values.

  • query_execution – Query execution metadata.

  • arraysize – Number of rows to fetch per request.

  • retry_config – Retry configuration for API calls.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion.

Returns:

A fully initialized AthenaAioResultSet.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Automatically fetches the next page from Athena when the current page is exhausted and more pages are available.

Returns:

A tuple representing the next row, or None if no more rows.
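
The transparent paging that fetchone() performs can be pictured with the following stand-in (not pyathena's implementation): it refills its buffer from a paged source, the way the real result set requests the next page from Athena when the current one is exhausted:

```python
import asyncio

class PagedResultSet:
    """Stand-in: fetchone() refills from the next page when the buffer empties."""
    def __init__(self, pages):
        self._pages = list(pages)  # e.g. pages of rows from GetQueryResults
        self._buffer = []

    async def _fetch_next_page(self):
        # The real result set would call the Athena API here (in a thread).
        return self._pages.pop(0) if self._pages else []

    async def fetchone(self):
        if not self._buffer:
            self._buffer = await self._fetch_next_page()
        return self._buffer.pop(0) if self._buffer else None

async def main():
    rs = PagedResultSet([[(1,), (2,)], [(3,)]])
    rows = []
    while (row := await rs.fetchone()) is not None:
        rows.append(row)
    return rows

print(asyncio.run(main()))  # → [(1,), (2,), (3,)]
```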

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of row tuples. May contain fewer rows than requested if fewer are available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Returns:

List of all remaining row tuples.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
property arraysize: int
property catalog: str | None
close() None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.aio.result_set.AthenaAioDictResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None)[source]

Async result set that returns rows as dictionaries.

Inherits _get_rows from AthenaDictResultSet and async fetch methods from AthenaAioResultSet via multiple inheritance.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) None
property arraysize: int
property catalog: str | None
close() None
property completion_date_time: datetime | None
property connection: Connection[Any]
async classmethod create(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, result_set_type_hints: dict[str | int, str] | None = None) AthenaAioResultSet

Async factory method.

Creates an AthenaAioResultSet and awaits the initial data fetch.

Parameters:
  • connection – The database connection.

  • converter – Type converter for result values.

  • query_execution – Query execution metadata.

  • arraysize – Number of rows to fetch per request.

  • retry_config – Retry configuration for API calls.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion.

Returns:

A fully initialized AthenaAioResultSet.

property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
dict_type

alias of dict

property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of all remaining row tuples.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. If None, uses arraysize.

Returns:

List of row tuples. May contain fewer rows than requested if fewer are available.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Automatically fetches the next page from Athena when the current page is exhausted and more pages are available.

Returns:

A tuple representing the next row, or None if no more rows.

property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Base Classes

class pyathena.aio.common.AioBaseCursor(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs)[source]

Async base cursor that overrides I/O methods with async equivalents.

Reuses BaseCursor.__init__, all _build_* methods, and constants. Only the methods that perform network I/O or blocking sleep are overridden to use asyncio.to_thread / asyncio.sleep.
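
The override pattern described above can be sketched in isolation: a blocking client call is pushed to a worker thread with asyncio.to_thread, and the wait between polls uses asyncio.sleep, so other tasks keep running. The names below (blocking_get_state, poll) are illustrative, not pyathena's:

```python
import asyncio
import time

def blocking_get_state(calls):
    # Stand-in for a blocking boto3 call such as get_query_execution().
    time.sleep(0.05)
    calls.append(1)
    return "SUCCEEDED" if len(calls) >= 3 else "RUNNING"

async def poll(poll_interval=0.01):
    calls = []
    while True:
        # Run the blocking call in a thread so the event loop stays free.
        state = await asyncio.to_thread(blocking_get_state, calls)
        if state == "SUCCEEDED":
            return len(calls)
        await asyncio.sleep(poll_interval)  # non-blocking wait between polls

print(asyncio.run(poll()))  # → 3
```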

async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase][source]
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata[source]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata][source]
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
__init__(connection: Connection[Any], converter: Converter, formatter: Formatter, retry_config: RetryConfig, s3_staging_dir: str | None, schema_name: str | None, catalog_name: str | None, work_group: str | None, poll_interval: float, encryption_option: str | None, kms_key: str | None, kill_on_interrupt: bool, result_reuse_enable: bool, result_reuse_minutes: int, **kwargs) None
abstractmethod close() None
property connection: Connection[Any]
abstractmethod execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, **kwargs)
abstractmethod executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

class pyathena.aio.common.WithAsyncFetch(**kwargs)[source]

Mixin providing shared fetch, lifecycle, and async protocol for SQL cursors.

Provides properties (arraysize, result_set, query_id, rownumber, rowcount), lifecycle methods (close, executemany, cancel), default sync fetch (for cursors whose result sets load all data eagerly in __init__), and the async iteration protocol.

Subclasses override execute() and optionally __init__ and format-specific helpers.
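The async iteration protocol the mixin provides can be sketched in isolation. `WithAsyncFetchSketch` below is a hypothetical stand-in (the real mixin reads from an AthenaResultSet), showing how `__anext__` delegates a synchronous `fetchone` to a worker thread:

```python
import asyncio

class WithAsyncFetchSketch:
    """Minimal stand-in illustrating the async iteration protocol."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def fetchone(self):
        # Synchronous fetch; returns None when the result set is exhausted.
        return next(self._rows, None)

    def __aiter__(self):
        return self

    async def __anext__(self):
        # Delegate the (potentially blocking) sync fetch to a worker thread.
        row = await asyncio.to_thread(self.fetchone)
        if row is None:
            raise StopAsyncIteration
        return row

async def collect():
    cursor = WithAsyncFetchSketch([(1, "a"), (2, "b")])
    return [row async for row in cursor]

print(asyncio.run(collect()))  # [(1, 'a'), (2, 'b')]
```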

__init__(**kwargs) None[source]
property arraysize: int
property result_set: AthenaResultSet | None
property query_id: str | None
property rownumber: int | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

close() None[source]

Close the cursor and release associated resources.

async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None[source]

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
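The DB API 2.0 contract behind executemany (one execute() per parameter set, intermediate results discarded) can be sketched with a hypothetical recording cursor; `CursorSketch` below is illustrative, not PyAthena's implementation:

```python
import asyncio

class CursorSketch:
    """Records each execute() call to show executemany's fan-out."""

    def __init__(self):
        self.executed = []

    async def execute(self, operation, parameters=None, **kwargs):
        self.executed.append((operation, parameters))
        return self

    async def executemany(self, operation, seq_of_parameters, **kwargs):
        # One execute() per parameter set; only the last result survives,
        # which is why executemany is not useful for SELECT statements.
        for parameters in seq_of_parameters:
            await self.execute(operation, parameters, **kwargs)

async def main():
    c = CursorSketch()
    await c.executemany(
        "INSERT INTO t VALUES (%(id)s)",  # pyformat-style placeholder
        [{"id": 1}, {"id": 2}, {"id": 3}],
    )
    return len(c.executed)

print(asyncio.run(main()))  # 3
```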

async cancel() None[source]

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property catalog: str | None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
abstractmethod execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, **kwargs)
property execution_parameters: list[str]
property expected_bucket_owner: str | None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Pandas Cursor

class pyathena.aio.pandas.cursor.AioPandasCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, engine: str = 'auto', chunksize: int | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, auto_optimize_chunksize: bool = False, **kwargs)[source]

Native asyncio cursor that returns results as pandas DataFrames.

Uses asyncio.to_thread() for both result set creation and fetch operations, keeping the event loop free. This is especially important when chunksize is set, as fetch calls trigger lazy S3 reads.

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioPandasCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     df = cursor.as_pandas()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, engine: str = 'auto', chunksize: int | None = None, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, auto_optimize_chunksize: bool = False, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultPandasTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, keep_default_na: bool = False, na_values: Iterable[str] | None = ('',), quoting: int = 1, **kwargs) AioPandasCursor[source]

Execute a SQL query asynchronously and return results as pandas DataFrames.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • keep_default_na – Whether to keep default pandas NA values.

  • na_values – Additional values to treat as NA.

  • quoting – CSV quoting behavior (pandas csv.QUOTE_* constants).

  • **kwargs – Additional pandas read_csv/read_parquet parameters.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

as_pandas() DataFrame | PandasDataFrameIterator[source]

Return DataFrame or PandasDataFrameIterator based on chunksize setting.

Returns:

DataFrame when chunksize is None, PandasDataFrameIterator when chunksize is set.
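The chunksize contract described above can be sketched with plain pandas; `as_pandas_sketch` is a hypothetical stand-in for the cursor's behavior, not its implementation:

```python
import pandas as pd

def as_pandas_sketch(rows, columns, chunksize=None):
    # Mirrors the cursor's contract: a single DataFrame when chunksize is
    # None, otherwise an iterator of DataFrames of at most chunksize rows.
    if chunksize is None:
        return pd.DataFrame(rows, columns=columns)
    return (
        pd.DataFrame(rows[i : i + chunksize], columns=columns)
        for i in range(0, len(rows), chunksize)
    )

rows = [(1, "a"), (2, "b"), (3, "c")]
df = as_pandas_sketch(rows, ["id", "name"])
print(len(df))  # 3

chunks = list(as_pandas_sketch(rows, ["id", "name"], chunksize=2))
print([len(c) for c in chunks])  # [2, 1]
```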

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Arrow Cursor

class pyathena.aio.arrow.cursor.AioArrowCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs)[source]

Native asyncio cursor that returns results as Apache Arrow Tables.

Uses asyncio.to_thread() for both result set creation and fetch operations, keeping the event loop free.

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioArrowCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     table = cursor.as_arrow()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, connect_timeout: float | None = None, request_timeout: float | None = None, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultArrowTypeConverter | DefaultArrowUnloadTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) AioArrowCursor[source]

Execute a SQL query asynchronously and return results as Arrow Tables.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

as_arrow() Table[source]

Return query results as an Apache Arrow Table.

Returns:

Apache Arrow Table containing all query results.

as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Returns:

Polars DataFrame containing all query results.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Polars Cursor

class pyathena.aio.polars.cursor.AioPolarsCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs)[source]

Native asyncio cursor that returns results as Polars DataFrames.

Uses asyncio.to_thread() for both result set creation and fetch operations, keeping the event loop free. This is especially important when chunksize is set, as fetch calls trigger lazy S3 reads.

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioPolarsCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     df = cursor.as_polars()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, unload: bool = False, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, block_size: int | None = None, cache_type: str | None = None, max_workers: int = 20, chunksize: int | None = None, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultPolarsTypeConverter | DefaultPolarsUnloadTypeConverter | Any[source]

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) AioPolarsCursor[source]

Execute a SQL query asynchronously and return results as Polars DataFrames.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • **kwargs – Additional execution parameters passed to Polars read functions.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread to avoid blocking the event loop when chunksize triggers lazy S3 reads.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

as_polars() pl.DataFrame[source]

Return query results as a Polars DataFrame.

Returns:

Polars DataFrame containing all query results.

as_arrow() Table[source]

Return query results as an Apache Arrow Table.

Returns:

Apache Arrow Table containing all query results.

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio S3FS Cursor

class pyathena.aio.s3fs.cursor.AioS3FSCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, csv_reader: type[DefaultCSVReader] | type[AthenaCSVReader] | None = None, **kwargs)[source]

Native asyncio cursor that reads CSV results via AioS3FileSystem.

Uses AioS3FileSystem for S3 operations, which replaces ThreadPoolExecutor parallelism with asyncio.gather + asyncio.to_thread. Fetch operations are wrapped in asyncio.to_thread() because CSV reading is blocking I/O.
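The asyncio.gather + asyncio.to_thread pattern that replaces ThreadPoolExecutor can be sketched without S3; `read_range` below is a hypothetical stand-in for a blocking ranged GetObject call:

```python
import asyncio
import time

def read_range(key: str, start: int, end: int):
    # Stand-in for a blocking S3 GetObject range read.
    time.sleep(0.01)
    return (key, start, end)

async def read_parallel(key: str, size: int, block: int = 4):
    # asyncio.gather + asyncio.to_thread in place of ThreadPoolExecutor.map:
    # each blocking range read runs in a worker thread, results come back
    # in submission order.
    tasks = [
        asyncio.to_thread(read_range, key, start, min(start + block, size))
        for start in range(0, size, block)
    ]
    return await asyncio.gather(*tasks)

parts = asyncio.run(read_parallel("results.csv", 10))
print(len(parts))  # 3
print(parts[0])    # ('results.csv', 0, 4)
```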

Example

>>> async with await pyathena.aio_connect(...) as conn:
...     cursor = conn.cursor(AioS3FSCursor)
...     await cursor.execute("SELECT * FROM my_table")
...     row = await cursor.fetchone()
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, csv_reader: type[DefaultCSVReader] | type[AthenaCSVReader] | None = None, **kwargs) None[source]
static get_default_converter(unload: bool = False) DefaultS3FSTypeConverter[source]

Get the default type converter for S3FS cursor.

Parameters:

unload – Unused. S3FS cursor does not support UNLOAD operations.

Returns:

DefaultS3FSTypeConverter instance.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, **kwargs) AioS3FSCursor[source]

Execute a SQL query asynchronously via S3FileSystem CSV reader.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters for parameterized queries.

  • work_group – Athena workgroup to use for this query.

  • s3_staging_dir – S3 location for query results.

  • cache_size – Number of queries to check for result caching.

  • cache_expiration_time – Cache expiration time in seconds.

  • result_reuse_enable – Enable Athena result reuse for this query.

  • result_reuse_minutes – Minutes to reuse cached results.

  • paramstyle – Parameter style (‘qmark’ or ‘pyformat’).

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

async fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]

Fetch the next row of the result set.

Wraps the synchronous fetch in asyncio.to_thread because AthenaS3FSResultSet reads rows lazily from S3.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

async fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch multiple rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread because AthenaS3FSResultSet reads rows lazily from S3.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

async fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]

Fetch all remaining rows from the result set.

Wraps the synchronous fetch in asyncio.to_thread because AthenaS3FSResultSet reads rows lazily from S3.

Returns:

A list of all remaining rows, as tuples (or dicts, depending on the cursor's row format).

Raises:

ProgrammingError – If no result set is available.
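For large result sets, the three fetch methods above compose naturally into a batched loop. A hedged sketch, assuming the cursor already holds an executed query; the per-batch processing here is a placeholder:

```python
import asyncio

# Stream an executed query's rows in fixed-size batches rather than
# materializing everything at once with fetchall(). fetchmany() returns
# an empty list once the result set is exhausted.
async def stream_rows(cursor, batch_size=1000):
    total = 0
    while True:
        batch = await cursor.fetchmany(batch_size)
        if not batch:
            break
        total += len(batch)  # replace with real per-batch processing
    return total
```

Since AthenaS3FSResultSet reads rows lazily from S3, this keeps memory use bounded by the batch size rather than the full result set.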

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
async cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
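A short sketch of the pattern (the table name and pyformat placeholders are illustrative; each parameter set runs as its own query, so this is not a bulk insert):

```python
import asyncio

# Each dict in the sequence triggers one execute() call, i.e. one Athena
# query per parameter set. Names and values are placeholders.
async def insert_events(cursor):
    rows = [
        {"id": 1, "name": "start"},
        {"id": 2, "name": "stop"},
    ]
    await cursor.executemany(
        "INSERT INTO events (id, name) VALUES (%(id)s, %(name)s)",
        rows,
    )
    return len(rows)
```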

property execution_parameters: list[str]
property expected_bucket_owner: str | None
async get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
async list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
async list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1, per the DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.
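To illustrate the DML case (the statement and table names are placeholders):

```python
import asyncio

# After a DML or CTAS statement, rowcount reports the affected rows;
# after a SELECT it stays -1 per DB API 2.0.
async def count_inserted(cursor):
    await cursor.execute("INSERT INTO target_table SELECT * FROM source_table")
    return cursor.rowcount
```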

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default.

setoutputsize(size, column=None)

Does nothing by default.

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Aio Spark Cursor

class pyathena.aio.spark.cursor.AioSparkCursor(session_id: str | None = None, description: str | None = None, engine_configuration: dict[str, Any] | None = None, notebook_version: str | None = None, session_idle_timeout_minutes: int | None = None, **kwargs)[source]

Native asyncio cursor for executing PySpark code on Athena.

Overrides post-init I/O methods of SparkBaseCursor with async equivalents. Session management (_exists_session, _start_session, etc.) stays synchronous because __init__ runs inside asyncio.to_thread.

Since SparkBaseCursor.__init__ performs I/O (session management), cursor creation must be wrapped in asyncio.to_thread:

cursor = await asyncio.to_thread(conn.cursor)

Example

>>> import asyncio
>>> async with await pyathena.aio_connect(
...     work_group="spark-workgroup",
...     cursor_class=AioSparkCursor,
... ) as conn:
...     cursor = await asyncio.to_thread(conn.cursor)
...     await cursor.execute("spark.sql('SELECT 1').show()")
...     print(await cursor.get_std_out())
__init__(session_id: str | None = None, description: str | None = None, engine_configuration: dict[str, Any] | None = None, notebook_version: str | None = None, session_idle_timeout_minutes: int | None = None, **kwargs) None[source]
property calculation_execution: AthenaCalculationExecution | None
async get_std_out() str | None[source]

Get the standard output from the Spark calculation execution.

Returns:

The standard output as a string, or None if no output is available.

async get_std_error() str | None[source]

Get the standard error from the Spark calculation execution.

Returns:

The standard error as a string, or None if no error output is available.

async execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, session_id: str | None = None, description: str | None = None, client_request_token: str | None = None, work_group: str | None = None, **kwargs) AioSparkCursor[source]

Execute PySpark code asynchronously.

Parameters:
  • operation – PySpark code to execute.

  • parameters – Unused, kept for API compatibility.

  • session_id – Spark session ID override.

  • description – Calculation description.

  • client_request_token – Idempotency token.

  • work_group – Unused, kept for API compatibility.

  • **kwargs – Additional parameters.

Returns:

Self reference for method chaining.
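A hedged sketch of submitting a calculation with an idempotency token (the token value and PySpark snippet are illustrative; re-submitting with the same client_request_token does not start a duplicate calculation):

```python
import asyncio

# conn is assumed to come from aio_connect(..., cursor_class=AioSparkCursor).
# Cursor creation is wrapped in asyncio.to_thread because __init__ performs
# session-management I/O.
async def run_spark(conn):
    cursor = await asyncio.to_thread(conn.cursor)
    await cursor.execute(
        "spark.sql('SELECT 1').show()",
        client_request_token="example-token-0001",
    )
    return await cursor.get_std_out()
```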

async cancel() None[source]

Cancel the currently running calculation.

Raises:

ProgrammingError – If no calculation is running.

async close() None[source]

Close the cursor by terminating the Spark session.

async executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None[source]
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property calculation_id: str | None
property completion_date_time: datetime | None
property connection: Connection[Any]
property description: str | None
property dpu_execution_in_millis: int | None
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

static get_default_engine_configuration() dict[str, Any]
get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property progress: str | None
property result_s3_uri: str | None
property result_type: str | None
property session_id: str
setinputsizes(sizes)

Does nothing by default.

setoutputsize(size, column=None)

Does nothing by default.

property state: str | None
property state_change_reason: str | None
property std_error_s3_uri: str | None
property std_out_s3_uri: str | None
property submission_date_time: datetime | None
property working_directory: str | None