Connection and Cursors

This section covers the core connection functionality and basic cursor operations.

Connection

class pyathena.DBAPITypeObject[source]

Type Objects and Constructors

https://www.python.org/dev/peps/pep-0249/#type-objects-and-constructors

pyathena.connect(*args, cursor_class: None = ..., **kwargs) Connection[Cursor][source]
pyathena.connect(*args, cursor_class: type[ConnectionCursor], **kwargs) Connection[ConnectionCursor]

Create a new database connection to Amazon Athena.

This function provides the main entry point for establishing connections to Amazon Athena. It follows the DB API 2.0 specification and returns a Connection object that can be used to create cursors for executing SQL queries.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if the workgroup doesn’t have a result location. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback (required for workgroups with managed query result storage).

  • region_name – AWS region name. If not specified, uses the default region from your AWS configuration.

  • schema_name – Athena database/schema name. Defaults to “default”.

  • catalog_name – Athena data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can be used instead of s3_staging_dir if the workgroup has a result location configured.

  • poll_interval – Time in seconds between polling for query completion. Defaults to 1.0.

  • encryption_option – S3 encryption option for query results. Can be “SSE_S3”, “SSE_KMS”, or “CSE_KMS”.

  • kms_key – KMS key ID for encryption when using SSE_KMS or CSE_KMS.

  • profile_name – AWS profile name to use for authentication.

  • role_arn – ARN of IAM role to assume for authentication.

  • role_session_name – Session name when assuming a role.

  • cursor_class – Custom cursor class to use. If not specified, uses the default Cursor class.

  • kill_on_interrupt – Whether to cancel running queries when interrupted. Defaults to True.

  • **kwargs – Additional keyword arguments passed to the Connection constructor.

Returns:

A Connection object that can be used to create cursors and execute queries.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Example

>>> import pyathena
>>> conn = pyathena.connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
...     schema_name='mydatabase'
... )
>>> cursor = conn.cursor()
>>> cursor.execute("SELECT * FROM mytable LIMIT 10")
>>> results = cursor.fetchall()
async pyathena.aio_connect(*args, **kwargs) AioConnection[source]

Create a new async database connection to Amazon Athena.

This is the async counterpart of connect(). It returns an AioConnection whose cursors use native asyncio for polling and API calls, keeping the event loop free.

Parameters:

**kwargs – Arguments forwarded to AioConnection.create(). See connect() for the full list of supported arguments.

Returns:

An AioConnection that produces AioCursor instances by default.

Example

>>> import pyathena
>>> conn = await pyathena.aio_connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
... )
>>> async with conn.cursor() as cursor:
...     await cursor.execute("SELECT 1")
...     print(await cursor.fetchone())
class pyathena.connection.Connection(s3_staging_dir: str | None = ..., region_name: str | None = ..., schema_name: str | None = ..., catalog_name: str | None = ..., work_group: str | None = ..., poll_interval: float = ..., encryption_option: str | None = ..., kms_key: str | None = ..., profile_name: str | None = ..., role_arn: str | None = ..., role_session_name: str = ..., external_id: str | None = ..., serial_number: str | None = ..., duration_seconds: int = ..., converter: Converter | None = ..., formatter: Formatter | None = ..., retry_config: RetryConfig | None = ..., cursor_class: None = ..., cursor_kwargs: dict[str, Any] | None = ..., kill_on_interrupt: bool = ..., session: Session | None = ..., config: Config | None = ..., result_reuse_enable: bool = ..., result_reuse_minutes: int = ..., on_start_query_execution: Callable[[str], None] | None = ..., **kwargs)[source]
class pyathena.connection.Connection(s3_staging_dir: str | None = ..., region_name: str | None = ..., schema_name: str | None = ..., catalog_name: str | None = ..., work_group: str | None = ..., poll_interval: float = ..., encryption_option: str | None = ..., kms_key: str | None = ..., profile_name: str | None = ..., role_arn: str | None = ..., role_session_name: str = ..., external_id: str | None = ..., serial_number: str | None = ..., duration_seconds: int = ..., converter: Converter | None = ..., formatter: Formatter | None = ..., retry_config: RetryConfig | None = ..., cursor_class: type[ConnectionCursor] = ..., cursor_kwargs: dict[str, Any] | None = ..., kill_on_interrupt: bool = ..., session: Session | None = ..., config: Config | None = ..., result_reuse_enable: bool = ..., result_reuse_minutes: int = ..., on_start_query_execution: Callable[[str], None] | None = ..., **kwargs)

A DB API 2.0 compliant connection to Amazon Athena.

The Connection class represents a database session and provides methods to create cursors for executing SQL queries against Amazon Athena. It handles authentication, session management, and query result storage in S3.

This class follows the Python Database API Specification v2.0 (PEP 249) and provides a familiar interface for database operations.

s3_staging_dir

S3 location where query results are stored.

region_name

AWS region name.

schema_name

Default database/schema name for queries.

catalog_name

Data catalog name (typically “awsdatacatalog”).

work_group

Athena workgroup name.

poll_interval

Interval in seconds for polling query status.

encryption_option

S3 encryption option for query results.

kms_key

KMS key for encryption when applicable.

kill_on_interrupt

Whether to cancel queries on interrupt signals.

result_reuse_enable

Whether to enable Athena’s result reuse feature.

result_reuse_minutes

Minutes to reuse cached results.

Example

>>> conn = Connection(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
...     schema_name='mydatabase'
... )
>>> with conn:
...     cursor = conn.cursor()
...     cursor.execute("SELECT COUNT(*) FROM mytable")
...     result = cursor.fetchone()

Note

Either s3_staging_dir or work_group must be specified. If using a workgroup, it must have a result location configured unless s3_staging_dir is also provided. For workgroups with managed query result storage, pass s3_staging_dir="" to skip the environment variable fallback.

__init__(s3_staging_dir: str | None = None, region_name: str | None = None, schema_name: str | None = 'default', catalog_name: str | None = 'awsdatacatalog', work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, profile_name: str | None = None, role_arn: str | None = None, role_session_name: str = 'PyAthena-session-1774913569', external_id: str | None = None, serial_number: str | None = None, duration_seconds: int = 3600, converter: Converter | None = None, formatter: Formatter | None = None, retry_config: RetryConfig | None = None, cursor_class: None = None, cursor_kwargs: dict[str, Any] | None = None, kill_on_interrupt: bool = True, session: Session | None = None, config: Config | None = None, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None[source]
__init__(s3_staging_dir: str | None = None, region_name: str | None = None, schema_name: str | None = 'default', catalog_name: str | None = 'awsdatacatalog', work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, profile_name: str | None = None, role_arn: str | None = None, role_session_name: str = 'PyAthena-session-1774913569', external_id: str | None = None, serial_number: str | None = None, duration_seconds: int = 3600, converter: Converter | None = None, formatter: Formatter | None = None, retry_config: RetryConfig | None = None, cursor_class: type[ConnectionCursor] = None, cursor_kwargs: dict[str, Any] | None = None, kill_on_interrupt: bool = True, session: Session | None = None, config: Config | None = None, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None

Initialize a new Athena database connection.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if the workgroup doesn’t have a result location configured. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback. This is required when connecting to a workgroup with managed query result storage enabled.

  • region_name – AWS region name. Uses default region if not specified.

  • schema_name – Default database/schema name. Defaults to “default”.

  • catalog_name – Data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can substitute for s3_staging_dir if the workgroup has a result location configured.

  • poll_interval – Seconds between query status polls. Defaults to 1.0.

  • encryption_option – S3 encryption for results (“SSE_S3”, “SSE_KMS”, “CSE_KMS”).

  • kms_key – KMS key ID when using SSE_KMS or CSE_KMS encryption.

  • profile_name – AWS profile name for authentication.

  • role_arn – IAM role ARN to assume for authentication.

  • role_session_name – Session name when assuming IAM role.

  • external_id – External ID for role assumption (if required by role).

  • serial_number – MFA device serial number for role assumption.

  • duration_seconds – Role session duration in seconds. Defaults to 3600.

  • converter – Custom type converter. Uses DefaultTypeConverter if None.

  • formatter – Custom parameter formatter. Uses DefaultParameterFormatter if None.

  • retry_config – Retry configuration for API calls. Uses default if None.

  • cursor_class – Default cursor class for this connection.

  • cursor_kwargs – Default keyword arguments for cursor creation.

  • kill_on_interrupt – Cancel running queries on interrupt. Defaults to True.

  • session – Pre-configured boto3 Session. Creates new session if None.

  • config – Boto3 Config object for client configuration.

  • result_reuse_enable – Enable Athena query result reuse. Defaults to False.

  • result_reuse_minutes – Minutes to reuse cached results.

  • on_start_query_execution – Callback function called when query starts.

  • **kwargs – Additional arguments passed to boto3 Session and client.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Note

Either s3_staging_dir or work_group must be specified. Environment variables AWS_ATHENA_S3_STAGING_DIR and AWS_ATHENA_WORK_GROUP are checked if parameters are not provided.

When using a workgroup with managed query result storage, pass s3_staging_dir="" to prevent the environment variable fallback from sending a ResultConfiguration that conflicts with ManagedQueryResultsConfiguration.
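
The staging-directory fallback described above can be modeled with a short sketch. This is a simplified illustration of the documented behavior, not PyAthena's actual resolution code; only the AWS_ATHENA_S3_STAGING_DIR environment variable name is taken from the note above.

```python
import os

def resolve_staging_dir(s3_staging_dir):
    """Simplified model of the documented fallback:
    - an explicit value (including "") is used as-is,
    - None falls back to the AWS_ATHENA_S3_STAGING_DIR environment variable.
    """
    if s3_staging_dir is not None:
        # "" explicitly disables S3 staging and skips the fallback,
        # as required for workgroups with managed query result storage.
        return s3_staging_dir or None
    return os.environ.get("AWS_ATHENA_S3_STAGING_DIR")

os.environ["AWS_ATHENA_S3_STAGING_DIR"] = "s3://fallback-bucket/staging/"
print(resolve_staging_dir(None))                       # falls back to the env var
print(resolve_staging_dir(""))                         # managed storage: staging disabled
print(resolve_staging_dir("s3://my-bucket/staging/"))  # explicit value wins
```

The key point is that an empty string and None behave differently: None triggers the environment variable fallback, while "" suppresses it entirely.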

property session: Session

Get the boto3 session used for AWS API calls.

Returns:

The configured boto3 Session object.

property client: BaseClient

Get the boto3 Athena client used for query operations.

Returns:

The configured boto3 Athena client.

property retry_config: RetryConfig

Get the retry configuration for AWS API calls.

Returns:

The RetryConfig object that controls retry behavior for failed requests.

__enter__()[source]

Enter the runtime context for the connection.

Returns:

Self for use in context manager protocol.

__exit__(exc_type, exc_val, exc_tb)[source]

Exit the runtime context and close the connection.

Parameters:
  • exc_type – Exception type if an exception occurred.

  • exc_val – Exception value if an exception occurred.

  • exc_tb – Exception traceback if an exception occurred.

cursor(cursor: None = None, **kwargs) ConnectionCursor[source]
cursor(cursor: type[FunctionalCursor], **kwargs) FunctionalCursor

Create a new cursor object for executing queries.

Creates and returns a cursor object that can be used to execute SQL queries against Amazon Athena. The cursor inherits connection settings but can be customized with additional parameters.

Parameters:
  • cursor – Custom cursor class to use. If not provided, uses the connection’s default cursor class.

  • **kwargs – Additional keyword arguments to pass to the cursor constructor. These override connection defaults.

Returns:

A cursor object that can execute SQL queries.

Example

>>> cursor = connection.cursor()
>>> cursor.execute("SELECT * FROM my_table LIMIT 10")
>>> results = cursor.fetchall()

>>> # Using a custom cursor type
>>> from pyathena.pandas.cursor import PandasCursor
>>> pandas_cursor = connection.cursor(PandasCursor)
>>> df = pandas_cursor.execute("SELECT * FROM my_table").fetchall()

close() None[source]

Close the connection.

Closes the database connection. This method is provided for DB API 2.0 compatibility. Since Athena connections are stateless, this method currently does not perform any actual cleanup operations.

Note

This method is called automatically when using the connection as a context manager (with statement).

commit() None[source]

Commit any pending transaction.

This method is provided for DB API 2.0 compatibility. Since Athena does not support transactions, this method does nothing.

Note

Athena queries are auto-committed and cannot be rolled back.

rollback() None[source]

Rollback any pending transaction.

This method is required by DB API 2.0 but is not supported by Athena since Athena does not support transactions.

Raises:

NotSupportedError – Always raised since transactions are not supported.

Standard Cursors

class pyathena.cursor.Cursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs)[source]

A DB API 2.0 compliant cursor for executing SQL queries on Amazon Athena.

The Cursor class provides methods for executing SQL queries against Amazon Athena and retrieving results. It follows the Python Database API Specification v2.0 (PEP 249) and provides familiar database cursor operations.

This cursor returns results as tuples by default. For other data formats, consider using specialized cursor classes like PandasCursor or ArrowCursor.

description

Sequence of column descriptions for the last query.

rowcount

Number of rows affected by the last query (-1 for SELECT queries).

arraysize

Default number of rows to fetch with fetchmany().

Example

>>> cursor = connection.cursor()
>>> cursor.execute("SELECT name, age FROM users WHERE age > %s", (18,))
>>> while True:
...     row = cursor.fetchone()
...     if not row:
...         break
...     print(f"Name: {row[0]}, Age: {row[1]}")
>>> cursor.execute("CREATE TABLE test AS SELECT 1 as id, 'test' as name")
>>> print(f"Created table, rows affected: {cursor.rowcount}")
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None[source]
property arraysize: int
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) Cursor[source]

Execute a SQL query.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • on_start_query_execution – Callback function called immediately after the start_query_execution API is called. Function signature: (query_id: str) -> None. This allows early access to the query_id for monitoring or cancellation.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types. For example: {"tags": "array(varchar)", "metadata": "map(varchar, integer)"}

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

Example

>>> cursor.execute(
...     "SELECT * FROM table_with_complex_types",
...     result_set_type_hints={
...         "tags": "array(varchar)",
...         "metadata": "map(varchar, integer)",
...     }
... )
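
The on_start_query_execution parameter described above accepts any callable with the documented (query_id: str) -> None signature. The sketch below shows the shape of such a callback; the real execute() call requires AWS credentials, so it is shown commented out and the callback is invoked directly with a hypothetical query ID.

```python
# Callback matching the documented signature: (query_id: str) -> None.
# Capturing the query_id early allows monitoring or cancelling the query
# while it is still running.
captured_ids = []

def on_start(query_id: str) -> None:
    captured_ids.append(query_id)

# With a live connection (requires AWS credentials), you would pass it as:
# cursor.execute("SELECT 1", on_start_query_execution=on_start)

# Simulate the cursor invoking the callback with a hypothetical query ID:
on_start("01234567-89ab-cdef-0123-456789abcdef")
print(captured_ids[0])
```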
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
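
As a sketch of the calling pattern, executemany() runs the statement once per parameter set. Because executing against Athena requires AWS credentials, the example below uses a hypothetical recording stub in place of a real cursor; the placeholder style assumes the pyformat paramstyle (%(name)s).

```python
class RecordingCursor:
    """Hypothetical stand-in for a PyAthena cursor; records each execution."""
    def __init__(self):
        self.calls = []

    def execute(self, operation, parameters=None, **kwargs):
        self.calls.append((operation, parameters))

    def executemany(self, operation, seq_of_parameters, **kwargs):
        # One execute() per parameter set, mirroring the documented behavior.
        for parameters in seq_of_parameters:
            self.execute(operation, parameters, **kwargs)

cursor = RecordingCursor()
cursor.executemany(
    "INSERT INTO users (id, name) VALUES (%(id)s, %(name)s)",
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
)
print(len(cursor.calls))  # one recorded execution per parameter set
```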

property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.
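
A common pattern with fetchmany() is to consume the result set in batches until an empty list is returned. The sketch below demonstrates the loop with a hypothetical in-memory stub; a real PyAthena cursor exposes the same fetchmany()/arraysize interface.

```python
class StubCursor:
    """Hypothetical in-memory cursor used only to demonstrate the loop."""
    arraysize = 2

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, size=None):
        size = size if size is not None else self.arraysize
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

cursor = StubCursor([(1,), (2,), (3,), (4,), (5,)])
batches = []
while True:
    rows = cursor.fetchmany()   # fetches up to cursor.arraysize rows
    if not rows:
        break                   # an empty list means the result set is exhausted
    batches.append(rows)
print(batches)
```

Batching this way keeps memory bounded for large result sets, in contrast to fetchall().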

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default. Provided for DB API 2.0 compatibility.

setoutputsize(size, column=None)

Does nothing by default. Provided for DB API 2.0 compatibility.

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.cursor.DictCursor(**kwargs)[source]

A cursor that returns query results as dictionaries instead of tuples.

DictCursor provides the same functionality as the standard Cursor but returns rows as dictionaries where column names are keys. This makes it easier to access column values by name rather than position.

Example

>>> cursor = connection.cursor(DictCursor)
>>> cursor.execute("SELECT id, name, email FROM users LIMIT 1")
>>> row = cursor.fetchone()
>>> print(f"User: {row['name']} ({row['email']})")
>>> cursor.execute("SELECT * FROM products")
>>> for row in cursor.fetchall():
...     print(f"Product {row['id']}: {row['name']} - ${row['price']}")
__init__(**kwargs) None[source]
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) Cursor

Execute a SQL query.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • on_start_query_execution – Callback function called immediately after the start_query_execution API is called. Function signature: (query_id: str) -> None. This allows early access to the query_id for monitoring or cancellation.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types. For example: {"tags": "array(varchar)", "metadata": "map(varchar, integer)"}

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

Example

>>> cursor.execute(
...     "SELECT * FROM table_with_complex_types",
...     result_set_type_hints={
...         "tags": "array(varchar)",
...         "metadata": "map(varchar, integer)",
...     }
... )
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default. Provided for DB API 2.0 compatibility.

setoutputsize(size, column=None)

Does nothing by default. Provided for DB API 2.0 compatibility.

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Asynchronous Cursors

class pyathena.async_cursor.AsyncCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs)[source]

Asynchronous cursor for non-blocking Athena query execution.

This cursor allows multiple queries to be executed concurrently without blocking the main thread. It’s useful for applications that need to execute multiple queries in parallel or perform other work while queries are running.

The cursor maintains a thread pool for executing queries asynchronously and provides methods to check query status and retrieve results when ready.

description[source]

Sequence of column descriptions for the last query.

rowcount

Number of rows affected by the last query (-1 for SELECT queries).

arraysize

Default number of rows to fetch with fetchmany().

max_workers

Maximum number of worker threads for concurrent execution.

Example

>>> cursor = connection.cursor(AsyncCursor)
>>>
>>> # Execute multiple queries concurrently
>>> future1 = cursor.execute("SELECT COUNT(*) FROM table1")
>>> future2 = cursor.execute("SELECT COUNT(*) FROM table2")
>>> future3 = cursor.execute("SELECT COUNT(*) FROM table3")
>>>
>>> # Check if queries are done and get results
>>> if future1.done():
...     result1 = future1.result().fetchall()
>>>
>>> # Wait for all to complete
>>> results = [f.result().fetchall() for f in [future1, future2, future3]]

Note

Each execute() call returns a Future object that can be used to check completion status and retrieve results.

__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs) None[source]
property arraysize: int
close(wait: bool = False) None[source]
description(query_id: str) Future[list[tuple[str, str, None, None, int, int, str]] | None][source]
query_execution(query_id: str) Future[AthenaQueryExecution][source]

Get query execution details asynchronously.

Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.

Parameters:

query_id – The Athena query execution ID.

Returns:

Future object containing AthenaQueryExecution with query details.

poll(query_id: str) Future[AthenaQueryExecution][source]

Poll for query completion asynchronously.

Waits for the query to complete (succeed, fail, or be cancelled) and returns the final execution status. This method blocks until completion but runs the polling in a background thread.

Parameters:

query_id – The Athena query execution ID to poll.

Returns:

Future object containing the final AthenaQueryExecution status.

Note

This method performs polling internally, so it will take time proportional to your query execution duration.
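
Since poll() returns a concurrent.futures.Future, the waiting pattern is the same as for any Future. The sketch below uses a thread-pool task as a hypothetical stand-in for the background polling, not the real AsyncCursor API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_poll(query_id):
    """Hypothetical stand-in for AsyncCursor.poll(): waits, then returns
    a final status, here modeled as a plain dict."""
    time.sleep(0.05)          # pretend the query takes a moment to finish
    return {"query_id": query_id, "state": "SUCCEEDED"}

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(fake_poll, "q-123")
    # ... do other work while the query is still running ...
    final = future.result()   # blocks only when the result is needed
print(final["state"])
```

The same done()/result() idiom applies to the Futures returned by execute(), query_execution(), and cancel().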

execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) tuple[str, Future[AthenaResultSet | Any]][source]

Execute a SQL query asynchronously.

Starts query execution on Amazon Athena and returns immediately without waiting for completion. The query runs in the background while your application can continue with other work.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Number of recent query executions to scan when looking for a reusable cached result (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse for identical queries (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Tuple of (query_id, future) where:

  • query_id: Athena query execution ID for tracking

  • future: Future object for result retrieval

Example

>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>> print(f"Query started: {query_id}")
>>> # Do other work while query runs...
>>> result_set = future.result()  # Wait for completion
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None[source]

Execute multiple queries asynchronously (not supported).

This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.

Parameters:
  • operation – SQL query string.

  • seq_of_parameters – Sequence of parameter sets.

  • **kwargs – Additional arguments.

Raises:

NotSupportedError – Always raised as this operation is not supported.

Note

For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.
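The batch pattern suggested above can be sketched with plain concurrent.futures. In this sketch, fake_execute and the parameter sets are hypothetical stand-ins, not PyAthena APIs; with a real AsyncCursor you would call execute() per parameter set and collect the returned futures the same way:

```python
from concurrent.futures import ThreadPoolExecutor


def fake_execute(operation: str, parameters: dict) -> str:
    """Hypothetical stand-in for cursor.execute(); returns a fake query id."""
    return f"query-{parameters['id']}"


param_sets = [{"id": 1}, {"id": 2}, {"id": 3}]
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit one query per parameter set; each returns independently.
    futures = [
        executor.submit(fake_execute, "SELECT * FROM t WHERE id = %(id)d", p)
        for p in param_sets
    ]
    # Gather results in submission order.
    query_ids = [f.result() for f in futures]

print(query_ids)  # ['query-1', 'query-2', 'query-3']
```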

cancel(query_id: str) Future[None][source]

Cancel a running query asynchronously.

Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.

Parameters:

query_id – The Athena query execution ID to cancel.

Returns:

Future object that completes when the cancellation request finishes.

Example

>>> query_id, future = cursor.execute("SELECT * FROM huge_table")
>>> # Later, cancel the query
>>> cancel_future = cursor.cancel(query_id)
>>> cancel_future.result()  # Wait for cancellation to complete
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property connection: Connection[Any]
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

class pyathena.async_cursor.AsyncDictCursor(**kwargs)[source]

Asynchronous cursor that returns query results as dictionaries.

Combines the asynchronous execution capabilities of AsyncCursor with the dictionary-based result format of DictCursor. Results are returned as dictionaries where column names are keys, making it easier to access column values by name rather than position.

Example

>>> cursor = connection.cursor(AsyncDictCursor)
>>> future = cursor.execute("SELECT id, name, email FROM users")
>>> result_cursor = future.result()
>>> row = result_cursor.fetchone()
>>> print(f"User: {row['name']} ({row['email']})")
__init__(**kwargs) None[source]
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
cancel(query_id: str) Future[None]

Cancel a running query asynchronously.

Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.

Parameters:

query_id – The Athena query execution ID to cancel.

Returns:

Future object that completes when the cancellation request finishes.

Example

>>> query_id, future = cursor.execute("SELECT * FROM huge_table")
>>> # Later, cancel the query
>>> cancel_future = cursor.cancel(query_id)
>>> cancel_future.result()  # Wait for cancellation to complete
close(wait: bool = False) None
property connection: Connection[Any]
description(query_id: str) Future[list[tuple[str, str, None, None, int, int, str]] | None]
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) tuple[str, Future[AthenaResultSet | Any]]

Execute a SQL query asynchronously.

Starts query execution on Amazon Athena and returns immediately without waiting for completion. The query runs in the background while your application can continue with other work.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Number of recent query executions to scan when looking for a reusable cached result (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse for identical queries (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Tuple of (query_id, future) where:

  • query_id: Athena query execution ID for tracking

  • future: Future object for result retrieval

Example

>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>> print(f"Query started: {query_id}")
>>> # Do other work while query runs...
>>> result_set = future.result()  # Wait for completion
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute multiple queries asynchronously (not supported).

This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.

Parameters:
  • operation – SQL query string.

  • seq_of_parameters – Sequence of parameter sets.

  • **kwargs – Additional arguments.

Raises:

NotSupportedError – Always raised as this operation is not supported.

Note

For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
poll(query_id: str) Future[AthenaQueryExecution]

Poll for query completion asynchronously.

Waits for the query to reach a terminal state (succeeded, failed, or cancelled) and returns the final execution status. The polling itself runs in a background thread; calling result() on the returned Future blocks until the query completes.

Parameters:

query_id – The Athena query execution ID to poll.

Returns:

Future object containing the final AthenaQueryExecution status.

Note

This method performs polling internally, so resolving the returned Future takes time proportional to your query's execution duration.

query_execution(query_id: str) Future[AthenaQueryExecution]

Get query execution details asynchronously.

Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.

Parameters:

query_id – The Athena query execution ID.

Returns:

Future object containing AthenaQueryExecution with query details.

setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

Result Sets

class pyathena.result_set.AthenaResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None)[source]

Result set for Athena query execution using the GetQueryResults API.

This class provides a DB API 2.0 compliant result set implementation that fetches query results from Amazon Athena. It uses the GetQueryResults API to retrieve data in paginated chunks, converting each value according to its Athena data type.

The result set exposes query execution metadata (timing, data scanned, state, etc.) through read-only properties, allowing inspection of query performance and status.

This is the base result set implementation used by the standard Cursor. Specialized implementations exist for other output formats.

Example

>>> cursor.execute("SELECT * FROM my_table")
>>> result_set = cursor.result_set
>>> print(f"Query ID: {result_set.query_id}")
>>> print(f"Data scanned: {result_set.data_scanned_in_bytes} bytes")
>>> for row in result_set:
...     print(row)
__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None) None[source]
property database: str | None
property catalog: str | None
property query_id: str | None
property query: str | None
property statement_type: str | None
property substatement_type: str | None
property work_group: str | None
property execution_parameters: list[str]
property state: str | None
property state_change_reason: str | None
property submission_date_time: datetime | None
property completion_date_time: datetime | None
property error_category: int | None
property error_type: int | None
property retryable: bool | None
property error_message: str | None
property data_scanned_in_bytes: int | None
property engine_execution_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property total_execution_time_in_millis: int | None
property query_planning_time_in_millis: int | None
property service_processing_time_in_millis: int | None
property output_location: str | None
property data_manifest_location: str | None
property reused_previous_result: bool | None
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property encryption_option: str | None
property kms_key: str | None
property expected_bucket_owner: str | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property effective_engine_version: str | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property connection: Connection[Any]
fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]
fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]
property is_closed: bool
close() None[source]
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
property arraysize: int
property rowcount: int
property rownumber: int | None
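The fetchone()/fetchmany()/fetchall() semantics above follow DB API 2.0. The cursor-level convention can be illustrated with the stdlib sqlite3 module, used here only because it implements the same interface, not because it is related to Athena:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

cur.execute("SELECT x FROM t ORDER BY x")
first = cur.fetchmany(2)  # first two rows: [(0,), (1,)]
rest = cur.fetchall()     # remaining rows:  [(2,), (3,), (4,)]
print(first, rest)
```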
class pyathena.result_set.AthenaDictResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None)[source]

Result set that returns each row as a dictionary keyed by column name instead of a positional tuple.

dict_type

alias of dict

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None) None
property arraysize: int
property catalog: str | None
close() None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]
fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]
fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None
property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.result_set.WithResultSet[source]
__init__()[source]
abstract property result_set: AthenaResultSet | None
property has_result_set: bool
property description: list[tuple[str, str, None, None, int, int, str]] | None
property database: str | None
property catalog: str | None
abstract property query_id: str | None
property query: str | None
property statement_type: str | None
property substatement_type: str | None
property work_group: str | None
property execution_parameters: list[str]
property state: str | None
property state_change_reason: str | None
property submission_date_time: datetime | None
property completion_date_time: datetime | None
property error_category: int | None
property error_type: int | None
property retryable: bool | None
property error_message: str | None
property data_scanned_in_bytes: int | None
property engine_execution_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property total_execution_time_in_millis: int | None
property query_planning_time_in_millis: int | None
property service_processing_time_in_millis: int | None
property output_location: str | None
property data_manifest_location: str | None
property reused_previous_result: bool | None
property encryption_option: str | None
property kms_key: str | None
property expected_bucket_owner: str | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property effective_engine_version: str | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.
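The -1-for-SELECT convention comes from DB API 2.0 and can be observed with the stdlib sqlite3 module, which follows the same rule (shown here only as an interface illustration, not as PyAthena behavior):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")

cur.execute("INSERT INTO t VALUES (1)")
print(cur.rowcount)  # 1 -- DML reports the number of affected rows

cur.execute("SELECT x FROM t")
print(cur.rowcount)  # -1 -- rowcount for SELECT is undefined per DB API 2.0
```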