Connection and Cursors

This section covers the core connection functionality and basic cursor operations.

Connection

class pyathena.DBAPITypeObject[source]

Type Objects and Constructors

https://www.python.org/dev/peps/pep-0249/#type-objects-and-constructors

pyathena.connect(*args, cursor_class: None = ..., **kwargs) Connection[Cursor][source]
pyathena.connect(*args, cursor_class: type[ConnectionCursor], **kwargs) Connection[ConnectionCursor]

Create a new database connection to Amazon Athena.

This function provides the main entry point for establishing connections to Amazon Athena. It follows the DB API 2.0 specification and returns a Connection object that can be used to create cursors for executing SQL queries.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if the workgroup doesn’t have a result location. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback (required for workgroups with managed query result storage).

  • region_name – AWS region name. If not specified, uses the default region from your AWS configuration.

  • schema_name – Athena database/schema name. Defaults to “default”.

  • catalog_name – Athena data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can be used instead of s3_staging_dir if the workgroup has a result location configured.

  • poll_interval – Time in seconds between polling for query completion. Defaults to 1.0.

  • encryption_option – S3 encryption option for query results. Can be “SSE_S3”, “SSE_KMS”, or “CSE_KMS”.

  • kms_key – KMS key ID for encryption when using SSE_KMS or CSE_KMS.

  • profile_name – AWS profile name to use for authentication.

  • role_arn – ARN of IAM role to assume for authentication.

  • role_session_name – Session name when assuming a role.

  • cursor_class – Custom cursor class to use. If not specified, uses the default Cursor class.

  • kill_on_interrupt – Whether to cancel running queries when interrupted. Defaults to True.

  • **kwargs – Additional keyword arguments passed to the Connection constructor.

Returns:

A Connection object that can be used to create cursors and execute queries.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Example

>>> import pyathena
>>> conn = pyathena.connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
...     schema_name='mydatabase'
... )
>>> cursor = conn.cursor()
>>> cursor.execute("SELECT * FROM mytable LIMIT 10")
>>> results = cursor.fetchall()
async pyathena.aio_connect(*args, **kwargs) AioConnection[source]

Create a new async database connection to Amazon Athena.

This is the async counterpart of connect(). It returns an AioConnection whose cursors use native asyncio for polling and API calls, keeping the event loop free.

Parameters:

**kwargs – Arguments forwarded to AioConnection.create(). See connect() for the full list of supported arguments.

Returns:

An AioConnection that produces AioCursor instances by default.

Example

>>> import pyathena
>>> conn = await pyathena.aio_connect(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
... )
>>> async with conn.cursor() as cursor:
...     await cursor.execute("SELECT 1")
...     print(await cursor.fetchone())
class pyathena.connection.Connection(s3_staging_dir: str | None = ..., region_name: str | None = ..., schema_name: str | None = ..., catalog_name: str | None = ..., work_group: str | None = ..., poll_interval: float = ..., encryption_option: str | None = ..., kms_key: str | None = ..., profile_name: str | None = ..., role_arn: str | None = ..., role_session_name: str = ..., external_id: str | None = ..., serial_number: str | None = ..., duration_seconds: int = ..., converter: Converter | None = ..., formatter: Formatter | None = ..., retry_config: RetryConfig | None = ..., cursor_class: None = ..., cursor_kwargs: dict[str, Any] | None = ..., kill_on_interrupt: bool = ..., session: Session | None = ..., config: Config | None = ..., result_reuse_enable: bool = ..., result_reuse_minutes: int = ..., on_start_query_execution: Callable[[str], None] | None = ..., **kwargs)[source]
class pyathena.connection.Connection(s3_staging_dir: str | None = ..., region_name: str | None = ..., schema_name: str | None = ..., catalog_name: str | None = ..., work_group: str | None = ..., poll_interval: float = ..., encryption_option: str | None = ..., kms_key: str | None = ..., profile_name: str | None = ..., role_arn: str | None = ..., role_session_name: str = ..., external_id: str | None = ..., serial_number: str | None = ..., duration_seconds: int = ..., converter: Converter | None = ..., formatter: Formatter | None = ..., retry_config: RetryConfig | None = ..., cursor_class: type[ConnectionCursor] = ..., cursor_kwargs: dict[str, Any] | None = ..., kill_on_interrupt: bool = ..., session: Session | None = ..., config: Config | None = ..., result_reuse_enable: bool = ..., result_reuse_minutes: int = ..., on_start_query_execution: Callable[[str], None] | None = ..., **kwargs)

A DB API 2.0 compliant connection to Amazon Athena.

The Connection class represents a database session and provides methods to create cursors for executing SQL queries against Amazon Athena. It handles authentication, session management, and query result storage in S3.

This class follows the Python Database API Specification v2.0 (PEP 249) and provides a familiar interface for database operations.

s3_staging_dir

S3 location where query results are stored.

region_name

AWS region name.

schema_name

Default database/schema name for queries.

catalog_name

Data catalog name (typically “awsdatacatalog”).

work_group

Athena workgroup name.

poll_interval

Interval in seconds for polling query status.

encryption_option

S3 encryption option for query results.

kms_key

KMS key for encryption when applicable.

kill_on_interrupt

Whether to cancel queries on interrupt signals.

result_reuse_enable

Whether to enable Athena’s result reuse feature.

result_reuse_minutes

Minutes to reuse cached results.

Example

>>> conn = Connection(
...     s3_staging_dir='s3://my-bucket/staging/',
...     region_name='us-east-1',
...     schema_name='mydatabase'
... )
>>> with conn:
...     cursor = conn.cursor()
...     cursor.execute("SELECT COUNT(*) FROM mytable")
...     result = cursor.fetchone()

Note

Either s3_staging_dir or work_group must be specified. If using a workgroup, it must have a result location configured unless s3_staging_dir is also provided. For workgroups with managed query result storage, pass s3_staging_dir="" to skip the environment variable fallback.

__init__(s3_staging_dir: str | None = None, region_name: str | None = None, schema_name: str | None = 'default', catalog_name: str | None = 'awsdatacatalog', work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, profile_name: str | None = None, role_arn: str | None = None, role_session_name: str = 'PyAthena-session-1774913569', external_id: str | None = None, serial_number: str | None = None, duration_seconds: int = 3600, converter: Converter | None = None, formatter: Formatter | None = None, retry_config: RetryConfig | None = None, cursor_class: None = None, cursor_kwargs: dict[str, Any] | None = None, kill_on_interrupt: bool = True, session: Session | None = None, config: Config | None = None, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None[source]
__init__(s3_staging_dir: str | None = None, region_name: str | None = None, schema_name: str | None = 'default', catalog_name: str | None = 'awsdatacatalog', work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, profile_name: str | None = None, role_arn: str | None = None, role_session_name: str = 'PyAthena-session-1774913569', external_id: str | None = None, serial_number: str | None = None, duration_seconds: int = 3600, converter: Converter | None = None, formatter: Formatter | None = None, retry_config: RetryConfig | None = None, cursor_class: type[ConnectionCursor] = None, cursor_kwargs: dict[str, Any] | None = None, kill_on_interrupt: bool = True, session: Session | None = None, config: Config | None = None, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None

Initialize a new Athena database connection.

Parameters:
  • s3_staging_dir – S3 location to store query results. Required if not using workgroups or if the workgroup doesn’t have a result location configured. Pass an empty string to explicitly disable S3 staging and skip the AWS_ATHENA_S3_STAGING_DIR environment variable fallback. This is required when connecting to a workgroup with managed query result storage enabled.

  • region_name – AWS region name. Uses default region if not specified.

  • schema_name – Default database/schema name. Defaults to “default”.

  • catalog_name – Data catalog name. Defaults to “awsdatacatalog”.

  • work_group – Athena workgroup name. Can substitute for s3_staging_dir if the workgroup has a result location configured.

  • poll_interval – Seconds between query status polls. Defaults to 1.0.

  • encryption_option – S3 encryption for results (“SSE_S3”, “SSE_KMS”, “CSE_KMS”).

  • kms_key – KMS key ID when using SSE_KMS or CSE_KMS encryption.

  • profile_name – AWS profile name for authentication.

  • role_arn – IAM role ARN to assume for authentication.

  • role_session_name – Session name when assuming IAM role.

  • external_id – External ID for role assumption (if required by role).

  • serial_number – MFA device serial number for role assumption.

  • duration_seconds – Role session duration in seconds. Defaults to 3600.

  • converter – Custom type converter. Uses DefaultTypeConverter if None.

  • formatter – Custom parameter formatter. Uses DefaultParameterFormatter if None.

  • retry_config – Retry configuration for API calls. Uses default if None.

  • cursor_class – Default cursor class for this connection.

  • cursor_kwargs – Default keyword arguments for cursor creation.

  • kill_on_interrupt – Cancel running queries on interrupt. Defaults to True.

  • session – Pre-configured boto3 Session. Creates new session if None.

  • config – Boto3 Config object for client configuration.

  • result_reuse_enable – Enable Athena query result reuse. Defaults to False.

  • result_reuse_minutes – Minutes to reuse cached results.

  • on_start_query_execution – Callback function called when query starts.

  • **kwargs – Additional arguments passed to boto3 Session and client.

Raises:

ProgrammingError – If neither s3_staging_dir nor work_group is provided.

Note

Either s3_staging_dir or work_group must be specified. Environment variables AWS_ATHENA_S3_STAGING_DIR and AWS_ATHENA_WORK_GROUP are checked if parameters are not provided.

When using a workgroup with managed query result storage, pass s3_staging_dir="" to prevent the environment variable fallback from sending a ResultConfiguration that conflicts with ManagedQueryResultsConfiguration.
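
The staging-directory fallback described above can be modeled with a short sketch. This is a simplified illustration of the documented behavior, not PyAthena's actual resolution code; only the AWS_ATHENA_S3_STAGING_DIR environment variable name is taken from the note above.

```python
import os

def resolve_staging_dir(s3_staging_dir):
    """Simplified model of the documented fallback:
    - an explicit value (including "") is used as-is,
    - None falls back to the AWS_ATHENA_S3_STAGING_DIR environment variable.
    """
    if s3_staging_dir is not None:
        # "" explicitly disables S3 staging and skips the fallback,
        # as required for workgroups with managed query result storage.
        return s3_staging_dir or None
    return os.environ.get("AWS_ATHENA_S3_STAGING_DIR")

os.environ["AWS_ATHENA_S3_STAGING_DIR"] = "s3://fallback-bucket/staging/"
print(resolve_staging_dir(None))                       # falls back to the env var
print(resolve_staging_dir(""))                         # managed storage: staging disabled
print(resolve_staging_dir("s3://my-bucket/staging/"))  # explicit value wins
```

The key point is that an empty string and None behave differently: None triggers the environment variable fallback, while "" suppresses it entirely.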

property session: Session

Get the boto3 session used for AWS API calls.

Returns:

The configured boto3 Session object.

property client: BaseClient

Get the boto3 Athena client used for query operations.

Returns:

The configured boto3 Athena client.

property retry_config: RetryConfig

Get the retry configuration for AWS API calls.

Returns:

The RetryConfig object that controls retry behavior for failed requests.

__enter__()[source]

Enter the runtime context for the connection.

Returns:

Self for use in context manager protocol.

__exit__(exc_type, exc_val, exc_tb)[source]

Exit the runtime context and close the connection.

Parameters:
  • exc_type – Exception type if an exception occurred.

  • exc_val – Exception value if an exception occurred.

  • exc_tb – Exception traceback if an exception occurred.

cursor(cursor: None = None, **kwargs) ConnectionCursor[source]
cursor(cursor: type[FunctionalCursor], **kwargs) FunctionalCursor

Create a new cursor object for executing queries.

Creates and returns a cursor object that can be used to execute SQL queries against Amazon Athena. The cursor inherits connection settings but can be customized with additional parameters.

Parameters:
  • cursor – Custom cursor class to use. If not provided, uses the connection’s default cursor class.

  • **kwargs – Additional keyword arguments to pass to the cursor constructor. These override connection defaults.

Returns:

A cursor object that can execute SQL queries.

Example

>>> cursor = connection.cursor()
>>> cursor.execute("SELECT * FROM my_table LIMIT 10")
>>> results = cursor.fetchall()

>>> # Using a custom cursor type
>>> from pyathena.pandas.cursor import PandasCursor
>>> pandas_cursor = connection.cursor(PandasCursor)
>>> df = pandas_cursor.execute("SELECT * FROM my_table").fetchall()

close() None[source]

Close the connection.

Closes the database connection. This method is provided for DB API 2.0 compatibility. Since Athena connections are stateless, this method currently does not perform any actual cleanup operations.

Note

This method is called automatically when using the connection as a context manager (with statement).

commit() None[source]

Commit any pending transaction.

This method is provided for DB API 2.0 compatibility. Since Athena does not support transactions, this method does nothing.

Note

Athena queries are auto-committed and cannot be rolled back.

rollback() None[source]

Rollback any pending transaction.

This method is required by DB API 2.0 but is not supported by Athena since Athena does not support transactions.

Raises:

NotSupportedError – Always raised since transactions are not supported.

Standard Cursors

class pyathena.cursor.Cursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs)[source]

A DB API 2.0 compliant cursor for executing SQL queries on Amazon Athena.

The Cursor class provides methods for executing SQL queries against Amazon Athena and retrieving results. It follows the Python Database API Specification v2.0 (PEP 249) and provides familiar database cursor operations.

This cursor returns results as tuples by default. For other data formats, consider using specialized cursor classes like PandasCursor or ArrowCursor.

description

Sequence of column descriptions for the last query.

rowcount

Number of rows affected by the last query (-1 for SELECT queries).

arraysize

Default number of rows to fetch with fetchmany().

Example

>>> cursor = connection.cursor()
>>> cursor.execute("SELECT name, age FROM users WHERE age > %s", (18,))
>>> while True:
...     row = cursor.fetchone()
...     if not row:
...         break
...     print(f"Name: {row[0]}, Age: {row[1]}")
>>> cursor.execute("CREATE TABLE test AS SELECT 1 as id, 'test' as name")
>>> print(f"Created table, rows affected: {cursor.rowcount}")
__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, on_start_query_execution: Callable[[str], None] | None = None, **kwargs) None[source]
property arraysize: int
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) Cursor[source]

Execute a SQL query.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • on_start_query_execution – Callback function called immediately after the start_query_execution API is called. Function signature: (query_id: str) -> None. This allows early access to the query_id for monitoring or cancellation.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types. For example: {"tags": "array(varchar)", "metadata": "map(varchar, integer)"}

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

Example

>>> cursor.execute(
...     "SELECT * FROM table_with_complex_types",
...     result_set_type_hints={
...         "tags": "array(varchar)",
...         "metadata": "map(varchar, integer)",
...     }
... )
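
The on_start_query_execution parameter described above accepts any callable with the documented (query_id: str) -> None signature. The sketch below shows the shape of such a callback; the real execute() call requires AWS credentials, so it is shown commented out and the callback is invoked directly with a hypothetical query ID.

```python
# Callback matching the documented signature: (query_id: str) -> None.
# Capturing the query_id early allows monitoring or cancelling the query
# while it is still running.
captured_ids = []

def on_start(query_id: str) -> None:
    captured_ids.append(query_id)

# With a live connection (requires AWS credentials), you would pass it as:
# cursor.execute("SELECT 1", on_start_query_execution=on_start)

# Simulate the cursor invoking the callback with a hypothetical query ID:
on_start("01234567-89ab-cdef-0123-456789abcdef")
print(captured_ids[0])
```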
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().
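
As a sketch of the calling pattern, executemany() runs the statement once per parameter set. Because executing against Athena requires AWS credentials, the example below uses a hypothetical recording stub in place of a real cursor; the placeholder style assumes the pyformat paramstyle (%(name)s).

```python
class RecordingCursor:
    """Hypothetical stand-in for a PyAthena cursor; records each execution."""
    def __init__(self):
        self.calls = []

    def execute(self, operation, parameters=None, **kwargs):
        self.calls.append((operation, parameters))

    def executemany(self, operation, seq_of_parameters, **kwargs):
        # One execute() per parameter set, mirroring the documented behavior.
        for parameters in seq_of_parameters:
            self.execute(operation, parameters, **kwargs)

cursor = RecordingCursor()
cursor.executemany(
    "INSERT INTO users (id, name) VALUES (%(id)s, %(name)s)",
    [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}],
)
print(len(cursor.calls))  # one recorded execution per parameter set
```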

property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.
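
A common pattern with fetchmany() is to consume the result set in batches until an empty list is returned. The sketch below demonstrates the loop with a hypothetical in-memory stub; a real PyAthena cursor exposes the same fetchmany()/arraysize interface.

```python
class StubCursor:
    """Hypothetical in-memory cursor used only to demonstrate the loop."""
    arraysize = 2

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchmany(self, size=None):
        size = size if size is not None else self.arraysize
        batch, self._rows = self._rows[:size], self._rows[size:]
        return batch

cursor = StubCursor([(1,), (2,), (3,), (4,), (5,)])
batches = []
while True:
    rows = cursor.fetchmany()   # fetches up to cursor.arraysize rows
    if not rows:
        break                   # an empty list means the result set is exhausted
    batches.append(rows)
print(batches)
```

Batching this way keeps memory bounded for large result sets, in contrast to fetchall().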

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default. Provided for DB API 2.0 compatibility.

setoutputsize(size, column=None)

Does nothing by default. Provided for DB API 2.0 compatibility.

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.cursor.DictCursor(**kwargs)[source]

A cursor that returns query results as dictionaries instead of tuples.

DictCursor provides the same functionality as the standard Cursor but returns rows as dictionaries where column names are keys. This makes it easier to access column values by name rather than position.

Example

>>> cursor = connection.cursor(DictCursor)
>>> cursor.execute("SELECT id, name, email FROM users LIMIT 1")
>>> row = cursor.fetchone()
>>> print(f"User: {row['name']} ({row['email']})")
>>> cursor.execute("SELECT * FROM products")
>>> for row in cursor.fetchall():
...     print(f"Product {row['id']}: {row['name']} - ${row['price']}")
__init__(**kwargs) None[source]
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
cancel() None

Cancel the currently executing query.

Raises:

ProgrammingError – If no query is currently executing.

property catalog: str | None
close() None

Close the cursor and release associated resources.

property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int = 0, cache_expiration_time: int = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, on_start_query_execution: Callable[[str], None] | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) Cursor

Execute a SQL query.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • on_start_query_execution – Callback function called immediately after the start_query_execution API is called. Function signature: (query_id: str) -> None. This allows early access to the query_id for monitoring or cancellation.

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types. For example: {"tags": "array(varchar)", "metadata": "map(varchar, integer)"}

  • **kwargs – Additional execution parameters.

Returns:

Self reference for method chaining.

Example

>>> cursor.execute(
...     "SELECT * FROM table_with_complex_types",
...     result_set_type_hints={
...         "tags": "array(varchar)",
...         "metadata": "map(varchar, integer)",
...     }
... )
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute a SQL query multiple times with different parameters.

Parameters:
  • operation – SQL query string to execute.

  • seq_of_parameters – Sequence of parameter sets, one per execution.

  • **kwargs – Additional keyword arguments passed to each execute().

property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch all remaining rows from the result set.

Returns:

List of tuples representing all remaining rows.

Raises:

ProgrammingError – If no result set is available.

fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]

Fetch multiple rows from the result set.

Parameters:

size – Maximum number of rows to fetch. Defaults to arraysize.

Returns:

List of tuples representing the fetched rows.

Raises:

ProgrammingError – If no result set is available.

fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None

Fetch the next row of the result set.

Returns:

A tuple representing the next row, or None if no more rows.

Raises:

ProgrammingError – If no result set is available.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
property has_result_set: bool
property kms_key: str | None
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property result_set: AthenaResultSet | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.

property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
setinputsizes(sizes)

Does nothing by default. Provided for DB API 2.0 compatibility.

setoutputsize(size, column=None)

Does nothing by default. Provided for DB API 2.0 compatibility.

property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None

Asynchronous Cursors

class pyathena.async_cursor.AsyncCursor(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs)[source]

Asynchronous cursor for non-blocking Athena query execution.

This cursor allows multiple queries to be executed concurrently without blocking the main thread. It’s useful for applications that need to execute multiple queries in parallel or perform other work while queries are running.

The cursor maintains a thread pool for executing queries asynchronously and provides methods to check query status and retrieve results when ready.

description[source]

Sequence of column descriptions for the last query.

rowcount

Number of rows affected by the last query (-1 for SELECT queries).

arraysize

Default number of rows to fetch with fetchmany().

max_workers

Maximum number of worker threads for concurrent execution.

Example

>>> cursor = connection.cursor(AsyncCursor)
>>>
>>> # Execute multiple queries concurrently
>>> future1 = cursor.execute("SELECT COUNT(*) FROM table1")
>>> future2 = cursor.execute("SELECT COUNT(*) FROM table2")
>>> future3 = cursor.execute("SELECT COUNT(*) FROM table3")
>>>
>>> # Check if queries are done and get results
>>> if future1.done():
...     result1 = future1.result().fetchall()
>>>
>>> # Wait for all to complete
>>> results = [f.result().fetchall() for f in [future1, future2, future3]]

Note

Each execute() call returns a Future object that can be used to check completion status and retrieve results.

__init__(s3_staging_dir: str | None = None, schema_name: str | None = None, catalog_name: str | None = None, work_group: str | None = None, poll_interval: float = 1, encryption_option: str | None = None, kms_key: str | None = None, kill_on_interrupt: bool = True, max_workers: int = 20, arraysize: int = 1000, result_reuse_enable: bool = False, result_reuse_minutes: int = 60, **kwargs) None[source]
property arraysize: int
close(wait: bool = False) None[source]
description(query_id: str) Future[list[tuple[str, str, None, None, int, int, str]] | None][source]
query_execution(query_id: str) Future[AthenaQueryExecution][source]

Get query execution details asynchronously.

Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.

Parameters:

query_id – The Athena query execution ID.

Returns:

Future object containing AthenaQueryExecution with query details.

poll(query_id: str) Future[AthenaQueryExecution][source]

Poll for query completion asynchronously.

Waits for the query to complete (succeed, fail, or be cancelled) and returns the final execution status. This method blocks until completion but runs the polling in a background thread.

Parameters:

query_id – The Athena query execution ID to poll.

Returns:

Future object containing the final AthenaQueryExecution status.

Note

This method performs polling internally, so it will take time proportional to your query execution duration.
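
Since poll() returns a concurrent.futures.Future, the waiting pattern is the same as for any Future. The sketch below uses a thread-pool task as a hypothetical stand-in for the background polling, not the real AsyncCursor API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_poll(query_id):
    """Hypothetical stand-in for AsyncCursor.poll(): waits, then returns
    a final status, here modeled as a plain dict."""
    time.sleep(0.05)          # pretend the query takes a moment to finish
    return {"query_id": query_id, "state": "SUCCEEDED"}

with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(fake_poll, "q-123")
    # ... do other work while the query is still running ...
    final = future.result()   # blocks only when the result is needed
print(final["state"])
```

The same done()/result() idiom applies to the Futures returned by execute(), query_execution(), and cancel().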

execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) tuple[str, Future[AthenaResultSet | Any]][source]

Execute a SQL query asynchronously.

Starts query execution on Amazon Athena and returns immediately without waiting for completion. The query runs in the background while your application can continue with other work.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Number of recent query executions to scan when looking for a reusable cached result (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse for identical queries (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Tuple of (query_id, future) where:

  • query_id: Athena query execution ID for tracking

  • future: Future object for result retrieval

Example

>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>> print(f"Query started: {query_id}")
>>> # Do other work while query runs...
>>> result_set = future.result()  # Wait for completion
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None[source]

Execute multiple queries asynchronously (not supported).

This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.

Parameters:
  • operation – SQL query string.

  • seq_of_parameters – Sequence of parameter sets.

  • **kwargs – Additional arguments.

Raises:

NotSupportedError – Always raised as this operation is not supported.

Note

For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.
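The batch pattern suggested above can be sketched with plain concurrent.futures. In this sketch, fake_execute and the parameter sets are hypothetical stand-ins, not PyAthena APIs; with a real AsyncCursor you would call execute() per parameter set and collect the returned futures the same way:

```python
from concurrent.futures import ThreadPoolExecutor


def fake_execute(operation: str, parameters: dict) -> str:
    """Hypothetical stand-in for cursor.execute(); returns a fake query id."""
    return f"query-{parameters['id']}"


param_sets = [{"id": 1}, {"id": 2}, {"id": 3}]
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit one query per parameter set; each returns independently.
    futures = [
        executor.submit(fake_execute, "SELECT * FROM t WHERE id = %(id)d", p)
        for p in param_sets
    ]
    # Gather results in submission order.
    query_ids = [f.result() for f in futures]

print(query_ids)  # ['query-1', 'query-2', 'query-3']
```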

cancel(query_id: str) Future[None][source]

Cancel a running query asynchronously.

Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.

Parameters:

query_id – The Athena query execution ID to cancel.

Returns:

Future object that completes when the cancellation request finishes.

Example

>>> query_id, future = cursor.execute("SELECT * FROM huge_table")
>>> # Later, cancel the query
>>> cancel_future = cursor.cancel(query_id)
>>> cancel_future.result()  # Wait for cancellation to complete
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property connection: Connection[Any]
static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

class pyathena.async_cursor.AsyncDictCursor(**kwargs)[source]

Asynchronous cursor that returns query results as dictionaries.

Combines the asynchronous execution capabilities of AsyncCursor with the dictionary-based result format of DictCursor. Results are returned as dictionaries where column names are keys, making it easier to access column values by name rather than position.

Example

>>> cursor = connection.cursor(AsyncDictCursor)
>>> future = cursor.execute("SELECT id, name, email FROM users")
>>> result_cursor = future.result()
>>> row = result_cursor.fetchone()
>>> print(f"User: {row['name']} ({row['email']})")
__init__(**kwargs) None[source]
LIST_DATABASES_MAX_RESULTS = 50
LIST_QUERY_EXECUTIONS_MAX_RESULTS = 50
LIST_TABLE_METADATA_MAX_RESULTS = 50
property arraysize: int
cancel(query_id: str) Future[None]

Cancel a running query asynchronously.

Submits a cancellation request for the specified query. The cancellation itself runs asynchronously in the background.

Parameters:

query_id – The Athena query execution ID to cancel.

Returns:

Future object that completes when the cancellation request finishes.

Example

>>> query_id, future = cursor.execute("SELECT * FROM huge_table")
>>> # Later, cancel the query
>>> cancel_future = cursor.cancel(query_id)
>>> cancel_future.result()  # Wait for cancellation to complete
close(wait: bool = False) None
property connection: Connection[Any]
description(query_id: str) Future[list[tuple[str, str, None, None, int, int, str]] | None]
execute(operation: str, parameters: dict[str, Any] | list[str] | None = None, work_group: str | None = None, s3_staging_dir: str | None = None, cache_size: int | None = 0, cache_expiration_time: int | None = 0, result_reuse_enable: bool | None = None, result_reuse_minutes: int | None = None, paramstyle: str | None = None, result_set_type_hints: dict[str | int, str] | None = None, **kwargs) tuple[str, Future[AthenaResultSet | Any]]

Execute a SQL query asynchronously.

Starts query execution on Amazon Athena and returns immediately without waiting for completion. The query runs in the background while your application can continue with other work.

Parameters:
  • operation – SQL query string to execute.

  • parameters – Query parameters (optional).

  • work_group – Athena workgroup to use (optional).

  • s3_staging_dir – S3 location for query results (optional).

  • cache_size – Number of recent query executions to scan when looking for a reusable cached result (optional).

  • cache_expiration_time – Cache expiration time in seconds (optional).

  • result_reuse_enable – Enable result reuse for identical queries (optional).

  • result_reuse_minutes – Result reuse duration in minutes (optional).

  • paramstyle – Parameter style to use (optional).

  • result_set_type_hints – Optional dictionary mapping column names to Athena DDL type signatures for precise type conversion within complex types.

  • **kwargs – Additional execution parameters.

Returns:

Tuple of (query_id, future) where:

  • query_id: Athena query execution ID for tracking

  • future: Future object for result retrieval

Example

>>> query_id, future = cursor.execute("SELECT * FROM large_table")
>>> print(f"Query started: {query_id}")
>>> # Do other work while query runs...
>>> result_set = future.result()  # Wait for completion
executemany(operation: str, seq_of_parameters: list[dict[str, Any] | list[str] | None], **kwargs) None

Execute multiple queries asynchronously (not supported).

This method is not supported for asynchronous cursors because managing multiple concurrent queries would be complex and resource-intensive.

Parameters:
  • operation – SQL query string.

  • seq_of_parameters – Sequence of parameter sets.

  • **kwargs – Additional arguments.

Raises:

NotSupportedError – Always raised as this operation is not supported.

Note

For bulk operations, consider using execute() with parameterized queries or batch processing patterns instead.

static get_default_converter(unload: bool = False) DefaultTypeConverter | Any

Get the default type converter for this cursor class.

Parameters:

unload – Whether the converter is for UNLOAD operations. Some cursor types may return different converters for UNLOAD operations.

Returns:

The default type converter instance for this cursor type.

get_table_metadata(table_name: str, catalog_name: str | None = None, schema_name: str | None = None, logging_: bool = True) AthenaTableMetadata
list_databases(catalog_name: str | None, max_results: int | None = None) list[AthenaDatabase]
list_table_metadata(catalog_name: str | None = None, schema_name: str | None = None, expression: str | None = None, max_results: int | None = None) list[AthenaTableMetadata]
poll(query_id: str) Future[AthenaQueryExecution]

Poll for query completion asynchronously.

Waits for the query to reach a terminal state (succeeded, failed, or cancelled) and returns the final execution status. The polling itself runs in a background thread; calling result() on the returned Future blocks until the query completes.

Parameters:

query_id – The Athena query execution ID to poll.

Returns:

Future object containing the final AthenaQueryExecution status.

Note

This method performs polling internally, so resolving the returned Future takes time proportional to your query's execution duration.

query_execution(query_id: str) Future[AthenaQueryExecution]

Get query execution details asynchronously.

Retrieves the current execution status and metadata for a query. This is useful for monitoring query progress without blocking.

Parameters:

query_id – The Athena query execution ID.

Returns:

Future object containing AthenaQueryExecution with query details.

setinputsizes(sizes)

Does nothing by default

setoutputsize(size, column=None)

Does nothing by default

Result Sets

class pyathena.result_set.AthenaResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None)[source]

Result set for Athena query execution using the GetQueryResults API.

This class provides a DB API 2.0 compliant result set implementation that fetches query results from Amazon Athena. It uses the GetQueryResults API to retrieve data in paginated chunks, converting each value according to its Athena data type.

The result set exposes query execution metadata (timing, data scanned, state, etc.) through read-only properties, allowing inspection of query performance and status.

This is the base result set implementation used by the standard Cursor. Specialized implementations exist for other output formats.

Example

>>> cursor.execute("SELECT * FROM my_table")
>>> result_set = cursor.result_set
>>> print(f"Query ID: {result_set.query_id}")
>>> print(f"Data scanned: {result_set.data_scanned_in_bytes} bytes")
>>> for row in result_set:
...     print(row)
__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None) None[source]
property database: str | None
property catalog: str | None
property query_id: str | None
property query: str | None
property statement_type: str | None
property substatement_type: str | None
property work_group: str | None
property execution_parameters: list[str]
property state: str | None
property state_change_reason: str | None
property submission_date_time: datetime | None
property completion_date_time: datetime | None
property error_category: int | None
property error_type: int | None
property retryable: bool | None
property error_message: str | None
property data_scanned_in_bytes: int | None
property engine_execution_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property total_execution_time_in_millis: int | None
property query_planning_time_in_millis: int | None
property service_processing_time_in_millis: int | None
property output_location: str | None
property data_manifest_location: str | None
property reused_previous_result: bool | None
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property encryption_option: str | None
property kms_key: str | None
property expected_bucket_owner: str | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property effective_engine_version: str | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property connection: Connection[Any]
fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None[source]
fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]][source]
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]][source]
property is_closed: bool
close() None[source]
DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
property arraysize: int
property rowcount: int
property rownumber: int | None
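The fetchone()/fetchmany()/fetchall() semantics above follow DB API 2.0. The cursor-level convention can be illustrated with the stdlib sqlite3 module, used here only because it implements the same interface, not because it is related to Athena:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(5)])

cur.execute("SELECT x FROM t ORDER BY x")
first = cur.fetchmany(2)  # first two rows: [(0,), (1,)]
rest = cur.fetchall()     # remaining rows:  [(2,), (3,), (4,)]
print(first, rest)
```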
class pyathena.result_set.AthenaDictResultSet(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None)[source]

Result set that returns each row as a dictionary keyed by column name instead of a positional tuple.

dict_type

alias of dict

DEFAULT_FETCH_SIZE: int = 1000
DEFAULT_RESULT_REUSE_MINUTES = 60
__init__(connection: Connection[Any], converter: Converter, query_execution: AthenaQueryExecution, arraysize: int, retry_config: RetryConfig, _pre_fetch: bool = True, result_set_type_hints: dict[str | int, str] | None = None) None
property arraysize: int
property catalog: str | None
close() None
property completion_date_time: datetime | None
property connection: Connection[Any]
property data_manifest_location: str | None
property data_scanned_in_bytes: int | None
property database: str | None
property description: list[tuple[str, str, None, None, int, int, str]] | None
property effective_engine_version: str | None
property encryption_option: str | None
property engine_execution_time_in_millis: int | None
property error_category: int | None
property error_message: str | None
property error_type: int | None
property execution_parameters: list[str]
property expected_bucket_owner: str | None
fetchall() list[tuple[Any | None, ...] | dict[Any, Any | None]]
fetchmany(size: int | None = None) list[tuple[Any | None, ...] | dict[Any, Any | None]]
fetchone() tuple[Any | None, ...] | dict[Any, Any | None] | None
property is_closed: bool
property is_unload: bool

Check if the query is an UNLOAD statement.

Returns:

True if the query is an UNLOAD statement, False otherwise.

property kms_key: str | None
property output_location: str | None
property query: str | None
property query_id: str | None
property query_planning_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property retryable: bool | None
property reused_previous_result: bool | None
property rowcount: int
property rownumber: int | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property service_processing_time_in_millis: int | None
property state: str | None
property state_change_reason: str | None
property statement_type: str | None
property submission_date_time: datetime | None
property substatement_type: str | None
property total_execution_time_in_millis: int | None
property work_group: str | None
class pyathena.result_set.WithResultSet[source]
__init__()[source]
abstract property result_set: AthenaResultSet | None
property has_result_set: bool
property description: list[tuple[str, str, None, None, int, int, str]] | None
property database: str | None
property catalog: str | None
abstract property query_id: str | None
property query: str | None
property statement_type: str | None
property substatement_type: str | None
property work_group: str | None
property execution_parameters: list[str]
property state: str | None
property state_change_reason: str | None
property submission_date_time: datetime | None
property completion_date_time: datetime | None
property error_category: int | None
property error_type: int | None
property retryable: bool | None
property error_message: str | None
property data_scanned_in_bytes: int | None
property engine_execution_time_in_millis: int | None
property query_queue_time_in_millis: int | None
property total_execution_time_in_millis: int | None
property query_planning_time_in_millis: int | None
property service_processing_time_in_millis: int | None
property output_location: str | None
property data_manifest_location: str | None
property reused_previous_result: bool | None
property encryption_option: str | None
property kms_key: str | None
property expected_bucket_owner: str | None
property s3_acl_option: str | None
property selected_engine_version: str | None
property effective_engine_version: str | None
property result_reuse_enabled: bool | None
property result_reuse_minutes: int | None
property rowcount: int

Get the number of rows affected by the last operation.

For SELECT statements, this returns -1 as per DB API 2.0 specification. For DML operations (INSERT, UPDATE, DELETE) and CTAS, this returns the number of affected rows.

Returns:

The number of rows, or -1 if not applicable or unknown.
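The -1-for-SELECT convention comes from DB API 2.0 and can be observed with the stdlib sqlite3 module, which follows the same rule (shown here only as an interface illustration, not as PyAthena behavior):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")

cur.execute("INSERT INTO t VALUES (1)")
print(cur.rowcount)  # 1 -- DML reports the number of affected rows

cur.execute("SELECT x FROM t")
print(cur.rowcount)  # -1 -- rowcount for SELECT is undefined per DB API 2.0
```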