S3FS

S3FSCursor

S3FSCursor is a lightweight cursor that reads the CSV file of query results directly from S3. Unlike ArrowCursor or PandasCursor, this cursor does not require pandas or pyarrow, making it ideal for environments where installing these libraries is undesirable.

Key features:

  • No pandas or pyarrow dependencies required

  • Lightweight CSV parsing (custom parser or Python’s built-in csv module)

  • Lower memory footprint for simple query results

  • Full DB API 2.0 compatibility

You can use the S3FSCursor by specifying the cursor_class with the connect method or connection object.

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=S3FSCursor).cursor()

from pyathena.connection import Connection
from pyathena.s3fs.cursor import S3FSCursor

cursor = Connection(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                    region_name="us-west-2",
                    cursor_class=S3FSCursor).cursor()

It can also be used by specifying the cursor class when calling the connection object’s cursor method.

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2").cursor(S3FSCursor)

from pyathena.connection import Connection
from pyathena.s3fs.cursor import S3FSCursor

cursor = Connection(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                    region_name="us-west-2").cursor(S3FSCursor)

Fetching and iterating over query results are supported.

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=S3FSCursor).cursor()

cursor.execute("SELECT * FROM many_rows")
print(cursor.fetchone())
print(cursor.fetchmany())
print(cursor.fetchall())

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=S3FSCursor).cursor()

cursor.execute("SELECT * FROM many_rows")
for row in cursor:
    print(row)

Execution information of the query can also be retrieved.

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=S3FSCursor).cursor()

cursor.execute("SELECT * FROM many_rows")
print(cursor.state)
print(cursor.state_change_reason)
print(cursor.completion_date_time)
print(cursor.submission_date_time)
print(cursor.data_scanned_in_bytes)
print(cursor.engine_execution_time_in_millis)
print(cursor.query_queue_time_in_millis)
print(cursor.total_execution_time_in_millis)
print(cursor.query_planning_time_in_millis)
print(cursor.service_processing_time_in_millis)
print(cursor.output_location)

Type conversion

S3FSCursor converts Athena data types to Python types using the built-in converter. The following type mappings are used:

==================================  ==================================================
Athena Type                         Python Type
==================================  ==================================================
boolean                             bool
tinyint, smallint, integer, bigint  int
float, double, real                 float
decimal                             decimal.Decimal
char, varchar, string               str
date                                datetime.date
timestamp                           datetime.datetime
time                                datetime.time
binary, varbinary                   bytes
array, map, row (struct)            Parsed as Python list/dict using JSON-like parsing
json                                Parsed JSON (dict or list)
==================================  ==================================================
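The mapping above can be sketched in plain Python. This is an illustrative sketch only, not the actual DefaultS3FSTypeConverter; the _CONVERSIONS dict and convert function are hypothetical names chosen for demonstration.

```python
import json
from datetime import date, datetime
from decimal import Decimal
from typing import Optional

# Hypothetical mapping from Athena type names to conversion callables,
# mirroring the table above (a subset of types, for illustration).
_CONVERSIONS = {
    "boolean": lambda v: v.lower() == "true",
    "integer": int,
    "bigint": int,
    "double": float,
    "decimal": Decimal,
    "varchar": str,
    "date": lambda v: datetime.strptime(v, "%Y-%m-%d").date(),
    "timestamp": lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S.%f"),
    "json": json.loads,
}


def convert(athena_type: str, value: Optional[str]):
    """Convert a raw CSV string value to a Python object by Athena type."""
    if value is None:  # NULL stays None regardless of the declared type
        return None
    return _CONVERSIONS.get(athena_type, str)(value)


print(convert("integer", "42"))       # 42
print(convert("decimal", "1.50"))     # Decimal('1.50')
print(convert("date", "2024-01-15"))  # datetime.date(2024, 1, 15)
print(convert("json", '{"a": 1}'))    # {'a': 1}
```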

If you want to customize type conversion, create a converter class like this:

from typing import Any

from pyathena.s3fs.converter import DefaultS3FSTypeConverter

class CustomS3FSTypeConverter(DefaultS3FSTypeConverter):
    def __init__(self) -> None:
        super().__init__()
        # Override specific type mappings
        self._mappings["custom_type"] = self._convert_custom

    def _convert_custom(self, value: str) -> Any:
        # Your custom conversion logic
        return value.upper()

Then specify an instance of this class in the converter argument when creating a cursor.

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2").cursor(S3FSCursor, converter=CustomS3FSTypeConverter())

CSV reader options

S3FSCursor supports pluggable CSV reader implementations to control how NULL values and empty strings are handled. Two readers are provided:

  • AthenaCSVReader (default): Custom parser that distinguishes between NULL and empty string

  • DefaultCSVReader: Uses Python’s built-in csv module (treats both NULL and empty string as empty string)

Default behavior (AthenaCSVReader):

By default, AthenaCSVReader is used, which correctly distinguishes between NULL values and empty strings in query results.

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=S3FSCursor).cursor()

cursor.execute("SELECT NULL AS null_col, '' AS empty_col")
row = cursor.fetchone()
print(row)  # (None, '')  - NULL is None, empty string is ''

Switching to Python’s built-in csv module (DefaultCSVReader):

If you prefer Python’s built-in csv module, you can switch to DefaultCSVReader. Note that this reader cannot distinguish between NULL and empty string: both are parsed as empty strings, which the type converter then converts to None.

from pyathena import connect
from pyathena.s3fs.cursor import S3FSCursor
from pyathena.s3fs.reader import DefaultCSVReader

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=S3FSCursor,
                 cursor_kwargs={"csv_reader": DefaultCSVReader}).cursor()

cursor.execute("SELECT NULL AS null_col, '' AS empty_col")
row = cursor.fetchone()
print(row)  # (None, None)  - Both NULL and empty string become None

Comparison of CSV readers:

=========================  =================  ==========  =================
Reader                     Implementation     NULL value  Empty string
=========================  =================  ==========  =================
AthenaCSVReader (default)  Custom parser      None        '' (empty string)
DefaultCSVReader           Python csv module  None        None
=========================  =================  ==========  =================

Why the difference?

Athena’s CSV output format distinguishes between NULL values and empty strings:

  • NULL: unquoted empty field (e.g., a,,b -> the middle field is NULL)

  • Empty string: quoted empty field (e.g., a,"",b -> the middle field is an empty string)

Python’s standard csv module parses both cases as empty strings, losing this distinction. The AthenaCSVReader implements a custom parser that preserves the difference.
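The difference can be demonstrated in a few lines. The sketch below first shows the csv module losing the distinction, then a deliberately minimal parser that keeps it; parse_athena_line is a hypothetical helper written for illustration, not the actual AthenaCSVReader implementation (which also handles multi-line fields and streaming).

```python
import csv
import io

# Python's csv module parses an unquoted empty field (Athena NULL) and a
# quoted empty field (Athena empty string) identically.
data = 'a,,b\na,"",b\n'
rows = list(csv.reader(io.StringIO(data)))
# Both rows come back as ['a', '', 'b'] -- the distinction is lost.


def parse_athena_line(line: str):
    """Minimal single-line parser that preserves the NULL/'' distinction.

    Unquoted empty fields become None; quoted fields (including "") stay str.
    Illustrative sketch only, not the real AthenaCSVReader.
    """
    fields = []
    field, quoted, in_quotes, i = "", False, False, 0
    while i < len(line):
        c = line[i]
        if in_quotes:
            if c == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    field += '"'  # escaped quote inside a quoted field
                    i += 1
                else:
                    in_quotes = False
            else:
                field += c
        elif c == '"':
            in_quotes, quoted = True, True
        elif c == ",":
            fields.append(field if quoted or field else None)
            field, quoted = "", False
        else:
            field += c
        i += 1
    fields.append(field if quoted or field else None)
    return fields


print(parse_athena_line('a,,b'))    # ['a', None, 'b'] -> NULL preserved
print(parse_athena_line('a,"",b'))  # ['a', '', 'b']   -> empty string preserved
```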

Limitations

S3FSCursor has some limitations compared to ArrowCursor or PandasCursor:

  • No UNLOAD support: S3FSCursor reads CSV results directly and does not support the UNLOAD option that outputs results in Parquet format.

  • Sequential reading: Results are read row by row from the CSV file, which may be slower for very large result sets compared to columnar formats.

  • No DataFrame conversion: There is no as_pandas() or as_arrow() method. Use PandasCursor or ArrowCursor if you need DataFrame operations.

When to use S3FSCursor

S3FSCursor is recommended when:

  • You want to minimize dependencies (no pandas/pyarrow required)

  • You’re working in a constrained environment (e.g., AWS Lambda with size limits)

  • You only need simple row-by-row result processing

  • Memory efficiency is important and results don’t need columnar operations

For large-scale data processing or analytical workloads, consider using ArrowCursor or PandasCursor instead.

AsyncS3FSCursor

AsyncS3FSCursor is an AsyncCursor that uses the same lightweight CSV parsing as S3FSCursor. This cursor is useful when you need to execute queries asynchronously without pandas or pyarrow dependencies.

You can use the AsyncS3FSCursor by specifying the cursor_class with the connect method or connection object.

from pyathena import connect
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=AsyncS3FSCursor).cursor()

from pyathena.connection import Connection
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = Connection(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                    region_name="us-west-2",
                    cursor_class=AsyncS3FSCursor).cursor()

It can also be used by specifying the cursor class when calling the connection object’s cursor method.

from pyathena import connect
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2").cursor(AsyncS3FSCursor)

from pyathena.connection import Connection
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = Connection(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                    region_name="us-west-2").cursor(AsyncS3FSCursor)

The default number of workers is (os.cpu_count() or 1) * 5. If you want to change the number of workers, you can specify it as follows.

from pyathena import connect
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=AsyncS3FSCursor).cursor(max_workers=10)

The execute method of the AsyncS3FSCursor returns a tuple of the query ID and a future object.

from pyathena import connect
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=AsyncS3FSCursor).cursor()

query_id, future = cursor.execute("SELECT * FROM many_rows")

The return value of the future object is an AthenaS3FSResultSet object. This object has an interface similar to the AthenaResultSet object.

from pyathena import connect
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=AsyncS3FSCursor).cursor()

query_id, future = cursor.execute("SELECT * FROM many_rows")
result_set = future.result()
print(result_set.state)
print(result_set.state_change_reason)
print(result_set.completion_date_time)
print(result_set.submission_date_time)
print(result_set.data_scanned_in_bytes)
print(result_set.engine_execution_time_in_millis)
print(result_set.query_queue_time_in_millis)
print(result_set.total_execution_time_in_millis)
print(result_set.query_planning_time_in_millis)
print(result_set.service_processing_time_in_millis)
print(result_set.output_location)
print(result_set.description)
for row in result_set:
    print(row)

from pyathena import connect
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=AsyncS3FSCursor).cursor()

query_id, future = cursor.execute("SELECT * FROM many_rows")
result_set = future.result()
print(result_set.fetchall())

As with AsyncCursor, you need a query ID to cancel a query.

from pyathena import connect
from pyathena.s3fs.async_cursor import AsyncS3FSCursor

cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                 region_name="us-west-2",
                 cursor_class=AsyncS3FSCursor).cursor()

query_id, future = cursor.execute("SELECT * FROM many_rows")
cursor.cancel(query_id)

AioS3FSCursor

AioS3FSCursor is a native asyncio cursor that uses the same lightweight CSV parsing as S3FSCursor. Unlike AsyncS3FSCursor which uses concurrent.futures, this cursor uses asyncio.to_thread() for both result set creation and fetch operations, keeping the event loop free.

Since AthenaS3FSResultSet lazily streams rows from S3 via a CSV reader, fetch methods are async and require await.

from pyathena import aio_connect
from pyathena.aio.s3fs.cursor import AioS3FSCursor

async with await aio_connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                             region_name="us-west-2") as conn:
    cursor = conn.cursor(AioS3FSCursor)
    await cursor.execute("SELECT * FROM many_rows")
    print(await cursor.fetchone())
    print(await cursor.fetchmany(10))
    print(await cursor.fetchall())

Async iteration is supported:

from pyathena import aio_connect
from pyathena.aio.s3fs.cursor import AioS3FSCursor

async with await aio_connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                             region_name="us-west-2") as conn:
    cursor = conn.cursor(AioS3FSCursor)
    await cursor.execute("SELECT * FROM many_rows")
    async for row in cursor:
        print(row)

Execution information of the query can also be retrieved:

from pyathena import aio_connect
from pyathena.aio.s3fs.cursor import AioS3FSCursor

async with await aio_connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
                             region_name="us-west-2") as conn:
    cursor = conn.cursor(AioS3FSCursor)
    await cursor.execute("SELECT * FROM many_rows")
    print(cursor.state)
    print(cursor.data_scanned_in_bytes)
    print(cursor.output_location)