NumPy Integration

PyArrow supports conversion in both directions between NumPy arrays and Arrow Arrays.

NumPy to Arrow

To convert a NumPy array to Arrow, call the pyarrow.array() factory function:

>>> import numpy as np
>>> import pyarrow as pa
>>> data = np.arange(10, dtype='int16')
>>> arr = pa.array(data)
>>> arr
<pyarrow.lib.Int16Array object at ...>
[
  0,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9
]

Conversion from NumPy supports a wide range of input dtypes, including structured dtypes and strings.

Arrow to NumPy

In the reverse direction, the to_numpy() method produces a view of an Arrow Array for use with NumPy. By default, this is limited to primitive types for which NumPy has the same physical representation as Arrow, and it requires that the Arrow data contain no nulls.

>>> import numpy as np
>>> import pyarrow as pa
>>> arr = pa.array([4, 5, 6], type=pa.int32())
>>> view = arr.to_numpy()
>>> view
array([4, 5, 6], dtype=int32)

For more complex data types, use the to_pandas() method, which constructs a NumPy array with pandas semantics (for example, for the representation of null values).

Timezone-aware Timestamps

NumPy’s datetime64 type does not support timezones. When converting a timezone-aware Arrow timestamp array to NumPy via to_numpy(), the timezone information is silently dropped:

>>> arr = pa.array([1735689600, 1735689600], type=pa.timestamp("s", tz="UTC"))
>>> arr.type
TimestampType(timestamp[s, tz=UTC])
>>> arr.to_numpy()
array(['2025-01-01T00:00:00', '2025-01-01T00:00:00'],
      dtype='datetime64[s]')

If you need to preserve timezone information, there are two alternatives:

  • Convert to a pandas Series, which supports timezone-aware datetime64 dtypes:

    >>> arr.to_pandas()
    0   2025-01-01 00:00:00+00:00
    1   2025-01-01 00:00:00+00:00
    dtype: datetime64[s, UTC]
    

    To get a NumPy array while preserving timezone information, use timestamp_as_object=True:

    >>> arr.to_pandas(timestamp_as_object=True).to_numpy()
    array([datetime.datetime(2025, 1, 1, 0, 0, tzinfo=...),
           datetime.datetime(2025, 1, 1, 0, 0, tzinfo=...)],
          dtype=object)
    

    Note

    For nested types, to_pandas() does not always preserve timezone information. Structs and maps retain timezones, but lists (e.g., list arrays containing timestamps) currently do not. See GH-41162 for details.

  • Convert to Python datetime objects, which carry tzinfo:

    >>> arr.to_pylist()
    [datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC')),
     datetime.datetime(2025, 1, 1, 0, 0, tzinfo=zoneinfo.ZoneInfo(key='UTC'))]