torcharrow.dataframe

torcharrow.dataframe(data: Optional[Union[Iterable, DType]] = None, dtype: Optional[DType] = None, columns: Optional[List[str]] = None, device: str = '')

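Creates a TorchArrow DataFrame.

Parameters:

data (dict or list of tuples) – Defines the contents of the DataFrame. Dictionary keys are used as column names and values as column data. Use dtype to enforce a specific column order. When data is a list of tuples, either dtype or columns must be provided to supply the column names.
dtype (dtype, default None) – Data type to enforce. If None, the type is inferred automatically where possible. Should be a dt.Struct() providing a list of dt.Fields.
columns (list of str, default None) – Column names. Used when data is a list of tuples and no custom dtype is provided. Should be left as None when data and dtype are both None (the semantics is to construct a default empty DataFrame with no columns).
device (Device, default "") – Selects the runtime to use from the scope. TorchArrow supports multiple runtimes (CPU and GPU). If not provided, the Velox vectorized runtime is used. Valid values are "cpu" (Velox) and "gpu" (coming soon).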
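
The parameter notes describe three places column names can come from: dictionary keys, the fields of dtype, or the columns list. The following is a minimal plain-Python sketch of those documented rules, not TorchArrow's implementation. The helper `column_names_for` and its `dtype_fields` argument (a stand-in for the field names of a dt.Struct) are hypothetical, and giving dtype priority over the other sources is an assumption drawn from the note that dtype enforces column order.

```python
from typing import Any, List, Optional


def column_names_for(
    data: Any,
    dtype_fields: Optional[List[str]] = None,
    columns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: decide where column names come from."""
    if data is None and dtype_fields is None:
        # Documented rule: columns stays None for the default empty DataFrame.
        if columns is not None:
            raise ValueError("columns should be None when data and dtype are None")
        return []
    if dtype_fields is not None:
        # Assumption: an explicit dtype fixes both the names and their order.
        return list(dtype_fields)
    if isinstance(data, dict):
        # Dictionary keys are used as column names.
        return list(data.keys())
    if columns is not None:
        # A list of tuples without a dtype takes its names from columns.
        return list(columns)
    raise ValueError("a list of tuples needs dtype or columns to name its fields")


names_from_dict = column_names_for({"b": [2], "a": [1]})
names_from_dtype = column_names_for({"b": [2], "a": [1]}, dtype_fields=["a", "b"])
names_from_columns = column_names_for([(1, "a")], columns=["t1", "t2"])
```

Here `names_from_dict` follows the dictionary's insertion order, while `names_from_dtype` follows the order given by the (stand-in) dtype fields.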

Examples

A DataFrame is a collection of named, strongly typed columns of equal length

>>> import torcharrow as ta
>>> df = ta.dataframe({'a': list(range(7)),
...                    'b': list(reversed(range(7))),
...                    'c': list(range(7))
...                   })
>>> df
  index    a    b    c
-------  ---  ---  ---
      0    0    6    0
      1    1    5    1
      2    2    4    2
      3    3    3    3
      4    4    2    4
      5    5    1    5
      6    6    0    6
dtype: Struct([Field('a', int64), Field('b', int64), Field('c', int64)]), count: 7, null_count: 0

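DataFrames are immutable, except that you can always add a new column, provided its name is not already in use. The new column is appended after the existing columns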

>>> df['d'] = ta.column(list(range(99, 99+7)))
>>> df
  index    a    b    c    d
-------  ---  ---  ---  ---
      0    0    6    0   99
      1    1    5    1  100
      2    2    4    2  101
      3    3    3    3  102
      4    4    2    4  103
      5    5    1    5  104
      6    6    0    6  105
dtype: Struct([Field('a', int64), Field('b', int64), Field('c', int64), Field('d', int64)]), count: 7, null_count: 0

Creating a nested DataFrame

>>> df_inner = ta.dataframe({'b1': [11, 22, 33], 'b2':[111,222,333]})
>>> df_outer = ta.dataframe({'a': [1, 2, 3], 'b':df_inner})
>>> df_outer
  index    a  b
-------  ---  ---------
      0    1  (11, 111)
      1    2  (22, 222)
      2    3  (33, 333)
dtype: Struct([Field('a', int64), Field('b', Struct([Field('b1', int64), Field('b2', int64)]))]), count: 3, null_count: 0
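
Each value shown in column b is one row of df_inner, displayed as a tuple of its b1 and b2 fields. As a plain-Python sketch of that relationship (it makes no TorchArrow calls, and the variable names are illustrative only):

```python
# Column-oriented inputs, mirroring the nested example above.
inner = {"b1": [11, 22, 33], "b2": [111, 222, 333]}
outer_a = [1, 2, 3]

# One tuple per row of the inner frame: (b1, b2).
inner_rows = list(zip(inner["b1"], inner["b2"]))

# One tuple per row of the outer frame: (a, (b1, b2)).
outer_rows = list(zip(outer_a, inner_rows))
print(outer_rows)
```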

Creating a DataFrame from a list of tuples

>>> import torcharrow.dtypes as dt
>>> l = [(1, 'a'), (2, 'b'), (3, 'c')]
>>> ta.dataframe(l, dtype = dt.Struct([dt.Field('t1', dt.int64), dt.Field('t2', dt.string)]))
  index    t1  t2
-------  ----  ----
      0     1  a
      1     2  b
      2     3  c
dtype: Struct([Field('t1', int64), Field('t2', string)]), count: 3, null_count: 0

or

>>> ta.dataframe(l, columns=['t1', 't2'])
  index    t1  t2
-------  ----  ----
      0     1  a
      1     2  b
      2     3  c
dtype: Struct([Field('t1', int64), Field('t2', string)]), count: 3, null_count: 0
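
Both tuple-based calls describe row-oriented data, whereas the dictionary form describes the same data column by column. A minimal plain-Python sketch of that correspondence follows. It does not call TorchArrow, and the helper name `rows_to_columns` is hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_to_columns(
    rows: Sequence[Tuple[Any, ...]], names: Sequence[str]
) -> Dict[str, List[Any]]:
    """Hypothetical helper: transpose row tuples into a dict of columns."""
    for row in rows:
        if len(row) != len(names):
            raise ValueError("every tuple needs one value per column name")
    # Collect the i-th value of every row under the i-th column name.
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


l = [(1, "a"), (2, "b"), (3, "c")]
cols = rows_to_columns(l, ["t1", "t2"])
print(cols)
```

The resulting mapping has the key and value shape that the data parameter accepts for dictionary input.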
