QuokkaContext.from_arrow

Create a DataStream from a pyarrow Table. The table is converted to a polars DataFrame held in memory, so the resulting DataStream is already materialized.

Parameters:

Name  Type           Description                                        Default
df    pyarrow.Table  The pyarrow Table to create the DataStream from.   required

Returns:

Type        Description
DataStream  The DataStream created from the pyarrow Table.

Examples:

>>> import polars as pl
>>> from pyquokka.df import QuokkaContext
>>> qc = QuokkaContext()
>>> df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).to_arrow()
>>> stream = qc.from_arrow(df)
>>> stream.count()
Source code in pyquokka/df.py
def from_arrow(self, df):

    """
    Create a DataStream from a pyarrow Table. The table is converted to a polars DataFrame held in memory, so the resulting DataStream is already materialized.

    Args:
        df (pyarrow.Table): The pyarrow Table to create the DataStream from.

    Returns:
        DataStream: The DataStream created from the pyarrow Table.

    Examples:

        >>> import polars as pl
        >>> from pyquokka.df import QuokkaContext
        >>> qc = QuokkaContext()
        >>> df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}).to_arrow()
        >>> stream = qc.from_arrow(df)
        >>> stream.count()

    """

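    # Convert the Arrow table to polars and register it as a new input node.
    self.nodes[self.latest_node_id] = InputPolarsNode(polars.from_arrow(df))
    self.latest_node_id += 1
    # pyarrow's Table.columns returns the column arrays; column_names gives the names for the schema.
    return DataStream(self, df.column_names, self.latest_node_id - 1, materialized=True)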
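
The bookkeeping in the method body (store a node under the next free id, advance the counter, hand back a stream that refers to the node by id) can be sketched with the standard library alone. This is a toy model under stated assumptions, not Quokka's implementation: `ToyContext`, `ToyStream`, and `from_table` are hypothetical names invented for illustration, and the node is a plain dict rather than an `InputPolarsNode`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStream:
    """Hypothetical stand-in for DataStream: a schema plus the id of its source node."""
    schema: list
    source_node_id: int
    materialized: bool = False


@dataclass
class ToyContext:
    """Hypothetical stand-in for QuokkaContext; only the node bookkeeping is modeled."""
    nodes: dict = field(default_factory=dict)
    latest_node_id: int = 0

    def from_table(self, column_names, rows):
        # Register the in-memory data as a new input node under the next free id.
        node_id = self.latest_node_id
        self.nodes[node_id] = {"columns": list(column_names), "rows": list(rows)}
        self.latest_node_id += 1
        # The returned stream refers to the node by id rather than holding the data.
        return ToyStream(list(column_names), node_id, materialized=True)


ctx = ToyContext()
first = ctx.from_table(["a", "b"], [(1, 4), (2, 5), (3, 6)])
second = ctx.from_table(["x"], [(10,)])
print(first.source_node_id, second.source_node_id, ctx.latest_node_id)
```

Each call gets its own node id, and the stream carries only the column names and that id, which mirrors how the method above returns `self.latest_node_id - 1` after incrementing the counter.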