QuokkaContext.from_pandas

Create a DataStream from a pandas DataFrame. The DataFrame is converted to a Polars DataFrame and held in memory, so the returned DataStream is materialized.

Parameters:

Name  Type              Description                                          Default
df    pandas DataFrame  The pandas DataFrame to create the DataStream from.  required

Returns:

Type        Description
DataStream  The DataStream created from the pandas DataFrame.

Examples:

>>> import pandas as pd
>>> from pyquokka.df import QuokkaContext
>>> qc = QuokkaContext()
>>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
>>> stream = qc.from_pandas(df)
>>> stream.count()
Source code in pyquokka/df.py
def from_pandas(self, df):

    """
    Create a DataStream from a pandas DataFrame. The DataFrame is converted to a Polars DataFrame and held in memory, so the returned DataStream is materialized.

    Args:
        df (pandas DataFrame): The pandas DataFrame to create the DataStream from.

    Returns:
        DataStream: The DataStream created from the pandas DataFrame.

    Examples:

        >>> import pandas as pd
        >>> from pyquokka.df import QuokkaContext
        >>> qc = QuokkaContext()
        >>> df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
        >>> stream = qc.from_pandas(df)
        >>> stream.count()

    """

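    # Convert the pandas DataFrame to Polars and register it as an in-memory input node.
    self.nodes[self.latest_node_id] = InputPolarsNode(polars.from_pandas(df))
    # Advance the counter so the next node gets a fresh id.
    self.latest_node_id += 1
    # Return a DataStream bound to the node just created (id latest_node_id - 1),
    # flagged as materialized because its data is already in memory.
    return DataStream(self, df.columns, self.latest_node_id - 1, materialized=True)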
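The method body follows a simple bookkeeping pattern: store the input under the current node id, bump the counter, and hand back a stream that points at the id just used (hence `latest_node_id - 1`). The sketch below reproduces only that pattern with standard-library types so it can run without Quokka installed. `ToyContext`, `ToyInputNode`, `ToyStream` and `from_columns` are hypothetical stand-ins for `QuokkaContext`, `InputPolarsNode`, `DataStream` and `from_pandas`; they are not part of pyquokka, and a plain dict of columns replaces the pandas/Polars DataFrame.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyInputNode:
    """Hypothetical stand-in for InputPolarsNode: just holds in-memory data."""
    data: Any


@dataclass
class ToyStream:
    """Hypothetical stand-in for DataStream: a handle pointing at a node id."""
    ctx: "ToyContext"
    schema: List[str]
    source_node_id: int
    materialized: bool = False


@dataclass
class ToyContext:
    """Hypothetical stand-in for the node bookkeeping in QuokkaContext."""
    nodes: Dict[int, ToyInputNode] = field(default_factory=dict)
    latest_node_id: int = 0

    def from_columns(self, columns: Dict[str, list]) -> ToyStream:
        # Register the data under the next free id, mirroring from_pandas.
        self.nodes[self.latest_node_id] = ToyInputNode(columns)
        # Advance the counter so the next registered node gets a new id.
        self.latest_node_id += 1
        # After the increment, the id just used is latest_node_id - 1.
        return ToyStream(self, list(columns), self.latest_node_id - 1, materialized=True)


ctx = ToyContext()
s1 = ctx.from_columns({"a": [1, 2, 3], "b": [4, 5, 6]})
s2 = ctx.from_columns({"c": [7]})
print(s1.source_node_id, s2.source_node_id, ctx.latest_node_id)  # prints: 0 1 2
```

Each call gets its own node id, so several in-memory inputs can coexist in one context and later operations can refer to them by id.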