DataStream.drop

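Think of this as the inverse of select. Instead of naming the columns to keep, you name the columns to remove. Quokka implements this by selecting every column in the DataStream's schema that is not being dropped. Column names that are not in the schema are ignored, and dropping a sort key of a sorted DataStream fails an assertion.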
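The "select everything that is not dropped" idea can be sketched with plain Python lists. This is only an illustration: `columns_after_drop` is a hypothetical helper, not part of Quokka, and the schema is a made-up subset of lineitem columns.

```python
def columns_after_drop(schema, cols_to_drop):
    """Return the columns of `schema` that survive a drop, in schema order.

    Hypothetical helper for illustration; not a Quokka function.
    """
    # Keep every schema column that is not named in the drop list.
    return [col for col in schema if col not in cols_to_drop]


# Assumed example schema (a subset of the lineitem columns).
schema = ["l_orderkey", "l_orderdate", "l_quantity", "l_extendedprice"]
remaining = columns_after_drop(schema, ["l_orderdate", "l_orderkey"])
print(remaining)  # ['l_quantity', 'l_extendedprice']
```

The resulting list is what would then be passed to select, so the surviving columns keep their original order.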

Parameters:

Name	Type	Description	Default
`cols_to_drop`	`list`	a list of columns to drop from the source DataStream	required

Returns:

A DataStream consisting of all columns in the source DataStream that are not in `cols_to_drop`.

Examples:

>>> f = qc.read_csv("lineitem.csv")

Drop the l_orderdate and l_orderkey columns

>>> f = f.drop(["l_orderdate", "l_orderkey"])

This will now fail, because l_orderdate has been dropped and is no longer in the schema

>>> f = f.select(["l_orderdate"])
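The edge cases of drop (unknown column names, an empty effective drop list, and sort keys) can be tried without a Quokka installation using a toy model. The sketch below is an assumption-laden stand-in: `ToyStream` is hypothetical, it tracks only a schema and optional sort keys, and raising `KeyError` from its `select` is a choice made for this example rather than Quokka's documented behavior.

```python
class ToyStream:
    """Hypothetical stand-in for a DataStream: only a schema and sort keys."""

    def __init__(self, schema, sorted_cols=None):
        self.schema = list(schema)
        self.sorted = sorted_cols  # None, or a collection of sort-key names

    def select(self, columns):
        # Assumption for this sketch: selecting a missing column raises KeyError.
        missing = [c for c in columns if c not in self.schema]
        if missing:
            raise KeyError(f"columns not in schema: {missing}")
        return ToyStream(columns, self.sorted)

    def drop(self, cols_to_drop):
        # Sort keys may never be dropped.
        if self.sorted is not None:
            for col in cols_to_drop:
                assert col not in self.sorted, "cannot drop a sort key!"
        # Names that are not in the schema are silently ignored.
        present = [c for c in cols_to_drop if c in self.schema]
        if not present:
            return self  # nothing to drop, so the stream is returned unchanged
        # Drop is expressed as a select of the complement.
        return self.select([c for c in self.schema if c not in present])


f = ToyStream(["l_orderkey", "l_orderdate", "l_quantity", "l_extendedprice"])
g = f.drop(["l_orderdate", "l_orderkey", "no_such_column"])
print(g.schema)                          # ['l_quantity', 'l_extendedprice']
print(f.drop(["no_such_column"]) is f)   # True

try:
    g.select(["l_orderdate"])
except KeyError as err:
    print("select failed:", err)
```

In this model, dropping only unknown columns returns the same object, which matches the early `return self` in the source listing.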

Source code in pyquokka/datastream.py

def drop(self, cols_to_drop: list):

    """
    Think of this as the anti-operator to select. Instead of selecting columns, this will drop columns.
    This is implemented in Quokka as selecting the columns in the DataStream's schema that are not dropped.

    Args:
        cols_to_drop (list): a list of columns to drop from the source DataStream

    Return:
        A DataStream consisting of all columns in the source DataStream that are not in `cols_to_drop`.

    Examples:
        >>> f = qc.read_csv("lineitem.csv")

        Drop the l_orderdate and l_orderkey columns

        >>> f = f.drop(["l_orderdate", "l_orderkey"])

        This will now fail, since you dropped l_orderdate

        >>> f = f.select(["l_orderdate"])
    """
    assert type(cols_to_drop) == list
    actual_cols_to_drop = []
    for col in cols_to_drop:
        if col in self.schema:
            actual_cols_to_drop.append(col)
        if self.sorted is not None:
            assert col not in self.sorted, "cannot drop a sort key!"
    if len(actual_cols_to_drop) == 0:
        return self
    else:
        if self.materialized:
            df = self._get_materialized_df().drop(actual_cols_to_drop)
            return self.quokka_context.from_polars(df)
        else:
            return self.select([col for col in self.schema if col not in cols_to_drop])