
DataStream.mean

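Return the mean value of each of the specified columns. A single column name is treated as a one-element list. If `columns` is neither a string nor a list, or names a column that is not in the stream's schema, an `AssertionError` is raised.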

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `str` or `list` | The column name or a list of column names. | required |
| `collect` | `bool` | If `True`, return a Polars DataFrame. If `False`, return a Quokka DataStream. | `True` |
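The argument handling in the source below can be tried out in isolation. The following is a minimal, stdlib-only sketch that mirrors it. `build_mean_spec` is a hypothetical helper (not part of pyquokka), and modeling the schema as a plain list of column names is an assumption made for illustration.

```python
def build_mean_spec(columns, schema):
    """Hypothetical helper mirroring DataStream.mean's argument handling.

    Returns the aggregation spec ({column: "mean"}) that mean() passes to agg().
    """
    # Only a single column name or a list of names is accepted.
    if not isinstance(columns, (str, list)):
        raise TypeError("columns must be a str or a list of str")
    # A bare string is wrapped into a one-element list.
    if isinstance(columns, str):
        columns = [columns]
    # Every requested column must exist in the stream's schema.
    missing = [col for col in columns if col not in schema]
    if missing:
        raise KeyError(f"columns not in schema: {missing}")
    return {col: "mean" for col in columns}


schema = ["qty", "price", "discount"]
print(build_mean_spec("qty", schema))                 # {'qty': 'mean'}
print(build_mean_spec(["qty", "discount"], schema))   # {'qty': 'mean', 'discount': 'mean'}
```

Unlike the source, which relies on `assert` (and therefore stops checking when Python runs with `-O`), this sketch raises explicit `TypeError`/`KeyError` exceptions; that is a design choice of the sketch, not the library's behavior.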
Source code in pyquokka/datastream.py
```python
def mean(self, columns, collect = True):

    """
    Return the mean values of the specified columns.

    Args:
        columns (str or list): the column name or a list of column names.
        collect (bool): if True, return a Polars DataFrame. If False, return a Quokka DataStream.
    """

    assert type(columns) == str or type(columns) == list
    if type(columns) == str:
        columns = [columns]
    for col in columns:
        assert col in self.schema
    if collect:
        return self.agg({col: "mean" for col in columns}).collect()
    else:
        return self.agg({col: "mean" for col in columns})
```
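The `collect` flag decides whether the aggregation runs immediately or is returned unexecuted. The toy, stdlib-only sketch below illustrates that eager-versus-lazy pattern. `ToyStream` and `LazyAgg` are hypothetical stand-ins (not pyquokka classes), the data is an in-memory dict of lists, and a plain dict plays the role of the Polars DataFrame.

```python
import statistics


class LazyAgg:
    """Hypothetical stand-in for a lazy DataStream holding an aggregation plan."""

    def __init__(self, data, spec):
        self.data = data
        self.spec = spec  # e.g. {"qty": "mean"}; nothing is computed yet

    def collect(self):
        # Execute the plan; a plain dict stands in for the Polars DataFrame.
        funcs = {"mean": statistics.mean}
        return {col: funcs[op](self.data[col]) for col, op in self.spec.items()}


class ToyStream:
    """Hypothetical in-memory stream with a schema and an agg() method."""

    def __init__(self, data):
        self.data = data
        self.schema = list(data)

    def agg(self, spec):
        return LazyAgg(self.data, spec)

    def mean(self, columns, collect=True):
        # Same control flow as DataStream.mean above.
        assert isinstance(columns, (str, list))
        if isinstance(columns, str):
            columns = [columns]
        for col in columns:
            assert col in self.schema
        plan = self.agg({col: "mean" for col in columns})
        return plan.collect() if collect else plan


stream = ToyStream({"qty": [1, 2, 3, 4], "price": [10.0, 20.0, 30.0, 40.0]})
print(stream.mean("qty"))                           # {'qty': 2.5}
lazy = stream.mean(["qty", "price"], collect=False)
print(type(lazy).__name__)                          # LazyAgg
print(lazy.collect())                               # {'qty': 2.5, 'price': 25.0}
```

Because the source implements `collect=True` by calling `.collect()` on the result of `self.agg(...)`, calling `.collect()` yourself on the DataStream returned with `collect=False` should yield the same result, while leaving room to defer execution.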