
GroupedDataStream.count_distinct

Count the number of distinct values of a column for each group. The result is exact, not approximate. Exact counting can run out of memory, particularly when groups contain many distinct values.
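To make the semantics concrete, here is a minimal pure-Python sketch of what an exact grouped count distinct computes. It is an illustration only, not pyquokka's implementation. The function name grouped_count_distinct, the row-of-dicts input, and the tuple-of-columns grouping key are all hypothetical choices for the example. Keeping one set of seen values per group shows why the result is exact and also why memory grows with the number of distinct (group, value) pairs.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, Mapping, Set, Tuple


def grouped_count_distinct(
    rows: Iterable[Mapping[str, Any]],
    groupby: Tuple[str, ...],
    col: str,
) -> Dict[Tuple[Hashable, ...], int]:
    """Hypothetical helper: exact distinct count of `col` per group."""
    # One set of seen values per group. Memory grows with the number of
    # distinct (group, value) pairs, which is why an exact count can OOM.
    seen: Dict[Tuple[Hashable, ...], Set[Hashable]] = defaultdict(set)
    for row in rows:
        key = tuple(row[g] for g in groupby)
        seen[key].add(row[col])
    # Exact result: the size of each set, with no sketching or sampling.
    return {key: len(values) for key, values in seen.items()}


rows = [
    {"store": "A", "customer": 1},
    {"store": "A", "customer": 2},
    {"store": "A", "customer": 1},  # duplicate in group A, counted once
    {"store": "B", "customer": 3},
]
print(grouped_count_distinct(rows, ("store",), "customer"))
# {('A',): 2, ('B',): 1}
```

An approximate method such as HyperLogLog would bound memory per group at the cost of some error. This method, as documented, gives the exact answer instead.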

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| col | str | The column whose distinct values are counted. | required |
Source code in pyquokka/datastream.py
def count_distinct(self, col: str):

    """
    Count the number of distinct values of a column for each group. The result is exact, not approximate, and may run out of memory.

    Args:
        col (str): the column to count distinct values of

    """

    return self.source_data_stream._grouped_count_distinct(self.groupby, col, self.orderby)
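The method itself only delegates to the parent stream's private _grouped_count_distinct, passing the grouping keys, the column, and the stream's ordering. The source shown here does not reveal how that helper works. The following is a hypothetical sketch, assuming data is split across partitions, of why an exact distributed count distinct is memory-hungry. It is not pyquokka's code, and the names partial_distinct and merge_and_count are made up for illustration. Each partition must keep full per-group value sets, and the merge step must union those sets rather than add up per-partition counts, because a value seen on two partitions must be counted once.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Key = Tuple[Hashable, ...]


def partial_distinct(partition: Iterable[Tuple[Key, Hashable]]) -> Dict[Key, Set[Hashable]]:
    """Hypothetical per-partition step: collect the value set for each group."""
    out: Dict[Key, Set[Hashable]] = defaultdict(set)
    for key, value in partition:
        out[key].add(value)
    return out


def merge_and_count(partials: List[Dict[Key, Set[Hashable]]]) -> Dict[Key, int]:
    """Hypothetical merge step: union the sets, then count.

    Summing per-partition counts would double-count values that appear
    on more than one partition, so the full sets must be merged.
    """
    merged: Dict[Key, Set[Hashable]] = defaultdict(set)
    for part in partials:
        for key, values in part.items():
            merged[key] |= values
    return {key: len(values) for key, values in merged.items()}


p1 = partial_distinct([(("A",), 1), (("A",), 2)])
p2 = partial_distinct([(("A",), 1), (("B",), 3)])
print(merge_and_count([p1, p2]))  # {('A',): 2, ('B',): 1}
```

Here group A's value 1 appears in both partitions. Adding per-partition counts would report 3 for A, while the set union correctly reports 2.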