DataStream.count_distinct

Count the number of distinct values in a column. The count is exact, not approximate, so it may run out of memory on large inputs.

Parameters:

Name  Type  Description                             Default
col   str   the column to count distinct values of  required
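
To show why an exact count can be memory-hungry, here is a minimal plain-Python sketch. It does not use Quokka; `exact_count_distinct` is a hypothetical helper written for illustration only. An exact count has to remember every distinct value it has seen, so memory grows with the number of distinct values rather than staying fixed.

```python
def exact_count_distinct(values):
    """Hypothetical helper: exact distinct count over an iterable of hashable values."""
    seen = set()  # holds one entry per distinct value, so it grows with cardinality
    for value in values:
        seen.add(value)
    return len(seen)


print(exact_count_distinct(["a", "b", "a", "c", "b"]))  # 3
```

An approximate method would trade this exactness for bounded memory; the description above states that `count_distinct` does not make that trade.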
Source code in pyquokka/datastream.py
def count_distinct(self, col):

    """
    Count the number of distinct values of a column. This may result in out of memory. This is not approximate.

    Args:
        col (str): the column to count distinct values of

    """

    return self._grouped_count_distinct([], col)
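
The source above delegates to `_grouped_count_distinct` with an empty list of grouping keys. The sketch below illustrates that idea in plain Python over a list of dicts: a global distinct count is a grouped distinct count where every row falls into the same single group. The functions `grouped_count_distinct` and `count_distinct` here are hypothetical stand-ins, not Quokka's implementation, and the return types are assumptions made for the example.

```python
from collections import defaultdict


def grouped_count_distinct(rows, keys, col):
    """Hypothetical stand-in: distinct values of `col` per group of `keys`."""
    groups = defaultdict(set)
    for row in rows:
        # With keys == [], every row maps to the same empty-tuple group.
        group = tuple(row[k] for k in keys)
        groups[group].add(row[col])
    return {group: len(values) for group, values in groups.items()}


def count_distinct(rows, col):
    """Global distinct count, expressed as a grouped count with no keys."""
    return grouped_count_distinct(rows, [], col).get((), 0)


rows = [
    {"region": "eu", "user": "ann"},
    {"region": "eu", "user": "bob"},
    {"region": "us", "user": "ann"},
]

print(grouped_count_distinct(rows, ["region"], "user"))  # {('eu',): 2, ('us',): 1}
print(count_distinct(rows, "user"))  # 2
```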