
DataStream.compute

Triggers execution of the computational graph and keeps the result cached across the cluster rather than collecting it. The result is a Quokka DataSet. You can read a DataSet x back into a DataStream via qc.read_dataset(x). This is similar to Spark's persist() method.

Return

Quokka DataSet. This can be thought of as a list of objects cached in memory/disk across the cluster.

Examples:

>>> f = qc.read_csv("my_csv.csv")
>>> result = f.compute()
>>> d = qc.read_dataset(result)
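
The difference between caching a result and collecting it can be illustrated with a small toy model. This is only a sketch: ToyContext, ToyStream, ToyDataSet, and from_partitions are hypothetical stand-ins written for this example, not pyquokka classes, and a plain dict plays the role of the cluster-wide cache.

```python
# Toy model only: every class here is a hypothetical stand-in, not pyquokka's API.

class ToyDataSet:
    """Handle to cached partitions; the data itself stays in the context's cache."""
    def __init__(self, key):
        self.key = key


class ToyContext:
    def __init__(self):
        self.cache = {}      # stands in for memory/disk across the cluster
        self._next_key = 0

    def from_partitions(self, partitions):
        # Build a lazy stream over in-memory partitions (lists of items).
        return ToyStream(self, lambda: iter(partitions))

    def read_dataset(self, dataset):
        # Re-open cached partitions as a new lazy stream.
        return ToyStream(self, lambda: iter(self.cache[dataset.key]))


class ToyStream:
    def __init__(self, ctx, source):
        self.ctx = ctx
        self._source = source  # zero-argument callable yielding partitions

    def map(self, fn):
        # Lazy: nothing runs until compute() or collect() is called.
        src = self._source
        return ToyStream(self.ctx, lambda: ([fn(x) for x in part] for part in src()))

    def compute(self):
        # Run the graph, keep partitions in the cache, return only a handle.
        key = self.ctx._next_key
        self.ctx._next_key += 1
        self.ctx.cache[key] = [list(part) for part in self._source()]
        return ToyDataSet(key)

    def collect(self):
        # Run the graph and bring every item back to the caller.
        return [x for part in self._source() for x in part]


ctx = ToyContext()
stream = ctx.from_partitions([[1, 2], [3, 4]]).map(lambda x: x * 10)
handle = stream.compute()                       # a handle, not the data
values = ctx.read_dataset(handle).collect()     # data read back from the cache
```

In this toy, compute() hands back a small handle while the partitions stay in the cache, and read_dataset turns that handle into a stream again, which mirrors the compute / qc.read_dataset pairing described above.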
Source code in pyquokka/datastream.py
def compute(self):
    """
    Triggers execution of the computational graph and keeps the result cached across the cluster
    rather than collecting it. The result is a Quokka DataSet. You can read a DataSet `x` back into
    a DataStream via `qc.read_dataset(x)`. This is similar to Spark's `persist()` method.

    Return:
        Quokka DataSet. This can be thought of as a list of objects cached in memory/disk across the cluster.

    Examples:

        >>> f = qc.read_csv("my_csv.csv")
        >>> result = f.compute()
        >>> d = qc.read_dataset(result)
    """
    dataset = self.quokka_context.new_dataset(self, self.schema)
    return self.quokka_context.execute_node(dataset.source_node_id, collect=False)