DataStream.compute
This triggers execution of the computational graph and keeps the result cached across the cluster.
The result will be a Quokka DataSet. You can read a DataSet x back into a DataStream via qc.read_dataset(x).
This is similar to Spark's persist() method.
Return
Quokka DataSet. This can be thought of as a list of objects cached in memory/disk across the cluster.
Examples:
>>> f = qc.read_csv("my_csv.csv")
>>> result = f.compute()
>>> d = qc.read_dataset(result)
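The compute-then-read-back pattern described above can be illustrated with a small, self-contained toy model. This is only a sketch of the semantics (lazy graph, one-time execution, cached result reused as a new source). The names `ToyStream`, `ToyDataSet`, and `read_toy_dataset` are hypothetical and are not part of pyquokka; the real implementation distributes the cached objects across a cluster rather than holding a Python list.

```python
# Toy model only: these classes are hypothetical and are NOT pyquokka's implementation.

class ToyDataSet:
    """Stands in for a cached result: a list of already-computed objects."""

    def __init__(self, objects):
        self.objects = objects


class ToyStream:
    """Stands in for a lazy stream: nothing runs until compute() is called."""

    def __init__(self, source, steps=()):
        self.source = source        # zero-argument callable returning the raw rows
        self.steps = tuple(steps)   # pending transformations, not yet executed

    def map(self, fn):
        # Record the step without executing it.
        return ToyStream(self.source, self.steps + (fn,))

    def compute(self):
        # Run the whole pipeline once and keep the materialized result.
        rows = list(self.source())
        for fn in self.steps:
            rows = [fn(r) for r in rows]
        return ToyDataSet(rows)


def read_toy_dataset(dataset):
    # Build a new lazy stream whose source is the cached result.
    return ToyStream(lambda: dataset.objects)


calls = []


def load_rows():
    calls.append("load")  # record each time the original source is actually read
    return [1, 2, 3]


stream = ToyStream(load_rows).map(lambda x: x * 10)
cached = stream.compute()  # the pipeline executes here, once
plus_one = read_toy_dataset(cached).map(lambda x: x + 1).compute()
plus_two = read_toy_dataset(cached).map(lambda x: x + 2).compute()
```

In this toy model both downstream pipelines start from the cached result, so the original source is read only once. That reuse is the reason to call compute() before branching into several follow-up queries.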
Source code in pyquokka/datastream.py