DataStream.compute
This will trigger the execution of computational graph, but store the result cached across the cluster.
The result will be a Quokka DataSet. You can read a DataSet x
back into a DataStream via qc.read_dataset(x)
.
This is similar to Spark's persist()
method.
Return
Quokka DataSet. This can be thought of as a list of objects cached in memory/disk across the cluster.
Examples:
>>> f = qc.read_csv("my_csv.csv")
>>> result = f.collect()
>>> d = qc.read_dataset(result)
Source code in pyquokka/datastream.py
100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 |
|