QuokkaContext.set_config
Sets a configuration value for the entire cluster. In general, call this at the very start of your program.
The following keys are supported:
- optimize_joins: bool, whether to optimize joins based on cardinality estimates. Defaults to True.
- s3_csv_materialize_threshold: int, the threshold in bytes for when to materialize a CSV file in S3.
- disk_csv_materialize_threshold: int, the threshold in bytes for when to materialize a CSV file on disk.
- s3_parquet_materialize_threshold: int, the threshold in bytes for when to materialize a Parquet file in S3.
- disk_parquet_materialize_threshold: int, the threshold in bytes for when to materialize a Parquet file on disk.
- hbq_path: str, the disk spill directory. Defaults to "/data".
- fault_tolerance: bool, whether to enable fault tolerance. Defaults to False.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str | the key to set | required |
value | any | the value to set | required |
Returns:
Type | Description |
---|---|
None | |
Examples:
>>> from pyquokka.df import *
>>> qc = QuokkaContext()
Turn on join order optimization.
>>> qc.set_config("optimize_joins", True)
Turn off fault tolerance.
>>> qc.set_config("fault_tolerance", False)
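The key list above can also be checked on the client side before calling set_config. The following is a minimal sketch, not part of pyquokka: `SUPPORTED_KEYS` and `check_config` are hypothetical names, and the types are copied from the list of supported keys in this section. Whether set_config itself rejects unknown keys or mistyped values is not stated here, so this helper only guards against typos in your own code.

```python
# Hypothetical helper, not part of pyquokka. It mirrors the keys and types
# documented for QuokkaContext.set_config in this section.
SUPPORTED_KEYS = {
    "optimize_joins": bool,
    "s3_csv_materialize_threshold": int,
    "disk_csv_materialize_threshold": int,
    "s3_parquet_materialize_threshold": int,
    "disk_parquet_materialize_threshold": int,
    "hbq_path": str,
    "fault_tolerance": bool,
}


def check_config(key, value):
    """Return (key, value) if the pair matches the documented keys and types."""
    if key not in SUPPORTED_KEYS:
        raise KeyError(f"unsupported config key: {key!r}")
    expected = SUPPORTED_KEYS[key]
    # bool is a subclass of int in Python, so reject True/False for int keys.
    if expected is int and isinstance(value, bool):
        raise TypeError(f"{key} expects int, got bool")
    if not isinstance(value, expected):
        raise TypeError(
            f"{key} expects {expected.__name__}, got {type(value).__name__}"
        )
    return key, value
```

With a context created as in the examples above, the checked pair can be unpacked straight into the call, for instance `qc.set_config(*check_config("hbq_path", "/data"))`.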
Source code in pyquokka/df.py