QuokkaContext.set_config

Sets a config value for the entire cluster. In general, call this at the very start of your program.

The following keys are supported:

optimize_joins: bool, whether to optimize joins based on cardinality estimates. Defaults to True.
s3_csv_materialize_threshold: int, the threshold in bytes for when to materialize a CSV file in S3
disk_csv_materialize_threshold: int, the threshold in bytes for when to materialize a CSV file on disk
s3_parquet_materialize_threshold: int, the threshold in bytes for when to materialize a Parquet file in S3
disk_parquet_materialize_threshold: int, the threshold in bytes for when to materialize a Parquet file on disk
hbq_path: str, the disk spill directory. Defaults to "/data".
fault_tolerance: bool, whether to enable fault tolerance. Defaults to False.
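Several of these keys are often set together at startup. The following is a minimal sketch, assuming a hypothetical helper named `apply_config` (not part of pyquokka) that loops over a dictionary and calls `set_config` once per key. The threshold value is a placeholder chosen for illustration, not a recommended setting.

```python
def apply_config(qc, settings):
    """Apply each key/value pair through qc.set_config (hypothetical helper)."""
    for key, value in settings.items():
        qc.set_config(key, value)


# Illustrative values only; tune them for your own cluster.
startup_settings = {
    "optimize_joins": True,
    "s3_csv_materialize_threshold": 10 * 1024 * 1024,  # 10 MiB, in bytes
    "hbq_path": "/data",  # the documented default spill directory
    "fault_tolerance": False,
}

# With a real context this would be:
#   from pyquokka.df import QuokkaContext
#   qc = QuokkaContext()
#   apply_config(qc, startup_settings)
```

Keeping the settings in one dictionary makes it easier to see, in one place, everything that differs from the defaults.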

Parameters:

Name	Type	Description	Default
`key`	`str`	the key to set	required
`value`	`any`	the value to set	required

Returns:

Type	Description
	None

Examples:

>>> from pyquokka.df import *
>>> qc = QuokkaContext()

Turn on join order optimization.

>>> qc.set_config("optimize_joins", True)

Turn off fault tolerance.

>>> qc.set_config("fault_tolerance", False)

Source code in pyquokka/df.py

def set_config(self, key, value):
    """
    Sets a config value for the entire cluster. In general, call this at the very start of your program.

    The following keys are supported:

    1. optimize_joins: bool, whether to optimize joins based on cardinality estimates. Defaults to True.

    2. s3_csv_materialize_threshold: int, the threshold in bytes for when to materialize a CSV file in S3

    3. disk_csv_materialize_threshold: int, the threshold in bytes for when to materialize a CSV file on disk

    4. s3_parquet_materialize_threshold: int, the threshold in bytes for when to materialize a Parquet file in S3

    5. disk_parquet_materialize_threshold: int, the threshold in bytes for when to materialize a Parquet file on disk

    6. hbq_path: str, the disk spill directory. Defaults to "/data".

    7. fault_tolerance: bool, whether to enable fault tolerance. Defaults to False.

    Args:
        key (str): the key to set
        value (any): the value to set

    Returns:
        None

    Examples:

        >>> from pyquokka.df import *
        >>> qc = QuokkaContext()

        Turn on join order optimization.

        >>> qc.set_config("optimize_joins", True)

        Turn off fault tolerance. 

        >>> qc.set_config("fault_tolerance", False)

    """

    if key in self.sql_config:
        self.sql_config[key] = value
    elif key in self.exec_config:
        self.exec_config[key] = value
        # Broadcast outside the assert so the update still runs under `python -O`.
        acks = ray.get([task_manager.set_config.remote(key, value) for task_manager in self.task_managers.values()])
        assert all(acks)
    else:
        raise Exception("key not found in config")
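
The dispatch logic above can be exercised without a cluster. The sketch below uses a hypothetical stand-in class, `FakeContext`, that mirrors the key lookup but omits the Ray broadcast to the task managers. Which keys live in `sql_config` versus `exec_config` is an assumption made for illustration; the source shown here does not say.

```python
class FakeContext:
    """Hypothetical stand-in for QuokkaContext; no Ray, no task managers."""

    def __init__(self):
        # Assumed split between the two dictionaries, for illustration only.
        self.sql_config = {"optimize_joins": True}
        self.exec_config = {"hbq_path": "/data", "fault_tolerance": False}

    def set_config(self, key, value):
        if key in self.sql_config:
            self.sql_config[key] = value
        elif key in self.exec_config:
            # The real method also pushes the value to every task manager here.
            self.exec_config[key] = value
        else:
            raise Exception("key not found in config")


qc = FakeContext()
qc.set_config("fault_tolerance", True)

try:
    qc.set_config("optimise_joins", False)  # misspelled key
    unknown_key_rejected = False
except Exception as err:
    unknown_key_rejected = True
    message = str(err)
```

A misspelled or unsupported key therefore raises an exception instead of being silently ignored, so wrapping startup configuration in a `try`/`except` is only useful if the program can continue without that setting.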