The union of two streams is a stream that contains all the elements of both streams.
The two streams must have the same schema.
Note: since DataStreams are not ordered, you should not make any assumptions about the ordering of the rows in the result.
Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| other | DataStream | another DataStream of the same schema. | required |
Return
A new DataStream of the same schema as the two streams, with rows from both.
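To make the semantics concrete, here is a minimal pure-Python sketch of what a union produces. It does not use Quokka: `union_rows` is a hypothetical stand-in that works on in-memory lists of tuples, and the column names and row values are made up for illustration. It assumes that rows appearing in both inputs are kept twice, which matches the description "all the elements of both streams" but is an inference rather than a documented guarantee.

```python
from collections import Counter


def union_rows(left_schema, left_rows, right_schema, right_rows):
    """Hypothetical stand-in for DataStream.union over in-memory rows."""
    # Mirror the schema requirement: both inputs must have the same columns.
    if left_schema != right_schema:
        raise ValueError("both inputs must have the same schema")
    # Keep every row from both inputs; duplicates are not removed (assumption).
    return left_schema, list(left_rows) + list(right_rows)


# Illustrative data only; these columns and values are not from the docs.
schema = ["l_orderkey", "l_quantity"]
a = [(1, 17), (2, 36)]
b = [(2, 36), (3, 8)]

out_schema, out_rows = union_rows(schema, a, schema, b)

# Row order is not guaranteed in a real DataStream, so compare results
# as multisets of rows rather than as ordered lists.
row_counts = Counter(out_rows)
```

In a test against real output, comparing `Counter` objects (or sorting both sides first) avoids depending on row order, which the note above says should not be relied on.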
def union(self, other):
    """
    The union of two streams is a stream that contains all the elements of both streams.
    The two streams must have the same schema.
    Note: since DataStreams are not ordered, you should not make any assumptions on the ordering of the rows in the result.

    Args:
        other (DataStream): another DataStream of the same schema.

    Return:
        A new DataStream of the same schema as the two streams, with rows from both.

    Examples:

        >>> d = qc.read_csv("lineitem.csv")
        >>> d1 = qc.read_csv("lineitem1.csv")
        >>> d1 = d.union(d1)
    """

    assert self.schema == other.schema
    assert self.sorted == other.sorted

    class UnionExecutor(Executor):
        def __init__(self) -> None:
            self.state = None

        def execute(self, batches, stream_id, executor_id):
            return pa.concat_tables(batches)

        def done(self, executor_id):
            return

    executor = UnionExecutor()

    node = StatefulNode(
        schema=self.schema,
        # cannot push through any predicates or projections!
        schema_mapping={col: {0: col, 1: col} for col in self.schema},
        required_columns={0: set(), 1: set()},
        operator=executor
    )

    return self.quokka_context.new_stream(
        sources={0: self, 1: other},
        partitioners={0: PassThroughPartitioner(), 1: PassThroughPartitioner()},
        node=node,
        schema=self.schema,
        sorted=None
    )