Class SparkDataSet
- java.lang.Object
-
- it.bancaditalia.oss.vtl.impl.types.dataset.AbstractDataSet
-
- it.bancaditalia.oss.vtl.impl.environment.spark.SparkDataSet
-
- All Implemented Interfaces:
DataSet
,VTLValue
,Serializable
,Iterable<DataPoint>
public class SparkDataSet extends AbstractDataSet
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors Constructor Description SparkDataSet(org.apache.spark.sql.SparkSession session, DataPointEncoder encoder, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
SparkDataSet(org.apache.spark.sql.SparkSession session, DataSetMetadata dataStructure, DataSet toWrap)
SparkDataSet(org.apache.spark.sql.SparkSession session, DataSetMetadata dataStructure, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description <TT> DataSet
aggr(DataSetMetadata structure, Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,?,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,DataPoint> finisher)
Perform a reduction over a dataset, producing a result for each group defined common values of the specified identifiers<TT> DataSet
analytic(Map<DataStructureComponent<ComponentRole.Measure,?,?>,DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)
DataSet
filter(SerPredicate<DataPoint> predicate)
DataSet
filteredMappedJoin(DataSetMetadata metadata, DataSet other, SerBiPredicate<DataPoint,DataPoint> predicate, SerBinaryOperator<DataPoint> mergeOp)
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.DataSet
getMatching(Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> keyValues)
Create a new DataSet by filtering this DataSet'sDataPoint
s matching the specified values for some identifiers.boolean
isCacheable()
DataSet
mapKeepingKeys(DataSetMetadata metadata, SerFunction<? super DataPoint,? extends Lineage> lineageOperator, SerFunction<? super DataPoint,? extends Map<? extends DataStructureComponent<?,?,?>,? extends ScalarValue<?,?,?,?>>> operator)
DataSet
mappedJoin(DataSetMetadata metadata, DataSet indexed, SerBinaryOperator<DataPoint> merge)
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.DataSet
membership(String alias, Lineage lineage)
Creates a new dataset retaining the specified component along with all identifiers of this datasetlong
size()
NOTE: The default implementation traverses this DataSet entirely.<A,T,TT>
Stream<T>streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter, SerCollector<DataPoint,A,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, and performs a mutable reduction over each of a chosen subset of the groups, and applying a final transformation.protected Stream<DataPoint>
streamDataPoints()
-
Methods inherited from class it.bancaditalia.oss.vtl.impl.types.dataset.AbstractDataSet
filteredMappedJoinWithIndex, getComponent, getMetadata, stream, toString
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface it.bancaditalia.oss.vtl.model.data.DataSet
analytic, analytic, contains, getComponent, getComponent, getComponent, getComponents, getComponents, isIndexed, iterator, notContains, streamByKeys, streamByKeys, streamByKeys
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
-
-
-
Constructor Detail
-
SparkDataSet
public SparkDataSet(org.apache.spark.sql.SparkSession session, DataPointEncoder encoder, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
-
SparkDataSet
public SparkDataSet(org.apache.spark.sql.SparkSession session, DataSetMetadata dataStructure, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
-
SparkDataSet
public SparkDataSet(org.apache.spark.sql.SparkSession session, DataSetMetadata dataStructure, DataSet toWrap)
-
-
Method Detail
-
streamDataPoints
protected Stream<DataPoint> streamDataPoints()
- Specified by:
streamDataPoints
in classAbstractDataSet
-
membership
public DataSet membership(String alias, Lineage lineage)
Description copied from interface:DataSet
Creates a new dataset retaining the specified component along with all identifiers of this dataset- Specified by:
membership
in interfaceDataSet
- Overrides:
membership
in classAbstractDataSet
- Parameters:
alias
- The component to retain.lineage
- the lineage of the membership operator- Returns:
- The projected dataset
-
filter
public DataSet filter(SerPredicate<DataPoint> predicate)
Description copied from interface:DataSet
- Specified by:
filter
in interfaceDataSet
- Overrides:
filter
in classAbstractDataSet
- Parameters:
predicate
- ThePredicate
to be applied.- Returns:
- A new filtered DataSet.
-
getMatching
public DataSet getMatching(Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> keyValues)
Description copied from interface:DataSet
Create a new DataSet by filtering this DataSet'sDataPoint
s matching the specified values for some identifiers.- Parameters:
keyValues
- AMap
containing values for some of this DataSetComponentRole.Identifier
s. If the map is empty, the result is thisDataSet
.- Returns:
- A new
DataSet
of matchingDataPoint
s, eventually empty.
-
mappedJoin
public DataSet mappedJoin(DataSetMetadata metadata, DataSet indexed, SerBinaryOperator<DataPoint> merge)
Description copied from interface:DataSet
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers. The same asfilteredMappedJoin(metadata, other, (a, b) -> true, merge)
.- Parameters:
metadata
- Thestructure
the new DataSet must conform to.indexed
- another DataSet that will be indexed and joined to each DataPoint of this DataSet.merge
- aBinaryOperator
that merges two selected joined DataPoints together into one.- Returns:
- The new DataSet.
-
size
public long size()
Description copied from interface:DataSet
NOTE: The default implementation traverses this DataSet entirely.- Returns:
- The size of this DataSet.
-
mapKeepingKeys
public DataSet mapKeepingKeys(DataSetMetadata metadata, SerFunction<? super DataPoint,? extends Lineage> lineageOperator, SerFunction<? super DataPoint,? extends Map<? extends DataStructureComponent<?,?,?>,? extends ScalarValue<?,?,?,?>>> operator)
Description copied from interface:DataSet
- Specified by:
mapKeepingKeys
in interfaceDataSet
- Overrides:
mapKeepingKeys
in classAbstractDataSet
- Parameters:
metadata
- Thestructure
the new dataset must conform to.lineageOperator
- TODOoperator
- aFunction
that maps each of this DataSet'sDataPoint
s.- Returns:
- The new transformed DataSet.
-
filteredMappedJoin
public DataSet filteredMappedJoin(DataSetMetadata metadata, DataSet other, SerBiPredicate<DataPoint,DataPoint> predicate, SerBinaryOperator<DataPoint> mergeOp)
Description copied from interface:DataSet
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.- Specified by:
filteredMappedJoin
in interfaceDataSet
- Overrides:
filteredMappedJoin
in classAbstractDataSet
- Parameters:
metadata
- Thestructure
the new DataSet must conform to.other
- another DataSet that will be indexed and joined to each DataPoint of this DataSet.predicate
- aBiPredicate
used to select only a subset of the joinedDataPoint
s.mergeOp
- aBinaryOperator
that merges two selected joined DataPoints together into one.- Returns:
- The new DataSet.
-
analytic
public <TT> DataSet analytic(Map<DataStructureComponent<ComponentRole.Measure,?,?>,DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)
- Specified by:
analytic
in interfaceDataSet
- Overrides:
analytic
in classAbstractDataSet
-
aggr
public <TT> DataSet aggr(DataSetMetadata structure, Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,?,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,DataPoint> finisher)
Description copied from interface:DataSet
Perform a reduction over a dataset, producing a result for each group defined common values of the specified identifiers- Specified by:
aggr
in interfaceDataSet
- Overrides:
aggr
in classAbstractDataSet
- Type Parameters:
TT
- The type of the result of the aggregation- Parameters:
structure
- the metadata of the structure producedkeys
- the identifiers on whose values datapoints should be groupedgroupCollector
- the aggregator that performs the reductionfinisher
- a finisher that may manipulate the result given the group where it belongs- Returns:
- a new dataset where each datapoint is the result of the aggregation of a group.
-
streamByKeys
public <A,T,TT> Stream<T> streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter, SerCollector<DataPoint,A,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)
Description copied from interface:DataSet
Groups all the datapoints of this DataSet having the same values for the specified identifiers, and performs a mutable reduction over each of a chosen subset of the groups, and applying a final transformation.- Specified by:
streamByKeys
in interfaceDataSet
- Overrides:
streamByKeys
in classAbstractDataSet
T
- the type of the result of the computation.- Parameters:
keys
- theComponentRole.Identifier
s used to group the datapointsfilter
- aMap
ofComponentRole.Identifier
's values used to exclude matching groupsgroupCollector
- aCollector
applied to each group to produce the resultfinisher
- aBiFunction
to apply to the group key and result to produce the final result- Returns:
- a
Stream
of<T>
objects containing the result of the computation for each group.
-
isCacheable
public boolean isCacheable()
- Returns:
- true if this DataSet can be cached
-
-