java.lang.Object
- it.bancaditalia.oss.vtl.impl.types.dataset.AbstractDataSet
- - it.bancaditalia.oss.vtl.impl.environment.spark.SparkDataSet

All Implemented Interfaces:

DataSet, VTLValue, Serializable, Iterable<DataPoint>
```
public class SparkDataSet
extends AbstractDataSet
```
See Also:

Serialized Form

Constructor Summary

Constructors
Constructor	Description
`SparkDataSet(org.apache.spark.sql.SparkSession session, DataPointEncoder encoder, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)`
`SparkDataSet(org.apache.spark.sql.SparkSession session, DataSetMetadata dataStructure, DataSet toWrap)`
`SparkDataSet(org.apache.spark.sql.SparkSession session, DataSetMetadata dataStructure, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`<TT> DataSet`	`aggr(DataSetMetadata structure, Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,?,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,DataPoint> finisher)`	Performs a reduction over a dataset, producing a result for each group defined by common values of the specified identifiers.
`<TT> DataSet`	`analytic(Map<DataStructureComponent<ComponentRole.Measure,?,?>,DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)`
`DataSet`	`filter(SerPredicate<DataPoint> predicate)`	Creates a new DataSet by applying a given `Predicate` to each `DataPoint` of this DataSet and keeping those that satisfy it.
`DataSet`	`filteredMappedJoin(DataSetMetadata metadata, DataSet other, SerBiPredicate<DataPoint,DataPoint> predicate, SerBinaryOperator<DataPoint> mergeOp)`	Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.
`DataSet`	`getMatching(Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> keyValues)`	Creates a new DataSet containing only this DataSet's `DataPoint`s that match the specified values for some identifiers.
`boolean`	`isCacheable()`
`DataSet`	`mapKeepingKeys(DataSetMetadata metadata, SerFunction<? super DataPoint,? extends Lineage> lineageOperator, SerFunction<? super DataPoint,? extends Map<? extends DataStructureComponent<?,?,?>,? extends ScalarValue<?,?,?,?>>> operator)`	Creates a new DataSet by transforming each of this DataSet's `DataPoint`s with a given `Function`.
`DataSet`	`mappedJoin(DataSetMetadata metadata, DataSet indexed, SerBinaryOperator<DataPoint> merge)`	Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.
`DataSet`	`membership(String alias, Lineage lineage)`	Creates a new dataset retaining the specified component along with all identifiers of this dataset
`long`	`size()`	NOTE: The default implementation traverses this DataSet entirely.
`<A,T,TT> Stream<T>`	`streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter, SerCollector<DataPoint,A,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)`	Groups all the datapoints of this DataSet having the same values for the specified identifiers, performs a mutable reduction over each of a chosen subset of the groups, and then applies a final transformation.
`protected Stream<DataPoint>`	`streamDataPoints()`

Methods inherited from class it.bancaditalia.oss.vtl.impl.types.dataset.AbstractDataSet
filteredMappedJoinWithIndex, getComponent, getMetadata, stream, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface it.bancaditalia.oss.vtl.model.data.DataSet
analytic, analytic, contains, getComponent, getComponent, getComponent, getComponents, getComponents, isIndexed, iterator, notContains, streamByKeys, streamByKeys, streamByKeys

Methods inherited from interface java.lang.Iterable
forEach, spliterator

- Constructor Detail
  - SparkDataSet
```
public SparkDataSet(org.apache.spark.sql.SparkSession session,
                    DataPointEncoder encoder,
                    org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
```
  - SparkDataSet
```
public SparkDataSet(org.apache.spark.sql.SparkSession session,
                    DataSetMetadata dataStructure,
                    org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> dataFrame)
```
  - SparkDataSet
```
public SparkDataSet(org.apache.spark.sql.SparkSession session,
                    DataSetMetadata dataStructure,
                    DataSet toWrap)
```
- Method Detail
  - streamDataPoints
```
protected Stream<DataPoint> streamDataPoints()
```
    Specified by:
    
    streamDataPoints in class AbstractDataSet
  - membership
```
public DataSet membership(String alias,
                          Lineage lineage)
```
    Description copied from interface: DataSet
    
    Creates a new dataset retaining the specified component along with all identifiers of this dataset
    
    Specified by:
    
    membership in interface DataSet
    
    Overrides:
    
    membership in class AbstractDataSet
    
    Parameters:
    
    alias - The component to retain.
    
    lineage - the lineage of the membership operator
    
    Returns:
    
    The projected dataset
  - filter
```
public DataSet filter(SerPredicate<DataPoint> predicate)
```
    Description copied from interface: DataSet
    
    Creates a new DataSet by applying a given Predicate to each DataPoint of this DataSet and keeping those that satisfy it.
    
    Specified by:
    
    filter in interface DataSet
    
    Overrides:
    
    filter in class AbstractDataSet
    
    Parameters:
    
    predicate - The Predicate to be applied.
    
    Returns:
    
    A new filtered DataSet.
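    
    The filtering contract described above can be illustrated without Spark. The following is a minimal sketch, not the library's implementation: it models each DataPoint as a plain `Map<String, Object>`, and `SerializablePredicate` is a hypothetical stand-in for `SerPredicate`, on the assumption that the latter is a `Predicate` that is also `Serializable` so it can be shipped to remote executors. The class `FilterSketch` and the component names `country` and `value` are invented for illustration.

```java
import java.io.Serializable;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for SerPredicate: a Predicate that can also be serialized.
interface SerializablePredicate<T> extends Predicate<T>, Serializable {}

class FilterSketch {
    // Builds a toy data point; the component names are illustrative only.
    static Map<String, Object> point(String country, int value) {
        return Map.of("country", country, "value", value);
    }

    // Returns a new list holding only the accepted data points; the input is not modified.
    static List<Map<String, Object>> filter(List<Map<String, Object>> dataPoints,
                                            SerializablePredicate<Map<String, Object>> predicate) {
        return dataPoints.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> data = List.of(point("IT", 10), point("FR", 25));
        List<Map<String, Object>> kept = filter(data, dp -> (Integer) dp.get("value") > 20);
        System.out.println(kept.size() + " of " + data.size() + " data points kept");
    }
}
```

    As in the documented method, the result is a new collection and the source is left unchanged.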
  - getMatching
```
public DataSet getMatching(Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> keyValues)
```
    Description copied from interface: DataSet
    
    Creates a new DataSet containing only this DataSet's DataPoints that match the specified values for some identifiers.
    
    Parameters:
    
    keyValues - A Map containing values for some of this DataSet's ComponentRole.Identifiers. If the map is empty, the result is this DataSet.
    
    Returns:
    
    A new DataSet of matching DataPoints, possibly empty.
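    
    The matching rule, including the empty-map case, might be sketched as follows. This is an illustration under simplifying assumptions rather than the Spark-backed implementation: DataPoints are modeled as `Map<String, Object>`, identifiers as plain `String` names, and `GetMatchingSketch` is a hypothetical class.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GetMatchingSketch {
    // Keeps the data points whose identifiers carry all the requested values.
    static List<Map<String, Object>> getMatching(List<Map<String, Object>> dataPoints,
                                                 Map<String, Object> keyValues) {
        if (keyValues.isEmpty())
            return dataPoints; // an empty map selects the whole dataset
        return dataPoints.stream()
            .filter(dp -> keyValues.entrySet().stream()
                .allMatch(e -> e.getValue().equals(dp.get(e.getKey()))))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> data = List.of(
            Map.of("country", "IT", "year", 2020),
            Map.of("country", "IT", "year", 2021),
            Map.of("country", "FR", "year", 2020));
        System.out.println(getMatching(data, Map.of("country", "IT")).size());
    }
}
```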
  - mappedJoin
```
public DataSet mappedJoin(DataSetMetadata metadata,
                          DataSet indexed,
                          SerBinaryOperator<DataPoint> merge)
```
    Description copied from interface: DataSet
    
    Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers. The same as filteredMappedJoin(metadata, other, (a, b) -> true, merge).
    
    Parameters:
    
    metadata - The structure the new DataSet must conform to.
    
    indexed - another DataSet that will be indexed and joined to each DataPoint of this DataSet.
    
    merge - a BinaryOperator that merges two selected joined DataPoints together into one.
    
    Returns:
    
    The new DataSet.
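    
    The relationship between `mappedJoin` and `filteredMappedJoin` stated above can be sketched in plain Java. The code below is an assumption-laden illustration, not the Spark implementation: DataPoints are `Map<String, Object>`, the common identifiers are passed explicitly as `commonIds` (the real methods take a `DataSetMetadata` instead), and `JoinSketch`, `key` and `union` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

class JoinSketch {
    // Extracts the values of the common identifiers, used as the join key.
    static Map<String, Object> key(Map<String, Object> dp, Set<String> ids) {
        Map<String, Object> k = new HashMap<>();
        for (String id : ids)
            k.put(id, dp.get(id));
        return k;
    }

    // Indexes `other` by join key, then merges each pair accepted by the predicate.
    static List<Map<String, Object>> filteredMappedJoin(List<Map<String, Object>> left,
            List<Map<String, Object>> other, Set<String> commonIds,
            BiPredicate<Map<String, Object>, Map<String, Object>> predicate,
            BinaryOperator<Map<String, Object>> merge) {
        Map<Map<String, Object>, List<Map<String, Object>>> index = other.stream()
            .collect(Collectors.groupingBy(dp -> key(dp, commonIds)));
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> l : left)
            for (Map<String, Object> r : index.getOrDefault(key(l, commonIds), List.of()))
                if (predicate.test(l, r))
                    result.add(merge.apply(l, r));
        return result;
    }

    // mappedJoin is the filtered join with a predicate that accepts every pair.
    static List<Map<String, Object>> mappedJoin(List<Map<String, Object>> left,
            List<Map<String, Object>> indexed, Set<String> commonIds,
            BinaryOperator<Map<String, Object>> merge) {
        return filteredMappedJoin(left, indexed, commonIds, (a, b) -> true, merge);
    }

    // A simple merge operator: the union of the components of both data points.
    static Map<String, Object> union(Map<String, Object> a, Map<String, Object> b) {
        Map<String, Object> merged = new HashMap<>(a);
        merged.putAll(b);
        return merged;
    }
}
```

    Calling `JoinSketch.mappedJoin(left, right, Set.of("id"), JoinSketch::union)` in this sketch yields one merged data point for every pair that agrees on `id`.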
  - size
```
public long size()
```
    Description copied from interface: DataSet
    
    NOTE: The default implementation traverses this DataSet entirely.
    
    Returns:
    
    The size of this DataSet.
  - mapKeepingKeys
```
public DataSet mapKeepingKeys(DataSetMetadata metadata,
                              SerFunction<? super DataPoint,? extends Lineage> lineageOperator,
                              SerFunction<? super DataPoint,? extends Map<? extends DataStructureComponent<?,?,?>,? extends ScalarValue<?,?,?,?>>> operator)
```
    Description copied from interface: DataSet
    
    Creates a new DataSet by transforming each of this DataSet's DataPoints with a given Function.
    
    Specified by:
    
    mapKeepingKeys in interface DataSet
    
    Overrides:
    
    mapKeepingKeys in class AbstractDataSet
    
    Parameters:
    
    metadata - The structure the new dataset must conform to.
    
    lineageOperator - TODO
    
    operator - a Function that maps each of this DataSet's DataPoints.
    
    Returns:
    
    The new transformed DataSet.
  - filteredMappedJoin
```
public DataSet filteredMappedJoin(DataSetMetadata metadata,
                                  DataSet other,
                                  SerBiPredicate<DataPoint,DataPoint> predicate,
                                  SerBinaryOperator<DataPoint> mergeOp)
```
    Description copied from interface: DataSet
    
    Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.
    
    Specified by:
    
    filteredMappedJoin in interface DataSet
    
    Overrides:
    
    filteredMappedJoin in class AbstractDataSet
    
    Parameters:
    
    metadata - The structure the new DataSet must conform to.
    
    other - another DataSet that will be indexed and joined to each DataPoint of this DataSet.
    
    predicate - a BiPredicate used to select only a subset of the joined DataPoints.
    
    mergeOp - a BinaryOperator that merges two selected joined DataPoints together into one.
    
    Returns:
    
    The new DataSet.
  - analytic
```
public <TT> DataSet analytic(Map<DataStructureComponent<ComponentRole.Measure,?,?>,DataStructureComponent<ComponentRole.Measure,?,?>> components,
                             WindowClause clause,
                             Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors,
                             Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)
```
    Specified by:
    
    analytic in interface DataSet
    
    Overrides:
    
    analytic in class AbstractDataSet
  - aggr
```
public <TT> DataSet aggr(DataSetMetadata structure,
                         Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys,
                         SerCollector<DataPoint,?,TT> groupCollector,
                         SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,DataPoint> finisher)
```
    Description copied from interface: DataSet
    
    Performs a reduction over a dataset, producing a result for each group defined by common values of the specified identifiers.
    
    Specified by:
    
    aggr in interface DataSet
    
    Overrides:
    
    aggr in class AbstractDataSet
    
    Type Parameters:
    
    TT - The type of the result of the aggregation
    
    Parameters:
    
    structure - the metadata of the structure produced
    
    keys - the identifiers on whose values datapoints should be grouped
    
    groupCollector - the aggregator that performs the reduction
    
    finisher - a finisher that may manipulate the result given the group where it belongs
    
    Returns:
    
    a new dataset where each datapoint is the result of the aggregation of a group.
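    
    The grouping, reduction and finishing steps listed above resemble `Collectors.groupingBy` followed by a per-group transformation. The sketch below illustrates that reading under stated assumptions and is not the library's code: DataPoints are `Map<String, Object>`, `java.util.stream.Collector` and `BiFunction` stand in for `SerCollector` and `SerBiFunction`, the `structure` argument is omitted, and `AggrSketch` and `keyOf` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class AggrSketch {
    // Extracts the values of the grouping identifiers from a data point.
    static Map<String, Object> keyOf(Map<String, Object> dp, Set<String> keys) {
        Map<String, Object> k = new HashMap<>();
        for (String id : keys)
            k.put(id, dp.get(id));
        return k;
    }

    // Groups by the key identifiers, reduces each group, then lets the finisher
    // build one output data point from the reduced value and the group key.
    static <TT> List<Map<String, Object>> aggr(List<Map<String, Object>> dataPoints,
            Set<String> keys,
            Collector<Map<String, Object>, ?, TT> groupCollector,
            BiFunction<TT, Map<String, Object>, Map<String, Object>> finisher) {
        Map<Map<String, Object>, TT> reduced = dataPoints.stream()
            .collect(Collectors.groupingBy(dp -> keyOf(dp, keys), groupCollector));
        List<Map<String, Object>> result = new ArrayList<>();
        reduced.forEach((groupKey, value) -> result.add(finisher.apply(value, groupKey)));
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> data = List.of(
            Map.of("country", "IT", "value", 10),
            Map.of("country", "IT", "value", 20),
            Map.of("country", "FR", "value", 5));
        Collector<Map<String, Object>, ?, Integer> sum =
            Collectors.summingInt(dp -> (Integer) dp.get("value"));
        BiFunction<Integer, Map<String, Object>, Map<String, Object>> finisher = (total, groupKey) -> {
            Map<String, Object> out = new HashMap<>(groupKey);
            out.put("total", total);
            return out;
        };
        // One data point per country; the order of the groups is unspecified.
        System.out.println(aggr(data, Set.of("country"), sum, finisher));
    }
}
```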
  - streamByKeys
```
public <A,T,TT> Stream<T> streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys,
                                       Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter,
                                       SerCollector<DataPoint,A,TT> groupCollector,
                                       SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)
```
    Description copied from interface: DataSet
    
    Groups all the datapoints of this DataSet having the same values for the specified identifiers, performs a mutable reduction over each of a chosen subset of the groups, and then applies a final transformation.
    
    Specified by:
    
    streamByKeys in interface DataSet
    
    Overrides:
    
    streamByKeys in class AbstractDataSet
    
    Type Parameters:
    
    T - the type of the result of the computation.
    
    Parameters:
    
    keys - the ComponentRole.Identifiers used to group the datapoints
    
    filter - a Map of ComponentRole.Identifier's values used to exclude matching groups
    
    groupCollector - a Collector applied to each group to produce the result
    
    finisher - a BiFunction to apply to the group key and result to produce the final result
    
    Returns:
    
    a Stream of <T> objects containing the result of the computation for each group.
  - isCacheable
```
public boolean isCacheable()
```
    Returns:
    
    true if this DataSet can be cached
