Interface DataSet
-
- All Superinterfaces:
Iterable<DataPoint>, Serializable, VTLValue
- All Known Implementing Classes:
AbstractDataSet, CachedDataSet, ColumnarDataSet, LightDataSet, LightF2DataSet, LightFDataSet, NamedDataSet, SparkDataSet
public interface DataSet extends VTLValue, Iterable<DataPoint>
The base interface describing a dataset.
- Author:
- Valentino Pinna
-
-
Method Summary
Modifier and Type, Method, Description
<TT> DataSet
aggr(DataSetMetadata structure, Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,?,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,DataPoint> finisher)
Performs a reduction over a dataset, producing a result for each group defined by common values of the specified identifiers.
<TT> DataSet
analytic(Map<DataStructureComponent<ComponentRole.Measure,?,?>,DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)
default DataSet
analytic(Set<DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,ScalarValue<?,?,?,?>>> collectors)
default <TT> DataSet
analytic(Set<DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)
default boolean
contains(DataPoint datapoint)
Checks if a DataPoint is contained in this DataSet.
DataSet
filter(SerPredicate<DataPoint> predicate)
DataSet
filteredMappedJoin(DataSetMetadata metadata, DataSet indexed, SerBiPredicate<DataPoint,DataPoint> filter, SerBinaryOperator<DataPoint> merge)
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.
Optional<DataStructureComponent<?,?,?>>
getComponent(String name)
Finds a component with the given name.
default <R extends ComponentRole> Optional<DataStructureComponent<R,?,?>>
getComponent(String name, Class<R> role)
Obtains a component with the given name if it has the specified role.
default <R extends ComponentRole,S extends ValueDomainSubset<S,D>,D extends ValueDomain> Optional<DataStructureComponent<R,S,D>>
getComponent(String name, Class<R> role, S domain)
Obtains a component with the given name if it has the specified role, and checks that it belongs to the specified domain.
default <S extends ValueDomainSubset<S,D>,D extends ValueDomain> Optional<DataStructureComponent<?,S,D>>
getComponent(String name, S domain)
Obtains a component with the given name, and checks that it belongs to the specified domain.
default <R extends ComponentRole> Set<DataStructureComponent<R,?,?>>
getComponents(Class<R> typeOfComponent)
default <R extends ComponentRole,S extends ValueDomainSubset<S,D>,D extends ValueDomain> Set<DataStructureComponent<R,S,D>>
getComponents(Class<R> role, S domain)
default DataSet
getMatching(Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> keyValues)
Creates a new DataSet by keeping only the DataPoints of this DataSet that match the specified values for some identifiers.
DataSetMetadata
getMetadata()
default boolean
isCacheable()
default boolean
isIndexed(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys)
Checks if this DataSet is indexed.
default Iterator<DataPoint>
iterator()
DataSet
mapKeepingKeys(DataSetMetadata metadata, SerFunction<? super DataPoint,? extends Lineage> lineageOperator, SerFunction<? super DataPoint,? extends Map<? extends DataStructureComponent<?,?,?>,? extends ScalarValue<?,?,?,?>>> operator)
default DataSet
mappedJoin(DataSetMetadata metadata, DataSet indexed, SerBinaryOperator<DataPoint> merge)
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.
DataSet
membership(String component, Lineage lineage)
Creates a new dataset retaining the specified component along with all identifiers of this dataset.
default boolean
notContains(DataPoint datapoint)
Checks if a DataPoint is not contained in this DataSet.
default long
size()
NOTE: The default implementation traverses this DataSet entirely.
Stream<DataPoint>
stream()
default <T> Stream<T>
streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,?,T> groupCollector)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, and performs a mutable reduction over each of the groups.
default <A,T,TT> Stream<T>
streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,A,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, performs a mutable reduction over each of the groups, and applies a final transformation.
default <A,T> Stream<T>
streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter, SerCollector<DataPoint,A,T> groupCollector)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, and performs a mutable reduction over each of a chosen subset of the groups.
<A,T,TT> Stream<T>
streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter, SerCollector<DataPoint,A,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, performs a mutable reduction over each of a chosen subset of the groups, and applies a final transformation.
-
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
-
-
-
Method Detail
-
getMetadata
DataSetMetadata getMetadata()
- Specified by:
getMetadata in interface VTLValue
- Returns:
- The structure of this DataSet.
-
membership
DataSet membership(String component, Lineage lineage)
Creates a new dataset retaining the specified component along with all identifiers of this dataset.
- Parameters:
component - The component to retain.
lineage - The lineage of the membership operator.
- Returns:
- The projected dataset.
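The projection described for membership can be pictured with a small stand-in model. The sketch below does not use the library types: a datapoint is assumed to be a plain Map from component name to value, and MembershipSketch, its parameters and the component names are hypothetical. It only illustrates "keep every identifier plus the one requested component"; lineage handling is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in: datapoints are Maps keyed by component name,
// not the DataPoint type of the documented interface.
class MembershipSketch {
    static List<Map<String, Object>> membership(List<Map<String, Object>> points,
            Set<String> identifiers, String component) {
        return points.stream().map(dp -> {
            Map<String, Object> projected = new LinkedHashMap<>();
            // Retain all identifiers of the original datapoint...
            for (String id : identifiers)
                projected.put(id, dp.get(id));
            // ...plus the single requested component.
            projected.put(component, dp.get(component));
            return projected;
        }).collect(Collectors.toList());
    }
}
```

Every other measure or attribute is dropped from the result, while the identifier values are carried over unchanged.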
-
getComponent
Optional<DataStructureComponent<?,?,?>> getComponent(String name)
Finds a component with the given name.
- Parameters:
name - The requested component's name.
- Returns:
- An Optional containing the requested DataStructureComponent if one was found, otherwise an empty Optional.
-
getMatching
default DataSet getMatching(Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> keyValues)
Creates a new DataSet by keeping only the DataPoints of this DataSet that match the specified values for some identifiers.
- Parameters:
keyValues - A Map containing values for some of this DataSet's ComponentRole.Identifiers. If the map is empty, the result is this DataSet.
- Returns:
- A new DataSet of matching DataPoints, possibly empty.
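As an illustration of the matching rule, here is a minimal sketch on stand-in types. Datapoints are assumed to be plain Maps from component name to value; GetMatchingSketch and the sample component names are hypothetical and not part of the documented API.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical stand-in: datapoints are Maps keyed by component name.
class GetMatchingSketch {
    static List<Map<String, Object>> getMatching(List<Map<String, Object>> points,
            Map<String, Object> keyValues) {
        // An empty map selects everything, so the dataset itself is returned.
        if (keyValues.isEmpty())
            return points;
        // Keep only datapoints agreeing with every requested identifier value.
        return points.stream()
            .filter(dp -> keyValues.entrySet().stream()
                .allMatch(e -> Objects.equals(dp.get(e.getKey()), e.getValue())))
            .collect(Collectors.toList());
    }
}
```

A request such as getMatching(points, Map.of("geo", "IT")) keeps only the datapoints whose "geo" identifier equals "IT"; identifiers not mentioned in the map are not constrained.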
-
filter
DataSet filter(SerPredicate<DataPoint> predicate)
- Parameters:
predicate - The Predicate to be applied.
- Returns:
- A new filtered DataSet.
-
mapKeepingKeys
DataSet mapKeepingKeys(DataSetMetadata metadata, SerFunction<? super DataPoint,? extends Lineage> lineageOperator, SerFunction<? super DataPoint,? extends Map<? extends DataStructureComponent<?,?,?>,? extends ScalarValue<?,?,?,?>>> operator)
-
filteredMappedJoin
DataSet filteredMappedJoin(DataSetMetadata metadata, DataSet indexed, SerBiPredicate<DataPoint,DataPoint> filter, SerBinaryOperator<DataPoint> merge)
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers.
- Parameters:
metadata - The structure the new DataSet must conform to.
indexed - another DataSet that will be indexed and joined to each DataPoint of this DataSet.
filter - a BiPredicate used to select only a subset of the joined DataPoints.
merge - a BinaryOperator that merges two selected joined DataPoints together into one.
- Returns:
- The new DataSet.
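The join described above can be sketched as an index lookup followed by a filter and a merge. This is an illustration on stand-in types, not the library's implementation: datapoints are assumed to be plain Maps, the common identifiers are passed explicitly as a list of names, and JoinSketch, key and union are hypothetical helpers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.BinaryOperator;
import java.util.stream.Collectors;

// Hypothetical stand-in: datapoints are Maps keyed by component name.
class JoinSketch {
    // The join key is the list of values of the common identifiers.
    static List<Object> key(Map<String, Object> dp, List<String> commonIds) {
        return commonIds.stream().map(dp::get).collect(Collectors.toList());
    }

    static List<Map<String, Object>> filteredMappedJoin(List<Map<String, Object>> left,
            List<Map<String, Object>> indexed, List<String> commonIds,
            BiPredicate<Map<String, Object>, Map<String, Object>> filter,
            BinaryOperator<Map<String, Object>> merge) {
        // Index the second dataset once by its join key.
        Map<List<Object>, List<Map<String, Object>>> index = indexed.stream()
            .collect(Collectors.groupingBy(dp -> key(dp, commonIds)));
        // Pair each left datapoint with all indexed matches, then filter and merge.
        return left.stream()
            .flatMap(l -> index.getOrDefault(key(l, commonIds), List.of()).stream()
                .filter(r -> filter.test(l, r))
                .map(r -> merge.apply(l, r)))
            .collect(Collectors.toList());
    }

    // A simple merge: all components of both datapoints in one map.
    static Map<String, Object> union(Map<String, Object> a, Map<String, Object> b) {
        Map<String, Object> merged = new HashMap<>(a);
        merged.putAll(b);
        return merged;
    }
}
```

Passing a filter that always returns true, such as (a, b) -> true, keeps every joined pair, which corresponds to the unfiltered mappedJoin variant documented on this page.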
-
isIndexed
default boolean isIndexed(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys)
Checks if this DataSet is indexed.
- Parameters:
keys - A hint to the implementation about the keys over which an index might be requested.
- Returns:
- true if this DataSet is indexed.
-
mappedJoin
default DataSet mappedJoin(DataSetMetadata metadata, DataSet indexed, SerBinaryOperator<DataPoint> merge)
Creates a new DataSet by joining each DataPoint of this DataSet to all indexed DataPoints of another DataSet by matching the common identifiers. The same as filteredMappedJoin(metadata, indexed, (a, b) -> true, merge).
- Parameters:
metadata - The structure the new DataSet must conform to.
indexed - another DataSet that will be indexed and joined to each DataPoint of this DataSet.
merge - a BinaryOperator that merges two selected joined DataPoints together into one.
- Returns:
- The new DataSet.
-
streamByKeys
<A,T,TT> Stream<T> streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter, SerCollector<DataPoint,A,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, performs a mutable reduction over each of a chosen subset of the groups, and applies a final transformation.
- Type Parameters:
T - the type of the result of the computation.
- Parameters:
keys - the ComponentRole.Identifiers used to group the datapoints
filter - a Map of ComponentRole.Identifier values used to exclude matching groups
groupCollector - a Collector applied to each group to produce the result
finisher - a BiFunction to apply to the group key and result to produce the final result
- Returns:
- a Stream of <T> objects containing the result of the computation for each group.
-
streamByKeys
default <A,T,TT> Stream<T> streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,A,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,T> finisher)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, performs a mutable reduction over each of the groups, and applies a final transformation. The same as streamByKeys(Set, Map, SerCollector, SerBiFunction) with an empty filter.
- Type Parameters:
T - the type of the result of the computation.
- Parameters:
keys - the ComponentRole.Identifiers used to group the datapoints
groupCollector - a Collector applied to each group to produce the result
finisher - a BiFunction to apply to the group key and result to produce the final result
- Returns:
- a Stream of <T> objects containing the result of the computation for each group.
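The group, reduce and finish sequence can be pictured with the standard java.util.stream collectors. The sketch below is an assumption-laden illustration: datapoints are plain Maps, the standard Collector and BiFunction stand in for SerCollector and SerBiFunction, and StreamByKeysSketch and keyOf are hypothetical names. No group filter is applied, as in the empty-filter overload.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in: datapoints are Maps keyed by component name.
class StreamByKeysSketch {
    // The group key holds only the values of the grouping identifiers.
    static Map<String, Object> keyOf(Map<String, Object> dp, Set<String> keys) {
        Map<String, Object> key = new HashMap<>();
        for (String k : keys)
            key.put(k, dp.get(k));
        return key;
    }

    static <A, TT, T> Stream<T> streamByKeys(List<Map<String, Object>> points, Set<String> keys,
            Collector<Map<String, Object>, A, TT> groupCollector,
            BiFunction<TT, Map<String, Object>, T> finisher) {
        // 1. Group by key and reduce each group with the collector.
        Map<Map<String, Object>, TT> groups = points.stream()
            .collect(Collectors.groupingBy(dp -> keyOf(dp, keys), groupCollector));
        // 2. Apply the finisher to each (result, group key) pair.
        return groups.entrySet().stream()
            .map(e -> finisher.apply(e.getValue(), e.getKey()));
    }
}
```

For example, combining Collectors.counting() with a finisher such as (n, key) -> key.get("geo") + "=" + n yields one string per group. The order in which groups appear in the resulting stream is not defined in this sketch.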
-
streamByKeys
default <A,T> Stream<T> streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>> filter, SerCollector<DataPoint,A,T> groupCollector)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, and performs a mutable reduction over each of a chosen subset of the groups. The same as streamByKeys(Set, Map, SerCollector, SerBiFunction) with an identity finisher.
- Type Parameters:
T - the type of the result of the computation.
- Parameters:
keys - the ComponentRole.Identifiers used to group the datapoints
filter - a Map of ComponentRole.Identifier values used to exclude matching groups
groupCollector - a Collector applied to each group to produce the result
- Returns:
- a Stream of <T> objects containing the result of the computation for each group.
-
streamByKeys
default <T> Stream<T> streamByKeys(Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,?,T> groupCollector)
Groups all the datapoints of this DataSet having the same values for the specified identifiers, and performs a mutable reduction over each of the groups. The same as streamByKeys(Set, Map, SerCollector, SerBiFunction) with an empty filter and an identity finisher.
- Type Parameters:
T - the type of the result of the computation.
- Parameters:
keys - the ComponentRole.Identifiers used to group the datapoints
groupCollector - a Collector applied to each group to produce the result
- Returns:
- a Stream of <T> objects containing the result of the computation for each group.
-
aggr
<TT> DataSet aggr(DataSetMetadata structure, Set<DataStructureComponent<ComponentRole.Identifier,?,?>> keys, SerCollector<DataPoint,?,TT> groupCollector, SerBiFunction<TT,Map<DataStructureComponent<ComponentRole.Identifier,?,?>,ScalarValue<?,?,?,?>>,DataPoint> finisher)
Performs a reduction over a dataset, producing a result for each group defined by common values of the specified identifiers.
- Type Parameters:
TT - The type of the result of the aggregation
- Parameters:
structure - the metadata of the structure produced
keys - the identifiers on whose values datapoints should be grouped
groupCollector - the aggregator that performs the reduction
finisher - a finisher that may manipulate the result given the group where it belongs
- Returns:
- a new dataset where each datapoint is the result of the aggregation of a group.
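To make the roles of groupCollector and finisher concrete, here is a minimal sketch that sums one measure per group and turns each result back into a datapoint. It is an illustration on stand-in types only: datapoints are assumed to be plain Maps, the standard Collector and BiFunction replace the serializable variants, and AggrSketch, keyOf and sumBy are hypothetical names. The structure argument is omitted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical stand-in: datapoints are Maps keyed by component name.
class AggrSketch {
    static Map<String, Object> keyOf(Map<String, Object> dp, Set<String> keys) {
        Map<String, Object> key = new HashMap<>();
        for (String k : keys)
            key.put(k, dp.get(k));
        return key;
    }

    static <TT> List<Map<String, Object>> aggr(List<Map<String, Object>> points, Set<String> keys,
            Collector<Map<String, Object>, ?, TT> groupCollector,
            BiFunction<TT, Map<String, Object>, Map<String, Object>> finisher) {
        // Reduce each group, then let the finisher build one datapoint per group.
        Map<Map<String, Object>, TT> groups = points.stream()
            .collect(Collectors.groupingBy(dp -> keyOf(dp, keys), groupCollector));
        return groups.entrySet().stream()
            .map(e -> finisher.apply(e.getValue(), e.getKey()))
            .collect(Collectors.toList());
    }

    // Usage: sum an integer measure per group; the finisher adds the sum to the group key.
    static List<Map<String, Object>> sumBy(List<Map<String, Object>> points, Set<String> keys,
            String measure) {
        return aggr(points, keys,
            Collectors.summingInt((Map<String, Object> dp) -> (Integer) dp.get(measure)),
            (sum, key) -> {
                Map<String, Object> result = new HashMap<>(key);
                result.put(measure, sum);
                return result;
            });
    }
}
```

The finisher receives both the reduced value and the identifier values of the group, which is what allows it to emit a complete datapoint rather than a bare number.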
-
analytic
<TT> DataSet analytic(Map<DataStructureComponent<ComponentRole.Measure,?,?>,DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)
-
analytic
default <TT> DataSet analytic(Set<DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,TT>> collectors, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerBiFunction<TT,ScalarValue<?,?,?,?>,ScalarValue<?,?,?,?>>> finishers)
-
analytic
default DataSet analytic(Set<DataStructureComponent<ComponentRole.Measure,?,?>> components, WindowClause clause, Map<DataStructureComponent<ComponentRole.Measure,?,?>,SerCollector<ScalarValue<?,?,?,?>,?,ScalarValue<?,?,?,?>>> collectors)
-
getComponent
default <S extends ValueDomainSubset<S,D>,D extends ValueDomain> Optional<DataStructureComponent<?,S,D>> getComponent(String name, S domain)
Obtains a component with the given name, and checks that it belongs to the specified domain.
- Parameters:
name - The requested component's name.
domain - A non-null instance of a domain.
- Returns:
- An Optional containing the requested component if it exists.
- Throws:
NullPointerException - if domain is null.
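The Optional-based lookups can be illustrated with a small stand-in model. Everything in the sketch is hypothetical (ComponentLookupSketch, Component, Role, and domains modelled as strings), and it assumes that a component whose role or domain does not match is reported as an empty Optional; the documented interface may treat a mismatch differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for a dataset structure; not the library types.
class ComponentLookupSketch {
    enum Role { IDENTIFIER, MEASURE, ATTRIBUTE }

    static final class Component {
        final String name;
        final Role role;
        final String domain;

        Component(String name, Role role, String domain) {
            this.name = name;
            this.role = role;
            this.domain = domain;
        }
    }

    private final Map<String, Component> byName = new HashMap<>();

    ComponentLookupSketch(Component... components) {
        for (Component c : components)
            byName.put(c.name, c);
    }

    // Lookup by name only: empty when no component has that name.
    Optional<Component> getComponent(String name) {
        return Optional.ofNullable(byName.get(name));
    }

    // Lookup by name and role (assumption: a role mismatch yields empty).
    Optional<Component> getComponent(String name, Role role) {
        return getComponent(name).filter(c -> c.role == role);
    }

    // Lookup by name and domain: a null domain is rejected up front.
    Optional<Component> getComponent(String name, String domain) {
        Objects.requireNonNull(domain, "domain");
        return getComponent(name).filter(c -> c.domain.equals(domain));
    }
}
```

Callers then work with the Optional directly, for example with isPresent(), map(...) or orElseThrow(...), instead of testing for null.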
-
getComponent
default <R extends ComponentRole> Optional<DataStructureComponent<R,?,?>> getComponent(String name, Class<R> role)
Obtains a component with the given name if it has the specified role.
- Parameters:
name - The requested component's name.
role - The role of component desired (Measure, Identifier, Attribute).
- Returns:
- An Optional containing the requested component, or an empty Optional if none was found.
- Throws:
NullPointerException - if domain is null.
-
getComponent
default <R extends ComponentRole,S extends ValueDomainSubset<S,D>,D extends ValueDomain> Optional<DataStructureComponent<R,S,D>> getComponent(String name, Class<R> role, S domain)
Obtains a component with the given name if it has the specified role, and checks that it belongs to the specified domain.
- Parameters:
name - The requested component's name.
role - The role of component desired (Measure, Identifier, Attribute).
domain - A non-null instance of a domain.
- Returns:
- An Optional containing the requested component, or an empty Optional if none was found.
- Throws:
NullPointerException - if domain is null.
-
size
default long size()
NOTE: The default implementation traverses this DataSet entirely.
- Returns:
- The size of this DataSet.
-
getComponents
default <R extends ComponentRole> Set<DataStructureComponent<R,?,?>> getComponents(Class<R> typeOfComponent)
- See Also:
DataSetMetadata.getComponents(Class)
-
getComponents
default <R extends ComponentRole,S extends ValueDomainSubset<S,D>,D extends ValueDomain> Set<DataStructureComponent<R,S,D>> getComponents(Class<R> role, S domain)
-
contains
default boolean contains(DataPoint datapoint)
Checks if a DataPoint is contained in this DataSet. NOTE: The default implementation performs a linear search, potentially traversing this DataSet entirely.
-
notContains
default boolean notContains(DataPoint datapoint)
Checks if a DataPoint is not contained in this DataSet. NOTE: The default implementation performs a linear search, potentially traversing this DataSet entirely.
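The notes on size, contains and notContains describe defaults that walk the whole dataset. A minimal sketch of that pattern follows, together with an implementation that overrides the defaults with cheaper answers. MiniDataSet and HashedDataSet are hypothetical stand-ins, not types of the documented library.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical stand-in interface with traversal-based defaults.
interface MiniDataSet<P> {
    Stream<P> stream();

    // Default: visits every datapoint once to count them.
    default long size() {
        long n = 0;
        for (Iterator<P> it = stream().iterator(); it.hasNext(); it.next())
            n++;
        return n;
    }

    // Default: linear search, stopping at the first match.
    default boolean contains(P datapoint) {
        return stream().anyMatch(p -> p.equals(datapoint));
    }

    default boolean notContains(P datapoint) {
        return !contains(datapoint);
    }
}

// An implementation backed by a hash set can answer without a full traversal.
final class HashedDataSet<P> implements MiniDataSet<P> {
    private final Set<P> points;

    HashedDataSet(Collection<P> points) {
        this.points = new HashSet<>(points);
    }

    @Override
    public Stream<P> stream() {
        return points.stream();
    }

    @Override
    public long size() {
        return points.size();
    }

    @Override
    public boolean contains(P datapoint) {
        return points.contains(datapoint);
    }
}
```

Because notContains delegates to contains in this sketch, overriding contains alone is enough to speed up both checks.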
-
isCacheable
default boolean isCacheable()
- Returns:
- true if this DataSet can be cached
-
-