SecondaryIndexIntegrator (sdn-apidoc 2.5.14 API)

java.lang.Object
- com.hp.util.persistence.cassandra.index.SecondaryIndexIntegrator

```
public final class SecondaryIndexIntegrator
extends Object
```
Workaround to support filter combinations in Cassandra similar to the relational model. For example: Assume we have a Person entity with the following attributes: name, lastName, birthdate, gender and status.
In a relational system it would be relatively simple to create queries involving any combination of filters applied to the Person's attributes:
Select * from Person Where name='SomeName' AND lastName='someLastName'
Select * from Person Where birthdate < 'someDate' AND gender='FEMALE' AND status IN ('SINGLE', 'DIVORCED')
In Cassandra this is not possible because one column family has to be created for each expected type of query. Thus, to support any combination of the Person's attributes, at least 32 column families would have to be created (assuming natural order; adding sorting increases the number of column families).
In general the number of column families needed to support any combination of filtering and sorting is given by:
column families = 2^f * s!
where
- f is the number of filter-able attributes
- s is the number of sort-able attributes.
Using SecondaryIndexIntegrator the number of column families is given by:
column families = (f * s!) + w
where
- f is the number of filter-able attributes
- s is the number of sort-able attributes. Note that if not all sorting combinations are needed then this number would be smaller.
- w is the number of optimized well-known queries that involve combination of attributes.
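The two formulas above can be compared with a small calculation. The following is a minimal sketch; the class and method names are hypothetical and not part of this API. For the Person example with five filter-able attributes and natural order (s = 0, so s! = 1), the first formula gives 32 column families, while the second gives 5 plus the number of well-known queries.

```java
import java.math.BigInteger;

/** Illustrative arithmetic only; hypothetical helper, not part of the sdn API. */
final class ColumnFamilyCount {
    private ColumnFamilyCount() {}

    /** Computes n! (with 0! = 1). */
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /** One column family per query type: 2^f * s! */
    static BigInteger perQueryType(int f, int s) {
        return BigInteger.ONE.shiftLeft(f).multiply(factorial(s));
    }

    /** One index per attribute plus w optimized queries: (f * s!) + w */
    static BigInteger withIntegrator(int f, int s, int w) {
        return BigInteger.valueOf(f).multiply(factorial(s)).add(BigInteger.valueOf(w));
    }
}
```

With f = 5, s = 0 and, say, w = 2 well-known queries, the counts are 32 versus 7.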
Some key points about SecondaryIndexIntegrator
- A column family (native secondary index or custom secondary index) should be created for each well-known query to optimize for such query. This should be the default data modeling approach.
- In order to support filter combinations, one column family (secondary index) should be created for each attribute and then use SecondaryIndexIntegrator to combine results.
- Sorting is supported by SecondaryIndexIntegrator. All integrated indexes are forced to have the same column name, thus sorting is part of the integration: It isn't possible to integrate indexes using different sorting criteria.
- SecondaryIndexIntegrator is a workaround and should be used only to support legacy code. If you have to support arbitrary combinations of queries, ask yourself whether Cassandra is the right technology to use. In Cassandra, a query is expected to be satisfied with a single read (reading a single row). SecondaryIndexIntegrator performs multiple reads to combine filters and thus will access multiple nodes. Secondary index combination will be extremely slow compared to an optimized secondary index query.
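To make the idea of combining per-attribute indexes concrete, here is a minimal in-memory sketch of AND and OR combination over the results of several index reads. It is an illustration under assumptions, not the implementation of this class: each index read is modeled as a plain collection of column names (for example row keys), and `IndexCombinationSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

/** Illustration only: a stand-in for combining index reads, not the library's code. */
final class IndexCombinationSketch {
    private IndexCombinationSketch() {}

    /** AND: keeps only the names present in every index result, in natural order. */
    static <C extends Comparable<C>> List<C> intersect(
            Collection<? extends Collection<C>> indexResults) {
        Iterator<? extends Collection<C>> it = indexResults.iterator();
        if (!it.hasNext()) {
            return new ArrayList<>();
        }
        TreeSet<C> common = new TreeSet<>(it.next());
        while (it.hasNext()) {
            common.retainAll(it.next()); // one extra read result per filter
        }
        return new ArrayList<>(common);
    }

    /** OR: collects the names present in at least one index result, in natural order. */
    static <C extends Comparable<C>> List<C> merge(
            Collection<? extends Collection<C>> indexResults) {
        TreeSet<C> all = new TreeSet<>();
        for (Collection<C> result : indexResults) {
            all.addAll(result);
        }
        return new ArrayList<>(all);
    }
}
```

Because every result is ordered by the same comparison, the sketch also shows why indexes with different sorting criteria cannot be combined this way: there would be no single order for the output.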
Data Model Summary
In relational systems, entities and their relationships are modeled first, and then indexes are created to support whatever queries become necessary. In a relational database, data is stored in tables, and the tables comprising an application are typically related to each other. Data is usually normalized to reduce redundant entries, and tables are joined on common keys to satisfy a given query.
Cassandra does not enforce relationships between column families the way relational databases do between tables: there are no formal foreign keys in Cassandra, and joining column families at query time is not supported.
With Cassandra it is essential to think ahead of time about which queries the system needs to support efficiently, and to model accordingly. Since there are no automatically provided indexes, the application will be much closer to one column family per query than it would be with tables and queries in a relational design. This denormalization should not be a concern; Cassandra is much faster at writes than relational systems, without giving up speed on reads.
In Cassandra, denormalization is the norm. A standard and very efficient way of working with the Cassandra data model is to create one column family for each expected type of query. With this approach, data is denormalized and structured so that one or multiple rows in a single column family are used to answer each query.
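The query-first approach can be pictured with an in-memory stand-in. The sketch below is an assumption-laden illustration (no Cassandra client is involved, and `PersonIndexesSketch` and its methods are hypothetical): each map plays the role of one column family whose row key is the filter value and whose columns are sorted person ids, so every write updates all of them and every query is a single lookup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** In-memory stand-in for one column family per query; hypothetical names throughout. */
final class PersonIndexesSketch {
    // Row key = filter value, columns = person ids kept in natural order.
    private final Map<String, SortedSet<String>> byLastName = new HashMap<>();
    private final Map<String, SortedSet<String>> byStatus = new HashMap<>();

    /** Denormalized write: the same person id is stored once per supported query. */
    void save(String personId, String lastName, String status) {
        byLastName.computeIfAbsent(lastName, k -> new TreeSet<>()).add(personId);
        byStatus.computeIfAbsent(status, k -> new TreeSet<>()).add(personId);
    }

    /** Answered by reading a single "row". */
    SortedSet<String> findByLastName(String lastName) {
        return byLastName.getOrDefault(lastName, Collections.emptySortedSet());
    }

    /** Answered by reading a single "row". */
    SortedSet<String> findByStatus(String status) {
        return byStatus.getOrDefault(status, Collections.emptySortedSet());
    }
}
```

The cost of the extra writes and storage is what buys the single-read queries described above.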
Unlike the fully-relational model, where data is normalized for storage in the database and then joined during queries, Cassandra is at its best when there is approximately one column family per expected type of query. This sacrifices disk space (one of the cheapest resources for a server) in order to reduce the number of disk seeks and the amount of network traffic.
The Cassandra data model is a dynamic-schema, column-oriented data model. This means that, unlike in a relational database, there is no need to model all of the columns required by the application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by the application as they are needed without incurring downtime. Planning a data model in Cassandra involves different design considerations than those familiar from relational databases. Ultimately, the data model design depends on the data to capture and how that data is accessed. However, there are some common design considerations for Cassandra data model planning.
Author:

Fabiel Zuniga

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static interface`	`SecondaryIndexIntegrator.SecondaryIndexReader<C extends Serializable & Comparable<C>>` Secondary index reader.

Method Summary

Methods
Modifier and Type	Method and Description
`static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>>`	`intersect(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)` Combines the indexes using `AND` operation.
`static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>>`	`merge(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)` Combines the indexes using `OR` operation.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - intersect
```
public static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>> intersect(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)
```
    Combines the indexes using AND operation.
    Denormalized data is not part of the result because different indexes might use different denormalization.
    
    Type Parameters:
    C - type of the column name in the secondary index column family (row key in the main column family or composite value when sorting information is included). Note how all indexes must have the same column name, thus sorting is part of the integration: It isn't possible to integrate indexes using different sorting criteria.
    Parameters:
    indexes - index readers to combine.
    
    Returns:
    resultant integration of the given indexes.
  - merge
```
public static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>> merge(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)
```
    Combines the indexes using OR operation.
    Denormalized data is not part of the result because different indexes might use different denormalization.
    
    Type Parameters:
    C - type of the column name in the secondary index column family (row key in the main column family or composite value when sorting information is included). Note how all indexes must have the same column name, thus sorting is part of the integration: It isn't possible to integrate indexes using different sorting criteria.
    Parameters:
    indexes - index readers to combine.
    
    Returns:
    resultant integration of the given indexes.
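The bound on `C` in both methods requires a column name type that is both `Serializable` and `Comparable` to itself. As a sketch of what a composite column name carrying sorting information might look like, the hypothetical class below orders by a sort value first and then by the row key of the main column family. Neither `BirthdateThenId` nor `ColumnNameBoundSketch` is part of this API; the helper method only reuses the same generic bound to show that such a type satisfies it.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical composite column name: sort value first, then the main row key. */
final class BirthdateThenId implements Serializable, Comparable<BirthdateThenId> {
    private static final long serialVersionUID = 1L;
    final long birthdateEpochDay;
    final String personId;

    BirthdateThenId(long birthdateEpochDay, String personId) {
        this.birthdateEpochDay = birthdateEpochDay;
        this.personId = Objects.requireNonNull(personId);
    }

    @Override
    public int compareTo(BirthdateThenId other) {
        int byDate = Long.compare(birthdateEpochDay, other.birthdateEpochDay);
        return byDate != 0 ? byDate : personId.compareTo(other.personId);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof BirthdateThenId && compareTo((BirthdateThenId) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(birthdateEpochDay, personId);
    }
}

/** Hypothetical helper that uses the same bound as the C type parameter. */
final class ColumnNameBoundSketch {
    private ColumnNameBoundSketch() {}

    /** Returns a copy of the names sorted by their natural order. */
    static <C extends Serializable & Comparable<C>> List<C> sorted(List<C> names) {
        List<C> copy = new ArrayList<>(names);
        Collections.sort(copy);
        return copy;
    }
}
```

Every index passed to a single call would need to use the same such type, which is why indexes built with different sorting criteria cannot be integrated.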
