public final class SecondaryIndexIntegrator extends Object
Consider a Person entity with the following attributes: name, lastName, birthdate, gender and status.
In a relational system it would be relatively simple to create queries involving any combination of filters applied to the Person's attributes:
Select * from Person Where name='SomeName' AND lastName='someLastName'
Select * from Person Where birthdate < 'someDate' AND gender='FEMALE' AND status IN ('SINGLE', 'DIVORCED')
In Cassandra this is not possible, because a column family has to be created for each expected type of query. Thus, to support any combination of the Person's attributes, at least 32 column families (one per subset of the five attributes, 2^5 = 32) would have to be created when using natural order only; involving sorting increases the number of column families further.
In general, the number of column families needed to support any combination of filtering and sorting is given by:
$$\text{column families} = 2^{f} \cdot s!$$
where f is the number of filter-able attributes and s is the number of sort-able attributes.
Using SecondaryIndexIntegrator, the number of column families is given by:
$$\text{column families} = f \cdot s! + w$$
where f is the number of filter-able attributes, s is the number of sort-able attributes, and w is the number of optimized well-known queries that involve a combination of attributes. Note that if not all sorting combinations are needed, the number of column families is smaller.
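To make the two formulas concrete, the following Java sketch evaluates both for the Person example. The class ColumnFamilyCount and its methods are hypothetical helpers for illustration only, not part of SecondaryIndexIntegrator. It treats "natural order only" as s = 0 sort-able attributes (0! = 1), which reproduces the 32 column families mentioned above.

```java
// Hypothetical helper comparing the two column-family counts.
// Not part of SecondaryIndexIntegrator; for illustration only.
public final class ColumnFamilyCount {

    private ColumnFamilyCount() {}

    /** s! : number of orderings of s sort-able attributes (0! = 1). */
    static long factorial(int s) {
        long result = 1;
        for (int i = 2; i <= s; i++) {
            result *= i;
        }
        return result;
    }

    /** 2^f * s! : one column family per filter combination and sorting order. */
    static long withoutIntegrator(int f, int s) {
        return (1L << f) * factorial(s);
    }

    /** f * s! + w : one column family per filter-able attribute and sorting order, plus w optimized queries. */
    static long withIntegrator(int f, int s, int w) {
        return f * factorial(s) + w;
    }

    public static void main(String[] args) {
        // Person example: 5 filter-able attributes, natural order only (s = 0), no optimized queries.
        System.out.println(withoutIntegrator(5, 0)); // 32
        System.out.println(withIntegrator(5, 0, 0)); // 5
    }
}
```

Raising s multiplies both counts by s!, but the filter term grows as 2^f in the first formula and only as f in the second, so the gap widens quickly as filter-able attributes are added.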
Some key points about SecondaryIndexIntegrator:
- … SecondaryIndexIntegrator to combine results.
- … SecondaryIndexIntegrator. All integrated indexes are forced to have the same column name, thus sorting is part of the integration: it isn't possible to integrate indexes using different sorting criteria.
- SecondaryIndexIntegrator is a workaround and should be used only to support legacy code.
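The requirement that all integrated indexes share the same column name (and therefore the same sort order) can be illustrated with a sketch of an AND combination. This is not the library's implementation: it assumes, hypothetically, that each index read has already been materialized as a List of column names sorted ascending by the same natural ordering, and it returns plain values instead of ColumnName objects. The class name SortedIntersection is also hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: AND-combination of index reads that share one sort order.
// Each list stands in for the column names read from one secondary index row,
// already sorted ascending by the same natural ordering of C, without duplicates.
public final class SortedIntersection {

    private SortedIntersection() {}

    static <C extends Comparable<C>> List<C> intersect(Collection<List<C>> indexes) {
        List<C> result = new ArrayList<>();
        if (indexes.isEmpty()) {
            return result;
        }
        Iterator<List<C>> it = indexes.iterator();
        result.addAll(it.next());
        while (it.hasNext()) {
            result = intersectTwo(result, it.next());
        }
        return result;
    }

    // Two-pointer walk: only correct because both inputs use the same ordering.
    private static <C extends Comparable<C>> List<C> intersectTwo(List<C> a, List<C> b) {
        List<C> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp == 0) {
                out.add(a.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> byName = Arrays.asList("k1", "k3", "k5", "k7");
        List<String> byStatus = Arrays.asList("k2", "k3", "k7", "k9");
        System.out.println(intersect(Arrays.asList(byName, byStatus))); // [k3, k7]
    }
}
```

If one of the lists were sorted by a different criterion (for example by birthdate instead of by row key), the two-pointer walk would silently skip matching entries, which is why indexes with different sorting criteria cannot be integrated.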
If you have to support any combination of queries, then you have to ask yourself whether Cassandra is the right technology to use. In Cassandra a query is expected to be satisfied with a single read (reading a single row). SecondaryIndexIntegrator performs multiple reads to combine filters and thus will access multiple nodes. The performance of a secondary index combination will be extremely low compared to an optimized secondary index query.
In relational systems, entities and their relationships are modeled and then indexes are created to support whatever queries become necessary. In a relational database, data is stored in tables and the tables comprising an application are typically related to each other. Data is usually normalized to reduce redundant entries, and tables are joined on common keys to satisfy a given query.
Cassandra does not enforce relationships between column families the way relational databases do between tables: there are no formal foreign keys in Cassandra, and joining column families at query time is not supported.
With Cassandra it is essential to think ahead of time about which queries the system needs to support efficiently, and to model the data accordingly. Since there are no automatically provided indexes, the application will be much closer to one column family per query than a relational application would be to one table per query. This denormalization should not be a concern: Cassandra is much faster at writes than relational systems, without giving up speed on reads.
In Cassandra, denormalization is the norm. A standard and very efficient way of working with the Cassandra data model is to create one column family for each expected type of query. With this approach, data is denormalized and structured so that one or multiple rows in a single column family are used to answer each query.
Unlike the fully-relational model, where data is normalized for storage in the database and then joined during queries, Cassandra is at its best when there is approximately one column family per expected type of query. This sacrifices disk space (one of the cheapest resources for a server) in order to reduce the number of disk seeks and the amount of network traffic.
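A rough in-memory sketch of the one-column-family-per-query approach follows. Each Map stands in for a column family (row key to sorted columns); the class DenormalizedPersonStore and its methods are hypothetical, and a real application would write to Cassandra instead of local maps. Every save copies the data into each query-specific structure, so each query is answered by reading a single row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of "one column family per query".
// Each map stands in for a column family: row key -> (column name -> value).
public final class DenormalizedPersonStore {

    // Query "people by last name": row key = lastName, column name = person id.
    private final Map<String, TreeMap<String, String>> personsByLastName = new HashMap<>();
    // Query "people by status": row key = status, column name = person id.
    private final Map<String, TreeMap<String, String>> personsByStatus = new HashMap<>();

    /** Every write fans out to each query-specific column family (denormalization). */
    void save(String id, String name, String lastName, String status) {
        String denormalized = name + " " + lastName; // same data copied into every structure
        personsByLastName.computeIfAbsent(lastName, k -> new TreeMap<>()).put(id, denormalized);
        personsByStatus.computeIfAbsent(status, k -> new TreeMap<>()).put(id, denormalized);
    }

    /** Answered by reading a single row; columns come back ordered by person id. */
    List<String> findByLastName(String lastName) {
        return new ArrayList<>(personsByLastName.getOrDefault(lastName, new TreeMap<>()).values());
    }

    List<String> findByStatus(String status) {
        return new ArrayList<>(personsByStatus.getOrDefault(status, new TreeMap<>()).values());
    }

    public static void main(String[] args) {
        DenormalizedPersonStore store = new DenormalizedPersonStore();
        store.save("p1", "Ana", "Diaz", "SINGLE");
        store.save("p2", "Luis", "Diaz", "MARRIED");
        System.out.println(store.findByLastName("Diaz")); // [Ana Diaz, Luis Diaz]
        System.out.println(store.findByStatus("SINGLE")); // [Ana Diaz]
    }
}
```

The trade-off is visible in save: storage and write work grow with the number of supported queries, while each read stays a single lookup.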
The Cassandra data model is a dynamic schema, column-oriented data model. This means that, unlike a relational database, there is no need to model all of the columns required by the application up front, as each row is not required to have the same set of columns. Columns and their metadata can be added by the application as they are needed without incurring downtime to the application. Planning a data model in Cassandra involves different design considerations from those one may be used to in relational databases. Ultimately, the data model design depends on the data to be captured and how that data is accessed. However, there are some common design considerations for Cassandra data model planning.
Modifier and Type | Class and Description |
---|---|
`static interface` | `SecondaryIndexIntegrator.SecondaryIndexReader<C extends Serializable & Comparable<C>>` Secondary index reader. |
Modifier and Type | Method and Description |
---|---|
`static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>>` | `intersect(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)` Combines the indexes using AND operation. |
`static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>>` | `merge(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)` Combines the indexes using OR operation. |
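The OR combination described for merge can be sketched in the same spirit. Again this is a hypothetical stand-in, not the library's code: plain sorted Lists replace SecondaryIndexReader instances, the result holds values rather than ColumnName objects, and the class name SortedUnion is invented for illustration. Because all inputs share one ordering, a linear merge produces a sorted, duplicate-free union, which corresponds to matching either of two status values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch of an OR combination over sorted index reads.
// Lists stand in for index results: column names sorted ascending by the same
// natural ordering of C, without duplicates inside a single list.
public final class SortedUnion {

    private SortedUnion() {}

    static <C extends Comparable<C>> List<C> merge(Collection<List<C>> indexes) {
        List<C> result = new ArrayList<>();
        for (List<C> index : indexes) {
            result = unionTwo(result, index);
        }
        return result;
    }

    // Linear merge that preserves the shared sort order and drops duplicates.
    private static <C extends Comparable<C>> List<C> unionTwo(List<C> a, List<C> b) {
        List<C> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                out.add(a.get(i++));
            } else if (cmp > 0) {
                out.add(b.get(j++));
            } else {
                out.add(a.get(i++)); // present in both: keep one copy
                j++;
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> single = Arrays.asList("k1", "k4", "k6");
        List<String> divorced = Arrays.asList("k2", "k4", "k8");
        System.out.println(merge(Arrays.asList(single, divorced))); // [k1, k2, k4, k6, k8]
    }
}
```

As with the AND case, this only works when every input uses the same ordering; otherwise the merged output would be neither sorted nor reliably deduplicated.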
public static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>> intersect(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)
Combines the indexes using AND operation.
Denormalized data is not part of the result because different indexes might use different denormalization.
Type Parameters:
C - type of the column name in the secondary index column family (row key in the main column family or composite value when sorting information is included). Note that all indexes must have the same column name, thus sorting is part of the integration: it isn't possible to integrate indexes using different sorting criteria.
Parameters:
indexes - index readers to combine.
public static <C extends Serializable & Comparable<C>> List<ColumnName<C,?>> merge(Collection<SecondaryIndexIntegrator.SecondaryIndexReader<C>> indexes)
Combines the indexes using OR operation.
Denormalized data is not part of the result because different indexes might use different denormalization.
Type Parameters:
C - type of the column name in the secondary index column family (row key in the main column family or composite value when sorting information is included). Note that all indexes must have the same column name, thus sorting is part of the integration: it isn't possible to integrate indexes using different sorting criteria.
Parameters:
indexes - index readers to combine.
Copyright © 2015. All Rights Reserved.