This patch ensures that when the schema is dense, regardless of
compact_storage being set, the single regular column is translated
into a compact column.
This fixes an issue where Thrift dynamic column families are
translated to a dense schema with a regular column, instead of a
compact one.
Since a compact column is also a regular column (e.g., for purposes of
querying), no further changes are required.
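The translation described above can be sketched as follows. This is a
minimal illustration only: column_def, column_kind and
translate_dense_regular_column are hypothetical, simplified stand-ins
and not the actual types or functions touched by the patch.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real schema types.
enum class column_kind {
    partition_key, clustering_key, static_column, regular_column, compact_column
};

struct column_def {
    std::string name;
    column_kind kind;

    // A compact column still counts as a regular column, e.g. for querying.
    bool is_regular() const {
        return kind == column_kind::regular_column
            || kind == column_kind::compact_column;
    }
};

// For a dense schema with exactly one regular column, mark that column
// as compact. Anything else is left untouched.
inline void translate_dense_regular_column(std::vector<column_def>& cols, bool is_dense) {
    if (!is_dense) {
        return;
    }
    column_def* only = nullptr;
    for (auto& c : cols) {
        if (c.kind == column_kind::regular_column) {
            if (only) {
                return; // more than one regular column: nothing to translate
            }
            only = &c;
        }
    }
    if (only) {
        only->kind = column_kind::compact_column;
    }
}
```

Because is_regular() accepts both kinds, callers that only ask "is this
a regular column?" keep working after the translation.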
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1470062410-1414-1-git-send-email-duarte@scylladb.com>
(cherry picked from commit 5995aebf39)
Fixes #1535.
This patch adds the is_dynamic() function to thrift_schema, which
tells whether the underlying column family is dynamic or not,
according to thrift rules.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
is_atomic() is called for each cell in mutation applies, compaction
and query. Since the value doesn't change, it can easily be cached, which
saves one indirection and one virtual call.
Results of perf_simple_query -c1 (median, duration 60):
          before       after
  read    54611.49     55396.01   +1.44%
  write   65378.92     68554.25   +4.86%
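The caching idea can be sketched like this. The class names below
(abstract_type, int_type, list_type, column_def) are simplified
placeholders chosen for illustration, not the real definitions; the
point is only that the virtual call happens once, at construction.

```cpp
#include <memory>
#include <utility>

// Placeholder type hierarchy with a virtual is_atomic().
struct abstract_type {
    virtual ~abstract_type() = default;
    virtual bool is_atomic() const = 0;
};

struct int_type : abstract_type {
    bool is_atomic() const override { return true; }
};

struct list_type : abstract_type {
    bool is_atomic() const override { return false; }
};

class column_def {
    std::shared_ptr<const abstract_type> _type;
    bool _is_atomic; // cached once; the type never changes afterwards
public:
    // _type is declared before _is_atomic, so it is initialized first
    // and can be safely dereferenced here.
    explicit column_def(std::shared_ptr<const abstract_type> t)
        : _type(std::move(t))
        , _is_atomic(_type->is_atomic()) {}

    // Hot path: a plain member read, no pointer chase, no virtual call.
    bool is_atomic() const { return _is_atomic; }
};
```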
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1465991045-11140-1-git-send-email-pdziepak@scylladb.com>
The correct format of collection information in comparator is:
o.a.c.db.m.ColumnToCollection(<name1>:<type1>, <name2>:<type2>, ...)
not:
o.a.c.db.m.ColumnToCollection(<name1>:<type1>),
o.a.c.db.m.ColumnToCollection(<name2>:<type2>) ...
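A rough sketch of producing the single-wrapper form follows. The
function name collections_component, passing names and types as plain
strings, and the exact separator spacing are assumptions made for
illustration; the real code may encode names and types differently.

```cpp
#include <string>
#include <utility>
#include <vector>

// Emits one wrapper holding every collection:
//   <wrapper>(<name1>:<type1>,<name2>:<type2>,...)
// rather than one wrapper per collection.
inline std::string collections_component(
        const std::string& wrapper,
        const std::vector<std::pair<std::string, std::string>>& collections) {
    std::string out = wrapper + "(";
    bool first = true;
    for (const auto& c : collections) {
        if (!first) {
            out += ",";
        }
        out += c.first + ":" + c.second;
        first = false;
    }
    return out + ")";
}
```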
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Cassandra disallows adding a column with the same name as a collection
that existed in the past in that table if the types aren't compatible.
To enforce that, Scylla needs to keep track of all collections that ever
existed in the column family.
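A minimal sketch of such bookkeeping is shown below. table_history and
its members are hypothetical names, types are represented as strings,
and "compatible" is approximated by plain equality, which is stricter
than a real compatibility check would be.

```cpp
#include <map>
#include <string>

// Remembers every collection column the table ever had, including
// ones that were dropped later.
struct table_history {
    std::map<std::string, std::string> collections; // name -> type

    void record_collection(const std::string& name, const std::string& type) {
        collections[name] = type;
    }

    // A new column may reuse a past collection's name only if the
    // types are compatible (approximated here by equality).
    bool can_add_column(const std::string& name, const std::string& type) const {
        auto it = collections.find(name);
        return it == collections.end() || it->second == type;
    }
};
```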
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
If the schema_builder is constructed from an existing schema we need to
make sure that the original column ids of regular and static columns are
*not* used since they may become invalid if columns are added or
removed.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
When a column is dropped its name and deletion timestamp are added
to schema::_raw._dropped_columns to prevent data resurrection in case a
column with the same name is added. To reduce the number of lookups in
_dropped_columns, this patch makes each instance of column_definition
cache this information (i.e. the timestamp of the latest removal of a
column with the same name).
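The caching can be pictured with the sketch below. column_def,
make_column, missing_timestamp and is_live are illustrative names, and
the rule "a cell is live only if written strictly after the drop" is an
assumption about how the cached timestamp would be used.

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <string>

using timestamp_type = std::int64_t;

// Sentinel meaning "no column with this name was ever dropped".
constexpr timestamp_type missing_timestamp =
    std::numeric_limits<timestamp_type>::min();

struct column_def {
    std::string name;
    timestamp_type dropped_at = missing_timestamp; // cached from the dropped-columns map

    // Assumed rule: data written at or before the drop must not reappear.
    bool is_live(timestamp_type cell_timestamp) const {
        return cell_timestamp > dropped_at;
    }
};

// One map lookup when the column definition is built; none per cell.
inline column_def make_column(const std::string& name,
                              const std::map<std::string, timestamp_type>& dropped) {
    column_def c{name};
    auto it = dropped.find(name);
    if (it != dropped.end()) {
        c.dropped_at = it->second;
    }
    return c;
}
```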
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Knowing which columns were dropped (and when) is important to prevent
the data from the dropped ones reappearing if a new column is added with
the same name.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
We need to track which schema versions were synced with on the current
node to avoid triggering the sync on every mutation. We need to sync
before mutating to be able to apply the incoming mutation using the
current node's schema, possibly applying irreversible transformations to
it to make it conform.
Right now in some places we use column_id, and in some places
size_t. Solve it by using column_count_type whose meaning is "an
integer sufficiently large for indexing columns". Note that we cannot
use column_id because it has more meaning to it than that.
The version needs to change not only on structural changes but
also on temporal ones. This is needed for nodes to detect if the version they
see was already synchronized with or not even if it has the same
structure as the past versions. We also need to end up with the same
version on all nodes when schema changes are commuted.
For regular mutable schemas version will be calculated from underlying
mutations when schema is announced. For static schemas of system
keyspace it is calculated by hashing scylla version and column id,
because we don't have mutations at the time of building the schema.
For static and regular (row) columns it is very convenient in some
cases to utilize the fact that columns ordered by ids are also ordered
by name. It currently holds, so make schema export this guarantee and
enable consumers to rely on it.
The static schema::row_column_ids_are_ordered_by_name field is about
allowing code external to schema to make it very explicit (via
static_assert) that it relies on this guarantee, and be easily
discoverable in case we would have to relax this.
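As an illustration of how external code might state that dependency, a
sketch follows. schema_like and find_regular_column_id are made-up
stand-ins; only the flag name row_column_ids_are_ordered_by_name comes
from the text above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Stand-in for the real schema class.
struct schema_like {
    static constexpr bool row_column_ids_are_ordered_by_name = true;
    std::vector<std::string> regular_column_names; // index == column id
};

// Binary search by name is only valid while ids are ordered by name,
// so the dependency is spelled out at compile time.
inline int find_regular_column_id(const schema_like& s, const std::string& name) {
    static_assert(schema_like::row_column_ids_are_ordered_by_name,
                  "lookup relies on column ids being ordered by name");
    const auto& names = s.regular_column_names;
    auto it = std::lower_bound(names.begin(), names.end(), name);
    if (it == names.end() || *it != name) {
        return -1; // not found
    }
    return static_cast<int>(it - names.begin());
}
```

If the guarantee were ever relaxed, flipping the flag to false would
turn every such consumer into a compile error instead of a silent bug.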
Instead of accepting a column resolver callable, accept a schema and
column_kind or column_selector. Makes the interface easier to use and
enables us to move implementation to .cc file.
We are currently using ColumnToCollectionType wrongly: we wrap every
collection in its own instance of that string. But that is not how Origin
operates: a single ColumnToCollectionType hosts all collections a schema has.
Funny enough, sstable2json seems to work all right without any comparator - and
that is how it worked before, but when a comparator is present, it expects it to
abide by what Origin expects. That causes us to crash.
Fixes #148
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This is the biggest change from 2.2: for the 2.1 series, the default type is
always stored in the comparator for compound types.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
They should be set. As a result, those columns will have the index "null"
in the schema_columns table.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We will invoke the schema builder from schema_tables.cc, and at that point, the
information about compact storage no longer exists anywhere. If we just call it
like this, it will be the same as calling it with compact_storage::no, which
will trigger a (wrong) recomputation for compact_storage::yes CFs.
The best way to solve that is to make the compact_storage parameter mandatory
every time we create a new table, instead of defaulting to no. This will
ensure that the correct dense and compound calculations are always done when
calling the builder with a parameter, and not done at all when we call it
without a parameter.
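The intended behaviour could look roughly like the sketch below. The
builder and raw_schema types are simplified stand-ins, and the rule used
to derive the dense and compound flags from the clustering column count
is a deliberately simplified assumption, not the real calculation.

```cpp
#include <optional>

enum class compact_storage { no, yes };

// Simplified stand-in for the schema's raw state.
struct raw_schema {
    int clustering_columns = 0;
    bool is_dense = false;
    bool is_compound = true;
};

class builder {
    raw_schema _raw;
    std::optional<compact_storage> _cs; // empty: keep the stored flags
public:
    // Schema loaded from the schema tables: flags are already known.
    explicit builder(raw_schema loaded) : _raw(loaded) {}

    // Newly created table: the caller must say whether it is compact.
    builder(raw_schema fresh, compact_storage cs) : _raw(fresh), _cs(cs) {}

    raw_schema build() const {
        raw_schema r = _raw;
        if (_cs) {
            // Placeholder rule, for illustration only.
            const bool compact = (*_cs == compact_storage::yes);
            r.is_dense = compact && r.clustering_columns > 0;
            r.is_compound = !compact || r.clustering_columns > 1;
        }
        return r;
    }
};
```

With no default value for the parameter, "forgot to pass it" and
"explicitly not compact" can no longer be confused.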
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
If we alter the compound property, we also have to rebuild the schema,
since some aspects of the columns depend on it. Let's just go ahead and
always rebuild the schema.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We will use those properties during initialization - for instance, to calculate
thrift_bits.is_on_all_components. In order to do that, they have to be
available at schema creation, and not through the schema builder.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
"This is my current proposal for Compact Storage tables - plus
the needed infrastructure.
Getting rid of the CellName abstraction allows us to simplify
things by quite a lot: now all we need is to mark whether or
not a table is composite, and provide functions to play the
role of the comparator when dealing with the strings."
This is how Java does it. But in C++, "throw new", although valid, would require
the catcher to catch a pointer to the exception - which isn't really what we
do.
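For reference, the idiomatic C++ form throws by value and catches by
const reference. The function describe() below is a made-up example,
not code from the patch.

```cpp
#include <stdexcept>
#include <string>

// Throw by value, catch by const reference. With "throw new ..." the
// handler would have to catch a pointer and delete it.
inline std::string describe(bool fail) {
    try {
        if (fail) {
            throw std::runtime_error("bad schema"); // no "new"
        }
        return "ok";
    } catch (const std::exception& e) {
        return std::string("caught: ") + e.what();
    }
}
```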
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Origin has another column_kind, that we lack: compact_value. This kind is
used to identify regular columns of dense tables.
Take for instance, the following table:
    CREATE TABLE ks2.compact (
        ks text,
        cl1 text,
        cl2 text,
        PRIMARY KEY (ks, cl1)
    ) WITH COMPACT STORAGE
    cqlsh> select keyspace_name, columnfamily_name, column_name, type from system.schema_columns \
           where keyspace_name='ks2' and columnfamily_name='compact';
     keyspace_name | columnfamily_name | column_name | type
    ---------------+-------------------+-------------+----------------
               ks2 |           compact |         cl1 | clustering_key
               ks2 |           compact |         cl2 |  compact_value
               ks2 |           compact |          ks |  partition_key
We will treat those columns as regular columns for most purposes. Because of
that, we don't need to separate them from the regular columns when we sort
initially, for instance. All we have to do is change its type.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This is how it happens for Origin. Take for instance the following CF:
    CREATE TABLE ks2.noregular_cs2 (
        ks text,
        cl1 text,
        cl2 text,
        PRIMARY KEY (ks, cl1, cl2)
    ) WITH COMPACT STORAGE;
    cqlsh> select keyspace_name, columnfamily_name, column_name from system.schema_columns \
           where keyspace_name='ks2' and columnfamily_name='noregular_cs2';
     keyspace_name | columnfamily_name | column_name
    ---------------+-------------------+-------------
               ks2 |     noregular_cs2 |             <===== added this.
               ks2 |     noregular_cs2 |         cl1
               ks2 |     noregular_cs2 |         cl2
               ks2 |     noregular_cs2 |          ks
In order to achieve that, we need to relax the test in db/legacy_schema_tables.cc.
It will throw in case it finds an empty name.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We are deviating a bit from Origin here: In Origin, we would store a full
comparator class. However, due to the fact that our types are very different,
and as a consequence we will not call a serializer directly on the cell name,
that is not necessary.
The only information that we will need to store is whether or not the table is
compound. Some functions to manipulate it will be presented in the next patch.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
We currently have code to calculate "is_dense" in the create statement handler.
That obviously doesn't work for the system schemas, which are not defined this
way.
Since all of our schemas now have to pass through the schema_builder one way or
another, that is the best place in which to do that calculation.
Note that unfortunately, that does not mean we can just get rid of
set_is_dense() in the schema builder: we still need to set it in some
situations, where for instance, we read that property in schema_columnfamilies,
and then apply to the relevant CF. Those uses are, however, all internal to
legacy_schema_tables.cc.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>