Unfortunately, we encode the "target" of a secondary index in one of
three ways:
1. It can be just a column name
2. It can be a string like keys(colname) - for the new type of
collection indexes introduced in this series.
3. It can be a JSON map ({ ... }). This form is used for local indexes.
The code parsing this target - target_parser::parse() - needs not to
confuse these different formats. Before this patch, if the column name
contains special characters like braces or parentheses (this is allowed
in CQL syntax, via quoting), we can confuse case 1, 2, and 3: A column
named "keys(colname)" will be confused for case 2, and a column named
"{123}" will be confused with case 3.
This problem can break indexing of some specially-crafted column names -
as reproduced by test_secondary_index.py::test_index_quoted_names.
The solution adopted in this patch is that the column name in case 1
should be escaped somehow so it cannot be possibly confused with either
cases 2 and 3. The way we chose is to convert the column name to CQL (with
column_definition::as_cql_name()). In other words, if the column name
contains non-alphanumeric characters, it is wrapped in quotes and also
quotes are doubled, as in CQL. The result of this can't be confused
with case 2 or 3, neither of which may begin with a quote.
This escaping is not the minimal we could have done, but incidentally it
is exactly what Cassandra does as well, so I used it as well.
This change is *mostly* backward compatible: Already-existing indexes will
still have unescaped column names stored for their "target" string,
and the unescaping code will see they are not wrapped in quotes, and
not change them. Backward compatibility will only fail on existing indexes
on columns whose name begin and end in the quote characters - but this
case is extremely unlikely.
This patch illustrates how un-ideal our index "target" encoding is,
but isn't what made it un-ideal. We should not have used three different
formats for the index target - the third representation (JSON) should
have sufficed. However, two two other representations are identical
to Cassandra's, so using them when we can has its compatibility
advantages.
The patch makes test_secondary_index.py::test_index_quoted_names pass.
Fixes#10707.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this patch, trying to create an index on entries(x) where x is
not a map results in an error message:
Cannot create index on index_keys_and_values of column x
The string "index_keys_and_values" is strange - Cassandra prints the
easier to understand string "entries()" - which better corresponds to
what the user actually did.
It turns out that this string "index_keys_and_values" comes from an
elaborate set of variables and functions spanning multiple source files,
used to convert our internal target_type variable into such a string.
But although this code was called "index_option" and sounded very
important, it was actually used just for one thing - error messages!
So in this patch we drop the entire "index_option" abstraction,
replacing it by a static trivial function defined exactly where
it's used (create_index_statement.cc), which prints a target type.
While at it, we print "entries()" instead of "index_keys_and_values" ;-)
After this patch, the
test_secondary_index.py::test_index_collection_wrong_type
finally passes (the previous patch fixed the default table names it
assumes, and this patch fixes the expected error messages), so its
"xfail" tag is removed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Allow CQL like this:
CREATE INDEX idx ON table(some_map);
CREATE INDEX idx ON table(KEYS(some_map));
CREATE INDEX idx ON table(VALUES(some_map));
CREATE INDEX idx ON table(ENTRIES(some_map));
CREATE INDEX idx ON table(some_set);
CREATE INDEX idx ON table(VALUES(some_set));
CREATE INDEX idx ON table(some_list);
CREATE INDEX idx ON table(VALUES(some_list));
This is needed to support creating indexes on collections.
Brings support of cql syntax `INDEX ON table(VALUES(collection))`, even
though there is still no support for indexes over collections.
Previously, index_target::target_type::values was refering to values of
a regular (non-collection) column. Rename it to `regular_values`.
Fixes#8745.
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes#10562
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
The rjson::set() *sounds* like it can set any member of a JSON object
(i.e., map), but that's not true :-( It calls the RapidJson function
AddMember() so it can only add a member to an object which doesn't have
a member with the same name (i.e., key). If it is called with a key
that already has a value, the result may have two values for the same
key, which is ill-formed and can cause bugs like issue #9542.
So in this patch we begin by renaming rjson::set() and its variant to
rjson::add() - to suggest to its user that this function only adds
members, without checking if they already exist.
After this rename, I was left with dozens of calls to the set() functions
that need to changed to either add() - if we're sure that the object
cannot already have a member with the same name - or to replace() if
it might.
The vast majority of the set() calls were starting with an empty item
and adding members with fixed (string constant) names, so these can
be trivially changed to add().
It turns out that *all* other set() calls - except the one fixed in
issue #9542 - can also use add() because there are various "excuses"
why we know the member names will be unique. A typical example is
a map with column-name keys, where we know that the column names
are unique. I added comments in front of such non-obvious uses of
add() which are safe.
Almost all uses of rjson except a handful are in Alternator, so I
verified that all Alternator test cases continue to pass after this
patch.
Fixes#9583
Refs #9542
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104152540.48900-1-nyh@scylladb.com>
Eliminate not used includes and replace some more includes
with forward declarations where appropriate.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
In order to eventually switch to a single JSON library,
most of the libjsoncpp usage is dropped in favor of rjson.
Unfortunately, one usage still remains:
test/utils/test_repl utility heavily depends on the *exact textual*
format of its output JSON files, so replacing a library results
in all tests failing because of differences in formatting.
It is possible to force rjson to print its documents in the exact
matching format, but that's left for later, since the issue is not
critical. It would be nice though if our test suite compared
JSON documents with a real JSON parser, since there are more
differences - e.g. libjsoncpp keeps children of the object
sorted, while rapidjson uses an unordered data structure.
This change should cause no change in semantics, it strives
just to replace all usage of libjsoncpp with rjson.
When (re)creating a local index, the target string needs to be used
to parse out the actual indexed column:
"(base_pk_part1,base_pk_part2,base_pk_part3),actual_indexed_column".
This column is later used to deterine if an index should be applied
to a SELECT statement.
The db/index directory contains just a few lines of code that exists
there for historical reasons. It's confusing that we have both db/index
and index/ directory related to secondary-indexing.
This patch moves what little is still in db/index/ to index/. In the
future we should probably get rid of the "secondary_index" class we had
there, but for now, let's at least not have a whole new directory for it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501101246.21143-1-nyh@scylladb.com>