Commit Graph

197 Commits

Author SHA1 Message Date
Piotr Sarna
5f2eadce09 alternator: wait for schema agreement after table creation
In order to be sure that all nodes acknowledged that a table was
created, the CreateTable request will now only return after
seeing that schema agreement was reached.
Rationale: alternator users check if the table was created by issuing
a DescribeTable request, and assume that the table was correctly
created if it returns nonempty results. However, our current
implementation of DescribeTable returns local results, which is
not enough to judge if all the other nodes acknowledge the new table.
CQL drivers are reported to always wait for schema agreement after
issuing DDL-changing requests, so there should be no harm in waiting
a little longer for alternator's CreateTable as well.

Fixes #6361
Tests: alternator(local)
2020-05-11 21:51:12 +03:00
Piotr Sarna
517f2c0490 alternator: unify error messages for existing tables/keyspaces
Since alternator is based on Scylla, two "already exists" error types
can appear when trying to create a table - that a table itself exists,
or that its keyspace does. That's however an implementation detail,
since alternator does not have a notion of keyspaces at all.
This patch unifies the error message to simply mention that a table
already exists, and comes with a more robust test case.
If the keyspace already exists, table creation will still be attempted.

Fixes #6340
Tests: alternator(local, remote)
2020-05-11 18:30:02 +03:00
Gleb Natapov
0fed86e4c6 lwt: change cas_request::apply signature
Change the way query result is passed from getting a reference to a
result to getting a foreign_ptr<lw_shared_ptr<query::result>>. This will
allow cas_request to keep it without copying.
2020-05-05 12:38:23 +03:00
Piotr Sarna
09e4f3b917 alternator: implement ScanIndexForward
The ScanIndexForward parameter is now fully implemented
and can accept ScanIndexForward=false in order to query
the partitions in reverse clustering order.
Note that reading partition slices in reverse order is less
efficient than forward scans and may put a strain on memory
usage, especially for large partitions, since the whole partition
is currently fetched in order to be reversed.

Fixes #5153
2020-04-28 11:44:46 +03:00
Pavel Solodovnikov
ed7a7554b8 storage_proxy: allow cas() to accept nullptr read_command
This patch allows users of storage_proxy::cas() to supply nullptr
as `query::read_command` which is supposed to skip the procedure
of reading the existing value.

The feature is used in alternator code for Read-Modify-Write
operations: some of them don't require reading previous item
values before updating.

Move `read_nothing_read_command` from alternator code to
storage_proxy layer and fabricate a new no-op command from it when
storage_proxy::cas() is used with nullptr read_command.

This allows to avoid sprinkling if-else branches all over the code
in order to check for null-equality of `cmd`.

We return from storage_proxy::query() very early with an empty
result in case we're given an empty partition_slice (which resides
inside the passed `read_command`) so this approach should be
perfectly fine.

Expand documentation for the `cas()` function to cover new
possible value for `cmd` argument.

Fixes: #6238
Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200428065235.5714-1-pa.solodovnikov@scylladb.com>
2020-04-28 10:44:19 +03:00
Piotr Sarna
e17c237feb alternator: fix integer overflow warning in token generation
When generating tokens for parallel scan, debug mode undefined behavior
sanitizer complained that integer overflow sometimes happens when
multiplying two big values - delta and segment number.
In order to mitigate this warning, the multiplication is now split
into two smaller ones, and the generated machine code remains
identical (verified on gcc and clang via compiler explorer).

Fixes #6280
Tests: unit(dev)
2020-04-26 19:06:07 +03:00
Nadav Har'El
1f75efb556 alternator: use RF=3 even if some nodes are temporarily down
Alternator is supposed to use RF=3 for new tables. Only when the cluster is
smaller than 3 nodes do we use RF=1 (and warn about it) - this is useful for
testing.

However, our implementation incorrectly tested the number of *live* nodes in
the cluster instead of the total number of nodes. As a result, if a 3-node
cluster had one node down, and a new table was created, it was created with
RF=1, and immediately could not be written because when RF=1, any node down
means part of the data is unavailable.

This patch fixes this: The total number of nodes in the cluster - not the
number of live nodes - is consulted. The three-node-cluster-with-a-dead-node
setup above creates the table with RF=3, and it can be written because two
living nodes out of three are enough when RF=3 and we do quorum writes and
reads.

We have a dtest to reproduce this bug (and its fix), and it's also easy to
reproduce manually by starting a 3-node cluster, killing one of the nodes,
and then running "pytests". Before this patch, the tests can create tables
but then fail to write to them. After this patch, the test succeed on the
same cluster with the dead node.

Fixes #6267

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200422182035.15106-2-nyh@scylladb.com>
2020-04-23 08:23:05 +02:00
Piotr Sarna
dbb9574aa2 alternator: allow parallel scan
Parallel scans can be performed by providing Segment and TotalSegments
attributes to Scan request, which can be used to split the work among
many workers.
This test makes the parallel scan test succeed, so the xfail is removed.

Fixes #5059
2020-04-22 11:06:15 +03:00
Piotr Sarna
53bbef1e6c alternator: add a way of accessing system tables from alternator
Scylla's system tables often provide interesting information for
clients. In order to be able to access this information without CQL,
a notion of virtual tables is introduced to alternator.
When a table named .scylla.alternator.KS_NAME.TABLE_NAME is accessed
with read-only operation - Query or Scan, Scylla's internal
KS_NAME.TABLE_NAME table will be queried instead. For instance,
if a user wants to read about system_auth.roles, the Scan request
should target the following table: ".scylla.alternator.system_auth.roles".

Fixes #6122
2020-04-09 09:41:30 +02:00
Piotr Sarna
09d09ddefb alternator: add fetching static columns if they exist
Until now, the list of static column ids was always empty for alternator
tables anyway, so the list wasn't fetched. However, with the virtual
interface of fetching Scylla internal tables, we need to list the ids
of selected static columns explicitly to avoid segfaults - since we
select the whole row, static columns included.
2020-04-09 09:41:30 +02:00
Piotr Sarna
123edfc10c alternator: fix failure on incorrect table name with no indexes
If a table name is not found, it may still exist as a local index,
but the check tried to fetch a local index name regardless if it was
present in the request, which was a nullptr dereference bug.

Fixes #6161
Tests: alternator-test(local, remote)
Message-Id: <428c21e94f6c9e450b1766943677613bd46cbc68.1586347130.git.sarna@scylladb.com>
2020-04-08 15:33:48 +03:00
Piotr Sarna
0a2d7addc0 alternator: use partition tombstone if there's no clustering key
As @tgrabiec helpfully pointed out, creating a row tombstone
for a table which does not have a clustering key in its schema
creates something that looks like an open-ended range tombstone.
That's problematic for KA/LA sstable formats, which are incapable
of writing such tombstones, so a workaround is provided
in order to allow using KA/LA in alternator.

Fixes #6035
2020-04-08 08:08:45 +02:00
Rafael Ávila de Espíndola
c5795e8199 everywhere: Replace engine().cpu_id() with this_shard_id()
This is a bit simpler and might allow removing a few includes of
reactor.hh.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200326194656.74041-1-espindola@scylladb.com>
2020-03-27 11:40:03 +03:00
Rafael Ávila de Espíndola
eca0ac5772 everywhere: Update for deprecated apply functions
Now apply is only for tuples, for varargs use invoke.

This depends on the seastar changes adding invoke.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200324163809.93648-1-espindola@scylladb.com>
2020-03-25 08:49:53 +02:00
Piotr Sarna
781fbe8070 alternator: add service permit to callbacks
As a first step towards introducing admission control, the API
of alternator callbacks is extended with an additional 'permit'
parameter.
2020-03-16 07:44:25 +01:00
Nadav Har'El
77444a38a1 alternator: allow consistent reads on LSI - but not on GSI
Recently, Materialized Views were modified (see issue #4365) so that local
view updates (when both base and view replicas are the same node) are
synchronous. In particular, when the view's partition key is the same as
the base table's, view writes are synchronous: A write now only returns
after CL copies of the view data have been written.

Alternator's LSI have exactly this case (same partition key as the base).
This makes strongly-consistent (CL=LOCAL_QUORUM) reads in Alternator work
correctly, so we update the documentation accordingly to no longer say
that we don't support this DynamoDB feature.

However unlike LSIs, for GSIs strongly-consistent reads are still not
supported, and should not be supported (they are also not supported by
DynamoDB). Such reads should generate an error. So this patch fixes this
too. A GSI test which tested that strongly consistent reads are forbidden,
which used to xfail, now passes so the patch removes the "xfail".

Finally, we can simplify the LSI tests by using consistent reads instead of
eventually-consistent reads with retries. Beyond simplifying the test, it's
also an opportunity to *use* strongly-consistent reads and make sure that
they work (while, as mentioned above, similar reads for GSIs are refused).

Fixes #5007

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200311170446.28611-1-nyh@scylladb.com>
2020-03-12 09:18:00 +01:00
Nadav Har'El
96ca5ac2c8 alternator: use separate smp_service_group for bouncing requests
Until this patch, we used the default_smp_service_group() when bouncing
Alternator requests between shards (which is needed for LWT).

This patch creates a new smp_service_group for this purpose, which is
limited to 5000 concurrent requests (the same limit used for CQL's
bounce_request_smp_service_group). The purpose of this limit is to avoid
many shards admitting a huge number of requests and bouncing all of them
to the same shard who now can't "unadmit" these requests.

Fixes #5664.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200304170825.27226-1-nyh@scylladb.com>
2020-03-05 10:17:51 +01:00
Nadav Har'El
91d9632909 alternator: add rjson::remove_member() convenience function
This patch adds a rjson::remove_member() wrapper to the RemoveMember
method, which takes a std::string_view. But beyond the convenience, this
actually works around a subtle bug in RemoveMember where, if given a
StringRef parameter, ignores its length (see upstream issue
https://github.com/Tencent/rapidjson/issues/1649).

In the one place we used RemoveMember, it forced us to copy the string
because it wasn't null-terminated. The solution proposed here involves
wrapping the string view in a GenericValue - which no longer needs to copy
the string, but still works around the bug.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200303143524.28300-1-nyh@scylladb.com>
2020-03-03 16:35:41 +01:00
Nadav Har'El
0fcb226412 alternator: switch rjson::find() to use std::string_view
Our rjson::find() convenience function used RapidJson's "StringRef" type,
which is almost exactly like std::string_view. If we switch to use
string_view as we do in this patch, a lot of call sites become much simpler.

Moreover, there was an even more important motivation for this patch:
the RapidJson FindMember() function we used in rjson::find() has a bug when
given a StringRef - although a StringRef contains a length, the FindMember()
code ignores it and expects the string to be null-terminated (see:
https://github.com/Tencent/rapidjson/issues/1649). In this patch, we wrap
the pointer and length of a std::string_view in an rjson::value, a code path
which bypasses the FindMember bug, and yet does not require copying the
string.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200303141814.26929-1-nyh@scylladb.com>
2020-03-03 16:35:41 +01:00
Piotr Sarna
6f8c70d54b alternator: fix returning raw JSON errors
A couple of places in executor code leaked raw JSON errors to the user
instead of formulating a proper ValidationException message.
These places are now fixed, and the next patch in this series will
act as a regression checker, since all JSON errors will be returned
as SerializationException, not ValidationException instances.
2020-02-28 07:57:12 +02:00
Piotr Sarna
2402955d45 alternator: move parsing in front of executor
Parsing a request string into JSON happens as a first thing
in every request, so it can be performed before calling
any executor callbacks. The most important thing however,
is that making parsing a separate stage allows certain optimizations,
e.g. running all parsing in a single seastar thread, which allows
adding yields to rjson parsing later.
2020-02-28 07:57:12 +02:00
Nadav Har'El
0ab6c7fcef alternator: stricter checks for user-supplied attribute values
Until now, PutItem or UpdateItem could be used to insert almost any JSON
as an attribute's value - even those that do not match DynamoDB's typed
value specification.

Among other things, the new validation allows us to reject empty sets,
strings or byte arrays - which are (somewhat artificially) forbidden in
DynamoDB.

Also added tests for the empty sets, strings and byte arrays that should
be rejected.

Fixes #5896

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200225150525.4926-1-nyh@scylladb.com>
2020-02-26 08:12:26 +01:00
Nadav Har'El
6339f419ac alternator: removing all elements from a set should delete it
DynamoDB does not support empty sets. Operations which remove elements
from a set attribute should remove the attribute when the last item is
removed - not leave an empty set as it incorrectly does now.

Incidentally, the same patch fixes another bug - deleting elements from
a non-existent set attribute should be allowed (and do nothing), not fail
as it does now.

This patch also includes tests for both bugs.

Fixes #5895

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200225125343.31629-1-nyh@scylladb.com>
2020-02-26 08:12:19 +01:00
Nadav Har'El
e075eff915 alternator: complete implementation of ReturnValues parameter
This patch completes the support for the ReturnValues parameter for
the UpdateItem operation. This parameter has five settings - NONE, ALL_OLD,
ALL_NEW, UPDATED_OLD and UPDATED_NEW. Before this patch we already
supported NONE and ALL_OLD - and this patch completes the support for the
three remaining modes: ALL_NEW, UPDATED_OLD and UPDATED_NEW.

The patch also continues to improve test_returnvalues.py with additional
corner cases discovered during the development. After this patch, only
one xfailing test remains - testing updates to nested document paths,
which we do not yet support (even without the ReturnValues parameter).

After this patch, the support of ReturnValues is complete - for all
operations (UpdateItem, PutItem and DeleteItem) and all of its possible
settings.

Fixes #5053

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-5-nyh@scylladb.com>
2020-02-24 10:40:53 +01:00
Nadav Har'El
1e500a2a34 alternator: rjson: another variant of set_with_string_name() utility
The rjson::set_with_string_name() utility function copies the given
string into the JSON key. The existing implementation required that this
input string be an std::string&, but a std::string_view would be fine too,
and I want to use it in new code to avoid yet another unnecessary copy.

Adding the overloads also exposes a few places where things were
implicitly converted to std::string and now cause an ambiguity - and
clearing up this ambiguity also allowed me to find places where this
conversion was unnecessary.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-4-nyh@scylladb.com>
2020-02-24 10:38:54 +01:00
Nadav Har'El
fa5c2a4f58 alternator: UpdateItem only deleting attribute shouldn't create item
UpdateItem operations usually need to add a row marker:

 * An empty UpdateItem is supposed to create a new empty item (row).
   Such an empty item needs to have a row marker.

 * An UpdateItem to add an attribute x and then later an UpdateItem
   to remove this attribute x should leave an empty item behind.
   This means the first UpdateItem needed to add a row marker, so
   it will be left behind after the second UpdateItem.

So the existing code always added a row marker in UpdateItem.

However, there is one case where we should NOT create the row marker:
When the UpdateItem operation only has attribute deletions, and nothing
else, and it is applied to a key with no pre-existing item, DynamoDB
does not create this item. So neither should we.

This patch includes a new test for this test_update_item_non_existent,
which passes on DynamoDB, failed on Alternator before this patch, and
passes after the patch.

Fixes #5862.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-3-nyh@scylladb.com>
2020-02-24 10:38:10 +01:00
Nadav Har'El
e8cbbba653 alternator: partial implementation of ReturnValues parameter
Before this patch, we only supported the ReturnValues=NONE setting of the
PutItem, UpdateItem and DeleteItem operations.

This patch also adds full support for the ReturnValues=ALL_OLD option
in all three operation. This option directs Alternator to return the full
old (i.e., pre-modification) contents of the item.

We implement this as a RMW (read-modify-write) operation just as we do
other RMW operations - i.e., by default we use LWT, to ensure that we really
return the value of the item directly before the modification, the same
value that would have been used in a conditional expression if there was one.

NOTE: This implementation means one cannot use ReturnValues=ALL_OLD in
forbid_rmw write isolation mode. One may theorize that if we only need the
read-before-write for ReturnValues and not for a conditional expression,
it should have been enough to use a separate read (as we do in unsafe_rmw
isolation mode) before the write. But we don't have this "optimization" yet
and I'm not sure it's a valid optimization at all - see discussion in
a new issue #5851.

This patch completes the ReturnValues support for the PutItem and DeleteItem
operations. However, the third operation, UpdateItem, supports three more
ReturnValues modes: UPDATED_OLD, ALL_NEW and UPDATED_NEW. We do not yet
support those in this patch. If a user tries to use one of these three modes,
an informative error message will be returned. The three tests for these
three unimplemented settings continue to xfail, but the rest of the tests
in test_returnvalues.py (except one test of nested attribute paths) now
pass so their xfail flag is dropped.

Refs #5053

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219135658.7158-1-nyh@scylladb.com>
2020-02-21 08:32:47 +01:00
Nadav Har'El
405115fa5f alternator: cleanup of get_string_attribute() function
The get_string_attribute() function used attribute_value->GetString()
to return an std::string. But this function does not actually return a
std::string - it returns a char*, which gets implicitly converted to
an std::string by looking for the first null character. This lookup is
unnecessary, because rjson already knows the length of the string, and
we can use it.

This patch is just a cleanup and a very small performance improvement -
I do not expect it fixes any bugs or changes anything functional, because
JSON strings anyway cannot contain verbatim nulls.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219101159.26717-1-nyh@scylladb.com>
2020-02-19 11:59:54 +01:00
Avi Kivity
6c7aa18238 Merge "Introduce schema::get_partitioner" from Piotr
"
Introduce schema::get_partitioner and use it instead of dht::global_partitioner.

Fixes #5493

Tests: unit(dev, release, debug)
"

* 'per_table_partitioner_prep' of https://github.com/haaawk/scylla: (35 commits)
  cdc: stop using partitioners
  partitioner_test: stop calling set_global_partitioner
  storage_service: stop calling global_partitioner()
  mutation_writer_test: stop calling global_partitioner()
  schema: reduce number of global_partitioner() calls
  test_services: stop calling global_partitioner()
  sstable_utils: stop calling global_partitioner()
  sstable_resharding_test: stop depending on global partitioner
  sstable_mutation_test: stop calling global_partitioner()
  sstable_data_file_test: stop calling global_partitioner()
  random_schema: stop taking partitioner in constructor
  mutation_reader_test: stop calling global_partitioner()
  multishard_mutation_query_test: stop calling global_partitioner()
  row_level repair: stop calling global_partitioner()
  distribute_reader_and_consume_on_shards: don't take partitioner
  thrift: reduce global_partitioner() calls
  binary_search: stop calling global_partitioner()
  index_entry: stop calling global_partitioner()
  mc writer: stop calling global_partitioner()
  sstable: stop calling global_partitioner()
  ...
2020-02-17 18:12:53 +02:00
Piotr Jastrzebski
2d7532f87f dht: add dht::get_token
and replace all calls to dht::global_partitioner().get_token

dht::get_token is better because it takes schema and uses it
to obtain partitioner instead of using a global partitioner.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:15 +01:00
Piotr Jastrzebski
ca4a89d239 dht: add dht::decorate_key
and replace all dht::global_partitioner().decorate_key
with dht::decorate_key

It is an improvement because dht::decorate_key takes schema
and uses it to obtain partitioner instead of using global
partitioner as it was before.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:59:06 +01:00
Piotr Jastrzebski
abd76e566f dht::shard_of: stop calling global_partitioner()
Take const schema& as a parameter of shard_of and
use it to obtain partitioner instead of calling
global_partitioner().

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-02-17 10:23:16 +01:00
Pavel Solodovnikov
d64fd52ae5 paging_state: switch from shared_ptr to lw_shared_ptr
Change the way `service::pager::paging_state` is passed around
from `shared_ptr` to `lw_shared_ptr`. It's safe since
`paging_state` is final.

Tests: unit(dev, debug)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2020-02-16 17:23:36 +03:00
Nadav Har'El
b01b11c1f3 alternator: implement KeyConditionExpression
This patch adds to Alternator's Query operation full support for the
KeyConditionExpression parameter - a newer syntax for specifying which
partition and which sort-key range are to be queried. The older syntax
for the same thing, "KeyConditions", was already supported by Alternator.

The patch also includes additional test cases for more corner cases
discovered during the development. After this patch, all 47 test cases
in test_key_condition_expression.py pass on Alternator (and, of course,
also on DynamoDB).

One interesting thing to note about this patch is that it does *not*
include a new parser for the KeyConditionExpression syntax. It turns out
that we need - to be fully compatible with DynamoDB - to use the
already existing parser for *ConditionExpression* syntax, and then forbid
certain things not allowed in KeyConditionExpression (you can see a lot
of examples in code comments and in the tests included in this patch).
Most importantly, allowing the full ConditionExpression syntax also
means we allow completely useless parentheses on key conditions, e.g.,
'((p=:p) AND (c=:c))'. While the KeyConditionExpression documentation
doesn't mention allowing these parentheses, DynamoDB does support them -
and it turns out that boto3 uses them when you use its condition builders,
as we do in one test case (test_query_key_condition_expression).

Fixes #5037.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200213192509.32685-4-nyh@scylladb.com>
2020-02-16 11:22:30 +02:00
Nadav Har'El
15515b2cc1 alternator: more useful get_key_from_typed_value() utility function
We had a get_key_from_typed_value() utility function to decode a
JSON-encoded value with a known type (the JSON encoding is a map whose
key is the type, the value always a string because all possible key types -
string, bytes and number, are encoded as strings).

However, the function was less useful than it could have been - it was
missing one check for a malformed object (a check which only appeared in
one of its callers), it unnecessarily received the column's expected type
(all the callers passed it the given key column's type).

The cleaned up function will be more useful for the following patch
to support KeyConditionExpression, which wants to reuse it.

While at it, this patch also uses rjson::to_string_view(it->value)
instead of the less correct it->value.GetString() (the latter relies
on null-termination, which is actually true for JSON strings, but there
is no reason to rely on it).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200213192509.32685-3-nyh@scylladb.com>
2020-02-16 11:22:30 +02:00
Nadav Har'El
65d0a776c2 merge: alternator: Add keyspace per table
This series implements keyspace-per-table approach for Alternator.
The changes are as follows:
 - when a table is created, its keyspace is created first
 - after table deletion, its keyspace is deleted as well;
   works with views too, since these must be deleted
   before the base table is dropped
 - instead of SimpleStrategy, network topology is used

Keyspaces are created with a prefix not legal from CQL - 'a#'.
I validated that even though not reachable via CQL, keyspaces
created with # character work well and produce correct directories,
restarts work flawlessly too.

Fixes #5611
Refs #5596

Tests: alternator(local, remote)

Piotr Sarna (3):
  alternator: switch to keyspace-per-table approach
  alternator: move to NetworkTopologyStrategy
  alternator-test: add test for recreating a table
2020-02-16 11:22:30 +02:00
Piotr Sarna
fa4ddd2947 alternator: add validating write_isolation tag
In order to prevent users from using incorrect write isolation
configuration, a set of allowed values is introduced.
When tagging a resource (which is considered rare), a tag
will only be allowed if it belongs to the allowed set.
2020-02-13 13:51:31 +01:00
Piotr Sarna
7e6c9cad9a alternator: move rmw_operation to a header
rmw_operation is a class with a public interface, including
a write_isolation enum and a fixed tag name for its configuration.
For convenience, it's moved to a header file, so that code
from executor.cc can use the definitions regardless of their
position in the source file - it prevents reordering functions
just to make sure that rmw_operation is defined before a function
that uses its attributes.
2020-02-13 13:51:31 +01:00
Piotr Sarna
dca6c2c81d alternator: move to NetworkTopologyStrategy
Imstead of SimpleStrategy, NetworkTopologyStrategy is used
for setting up the replication configuration for alternator tables.
Replication factor 3 is used along with a local datacenter,
unless alternator discovers that it's running on a test cluster with
less than 3 nodes - then, RF is reduced accordingly and emits a warning,
which was also the case for SimpleStrategy.
2020-02-13 09:46:46 +01:00
Piotr Sarna
3eb6da224b alternator: switch to keyspace-per-table approach
Instead of a monolith alternator keyspace, each table creates its own
keyspace, named in the following pattern: `a#TABLE_NAME`.
The `a#` prefix contains an illegal CQL character in order to ensure
that these keyspaces are never created via CQL.
2020-02-13 09:46:19 +01:00
Piotr Sarna
dcf54331ea alternator: allow custom names for keyspaces
The maybe_create_keyspace utility now accepts a parameter - the desired
name for a newly created keyspace.
2020-02-13 09:16:37 +01:00
Nadav Har'El
b93204d6bf Alternator: allow CreateTable with streams explicitly turned off
While Alternator doesn't yet support creating a table with streams
(i.e., CDC) turned on, we should only failed the creation if streams
were really turned on. If the StreamSpecification option exists, but
does *not* ask to turn on streams, we should not fail the creation -
and this patch fixes this.

This patch also adds two tests - one where StreamSpecification is
passed but does not ask to turn on streams (so table creation should
succeed), and another test which explicitly requests to turn on
streams. The second test still xfails on Alternator, and should continue
to do so until we implement streams (we do *not* want to silently
ignore a request to turn on streams).

Fixes #5796

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200212100546.16337-1-nyh@scylladb.com>
2020-02-12 17:29:02 +01:00
Piotr Sarna
f4e51a96ca alternator: replace overloaded with overloaded_functor
Turns out we already have a utility header for a visitor
with overloaded lambdas. This patch purges the explicit
reimplementation of the same trick and uses the existing
class instead.
Message-Id: <60c0b9a978f8208b188ef6ddc0564cb133bed707.1581496049.git.sarna@scylladb.com>
2020-02-12 14:21:42 +02:00
Gleb Natapov
38fcab3db4 alternator: pass tracing state explicitly instead of relying on it been in the client_state
Multiple requests can use the same client_state simultaneously, so it is
not safe to use it as a container for a tracing state which is per
request. This is not yet an issue for the alternator since it creates
new client_state object for each request, but first of all it should not
and second trace state will be dropped from the client_state, by later
patch.
2020-02-10 14:50:55 +02:00
Piotr Sarna
4a9536b7c1 alternator: add configuring write isolation policy via tags
Until now, write isolation policy was hardcoded to always enforcing LWT.
From now on, setting a tag via UpdateTags request or during table
creation will associate a policy with given table.
The tag key is 'system:write_isolation' and its value can be one of:
 * 'f' - forbid RMW
 * 'a' - always enforce RMW
 * 'o' - only RMW writes will go through LWT
 * 'u' - unsafe RMW (to be deprecated/eradicated)
2020-02-06 10:26:26 +01:00
Piotr Sarna
0479a1bf67 alternator: make _write_isolation a protected attribute
No useful semantic changes yet, but it will help produce better
diffs for future patches.
2020-02-06 10:04:34 +01:00
Piotr Sarna
51c14cb1ce alternator: fix overwriting tags
Tagging a resource with a tag key that already exists should result
in overwriting the old value. It wasn't the case, so it's now fixed
and an appropriate test is added.
2020-02-06 10:04:34 +01:00
Piotr Sarna
ed940f000d alternator: return tags for a table via const reference
The signature of the helper function is changed, so that it's possible
to acquire a const reference of the tags, instead of being forced
to get a copy of the whole map (potentially large).
2020-02-06 10:04:34 +01:00
Nadav Har'El
9fd9ec14c2 storage_proxy: make it into a peering sharded service
We consider globals like service::get_storage_proxy() a bad idea,
and would like to reduce their use as much as possible - and eventually,
eliminate it completely.

One easy case to fix case is when we already have a shard-local proxy,
but now we need the sharded object, to invoke_on() something on it.

In this patch, we turn storage_proxy into a peering_sharded_service.
This means that if you already have a storage_proxy, you can call
its container() function to get the sharded<storage_proxy>, without
needing to call the global service::get_storage_proxy().

We found a few such cases in storage_proxy itself, and in Alternator,
and fixed them to use container() instead of the global function.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-02-05 21:14:18 +02:00
Nadav Har'El
95351016fd alternator: use LWT timestamp - in BatchWriteItems too
A previous patch fixed Alternator's writes to use the timestamp provided
by LWT instead of the current timestamp. That patch fixed the PutItem,
DeleteItem and UpdateItem operations - and this patch fixes the remaining
write operation: BatchWriteItems. So,

Fixes #5653.

Unfortunatly, the requirements of both BatchWriteItems and LWT make the
resulting code - and this patch - somewhat inelegant. BatchWriteItems
requires that we prepare all the operations first - failing if any of them
has an error. Before this patch, the result of this preparation was an
array of mutations, which in a second step we wrote to the database.
But we can no longer use mutations for the result of the first step,
because creating a mutation requires knowing the timestamp, which we don't
know during the preparate phase - we will only know it during the later
LWT operation. So now we need to invent a new intermediate format between
the request and the mutation. This intermediate format is further
complicated by the need to be send it between shards (for LWT's shard
forwarding) so it cannot, for example, contain a reference to a schema.
The fact that different sub-operations need to be sent to different shards,
and that different sub-operations may write to different tables, further
complicate the book-keeping and gives us a bunch of funky-typed maps.
But eventually it all fits together.

After this patch, as before this patch, the same code (now called
put_or_delete_item), is used to implement both the PutItem and DeleteItem
stand-alone operation, and the BachWriteItems operation which includes a
whole list of these PutItem and DeleteItem operation.

This patch also includes two more tests in test_batch.py, which test two
more corner tests we haven't tested before: One tests the capability of
BatchWriteItems to write to more than one table. The other tests that
BatchWriteItems can write an empty item (it is not surprising that it does,
but we do have special code for this case, so we should test it).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-02-05 21:14:18 +02:00