* seastar 8b6bc659c7...7a3b4b4e4e (3):
> Merge "Add custom stack size to seastar threads" from Piotr
Ref #5742.
> expiring_fifo: Optimize memory usage for single-element lists
Ref #4235.
> Close connection, when reach to max retransmits
This patch adds a warning of deprecation to DTCS. In a follow up step,
we will start requiring a flag for it to be enabled to make sure users
notice.
For now we'll just be nice and add a warning for the log watchers.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200224164405.9656-1-glauber@scylladb.com>
Since we set 'eth0' as default NIC name, we get following error when running scylla_setup in non-interactive mode without --nic parameter:
$ sudo scylla_setup --setup-nic-and-disks --no-raid-setup --no-verify-package --no-io-setup
NIC eth0 doesn't exist.
It looks strange since user actually does not specified 'eth0', they might forget to specify --nic.
I think we should shows up usage, when eth0 is not available on the system.
Fixes#5828
Changes the name of storage_proxy::mutate_hint_from_scratch function to
another name, whose meaning is more clear: send_hint_to_all_replicas.
Tests: unit(dev)
It seems like *.service is conflicting on install time because the file
installed twice, both debian/*.service and debian/scylla-server.install.
We don't need to use *.install, so we can just drop the line.
Fixes#5640
Aggregate functions on counters do not exist. Until now counters
could, at best, fall back to blob->blob overloads, e.g.:
```
cqlsh> select max(cnt) from ks.tbl;
system.max(cnt)
----------------------
0x000000000000000a
(1 rows)
cqlsh> select sum(entities) from ks.tbl;
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Invalid call to function sum, none of its type signatures match
[...]
```
Meanwhile, counters are compatible with bigints (aka. `long_type'),
so bigint overloads can be used on them (e.g. sum(bigint)->bigint).
This is achieved here by a special rule in overload resolution, which
makes `selector' perceive counters as an `EXACT_MATCH' to counter's
underlying type (`long_type', aka. bigint).
Until now, attempts to print counter update cell would end up
calling abort() because `atomic_cell_view::value()` has no
specialized visitor for `imr::pod<int64_t>::basic_view<is_mutable>`,
i.e. counter update IMR type. Such visitor is not easy to write
if we want to intercept counters only (and not all int64_t values).
Anyway, linearized byte representation of counter cell would not
be helpful without knowing if it consists of counter shards or
counter update (delta) - and this must be known upon `deserialize`.
This commit introduces simple approach: it determines cell type on
high level (from `atomic_cell_view`) and prints counter contents by
`counter_cell_view` or `atomic_cell_view::counter_update_value()`.
Fixes#5616
By default, `/usr/lib/rpm/find-debuginfo.sh` will temper with
the binary's build-id when stripping its debug info as it is passed
the `--build-id-seed <version>.<release>` option.
To prevent that we need to set the following macros as follows:
unset `_unique_build_ids`
set `_no_recompute_build_ids` to 1
Fixes#5881
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The official documentation language of Scylla is English, not French.
So correct the word "existant", which appeared several times throughout
Alternator's tests, to "existent".
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-6-nyh@scylladb.com>
This patch completes the support for the ReturnValues parameter for
the UpdateItem operation. This parameter has five settings - NONE, ALL_OLD,
ALL_NEW, UPDATED_OLD and UPDATED_NEW. Before this patch we already
supported NONE and ALL_OLD - and this patch completes the support for the
three remaining modes: ALL_NEW, UPDATED_OLD and UPDATED_NEW.
The patch also continues to improve test_returnvalues.py with additional
corner cases discovered during the development. After this patch, only
one xfailing test remains - testing updates to nested document paths,
which we do not yet support (even without the ReturnValues parameter).
After this patch, the support of ReturnValues is complete - for all
operations (UpdateItem, PutItem and DeleteItem) and all of its possible
settings.
Fixes#5053
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-5-nyh@scylladb.com>
The rjson::set_with_string_name() utility function copies the given
string into the JSON key. The existing implementation required that this
input string be an std::string&, but a std::string_view would be fine too,
and I want to use it in new code to avoid yet another unnecessary copy.
Adding the overloads also exposes a few places where things were
implicitly converted to std::string and now cause an ambiguity - and
clearing up this ambiguity also allowed me to find places where this
conversion was unnecessary.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-4-nyh@scylladb.com>
UpdateItem operations usually need to add a row marker:
* An empty UpdateItem is supposed to create a new empty item (row).
Such an empty item needs to have a row marker.
* An UpdateItem to add an attribute x and then later an UpdateItem
to remove this attribute x should leave an empty item behind.
This means the first UpdateItem needed to add a row marker, so
it will be left behind after the second UpdateItem.
So the existing code always added a row marker in UpdateItem.
However, there is one case where we should NOT create the row marker:
When the UpdateItem operation only has attribute deletions, and nothing
else, and it is applied to a key with no pre-existing item, DynamoDB
does not create this item. So neither should we.
This patch includes a new test for this test_update_item_non_existent,
which passes on DynamoDB, failed on Alternator before this patch, and
passes after the patch.
Fixes#5862.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-3-nyh@scylladb.com>
In issue #5698 I raised a theory that we might have a bug when
BatchWriteItem is given two writes to the *same* key but in two different
tables. The test added here verifies that this theory was wrong, and
this case already works correctly.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200221224221.31237-2-nyh@scylladb.com>
This series adds an option to the API that supports deleting
a specific table from a snapshot.
The implementation works in a similar way to the option
to specify specific keyspaces when deleting a snapshot.
The motivation is to allow reducing disk-space when using
the snapshot for backup. A dtest PR is sent to the dtest
repository.
Fixes#5658
Original PR #5805
Tests: (database_test) (dtest snapshot_test.py:TestSnapshot.test_cleaning_snapshot_by_cf)
* amnonh/delete_table_snapshot:
test/boost/database_test: adopt new clear_snapshot signature
api/storage_service: Support specifying a table when deleting a snapshot
storage_service: Add optional table name to clear snapshot
* amnonh/delete_table_snapshot:
test/boost/database_test: adopt new clear_snapshot signature
api/storage_service: Support specifying a table when deleting a snapshot
storage_service: Add optional table name to clear snapshot
The error message (silently) changed to "DB index is out of range" the
following commit:
c7a4e694ad
The new error message is part of Redis 4.0, released in 2017, so let's
switch Scylla to use the new one.
Message-Id: <20200211133946.746-1-penberg@scylladb.com>
The original idea of prefixing alternator keyspace names with 'a#'
leveraged the fact that '#' is not a legal CQL character for keyspace
names. The idea is flawed though, since '#' proved to confuse
existing Scylla tools (e.g. nodetool).
Thus, the prefix is changed to more orthodox 'alternator_'.
It is possible to create such keyspaces with CQL as well, but then
the alternator CreateTable request would simply fail, because
the keyspace already exists, which is graceful enough.
Hiding alternator keyspaces and tables from CQL is another issue,
but there are other ways to distinguish them than a non-standard
prefix, e.g. tags.
Fixes#5883
The set_config registers lambdas that need db.local(), so
these routes must be registered after database is started.
Fixes: #5849
Tests: unit(dev), manual wget on API
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200219130654.24259-1-xemul@scylladb.com>
On Debian, we don't add xfsprogs/mdadm on package dependency, install on
scylla_raid_setup script instead.
Since xfsprogs/mdadm only needed for constructing RAID, we can move
dependencies to scylla_raid_setup too.
In order to decrease the developer's time spent on waiting
for boto3 to retry the request many times, the retry count
is configured to be 3.
Two major benefits:
- vastly decrease wait time when debugging a failing test
- for requests which are expected to fail, but return results
not compatible with boto3, execution time is decreased
Tests: alternator-test(local,remote)
Message-Id: <46a3a9344d9427df7ea55c855f32b8f0e39c9b79.1582285070.git.sarna@scylladb.com>
The nr_ranges_streamed denotes the number of ranges streamed
so far, but by the time the sending lambda is called this
counter is already incremented by the number of ranges to be
streamed in this call. And the variable is not used for
anything else but logging.
Fix this by swapping logging with incrementing.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200221101601.18779-1-xemul@scylladb.com>
This patch adds an alternative way to locate sstables by looking at
sstable sets in table objects:
scylla sstables -t
This may be useful for several things. One is to identify sstables
which are not attached to tables.
Another use case is to be able to use the command on older versions of
scylla which don't have sstable tracking.
Message-Id: <1582308099-24563-1-git-send-email-tgrabiec@scylladb.com>
I neither is used, we get the default behavior: only release is built
without stack guards.
With --disable-stack-guards all modes are built without stack guards.
With --enable-stack-guards all modes are built with stack guards.
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200222012732.992380-1-espindola@scylladb.com>
The variable in question was used to check that the bootstrap mode
finishes correctly, but it was removed, becase this check was for
self-evident code and thus useless (dbca327b)
Later, the patch was reverted to keep track the bootstrap mode for
API is_cleanup_allowed call (a39c8d0e)
This patch is a reworked combination of both -- the variable is
kept for API sake, but in a much simpler manner.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200221101813.18945-1-xemul@scylladb.com>
The _scheduled_gossip_task timer needs token_metadata and thus should
be stopped before. However, this is not always the case.
The timer is armed in start_gossiping, which is called by storage_service
init_server_without_the_messaging_service_part, and is canceled inside
stop_gossiping, which in turn is called by drain_on_shutdown, which in
turn is registered too late.
If something fails between the internals of the init_server_... and
defered registration of drain_on_shutdown (lots of reasons) the timer is
not stopped and may run, thus accessing the freed token_metadata.
Bandaid this by scheduling stop_gossiping right after the gossiper
instances are created. This can be too early (before storage_service
starts gossiping) or too late (after drain_on_shutdown stops it), but
this function is re-entrable.
Fixes#5844
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200221085226.16494-1-xemul@scylladb.com>
* seastar 2b510220...cdda3051 (10):
> core: discard unused variable / function
> pollable_fd: use boost::intrusive_ptr rather than std::unique_ptr for lifecycle management
> build: check for pthread_setname_np()
> build: link against Threads::Threads
> future: Avoid recursion in do_for_each
> future: Expand description of parallel_for_each
> merge: Add content length limit to httpd
> tests/scheduling_group_test: verify current scheduling group is inherited as expected
> net: return future<> instead of subscription<>
> cmake: be more verbose when looking for libraries
Before this patch, we only supported the ReturnValues=NONE setting of the
PutItem, UpdateItem and DeleteItem operations.
This patch also adds full support for the ReturnValues=ALL_OLD option
in all three operation. This option directs Alternator to return the full
old (i.e., pre-modification) contents of the item.
We implement this as a RMW (read-modify-write) operation just as we do
other RMW operations - i.e., by default we use LWT, to ensure that we really
return the value of the item directly before the modification, the same
value that would have been used in a conditional expression if there was one.
NOTE: This implementation means one cannot use ReturnValues=ALL_OLD in
forbid_rmw write isolation mode. One may theorize that if we only need the
read-before-write for ReturnValues and not for a conditional expression,
it should have been enough to use a separate read (as we do in unsafe_rmw
isolation mode) before the write. But we don't have this "optimization" yet
and I'm not sure it's a valid optimization at all - see discussion in
a new issue #5851.
This patch completes the ReturnValues support for the PutItem and DeleteItem
operations. However, the third operation, UpdateItem, supports three more
ReturnValues modes: UPDATED_OLD, ALL_NEW and UPDATED_NEW. We do not yet
support those in this patch. If a user tries to use one of these three modes,
an informative error message will be returned. The three tests for these
three unimplemented settings continue to xfail, but the rest of the tests
in test_returnvalues.py (except one test of nested attribute paths) now
pass so their xfail flag is dropped.
Refs #5053
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200219135658.7158-1-nyh@scylladb.com>
Row cache needs to be invalidated whenever data in sstables
changes. Cleanup removes data from sstables which doesn't belong to
the node anymore, which means cache must be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges which
data are still stored in the node's row cache, because cleanup didn't
invalidate the cache."
Fixes#4446.
tests:
- unit tests (dev mode)
- dtests:
update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
cleanup_test.py
* Pass raw::select_statement::parameters as lw_shared_ptr
* Some more const cleanups here and there
* lists,maps,sets::equals now accept const-ref to *_type_impl
instead of shared_ptr
* Remove unused `get_column_for_condition` from modification_statement.hh
* More methods now accept const-refs instead of shared_ptr
Every call site where a shared_ptr was required as an argument
has been inspected to be sure that no dangling references are
possible.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20200220153204.279940-1-pa.solodovnikov@scylladb.com>
Due to a bug the entire segment is written in one huge write of 32Mb.
The idea was to split it to writes of 128K, so fix it.
Fixes#5857
Message-Id: <20200220102939.30769-1-gleb@scylladb.com>
There may be other commitlog writes waiting for zeroing to complete, so
not using proper scheduling class causes priority inversion.
Fixes#5858.
Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
Row cache needs to be invalidated whenever data in sstables changes. Cleanup removes
data from sstables which doesn't belong to the node anymore, which means cache must
be invalidated on cleanup.
Currently, stale data can be returned when a node re-owns ranges which data are still
stored in the node's row cache, because cleanup didn't invalidate the cache.
To prevent data that belongs to the node from being purged from the row cache, cleanup
will only invalidate the cache with a set of token ranges that will not overlap with
any of ranges owned by the node.
update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_decommission_node_2_test
now passes.
Fixes#4446.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This procedure will calculate ranges for cache invalidation by subtracting
all owned ranges from the sstables' partition ranges. That's done so as
to reduce the size of invalidated ranges.
Refs #4446.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
In order to avoid stack overflow issues represented by the attached
test case, rapidjson's parser now has a limit of nested level.
Previous iterations of this patch used iterative parsing
provided by rapidjson, but that solution has two main flaws:
1. While parsing can be done iteratively, printing the document
is based on a recursive algorithm, which makes the iteratively
parsed JSON still prone to stack overflow on reads.
Documents with depth 35k were already prone to that.
2. Even if reading the document would have been performed iteratively,
its destruction is stack-based as well - the chain of C++ destructors
is called. This error is sneaky, because it only shows with depths
around 100k with my local configuration, but it's just as dangerous.
Long story short, capping the depth of the object to an arguably large
value (39) was introduced to prevent stack overflows. Real life
objects are expected to rarely have depth of 10, so 39 sounds like
a safe value both for the clients and for the stack.
DynamoDB has a nesting limit of 32.
Fixes#5842
Tests: alternator-test(local,remote)
Message-Id: <b083bacf9df091cc97e4a9569aad415cf6560daa.1582194420.git.sarna@scylladb.com>
This patch causes inclusive and exclusive range deletes to be
distinguished in cdc log. Previously, operations `range_delete_start`
and `range_delete_end` were used for both inclusive and exclusive bounds
in range deletes. Now, old operations were renamed to
`range_delete_*_inclusive`, and for exclusive deletes, new operations
`range_delete_*_exclusive` are used.
Tests: unit(dev)
User reported an issue that after a node restart, the restarted node
is marked as DOWN by other nodes in the cluster while the node is up
and running normally.
Consier the following:
- n1, n2, n3 in the cluster
- n3 shutdown itself
- n3 send shutdown verb to n1 and n2
- n1 and n2 set n3 in SHUTDOWN status and force the heartbeat version to
INT_MAX
- n3 restarts
- n3 sends gossip shadow rounds to n1 and n2, in
storage_service::prepare_to_join,
- n3 receives response from n1, in gossiper::handle_ack_msg, since
_enabled = false and _in_shadow_round == false, n3 will apply the
application state in fiber1, filber 1 finishes faster filber 2, it
sets _in_shadow_round = false
- n3 receives response from n2, in gossiper::handle_ack_msg, since
_enabled = false and _in_shadow_round == false, n3 will apply the
application state in fiber2, filber 2 yields
- n3 finishes the shadow round and continues
- n3 resets gossip endpoint_state_map with
gossiper.reset_endpoint_state_map()
- n3 resumes fiber 2, apply application state about n3 into
endpoint_state_map, at this point endpoint_state_map contains
information including n3 itself from n2.
- n3 calls gossiper.start_gossiping(generation_number, app_states, ...)
with new generation number generated correctly in
storage_service::prepare_to_join, but in
maybe_initialize_local_state(generation_nbr), it will not set new
generation and heartbeat if the endpoint_state_map contains itself
- n3 continues with the old generation and heartbeat learned in fiber 2
- n3 continues the gossip loop, in gossiper::run,
hbs.update_heart_beat() the heartbeat is set to the number starting
from 0.
- n1 and n2 will not get update from n3 because they use the same
generation number but n1 and n2 has larger heartbeat version
- n1 and n2 will mark n3 as down even if n3 is alive.
To fix, always use the the new generation number.
Fixes: #5800
Backports: 3.0 3.1 3.2
Previously we required MODIFY permissions on all materialized views in
order to modify a table. This is wrong, because the views should be
synced to the table unconditionally. For the same reason,
users *shouldn't* be granted MODIFY on views, to prevent them manually
changing (and breaking) a view.
This patch removes an explicit permissions check in
modification_statement introduced by 65535b3. It also tests that a
user can indeed modify a table they are allowed to modify, regardless
of lacking permissions on the table's views and indices.
Fixes#5205.
Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
"
This series ensures the server more often than not initializes
raw_cql_statement, a variable responsible for holding the original
CQL query, and adds logging events to all places executing CQL,
and logs CQL text in them.
A prepared statement object is the third incarnation of
parser output in Scylla:
- first, we create a parsed_statement descendent.
This has ~20 call sites inside Cql.g
- then, we create a cql_statement descendent, at ~another 20 call sites
- finally, in ~5 call sites we create a prepared statement object,
wrapping cql_statement. Sometimes we use cql_statement object
without a prepared statement object (e.g. BATCHes).
Ideally we'd want to capture the CQL text right in the parser, but
due to complicated transformations above that would require
patching dozens of call sites.
This series moves raw_cql_statement from class prepared_statement
to its nested object, cql_statement, batches, and initializes this
variable in all major call sites. View prepared statements and
some internal DDL statements still skip setting it.
"
* 'query_processor_trace_cql_v2' of https://github.com/kostja/scylla:
query_processor: add CQL logging to all major execute call sites.
query_procesor: move raw_cql_statement to cql_statement
query_processor: set raw_cql_statement consistently