Accessing an uninitialized token instead of the actual generated
string caused the parser to crash. This wasn't detected by the ANTLR3
compiler because all the temporary variables defined in the ANTLR3
statements are global in the generated code. The result was
essentially a null dereference.
Tests: 1. The fixed issue scenario from GitHub.
2. Unit tests in release mode.
Fixes #11774
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Message-Id: <20190612133151.20609-1-eliransin@scylladb.com>
Closes #11777
When a class inherits from multiple virtual base classes, a pointer to
an instance of this class via one of its base classes might point
somewhere into the middle of the object rather than at its beginning.
Therefore, the simple method currently employed by $downcast_vptr(), of
casting the provided pointer to the type extracted from the vtable
name, fails. Instead, when this situation is detected (observable from
the symbol name of the partial vtable referring to an offset larger
than +16, rather than +16 itself), $downcast_vptr() will iterate over
the base classes, adjusting the pointer by their offsets, hoping to
find the true start of the object.
In the one instance I tested this with, this method worked well.
At the very least, the method will now yield a null pointer when it
fails, instead of a badly cast object with corrupt content (which the
developer might or might not attribute to the bad cast).
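As a minimal standalone illustration of the layout issue (toy types, not
Scylla code): with multiple virtual bases, the base-subobject pointers need
not equal the address of the complete object, which is exactly the
adjustment $downcast_vptr() now tries to undo.
```cpp
#include <cstdio>

struct A { virtual ~A() = default; long a = 0; };
struct B { virtual ~B() = default; long b = 0; };
struct C : virtual A, virtual B { long c = 0; };

int main() {
    C obj;
    A* pa = &obj;   // may or may not equal the address of obj
    B* pb = &obj;   // typically points into the middle of obj
    // A debugger helper that only sees a raw pointer plus a vtable symbol has
    // to recover this implicit adjustment to find the true start of the object.
    std::printf("C*: %p\nA subobject: %p\nB subobject: %p\n",
                static_cast<void*>(&obj),
                static_cast<void*>(pa),
                static_cast<void*>(pb));
    return 0;
}
```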
Closes #11892
This PR adds the "ScyllaDB Enterprise" label to highlight the Enterprise-only features on the following pages:
- Encryption at Rest - the label indicates that the entire page is about an Enterprise-only feature.
- Compaction - the labels indicate the sections that are Enterprise-only.
There are more occurrences across the docs that require a similar update. I'll update them in another PR if this PR is approved.
Closes #11918
* github.com:scylladb/scylladb:
doc: fix the links to resolve the warnings
doc: add the Enterprise label on the Compaction page (to a subheading and on a list of strategies) to replace the info box
doc: add the Enterprise label to the Encryption at Rest page (the entire page) to replace the info box
Prior to off-strategy compaction, streaming / repair would place
staging files into the main sstable set, and wait for view building
completion before they could be selected for regular compaction.
The reason for that is that view building relies on the table providing
a mutation source without data in staging files. Had regular compaction
mixed staging data with non-staging data, the table would have a hard
time providing the required mutation source.
After off-strategy compaction, staging files can be compacted
in parallel with view building. If off-strategy completes first, it
will place the output into the main sstable set. So parallel view
building (on sstables used for off-strategy) may get a mutation
source containing staging data from the off-strategy output.
That will mislead the view builder, as it won't be able to detect
changes to data in the main directory.
To fix it, we'll do what we did before: filter out staging files
from compaction, and trigger the operation only after we're done
with view building. We're piggybacking on the off-strategy timer so
that off-strategy still runs only at the end of the node operation,
to reduce the number of compaction rounds on the data introduced by
repair / streaming.
Fixes #11882.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #11919
create-relocatable-package.py collects shared libraries used by
executables for packaging. It also adds libthread_db.so to make
debugging possible. However, the name it uses has changed in glibc,
so packaging fails on Fedora 37.
Switch to the version-agnostic name, libthread_db.so. This happens
to be a symlink, so resolve it.
Closes #11917
The --online-discard option is defined as a string parameter, since it
doesn't specify "action=", but its default value is a boolean
(default=True).
This breaks "provisioning in a similar environment", since the code
assumes a boolean default implies "action='store_true'", which is not
the case here.
We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.
Fixes #11700
Closes #11831
Whenever a Raft configuration change is performed, `raft::server` calls
`raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc`
implementation has a function, `_on_server_update`, passed in the
constructor, which it calls from `add_server`/`remove_server`;
that function would update the set of endpoints monitored by the
direct failure detector. `_on_server_update` was passed an IP address,
and that address was added to / removed from the failure detector set
(there's another translation layer between the IP addresses and the
internal failure detector 'endpoint ID's, but we can ignore it for the
purposes of this commit).
Therefore: the failure detector was pinging a certain set of IP
addresses. These IP addresses were updated during Raft configuration
changes.
To implement the `is_alive(raft::server_id)` function (required by the
`raft::failure_detector` interface), we would translate the ID to an IP
address using the Raft address map (which is currently also updated
during configuration changes), and check if that IP address is alive
according to the direct failure detector (which maintained an
`_alive_set` of type `unordered_set<gms::inet_address>`).
This all works well but it assumes that servers can be identified using
IP addresses - it doesn't play well with the fact that servers may
change their IP addresses. The only immutable identifier we have for a
server is `raft::server_id`. In the future, Raft configurations will not
associate IP addresses with Raft servers; instead we will assume that IP
addresses can change at any time, and there will be a different
mechanism that eventually updates the Raft address map with the latest
IP address for each `raft::server_id`.
To prepare us for that future, in this commit we no longer operate in
terms of IP addresses in the failure detector, but in terms of
`raft::server_id`s. Most of the commit is boilerplate, changing
`gms::inet_address` to `raft::server_id` and function/variable names.
The interesting changes are:
- in `is_alive`, we no longer need to translate the `raft::server_id` to
an IP address, because now the stored `_alive_set` already contains
`raft::server_id`s instead of `gms::inet_address`es.
- the `ping` function now takes a `raft::server_id` instead of a
`gms::inet_address`. To send the ping message, we need to translate
this to an IP address; we do it using the `raft_address_map` pointer
introduced in an earlier commit.
Thus, there is still a point where we have to translate between
`raft::server_id` and `gms::inet_address`; but observe we now do it at
the last possible moment - just before sending the message. If we
have no translation, we consider the `ping` to have failed - it's
equivalent to a network failure where no route to a given address was
found.
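As a rough sketch of the new flow (stand-in types only; the real code uses
seastar futures, `direct_failure_detector` and `raft_address_map`): the
pinger holds `raft::server_id`s and resolves an IP address only at send
time, reporting a missing mapping as a failed ping.
```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>

// Stand-ins for raft::server_id, gms::inet_address and raft_address_map.
struct server_id { uint64_t uuid; };
using inet_address = std::string;

struct address_map {
    std::unordered_map<uint64_t, inet_address> entries;
    std::optional<inet_address> find(server_id id) const {
        auto it = entries.find(id.uuid);
        if (it == entries.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};

// Translation happens at the last possible moment, just before sending.
// A missing translation counts as a failed ping, like "no route to host".
bool ping(const address_map& map, server_id id) {
    auto addr = map.find(id);
    if (!addr) {
        return false;
    }
    std::cout << "echo -> " << *addr << '\n';   // the real code sends an RPC here
    return true;
}

int main() {
    address_map map{{{1, "10.0.0.1"}}};
    std::cout << ping(map, server_id{1}) << ' ' << ping(map, server_id{2}) << '\n';
}
```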
Closes #11759
* github.com:scylladb/scylladb:
direct_failure_detector: get rid of complex `endpoint_id` translations
service/raft: ping `raft::server_id`s, not `gms::inet_address`es
service/raft: store `raft_address_map` reference in `direct_fd_pinger`
gms: gossiper: move `direct_fd_pinger` out to a separate service
gms: gossiper: direct_fd_pinger: extract generation number caching to a separate class
This PR introduces the following changes to the documentation landing page:
- The " New to ScyllaDB? Start here!" box is added.
- The "Connect your application to Scylla" box is removed.
- Some wording has been improved.
- "Scylla" has been replaced with "ScyllaDB".
Closes #11896
* github.com:scylladb/scylladb:
Update docs/index.rst
doc: replace Scylla with ScyllaDB on the landing page
doc: improve the wording on the landing page
doc: add the link to the ScyllaDB Basics page to the documentation landing page
There were 4 different pages for upgrading Scylla 5.0 to 5.1 (and the
same is true for other version pairs, but I digress) for different
environments:
- "ScyllaDB Image for EC2, GCP, and Azure"
- Ubuntu
- Debian
- RHEL/CentOS
The Ubuntu and Debian pages used a common template:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```
with different variable substitutions.
The "Image" page used a similar template, with some extra content in the
middle:
```
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p1.rst
.. include:: /upgrade/_common/upgrade-image-opensource.rst
.. include:: /upgrade/_common/upgrade-guide-v5-ubuntu-and-debian-p2.rst
```
The RHEL/CentOS page used a different template:
```
.. include:: /upgrade/_common/upgrade-guide-v4-rpm.rst
```
This was an unmaintainable mess. Most of the content was "the same" for
each of these options. The only content that must actually be different
is the part with package installation instructions (e.g. calls to `yum`
vs `apt-get`). The rest of the content was logically the same - the
differences were mistakes, typos, and updates/fixes to the text that
were made in some of these docs but not others.
In this commit I prepare a single page that covers the upgrade and
rollback procedures for all of these options. The system-dependent
sections are implemented using Sphinx Tabs.
I also fixed and changed some parts:
- In the "Gracefully stop the node" section:
Ubuntu/Debian/Images pages had:
```rst
.. code:: sh

   sudo service scylla-server stop
```
RHEL/CentOS pages had:
```rst
.. code:: sh
.. include:: /rst_include/scylla-commands-stop-index.rst
```
the stop-index file contained this:
```rst
.. tabs::

   .. group-tab:: Supported OS

      .. code-block:: shell

         sudo systemctl stop scylla-server

   .. group-tab:: Docker

      .. code-block:: shell

         docker exec -it some-scylla supervisorctl stop scylla

      (without stopping *some-scylla* container)
```
So the RHEL/CentOS version had two tabs: one for Scylla installed
directly on the system, one for Scylla running in Docker - which is
interesting, because nothing anywhere else in the upgrade documents
mentions Docker. Furthermore, the RHEL/CentOS version used `systemctl`
while the Ubuntu/Debian/Images version used `service` to stop/start
scylla-server. Both work on modern systems.
The Docker option is completely out of place - the rest of the upgrade
procedure does not mention Docker. So I decided it doesn't make sense to
include it. Docker documentation could be added later if we actually
decide to write upgrade documentation when using Docker... Between
`systemctl` and `service` I went with `service` as it's a bit
higher-level.
- Similar change for "Start the node" section, and corresponding
stop/start sections in the Rollback procedure.
- To reuse text for Ubuntu and Debian, when referencing "ScyllaDB deb
repo" in the Debian/Ubuntu tabs, I provide two separate links: to
Debian and Ubuntu repos.
- the link to the rollback procedure in the RPM guide (in the 'Download and
install the new release' section) pointed to the rollback procedure from
the 3.0 to 3.1 guide... Fixed it to point to the current page's rollback
procedure.
- in the rollback procedure steps summary, the RPM version was missing the
"Restore system tables" step.
- in the rollback procedure, the repository links were pointing to the
new versions, while they should point to the old versions.
There are some other pre-existing problems I noticed that need fixing:
- the EC2/GCP/Azure option has no coverage in the rollback
section (Download and install the old release) corresponding to what
it has in the upgrade section. There is no guide for rolling back
3rd-party and OS packages, only Scylla. I left a TODO in a comment.
- the repository links assume certain Debian and Ubuntu versions (Debian
10 and Ubuntu 20), but there are more available options (e.g. Ubuntu
22). Not sure how to deal with this problem. Maybe a separate section
with links? Or just a generic link without choice of platform/version?
Closes #11891
Flush the memtable before cleaning up the table so as not to leave any
disowned tokens in the memtable, as they might be resurrected otherwise.
Fixes #1239
Closes #11902
* github.com:scylladb/scylladb:
table: perform_cleanup_compaction: flush memtable
table: add perform_cleanup_compaction
api: storage_service: add logging for compaction operations et al
Coroutines and asan don't mix well on aarch64. This was seen in
22f13e7ca3 ("Revert "Merge 'cql3: select_statement: coroutinize
indexed_table_select_statement::do_execute_base_query()' from Avi
Kivity"") where a routine coroutinization was reverted due to failures
in aarch64 debug mode.
With clang 15 this is even worse: the existing code starts failing.
However, if we disable optimization (-O0 rather than -Og), things
begin to work again. In fact we can reinstate the patch reverted
above even with clang 12.
Fix (or rather work around) the problem by avoiding -Og in aarch64
debug mode. There's the lingering fear that release mode is
miscompiled too, but all the tests pass on clang 15 in release mode,
so it appears to be related to asan.
Closes #11894
We don't explicitly clean up the memtable, even though
it might hold tokens disowned by the current node.
Flush the memtable before performing cleanup compaction
to make sure all tokens in the memtable are cleaned up.
Note that non-owned ranges are invalidated in the cache
in compaction_group::update_main_sstable_list_on_compaction_completion
using desc.ranges_for_cache_invalidation.
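A toy, self-contained sketch of the ordering (illustrative types, not the
actual table/compaction_manager API): flushing first makes sure rows for
disowned tokens that were still in the memtable reach the sstables and get
dropped by the cleanup pass.
```cpp
#include <iostream>
#include <vector>

// Illustrative stand-in: a table with an in-memory memtable and on-disk sstables.
struct toy_table {
    std::vector<int> memtable;   // tokens of rows not yet written to disk
    std::vector<int> sstables;   // tokens of rows already on disk

    void flush() {
        sstables.insert(sstables.end(), memtable.begin(), memtable.end());
        memtable.clear();
    }

    // Cleanup keeps only rows whose tokens this node still owns.
    void cleanup(bool (*owned)(int)) {
        std::erase_if(sstables, [&] (int t) { return !owned(t); });
    }
};

int main() {
    toy_table t{{42}, {1, 7}};   // disowned token 42 lives only in the memtable
    t.flush();                   // without this flush, 42 would survive cleanup
    t.cleanup([] (int token) { return token != 42; });
    std::cout << "rows kept: " << t.sstables.size() << '\n';   // prints: rows kept: 2
}
```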
Fixes #1239
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the integration with compaction_manager
from the api layer to the table class so
it can also make sure the memtable is cleaned up in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The test runs the remove_node command with a background DDL workload.
It was written in an attempt to reproduce scylladb#11228 but seems to
have value on its own.
The if_exists parameter has been added to the add_table
and drop_table functions, since the driver could retry
the request sent to a removed node, but that request
might have already been completed.
The wait_for_host_known function waits until the information
about the node reaches the destination node. Since we add
new nodes at each iteration in main, this can take some time.
A number of abort-related options were added to
SCYLLA_CMDLINE_OPTIONS, as they simplify
nailing down problems.
Closes #11734
The direct failure detector operates on abstract `endpoint_id`s for
pinging. The `pinger` interface is responsible for translating these IDs
to 'real' addresses.
Earlier we used two types of addresses: IP addresses in 'production'
code (`gms::gossiper::direct_fd_pinger`) and `raft::server_id`s in test
code (in `randomized_nemesis_test`). For each of these use cases we
would maintain mappings between `endpoint_id`s and the address type.
In recent commits we switched the 'production' code to also operate on
Raft server IDs, which are UUIDs underneath.
In this commit we switch `endpoint_id`s from the `unsigned` type to
`utils::UUID`. Because each use case operates on Raft server IDs, we can
perform a simple translation: `raft_id.uuid()` to get an `endpoint_id`
from a Raft ID, `raft::server_id{ep_id}` to obtain a Raft ID from
an `endpoint_id`. We no longer have to maintain complex sharded data
structures to store the mappings.
In a later commit `direct_fd_pinger` will operate in terms of
`raft::server_id`s. Decouple it from `gossiper` since we don't want to
entangle `gossiper` with Raft-specific stuff.
`gms::gossiper::direct_fd_pinger` serves multiple purposes: one of them
is to maintain a mapping between `gms::inet_address`es and
`direct_failure_detector::pinger::endpoint_id`s, another is to cache the
gossiper's last known generation number to use it for sending gossip
echo messages. The latter is the only gossiper-specific thing in this
class.
We want to move `direct_fd_pinger` outside `gossiper`. To do that, split
the gossiper-specific thing -- the generation number management -- into
a smaller class, `echo_pinger`.
`echo_pinger` is a top-level class (not a nested one like
`direct_fd_pinger` was) so we can forward-declare it and pass references
to it without including the gms/gossiper.hh header.
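The forward-declaration point is plain C++; a small illustration with
stand-in names:
```cpp
// A top-level class can be forward-declared, so code that only passes
// references around doesn't need the defining header at all.
class echo_pinger;

// A nested class, by contrast, cannot be forward-declared on its own:
// something like `class gossiper::direct_fd_pinger;` is ill-formed unless the
// full definition of the enclosing class (gms/gossiper.hh) is visible.

struct uses_pinger {
    echo_pinger& pinger;   // a reference only needs the forward declaration
    explicit uses_pinger(echo_pinger& p) : pinger(p) {}
};
```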
* seastar f32ed00954...e0dabb361f (12):
> sstring: define formatter
> file: Dont violate API layering
> Add compile_commands.json to gitignore
> Merge 'Add an allocation failure metric' from Travis Downs
> Use const test objects
> Ragel chunk parser: compilation err, unused var
> build: do not expose Valgrind in SeastarTargets.cmake
> defer: mark deferred_* with [[nodiscard]]
> Log selected reactor backend during startup
> http: mark str with [[maybe_unused]]
> Merge 'reactor: open fd without O_NONBLOCK when using io_uring backend' from Kefu Chai
> reactor: add accept and connect to io_uring backend
Closes #11895
Replicating `raft_address_map` entries is needed for the following use
cases:
- the direct failure detector - currently it assumes a static mapping of
`raft::server_id`s to `gms::inet_address`es, which is obtained on Raft
group 0 configuration changes. To handle dynamic mappings we need to
modify the failure detector so it pings `raft::server_id`s and obtains
the `gms::inet_address` from `raft_address_map` just before sending the
message. The failure detector is sharded, so we need the
mappings to be available on all shards.
- in the future we'll have multiple Raft groups running on different
shards. To send messages they'll need `raft_address_map`.
Initially I tried to replicate all entries - expiring and non-expiring.
The implementation turned out to be very complex - we need to handle
dropping expired entries and refreshing expiring entries' timestamps
across shards, and doing this correctly while accounting for possible
races is quite problematic.
Eventually I arrived at the conclusion that replicating only
non-expiring entries, and furthermore allowing non-expiring entries to
be added only on shard 0, is good enough for our use cases:
- The direct failure detector is pinging group 0 members only; group
0 members correspond exactly to the non-expiring entries.
- Group 0 configuration changes are handled on shard 0, so non-expiring
entries are added/removed on shard 0.
- When we have multiple Raft groups, we can reuse a single Raft server
ID for all Raft servers running on a single node belonging to
different groups; they are 'namespaced' by the group IDs. Furthermore,
every node has a server that belongs to group 0. Thus every Raft
server in every group has a corresponding server in group 0 with
the same ID, which has a non-expiring entry in `raft_address_map`,
which is replicated to all shards; so every group will be able to
deliver its messages.
With these assumptions the implementation is short and simple.
We can always complicate it in the future if we find that the
assumptions are too strong.
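A hedged sketch of that rule (illustrative names and a stub map, not the
real raft_address_map interface): non-expiring entries are added on shard 0
only, and shard 0 mirrors each one to every shard via the sharded service.
```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/smp.hh>

// Each shard keeps its own copy of the non-expiring entries.
struct address_map_stub {
    std::unordered_map<uint64_t, std::string> non_expiring;
    void set_non_expiring(uint64_t id, std::string addr) {
        non_expiring[id] = std::move(addr);
    }
};

seastar::future<> add_non_expiring_entry(seastar::sharded<address_map_stub>& svc,
                                         uint64_t raft_id, std::string addr) {
    assert(seastar::this_shard_id() == 0);       // additions happen on shard 0 only
    return svc.invoke_on_all([raft_id, addr] (address_map_stub& m) {
        m.set_non_expiring(raft_id, addr);       // per-shard replica of the entry
    });
}
```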
Closes #11791
* github.com:scylladb/scylladb:
test/raft: raft_address_map_test: add replication test
service/raft: raft_address_map: replicate non-expiring entries to other shards
service/raft: raft_address_map: assert when entry is missing in drop_expired_entries
service/raft: turn raft_address_map into a service
We capture `key` by reference, but it is used in another continuation.
Capture it by value, and avoid the default capture specification.
Found by clang 15 + asan + aarch64.
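The bug pattern in a minimal standalone form (illustrative, not the actual
Scylla continuation): a default by-reference capture lets a continuation
outlive the `key` it refers to, while an explicit by-value capture keeps its
own copy.
```cpp
#include <functional>
#include <iostream>
#include <string>

// Returns a "continuation" that may run after the enclosing scope is gone.
std::function<void()> make_continuation_buggy() {
    std::string key = "pk1";
    return [&] { std::cout << key << '\n'; };    // BUG: dangling reference to key
}

std::function<void()> make_continuation_fixed() {
    std::string key = "pk1";
    return [key] { std::cout << key << '\n'; };  // fix: explicit capture by value
}

int main() {
    auto cont = make_continuation_fixed();
    cont();                                      // safe: the lambda owns its copy
}
```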
Closes #11884
To fix CVE-2022-24675, we need a binary compiled with golang >= 1.18.1.
The only released version compiled with golang >= 1.18.1 is node_exporter
1.4.0, so we need to update to it.
See scylladb/scylla-enterprise#2317
Closes #11400
[avi: regenerated frozen toolchain]
Closes #11879
Starting from https://github.com/scylladb/scylla-pkg/pull/3035 we
removed all the old tar.gz prefixes from being uploaded to S3 or used
by downstream jobs.
Hence, there is no point in building those tar.gz files anymore.
Closes #11865
Fix https://github.com/scylladb/scylla-doc-issues/issues/864
This PR:
- updates the introduction to add information about AArch64 and rewrite the content.
- replaces "Scylla" with "ScyllaDB".
Closes #11778
* github.com:scylladb/scylladb:
Update docs/getting-started/system-requirements.rst
doc: fix the link to the OS Support page
doc: replace Scylla with ScyllaDB
doc: update the info about supported architecture and rewrite the introduction
This patch adds a reproducing test for issue #11588, which is still open,
so the test is expected to fail on Scylla ("xfail") and passes on Cassandra.
The test shows that Scylla allows an out-of-range value to be written to
a timestamp column, but then it can't be read back.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #11864
The PR prepares repair for task manager integration:
- Creates repair_module
- Keeps repair_module in repair_service
- Moves tracker methods to repair_module
- Changes UUID to task_id in repair module
Closes #11851
* github.com:scylladb/scylladb:
repair: check shutdown with abort source in repair module
repair: use generic module gate for repair module operations
repair: move tracker to repair module
repair: move next_repair_command to repair_module
repair: generate repair id in repair module
repair: keep shard number in repair_uniq_id
repair: change UUID to task_id
repair: add task_manager::module to repair_service
repair: create repair module and task